Intel Xeon E5 Version 3: Up to 18 Haswell EP Cores
by Johan De Gelas on September 8, 2014 12:30 PM EST
Simply put, the new Intel Xeon "Haswell EP" chips are multi-core behemoths: they support up to eighteen cores (with Hyper-Threading yielding 36 logical cores). Core counts have been increasing for years now, so it is easy to dismiss the new Xeon E5-2600 v3 as "business as usual", but it is definitely not. Piling up cores inside a CPU package is one thing, but getting them to do useful work is a long chain of engineering efforts that starts with hardware intelligence and that ends with making good use of the best software libraries available.
While some sites previously reported that an "unknown source" told them Intel was cooking up a 14-core Haswell EP Xeon chip, and that the next generation 14 nm Xeon E5 "Broadwell" would be an 18-core design, the reality is that Intel has an 18-core Haswell EP design, and we have it for testing. This is yet another example of truth beating fiction.
18 cores and 45MB LLC under that shiny new and larger heatspreader.
The first step, making sure that such a multi-core monster actually works, is a technical challenge at the pinnacle of CPU engineering. The biggest challenge is keeping all those cores fed with data. A massive (up to 45MB) L3 cache will help, but with such large caches, latency and power consumption can soar quickly. Such high core counts introduce many other problems as well: cache coherency traffic can grow much faster than the core count, one thread can get too far ahead of another, the memory controller can become a bottleneck, and so on. And there is more than the "internal CPU politics".
Servers have evolved into small datacenters: in a modern virtualized server, some of the storage and network services that used to be handled by external devices are now software running inside virtual machines (VMware vSAN and NSX for example). In other words, not only are these servers the home of many applications, but the requirements of these applications are also diverging. Some of these applications may hog the Last Level Cache and starve the others; others may impose a heavy toll on the internal I/O. It will be interesting to see how well the extra cores can be turned into real world productivity gains.
The new Xeon E5 is also a challenge to the datacenter manager looking to make new server investments. With 22 new SKUs ranging from a 3.5GHz quad-core model up to an 18-core 2.3GHz SKU, there are almost too many choices. While we don't have all of the SKUs for testing, we do have several of them, so let's dig in and see what Haswell EP has to offer.
85 Comments
cmikeh2 - Monday, September 8, 2014
In the SKU comparison table you have the E5-2690V2 listed as a 12/24 part when it is in fact a 10/20 part. Just a tiny quibble. Overall a fantastic read.
KAlmquist - Monday, September 8, 2014
Also, the 2637 v2 is 4/8, not 6/12.
isa - Monday, September 8, 2014
Looking forward to a new supercomputer record using these behemoths.
Bruce Allen - Monday, September 8, 2014
Awesome article. I'd love to see Cinebench and other applications tests. We do a lot of rendering (currently with older dual Xeons) and would love to compare these new Xeons versus the new 5960X chips - software license costs per computer are so high that the 5960X setups will need much higher price/performance to be worth it. We actually use Cinema 4D in production so those scores are relevant. We use V-Ray, Mental Ray and Arnold for Maya too but in general those track with the Cinebench scores so they are a decent guide. Thank you!
Ian Cutress - Monday, September 8, 2014
I've got some E5 v3 Xeons in for a more workstation oriented review. Look out for that soon :)
fastgeek - Monday, September 8, 2014
From my notes a while back... two E5-2690 v3's (all cores + turbo enabled) under 2012 Server yielded 3,129 for multithreaded and 79 for single.
While not Haswell, I can tell you that four E5-4657L V2's returned 4,722 / 94 respectively.
Hope that helps somewhat. :-)
fastgeek - Monday, September 8, 2014
I don't see a way to edit my previous comment; but those scores were from Cinebench R15
wireframed - Saturday, September 20, 2014
You pay for licenses for render Nodes? Switch to 3DS, and you get 9999 nodes for free (unless they changed the licensing since I last checked). :)
Lone Ranger - Monday, September 8, 2014
You make mention that the large core count chips are pretty good about raising their clock rate when only a few cores are active. Under Linux, what is the best way to see actual turbo frequencies? cpuinfo doesn't show live/actual clock rate.
JohanAnandtech - Monday, September 8, 2014
The best way to do this is using Intel's PCM. However, this does not work right now (it supports only Sandy Bridge and Ivy Bridge, not Haswell). I deduced it from the fact that performance was almost identical and from previous profiling of some of our benchmarks.
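On the question of reading live clock rates under Linux, here is a minimal sketch, not a method used in the article. It assumes the kernel exposes the cpufreq sysfs interface, where each core's `scaling_cur_freq` file holds a frequency in kHz. The function name `read_core_mhz` and its `sysfs_root` parameter are hypothetical, and the value is whatever the frequency driver reports, which may differ from the turbo clock actually measured by hardware counters.

```python
from pathlib import Path


def read_core_mhz(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_index: MHz} as reported by cpufreq (files hold kHz).

    Assumption: the cpufreq sysfs interface is present; cores without a
    readable scaling_cur_freq file are skipped.
    """
    freqs = {}
    # Matches cpu0, cpu1, ... but not sibling directories such as "cpufreq".
    for f in Path(sysfs_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        try:
            cpu = int(f.parent.parent.name[3:])  # "cpu12" -> 12
            freqs[cpu] = int(f.read_text().strip()) / 1000.0  # kHz -> MHz
        except (OSError, ValueError):
            continue  # unreadable or malformed entry, skip it
    return freqs


if __name__ == "__main__":
    for cpu, mhz in sorted(read_core_mhz().items()):
        print(f"cpu{cpu}: {mhz:.0f} MHz")
```

Sampling this in a loop while a lightly threaded benchmark runs gives a rough view of which cores clock up, with the caveat above about how accurate the reported value is.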