Today’s Apple Mac keynote has been very eventful, with the company announcing a new line-up of MacBook Pro devices, powered by two new SoCs in the Apple Silicon line-up: the new M1 Pro and the M1 Max.

The M1 Pro and Max both follow up on last year’s M1, Apple’s first generation of Mac silicon that ushered in the beginning of Apple’s journey to replace x86-based chips with their own in-house designs. The M1 has been widely successful for Apple, showcasing fantastic performance at never-before-seen power efficiency in the laptop market. But although the M1 was fast, it was still a relatively small SoC – one that also powers devices such as the iPad Pro line-up – with a correspondingly lower TDP, so it naturally still lost out to larger, more power-hungry chips from the competition.

Today’s two new chips look to change that situation: Apple is going all-out for performance, with more CPU cores, more GPU cores, much more silicon investment, and a power budget far past anything the company has ever attempted in the smartphone or tablet space.

The M1 Pro: 10-core CPU, 16-core GPU, 33.7bn Transistors in 245mm²

The first of the two chips announced was the so-called M1 Pro – laying the groundwork for what Apple calls a no-compromise laptop SoC.

Apple started off the presentation with a showcase of the packaging, where the M1 Pro is shown to continue Apple’s very custom approach, including the still-unique characteristic of packaging the SoC die along with the memory dies on a single organic PCB. This stands in contrast to traditional chips from the likes of AMD or Intel, whose DRAM dies sit either in DIMM slots or soldered onto the motherboard. Apple’s approach here likely improves power efficiency by a notable amount.

The company divulged that they’ve doubled up the memory bus for the M1 Pro compared to the M1, moving from a 128-bit LPDDR4X interface to a much wider and faster 256-bit LPDDR5 interface, promising system bandwidth of up to 200GB/s. We don’t know if that figure is exact or rounded, but an LPDDR5-6400 interface of that width would achieve 204.8GB/s.
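As a sanity check on those figures, peak bandwidth follows directly from bus width and transfer rate. A minimal sketch, assuming LPDDR5-6400 for the new chips and LPDDR4X-4266 for the M1 – Apple has not confirmed the exact speed bins:

```python
# Back-of-envelope DRAM bandwidth: (bus width in bytes) x (transfers/s).
def lpddr_bandwidth_gbps(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Peak bandwidth in GB/s for a given bus width and transfer rate (MT/s)."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000

print(lpddr_bandwidth_gbps(128, 4266))  # M1, LPDDR4X-4266  -> 68.256 GB/s
print(lpddr_bandwidth_gbps(256, 6400))  # M1 Pro, LPDDR5    -> 204.8 GB/s
print(lpddr_bandwidth_gbps(512, 6400))  # M1 Max, LPDDR5    -> 409.6 GB/s
```

The 256-bit figure lands exactly on 204.8GB/s, which is why Apple's "up to 200GB/s" reads like a rounded-down version of the real number.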

In a much-appreciated presentation move, Apple actually showcased die shots of both the M1 Pro and M1 Max, so we can have an immediate look at each chip’s block layout and how things are partitioned. Let’s start with the memory interfaces, which are now consolidated onto two corners of the SoC, rather than spread out along two edges as on the M1. Because of the increased interface width, a considerably larger portion of the SoC is taken up by the memory controllers. What’s even more interesting is that Apple now apparently employs two system level cache (SLC) blocks directly behind the memory controllers.

Apple’s system level cache blocks have been notable as they serve the whole SoC, able to amplify bandwidth, reduce latency, or simply save power by keeping memory transactions from going off-chip, greatly improving power efficiency. This new generation SLC block looks quite a bit different from what we saw on the M1. The SRAM cell areas look larger than on the M1, so while we can’t confirm this right now, it could signify that each SLC block holds 16MB of cache – for the M1 Pro, that would mean 32MB of total SLC.

On the CPU side of things, Apple has shrunk the number of efficiency cores from 4 to 2. We don’t know if these cores are similar to the M1 generation’s efficiency cores, or if Apple has adopted the newer generation IP from the A15 SoC – we had noted that the new iPhone SoC brought some larger microarchitectural changes in that regard.

On the performance core side, Apple has doubled things up to 8 cores. Apple’s performance cores were extremely impressive on the M1, however the chip lagged behind other 8-core SoCs in terms of multi-threaded performance. Doubling the core count should deliver immense MT performance boosts.

On the die shot, we’re seeing that Apple is seemingly mirroring two 4-core blocks, with the L2 caches also being mirrored. Although Apple quotes 24MB of L2 here, I think it’s rather a 2x12MB setup, with an AMD core-complex-like setup being used. This would mean that the coherency of the two performance clusters is going over the fabric and SLC instead. Naturally, this is speculation for now, but it’s what makes most sense given the presented layout.

In terms of CPU performance metrics, Apple made some comparisons to the competition – in particular the SKUs being compared here were Intel’s Core i7-1185G7, and the Core i7-11800H, 4-core and 8-core variants of Intel’s latest Tiger Lake 10nm 'SuperFin' CPUs.

Apple here claims that in multi-threaded performance, the new chips both vastly outperform anything Intel has to offer, at vastly lower power consumption. The presented performance/power curves show that at an equal power usage of 30W, the new M1 Pro and Max are 1.7x faster in CPU throughput than the 11800H, whose power curve is extremely steep. At equal performance levels – in this case the 11800H's peak performance – Apple says the new M1 Pro/Max achieve the same throughput at 70% lower power consumption. Both figures represent massive discrepancies, and a leap ahead of what Intel is currently achieving.

Alongside the powerful CPU complexes, Apple is also supersizing their custom GPU architecture. The M1 Pro now features a 16-core GPU, with an advertised compute throughput of 5.2 TFLOPs. What’s interesting here is that this much larger GPU is supported by the much wider memory bus, as well as the presumed 32MB of SLC – the latter essentially acting similarly to what AMD is now achieving with their GPU Infinity Cache.
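A quick back-of-the-envelope check on that throughput figure: assuming 128 FP32 ALUs per Apple GPU core and two FLOPs per ALU per clock for a fused multiply-add – both figures from prior analyses of Apple's GPU designs, not confirmed in this presentation – the advertised TFLOPs imply a clock of roughly 1.27GHz:

```python
# Implied GPU clock from advertised TFLOPs, assuming the ALU organization
# of prior Apple GPUs (128 FP32 ALUs/core, FMA = 2 FLOPs per clock).
ALUS_PER_CORE = 128
FLOPS_PER_ALU_CLK = 2

def implied_clock_ghz(cores: int, tflops: float) -> float:
    """GPU clock (GHz) needed to hit the quoted TFLOPs figure."""
    return tflops * 1e12 / (cores * ALUS_PER_CORE * FLOPS_PER_ALU_CLK) / 1e9

print(round(implied_clock_ghz(16, 5.2), 2))   # M1 Pro -> 1.27
print(round(implied_clock_ghz(32, 10.4), 2))  # M1 Max -> 1.27
```

Notably, both chips come out to the same implied clock, consistent with the M1 Max simply doubling the GPU rather than reclocking it.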

Apple’s GPU performance is claimed to vastly outclass any previous-generation competitor’s integrated graphics, so the company opted to make direct comparisons to mid-range discrete laptop graphics. In this case, pitting the M1 Pro against a GeForce RTX 3050 Ti 4GB, the Apple chip achieves similar performance at 70% less power. The power levels here are shown as being around 30W – it’s not clear whether this is total SoC power, system power, or Apple comparing just the GPU block itself.

Alongside the GPU and CPUs, Apple also noted their much-improved media engine, which can now handle hardware accelerated decoding and encoding of ProRes and ProRes RAW, something that’s going to be extremely interesting to content creators and professional videographers. Apple Macs have generally held a good reputation for video editing, but hardware accelerated engines for RAW formats would be a killer feature that would be an immediate selling point for this audience, and something I’m sure we’ll hear many people talk about.

The M1 Max: A 32-Core GPU Monstrosity at 57bn Transistors & 432mm²

Alongside the M1 Pro, Apple also announced a bigger brother – the M1 Max. While the M1 Pro catches up to and outpaces the laptop competition in terms of performance, the M1 Max aims to deliver something never-before-seen, supercharging the GPU to a total of 32 cores. Essentially it’s no longer an SoC with an integrated GPU; rather, it’s a GPU with an SoC around it.

The packaging for the M1 Max changes slightly in that it’s bigger – the most obvious difference is the increase of DRAM chips from 2 to 4, which corresponds to the increase in memory interface width from 256-bit to 512-bit. Apple is advertising a massive 400GB/s of bandwidth, which, if it’s LPDDR5-6400, would more precisely be 409.6GB/s. This kind of bandwidth is unheard of in an SoC, but is quite the norm in very high-end GPUs.

On the die shot of the M1 Max, things look quite peculiar – first of all, the whole top part of the chip above the GPU looks essentially identical to the M1 Pro, indicating that Apple is reusing most of the design, and that the Max variant simply grows downwards in the block layout.

The additional two 128-bit LPDDR5 blocks are evident, and again it’s interesting to see here that they’re also increasing the number of SLC blocks along with them. If indeed at 16MB per block, this would represent 64MB of on-chip generic cache for the whole SoC to make use of. Beyond the obvious GPU uses, I do wonder what the CPUs are able to achieve with such gigantic memory bandwidth resources.

The M1 Max is truly immense – Apple disclosed the M1 Pro transistor count to be at 33.7 billion, while the M1 Max bloats that up to 57 billion transistors. AMD advertises 26.8bn transistors for the Navi 21 GPU design at 520mm² on TSMC's 7nm process; Apple here has over double the transistors at a lower die size thanks to their use of TSMC's leading-edge 5nm process. Even compared to NVIDIA's biggest 7nm chip, the 54 billion transistor server-focused GA100, the M1 Max still has the greater transistor count.
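Dividing the quoted transistor counts by the apparent die areas gives a rough density comparison. This is only a sketch using the figures cited above – effective density varies heavily with the logic/SRAM/analog mix on each die:

```python
# Implied transistor density (millions of transistors per mm^2)
# from the transistor counts and die areas quoted in the article.
def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    return transistors / area_mm2 / 1e6

chips = {
    "M1 Max (TSMC N5)":  (57.0e9, 432),
    "Navi 21 (TSMC N7)": (26.8e9, 520),
}
for name, (transistors, area) in chips.items():
    print(f"{name}: {density_mtr_per_mm2(transistors, area):.1f} MTr/mm^2")
```

The M1 Max comes out at roughly 2.5x the density of Navi 21, in line with the full-node-plus advantage of N5 over N7 combined with Apple's SRAM-heavy design.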

In terms of die sizes, Apple presented a slide of the M1, M1 Pro and M1 Max alongside each other, and they do seem to be at 1:1 scale. In that case, given the M1’s known 120mm² die, the M1 Pro would measure around 245mm², and the M1 Max about 432mm².

Most of the die size is taken up by the 32-core GPU, which Apple advertises as reaching 10.4TFLOPs. Going back to the die shot, it looks like Apple has basically mirrored their 16-core GPU layout. The first thing that came to mind was that these could be two GPUs working in unison, but there does appear to be some shared logic between the two halves of the GPU. We may get more clarity on this once we see the software behavior of the system.

In terms of performance, Apple is battling it out with the very best available on the market, comparing the performance of the M1 Max to that of a mobile GeForce RTX 3080, at 100W less power (60W vs 160W). Apple also includes a 100W TDP variant of the RTX 3080 for comparison, with the M1 Max outperforming the NVIDIA discrete GPU while still using 40% less power.

Today's reveal of the new generation of Apple Silicon is something we’ve been expecting for over a year now, and I think Apple has managed to not only meet those expectations, but to vastly surpass them. Both the M1 Pro and M1 Max look like incredibly differentiated designs, far different from anything we’ve ever seen in the laptop space. If the M1 was any indication of Apple’s success in their silicon endeavors, then the two new chips should have no issues laying incredible foundations for Apple’s Mac products, going far beyond what we’ve seen from any competitor.

Comments Locked


  • blargh4 - Monday, October 18, 2021

    Not quite related to the SoC specifically, but anyone know if Apple's ProMotion refreshes on demand a la gsync, or does it change between fixed refresh rates on the fly?
  • mikegrok - Tuesday, October 19, 2021

    It probably refreshes either when the next frame is ready, or every 1/10 of a second, whichever comes first.
  • abufrejoval - Monday, October 18, 2021

    So they are going all-in on soldered die-carrier memory – no expandability: that, together with the large pools of last-level SLC, gives them GDDRx/HBM-like bandwidth with LPDDRx-like latencies (as well as huge DRAM-related power savings), which is great as long as your CPU/GPU workload demands lie right on that linear line of CPU/GPU-core/RAM capacities for their 1x/2x/4x configurations.

    The M1 variants basically become appliances in the four basic sizes (I guess some intermediate binning-related survivors will round out the offer), which I've imagined for some time via a PCIe or Infinity Fabric backend using AMD "APU-type" CCDs, HBM/GDDRx and IODs.

    What they sacrifice is the last vestige of what had me buy an Apple ][ (clone) and then switch to the PC afterwards: the ability to make it a "personal computer" by adding parts and capabilities where I wanted them throughout its life-cycle (slots!).

    I can see how that wouldn't matter to most, because they can fit their needs into these standard sizes, especially since they may be quite reasonable for mainstream (my own systems tend to have 2x-4x the RAM).

    Of course it would be nice if there still was some way to hang a CXL, IF or PCIe off the larger chips, but Apple will just point out that this type of compromise would cost silicon real-estate they prefer to put into performance and interest only a few.

    Of course they could still come out with server variants sans GPUs (or with far-reduced ones) that do in fact offer some type of scale-up for RAM and workstation expandability. But somehow I believe I get the message that their goal is to occupy that productivity niche and leave everything else to "niche" vendors, which now includes x86.

    Well executed, Apple!

    And believe me, that doesn't come easy to someone whose professional career has been x86 since 1984.

    I still don't see myself buying anything Apple, but that's because I am an IT professional who has built his infrastructure tailor-made to his needs for decades, not a "user".

    I'd get myself one for curiosity's sake (just like I got myself a Raspberry Pi as a toy), but at the prices I am expecting for these, curiosity will stop short of getting one that might actually be usable for something interesting (the M1 Max), when I get paid for doing things with CUDA on Linux.

    Getting enough machine learning training power into a battery-operated notebook is still much further away than electrical power anywhere I sit down to work. Just like with "phones", I barely use the computational power or battery capacity of the notebooks I own. My current Ryzen 5800U is total overkill, while I'd happily put 64GB of RAM in it (but it's 16GB soldered). So if I actually do want to run a couple of VMs in a resort, I'll have to pack the other, slightly heftier notebook (64GB, and it will do CUDA, but not for long on battery).

    I can probably buy two or three more, add 8TB NVMe and double RAM on each and still have money left vs. what Apple will charge.

    Yes, they won't have as much power per Watt taken from the battery, but that does not matter to me... enough to get my Raspberry a fruity companion ;-)
  • name99 - Monday, October 18, 2021

    OK, so now that we've all gotten out of our systems
    - costs too much
    - sucks compared to team Intel/AMD/nVidia
    - doesn't include <weird specialist feature I insist every computer on earth has to include>
    let's try to return to technology.

    Note the blocks (in red) at the very bottom of the M1 Max. On the left-most side we have a block that is mirrored higher up, above the SLC and just to the left of the 8 CPUs. Next we have a block that is mirrored higher up above the SLC, to the right of the 8 CPUs.
    Apple tell us that with the Max we get 2x ProRes encoders and decoders – presumably those blocks. One minor question of interest is whether those blocks are *only* ProRes or are essentially generic encoders and decoders; i.e. you may get double the generic media encode/decode on the Max, which may be useful for videographers beyond just ProRes users?

    It certainly also looks like the NPU was doubled. Did I miss that in the event? I don't recall Apple saying as such. (It also looks like the NPU -- or something NPU-relevant -- extends beyond the border drawn by Andrei, when you compare the blocks in the two locations.)

    Finally we get the stuff at the right of the Max's bottom edge, which replicates the area in blue above the NPU. Any suggestions? Is that more NPU support hardware (??? it's larger than what Andrei draws as the NPU)? Lots of SRAMs -- just by eye, comparing it to the P-cluster L2, it could be 16MB or so of cache.

    So this all suggests that
    (a) with the Max you also get doubled NPU resources (presumably to search through more video streams for whatever NPUs search for -- faces, body poses, cats, etc)

    (b) the NPU comes with a fairly hefty secondary cache (unless you can think of something else that those blocks represent). Looking at the M1 die, you can find blocks that look kinda similar near the NPU, but nothing that's a great match to my eyes. So is this an additional M1X change, that it comes with the same baseline NPU as the M1/A14, but augmented with a substantial NPU-specific "L2" (which might be specialized for training or holding weights or whatever it is that people want from NPUs)?
  • abufrejoval - Monday, October 18, 2021

    Well, I love your speculation, but on the Apple shop page, the SoC configuration makes no difference to the core count of the neural engine; it remains at 16 "cores" for all three variants, M1/Pro/Max.

    You may argue unique differentiation for the M1 SoCs and how they do RAM with them, but SSD storage is just commodity. And all their cleverness about using DRAM to produce GDDR5-class bandwidth leaves a bad taste when they sell it at HBM prices.

    Ursury around here starts at 20% above market price and Apple is at 200% for SSD and RAM.

    After the minimally interesting config got me beyond €6000, my curiosity died.
  • abufrejoval - Monday, October 18, 2021

    Usury (no bear included, just greed)--net edit!
  • Tomatotech - Monday, October 18, 2021

    Apple used to charge absolute rip-off prices for bog-standard SODIMMs in their models. By comparison, this new pricing of $400 for a 32GB RAM upgrade to 64GB is actually not too bad.

    This is NOT DDR4 RAM. This is LPDDR5 RAM, and this is the first consumer laptop / desktop in the world to run LPDDR5. You're paying for that first-adopter advantage. (There are some very recent phones that run LPDDR5, but I think they max out at 12GB and use a slower variant.)

    Mid-range DDR4 for desktops seems to run at about $5/GB for 2 x 16GB. But go up to 2 x 32GB, and suddenly it's around $10/GB especially on the high end, so you're looking at around $320 for 32GB of fast high-end DDR4.

    The M1 Max runs extremely fast extremely specialist RAM that is a generation faster than DDR4 / LPDDR4, and the 64GB is concentrated down into only 4 on-chip RAM modules.

    Getting that at only $12.50/GB for the extra 32GB is a bit of a bargain at this point in time.

    (I previously said this was DDR5 RAM, I was wrong. As for storage, yes $400/TB is stupidly steep and shouldn't cost so much extra even for fast pcie 4.0 storage. That's more or less a commodity by now.)
  • lmcd - Tuesday, October 19, 2021

    Remember there's probably like 1 or 2 NAND options they've rated their SoC-internal controller for, which means they've probably had to select for top binning on both power and performance due to sharing architecture with the M1 iPad Pro.

    Which, honestly, is less of a sacrifice than I expected from Apple's ARM transition. This whole thing has been disturbingly smooth even considering the software incompatibility lumps.
  • abufrejoval - Tuesday, October 19, 2021

    Thanks for the lecture, but I had actually already admired their creative super-wide four-channel interface for the M1 Max, which unfortunately sacrifices any external expandability.

    Still, while it's a special part that needs to be managed with a complex die carrier and assembly, in overall cost it's relatively normal technology – high-volume items throughout.

    So they make their typical >200% margin also on the DRAM.
  • dc443 - Thursday, October 21, 2021

    One charges a premium for a halo product. That is simply what one does; it's a very simple economic calculation. I expected the upcharge to 64GB to be north of $1k; it was not, it was $400.
