Very interesting to see. Efficiency (E) cores use only 7% of the energy that Performance (P) cores do when performing the same task, and take about 4x as long to do it.
So roughly 13.5x the energy (23 J when run on P cores vs. under 1.7 J when run on E cores) for about 4x the performance.
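Rough back-of-the-envelope from those figures (just plugging in the article's numbers; the average power ratio at the end only follows if the ~4x runtime difference holds):

```c
/* Back-of-the-envelope from the article's figures (assumed here:
 * ~23 J on P cores, ~1.7 J on E cores, E cores ~4x slower). */
#include <stdio.h>

int main(void) {
    const double energy_p   = 23.0; /* joules for the task on P cores */
    const double energy_e   = 1.7;  /* joules for the same task on E cores */
    const double slowdown_e = 4.0;  /* E cores take ~4x as long */

    printf("energy ratio (P/E):        %.1fx\n", energy_p / energy_e);          /* ~13.5x */
    printf("E-core energy share:       %.0f%%\n", 100.0 * energy_e / energy_p); /* ~7%    */
    /* energy = power * time, so the average *power* gap is even larger */
    printf("average power ratio (P/E): %.0fx\n", (energy_p / energy_e) * slowdown_e); /* ~54x */
    return 0;
}
```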
So BeOS has a place in this universe
This may come largely from the fact that energy use grows disproportionately as the clock speed goes up (and shrinks disproportionately as it goes down).
This answer (based on an old source) even says power consumption increases with the cube of the clock speed: https://physics.stackexchange.com/posts/61937/revisions
Though this would mean a 4x increase in clock speed would consume 4^3 = 64 times as much energy, which is more extreme than what is observed here in the Apple chip. So either the clock speed/power relation is different now, or the P cores do not actually have a 4x scaling in clock speed. Cache size etc. may also play a role in performance.
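For context, the usual reasoning behind that cube law (a simplified model, not specific to this chip): dynamic power is roughly C·V²·f, the voltage needed to sustain a given clock rises roughly linearly with frequency, so power goes as f³; and since a fixed task also finishes faster at a higher clock, energy per task ends up closer to f². A toy calculation:

```c
/* Toy cube-law model (simplified; real silicon deviates from this):
 * power ~ f^3 (from P ~ C * V^2 * f with V roughly proportional to f),
 * energy for a fixed task ~ f^2 (the task also finishes ~f times faster). */
#include <math.h>
#include <stdio.h>

int main(void) {
    const double clock_ratio = 4.0; /* hypothetical 4x clock difference */
    printf("predicted power ratio:  %.0fx\n", pow(clock_ratio, 3.0)); /* 64x */
    printf("predicted energy ratio: %.0fx\n", pow(clock_ratio, 2.0)); /* 16x */
    return 0;
}
```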
Isn't that called Dennard scaling [1]?
No, the "cube law" is related to varying clock speed rather than to varying transistor size.
Per the article, the clock speed difference is much smaller than the ~4x performance difference (4.5 GHz vs 2.6 GHz, i.e. 1.7x). So more than half of the performance advantage of the P cores has to come out of the uarch difference (wider structures etc.). Meanwhile there will be other factors besides clock frequency, e.g. the P cores might use a different cell library than the E cores.
Makes sense. This would suggest the difference in power draw may not come mainly from the clock frequency, since (1.7x)^3 ≈ 5x, which is significantly less than the observed 13.5x.
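Putting a rough number on it (back-of-the-envelope with the figures quoted in this thread): the exponent that frequency scaling alone would need in order to turn a ~1.7x clock difference into a ~13.5x difference is about log(13.5)/log(1.7) ≈ 4.8, well beyond even the cube law, so the wider uarch, cell library, caches, etc. presumably account for a lot of it.

```c
/* What exponent would frequency scaling alone need to explain the gap?
 * (rough check, reusing the clock speeds and joule figures quoted above) */
#include <math.h>
#include <stdio.h>

int main(void) {
    const double clock_ratio  = 4.5 / 2.6;  /* ~1.7x, per the article */
    const double energy_ratio = 23.0 / 1.7; /* ~13.5x */
    printf("cube-law prediction: %.1fx\n", pow(clock_ratio, 3.0));               /* ~5.2x */
    printf("exponent needed:     %.1f\n", log(energy_ratio) / log(clock_ratio)); /* ~4.8  */
    return 0;
}
```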
I wonder what changes? In-order vs OOO? Fewer int/fp units? Are they fully instruction set compatible?
From the article about the instruction set:
> This is believed to be identical to ARMv9.2-A without Scalable Vector Extension (SVE) supported by M4 P cores, enabling the same threads to be run on either core type.
It also explicitly mentions half as many processing units per core and lower clock speeds.
No, they are not. "Efficiency" cores are generally tailored to do simple stuff well: less floating point, more integer work. Think file parsing, serving web pages, responding to network events, and so on.
When you need heavy computation (encoding, scientific workloads, etc.), P cores are your only choice.
As a result, the server ecosystem will fragment a bit. For HPC and number-crunching workloads, P-core-heavy processors will be sold; for cloud and CRUD systems, E cores will dominate.
From the article:
"[The E cores'] instruction set is the same as M4 P cores, ARMv9.2-A without its Scalable Vector Extension (SVE)"
I mean, not having SVE doesn't mean they can run every workload the P cores can, which is what I said already. They are of course not different ISAs at the core, but they're not the same cores per se. When extensions are missing, you can't send work to those cores when those instructions are present; forcing it would kill your application with an illegal instruction error.
So heavy computational stuff is not the target of E cores; you need P cores for that.
The quoted sentence is poorly worded. The P and E cores are fully instruction set compatible. It isn't possible to meaningfully know ahead of time if the instructions will be used on any given core, and trapping along with a migration is expensive and needless. The M4 as a whole does not support the SVE/SVE2 extensions anywhere, which is what the article is saying in the given quote.
The M4, on all cores, does support the SME extension, which includes a subset of SVE instructions at a wider vector length (512-bits), optimized for throughput. SME instructions are handled by a separate accelerator coprocessor unit attached to each cluster, shared by a number of cores, and don't "exist" in the normal instruction pipeline (where e.g. 256-bit SVE2 instructions would be handled.) This was all true of the proprietary Apple AMX extensions in previous cores as well, as far as I'm aware.
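If you want to see what the cores actually advertise rather than guess, macOS exposes ARM feature flags through sysctl. A minimal sketch, assuming the hw.optional.arm.FEAT_* key names Apple uses for other ARM features; a key that is absent is treated here the same as "not supported":

```c
/* Query ARM feature flags the kernel exposes on Apple Silicon.
 * Sketch only: key names assumed from Apple's hw.optional.arm.* convention;
 * a missing key is treated the same as "feature not present". */
#include <stdio.h>
#include <sys/sysctl.h>

static int has_feature(const char *name) {
    int value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname(name, &value, &size, NULL, 0) != 0)
        return 0; /* key not present -> assume unsupported */
    return value;
}

int main(void) {
    printf("FEAT_SME: %d\n", has_feature("hw.optional.arm.FEAT_SME"));
    printf("FEAT_SVE: %d\n", has_feature("hw.optional.arm.FEAT_SVE"));
    return 0;
}
```

On an M4 I'd expect the SME key to come back 1 and the SVE key to be absent or 0, matching the description above, but that's exactly the kind of thing worth checking rather than assuming.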
Curiously we had this argument before, roughly two decades ago.
Doesn't this happen in cycles? Specialized hardware appears, then it gets integrated into the processor to make way for an even more specialized variant of the thing, rinse and repeat.
This external thing doesn't have to be "more powerful" per se. So, E cores are lower-power helpers that have been folded back into the CPU in a slightly altered form.
What prevents them from being dedicated to processing a network stream, or just handling the network thread of a service, making them "efficient accelerators" in a sense?
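Nothing really prevents it, except that macOS doesn't offer a public API to pin a thread to a particular core; the usual lever is thread QoS, and the scheduler tends to keep low-QoS work on the E cores. A minimal sketch (the QoS call is the real macOS API; whether the thread actually stays on an E core is still the scheduler's decision):

```c
/* Hint to the macOS scheduler that this thread is low-priority utility work,
 * which in practice tends to land on the E cores (not a hard pin;
 * the scheduler still decides where it runs). */
#include <pthread.h>
#include <pthread/qos.h>
#include <stdio.h>

static void *network_worker(void *arg) {
    (void)arg;
    pthread_set_qos_class_self_np(QOS_CLASS_UTILITY, 0);
    /* ... poll sockets, parse requests, respond to network events ... */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, network_worker, NULL);
    pthread_join(t, NULL);
    printf("worker done\n");
    return 0;
}
```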
No, yes, yes
Isn't basically every modern CPU core OOO?