A fascinating peek into the fairly deep past (sigh) is Abrash's Zen of Assembly Language. Time pretty much overtook the planned Volume 2, but Volume 1 is still a fascinating read from a time when tuning for prefetch queues and the like was still a thing.
> (Intermediate)1. Adding to memory faster than adding memory to a register
I'm not familiar with the Pentium, but my guess is that a memory store is relatively cheaper than a load on many modern (out-of-order) microarchitectures.
> (Intermediate)14. Parallelization.
I feel like this is where compilers come in handy, because juggling critical paths and resource pressure at the same time sounds like a nightmare to me.
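To make the critical-path part concrete, here is a minimal sketch in C rather than assembly. The function names `sum_serial` and `sum_split` are made up for illustration, and the choice of four accumulators is an assumption, not a tuned number. The first loop has one long dependency chain; the second splits it into four independent chains that a superscalar core can overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add has to wait for the previous add. */
uint64_t sum_serial(const uint32_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Four accumulators: four independent dependency chains.
   The count of four is an illustrative assumption. */
uint64_t sum_split(const uint32_t *a, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) {
        s0 += a[i];
    }
    return s0 + s1 + s2 + s3;
}
```

For integer sums an optimizing compiler will often do this transformation (or vectorize) on its own, which is rather the point. For floating point it generally won't unless you allow reassociation, since the rounding can change.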
> (Advanced)4. Interleaving 2 loops out of sync
Software pipelining!
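For anyone who hasn't seen the term, a rough sketch of the idea in C. The names `scale_plain` and `scale_pipelined` are hypothetical, and this assumes `in` and `out` don't overlap. The pipelined version issues the load for iteration i+1 before finishing the work for iteration i, with a prologue and epilogue to handle the ends.

```c
#include <stddef.h>

/* Plain loop: load, compute and store all belong to the same i. */
void scale_plain(int *out, const int *in, size_t n, int k) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * k + 1;
    }
}

/* Software-pipelined by one stage. Assumes in and out do not overlap. */
void scale_pipelined(int *out, const int *in, size_t n, int k) {
    if (n == 0) {
        return;
    }
    int loaded = in[0];               /* prologue: load for iteration 0 */
    for (size_t i = 0; i + 1 < n; i++) {
        int next = in[i + 1];         /* load for iteration i+1 */
        out[i] = loaded * k + 1;      /* compute and store for iteration i */
        loaded = next;
    }
    out[n - 1] = loaded * k + 1;      /* epilogue: finish the last iteration */
}
```

On an out-of-order core the hardware already overlaps much of this, and compilers can software-pipeline too, so the hand-written form mostly pays off on in-order machines.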
What's a good resource like this for modern CPUs (especially ARM)?
Looks like this was written in 2004, or thereabouts.
I was wondering why it said P4. That's an old processor.