• ghaff 2 hours ago

    A fascinating peek into the fairly deep past (sigh) is Abrash's Zen of Assembly Language. Time pretty much overtook the planned Volume 2, but Volume 1 is still well worth reading as a record of a time when tuning code for prefetch queues and the like was still a thing.

    • mshockwave 2 hours ago

      > (Intermediate)1. Adding to memory faster than adding memory to a register

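      I'm not familiar with the Pentium, but my guess is that a memory store is relatively cheaper than a load on many modern (out-of-order) microarchitectures, since a store can sit in the store buffer while later work proceeds, whereas a load's result is usually on the critical path.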

      > (Intermediate)14. Parallelization.

      I feel like this is where compilers come in handy, because juggling critical paths and resource pressure at the same time sounds like a nightmare to me.
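      A minimal C sketch of what "shortening the critical path" means (the function names are hypothetical, not from the article): with one accumulator every add waits on the previous one, while two independent accumulators give the CPU two chains it can overlap. This illustrates the idea only; it makes no claim about what a particular compiler emits.

```c
#include <stddef.h>

/* One accumulator: each add depends on the previous add, so the loop
   is bounded by one long dependency chain (the critical path). */
long sum_serial(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two accumulators: the even-index and odd-index chains are independent,
   so a dual-issue or out-of-order core can execute them in parallel.
   For integers this rewrite gives the same result; for floating point,
   compilers generally won't reassociate like this without flags such as
   -ffast-math, because it can change rounding. */
long sum_two_chains(const long *a, size_t n) {
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* odd element count: add the leftover element */
        s0 += a[i];
    return s0 + s1;
}
```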

      > (Advanced)4. Interleaving 2 loops out of sync

      Software pipelining!
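      For reference, a minimal C sketch of software pipelining (hypothetical function names; the article's own version would be in assembly). The pipelined loop does the compute/store for element i while already loading element i + 1, so stages of consecutive iterations overlap. A prologue fills the pipeline and an epilogue drains it. On in-order cores like the original Pentium this kind of manual scheduling mattered more; modern out-of-order cores and compilers do much of it themselves.

```c
#include <stddef.h>

/* Straightforward loop: each iteration loads, then computes and stores,
   so the multiply waits on its own load. */
void scale_simple(int *dst, const int *src, size_t n, int k) {
    for (size_t i = 0; i < n; i++) {
        int x = src[i];       /* stage 1: load            */
        dst[i] = x * k;       /* stage 2: compute + store */
    }
}

/* Software-pipelined version: stage 1 of iteration i + 1 is issued
   alongside stage 2 of iteration i ("two loops out of sync").
   Assumes dst and src do not partially overlap. */
void scale_pipelined(int *dst, const int *src, size_t n, int k) {
    if (n == 0)
        return;
    int cur = src[0];                  /* prologue: first load           */
    for (size_t i = 0; i + 1 < n; i++) {
        int next = src[i + 1];         /* load for the next iteration    */
        dst[i] = cur * k;              /* finish the current iteration   */
        cur = next;
    }
    dst[n - 1] = cur * k;              /* epilogue: last compute + store */
}
```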

      • optymizer an hour ago

        What's a good resource like this for modern CPUs (especially ARM)?

        • fwip 3 hours ago

          Looks like this was written in 2004, or thereabouts.

          • nickelas 2 hours ago

            I was wondering why it said P4. That's an old processor.