• _rend a day ago

    For completeness, this description of alignment is misleading:

    > Well, dear reader, this padding is added because the CPU needs memory to be aligned in sets of 4 bytes because it’s optimized in that fashion.

    > ...

    > Remember: since structs are aligned to 4 bytes, any padding is therefore unnecessary if the size of the struct is a multiple of 4 without the padding.

    Individual data types have their own alignment (e.g., `bool`/`char` may be 1, `short` may be 2, `int` may be 4, `long` may be 8, etc.), and the alignment of a compound type (like a struct) defaults to the maximum alignment of its constituent types.

    In this article, `struct Monster` has an alignment of 4 because `int` and `float` have an alignment of 4 for the author's configuration. Expanding one of the `int`s to a `long` could increase the alignment to 8 on some CPUs, and removing the `int` and `float` fields would decrease the alignment to 1 for most CPUs.
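
    To see these numbers for your own toolchain, a quick C11 probe like the one below prints what the compiler actually picked (the two structs are made-up stand-ins, not the article's Monster):

      #include <stdio.h>
      #include <stdalign.h>

      struct just_a_char { char x; };        /* no int/float members */
      struct with_int   { char x; int y; };  /* alignment driven by the int */

      int main(void) {
          printf("alignof(char)  = %zu\n", alignof(char));
          printf("alignof(short) = %zu\n", alignof(short));
          printf("alignof(int)   = %zu\n", alignof(int));
          printf("alignof(long)  = %zu\n", alignof(long));
          /* a struct's alignment defaults to its strictest member's alignment */
          printf("just_a_char: size %zu, align %zu\n",
                 sizeof(struct just_a_char), alignof(struct just_a_char));
          printf("with_int:    size %zu, align %zu\n",
                 sizeof(struct with_int), alignof(struct with_int));
      }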

    • sumtechguy a day ago

      Also keep in mind that this is all very CPU and compiler specific. Had one compiler that packed everything at 4/8, usually 8, not the 1/2/4/8 you would expect. That was because the CPU would just seg fault if you didn't play nice with the data access. The compiler could hide a lot of it if you set the packing, using offsets and mem moves and shifting. It was clever but slow. So by default they picked a wide enough packing that removed the extra instructions, at the cost of using more memory. x86 was by far the most forgiving at the time I was doing this. ARM was the least forgiving (at least on the platform I was using), with MIPS being OK in some cases but not others.

      • jandrewrogers 18 hours ago

        Some of the Cray hardware was basically pure 64-bit. The systems largely didn’t recognize smaller granularity. I learned a lot of lessons about writing portable C by writing code for Cray systems.

        • neeeeees 8 hours ago

          On one of these less forgiving architectures, how does one write programs that read some bytes off the network, bitcast them into a struct, and do something based on that?

          On x86 you would use a packed struct that matches the wire protocol.

          Wouldn’t this require extra copying if member reads were forced to be aligned?

          • sumtechguy 8 hours ago

            Yep, exactly that. I had that exact issue: junk coming in from a tcp/ppp connection that then had to be unpacked. Tons of garbage moves and byte offsetting, and then making sure you keep the endianness correct too. On the platform I was using, luckily, memcpy could do most of what I needed. Not the best way to do it, but the wildly out of date branch of gcc could handle it. Got pretty good at picking data out of random streams with shifts and AND/OR, whatever was needed. Totally useless skill for what I work on these days.
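
            Roughly the shape it takes (just a sketch, not the original code; the wire format and field names here are made up): pull each field out of the raw buffer byte by byte, or memcpy it into an aligned local, so no unaligned load ever happens, and fix up the endianness by hand.

              #include <stdint.h>

              /* Hypothetical wire format: a 2-byte big-endian id followed by a
                 4-byte big-endian length, at fixed offsets in the buffer. */
              struct header {
                  uint16_t id;
                  uint32_t length;
              };

              static struct header parse_header(const unsigned char *buf)
              {
                  struct header h;
                  /* byte-by-byte reads: safe on strict-alignment CPUs */
                  h.id     = (uint16_t)((buf[0] << 8) | buf[1]);
                  h.length = ((uint32_t)buf[2] << 24) | ((uint32_t)buf[3] << 16) |
                             ((uint32_t)buf[4] << 8)  |  (uint32_t)buf[5];
                  return h;
              }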

          • gavinsyancey a day ago

            Um ... isn't alignment generally dictated by the platform ABI so that programs compiled by different compilers can be linked together?

            • plorkyeran a day ago

              The widely used platforms with multiple compilers generally have one or more written down ABIs that the compilers all follow, but more niche platforms frequently have exactly one compiler (often a very out of date fork of gcc) that just does whatever they felt like implementing and may not even support linking together things built by different versions of that one compiler.

              • sumtechguy 8 hours ago

                We had that exact thing. Our target at the time was about 6 different platforms. 2 of them had very picky compilers/ABIs. We were trying to keep it to one codebase with minimal #ifdef callouts. Learned very quickly that not all compilers are the same even though they may have the same name and version number. Then the std libs are subtly different enough from each other that you really have to pay attention to what you are doing.

              • bobmcnamara 19 hours ago

                Ideally yes, but practically there are at least a dozen just for x86 (there are like 3 big ones).

                • undefined a day ago
                  [deleted]
              • flohofwoe 15 hours ago

                AFAIK alignment doesn't even matter much anymore (for CPU data at least) since the 'word size' of a modern CPU is effectively the size of a cache line (32 or 64 bytes?), i.e. unaligned accesses within a 32 or 64 byte block are no different from aligned accesses.

                (technically there is still an advantage to aligning items to their size, though: such an item can never straddle two adjacent cache lines)

                And there's also still tons of different alignment requirements when working with GPU data - and interestingly those alignment requirements may differ from C's alignment rules, so you may need to explicitly use packed structs (which are still not a standard C feature!) with manual padding.

                • birn559 15 hours ago

                  My understanding is that C++ compilers still add padding by default for performance reasons. CPU will have to spend a few cycles to reorganize data that is not aligned in chunks of 4 bytes.

                  • flohofwoe 15 hours ago

                    Daniel Lemire did some measuring ~~recently~~ (oops, in 2012):

                    https://lemire.me/blog/2012/05/31/data-alignment-for-speed-m...

                    TL;DR: 10% difference on what in 2012 was a low-end CPU, no difference on "new in 2012" CPUs. So my guess is that by now it really doesn't matter anymore :)

                    • birn559 10 hours ago

                      Wasn't aware of that, thanks for the link!

                    • xxs 13 hours ago

                      > CPU will have to spend a few cycles to reorganize data that is not aligned in chunks of 4 bytes.

                      That's not true for quite a lot of CPUs. Pretty much all x64 CPUs and the like don't care.

                  • kazinator 20 hours ago

                    I.e. the author is wrong; struct s { char x; }; is not required to be four-byte-aligned. It can have a 1 byte size, and thus alignment.

                    • boomlinde 10 hours ago

                      > Individual data types have their own alignment (e.g., `bool`/`char` may be 1, `short` may be 2, `int` may be 4, `long` may be 8, etc.), and the alignment of a compound type (like a struct) defaults to the maximum alignment of its constituent types.

                      I will add that this is implementation defined. IIRC the only restriction the standard imposes on the alignment of a struct is that a pointer to it is also a pointer to its first member when converted, meaning its alignment must practically be a multiple of that of its first field.

                      • NooneAtAll3 10 hours ago

                        implementation-defined means your specialized platform can be supported without needing to conform - it does not mean that common knowledge is false for common users

                        • boomlinde 10 hours ago

                          "Implementation-defined" means that there is nothing to conform to as far as the standard is concerned. I have not claimed that "common knowledge is false for common users" or anything to that effect. My comment is additive, which should have been clear to anyone reading the first three words of it.

                      • bobmcnamara 19 hours ago

                        And bool may be 4, and char may be 2 (m68k)!

                        • flohofwoe 15 hours ago

                          Not unless you're stuck in the 90s ;) These sizes have all been standardized to 1 byte since C99.

                          • bobmcnamara 6 hours ago

                            Also, "sizeof(bool) is not required to be 1."

                            I've seen register-sized bool on systems without free conversion between register sizes.

                            • bobmcnamara 10 hours ago

                              Not the sizes, the alignments.

                              • flohofwoe 6 hours ago

                                Same thing in C though, the primitive types have the same alignment as their size.

                                • bobmcnamara 6 hours ago

                                  That is true on some systems, but not a portable assumption by any means.

                                  Edit:

                                    st7: sizeof(uint32_t)=4, alignof(uint32_t)=1

                                    msp430: sizeof(int)=2, alignof(int)=1

                                    Real old ARM: sizeof(double)=8, alignof(double)=4

                                    OG m68k: sizeof(char)=1, but struct align(char)=2

                        • david2ndaccount a day ago

                          The author relies on the compiler fitting the bitfield in unused padding bytes after “speed”, but that is implementation-defined behavior (almost everything about bitfields is implementation defined). MSVC will align the unsigned bitfield to alignof(unsigned), whereas GCC and clang will use the padding bytes. To portably pack the struct, use unsigned integers and use flags (monster.flags & CAN_FLY instead of monster.can_fly).

                          See https://c.godbolt.org/z/K7oY77KGj
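
                          A rough sketch of that flags approach (the names and the surrounding fields are made up, not the article's actual Monster):

                            #include <stdint.h>

                            /* Hypothetical flag bits packed into a single byte. */
                            enum {
                                MONSTER_CAN_FLY     = 1u << 0,
                                MONSTER_CAN_SWIM    = 1u << 1,
                                MONSTER_IS_POISONED = 1u << 2,
                            };

                            struct monster {
                                float    x, y;
                                uint16_t health;
                                uint8_t  flags;  /* same layout on every conforming compiler */
                            };

                            /* Test and set bits explicitly instead of relying on bit-fields. */
                            static inline int monster_can_fly(const struct monster *m) {
                                return (m->flags & MONSTER_CAN_FLY) != 0;
                            }
                            static inline void monster_set_can_fly(struct monster *m) {
                                m->flags |= MONSTER_CAN_FLY;
                            }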

                          • cryptonector 21 hours ago

                            Bitfields have other problems. Say you have two bitfields each of `uint8_t` type and totaling 16 bits: well, you might think that's _two_ fields, but the compiler is allowed to treat them as _one_ whenever it wants to, and _that_ can be a problem if you're accessing one with atomics and the other without because the other can yield unsynchronized accesses to the one that needs synchronization.

                            Bitfields in C leave way too much behavior to the compiler or undefined. It's really intolerable.
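
                            A sketch of the hazard (names are hypothetical, and whether uint8_t is even allowed as a bit-field type is itself implementation-defined):

                              #include <stdint.h>

                              struct status {
                                  uint8_t ready : 8;  /* logically owned by thread A */
                                  uint8_t error : 8;  /* logically owned by thread B */
                              };
                              /* Adjacent nonzero-width bit-fields form a single "memory
                                 location" in the C11 memory model, so a store to `error`
                                 may be compiled as a 16-bit read-modify-write that also
                                 rewrites `ready`. Two threads touching the two fields
                                 concurrently therefore race even though neither names the
                                 other's field. A zero-width bit-field (`unsigned : 0;`)
                                 between them, or plain non-bit-field members, restores
                                 separate memory locations. */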

                            • atiedebee 13 hours ago

                              Even worse: bit fields can only be applied to int, signed int and unsigned int (maybe char as well but I don't remember)

                              Even crazier is the fact that an int bitfield's signedness is implementation defined

                            • arjvik a day ago

                              Why can't there be a standard defined for bitfields in future C releases? This is a long-discussed drawback of a feature that I really really want to be able to use in production code.

                              • petermclean a day ago

                                I've been working on an open source library that, while it doesn't solve bitfields, can provide convenient ergonomic access to bit-granular types:

                                https://github.com/PeterCDMcLean/BitLib

                                You can inherit from bit_array and create a pseudo bitfield:

                                  class mybits_t : bit::bit_array<17> {
                                    public:
                                      // slice a 1 bit sized field
                                      // returns a proxy reference to the single bit
                                      auto field1() {
                                        return (*this)(0, 1);
                                      }
                                
                                      // slice a 3 bit sized field 
                                      // returns a proxy reference to the 3 bits
                                      auto field2() {
                                        return (*this)(1, 4);
                                      } 
                                  };
                                  mybits_t mybits(7);
                                  assert(mybits.field1() == 1);
                                  mybits.field1() = 0;
                                  assert(static_cast<int>(mybits) == 6); //no implicit cast to integers, but explicit supported
                                
                                
                                There is minimal overhead depending on how aggressively the compiler inlines or optimizes*
                                • writebetterc 12 hours ago
                                  • PaulDavisThe1st 21 hours ago

                                    different/more than std::bitset<SIZE> ?

                                    • petermclean 17 hours ago

                                      Only similar by a little bit. Bitset misses the mark in so many ways.

                                      bit_array can be compile time storage (ala bitset) or dynamic at construction time. There is also bit_vector which is entirely dynamic

                                      Critically though, is that the type/library is built on a proxy iterator/pointer that lets you do certain things with STL or the bit algorithms that you cannot otherwise do with bitset.

                                      But back to the bitfields idea. The bit array backing storage is configurable through templates so you will be able to know for certain exactly how much stack or heap your type will take up

                                  • gizmo686 a day ago

                                    There is. I'm not sure when it was added, but looking at https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf, page 106, note 13

                                    > An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

                                    In practice, I've been using this feature for ages (along with __attribute__((packed))). There comes a point where you can start depending on compiler specific features instead of relying solely on the standard.
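
                                    For instance, a packed on-the-wire header with GCC/Clang might look like this (a sketch with made-up fields; the attribute is a compiler extension, not standard C):

                                      #include <stdint.h>

                                      /* Matches a hypothetical wire layout exactly: no padding at all. */
                                      struct wire_header {
                                          uint8_t  version;
                                          uint16_t flags;        /* would otherwise get 1 byte of padding before it */
                                          uint32_t payload_len;
                                      } __attribute__((packed));

                                      _Static_assert(sizeof(struct wire_header) == 7,
                                                     "packed layout must match the wire format");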

                                    • wahern a day ago

                                      The portion you quoted derives word-for-word from the original ANSI standard (C89/C90): https://port70.net/~nsz/c/c89/c89-draft.html#3.5.2.1

                                      • nomel a day ago

                                        Can this be standardized? It seems it would have to accommodate the most limited architecture, in terms of memory access, probably making it practically useless. Seems best left to the compiler.

                                        • wahern a day ago

                                          The quoted text is from the C standard. The popular wisdom is that anything goes wrt bit-fields, but it's an exaggeration.

                                          The original (?) DNS library used bit-fields to read and write the packet header status and flags fields, and that original code is still used today: https://github.com/openbsd/src/blob/master/include/arpa/name... It does rely on implementation-defined behavior--it swaps the order of fields based on whether the machine is little-, big-, or middle-endian--but even that behavior has remained remarkably consistent, at least on Unix ABIs.

                                          I think the wisdom comes from dealing with niche compilers, especially from years ago, where bit-fields were one of the areas where standard adherence was less consistent; and in embedded and kernel contexts, where the implementation-defined and (implicit) undefined behavior, such as wrt atomicity or the intersection of bit-fields and (extension) packed pragmas, counseled against any usage of bit-fields.

                                      • ajross a day ago

                                        There is, but it's part of the platform ABI and not the C language standard. The latter specifies syntax and behavior, the former is what's concerned with interoperability details like memory layout.

                                        I happen to have a copy of AAPCS64 in front of me, and you can find the specification of bit packing in section 10.1.8. The sysv ABI for x86/x86_64 has its own wording (but I'm pretty sure is compatible). MSVC does something else I believe, etc...

                                    • munificent a day ago

                                      This is a good article, but it bugs me that it sort of only tells half the story.

                                      Why are you making your structs smaller? Probably for "performance". If your goal is literally just reducing memory usage, then this is all fine. Maybe you're running on a small microcontroller and you just straight up don't have much RAM.

                                      But these days, in most areas, memory is cheap and plentiful. What's expensive is CPU cache. If you're optimizing memory use for that, it's really about runtime speed and less about total memory usage. You're trying to make things small so you have fewer cache misses and the CPU doesn't get stuck waiting as much.

                                      In that case, the advice here is only part of the story. Using bitfields and smaller integers saves memory. But in order to access those fields and do anything with them, they need to get loaded into registers. I'm not an expert here, but my understanding is that loading a bitfield or small int into a word-sized register can sometimes have some bit of overhead, so you have to pay attention to the trade-off.

                                      If I was optimizing for overall runtime speed and considering packing structs to do that, I'd really want to have good profiling and benchmarks in place first. Otherwise it's easy to think you're making progress when you're actually going backwards.

                                      • ack_complete 21 hours ago

                                        > If you're optimizing memory use for that, it's really about runtime speed and less about total memory usage. You're trying to make things small so you have fewer cache misses and the CPU doesn't get stuck waiting as much.

                                        The complication is that cache is a global resource. Code that has larger data structures, even if it runs faster, can contribute to a larger working set and a slightly higher cache miss rate in other code. This can lead to a death-by-thousand cuts scenario and it's hard to get solid data on this when profiling.

                                        You're right, though, that there are a number of ways that smaller structs can be slower, either directly by execution time or indirectly by larger code causing more misses in the instruction cache. Some other factors include whether the compiler can coalesce accesses to multiple fields grouped together, or whether the struct hits a magic power of two size for faster array indexing or vectorization. Which means you can end up speeding up code occasionally by making the struct bigger.

                                        • Rayhem 7 hours ago

                                          If you're really accessing those fields it's overwhelmingly likely that you're accessing the same field for all of your instances. Your access pattern is going to be

                                              monsters[0].health = //calculation
                                              monsters[1].health = //calculation
                                              monsters[2].health = //calculation
                                          
                                          and not

                                              monsters[0].health = //calculation
                                              monsters[1].name = //update
                                              monsters[2].is_poisoned = //check
                                          
                                          Striding over your monsters array means you have to load each monster to access one field which just blows your cache to hell. Much better is to pack all of your health values together in a structures-of-arrays layout as opposed to the usual arrays-of-structures approach here. Entity-component-system architectures are a brilliant tool for managing this with some pretty performant results.
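
                                          A minimal C sketch of that structure-of-arrays idea (field names and sizes are illustrative):

                                            #include <stddef.h>
                                            #include <stdint.h>

                                            #define MAX_MONSTERS 4096

                                            /* Each field lives in its own dense array, so a pass that only
                                               touches health streams through contiguous memory instead of
                                               skipping over whole Monster structs. */
                                            struct monsters {
                                                float   x[MAX_MONSTERS];
                                                float   y[MAX_MONSTERS];
                                                int     health[MAX_MONSTERS];
                                                uint8_t is_poisoned[MAX_MONSTERS];
                                                size_t  count;
                                            };

                                            static void apply_poison_damage(struct monsters *m)
                                            {
                                                for (size_t i = 0; i < m->count; i++) {
                                                    if (m->is_poisoned[i])
                                                        m->health[i] -= 1;  /* only two arrays enter the cache */
                                                }
                                            }
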
                                          • ljchen 14 hours ago

                                            I mostly agree, except that perf is determined by a myriad of factors. Even if this piece of data fits into a cache line, it may have no effect, or a negative one, on the following code. For complex software, even if it's faster after the change, it's hard to tell what the contributing factor really is. I once read a systems paper (forgot the name) from a recent systems research conference on how changing the code layout, e.g., moving a block of code around randomly, may have unexpected impacts on perf.

                                            Personally, I only pay attention to the cache line and layout when the structure is a concurrent data structure and needs to sync across multiple cores.

                                            • ryao a day ago

                                              If you are using an intrusive data structure, keeping the key on the same cache line as the pointers can make traversals faster. Reducing padding can make this easier to do.

                                              In general, keeping commonly accessed things together on the same cache line is a good idea, and the work needed to remove excessive padding overlaps with the work needed to do this.

                                              • bluGill 20 hours ago

                                                I pack structs because the packed layout matches some hardware or network structure and thus I get direct access. This isn't a common case but it exists.

                                                • yellowapple 17 hours ago

                                                  That case is much more common in embedded/bare-metal programming. A decent chunk of the code I've written so far for getting Zig running on a Nintendo 64 is a bunch of packed structs representing various registers (and a bunch of methods on those structs representing common uses of those registers).

                                                  It's also a reasonably common case for parsing binary data formats (e.g. a file's header has fields at exact byte offsets, and it's much faster and easier to just grab all the bytes of that header and interpret those bytes as a packed struct).

                                                • voidnap 15 hours ago

                                                  > memory is cheap and plentiful. What's expensive is CPU cache.

                                                  CPU cache is memory. Also memory isn't cheap, it is relatively expensive to access. CPU cache is way cheaper. You have it backwards.

                                                  • mort96 10 hours ago

                                                    The obvious and charitable read of their post is that they use "memory" to mean main memory/RAM, "CPU cache" to mean CPU cache, and "memory is cheap and plentiful" to mean "memory has a low dollar cost and most systems have a whole lot of it".

                                                    Your read -- that they use "memory" to mean "anything which stores bytes, including caches", and that they use "memory is cheap" to mean "accessing RAM has a low latency" -- is uncharitable, contrived and even literally self-contradictory ("CPU cache is memory" + "memory is expensive to access while CPU cache is cheap to access" is a contradiction which can only be resolved through using different definitions of "memory" in those two sentences).

                                                    • writebetterc 12 hours ago

                                                      They're talking about dollars

                                                    • zeusk a day ago

                                                      The real issue with packed structs and bitfields happens when concurrency gets involved. The majority of modern CPU caches are private and only allow one core to hold a cache line for writing - so packing creates more false dependencies when cores try to alter information that was compacted into a cache line currently held by another core.

                                                      • charleslmunger 19 hours ago

                                                        Avoiding false sharing is a separate problem best solved by explicitly aligning the struct or relevant members to std::hardware_destructive_interference_size.
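
                                                        The plain-C version of the same idea looks roughly like this (64 is a common but not universal cache-line size, so treat the constant as an assumption):

                                                          #include <stdalign.h>
                                                          #include <stdint.h>

                                                          #define CACHE_LINE 64  /* assumed line size */

                                                          /* Each worker's counter gets its own cache line, so updating
                                                             one never invalidates the line holding another's counter. */
                                                          struct per_core_counter {
                                                              alignas(CACHE_LINE) uint64_t value;
                                                          };

                                                          struct per_core_counter counters[8];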

                                                      • jcelerier a day ago

                                                        > But these days, in most areas, memory is cheap and plentiful

                                                        and that's why I am contemplating freaking 128G of RAM for my next machine; I have 64 right now and OOM regularly, and 32 is unusable

                                                        • zahlman 20 hours ago

                                                          What software are you running? I'm humming along without a problem on 8.

                                                          • jcelerier 18 hours ago

                                                            just an instance of GCC on a larger file with some complex C++ will easily eat multiple gigabytes of RAM

                                                            • davrosthedalek 9 hours ago

                                                              You are not supposed to reimplement DOOM with templates.

                                                              • munificent 5 hours ago

                                                                You're not the boss of me!

                                                                • davrosthedalek 19 minutes ago

                                                                  Are you sure?

                                                        • wbl a day ago

                                                          Everything is microarch dependent, but typically the memory subsystem can handle subword loads and writes. What it usually can't do is bits.

                                                        • rwmj a day ago

                                                          No mention of pahole / dwarves? https://lwn.net/Articles/335942/ It's the standard tool used by the Linux kernel developers.

                                                          • petergeoghegan a day ago

                                                            Recent versions of clangd show pahole-like struct size/alignment padding overhead annotations. I find this built-in support a lot more convenient than using pahole.

                                                            • uticus a day ago

                                                              for somebody starting out, where do you go for a list of tools like this?

                                                              • ryao a day ago

                                                                pahole is an awesome tool.

                                                              • kazinator a day ago

                                                                Generally speaking, sort the members in descending order by alignment, if you want the tightest packing without resorting to nonportable compiler extensions. This descending order eliminates padding between members producing the smallest offset for the last byte of the last member, and that then minimizes the padding required at the end of the struct.

                                                                In C, alignment for basic types is linked to size. A type may have an alignment requirement that is as much as its size: e.g. an 8 byte data type might be aligned as strictly as addresses divisible by 8.

                                                                The alignment of an array aggregate is that of its element type.

                                                                The alignment of a struct is that of its most strictly aligned member.

                                                                In portable coding, we don't exactly know, but we can make the worst-case assumption that every basic type has the strictest possible alignment---i.e. its size. We also don't know the sizes of all the types, but we know that integers of higher ranks are at least as large as lower ranks; i.e. sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long), etc.

                                                                Things are tricky when you have a mixture of floating-point, pointers and integers. E.g. pointer could be the same size as a float, or bigger.

                                                                Here is a trick you use: combine together clumps of like types, and try to make combinations that are powers of two. For instance, if your structure has two or four pointers to things, try to make them adjacent. E.g.:

                                                                  struct s {
                                                                    void *p1, *p2, *p3, *p4;
                                                                    short s1, s2, s3, s4;
                                                                    float f1, f2, f3, f4;
                                                                  };
                                                                
                                                                
                                                                Even though float is almost certainly larger than short and may have stricter alignment, it's moot because I have four of everything. Even if this is on a tiny system where pointers are 16 bits wide, four of them will make 8 bytes. So when the f1 member is being placed into the struct, the offset is at least 8 byte aligned: we don't expect padding.

                                                                If you aggregate enough members of the same type in a power-of-two sized clump, you can easily create alignment which meets or exceeds the strictest alignment in the system.

                                                                So those are the main principles: group together members of the same type. Then keeping those groups together, sort by descending alignment strictness.
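
                                                                A small before/after sketch of that rule (the exact sizes assume a typical 64-bit ABI, so take the numbers as illustrative):

                                                                  #include <stdio.h>

                                                                  /* Members in "natural" order: 1-byte, 8-byte, 2-byte, 8-byte. */
                                                                  struct unsorted {
                                                                      char   tag;     /* 1 byte + 7 bytes padding before the pointer */
                                                                      void  *next;    /* 8 bytes */
                                                                      short  id;      /* 2 bytes + 6 bytes padding before the double */
                                                                      double weight;  /* 8 bytes */
                                                                  };                  /* typically 32 bytes */

                                                                  /* Same members sorted by descending alignment. */
                                                                  struct sorted {
                                                                      double weight;
                                                                      void  *next;
                                                                      short  id;
                                                                      char   tag;     /* 1 byte + 5 bytes tail padding */
                                                                  };                  /* typically 24 bytes */

                                                                  int main(void) {
                                                                      printf("unsorted: %zu, sorted: %zu\n",
                                                                             sizeof(struct unsorted), sizeof(struct sorted));
                                                                  }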

                                                                • o11c a day ago

                                                                  Note that although alignment and size are platform-specific, in practice padding can be minimized simply by using a fixed order:

                                                                    (various vector types)
                                                                    int128_t
                                                                    long double
                                                                    long long, int64_t, int_fast64_t, double
                                                                    off_t
                                                                    time_t (can be 64 bits on 32-bit platforms only if off_t also is)
                                                                    register_t, int_fast16_t, int_fast32_t (64-bit on e.g. x32)
                                                                    void *, void(*)(), intptr_t
                                                                    ptrdiff_t (theoretically wrong if fully aligned and bigger than pointer)
                                                                    size_t, ssize_t
                                                                    long (32-bit on win64; generally has 16-bit alignment if bigger than pointer)
                                                                    int32_t, float
                                                                    int (can be 16-bit)
                                                                    short, int16_t
                                                                    char, bool, int8_t, int_fast8_t
                                                                    (a trailing array member must come last; note when calculating combined size you should use offsetof instead of sizeof to not waste padding)
                                                                  
                                                                  Besides this, `int_leastN_t` goes with `intN_t`, and all unsigned types go with their signed types. Note that types like `id_t` are hard to place however.

                                                                  Although there do exist many platforms where this is not in order of size, the bigger types generally have smaller alignment on such platforms. Despite this I have split out rows occasionally despite not knowing of any such platform.

                                                                  • Rayhem 8 hours ago

                                                                    > The health of a monster is currently represented by a 4 byte signed integer, meaning that the health can range from -2^31 to 2^31-1 (2^31 ~= 2 million). We can already see that half of the integers potential is unused because the health of a monster should never be negative, so an unsigned int would fit way better.

                                                                    If you're strictly optimizing for memory size out of necessity, then yes. It's dangerous to use an unsigned type in any context where you might even consider subtraction, though, and it's pretty effortless to see that new_health = health - damage is going to be used somewhere. Congrats! Your 100hp lvl3 monster just got resurrected with sixteen thousand hp.
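
                                                                    A tiny illustration of that wrap-around (hypothetical numbers; the exact resurrected value depends on the integer width used):

                                                                      #include <stdint.h>
                                                                      #include <stdio.h>

                                                                      int main(void) {
                                                                          uint16_t health = 100;  /* remaining hit points      */
                                                                          uint16_t damage = 250;  /* this hit finishes the job */

                                                                          health -= damage;  /* wraps: 100 - 250 == 65386 (mod 2^16) */
                                                                          printf("health is now %d\n", health);  /* not 0, never negative */
                                                                      }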

                                                                    • variadix a day ago

                                                                      Some other things you can do to improve packing are using integer indices/offsets/relative pointers instead of absolute pointers (especially on 64-bit machines), overlapping fields for unions (as an example you may have a tag that can share some bits with another field in each member of the union), using a structs position in memory to encode information (for example, if you have a binary tree of array allocated nodes, you can specify that the right child always follows the parent in memory, then you only need to store the index of the left child), and of course structure packing pragmas/attributes.
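
                                                                      As a small sketch of the index trick (layout and names are made up): nodes live in one array, children are identified by 32-bit indices instead of 64-bit pointers, and the "right child follows the parent" convention means only the left child needs to be stored.

                                                                        #include <stdint.h>

                                                                        #define NO_CHILD UINT32_MAX

                                                                        struct node {
                                                                            int32_t  key;
                                                                            uint32_t left;  /* index into the node array, or NO_CHILD */
                                                                        };                  /* 8 bytes instead of key + padding + two pointers */

                                                                        struct tree {
                                                                            struct node *nodes;
                                                                            uint32_t     count;
                                                                        };

                                                                        /* By convention, the right child of nodes[i], when present, is nodes[i + 1]. */
                                                                        static inline uint32_t right_child(uint32_t i) { return i + 1; }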

                                                                      • cogman10 a day ago

                                                                        Nice little article.

                                                                        The next thing to really explore is how the struct is being used. Smaller is always nice, but sometimes you need it to be fast. To do that, you'll want to group together fields that are commonly read. For example, x, y, and speed are likely always read together. So, it'd make sense to group those fields together to optimize for cache reads.

                                                                        The next thing to question is if your Monster is being interacted with as an individual thing or as a group of things. If it's as a group of things then it might make sense to create a "Struct of arrays" of monsters. Imagine, for example, you have a system which removes all the dead monsters. It'd be faster to iterate through an `isAlive` array rather than an array of `Monster` structs.

                                                                        And now you are on your way to reinventing an ECS :)

                                                                        • foota a day ago

                                                                          Some related thoughts (more for C++ than C).

                                                                          You can eliminate the cost of a wrapping class with no members by taking advantage of the empty base class optimization (or rather: by avoiding members in a base class you will be taking advantage of EBCO).

                                                                          You can use an untagged union in C++ (taking care to avoid undefined behavior) to implement small object optimization. In its general form, the idea is you can store some smaller struct in a union (as a char[], since C++ doesn't support type punning through unions the way C does) with a larger structure, and use a sentinel in the larger structure to determine which it is. E.g., if the large struct has some field that must not be zero, then make it zero in the smaller struct and you can distinguish between the two without a tag (like would be required with std::variant).

                                                                          I'm also a huge fan in general of "make invalid states unrepresentable" and following this can often reduce redundancy and eliminate extra fields.

                                                                          • _mocha a day ago

                                                                            I'm somewhat surprised to see this on the front page of HN. I distinctly remember this material being covered in a single lecture during our freshman CS course 20 years ago. I doubt this would've reached the front page in HN's early days, and it makes me wonder if we've moved away from teaching these fundamentals due to the heavy reliance on abstraction and ai agents in today's industry.

                                                                            • asa400 a day ago

                                                                              I've never taken a lick of CS in a formal setting, and I feel like that's increasingly common as programming has broadened its employment base. Most of the people I work with haven't done formal CS, either. Consequently I've had to learn all of this stuff on my own. There are communities out there that value this kind of education. I can vouch for the Rust community, which has helped me learn _a ton_ about this kind of "lower level" stuff over the last 5 years despite starting my career doing Ruby.

                                                                              • bigstrat2003 14 hours ago

                                                                                I have a degree in CS from around the same time frame, and we never covered topics like this. So I think it varies widely, and is not necessarily a modern thing for people to not learn about struct sizing.

                                                                                • jerf a day ago

                                                                                  One of the neat things about programming is that you can develop a high degree of skill in it without any formal education at all.

                                                                                  But it does mean that we've always had those programmers who have had characteristic holes in their education as a result.

                                                                                  Personally I enjoy having a mix of self-taught and formally-taught around; I think there are quite a few things the self-taught are often better at than the formally educated as a group. I respect both sides. I'm just saying that there has always been a market for this sort of thing on HN because the self-taught have always been around.

                                                                                  • nomel a day ago

                                                                                    > I think there are quite a few things the self-taught are often better at than the formally educated as a group.

                                                                                    Would you mind expanding on this a bit?

                                                                                    • undefined 21 hours ago
                                                                                      [deleted]
                                                                                      • jerf 8 hours ago

                                                                                        First, there are people who manage the best of both worlds, so bear in mind I'm talking about the ones who only manage the one side or the other.

                                                                                        The self-taught ones are often much better at diagnosis of problems in my experience. "Hey, the servers are blowing up, own the process from diagnosis to deploying the fix" is something I can generally throw at a self-taught dev faster than a school-taught dev. Often because even if they've never owned a server they've already been through the "my graphics driver is crashing my desktop when I plug in my USB drive" dance on Linux a number of times and know how do go through diagnosis processes.

                                                                                        They're also often better out-of-the-gate at being handed a large task and knowing how to break it down quickly without much direction. Sometimes they should be getting more direction than they do; this is often the source of bits of the code base that seem to be completely foreign in style, library usage, etc. Gotta watch the PRs.

                                                                                        On the other hand, the self-taught will sometimes blunder into some critical O() mistakes because they haven't been taught to think that way, and may never have worked at any real scale before. They can also end up doing too much without using a library because they're used to just doing it themselves. (Not that school teaches you to "use libraries" exactly, but the school-taught are just generally less fluent and as such more likely to look for easier solutions... which is often exactly what we want!)

                                                                                        Where the holes really develop is when we get into heavy data structures, or a compiler-like code base (though sadly many school-taught people escape without compilers), or heavily recursive code, which doesn't just include "recursion" per se but also things like working with DOMs that are recursive tree structures. A lot of time the self-taught are great at straight-line "do this, then do that, then do that" code bases but struggle to build longer-term maintainable code that has good reuse.

                                                                                        And I'll say it again, these are all absolutely, positively, generalities and they do not apply to everyone. Some self-taught people are great at creating high quality, well-architected code bases. Some people escape school not just without compilers, but without any discernible skills whatsoever. (Those are some sad interviews.) These are just broad patterns I've observed.

                                                                                  • miguel_martin a day ago

                                                                                    >“… Note that I'm not an expert, in anything, so take everything I write about with a grain of salt I guess :).”

                                                                                    >”I’m by far an expert and I could be wrong in some specifications”

                                                                                    You’re missing a “not” on that 2nd sentence ;)

                                                                                    In all seriousness, you don’t need this disclaimer, you’re selling yourself short.

                                                                                    There’s nothing wrong with your post to make the disclaimer.

                                                                                    And if there is something wrong with the post, let the comments flood in and tell you that rather than yourself.

                                                                                    • Retr0id a day ago

                                                                                      I don't think I've ever seen struct bitfields used in real code, it's almost always manual bit-masks against a numeric type, often via macros. (I assume due to portability)

                                                                                      • ziml77 a day ago

                                                                                        The Windows API uses bitfields like that in some places. I know because I was working with the display configuration API recently and found the bitfields very annoying to use through generated FFI bindings (especially because they were used in anonymous structs inside anonymous unions).

                                                                                        • gizmo686 a day ago

                                                                                          I see it decently often in embedded code for memory mapped IO registers. Other than that, I don't think I've ever seen it either.

                                                                                          I'm not sure portability is the reason though. I've seen it in places that have no problem spamming #define _GNU_SOURCE

                                                                                          • mort96 10 hours ago

                                                                                            I would be scared of using it for IO... With a struct bit field, I don't know what bit offset a field ends up with, I don't know how it gets padded, and I don't know how large the generated load and store becomes. I'd rather do the bit operations and store myself.

                                                                                            • gizmo686 7 hours ago

                                                                                              __attribute__((packed)) is pretty much required for this use case.

                                                                                              • mort96 7 hours ago

                                                                                                That gets rid of the padding issue, but still does little to tell me which bit corresponds to which field.

                                                                                            • nomel a day ago

                                                                                              This is the only place I've seen it used, and the only placed I've used it.

                                                                                              It's absolutely wonderful for IO.

                                                                                            • dirt_like a day ago

                                                                                              I have seen bit fields used in real code. Long ago. The reason I know it so well is that bit field updates are NOT atomic, and this caused a rare crashing bug under heavy load. Unless you read the header the code looks atomic.

                                                                                              S->buffer_free = 1;

                                                                                              This is a very serious limitation if the structure/class is being updated in a multi-threaded environment.

                                                                                              • tom_ a day ago

                                                                                                In terms of the underlying bit assignments, bitfields have been pretty safe for a very long time now, as you can assume little-endian. Just a question of bearing in mind the packing rules, not too difficult given stdint.h. I went full bitfield shortly after the time I went full size_t, about 15 years ago now.

                                                                                                Big-endian systems sort-of still exist (I do indeed have an Amiga, and an Atari STe, and an Xbox 360, and they all work, so they must count) - but they're only slightly less weird from a practical C programming perspective than stuff like the DSPs with CHAR_BIT==24, and Harvard architecture microcontrollers with 2-byte ints. And so you can ignore them. People that still actually use these systems will be used to it.

                                                                                                • ack_complete 21 hours ago

                                                                                                  There are some other annoyances, like not being able to inline initialize a bitfield prior to C++20, and sometimes having to use unnatural typing to get optimal packing depending on the ABI. But I've seen them used, compilers have gotten pretty good at optimizing them and can coalesce writes or tests against adjacent bitfields.

                                                                                                  • abnercoimbre a day ago

                                                                                                    Yep. If I need a boolean for a struct I know I’ll want more than one. So I’ll stick an integer type that lets me OR multiple enum values (and so on).

                                                                                                    • anonnon 17 hours ago

                                                                                                      I have, e.g., in Julia's UTF-8 library (utf8proc).

                                                                                                    • Kilonzus 6 hours ago

                                                                                                      There seems to be a typo in the 4th paragraph (directly under the struct memory table). I'm counting a padding of 7 bytes: 96 (size of struct) - 89 (size of primitive types) = 7. The first padding consists of 3 bytes to correctly align the 4 byte int after a 1 byte bool, the second padding is 2 bytes due to a 4 byte int following two 1 byte bools, and the third padding is 2 bytes as described in the blog: 3+2+2=7 bytes of padding.

                                                                                                      • bhawks 15 hours ago

                                                                                                        Optimizing struct layouts is cool, but if your motivation is to pack more monsters into your game, definitely consider transposing your row-oriented monster struct into a column-oriented layout using an Entity Component System.

                                                                                                        Example framework:

                                                                                                        https://github.com/skypjack/entt?tab=readme-ov-file

                                                                                                        It has benefits besides memory/cache optimizations. It organizes logic in a much more composable way that is friendly for reuse.

                                                                                                        • herf a day ago

                                                                                                          Sometimes struct-of-arrays (in-memory column store) is the right solution, and you get automatic packing that way. C makes it so much easier to store an array of structs that people don't do this often enough, but column stores prove it has its uses.

                                                                                                          • iamcreasy a day ago

                                                                                                            This reminds of an amazing talk by Mike Acton on Data oriented design: https://www.youtube.com/watch?v=rX0ItVEVjHc

                                                                                                            And there is a related talk by Andrew Kelley implementing these ideas on Zig toolchain: https://www.youtube.com/watch?v=IroPQ150F6c

                                                                                                            • anitil 21 hours ago

                                                                                                              I just watched that Andrew Kelly one yesterday, and really enjoyed it. I think Zig needs to be my next language to play with.

                                                                                                            • signa11 a day ago
                                                                                                              • rzzzt a day ago

                                                                                                                Hmm, the certificate for this site has a billion SANs but catb.org is not one of them...

                                                                                                              • cjensen a day ago

                                                                                                                This is good stuff to have "in the toolbox," however habitually organizing structs this way is premature optimization. Much better to sensibly organize the fields and ignore alignment entirely unless you have a really good reason to do this. Good reasons could be that you know you will create a lot of these objects, or it has become a performance issue due to cache misses this would improve.

                                                                                                                • teo_zero a day ago

                                                                                                                  Neither TFA nor any comment mention using a struct of arrays instead of an array of structs. This technique alone eliminates all padding bytes!

                                                                                                                  • sim7c00 a day ago

                                                                                                                    Definitely interesting points for C structs. Regarding the CPU, it might be interesting to look a little further still at optimal struct sizes with respect to caching: figuring out hot and cold areas and either including padding or putting cold values behind a pointer to a substruct.

                                                                                                                    Like the examples though, very clear what the changes are and the specific impact. Always a good reminder to review for these things after a session of hacking in new things!

                                                                                                                    • zzo38computer a day ago

                                                                                                                      I would expect many C programmers would know these things, because I do like that when writing programs in C, too (although for bit fields, I will use a single field and then named bits by macros). There are some other considerations as well though, such as static vs dynamic allocation, and the requirements of how the specific program is working.

                                                                                                                      • hubadu 13 hours ago

                                                                                                                        I would suggest never using unsigned int for values that will be involved in any sort of calculation (health, damage, speed). This is considered bad practice because unsigned arithmetic silently wraps around, and should be avoided, with the exception of values used as UIDs, indexes, bit fields, masks and flags.
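
                                                                                                                        The classic failure mode, sketched with made-up game values and assuming a 32-bit unsigned int:

                                                                                                                            #include <stdio.h>

                                                                                                                            int main(void)
                                                                                                                            {
                                                                                                                                unsigned int health = 10;
                                                                                                                                unsigned int damage = 25;

                                                                                                                                /* Wraps around instead of going negative: prints ~4.29 billion. */
                                                                                                                                printf("unsigned: %u\n", health - damage);

                                                                                                                                /* Signed arithmetic at least makes the underflow easy to see and clamp. */
                                                                                                                                int h = 10, d = 25;
                                                                                                                                int result = h - d;          /* -15 */
                                                                                                                                if (result < 0) result = 0;
                                                                                                                                printf("signed, clamped: %d\n", result);

                                                                                                                                return 0;
                                                                                                                            }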

                                                                                                                        • ChrisRR 12 hours ago

                                                                                                                          Nah it's just whatever is most appropriate for your implementation. Overflow and wrap around is possible whether you use signed or unsigned.

                                                                                                                          As I say to some more junior devs: check whether the maths is correct and then do it; don't do it and then check whether it was wrong.

                                                                                                                          • hubadu 11 hours ago

                                                                                                                            My suggestion is straight from the C++ Core Guidelines by Bjarne Stroustrup and Herb Sutter (ES.102: Use signed types for arithmetic), for reasons that you don't seem to grasp.

                                                                                                                        • uticus a day ago

                                                                                                                          For another good intro to the fun with structs, Beej's guide is always a good go-to:

                                                                                                                          https://beej.us/guide/bgc/html/split-wide/structs-ii-more-fu...

                                                                                                                          ...goes into offsets, padding bytes, bit-fields, unions, etc.

                                                                                                                          • degenoah a day ago

                                                                                                                            I feel like this page would have definitely helped me when I used to try reverse engineering things I had lying around on my computer.

                                                                                                                          • TinkersW a day ago

                                                                                                                            It could be improved by using enum MonsterName : u8, moving it down near the other byte-sized member, and specifying the exact type of the bit field (u8). Then it would be 16 bytes.
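
                                                                                                                            In C terms that would look roughly like the sketch below. The field list is a guess at the article's struct, not a copy of it; the fixed underlying type for the enum needs C23 (or C++11), and bit fields on uint8_t are only guaranteed where the implementation allows types beyond int/unsigned int/_Bool:

                                                                                                                                #include <stdint.h>

                                                                                                                                enum MonsterName : uint8_t { GOBLIN, TROLL, DRAGON };   /* 1 byte instead of int-sized */

                                                                                                                                struct Monster {
                                                                                                                                    float    x_position;      /* offset  0 */
                                                                                                                                    float    y_position;      /* offset  4 */
                                                                                                                                    uint16_t health;          /* offset  8 */
                                                                                                                                    uint16_t damage;          /* offset 10 */
                                                                                                                                    uint16_t speed;           /* offset 12 */
                                                                                                                                    enum MonsterName name;    /* offset 14, grouped with the other small members */
                                                                                                                                    uint8_t  is_hostile : 1;  /* offset 15, bit field with an explicit 8-bit type */
                                                                                                                                    uint8_t  is_flying  : 1;
                                                                                                                                };                            /* 16 bytes with 4-byte float alignment */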

                                                                                                                            • jesse__ a day ago

                                                                                                                              Minor nitpick: the 32-bit max value is ~4 billion, not 4 million.

                                                                                                                              • fattah25 21 hours ago

                                                                                                                                Reminds me of Andrew Kelley's conference talk on "data-oriented design"; he explained this topic there.

                                                                                                                                • bullen a day ago

                                                                                                                                  Adding to this, you want the struct to be exactly 64 bytes, even on Apple M-series processors, which have 128-byte cache lines.
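
                                                                                                                                  If you are designing around a cache-line budget like that, it is worth letting the compiler enforce it. A sketch with placeholder fields, assuming C11 or later:

                                                                                                                                      #include <assert.h>
                                                                                                                                      #include <stdalign.h>

                                                                                                                                      struct Monster {
                                                                                                                                          /* ...real fields would go here... */
                                                                                                                                          char payload[60];
                                                                                                                                          int  id;
                                                                                                                                      };

                                                                                                                                      /* Fail the build if the struct ever outgrows one 64-byte cache line. */
                                                                                                                                      static_assert(sizeof(struct Monster) <= 64, "struct Monster no longer fits in a cache line");

                                                                                                                                      /* And make sure an array of them starts on a cache-line boundary. */
                                                                                                                                      alignas(64) static struct Monster monsters[1024];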

                                                                                                                                  • warmwaffles a day ago

                                                                                                                                    This is applicable outside of C as well.

                                                                                                                                    • mcraiha a day ago

                                                                                                                                      "The x_position and y_position can unfortunately not get a smaller type nor can a float be unsigned." If you have latest C++, you can use float16_t or bfloat16_t to get smaller type https://en.cppreference.com/w/cpp/types/floating-point.html

                                                                                                                                      • david2ndaccount a day ago

                                                                                                                                        C23 has _Float16
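
                                                                                                                                        Support for it is optional and target-dependent, but where the compiler provides it (recent GCC and Clang on common targets do), halving the position fields looks like this sketch; the field names are invented:

                                                                                                                                            #include <stdio.h>

                                                                                                                                            struct Position {
                                                                                                                                                _Float16 x;   /* 2 bytes each where supported, instead of 4 for float */
                                                                                                                                                _Float16 y;
                                                                                                                                            };

                                                                                                                                            int main(void)
                                                                                                                                            {
                                                                                                                                                struct Position p = { 1.5f, -2.25f };   /* float literals convert implicitly */
                                                                                                                                                printf("sizeof(struct Position) = %zu\n", sizeof p);   /* typically 4 */
                                                                                                                                                printf("x = %f\n", (double)p.x);
                                                                                                                                                return 0;
                                                                                                                                            }

                                                                                                                                        The trade-off is range and precision (half precision only carries roughly 3 decimal digits), so whether that is acceptable for positions depends on the coordinate space.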

                                                                                                                                    • andrewmcwatters a day ago

                                                                                                                                      The article casually ignores the fact that more production code uses bitmasking than bit fields.

                                                                                                                                      • 1718627440 12 hours ago

                                                                                                                                        Why, though?

                                                                                                                                      • jakey_bakey a day ago

                                                                                                                                        Loved this, thanks!

                                                                                                                                        • 4gotunameagain a day ago

                                                                                                                                          Unpacked structs are also very useful for parsing network data. You take a packet, cast it to an unpacked struct, and can access everything nicely.

                                                                                                                                          Many a spacecraft telemetry stack works like that.

                                                                                                                                          • rkangel 13 hours ago

                                                                                                                                            I think you mean "packed" structs. Otherwise you've got padding in your communication protocol.

                                                                                                                                            If you do this you also need to be careful of byte order.

                                                                                                                                            • 4gotunameagain 8 hours ago

                                                                                                                                              Yeah you're right, I mixed it up with unpadded !

                                                                                                                                              And you're right about the second point as well: typically there will be network-order-to-CPU-order macros to ensure portability.
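
                                                                                                                                              A sketch of that pattern with an invented message layout; packing here uses the GCC/Clang attribute (MSVC would use #pragma pack), and ntohl/ntohs come from the POSIX headers:

                                                                                                                                                  #include <stdint.h>
                                                                                                                                                  #include <arpa/inet.h>   /* ntohl / ntohs */

                                                                                                                                                  /* Wire layout: packed so the in-memory layout matches the protocol byte-for-byte. */
                                                                                                                                                  struct __attribute__((packed)) WireMonster {
                                                                                                                                                      uint32_t id;
                                                                                                                                                      uint16_t health;
                                                                                                                                                      uint8_t  flags;
                                                                                                                                                  };

                                                                                                                                                  void handle_packet(const uint8_t *buf)
                                                                                                                                                  {
                                                                                                                                                      const struct WireMonster *w = (const struct WireMonster *)buf;   /* the "cast the buffer" style */

                                                                                                                                                      uint32_t id     = ntohl(w->id);      /* network (big-endian) -> host order */
                                                                                                                                                      uint16_t health = ntohs(w->health);
                                                                                                                                                      uint8_t  flags  = w->flags;          /* single bytes have no byte order */

                                                                                                                                                      (void)id; (void)health; (void)flags; /* ...do something with them... */
                                                                                                                                                  }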

                                                                                                                                            • JonChesterfield 20 hours ago

                                                                                                                                              The alias analysis pass in your C or C++ compiler will savage you for doing that unless you're careful with annotations or compiler flags though.

                                                                                                                                              • 4gotunameagain 8 hours ago

                                                                                                                                                Yeap, can definitely be a foot gun !

                                                                                                                                                Even worse, sometimes you won't even get warnings about it, and then you scratch your head wondering why the binary seemingly has a mind of its own... don't ask me how I know !
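
                                                                                                                                                The usual way to stay on the safe side without reaching for -fno-strict-aliasing is to memcpy into a local object instead of dereferencing the cast pointer; a sketch reusing the invented layout from above:

                                                                                                                                                    #include <stdint.h>
                                                                                                                                                    #include <string.h>

                                                                                                                                                    struct __attribute__((packed)) WireMonster {   /* same invented layout as above */
                                                                                                                                                        uint32_t id;
                                                                                                                                                        uint16_t health;
                                                                                                                                                        uint8_t  flags;
                                                                                                                                                    };

                                                                                                                                                    void handle_packet_safely(const uint8_t *buf)
                                                                                                                                                    {
                                                                                                                                                        struct WireMonster w;
                                                                                                                                                        memcpy(&w, buf, sizeof w);   /* defined behaviour: no aliasing UB and no misaligned
                                                                                                                                                                                        loads on strict CPUs; compilers optimize the copy
                                                                                                                                                                                        away where a plain load would have been fine anyway */

                                                                                                                                                        /* ...ntohl/ntohs conversions as in the sketch above... */
                                                                                                                                                        (void)w;
                                                                                                                                                    }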

                                                                                                                                            • westurner a day ago

                                                                                                                                              Is it possible to apply these optimizations to Arrow?

                                                                                                                                              Arrow's struct is called StructArray. Fields of StructArray have a StructType.

                                                                                                                                              StructType has .bit_width and .byte_width attrs in Python and probably the other implementations too: https://arrow.apache.org/docs/python/generated/pyarrow.Struc...

                                                                                                                                              Arrow supports bitfields with BooleanArray, and enums with categoricals but

                                                                                                                                              "BUG: Categorical columns using the PyArrow backend requires 4x more memory" (open) https://github.com/pandas-dev/pandas/issues/58062 :

                                                                                                                                              > On disk Parquet appears to store the category data as logical type String which is compressed with snappy and encoded

                                                                                                                                              Arrow Flight RPC handles nested structs with enums over the wire somehow too FWIU