Comments Page - Losing language features: some stories about disjoint unions

« Back Losing language features: some stories about disjoint unionsgraydon2.dreamwidth.orgSubmitted by Bogdanp 4 days ago

MarkusQ 15 hours ago
Back in the day, when memory wasn't as cheap as it is now, there was a strong belief that forcing the user to "waste bits" on a proper sum type was a non-starter for a "real" language. It was widely assumed that the reason you were "sharing memory" between two fields was to conserve space, because you were clever enough to have recognized that they couldn't both be used at the same time. But doing so, it was generally assumed, meant that you were space constrained and so anything that took away your precious savings was bad.
I'm not saying this was "right" in any sense, but it wasn't just foolish old timers not recognizing that a "better" solution was possible. When you grew up having every single bit of memory threaded by hand (and costing macroscopic amounts of money), you think about memory efficiency differently.
- adrian_b 6 hours ago
  This kind of unsafe union has already been present in the first FORTRAN version from 1956, as the statement EQUIVALENCE.
  Presumably this was inspired by existing practices in the assembly-language programs.
  Many early programming languages have followed the example of FORTRAN.
  In October 1964, John McCarthy (which had not only been leading the development of the first LISP, but he had also been a major contributor to ALGOL) has proposed the keyword "union" and he had advocated for safer implementations of disjoint union types. The first such implementation has been in ALGOL 68, a few years later.
  Unfortunately, the C language has taken the "union" keyword from ALGOL 68, but instead of implementing proper disjoint union types, the C "union" has been made essentially identical with the FORTRAN EQUIVALENCE from 1956.
  The "variant record" of Pascal, was not really better, so that can also be counted as a failure to implement proper union types.
  For a long time Pascal and C have been among the most popular programming languages, creating bad habits in the use of unions.
  nyrikki 13 minutes ago
  > Unfortunately, the C language has taken the "union" keyword from ALGOL 68, but instead of implementing proper disjoint union types, the C "union" has been made essentially identical with the FORTRAN EQUIVALENCE from 1956.
  Many word address machines, explicitly the PDP-7, it was only the instructions that changed, with ADD being one's complement and YAD being two's
  Remember we only got B/C because the PDP-7 didn't have enough ram to implement FORTRAN.
  Similar reasons C case switch falls through, the PDP-7 SAD instruction forced it. Then they abused that fall through to support lower case on the PDP-11, which would have been powerful enough for the type of union you are talking about.
  Midas/Macro assembly is tangential related but it is really a side effect of word addressable, accumulator/index machines.
  IIRC, Lisp is a good example for the difference between equality by value, reference, or even predicated equality.
  If you want to think about just how limited the PDP-7 was look at the instructions
  www.bitsavers.org/pdf/dec/pdp7/PDP7_Instruction_list.pdf
  pklausler an hour ago
  `EQUIVALENCE` was (and is) a storage overlay of variables. It's not part of the type system, and one cannot define a derived type in standard Fortran that contains components with overlaid storage.
- cbsmith 14 hours ago
  Yeah, I remember using untagged unions because the code "knew" which type was the right one to use at the right time and you wanted to save space (and wanted your structures to fit well in registers --and later cache lines).
  setr 12 hours ago
  I suppose that’s what cobol copybooks are; untagged unions, everywhere, all the time, all at once
- rwmj 5 hours ago
  struct page in Linux is this taken to its logical conclusion.
  https://github.com/torvalds/linux/blob/89be9a83ccf1f88522317...
  Edit: A bit of background: https://lwn.net/Articles/565097/
  tux3 5 hours ago
  Fun fact: With the advent of folios, struct page is starting to undergo some major changes, with the eventual goal of shrinking it into a single 64bit number.
  But not to worry, the unreadable mess of C unions is not going away. struct folio will eventually absorb all those fields, and more. The only difference is there's a single folio for a whole set of pages, so moving the data there will save a significant amount of memory.
  https://github.com/torvalds/linux/blob/89be9a83ccf1f88522317...
- NooneAtAll3 10 hours ago
  looking at VSCode and browsers eating RAM as if it's nothing makes me think modern approach isn't that good either
  eru 10 hours ago
  Maybe, but it's not sumtypes that are to blame.
jchw 15 hours ago
Lack of sum types is probably one of the worst things about working in Go, and I think it is a much bigger problem than the lack of generics ever was. Sadly, though, I don't think you can really just bolt sum types onto an already complete programming language design.
- jerf 2 hours ago
  The major problem with "bolting on" sum types to Go is that it is way too redundant with interfaces and the interactions get weird, because this already works:
  type Color interface { isColor() } type Red struct {} func (r Red) isColor() {} type Green struct {} func (g Green) isColor() {} type RGB struct { Red byte Green byte Blue byte } func (r RGB) isColor() {} func PrintColor(color Color, text string) { switch c := color.(type) { case Red: // print in red here case Green: // print in green here case RGB: // handle the RGB } }
  This doesn't prevent the zero value of a Color being 'nil' but it does make "a value of type Color" effectively a sum type because you can't do anything useful to it without unpacking it in a type-aware way. There is no way in Go to have a Red but act like you have an RGB, values always have their types.
  (There is a common misconception among Go programmers that this doesn't seal the type because you can implement a "isColor" method on your own types in other packages, but it won't work. Try it if you want to be sure.)
  One could argue they've effectively already been bolted on, all that is missing is some syntax gloss like
  sum Color { Red{} Green{} RGB{ Red byte Green byte Blue byte } }
  that would simply unsugar to the above.
  jchw 18 minutes ago
  Yes, this is all true. However, this approach has more downsides than just nil and cumbersome syntax imo. It takes more memory and pointer-chasing than "proper" sum types would require, and while you can "seal" the type, there's no exhaustiveness checking.
- andrewflnr 12 hours ago
  I wonder if a version of OCaml that had a better concurrency story 10-15 years ago could have taken Go's place.
  dvdkon 7 hours ago
  Sadly I don't think so, just because it looks "too functional".
  If comments on the web are anything to go by, being familiar to C and JS programmers is one of the main reasons for Go's success. I think it has plenty of its own specifics, it's not like e.g. C# which really did start as a straightforward Java clone, but OCaml has even more differences.
  pjmlp 6 hours ago
  Go only enjoys widespread adoption thanks to Docker and Kubernetes success in the industry, had it not been the case, we would be talking about it just like we talk about Limbo and Oberon-2, its influences.
  The kind of folks that dropped Python and Java for Go, would not have picked up OCaml even if it had a better concurrency story 10-15 years ago.
  andrewflnr 2 hours ago
  I guess part of the hypothetical is, "a version of OCaml supported and pushed by Google". But the same idea probably applies, Rob Pike et al (he was involved right?) probably never go for a functional language, even a relatively pragmatic one.
  pjmlp an hour ago
  That would be Kotlin, Google's C#, given how history goes.
  I has enough ML influence on it, as a simplified Scala, and also can compile to native in various ways.
  On the other hand you will notice that the only thing Android team only cared about Go, was the Soong build syste, and parts of the GPU debugger.
  noelwelsh 8 hours ago
  I believe it could have, and OCaml didn't because of some bad predictions by the core team (when asked about better concurrency, the response was along the lines that we'll all have tens of CPUs soon, so better in process support wasn't necessary), and the restrictions of academia (there is little value given to engineering work).
  jchw 9 hours ago
  Maybe, but I think focusing on concurrency is the wrong idea.
  I think Go + sum types could be good. Maybe. But, honestly, it's hard to say for sure. First-order effects are great: we have sum types and can use them to model problems. Second-order effects get muddy: We have sum types and they are the ideal solution for a bunch of problems, but without other features they're awkward to use. For example... now you can do a Result type... but if you want to return multiple values in a Result, you need a set/tuple type too. If you do that, is Go's multiple return values concept still a good idea? I could probably go on but hopefully this illustrates the point.
  I think a lot of people don't acknowledge why Go is truly so successful. Well OK, first elephant in the room, it's obviously successful because it's backed by Google, a company who can and did throw immense resources at making the implementation and standard library pretty damn good as well as helping to push it, but that alone wouldn't have propelled it to where it is today (I mean, compare it to Dart.)
  But beyond that, Go is successful because Go code is very simple to write and leaves the programmer with relatively few things to be concerned about, and yet in spite of that, Go code generally runs reasonably fast, uses relatively small amounts of memory and is generally not very crash-prone. (Some people would happily debate that, but I trust software like restic, esbuild, rclone and Syncthing every day without fail, among other smaller Go programs. I'm OK with making that assertion.)
  If you put in the effort to make really good Rust code, the effort is not wasted, but it is a lot of effort when the stupid Go code often does the trick. Consider Discord's presence service: they switched from Go to Rust and massively cut costs and improved latency. Rust wins, Rust better? But... most services will never scale to the point where marginal improvements in latency, throughput or RAM are going to be worth a lot of man-hours worth of programming and ongoing maintenance. Usually throwing a couple more VMs at the problem is just going to be the path of lesser resistance. This was always the case when comparing lower-level to higher-level, but Go amplifies it because the performance gap isn't as big, but the complexity gap remains very large or maybe gets larger.
  Is writing Rust code really that hard? No, not at all, it's more that writing good Go code is so relatively easy, the language is simple and the standard library is loaded with useful tools. Go's CLI flag parser is pretty unimpressive, but so many projects just need a very basic flag parser and it works totally fine for that, you just don't need to overthink it 99.99% of the time. Same for net/http, the built-in image and compression codecs, TLS stack, and more. Meanwhile, designing and maintaining good high-quality Rust crates is just relatively hard. You've got to worry about async vs sync, various cfg options like nostd and noalloc, and dealing with a lot of wrinkles in the language. Want to use a dyn trait? You'll need to make sure the trait doesn't have any functions with generic parameters or async; makes perfect sense, but adds tension; you want to avoid unnecessary runtime costs in the ideal case but still have the flexibility to use a dyn trait in other cases. Not to mention the borrow checker and how it interacts with a lot of these design choices. Go code is much dumber, but often times it's sufficient.
  And that's the crux of it. Go is simple in a stupid way, rather than an elegant way, but it really makes the most of it. If you want to supplant Go at what it does best, trying to make a better programming language overall may be a bit misguided. It's probably not that hard to come up with a programming language design that is "better" than Go. What's hard, IMO, is coming up with a programming language where the cognitive load increase relative to Go is met with a pay-off that people using Go would consider genuinely worth it. Something like guaranteed data race safety is definitely good enough if it's something someone needs or at least strongly wants for a given use case. Sum types, on the other hand, are very nice thing to have that make modelling data easier and less error-prone, but not having them isn't exactly the end of the world... In Go, people sometimes emulate them with interfaces and type-switches, and it's not fantastic, but it's usually sufficient.
  Ocaml probably could/should be more successful, but I'm not sure it competes with Go, I think they're in entirely different wheelhouses. Ocaml feels like it doesn't want to compete with Go, but then again, I only have relatively surface-level knowledge of Ocaml, so I can't really say for sure.
  noelwelsh 8 hours ago
  I feel OCaml is as easy to write as Go, though perhaps that is not the case for someone with a typical imperative programmer background. I feel a good part of Go's success is that the core language is very similar to Python, which is obviously very widely known.
  Most of the issues you discuss with Rust are not issues in OCaml, as OCaml has GC. The language is simpler, in that programs are concerned with fewer concepts (e.g. no lifetimes), but less expressive in that they cannot formally talk about these concepts (though see the Jane Street work to add in some Rust-like concerns: https://blog.janestreet.com/oxidizing-ocaml-locality/)
  alain_gilbert 8 hours ago
  Check out this toy project I made.
  It's basically a fork of the Go lexer/parser that adds Result/Option/Tuple/Set... propagation operators (and more)
  and it compiles down to Go code.
  https://github.com/alaingilbert/agl
  delifue 8 hours ago
  Go with sum type: Borgo
  https://github.com/borgo-lang/borgo
- eru 10 hours ago
  > Lack of sum types is probably one of the worst things about working in Go, and I think it is a much bigger problem than the lack of generics ever was.
  Maybe, but once you have eg an Option or Either (a.k.a. Result) type, you typically really want to have some functions that work generically on all versions of them. (Though you could probably get away with using Go's version of void*, the empty interface, in a lot of cases?)
  jchw 9 hours ago
  Basically, anything that makes it possible to do an Etiher/Result type blows the whole damn thing up. It calls everything into question, e.g. whether multiple returns really makes sense. It's kind of a shame, because I would really like to be able to model things better (and ideally more efficiently too) when working in Go.
  eru 8 hours ago
  The 'multiple return values' thing is really stupid in any case, even without sum-types: it's like "we have tuples, but only for this one special case."
  Just add first-class tuples to the language.
  dontlaugh 4 hours ago
  It’s an actual liability, since you can’t uniformly compose the results of functions with different numbers of return values.
  aatd86 an hour ago
  I think it could be retroffited but would require a syntax to have a variable marked as being a tuple.
  If we look behind the curtains, go functions somehow return tuples that are automatically unrolled into respective variables, sometimes. That's why if you have f(a, b, c){} and g() (b,c){} you can't do f(a, g()) currently.
  It does not unroll it automatically for some reason here. The result remains in "tuple" form ?
  I think it may leave some space to have tuples if needed.
  In any case, I don't see it as technically infeasible.
  eru 43 minutes ago
  > I think it could be retroffited but would require a syntax to have a variable marked as a being tuple.
  Like Perl's sigils? Or PHP's distinction between 'normal' variables with a $ and variables of function type which have no marking but are case insensitive?
  dontlaugh an hour ago
  Go functions don’t actually semantically return tuples, multiple return is part of the calling convention.
  It can be changed more easily than in other languages because everyone statically links, but it’s not trivial.
  aatd86 22 minutes ago
  They are not first class in the language but I think I remember the compiler having them as a representation for some language objects.
  Maybe https://go.dev/src/go/types/tuple.go
- pjmlp 6 hours ago
  Personally, by now I would already be happy with Pascal like enumerations, instead of the stupid iota/const hack, yes it is an hack.
  type StatusCodes = (Success, Ongoing, Done)
  or if you prefer, while bikeshedding what keyword to use,
  type StatusCodes enumeration (Success, Ongoing, Done)
  Naturally, it is so much better to write
  type StatusCodes int const ( Success StatusCodes = iota Ongoing Done )
ivanjermakov 16 hours ago
I'm surprised how many modern languages lack first-class sum type support, considering the amount of domain use cases for them.
- js8 5 hours ago
  Not only programming languages, SQL as well. Sum types are also kinda awkward to represent in relational algebra (it can be done similar to inheritance, your type constructor will be the ancestor relation and for each data constructor you have a descendant relation - this is because of the categorical duality between sum and product types, foreign key constraints are arrows).
Taniwha 11 hours ago
Wirth was on the Algol68 committee - I'm sure he understood how those sorts of unions worked.
He also avoided a lot of the more advanced features of Algol68, he thought it too complex, when he designed Pascal
- adrian_b 6 hours ago
  I doubt that he understood how unions must work, because the "variant record" he has put in Pascal in 1970 was really bad, worse than even the initial proposal of John McCarthy, while the "union" of ALGOL 68 was pretty decent.
  Implementing correctly disjoint unions, i.e. allowing them to be used only exactly like an enumeration in the variable tested by a select/case/switch statement, and then only as the correct type in each of the alternatives, introduces a little overhead in a compiler, but it is certainly not a serious source of bloat in comparison with most other things that must be done by a compiler.
  If the programming language designer had clearly in mind how disjoint union types must work, they would have been easy to implement even for the minicomputers and microcomputers of the seventies.
  pjmlp 2 hours ago
  Niklaus Wirth eventually started designing for minimalism, while I greatly enjoy some languages designed by him, the ultimate minimalism that kept himself busy in Oberon-07 was clearly not my point of view.
  Even Go v1.0 type system is more advanced than Oberon-07 final form.
  adrian_b an hour ago
  I agree.
  In my opinion, the most important contributions of Niklaus Wirth to programming languages have been in his early work on Euler, PL360 and ALGOL W, which introduced various features that were novel at that time.
  Starting with Pascal in 1970, his programming languages remained reasonably good for teaching programming and how to implement a compiler, due to their simplicity, but all of them were seriously behind contemporaneous languages.
  While Mesa of Xerox was a nice and innovative language, that cannot be said about Modula, Wirth's attempt to reimplement similar features after his sabbatical at Xerox, which was only mediocre.
  On the other hand the early languages of Wirth were very innovative, e.g. Euler was one of the first 2 languages with pointers, the other being CPL. In contrast with CPL which had implicit pointer dereferencing, Euler had explicit address-of and indirection operators, but it got their syntax right, not like in C, where the indirection operator has been mistakenly defined as prefix, instead of postfix.
Buttons840 10 hours ago
> But another thing Muratori points out is that is that Dahl and Nygaard copied the feature in safe working form into Simula, and Stroustrup knew about it and intentionally dropped it from C++, thinking it inferior to the encapsulation you get from inheritance. This is funny! Because of course C already had case #3 above -- completely unchecked/unsafe unions, they only showed up in 1976 C, goodness knows why they decided on that -- and the safe(ish) std::variant type has taken forever to regrow in C++.
This seems like a mistake. At the end of the day, a bunch of code and logic has to be written somewhere, and I think it's better done outside the data object, at least some of the time.
Imagine you have the classic Shape class / interface and someone wants to write some code to determine whether a Shape is happy or sad, based on their synesthesia, what are they suppose to do? I guess just add a happy_or_sad() method to the interface? Like, we're just going to pile--err, I mean, "encapsulate"--every possible thing that can be done with the data into the data object?
The OOP way is probably some Shape class hierarchy with a Shape superclass and a bunch of specific Square, Circle, Triangle, subclasses. So I guess you go and modify a dozen subclasses to add your happy_or_sad() method. And you're definitely going to have to fork the code because nobody wants to upstream your personal feelings about which Shapes are happy or sad.
It's better to have a sum type for your Shape and then everyone can put all their code and logic outside of the Shape itself, and the type system will ensure, at compile time, that no Shape variants have been missed, so refactoring is assisted by the type system.
- taeric an hour ago
  Amusingly (to me, at least), I took a stab a long time ago to show how a visitor could get you most of this. As I state in the post, I make no claim that this is as convenient or powerful. It does do what you basically describe, though. https://taeric.github.io/sum-types.html
  Note instead of my worse Optional class, what you are describing is a Shape class where there is a different function for each supported shape. If you add a new shape, everywhere that made a shape visitor would have to be updated do deal with the new shape. (A sibling post described this.)
  Amusingly, I recall coworkers that did not want to use the "acceptVisitor" method and would force cast to call over to the methods directly. Caught me incredibly off guard.
- eru 10 hours ago
  Inheritance can simulate sum-types. You can also simulate sum-types via what OOP people would probably call a visitor pattern: you hand me call-backs for what to do in all the different cases, and some trusted piece of code that you implement once for the type, does the case distinction and calls the right call-back. (You can either hand over the call-backs as functions / function pointers, or you can implement them as member methods pure OOP style.)
  They are all equivalent in principle, but some of them are a lot more annoying to work with, especially when you want to do a pattern matching over multiple values at the same time, or match on nested patterns.
  pjmlp 2 hours ago
  The way it goes is that sum types are closed for extension, while enjoying the confort of pattern matching, whereas OOP approach is open for extension, with the associated boilerplate in the visitor logic.
  In languages of ML linage, we can combine both approaches with a mix of sum types and functors/type classes.
  eru an hour ago
  > [...] whereas OOP approach is open for extension, with the associated boilerplate in the visitor logic.
  Yes, by default. But doesn't eg Java let you mark classes as final or something like that?
  > In languages of ML linage, we can combine both approaches with a mix of sum types and functors/type classes.
  I think Rust allows something similar, but I'm not sure whether you'd call it 'ML linage'?
  pyrale 3 hours ago
  > You can also simulate sum-types via what OOP people would probably call a visitor pattern
  You can also simulate it by using if/else everywhere.
  At this point, that means you have zero language feature supporting the use case, and type safety is up to the developers implementing patterns correctly everywhere.
  Buttons840 9 hours ago
  If I write my happy_or_sad() callback and pass it to the Shape, it would be nice if I could get some exhaustiveness checking, but like the blog says, C++ and other popular languages intentionally left it out.
  layer8 7 hours ago
  With the visitor pattern, you have to implement an interface with one method for each choice, so (in statically typed languages) that implicitly does an exhaustiveness check, because if you forget to implement one of the interface methods, instantiating your implementing class won’t compile.
  One nice thing about the visitor pattern is that it doesn’t have to match the type hierarchy. For example, you could have a visitor interface method that is invoked for blue shapes, even if there is no BlueShape type. Similarly, the same type hierarchy can support multiple visitor interfaces, so that you can perform different case distinctions on the same value. This is something that sum types can’t do.
  eru 7 hours ago
  > Similarly, the same type hierarchy can support multiple visitor interfaces, so that you can perform different case distinctions on the same value. This is something that sum types can’t do.
  Haskell's pattern synonyms should be able to handle this?
  The problem with the visitor pattern is that it doesn't compose well. So if you want to match on two values at once, or match deeper into a value, that works well for most pattern matching, but is annoying to piece together with the visitor pattern.
  layer8 2 hours ago
  I’m not really familiar with Haskell’s pattern synonyms, but maybe yes. What the visitor pattern gives you is that in principle the implementation structure can be orthogonal to the visitor case distinction. You can have objects encapsulate (and possibly hide) one structure while exposing different structural views via visitor. This also allows to re-arrange the implementation of the alternatives without breaking client code. In functional languages, you’d probably rather have functions converting the source value to a different (sum) type to perform the case distinction on.
  I agree that syntactic sugar for visitor ”matching” would be nice, and it’s something a language could add. The visitor pattern itself doesn’t prevent adding such syntax.
  pyrale 3 hours ago
  A gard on your pattern match would be enough I guess.
  eru an hour ago
  It depends on what you are trying to do. Eg guards can really replace or-patterns.
  eru 8 hours ago
  In what I described, you would need to hand callbacks for all possibilities, because no parameters are optional.
  greener_grass 6 hours ago
  The designer of the type must manually ensure exhaustive checking but then consumers will be forced to do exhaustive matching.
  It's not that bad in practice.
eru 10 hours ago
Alas, this one gives a 403 Forbidden.
https://archive.is/oTbMW works though.
ahaferburg 5 hours ago
C++ is a gun with five triggers, one for each finger that holds the grip. That's what makes it so versatile and powerful.
- voidUpdate 2 hours ago
  The gun is also permanently attached to your leg, pointing downwards
- dontlaugh 4 hours ago
  And most of the triggers blow up the gun, unless you’re pulling multiple in a specific combination. Most combinations also blow up the gun.