• RodgerTheGreat a day ago

    I wrote a blog post some time ago which contrasts the "Schwartzian transform" with the slightly clearer alternative used by most APL-family languages and (in my opinion) an even cleaner approach which is possible in languages with a first-class table type: https://beyondloom.com/blog/rankingoffruits.html

    • inopinatus 20 hours ago

      That was a lovely vignette, thanks. Much earlier in my career I had exactly the sense suggested that “something’s off” about the Schwartzian, even though it became a useful and idiomatic tool at the time. I only really understood the disquiet much later, when working with columnar data, and realising the positive qualities of declaring structs-of-arrays rather than arrays-of-structs, with subsequently excellent mechanical sympathy, and so on from there.

      • throwanem 14 hours ago

        Unfortunately, the only such language to which most workaday devs have access is some dialect of SQL. Fortunately, at least there's that.

      • oofabz 20 hours ago

        I find it interesting that the transform was controversial in the '90s.

        Today, it seems like a normal solution to the problem to me, and the controversy seems silly. I have a lot of experience with the map function from JavaScript; it is too simple to be objectionable.

        But in the '90s, I would also have had trouble understanding the transform. Lambdas and closures were unknown to everyone except Lisp dweebs. Once I figured out what the code was doing, I would have been suspicious of its performance and memory consumption. This was 1994! Kilobytes mattered, and optimal algorithmic complexity was necessary for anything to be usable. Much safer to use a well-understood for loop: I have plenty of experience making those fast, and that's what map() must be doing under the hood anyway.

        But I would have been wrong! map() isn't doing anything superfluous and I can't do it faster myself. The memory consumption of the temporary decorated array is worth it to parse the last word N times instead of N log N times. Lisp is certainly a slow language compared to C, but that's not because of its lambdas!
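
        To make the mechanics concrete, here is a rough sketch in Python of the decorate-sort-undecorate idea being discussed (the "sort lines by their last word" task is just an illustration, not the original Perl code):

          lines = ["alpha zebra", "beta apple", "gamma mango"]

          # Decorate: compute the sort key (the last word) once per line.
          decorated = [(line.split()[-1], line) for line in lines]

          # Sort: tuples compare by their first element, i.e. the precomputed key.
          decorated.sort()

          # Undecorate: drop the keys, keeping the lines in key order.
          result = [line for _key, line in decorated]

          print(result)  # ['beta apple', 'gamma mango', 'alpha zebra']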

        • ubercore 16 hours ago

          I believe I first came across this in the early XSLT days, when I had a manager who wanted to write all software with an XML requirements spec, then use XSLT to simultaneously generate code AND documentation.

          1. It did not work, because of course it didn't.

          2. This meant that all data output of the system would be done in XML and transformed, so I got to know XSLT quite well.

          3. It did give me a fun moment at a conference when a speaker asked "Who knows the Schwartzian transform or has used it?" and only my coworkers and I raised our hands.

          • mont_tag a day ago

            Python's key-functions nicely encapsulate the whole process.
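
            For example, a minimal sketch (hypothetical data, the same "sort by last word" task as above):

              fruits = ["red apple", "green kiwi", "yellow banana"]

              # sorted() calls the key function once per element and
              # decorates/sorts/undecorates internally.
              by_last_word = sorted(fruits, key=lambda s: s.split()[-1])

              print(by_last_word)  # ['red apple', 'yellow banana', 'green kiwi']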

            • zippyman55 20 hours ago

              Thank you for posting! I always felt like such a badass in the late 1990s when I used this.

              • Sharlin 14 hours ago

                > Some people want Perl to be accessible at first glance to someone who doesn’t know the language.

                …oh boy.

                In retrospect it doesn’t seem like these people prevailed.

                • ygritte 19 hours ago

                  I'm so grateful for the Lisp translation. Perl is such a write-only language, my hair curls up just from looking at it.

                  • heresie-dabord 14 hours ago

                    > X is such a write-only language, my hair curls up just from looking at it.

                    That would be a powerful language feature, but alas

                    X.isWriteOnly() for X in aLanguagesIdonotUse

                    is merely akin to

                    X.hasTooManyRules() for X in aHumanLanguagesIdonotSpeak

                    • spauldo 17 hours ago

                      To be fair, this isn't what Perl usually looks like. Perl lets you write like this, but most people don't. This is more like Duff's Device: it works, it's clever, and you'd can anyone who put it in production code.

                      That's not to say that idiomatic Perl doesn't have its quirks and oddities - Perl idioms do throw off people who aren't used to them. But this is a bit far even for Perl.

                      • dale_glass 17 hours ago

                        I've definitely seen it in production code, but it was a good deal after 1994.

                        By then it was a commonly known technique, so it was a pattern a Perl dev would be expected to recognize.

                      • throwanem 14 hours ago

                        Eh. It's not worse than Ruby, and vice versa.

                      • jiggawatts 16 hours ago

                        I was curious to see what other languages do by default (or not), so I quickly tested a few:

                        C# evaluates the lambda only once per item, in input order: https://dotnetfiddle.net/oyG4Cv

                        Java uses "Comparators", which are called repeatedly: https://www.jdoodle.com/ia/1IO5

                        Rust has sort_by_key() which is encouraging... but also calls the key transformation many times: https://play.rust-lang.org/?version=stable&mode=debug&editio...

                        C++20 with std::ranges also calls the lambda repeatedly: https://godbolt.org/z/raa1766PG

                        Obviously, the Schwartzian Transform can be manually implemented in any modern language elegantly, but it's interesting that only C# uses it implicitly!

                        PS: The fancy solution would be a temporary hash map ("Dictionary<T, K>") used to map values to the transformed sort keys, so that the mapping Foo(T t): K would only be called once for each unique input value 't'. That way the transformation could be invoked fewer than n times!
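
                        A rough sketch of that idea (in Python rather than C#, with a plain dict as the cache and a hypothetical expensive_key standing in for Foo):

                          def sort_with_memoized_key(items, expensive_key):
                              # Compute the key at most once per *distinct* value,
                              # not once per item.
                              cache = {}
                              def cached_key(item):
                                  if item not in cache:
                                      cache[item] = expensive_key(item)
                                  return cache[item]
                              return sorted(items, key=cached_key)

                          # Duplicate inputs pay for the key computation only once.
                          data = ["beta apple", "alpha zebra", "beta apple"]
                          print(sort_with_memoized_key(data, lambda s: s.split()[-1]))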

                        • ynzoqn 14 hours ago

                          Inspired by your comment, I looked up the Wikipedia article on the Schwartzian transform[1]. It mentions that Rust provides not only sort_by_key but also sort_by_cached_key. :D

                          [1]: https://en.wikipedia.org/wiki/Schwartzian_transform

                          • jiggawatts 14 hours ago

                            I keep saying to colleagues that simply knowing that something exists in a library somewhere is 90% of the skill of modern programming.

                            I suspect the profession of software archeology from the Vernor Vinge science fiction novels will soon become a reality, especially now that old languages can be uplifted mechanically with AI.

                            • Sharlin 14 hours ago

                              This one is particularly easy to chance upon, though, either by browsing the docs of slices’ `sort*` methods (because they’re all clumped together on the page) or, even simpler, by noticing it when you type `foo.sort` and your editor suggests autocompletions.

                          • Sharlin 14 hours ago

                            The reason Rust has a separate `sort_by_cached_key` is, of course, that it doesn’t make sense to cache cheap projections, which are the 95% use case.

                            The implementation is somewhat more sophisticated than just the naive transform:

                            > The current implementation is based on instruction-parallel-network sort by Lukas Bergdoll, which combines the fast average case of randomized quicksort with the fast worst case of heapsort, while achieving linear time on fully sorted and reversed inputs. And O(k * log(n)) where k is the number of distinct elements in the input. It leverages superscalar out-of-order execution capabilities commonly found in CPUs, to efficiently perform the operation.

                            > In the worst case, the algorithm allocates temporary storage in a Vec<(K, usize)> the length of the slice.