• PaulHoule 6 months ago

    You can write an external memory spell checker with a tiny amount of RAM: something like

       - sort the words in the document
       - eliminate duplicate words (duplicates sort together, so this is one sequential pass)
       - merge the sorted words with the sorted dictionary and keep only the missing words
    
    I saw this in BASIC in Creative Computing and got it working on my TRS-80 Color Computer, which had much less than 32k of available RAM, so that was the first thing I thought of when I saw the headline.
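
    To make step 3 concrete, here is a minimal sketch of that final merge in C (my own illustration, not the Creative Computing program): it streams the sorted, de-duplicated document words against a sorted dictionary and prints only the words the dictionary is missing, using just two line buffers of RAM.

        #include <stdio.h>
        #include <string.h>

        /* Sketch: print lines of sorted file argv[1] (document words) that
           are not present in sorted file argv[2] (dictionary). */
        int main(int ac, char **av) {
            FILE *doc = fopen(av[1], "r"), *dic = fopen(av[2], "r");
            char d[64], w[64];
            if (!doc || !dic) return 1;
            int haveDoc = fgets(d, sizeof d, doc) != NULL;
            int haveDic = fgets(w, sizeof w, dic) != NULL;
            while (haveDoc) {
                int c = haveDic ? strcmp(d, w) : -1;  /* no dict left => missing */
                if (c < 0) {                          /* doc word not in dictionary */
                    fputs(d, stdout);
                    haveDoc = fgets(d, sizeof d, doc) != NULL;
                } else if (c == 0) {                  /* found: advance both */
                    haveDoc = fgets(d, sizeof d, doc) != NULL;
                    haveDic = fgets(w, sizeof w, dic) != NULL;
                } else {                              /* dict word sorts earlier */
                    haveDic = fgets(w, sizeof w, dic) != NULL;
                }
            }
            return 0;
        }

    The first two steps can be handled the same streaming way with an external sort, so nothing ever has to fit in RAM at once.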

    Now this blew people away when it came out

    https://winworldpc.com/product/turbo-lightning/1x

    It had a compressed dictionary that would fit in memory alongside the other programs you were running on a PC and would spell check as you typed. There was a 640k limit for the PC, but Turbo Lightning could only use a fraction of that so as not to interfere with other software, and in the early days of the PC you couldn't actually afford to fill out the full 640k anyway.

    • probably_wrong 6 months ago

      Worth noting that the article mentions this alternative as their first PoC and its drawbacks: "Because of its simplistic implementation, it was not very accurate, and also slow because of dictionary lookups on the disk."

      • cb321 6 months ago

        I have a feeling the algorithm used was not the smart merge mentioned in the grandparent. That "slower" spell was in v6 Unix { https://en.wikipedia.org/wiki/Spell_(Unix) } which came out in 1975 and by 1973 there were already Winchester drives doing 885 kB/s with 25ms seeks { https://en.wikipedia.org/wiki/History_of_IBM_magnetic_disk_d... }. A 250e3 word dictionary with average word length of 10 bytes would have only taken about 2500e3/885e3 = 2.8 seconds to scan and the unique words in most documents in practice would have easily fit in RAM (as mentioned in Doug's 1982 paper). Not great, but not so bad, either. People didn't spell check on every key press for another 10 years. ;-)

        Someone probably could look all this up in the "unified version control of all Unix" history, but the way Doug describes it in the 1982 paper linked in the article, it sounds like the v6 spell did a "doc word at a time loop" instead of "sort all doc words at once and merge against a pre-sorted dictionary", and in fact it sounds like Johnson selected a small dictionary to facilitate that instead of "merging".

        • PaulHoule 6 months ago

          It's easy to compress a sorted dictionary by turning something like

            a
            ab
            abc
          
          to

            a
            1b
            2c
          
          where the prefix is the number of characters shared with the previous word (and might be coded as a byte as opposed to a digit like you see there). This would be a simple form of compression to code up in BASIC, unlike Huffman or most LZW variants, which involve bit twiddling and maintaining tree data structures... though I do remember writing SQ and USQ [1] in BASIC.

          [1] https://techtinkering.com/articles/compression-and-archiving...
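
          A matching decoder is about as small. Here is a sketch (my code, assuming the line-oriented variant above: an optional leading shared-prefix count, absent or 0 on the first word, followed by the new suffix):

              #include <stdio.h>
              #include <stdlib.h>
              #include <string.h>

              /* Expand a prefix-delta dictionary back into plain words.  Assumes
                 each line is a decimal count of characters shared with the
                 previous word (a missing count reads as 0), then the new suffix. */
              int main(void) {
                  char line[128], word[128] = "";
                  while (fgets(line, sizeof line, stdin)) {
                      char *suffix;
                      long shared = strtol(line, &suffix, 10);  /* 0 if no digits */
                      size_t n = strcspn(suffix, "\n");         /* suffix length */
                      memcpy(word + shared, suffix, n);         /* keep shared prefix */
                      word[shared + n] = '\0';
                      puts(word);
                  }
                  return 0;
              }

          Feeding it the three-line example above prints a, ab, abc again.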

          • cb321 5 months ago

            In fact, I think this smart merge technique would have been faster (and exact, with no approximations!) than what McIlroy 1982 describes on 1975..1980 hardware in spite of not fitting in memory. The end of that article says it took about 65..77 seconds of CPU on a PDP 11/70 to check a 5500 word document.

            Meanwhile, the approach in the 3 files below (which would need only tiny adaptations to 1975 C) should have taken about 4 seconds at 200 kB/sec of IO time (see below). This kind of technique was quite common in the 1960s database community as well. So, it's a bit weird for the Bell Labs guys not to have known about it, and it shows off the C/Unix system just as well. Here is an example implementation:

            delta.c:

                #include <stdio.h>  /* stdin->out: delta encode sorted lines */
                #include <string.h>
                struct { unsigned nPfx:4, nSfx:4; } __attribute__((packed)) nPS;
                
                int main(int ac, char **av) {
                    char word[64], prev[64];
                    int  nPrev = 0;
                    while (fgets(&word[0], sizeof word, stdin)) {
                        int n = strlen(word), nPfx;
                        int m = n < nPrev ? n : nPrev;
                        for (nPfx = 0; nPfx < m; nPfx++)
                            if (prev[nPfx] != word[nPfx]) break;
                        nPS.nPfx = nPfx;
                        nPS.nSfx = n - nPfx - 1;
                        fwrite(&nPS, 1, 1, stdout);
                        fwrite(&word[nPfx], 1, n - nPfx - 1, stdout);
                        nPrev = n - 1;
                        memcpy(prev, word, n - 1);
                    }
                    return 0;
                }
            
            words.c:

                #include <stdio.h> /* Emit all words in stdin, 1 to a line */
                #include <ctype.h> /*~tr -cs A-Za-z \\n|tr A-Z a-z|grep -vE .{16,} */
                
                int main(int ac, char **av) {
                    char wd[15];
                    int  n = 0, c;  /* n = chars in current word; must start at 0 */
                #define PUT fwrite_unlocked(wd,1,n,stdout); putchar_unlocked('\n')
                    while ((c = getchar_unlocked()) >= 0) { /* accum word chs */
                        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                            if (n < sizeof wd)          /* make smarter & .. */
                                wd[n++] = tolower(c);   /*..contraction aware. */
                            else
                                fprintf(stderr, "too long: %15.15s...\n", wd);
                        } else if (n > 0) { /* non-word c & have data */
                            PUT; n = 0;     /* put & reset */
                        }
                    }
                    if (n > 0) { PUT; }     /* put any final word */
                    return 0;
                }
            
            notIn.c:

                #include <stdio.h>  /* Emit stdin words not in /tmp/dict */
                #include <string.h> /* Both sources must be delta-encoded */
                struct { unsigned nPfx:4, nSfx:4; } __attribute__((packed)) nPS;
                
                int main(int ac, char **av) {
                    FILE *fI = stdin, *fD = fopen("/tmp/dict", "r");
                    char  wI[16], wD[16];       /* Word buffers; then lens */
                    int   nI=0, nD=0, oI, oD;   /* Flag saying last read was Ok */
                #define GET(T) /* update [wno][ID] with its next word */ \
                    if (o##T = fread(&nPS, 1, sizeof nPS, f##T)==sizeof nPS) { \
                        n##T = nPS.nPfx + nPS.nSfx; \
                        o##T = o##T && fread(&w##T[nPS.nPfx], 1, nPS.nSfx, f##T);}
                #define PUT { fwrite(wI, 1, nI, stdout); putchar('\n'); GET(I) }
                    GET(I); GET(D); /* set theory "not in": ordered merge impl */
                    while (oI && oD) {
                        int c = memcmp(wI, wD, nI < nD ? nI : nD);
                        if (c == 0) c = nI - nD;
                        if (c < 0) PUT                   /* I<D: emit&advance I */
                        else if (c == 0){GET(I);GET(D);} /* I==D: advance both */
                        else GET(D);                     /* I>D: advance D */
                    }
                    while (oI) PUT                       /* flush tail out */
                } /* (echo a;cat $SOWPODS) | delta>/tmp/dict # DE UN-abridged dict
                     words < Input | sort -u | delta | notIn # Extract,DE,Check */
            
            To back up my 4.2 seconds at 200 kB/sec, you can find any SOWPODS dictionary and it compresses to about 837 KiB with that delta.c. 837/200 = 4.185.

            If the rejoinder is "that SOWPODS stuff had/has licensing trouble" then no problemo - just use whatever in-house dictionary they used, plus their stemming / auto-suffix junk, to synthesize an exact dictionary. Then you can correct it as you go, and et voila. In fact, if you wanted to be even faster than this 15..20X and were willing to make accuracy-perf trade-offs, then you could probably shrink IO further by generating the delta-encoded stream directly from the suffix rules.

            I'd recommend staying exact, though. In this case, it seems a bad algo idea led them to be both inaccurate and slower, and the writing was on the wall that hardware was getting faster. And honestly, my SOWPODS dictionary that seems 15X faster may well have better coverage than what they had, which means an at-the-time apples-to-apples comparison might have been 20..25x faster.

            As a kind of data-compression / next-optimizations side note, the 267751-word SOWPODS I compressed to 857065 bytes this way can only be Zstd -19'd down to about 588040 bytes. It's all mono-case, and with simply an array-of-5-bits encoding you could honestly get that 857065 down to (857065-267751)*5 + 267751*10 bits = 703010 bytes, less than 1.2X bigger than Zstd -19, but only 1.21X better than the more naive encoding above. So, you know, simple delta encoding works like gangbusters on a sorted spelling dictionary, has a very simple set membership algo (as above), and seems like it was at least an order of magnitude faster than what they actually did instead. I'd be a bit surprised if no one pointed this out at the time.

            • cb321 5 months ago

              Actually, @probably_wrong was right above. It took a while, but I found that Bentley's 1986 Programming Pearls, Column 13, describes Kernighan & Plauger 1981 as documenting the original v6 spell in more detail, and it is indeed the smart merge technique:

                  concat ..|translit A-Z a-z|translit ^a-z @n|sort|unique|common -2 wds
              
              For the curious, in modern parlance that Unix v6 spell is:

                  tr A-Z a-z | tr -c a-z \\n | sort -u | comm -2 -3 - someDict
              
              So, A) mea culpa & this corrects the record and B) my measurements above suggest that @Paul_Houle's delta encoding idea alone would have given ~4X IO scan reduction on an unabridged dictionary with no word stemming accuracy trade-offs. { Delta coding is, of course, combinable with word stemming in the same way as McIlroy's ideas, since "notIn" is just a set operator. You would just have needed to sort -u after stemming... maybe a stem.c to replace the two trs. } So, I still don't think you needed large Random Access Memory.

              • cb321 5 months ago

                I was curious whether "approximate" was even faster than "exact" at the time, in 1975..78. So, I learned a bit about how to get v7 booting on a SIMH PDP-11 simulator. I actually got an ordered merge against @Paul_Houle's delta-against-the-prefix encoding (mentioned above) to run faster than the v7 spell. Here is a little transcript:

                    $ time zk
                    Total: 500400000
                    real        2.0 user        2.2 sys         0.1
                    $ echo suw = 1st 5500 sorted-unique words of Tale Of Two Cities
                    $ wc < suw
                       1531   1531   11043
                    $ time comm -23 - sowpodsAIO < suw >/n
                    real     3:04.0 user     2:28.2 sys        35.4
                    $ time ni0 sowpodsAIO.pd < suw.pd > /n
                    real     2:01.0 user     1:48.5 sys        11.4
                    $ time ni1 sowpodsAIO.pd < suw > /n
                    real     1:41.0 user     1:29.1 sys        11.3
                    $ time ni2 sowpodsAIO.pd < suw > /n
                    real     1:26.0 user     1:14.1 sys        11.5
                    $ time ni3 sowpodsAIO.pd < suw > /n
                    real     1:04.0 user       53.2 sys        10.3
                    $ time ni4 sowpodsAIO.pd < suw > /n
                    real     1:04.0 user       53.3 sys        10.1
                    $ time ni5 sowpodsAIO.pd < suw >/n
                    real       45.0 user       34.4 sys        10.1
                    $ time spell < suw > /n
                    /bin/spell: /usr/dict/spellhist: cannot create
                    real       46.0 user        1.4 sys         1.4
                    $ wc answer.spell
                         78     78     604 answer.spell
                    $ wc answer.ni2
                         35     35     225 answer.ni2
                
                zk is Bob Supnik's classic PDP-11 performance test. `comm -23` is the slow part of the v6 Unix spell. `ni0.c` is the same as the above `notIn.c`. `ni1.c` specializes to have stdin be non-prefix-delta while the dictionary stays prefix-delta, and does the simple optimization of not recomparing already-compared prefix bytes. `ni2.c` does the remaining simple (also not English-specific) optimization of skipping over long words that are part of an "increasing run" like "abbreviate abbreviated abbreviates abbreviating .." on the way to "abide", say. `ni3.c` ditches stdio buffering for the dictionary in favor of its own, to play with the buffer size, since v7 could only do BUFSIZ=512 bytes. `ni4.c` does the same for the stdin stream and also eliminates the need for a follow-on strlen (like the modern getline/getdelim). Finally, `ni5.c` manually inlines the character-reading loop into the macro that gets the next dictionary word. In real life, there would probably be 2-4 seconds of IO time added for all but the `comm` variant (which needs 2.7 MB of IO rather than 870 kB, and so proportionally more).

                For posterity, that ni5.c outperforming v7 spell (not counting IO) in 1979 v7 C is this:

                    #include <stdio.h>  /* Emit stdin words not in /t/dict.pd */
                    extern int errno;   /* ni dict.pd < sort-uWds */
                    #ifndef Z
                    #define Z 2048
                    #endif
                    char buf[Z]; char *b = buf; int n = 0;
                    fillbuf(fd) int fd; {
                        int m = read(fd, buf, Z);
                        if (m > 0) { b = buf + 1; n = m - 1; return buf[0]; }
                        return -1;
                    }
                    #define chGet(fd) (n-- > 0 ? *b++ : fillbuf(fd))
                    char bufI[Z]; char *bI = bufI; int In = 0;
                    filbufI() { /* symbols must be <= 7 chars long */
                        int m = read(0, bufI, Z);
                        if (m > 0) { bI = bufI + 1; In = m - 1; return bufI[0]; }
                        return -1;
                    }
                    #define chGetI (In-- > 0 ? *bI++ : filbufI())
                    readw(wI, max) char *wI; int max; {
                        register int  i;
                        register char c;
                        for (i = 0; i < max; i++) {
                            c = chGetI;
                            if (c != '\n' && c != -1)
                                wI[i] = c;
                            else
                                break;
                        }
                        return i;
                    }
                    main(ac, av) int ac; char **av; {
                        int  fD = open(av[1], 0);
                        char wI[64], wD[16], nPS;   /* Word buffers, lens, then.. */
                        int nI=0, nD=0, oI=1, oD=1; /* ..a flag sez read was Ok. */
                        int m, nS, nP, i;   /* Running num. matching prefix chars */
                        if (fD < 0) {fprintf(stderr, "%s: %d\n", av[1],errno);return 1;}
                    #define DGET    /* Update [wno]D with its next word */ \
                        if ((oD = oD && (nPS = chGet(fD)) != -1)) { \
                            /* pcc needs & 255 here| , but runs ~8% slower */ \
                            nP = ((unsigned)nPS)>> 4; nS = ((unsigned)nPS) & 15; \
                            nD = nP + nS; \
                            for (i = nP; i < nD; i++) wD[i] = chGet(fD); }
                    #define IGET    /* Update [wno]I with its next word */ \
                        m = 0;      /* New inp word => reset cmp to start */ \
                        oI = oI && (nI = readw(wI, sizeof(wI)));
                    #define PUT { fwrite(wI, 1, nI, stdout); putchar('\n'); IGET }
                        IGET DGET   /* Set Theory "not in": ordered merge impl */
                        while (oI && oD) {
                            register int c = 0, lim = nI < nD ? nI : nD;
                            if (nP < m) m = nP;
                            for (/**/; c == 0 && m < lim; m++) c = wI[m] - wD[m];
                            m--;
                            if (c == 0)
                                c = nI - nD;
                            if (c < 0) PUT                  /* I<D: Emit&AdvanceI */
                            else if (c == 0) { IGET DGET }  /* I==D: advance both */
                            else {                          /* I>D: advance D to..*/
                                if (nD < nI) {              /*..a maybe <= word. */
                                    DGET    /* Loop & compare more */
                                } else {    /* Cannot possibly be <= while nP > m */
                                    do { DGET } while (oD && nP > m);
                                } /* The above trickery elides like 95% memcmp */
                            }
                        }
                        while (oI) PUT                      /* flush tail out */
                        return 0;
                    }
                
                It's actually not so bad of a C program, especially if you compare it to Doug McIlroy's A) much larger and B) approximate and C) very English-specific program. While I'm sure there is a little bit of time in pre-processing for v7spell that could be elided, I feel I've confirmed the idea (mentioned in McIlroy1982, actually, at the end of HASHING & before SUPERIMPOSED CODES) that an exact approach could have been fine.

                McIlroy 82 seemed aware of this, mentioning the Morris-Thompson way as [11], but I'm not sure he was aware of the pretty severe trade-off between more compact data compression schemes and how fast they are to decode. As you can perhaps see from this program, the prefix-delta fits very naturally into the actual data flow of the algorithm and even informs how to intelligently skip a lot of data comparison. It might be possible to do that even better - I didn't think too long about that.

                Of course, at some dictionary scale and some input scale the loop-over-words approach will beat whole-document-at-once, but 5500 words (chosen to match a statement in McIlroy82) is not that big. This post alone is about 1250. And 270e3 words is actually a large unabridged dictionary, bigger in fact than the "100-200K words" McIlroy speculated upon at the time.

                Since run time is clearly pretty directly proportional to dictionary size, it's fair to say a 150kw dictionary would have been about twice as fast as v7 spell while staying exact and human-language adaptable (at the 5500-word scale, or moving the break-even down to a roughly 2750-word document). Had this idea actually been used at the time, I imagine there would have been pressure to shrink that 870kB encoded size to fit on a 360K floppy drive, which would have been 43% the size (and the time, when rehoused on a Winchester drive, though a 2..3 floppy install would also not have been too crazy). Even in the modern age over at https://github.com/wolfgarbe/SymSpell it seems 82,765 words or 31% the size (and run time) are deemed acceptable. So, either triple the v7 spell speed@5500 or an 1800 word break-even seems feasible. (EDIT: And, of course, the preprocessor could always select which of the two is fastest, if you want, based on how many unique words there are)

      • jandrese 6 months ago

        I guess you really used the fact that most words are repeated to keep the byte count in check? On the old C=64 I had, it was a bit of a problem not to blow out the memory with just the text of the document once you started using it for more than a 1 or 2 page paper. Keeping a second sorted copy seems almost luxurious.

        I guess you could save the working copy to disk first, then do the sort, then compare, then reload the working copy. I think the C=64 developers probably avoided that strategy because the disk interface was so damn slow.

        • tczMUFlmoNk 6 months ago

          I may be wrong, but from "external memory" in the description I think the idea is that each of those steps can be done on disk, not RAM. An external merge sort is a pretty standard database primitive, and the other two only require purely sequential access, so are friendly to spinning disks and the like.
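
          Right - for anyone who hasn't seen one, here is a toy external merge sort sketch in C (illustrative only: the batch size, temp file names, and single k-way merge pass are my own simplifications, with no error or bounds checks):

              #include <stdio.h>
              #include <stdlib.h>
              #include <string.h>

              /* Toy external merge sort: lines on stdin -> sorted lines on stdout.
                 Phase 1 sorts small in-RAM batches into sequential "run" files;
                 phase 2 merges the runs using purely sequential reads. */
              #define BATCH   4096      /* lines held in RAM at once (arbitrary) */
              #define MAXRUNS 256

              static int cmp(const void *a, const void *b) {
                  return strcmp(*(char * const *)a, *(char * const *)b);
              }

              int main(void) {
                  char *lines[BATCH], buf[256], name[64], heads[MAXRUNS][256];
                  FILE *runs[MAXRUNS];
                  int nruns = 0, n, i, eof = 0;

                  while (!eof) {                    /* phase 1: write sorted runs */
                      for (n = 0; n < BATCH && fgets(buf, sizeof buf, stdin); n++)
                          lines[n] = strdup(buf);
                      eof = n < BATCH;
                      if (n == 0) break;
                      qsort(lines, n, sizeof *lines, cmp);
                      sprintf(name, "/tmp/run.%d", nruns++);
                      FILE *f = fopen(name, "w");
                      for (i = 0; i < n; i++) { fputs(lines[i], f); free(lines[i]); }
                      fclose(f);
                  }

                  for (i = 0; i < nruns; i++) {     /* phase 2: k-way merge of runs */
                      sprintf(name, "/tmp/run.%d", i);
                      runs[i] = fopen(name, "r");
                      if (!runs[i] || !fgets(heads[i], sizeof heads[i], runs[i]))
                          runs[i] = NULL;
                  }
                  for (;;) {
                      int best = -1;
                      for (i = 0; i < nruns; i++)   /* smallest current head wins */
                          if (runs[i] && (best < 0 || strcmp(heads[i], heads[best]) < 0))
                              best = i;
                      if (best < 0) break;
                      fputs(heads[best], stdout);
                      if (!fgets(heads[best], sizeof heads[best], runs[best]))
                          runs[best] = NULL;
                  }
                  return 0;
              }

          A real one caps how many runs are open at once and merges in multiple passes, which is exactly the tape-friendly shape discussed in the replies.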

          • PaulHoule 6 months ago

            Knuth's Sorting and Searching book talks a lot about that sort of algorithm.

            The old IBM 360 had a 16MB address space at best, but in 1968 there were very few installations anywhere near filling it. This kind of tape

            https://en.wikipedia.org/wiki/9-track_tape

            could have a capacity of 80 MB or so, which is (1) much larger than RAM, and (2) mainly sequential (it could fast forward for a bit and try to find a particular block, but it was pretty slow). Contrast this with the floppy drives in the 70-140 kB range which the 8-bit micros had. Thus there was a lot of literature on external memory algorithms that would work on tapes in the early days -- though similar methods are still of interest today for "big data", as RAM is faster by a lot when you access it sequentially, you want to minimize round trips in distributed systems, etc.

            (It was funny though when I talked to computer scientists around 2004 and was told to forget about external memory algorithms because 'main memory' was the thing; people realized, for instance, that Google Maps could store all the tiles in RAM in half a rack of 1U servers and if utilization was high it was much cheaper than serving the tiles off disk; Hadoop came along and was a blast from the past that itself was obsolete in just a few years)

        • undefined 6 months ago
          [deleted]
          • bongodongobob 6 months ago

            Link seems to be broken.

            • fsckboy 6 months ago

              works4me

          • dalke 6 months ago

            > At this time, Bloom filter was not even called Bloom filter. In his paper, Douglas calls it a “superimposed code scheme”.

            A Bloom filter is a specific type of superimposed code.

            Calvin Mooers developed random (1) superimposed coding in his Master's thesis at MIT back in the 1940s, directly influenced by Shannon's work.

            Bourne's superb 1963 book "Methods of Information Handling" gives details of the mathematics.

            I've no doubt Douglas knew about the broader technique, which, for example, the author of "The Large Data Base File Structure Dilemma" (1975) at http://dx.doi.org/10.1021/ci60001a005 described as "an old technique called super-imposed coding".

            (1) The "random" is an important qualifier because there were superimposed codes predating Mooers, but they were not mathematically interesting or all that practically important.

            • sitkack 6 months ago

              Way too smart for worse is better. Think Worser!

              The main memory bandwidth and the disk bandwidth were about the same, a little over 1 MB/s.

              I would have done this in multiple passes (but still used the Bloom Filters, those are cool).
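
              For anyone who wants to play along, here is a minimal Bloom filter sketch in C (the sizes and hash mix are arbitrary demo choices, not the parameters from the paper):

                  #include <stdio.h>
                  #include <stdint.h>

                  /* Minimal Bloom filter: K hash probes set/test bits in one array.
                     False positives possible, false negatives impossible. */
                  #define MBITS (1u << 20)   /* filter size in bits (demo value) */
                  #define K     7            /* probes per word (demo value) */
                  static uint8_t bits[MBITS / 8];

                  static uint32_t hash(const char *s, uint32_t seed) {  /* FNV-1a style */
                      uint32_t h = 2166136261u ^ seed;
                      while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
                      return h;
                  }

                  static void add(const char *w) {
                      uint32_t i, b;
                      for (i = 0; i < K; i++) {
                          b = hash(w, i) % MBITS;
                          bits[b >> 3] |= 1u << (b & 7);
                      }
                  }

                  static int maybeHas(const char *w) {
                      uint32_t i, b;
                      for (i = 0; i < K; i++) {
                          b = hash(w, i) % MBITS;
                          if (!(bits[b >> 3] & (1u << (b & 7)))) return 0;
                      }
                      return 1;
                  }

                  int main(void) {
                      add("spell"); add("check");
                      printf("%d %d\n", maybeHas("spell"), maybeHas("zzyzx"));  /* 1 0 */
                      return 0;
                  }

              The classic tuning rule is K ~ (MBITS / number_of_words) * ln 2 probes, which is how you trade memory for false-positive rate.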

              https://github.com/arnoldrobbins/v10spell

              https://code.google.com/archive/p/unix-spell/

              The original paper is great https://www.semanticscholar.org/paper/Development-of-a-Spell...

              It is hosted on his web page https://www.cs.dartmouth.edu/~doug/

              https://en.wikipedia.org/wiki/Douglas_McIlroy

              If you are a word nerd, you will have found "obovate" and, from there, this chart.

              https://upload.wikimedia.org/wikipedia/commons/e/e8/Leaf_mor...

              • LeoPanthera 6 months ago

                I can't remember the name of the product but in the 80s there was a hardware spell checker for the IBM PC. It was a box that connected between your keyboard and your PC, and if you ever typed a string of letters that it did not recognize as a dictionary word, it would beep to let you know.

                • jadamson 6 months ago

                  Xerox PC Type Right

                  https://vintageapple.org/pcworld/pdf/PC_World_8711_November_... page 237 has a review (big PDF warning)

                  • eurleif 6 months ago

                    >You can even set the device to recognize your password and beep if you mistype it, thereby eliminating a bevy of incorrect sign-on messages.

                    Wow, what a thing for a tech publication to suggest doing! Different times.

                • eichin 6 months ago

                  One of the things that got me intrigued by Unix was an early 1980s(ish) Byte article which walked through building a (trivial example, not the "real" one) spell checker out of a split/sort/comm pipeline, something like 7 commands? 8-bit PCs didn't have anything like that, and yet it didn't look like it needed that much sophistication...

                  • MarkSweep 6 months ago

                    On a similar note, there is this period video where Brian Kernighan shows how to construct a spell checker using a UNIX shell one-liner:

                    https://youtu.be/tc4ROCJYbm0?t=4m56s

                  • svat 6 months ago

                    Just finished reading the article finally (thanks!). The crux of it for me was:

                    - They had a "dictionary" of 30000 words, and accepting a ~1/4000 rate of false positives meant that if they hashed each word to a 27-bit string (integer), they could throw away the dictionary and the problem reduces to storing a set of 30000 27-bit strings.

                    - Somewhat surprisingly, information theory tells us that 30000 27-bit strings can be stored using not 27 but just ~13.57 bits per word. I understand the math (it's straightforward: https://www.wolframalpha.com/input?i=log_2%282%5E27+choose+3... ) but it will probably take me a while to stop finding this counterintuitive, as 30000 is so small compared to 2^27 (which is ~134 million) that it is hard to see where the gains come from.

                    - To encode this 30000-sized subset of 27-bit hashes, they used hash differences, which turn out to be geometrically distributed, and a coding scheme tuned for geometrically distributed input (Golomb coding), to actually achieve ~13.6 bits per word.

                    I've tried to think of how one could do better, even in principle and with infinite time, along the lines of “perfect hashing” — maybe there should be a function that will take an alphabetic word, do some transformations on it, and the resulting hash will be easy to verify for being in the good set vs not. But thinking about it a bit more, the fact that we need that false-positive rate (non-dictionary words shouldn't get mapped to anything in the "good" set) requires us to use at least 27 bits for the hash. What they did seems basically theoretically optimal? Or can there exist a way to map each word to a 27-bit integer, such that the good strings are those with values less than 30000, say?
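
                    If it helps anyone else's intuition, here is a quick numerical check of that ~13.57 figure (my sketch; the Rice-code line is illustrative, not McIlroy's exact Golomb parameters):

                        #include <stdio.h>
                        #include <math.h>

                        /* Numbers from the comment above: 30000 words, 27-bit hashes. */
                        int main(void) {
                            double N = 134217728.0;  /* 2^27 possible hash values */
                            double k = 30000.0;      /* dictionary size */

                            /* Information-theoretic bound: log2 C(N, k) / k via log-gamma */
                            double bound = (lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1))
                                           / log(2.0) / k;
                            printf("bound      : %.2f bits/word\n", bound);   /* ~13.57 */

                            /* Where the gain comes from: sorted hashes become gaps with
                               mean N/k ~ 4474, and an ideal code for such a gap needs
                               only about log2(N/k) + log2(e) bits. */
                            printf("ideal gap  : %.2f bits/word\n", log2(N / k) + log2(exp(1.0)));

                            /* A Rice code (power-of-two Golomb) with parameter r costs
                               roughly r + 1 + (N/k)/2^r bits per gap; r = 12 is close. */
                            int r = 12;
                            printf("Rice, r=%d : %.2f bits/word\n", r, r + 1 + (N / k) / pow(2.0, r));
                            return 0;
                        }

                    So the ~13.6 bits/word is essentially the cost of coding the gaps between sorted hashes; the 27-bit hash width only sets the ~1/4000 false-positive rate.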

                    • eichin 6 months ago

                      For perspective, in 1983 or so, Grammatik on CP/M ran in under 64k and did "grammar checking" (spell checking, plus a bunch of expert system rules) on an 8-bit system. (It sticks in my memory because of the time spent poking at the really interesting part: that it was so compact because it was in Forth, and there was enough of an outer interpreter in the product that with a little hex editing you could just use it as a Forth interpreter - with a very specialized set of functions preloaded :-)

                      • stevekemp 6 months ago

                        The Wordstar editor I run on my own CP/M system, with 64k of RAM, contains the 2023-byte long "SPELL.COM" spell-checker.

                        I've not decompiled it to see how it works, but it's small and fast, and works well.

                    • Scaevolus 6 months ago

                      I wonder what common typos this misses thanks to the hashing?

                      Related, a contest about compressing the Wordle dictionary: http://golf.horse/wordle/

                      • WaitWaitWha 6 months ago

                        In the mid 80's I ran into something similar. Fast is relative.

                        I had a lot of data, 640KB RAM, 64KB of heap, and 64KB of stack. I had hundreds of megabytes that I had to search, extract data from, and then combine some of them.

                        I experimented with data index structured into ternary trees. Conceptually it made sense, but implementation-wise the relationships and paths were still too big to keep in 64KB.

                        Instead of compression, I did swapping. I wrote a TSR [0] (think service), a piece of code that would process a chunk of the data, extract the results, store them on the stack, dump the original data, and make an interrupt call to the TSR, which in turn would destroy the heap, read in the next chunk from storage, and return control to the program to process, combine with the stack data, and continue until the entire process was finished.

                        Originally this process took about a week for three data entry persons (think about a dozen 3" ring binders filled with tables) and a specialist combining the information. The program completed the work in just a few hours. It was amazingly "fast".

                        This was on a single threaded system.

                        [0] https://en.wikipedia.org/wiki/Terminate-and-stay-resident_pr...

                        • Theodores 6 months ago

                          I can remember using UNIX spell with the '-b' option, because I am British. There were only two language options, and now I want to know what the decision making was behind that, how the code catered for that and where the respective dictionaries came from. Did Australians and New Zealanders use British spelling or American?

                          UNIX spell was the 'ZX81 1K chess' of spelling, and, on home computers, we did not have a lot of spell checking going on until MS Word for Windows 3.1. Before then, in offices, secretaries did the typing with WordPerfect. They were human spell checkers for their respective managers and teams.

                          Meanwhile, at home, with our dot matrix printers and flickery screens, we were winging it with paper dictionaries for all of those early years of computing. I can't remember spell checking as being that important back then as everyone could spell. I was in a school of a thousand and there was only one kid that claimed to be dyslexic, a plausible excuse for not being able to spell. Maybe the 1980s was literacy's golden age with there being a clear start date for the decline in our spelling ability, that being the day UNIX spell was written.

                          I like to play Scrabble. Although a very different problem to spell checking, the process shares some steps with UNIX spell. Common word prefixes and suffixes are identified and bolted together in the rack or on the board with other components. Then a Scrabble dictionary is a bit like UNIX spell as it is just a big dictionary of words with no meanings provided. All that matters is whether a given word is in the book or not. It also has a few special look up tables such as the 102 two letter words.

                          • MatthiasWandel 6 months ago

                            I remember spell checking my essays for high school on the commodore 64, using Paperclip 64, in 1984, Before there was ANY Microsoft windows. Spell check took a few minutes, because it read the dictionary from disk as it checked, and after that you could go thru all the words that it couldn't match.

                            • Theodores 6 months ago

                              Was your copy of Paperclip 64 pirated?

                          • Koshkin 6 months ago

                            64kB was huge. The first version of UNIX needed 24kB (half of which was taken by the kernel).

                            • daantje 6 months ago

                              Reading about this and similar techniques in Programming Pearls (Second Edition) by Jon Bentley left the younger me spellbound. Similar to the evolution of linkers up to mold.

                            • msephton 6 months ago

                              How about 39kB for a video game with physics, dynamic graphics, two music tracks, sound effects, online high scores, and built-in instructions? https://news.ycombinator.com/item?id=38372936

                              • teamonkey 6 months ago

                                The original Elite was 22k on tape running on a machine with 32k of RAM

                                • GuB-42 6 months ago

                                  Sizecoding is a thing. .kkrieger is a rather famous 96kB FPS game. There is even an entire demoparty called Lovebyte that is dedicated to it, the biggest category is 1k, but all demoscene events I can think of have a sizecoding competition of some kind.

                                    And it is a completely different thing. In general, it is more about procedural generation and tricks than good packing. Runtime packers are used, like crinkler and kkrunchy, but actually they use a lot of RAM, like hundreds of MB, which is a bit surprising considering that the decompressed executable is in the tens of kB. But that's because they use very powerful but slow compression algorithms.

                                    Sizecoding usually doesn't care about RAM unless the platform requires it; the only thing that matters is the size of the executable file and its data. For that 39kB Playdate game, I guess that's the same idea. The Playdate has 16MB of RAM; I bet the game took full advantage of it.

                                  • fallat 6 months ago

                                    Not the same when the machines in question are not on the same level...

                                  • vandyswa 6 months ago

                                    This spell check must have come later. The original spell check burst the words of the document, and sort -u'd them:

                                    makewords sentence | lowercase | sort | unique | mismatch

                                    • 1vuio0pswjnm7 6 months ago

                                      "In order to secure funding for Unix, Ken Thompson and Dennis Ritchie pitched Unix as a text processing system for the patents department to AT&T. Naturally, a text processing system needed a spell checker as well."

                                      I still use UNIX every day primarily for text processing.

                                      • bell-cot 6 months ago

                                        Ah, the Good Old Days. Back when a bit of memory cost more than most butter churns, and the ultimate data transfer technology was a sharp Express rider with his saddlebags full of punch cards...

                                        Back in the early 1980's, I recall futzing around a bit with dictionary compression algorithms. Very interesting - but it didn't take me long to conclude that outperforming `spell` wouldn't be easy. Nor worth the effort.

                                        • noja 6 months ago

                                          B is bytes. b is bits.

                                          • MarkusWandel 6 months ago

                                            Marginally related... has anyone ever ported the "typo" program to modern C?

                                          • theodric 6 months ago

                                            > "...developed a compression algorithm that came within 0.03 bits of the theoretical limit of possible compression."

                                            Since when can we subdivide bits!?

                                            • fortran77 6 months ago

                                              I had spelling checkers on the Apple ][ that ran in 48K!

                                              • jhbadger 6 months ago

                                                Exactly. 64K only seems tiny to people who didn't use home computers in the 1980s.

                                                • dahart 6 months ago

                                                  Do you know how many words it had, how accurate it was, and whether it used the disk for lookup? Do you know if it was a port of unix spell?

                                                • paulg2222 6 months ago

                                                  Cool. Now move forward.

                                                  • hulitu 6 months ago

                                                    > Cool. Now move forward.

                                                    They are trying, but only seem to go backwards. See Windows, Android, iOS, Teams for examples.

                                                  • ronjakoi 6 months ago

                                                    > But 27-bit hash codes were too big: with 2^15 words, they needed 2^15 * 27 bits of memory, while the PDP-11 had only 2^15 * 16 bits (64kB) of RAM—compression was essential.

                                                    I'm frustrated when people put this kind of typography on the web. HTML can do superscript.

                                                    • hnlmorg 6 months ago

                                                      Then you should enjoy this article because nearly all the expressions are presented as proper mathematical equations bar the few places where those expressions are pseudocode