Comments Page - Smallest Possible Files

Smallest Possible Files (github.com), submitted by yread 4 days ago

arexxbifs a day ago
The 42 byte transparent GIF saw ample use in web development a quarter century ago, when it was used to create pixel perfect <table> layouts. Some things have changed for the better.
https://x42.com/test/gifdot.shtml?abcdef
- JimDabell a day ago
  The smallest GIF is still useful because it is the smallest possible valid favicon. This means you can stuff it into a data: URI to prevent useless requests showing up when you are working on something:
  <link rel="icon" href="data:image/gif;base64,R0lGODlhAQABAAAAADs=">
  zamadatix a day ago
  If you're just wanting to shut the request up and aren't actually trying to display a certain favicon you can do:
  <link rel=icon href=data:>
  With the bonus that you've probably already memorized it just by reading this comment. It is "invalid" data, but so is your example in Safari and Firefox (as opposed to Chromium-based browsers). That doesn't matter much, because the problem is local and silent in the logs, unlike the request.
  JimDabell a day ago
  Thanks! I’m pretty sure I tried this ages ago and it didn’t work at the time, but I tried this again now and it does the job.
  zamadatix 16 hours ago
  The key is to keep everything up through "data:", since anything shorter (even just dropping the ":") gets treated as a relative link instead.
  vbezhenar a day ago
  You can also make an actually useful and readable SVG favicon this way:
  <link rel="shortcut icon" href='data:image/svg+xml,%3csvg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">%3ccircle cx="25" cy="50" r="20"/>%3ccircle cx="75" cy="50" r="20"/>%3c/svg>' />
  JimDabell a day ago
  Good to know! My goal is simply to stop a 404 popping up during development in the simplest way possible, so the smallest amount of code is best for me.
- gudzpoz a day ago
  A use case: https://news.ycombinator.com/s.gif (43 bytes) (used for comment indentation)
  rollcat a day ago
  It's kinda cool that HN looks OK even in simple browsers like Dillo:
  <https://imgur.com/a/Seu8rYT>
  However it's pretty bad on narrow screens. I wish there was some progressive enhancement via modern CSS, or at least just dark mode.
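For reference, the base64 payload in JimDabell's data: URI above decodes to only 14 bytes: a header, a logical screen descriptor, and a trailer, with no image data in between, which may be why zamadatix calls it "invalid" in some browsers. A minimal sketch for inspecting it (the helper name `describe_gif` is made up for this example, and the image-descriptor check is deliberately crude):

```python
import base64
import struct

# Payload copied from the data: URI in the comment above.
PAYLOAD = "R0lGODlhAQABAAAAADs="


def describe_gif(data: bytes) -> dict:
    """Split a tiny GIF into header, logical screen descriptor and trailer."""
    signature = data[:6]  # b"GIF87a" or b"GIF89a"
    # Logical screen descriptor: width, height (little-endian 16-bit),
    # packed flags, background color index, pixel aspect ratio.
    width, height, packed, background, aspect = struct.unpack("<HHBBB", data[6:13])
    return {
        "size": len(data),
        "signature": signature,
        "width": width,
        "height": height,
        "has_global_color_table": bool(packed & 0x80),
        "ends_with_trailer": data.endswith(b";"),  # trailer byte 0x3B
        # 0x2C introduces an image descriptor; a crude scan is enough here.
        "has_image_descriptor": b"," in data[13:-1],
    }


info = describe_gif(base64.b64decode(PAYLOAD))
print(info)
```

The 42- and 43-byte GIFs mentioned in this thread are larger because they do carry a color table and image data.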
ayaros a day ago
Reminds me of https://github.com/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee/eeeeeeee...
- adzm 20 hours ago
  I really appreciate the .gitignore file there https://github.com/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee/eeeeeeee...
- DaSHacka a day ago
  I love how even though the entire repo is essentially a shitpost, it still uses a CoC.
  You know, to ensure cordiality in any of the various riveting PRs and discussions.
- rollcat a day ago
  Kinda. Empty files for so many languages, it would be interesting to see at least an exit(0) or so.
- vitorfrois 21 hours ago
  yes what about the biggest possible files
  jerf 20 hours ago
  Many of them are infinite, so you'd have to provide them as functions rather than files. There's obvious ones like plain text, but some less obvious ones, like, PNGs are defined as a series of chunks, but there's no chunk count in the header, so you can keep appending chunks forever: https://www.libpng.org/pub/png/spec/1.2/PNG-Structure.html
  This sort of thing is not just a funny question, it's something you think about when you're writing scanners. For instance, another "biggest possible file" is the zip file that decompresses to itself[1], which is in some sense also an infinite file. Many a scanner has been written that will fill the disk then crash if presented with that file, which is actually more pathological behavior than would be experienced if the scanner isn't there.
  [1]: https://research.swtch.com/zip
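jerf's point about providing such files "as functions rather than files" can be sketched as a generator. This is only an illustration under my own assumptions: it emits a PNG signature, a 1x1 grayscale IHDR, and then ancillary tEXt chunks forever, so the stream never reaches IDAT or IEND and is never a complete image. The names `chunk` and `endless_png` are hypothetical.

```python
import itertools
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, payload: bytes) -> bytes:
    """Encode one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def endless_png():
    """Yield the pieces of a PNG stream that never reaches IEND."""
    yield PNG_SIGNATURE
    # IHDR: width 1, height 1, bit depth 8, grayscale, default
    # compression and filter methods, no interlacing.
    yield chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    for n in itertools.count():
        # tEXt is an ancillary chunk: keyword, NUL separator, text.
        yield chunk(b"tEXt", b"Comment\x00filler chunk %d" % n)


# A consumer has to decide when to stop; the generator never will.
first_pieces = list(itertools.islice(endless_png(), 5))
print(sum(len(p) for p in first_pieces), "bytes so far")
```

A scanner that reads chunk by chunk with no size budget would keep consuming this stream, which is the failure mode described above.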
user32489318 a day ago
Reminded me of a major “data”/“AI” platform that stripped all empty files when deploying the code. Because of “security” you were not allowed to list files on the deployed instance, nor review the deployment pipeline code or logs (“it just works/batteries included”).
The most brilliant way to screw all Python developers I’ve ever seen.
Later learnt that the docker container ran the code as root, so basically you could destroy the platform from within. Good times.
- RainyDayTmrw 6 hours ago
  For context, this is because Python uses __init__.py files to indicate which directories are modules. They can contain contents, but quite often are empty placeholders with meaning. Removing those would make the corresponding Python modules invalid and invisible to the Python module loader.
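One way to gauge how exposed a codebase would be to a deployment step like that is to list its zero-byte `__init__.py` files beforehand. A small sketch, assuming nothing about the platform in question (the helper `empty_init_files` and the directory names are made up). Note that on Python 3.3+ a directory without `__init__.py` may still import as a namespace package, but packaging tools and test runners often rely on the file being there.

```python
import tempfile
from pathlib import Path


def empty_init_files(root):
    """Return every zero-byte __init__.py under root, sorted."""
    return sorted(
        p for p in Path(root).rglob("__init__.py")
        if p.is_file() and p.stat().st_size == 0
    )


# Demo on a throwaway tree (directory names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app" / "utils").mkdir(parents=True)
    (root / "app" / "__init__.py").write_text("")  # empty marker file
    (root / "app" / "utils" / "__init__.py").write_text("VERSION = 1\n")
    at_risk = [p.relative_to(root).as_posix() for p in empty_init_files(root)]
    print(at_risk)
```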
ks2048 19 hours ago
Pretty cool. But as everyone is pointing out, empty files aren't that interesting. 31/137.
```
    $ find . -name ".git" -prune -o -name "README.md" -prune -o -type f -print | wc -l
    137
    $ find . -name ".git" -prune -o -name "README.md" -prune -o -type f -empty -print | wc -l
    31
```
I suppose if you wanted minimal, non-empty examples, you'd end up with a "hello, world" collection, of which there are many, but nice that this handles file formats as well as programming languages.
- aidenn0 18 hours ago
  The traditional minimal bourne-like shell script has a single ":" in it. This is because, when looking at an executable[1], bourne-alikes may try to detect if the file is binary to prevent executing a binary file. I don't know for a fact that some sh implementations will refuse to execute an empty file, but it seems likely.
  1: If you try to run a program binary from a bourne-like shell and execl() signals ENOEXEC, then (if it believes it to be a text file) it will try to run it as a shell script; this makes shebangs optional for programs executed only from a shell. You can try it yourself (tested on bash, dash, ksh, fish, zsh, and osh):
  $ echo 'echo hi' > foo.sh
  $ chmod +x foo.sh
  $ ./foo.sh
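The ENOEXEC fallback aidenn0 describes can be imitated outside a shell. This is a rough sketch, not how any particular shell implements it: it assumes a POSIX system with `sh` on the PATH and a temporary directory that allows execution, and it skips the "is this a text file?" check. The name `run_like_a_shell` is made up.

```python
import errno
import os
import subprocess
import tempfile


def run_like_a_shell(path):
    """Try to exec path directly; on ENOEXEC, fall back to running it with sh."""
    try:
        return subprocess.run([path], capture_output=True, text=True)
    except OSError as exc:
        if exc.errno != errno.ENOEXEC:
            raise
        # The kernel refused the file (no shebang, not a known binary format),
        # so hand it to sh as a script, as Bourne-like shells do.
        return subprocess.run(["sh", path], capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "foo.sh")
    with open(script, "w") as fh:
        fh.write("echo hi\n")  # deliberately no "#!" line
    os.chmod(script, 0o755)
    result = run_like_a_shell(script)
    print(result.stdout, end="")
```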
LegionMammal978 19 hours ago
Some of these files are very much nonstandard, even when the standard leaves no leeway (unlike HTML). E.g., every PDF standard requires an %%EOF, startxref offset, and an xref table (or an xref stream in the later versions), but this PDF file is missing those, among other oddities, like the page object missing a /Type and /MediaBox. Too bad the author doesn't specify which implementation these are supposed to work in.
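The three structures named above can be looked for with a byte-level scan. This is a heuristic sketch only, not a PDF parser: it checks whether the markers appear anywhere in the data, not whether they are well-formed or in the right place. The function name and the sample bytes are made up and are not the file from the repository.

```python
import re


def missing_pdf_bookkeeping(data: bytes) -> list:
    """List which end-of-file structures a PDF byte string appears to lack."""
    missing = []
    # Classic cross-reference table: the keyword "xref" at the start of a line.
    has_xref_table = re.search(rb"(?m)^xref\b", data) is not None
    # Cross-reference stream (later PDF versions): an object of /Type /XRef.
    has_xref_stream = re.search(rb"/Type\s*/XRef\b", data) is not None
    if not (has_xref_table or has_xref_stream):
        missing.append("xref table or stream")
    if b"startxref" not in data:
        missing.append("startxref")
    if b"%%EOF" not in data:
        missing.append("%%EOF")
    return missing


# A deliberately incomplete, made-up example.
sample = b"%PDF-1.4\n1 0 obj<</Pages 2 0 R>>endobj\ntrailer<</Root 1 0 R>>"
print(missing_pdf_bookkeeping(sample))
```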

dmd 13 hours ago

For people who enjoy this sort of thing, vaguely related is this puzzle: https://dmd.3e.org/a-shell-puzzle/

xelxebar 8 hours ago

Oh, you're the author! I didn't notice and sent you an email, but will repost here:

    $ for i in 3 4 5; do f=puzzle.$i; echo $f: $(head -1 $f | wc -c); tail -$((i-1)) $f; ./$f; done
    puzzle.3: 1
    futz
    futz
    ./puzzle.3: line 3: futz: command not found
    puzzle.4: 1
    futz
    futz
    futz
    ./puzzle.4: line 4: futz: command not found
    puzzle.5: 1
    futz
    futz
    futz
    futz
    ./puzzle.5: line 5: futz: command not found

Does this count?

RandallBrown a day ago
There must be some interesting code golf stuff hidden in here, but it seems like it's mostly empty files.
- JimDabell a day ago
  The linked blog post about the smallest possible valid (X)HTML documents is noteworthy, if only for the fact that a surprising amount of people adamantly refuse to believe that they are valid. Even when you think you have gotten through to them with specifications and validators, a lot of people will still think “yeah, but it’s relying on error handling though”. I’m not sure why “HTML explicitly permits this” will not be tolerated as a thought and somehow transforms into “HTML doesn’t permit this but browsers are lenient”. It’s a remarkably unshakeable position. And even the people who are eventually convinced that it’s valid still think that it is technically incorrect in some unspecified way.
  jerf 20 hours ago
  "if only for the fact that a surprising amount of people adamantly refuse to believe that they are valid... And even the people who are eventually convinced that it’s valid still think that it is technically incorrect in some unspecified way."
  Speaking from my personal experience, if your idea of "valid HTML" was created in the late 1990s or early 2000s, it's worth a spin through the current HTML standard. HTML has always de facto been permissive, but de jure it had certain requirements. However, HTML 5 essentially works by reifying a very, very well-specified algorithm for how to handle HTML "loosely" (even though it is very strictly specified), and then refactors away effectively every requirement it possibly can and defers them to that algorithm instead.
  Technically speaking, as long as you put down the correct doctype, you can elide almost anything nowadays and get a functional document; for instance, "<!DOCTYPE html><title>Hello</title>" is fully standards compliant now (push it through [1]). The only thing the validator gives is a warning that you might like to specify a language (via a lang attribute on the html start tag). It isn't just "browsers will pretty much do the 'right thing'" with that, which has been true for a long time... that's actually standards-compliant HTML now.
  What a lot of old hands don't understand is that HTML 5 was a seismic shift in how HTML is specified. Instead of specifying a rigid language and then pretending the world is complying and it's super naughty of them not to, it defines a standard for extracting a DOM tree from effectively any soup of characters you can throw at it, compliance is loosened as much as is practical, and even when things don't comply there's a specification on exactly how to pick up the pieces. HTML 5 has a completely different philosophy than HTML 4 and before.
  (Relatedly, the answer to the frequently-asked question "What is the BeautifulSoup equivalent for $LANGUAGE", at least as far as parsing, is effectively now "Find an HTML 5-compliant parser", which they all have now. Beautiful Soup's parsing philosophy was enshrined into the standard.)
  [1]: https://validator.w3.org/nu/#textarea
  JimDabell 6 hours ago
  It’s fair to point out the big difference in parsing philosophy between HTML 2–4 and HTML 5, but what I’m talking about happened before HTML 5 as well. Some people can’t handle the fact that HTML intentionally has implied elements.
  > "<!DOCTYPE html><title>Hello</title>" is fully standards compliant now
  Sure, but switch the doctype and put a <p> on the end, and it’s fully standards compliant HTML 4.01 Strict too. And yet so many people are adamant that it can’t be. That it’s invalid (even though a validator says it’s valid). That it’s relying on error handling (even though the spec. says otherwise). That some browsers parse it wrong (but they can never name one). That the DOM ends up broken (when browser dev tools show a normal DOM). That you need <html> and <body> elements (even though it already has both). That there’s something wrong with it at a technical level (even though they cannot describe what).
  The concept “This is correct HTML that works everywhere with no error handling” is very difficult for some people to grasp, to a genuinely surprising degree.
  currysausage a day ago
  This is especially ironic, considering the same people will gladly use XML syntax and serve it as text/html. Historically, this has only worked because no relevant browser has ever implemented SGML (and NET [1], in particular), as required by HTML standards up to version 4 [2].
  [1] https://en.wikipedia.org/wiki/Standard_Generalized_Markup_La...
  [2] https://www.w3.org/TR/html401/conform.html#h-4.2
  myfonj a day ago
  > Historically, […] no relevant browser has ever implemented SGML […] NET
  I can probably confirm the "relevant" part of this claim for the period starting in the first decade of the 2000s, but I am still (somewhat desperately) seeking information on whether ANY application that consumed "HTML", however niche or obscure, treated NET as specified back then. I am quite certain the W3C Validator did (Mathias' article proves that, after all) and that Amaya might have done so, since it was a reference implementation from the same standards body, IIRC, but I cannot swear to that.
  Does anybody here have a clearer recollection of those times, or even some evidence?
  I still find it strange that such a feature had such a prominent place in the specs back then, but practically nowhere else.
  JimDabell a day ago
  EMACS/W3 originally supported SHORTTAG NET but was “fixed” to remove support. In practical terms, mainstream browsers couldn’t afford to parse SHORTTAG NET properly because it was very common to leave attribute values unquoted. You can leave some values unquoted, but not ones with slashes in. So the very common error <a href=http://xn--rvg would not get parsed as the author expected if SHORTTAG NET was enabled.
  This is the earliest reference I could locate easily, from the www-html mailing list:
  https://lists.w3.org/Archives/Public/www-html/2002Nov/0057.h...
  You’ll be able to find more if you go trawling through USENET archives of places like comp.infosystems.www.authoring.html from 25–30 years ago, but it was a fairly niche subject even back then.
  I think there were a couple of other niche tools that supported it, but I don’t remember the details after all this time.
  JimDabell a day ago
  I believe this is the exact change where support for SHORTTAG NET was removed from EMACS/W3 in order to support XHTML better:
  https://github.com/emacsmirror/w3/commit/68af7c107dcbe194e30...
  myfonj a day ago
  Thanks! That's actually really valuable insight and seems to be a promising start for an interesting investigation.
  I'd even say that, at a glance, EMACS (the "W3" browser in it) seems like a possibly hugely relevant application, actually. Will look into it.
  JimDabell a day ago
  If you really want to, you could check out Evolt’s browser archive:
  https://browsers.evolt.org
  It’s got over a hundred ancient web browsers. I suspect none of them support SHORTTAG NET though.
  myfonj 21 hours ago
  Good idea. I remember I have done some research about this in the past when I tried to trace historical arguments for the infamous "should there be a space before slash in void tags for the best compatibility"
  <br/> vs <br /> (vs <br>)
  discussion, but didn't get very far then (https://stackoverflow.com/a/30880386/540955).
  JimDabell a day ago
  That’s not quite the whole story. Appendix C of the XHTML 1.0 specification provides HTML compatibility guidelines:
  > This appendix summarizes design guidelines for authors who wish their XHTML documents to render on existing HTML user agents.
  — https://www.w3.org/TR/xhtml1/#guidelines
  And RFC 2854, which defines the text/html media type, explicitly states this is permissible to label as text/html:
  > The text/html media type is now defined by W3C Recommendations; the latest published version is [HTML401]. In addition, [XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01 and which may also be labeled as text/html.
  — https://datatracker.ietf.org/doc/html/rfc2854#section-2
  However even browsers that support XHTML rendering use their HTML parser for XHTML 1.0 documents served as text/html, even though they should really be parsing them as XHTML 1.0.
  But yes, that extra slash means something entirely different to the SGML formulation of HTML (HTML 2.0 to HTML 4.01). HTML5 ditched SGML though, so SHORTTAG NET is no longer a thing.
  currysausage 19 hours ago
  I believe the sentence from the RFC:
  [XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01
  is technically incorrect. While the XHTML 1 compatibility profile was compatible with HTML 4 as implemented by major browsers, that wasn't actually HTML 4. HTML 4 is based on SGML, while what was implemented was a combination of HTML 4 semantics with the tagsoup parsing rules that browsers organically developed. These rules were only later formalized as part of HTML 5.
  The compatibility guidelines do recommend a space between <br and />, but (at least according to https://validator.w3.org/ in HTML 4 mode) this doesn't change anything about <br /> being a NET-enabling start-tag <br /, followed by a greater-than sign.
  Enter this:
  <h1>Hello<br />world</h1>
  and select "Validate HTML fragment", "HTML 4.01", and "Show Outline". This is the result:
  [H1] Hello>world
  (Obviously nitpicking, but that's my point: the nitpickers can be out-nitpicked.)
  JimDabell 6 hours ago
  Haha yes. Appendix C gave compatibility guidelines, but you are right that doesn’t actually result in documents that could be parsed by a parser that implemented SHORTTAG NET.
  Elsewhere in the thread, I posted an example of SHORTTAG NET being removed from a browser to enable parsing of XHTML documents:
  https://github.com/emacsmirror/w3/commit/68af7c107dcbe194e30...
  Nevertheless, the text/html RFC explicitly condones Appendix C, so despite it not being fully reflective of reality, it’s still permissible to use text/html to label XHTML 1.0 documents that follow Appendix C :D
- arexxbifs a day ago
  The Python, Perl, Lua, etc. files are arguably valid quines.
- eru a day ago
  For the C examples, e.g., it depends a lot on which compiler you are using (and implicitly then also on which standard).
Wowfunhappy 20 hours ago
...I feel like completely empty files shouldn't be allowed. Like, I realize the Python interpreter won't error if you feed it an empty file, but how can you really say that empty file represents a Python script if there is no script there?
However, I can't put my finger on what the correct rule would be.
- ks2048 19 hours ago
  I guess if you can run `python myfile.py` and it finishes without error (return code 0), you could consider it valid.
  By that measure, there are also 1 byte valid Python programs (e.g. "1").
  Wowfunhappy 11 hours ago
  But (at least for Python) that test also works on empty (0 byte) files, which is presumably why the repository says an empty file is the smallest possible Python program, but which feels wrong to me somehow.
  ks2048 8 hours ago
  Yes, that was my point. And thus “also” for 1 byte programs
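The test ks2048 proposes (run the file, check for exit code 0) is easy to try on both the empty program and the one-byte program "1". A minimal sketch, assuming the interpreter running the snippet is the one being tested; `run_source` is a made-up helper, and the file name `myfile.py` is borrowed from the comment above.

```python
import os
import subprocess
import sys
import tempfile


def run_source(source: str):
    """Write source to a temporary .py file and run it with this interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "myfile.py")
        with open(path, "w") as fh:
            fh.write(source)
        return subprocess.run([sys.executable, path], capture_output=True, text=True)


for source in ("", "1"):
    result = run_source(source)
    print(repr(source), "exit code:", result.returncode, "stdout:", repr(result.stdout))
```

The empty program also prints nothing, so its output equals its source, which is the sense in which arexxbifs calls the empty files quines earlier in the thread.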
chasing 18 hours ago
Okay, but what about the largest possible files?
afeezco 20 hours ago
[flagged]
nivertech 20 hours ago
File size of -∞ is the smallest
- jotux 20 hours ago
  Not if the file size is -∞ - 1.