• wahern 4 hours ago

    From a security perspective, and as a programmer, I've never liked ZIP files precisely because there are two mechanisms to identify the contents: the per-file header and the central directory. When you're defining a format, protocol, or whatever, ideally there should be a single source of truth, a single valid & usable parse, etc.; basically, the structure of the data or process should be intrinsically constraining. There shouldn't be a pathway for multiple implementations to produce different functional results, and ZIP archives are in my mind the archetype for getting this wrong. tar files aren't ideal, but in the abstract (ignoring issues with long file names) they don't have this problem. (tar files don't support random access, either, but better to rely on something suboptimal than something that's fundamentally broken.)
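    The two-sources-of-truth problem is easy to demonstrate with nothing but the Python standard library: concatenate two complete ZIP archives and ask two different parsing strategies what the file contains. This is a sketch of the ambiguity, not any particular scanner's logic.

```python
# Two parses of one blob: zipfile trusts the central directory at the
# end, while a naive scan walks local file header signatures.
import io
import zipfile

def make_zip(name, data):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    return buf.getvalue()

# One blob, built from two complete ZIP archives back to back.
blob = make_zip("benign.txt", b"hello") + make_zip("payload.bin", b"evil")

# Strategy A: honor the central directory found at the end of the file.
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    names = zf.namelist()          # sees only the second archive

# Strategy B: scan for local file headers (PK\x03\x04) from the front.
local_headers = []
pos = blob.find(b"PK\x03\x04")
while pos != -1:
    local_headers.append(pos)
    pos = blob.find(b"PK\x03\x04", pos + 1)
# The scan finds both entries; the directory-based parse found one.
```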

    A similar security problem, though not as fundamentally baked into the format, is MIME parsing. The header section is supposed to be delimited from the body by an empty line (likewise for nested entities). But what if it's not? For better or worse, Sendmail was tolerant of the absence of an empty line and treated everything up to the first line that didn't parse as a header or header continuation as headers.[1] Later systems, like Postfix, originally copied this behavior. But Microsoft Exchange and Outlook are even more tolerant, in a much more horrendous way: they parse as a header anything that looks like a Content-Type or related header, even immediately after the first empty line. They have similar hacks for other such violations. So today, depending on the receiving software, you can send messages that appear differently, including having different attachments. It's a security nightmare!
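    The divergence is easy to sketch. Below are two toy header-parsing policies (neither is Sendmail's or Exchange's actual code, just the general shape of the disagreement) applied to the same malformed message:

```python
# Toy illustration: two policies disagree on where the header section of
# the same malformed message (no blank line before the body) ends.
lines = [
    "From: a@example.com",
    "Content-Type: text/plain",
    "this line is neither a header nor a continuation",
    "Content-Type: application/octet-stream",
]

def headers_strict(lines):
    """Header section is everything before the first empty line."""
    return lines[:lines.index("")] if "" in lines else list(lines)

def headers_tolerant(lines):
    """Sendmail-style: stop at the first non-header, non-continuation line."""
    hdrs = []
    for line in lines:
        if line and (":" in line or line[:1] in (" ", "\t")):
            hdrs.append(line)
        else:
            break
    return hdrs

strict = headers_strict(lines)      # all 4 lines, incl. a second Content-Type
tolerant = headers_tolerant(lines)  # only the first 2; the rest is body
```

    Two receivers applying these two policies see different Content-Type headers for the same bytes on the wire, which is exactly the attachment-smuggling hazard described above.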

    I'm not a Postel's Law hater, but ZIP archives and Microsoft's MIME parsing behaviors are just egregiously wrong and indefensible. And even if you think the Robustness Principle is inherently bad policy, you still have to design your formats, protocols, and systems to be as intrinsically constraining as possible. You can't rely on vendors adhering to a MUST rule in an RFC, unless it's unquestionably crystal clear what the repercussions will be--everybody else will (because it's the natural and convenient thing to do) reject your output as trash and drop it on the floor immediately, so violations never have a chance to get a foothold.

    [1] MTAs don't necessarily need to care about MIME parsing, but Sendmail eventually gained features where parsing message contents mattered, setting the de facto norm (for those paying attention) until Microsoft came along.

    • kevin_thibedeau an hour ago

      The central directory allows zip archives to be split across multiple files on separate media without needing to read them all in for selective extraction. Not particularly useful today but invaluable in the sneakernet era with floppies.

      • canucker2016 37 minutes ago

        I don't think you understand the reason for the ZIP archive file design.

        Back in the late 1980s, backup media for consumers was limited to mostly floppy disks, some users had tape/another hard disk.

        Say you had a variable number of files to compress and write out to a ZIP archive.

        If you wrote out the central directory first, followed by all the individually (possibly compressed and/or encrypted) files, you'd have to enumerate all the files to be archived, process them (compress and/or encrypt), write them out, then go back and update the central directory with the actual compressed sizes and offsets of the ZIP local entries.

        Now if you wanted to add files to the ZIP archive, the central directory would grow and push the following individual compressed/encrypted files further out, and you'd have to update ALL the central directory entries, since each entry includes an offset from the beginning of the disk (if the archive does not span multiple disks, this offset is from the start of the ZIP archive file).

        So that's one reason for why the ZIP central directory is placed at the end of the ZIP archive file. If you're streaming the output from a ZIP program, then placing the ZIP central dir at the start of the file is a non-starter since you can't rewind a stream to update the ZIP central directory entries.
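        The append behavior described above can be seen directly with Python's zipfile module; a sketch (standard library only, in-memory for convenience):

```python
# Appending to an existing archive: because the central directory sits at
# the end, the writer places the new entry where the old directory began,
# then writes a fresh directory after it. Existing entries are untouched.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("first.txt", b"one")

with zipfile.ZipFile(buf, "a") as zf:   # append mode
    zf.writestr("second.txt", b"two")

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    names = zf.namelist()               # both entries present
```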

        Why do some programs ignore the ZIP central directory as the ONE source of truth?

        Before SSDs and their minimal seek latency, coders discovered that scanning the ZIP local entries was a faster way to build up the list of archive entries; otherwise you're forced to seek all the way to the end of a ZIP archive and work backwards to locate the central directory and proceed accordingly.

        If the central directory in the ZIP archive is corrupted or missing, the user could still recover the data for the individual files (if all the ZIP local entries are intact). In this case, ignoring the ZIP central dir and scanning sequentially for ZIP local entries is REQUIRED.

        The fault here lies with the security scanners. There's never been any guarantee that the only data in the ZIP archive is valid ZIP local file entries followed by the ZIP central directory. Between ZIP local file entries, one can place any data. Unzip programs don't care.

        • failbuffer 3 hours ago

          In a similar vein, HTTP header smuggling attacks exploit differences in header parsing. For instance, a reverse proxy and a web server might handle repetition of headers or the presence of whitespace differently.

        • _pdp_ 3 hours ago
          • cyberax 12 minutes ago

            17 years? We played tricks with zip bombs that used this approach back in the '90s.

            • tedk-42 an hour ago

              I'm with you.

              I've evaded all sorts of scanning tools by base64-encoding data (i.e. binary data) and copy-pasting the text from insecure to highly secured environments.

              At the end of the day, these malware databases rely on hashing and detecting known-bad hashes, and there are lots of command-line tools like zip/tar to help get around that sort of thing.
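              The base64 leg of that trick is trivial; a sketch with the standard library:

```python
# Any binary payload survives a copy-paste as plain ASCII text.
import base64

payload = bytes(range(256))                        # arbitrary binary data
text = base64.b64encode(payload).decode("ascii")   # paste-able text
restored = base64.b64decode(text)                  # bit-for-bit recovery
```

              Since the pasted text has a different hash than the original binary, naive hash-based detection never sees the payload itself.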

            • avidiax 5 days ago

              This is sometimes used non-maliciously to concatenate zipped eBooks to a JPEG of the cover art. 4Chan's /lit/ board used to do this, but I can't find any reference to it anymore.

              https://entropymine.wordpress.com/2018/11/01/about-that-jpeg...

              https://github.com/Anti-Forensics/tucker
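              The cover-art trick works because ZIP readers locate the central directory from the end of the file, so leading JPEG bytes don't bother them. A sketch (the "JPEG" here is a stand-in byte string, not a real image):

```python
# JPEG/ZIP polyglot: image tools read the front, ZIP tools read the back.
import io
import zipfile

jpeg_stub = b"\xff\xd8\xff\xe0" + b"\x00" * 16 + b"\xff\xd9"  # fake JPEG bytes

zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w") as zf:
    zf.writestr("book.epub", b"ebook bytes")

polyglot = jpeg_stub + zbuf.getvalue()

with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()              # the archive still opens fine
```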

              • theoreticalmal an hour ago

                I think the reason it’s not used anymore is because it was used maliciously to share CSAM on other boards and 4chan banned uploading anything that looks like a concatenated zip

              • Jerrrrrrry 41 minutes ago

                Remember: secure encryption, good compression, and truly random data are indistinguishable.

                It's best to paste that encrypted payload into a JPG with some bullshit magic headers and upload that to a trusted Exfil pivot instead.

                Or, to get SuperMarioKart.rom to work with your chromeApp-XEMU emulator to play during down-time at work, just rename it to SMB.png and email it to yourself.

                • Retr0id 3 hours ago

                  Related, my two favourite ZIP parser issues:

                  https://bugzilla.mozilla.org/show_bug.cgi?id=1534483 "Ambiguous zip parsing allows hiding add-on files from linter and reviewers"

                  https://issues.chromium.org/issues/40082940 "Security: Crazy Linker on Android allows modification of Chrome APK without breaking signature"

                  The big problem with the ZIP format is that although the "spec" says what a ZIP file looks like, it does not tell you in concrete terms how to parse it, leading to all sorts of ambiguities and divergent implementations. Someone needs to write a "strict ZIP" spec that has explicit and well-defined parsing rules, and then we need to get every existing ZIP implementation to agree to follow said spec.

                  • wyldfire 2 hours ago

                    Or: better yet, just use an archive format for archival and a compression layer for compression. Don't use zip at all.

                    • Retr0id 23 minutes ago

                      What non-compressing archive format would you suggest? tar doesn't support random access which is a non-starter for many use cases.

                  • nh2 4 hours ago

                    WinRAR does it right; 7zip and Windows Explorer do it wrong, according to https://en.m.wikipedia.org/wiki/ZIP_(file_format):

                    > only files specified in the central directory at the end of the file are valid. Scanning a ZIP file for local file headers is invalid (except in the case of corrupted archives)

                    • kencausey 5 days ago
                      • 486sx33 5 days ago

                        Encrypted ZIP files have long been a way to evade any sort of malware detection during transmission.

                        • slt2021 4 hours ago

                          You don't even need to encrypt the zip, since an encrypted ZIP file can trigger tripwires during transmission.

                          An unencrypted zip using the .docx or .xlsx format is the way to go (the best way is to hide inside one of the OpenXML tags or XML comments).
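                          Since .docx/.xlsx files are just ZIP archives of XML parts, the hiding spot is easy to sketch (the file names below mimic the docx layout; this is an illustration, not a valid Word document):

```python
# Payload riding in an XML comment inside a docx-shaped ZIP archive.
import io
import zipfile

payload = "c2VjcmV0IHBheWxvYWQ="   # hypothetical smuggled text
doc_xml = "<?xml version='1.0'?><w:document><!-- %s --></w:document>" % payload

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", doc_xml)

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    recovered = zf.read("word/document.xml").decode()
```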

                          • Jerrrrrrry an hour ago

                            encode it with enough filler to reduce its "entropy" :)

                          • mmcgaha 5 hours ago

                            I doubt it still works, but when I needed to get things through email I would embed them in Word documents.

                            • cogman10 5 hours ago

                              Would probably still work. There are just too many formats, which makes it very hard for a content blocker to really stop.

                              I pity the programmer that has to decode the 1000 versions of xls to find the binary blob that could be a virus.

                              • mplewis9z 5 hours ago

                                A modern word document file (.docx) is literally just a Zip archive with a special folder structure, so unless your company is scanning word document contents I can’t imagine there’s any issue.

                              • 0cf8612b2e1e 5 hours ago

                                Pretty common in the corporate world. The email scanner will helpfully drop all manner of useful files sent between staff, so people make an encrypted zip with a simple password.

                              • waltbosz 3 hours ago

                                I've done similar stuff. Concat a zip (that keeps throwing false positives) to a jpg and scanners will treat it like a jpg. Then write a script that chops off the jpg to access the zip. All this so I could automate a web app deploy.

                                • ttyprintk an hour ago

                                  Or attach a zip through Exchange

                                • mtnGoat 2 hours ago

                                  Nice remix of an old technique.

                                  I remember file packing exes together for fun and profit back in the day.

                                  • unnouinceput an hour ago

                                    Quote: "To defend against concatenated ZIP files, Perception Point suggests that users and organizations use security solutions that support recursive unpacking."

                                    That's the worst advice, actually. You want the hidden shit to stay there, unable to be seen by default programs. That's how you got all the crap in Windows mail starting in the '90s, when Outlook started trying to be "smart" and automatically detect and run additional content. Be dumb and don't discover anything; let it rot in there. The only one that should do this is the antivirus; the rest of the unpackers/readers/whatever should stay dumb.

                                    • pjdkoch 4 hours ago

                                      Now?