• wahern 8 months ago

    From a security perspective, and as a programmer, I've never liked ZIP files precisely because there are two mechanisms to identify the contents, the per-file header and the central directory. When you're defining a format, protocol, or whatever, there should ideally be a single source of truth, a single valid & usable parse, etc.; basically, the structure of the data or process should be intrinsically constraining. There shouldn't be a pathway for multiple implementations to produce different functional results, and ZIP archives are, in my mind, the archetype for getting this wrong. tar files aren't ideal, but in the abstract (ignoring issues with long file names) they don't have this problem. (tar files don't support random access, either, but better to rely on something suboptimal than something that's fundamentally broken.)
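
    To make the ambiguity concrete, here's a rough Python sketch (not from the original comment) comparing the two views of the same archive. It assumes a hypothetical file named evil.zip produced by concatenating two archives:

        import struct
        import zipfile

        def names_from_local_headers(path):
            # Naive scan for PK\x03\x04 local-file-header signatures.
            data = open(path, "rb").read()
            names, pos = [], 0
            while (pos := data.find(b"PK\x03\x04", pos)) != -1:
                # filename/extra lengths live at offsets 26 and 28 of the local header
                name_len, extra_len = struct.unpack_from("<HH", data, pos + 26)
                names.append(data[pos + 30 : pos + 30 + name_len].decode("utf-8", "replace"))
                pos += 4
            return names

        def names_from_central_directory(path):
            # zipfile locates the end-of-central-directory record and trusts it.
            with zipfile.ZipFile(path) as zf:
                return zf.namelist()

        # On a concatenated archive the two listings typically disagree: the local-header
        # scan sees entries from both halves, while zipfile reports only the last one's.
        print(names_from_local_headers("evil.zip"))
        print(names_from_central_directory("evil.zip"))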

    A similar security problem, though not as fundamentally baked into the format, is MIME parsing. The header section is supposed to be delimited from the body by an empty line (likewise for nested entities). But what if it's not? For better or worse, Sendmail was tolerant of the absence of an empty line and treated as headers everything up to the first line that didn't parse as a header or header continuation.[1] Later systems, like Postfix, originally copied this behavior. But Microsoft Exchange and Outlook are even more tolerant, in a much more horrendous way: they will parse as a header anything that looks like a Content-Type or related header immediately after the first empty line. They have similar hacks for other such violations. So today, depending on the receiving software, the same message can appear differently, including having different attachments. It's a security nightmare!

    I'm not a Postel's Law hater, but ZIP archives and Microsoft's MIME parsing behaviors are just egregiously wrong and indefensible. And even if you think the Robustness Principle is inherently bad policy, you still have to design your formats, protocols, and systems to be as intrinsically constraining as possible. You can't rely on vendors adhering to a MUST rule in an RFC unless it's unquestionably crystal clear what the repercussions will be: everybody else will (because it's the natural and convenient thing to do) reject your output as trash and drop it on the floor immediately, so violations never have a chance to get a foothold.

    [1] MTAs don't necessarily need to care about MIME parsing, but Sendmail eventually gained features where parsing message contents mattered, setting the de facto norm (for those paying attention) until Microsoft came along.

    • kevin_thibedeau 8 months ago

      The central directory allows zip archives to be split across multiple files on separate media without needing to read them all in for selective extraction. Not particularly useful today but invaluable in the sneakernet era with floppies.

      • thrdbndndn 8 months ago

        Still useful today.

        Trying to transmit a 100 GB file through any service is usually a pain, especially if one end has an unstable Internet connection.

        • rakoo 8 months ago

          That's a very bad way of solving that issue. If transmission is a problem, either use a proper retry-friendly protocol (such as bittorrent) or split the file. Using hacks on the data format just leads to additional pain

          • thrdbndndn 8 months ago

            > or split the file

            Wait, I'm confused. Isn't this what OP was talking about?

            • MereInterest 8 months ago

              Splitting the file doesn’t need to be part of the file format itself. I could split a file into N parts, then concatenate the parts together at a later time, regardless of what is actually in the file.

              The OP was saying that zip files can specify their own special type of splitting, done within the format itself, rather than operating on the raw bytes of a saved file.

              • Twirrim 8 months ago

                > Splitting the file doesn’t need to be part of the file format itself. I could split a file into N parts, then concatenate the parts together at a later time, regardless of what is actually in the file.

                I'm inclined to agree with you.

                You can see good examples of this with the various multi-part upload APIs used by cloud object storage platforms like S3. There's nothing particularly fancy about it. Each part is individually retryable, with checksumming of the parts and the whole, so you get a nice, reliable approach.

                On the *nix side, you can just run split over a file to the desired size, then cat all the parts back together, super simple. It would be simple to have a CLI or full UI tool that handles the pause between `cat`s as you swap media in and out, if we hark back to the days of zip archives spanning floppy disks.

                • hombre_fatal 8 months ago

                  Without knowing the specifics of what's being talked about, I guess it makes sense that zip did that because the OS doesn't make it easy for the average user to concatenate files, and it would be hard to concatenate 10+ files in the right order. If you have to use a CLI then it's not really a solution for most people, nor is it something I want to have to do anyway.

                  The OS-level solution might be a naming convention like "{filename}.{ext}.{n}", e.g. "videos.zip.1", where you right-click one of them, choose "concatenate {n} files", and it turns them into "{filename}.{ext}".

                  • justsomehnguy 8 months ago

                    > the OS doesn't make it easy for the average user to concatenate files

                    Bwah! You are probably thinking too much GUI.

                        X301 c:\Users\justsomehnguy>copy /?
                        Copies one or more files to another location.
                    
                        COPY [/D] [/V] [/N] [/Y | /-Y] [/Z] [/L] [/A | /B ] source [/A | /B]
                             [+ source [/A | /B] [+ ...]] [destination [/A | /B]]
                    
                        [skipped]
                    
                        To append files, specify a single file for destination, but multiple files
                        for source (using wildcards or file1+file2+file3 format).
                    • thrdbndndn 8 months ago

                      Try concatenating 1000 files with naturally sorted names using `copy`. I did this regularly and had to write a Python script to make it easier.

                      It's much easier to just right-click any of the zip part files and let 7-Zip unzip it; it will tell me if any part is missing or corrupt.

                  • notimetorelax 8 months ago

                    Why would you use manual tools to achieve what a ZIP archive can give you out of the box? E.g. if you do this manually you'd need to worry about checksums to ensure you put it back together correctly.

                    • rakoo 8 months ago

                      Because, as said before, zip managing splits ends with two sources of truth in the file format that can differ while the whole file still appears valid.

                • _0xdd 8 months ago

                  I've had good luck piping large files, such as ZFS snapshots, through tools like `mbuffer`[1], and it's worked like a charm.

                  [1] https://man.freebsd.org/cgi/man.cgi?query=mbuffer&sektion=1&...

                  • meehai 8 months ago

                    couldn't agree more!

                    We need to separate concerns and design modules to be as unitary as possible:

                    - zip should ARCHIVE/COMPRESS, i.e. reduce the file size and create a single file from the file system point of view. The complexity should go in the compression algorithm.

                    - Sharding/sending multiple coherent pieces of the same file (zip or not) is a different concern and should be handled by specialized, format-agnostic protocols like the ones you mentioned.

                    People are always building tools that handle 2 or more use cases instead of following the UNIX principle of creating generic, good single respectability tools that can be combined together (thus allowing a safe 'whitelist' of combinations). Quite frankly it's annoying, and it very often leads to issues such as this that weren't even considered in the original design, because of the exponential problem of combining tools together.

                    • TeMPOraL 8 months ago

                      Well, 1) is zip with compression into single file, 2) is zip without compression into multiple files. You can also combine the two. And in all cases, you need a container format.

                      The tasks are related enough that I don't really see the problem here.

                      • meehai 8 months ago

                        I meant that they should be separate tools that can be piped together. For example: you have a directory of many files (1 GB in total)

                        `zip out.zip dir/`

                        This results in a single out.zip file that is, let's say, 500 MB (2:1 compression).

                        If you want to shard it, you have a separate tool, let's call it `shard` that works on any type of byte streams:

                        `shard -I out.zip -O out_shards/ --shard_size 100Mb`

                        This results in `out_shards/1.shard, ..., out_shards/5.shard`, each 100 MB.

                        And then you have the opposite: `unshard` (back into 1 zip file) and `unzip`.

                        No need for 'sharding' to exist as a feature in the zip utility.

                        And... if you want only the shards from the get-go, without the original single-file archive, you can do something like:

                        `zip dir/ | shard -O out_shards/`

                        Now, these can be copied to floppy disks (as discussed above) or sent via the network, etc. The main thing here is that the sharding tool works on bytes only (it doesn't know whether it's an mp4 file, a zip file, a txt file, etc.) and does no compression, while the zip tool does no sharding but optimizes compression.
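
                        A minimal Python sketch of the hypothetical `shard`/`unshard` tools described above (the tool names and sizes are just the ones invented in this thread; a real version would also want per-shard checksums):

                            import glob
                            import os

                            def shard(src, out_dir, shard_size=100 * 1024 * 1024):
                                # Split src into fixed-size chunks named 1.shard, 2.shard, ... in out_dir.
                                os.makedirs(out_dir, exist_ok=True)
                                with open(src, "rb") as f:
                                    i = 1
                                    while chunk := f.read(shard_size):
                                        with open(os.path.join(out_dir, f"{i}.shard"), "wb") as out:
                                            out.write(chunk)
                                        i += 1

                            def unshard(shard_dir, dst):
                                # Concatenate the shards back together in numeric order.
                                parts = sorted(glob.glob(os.path.join(shard_dir, "*.shard")),
                                               key=lambda p: int(os.path.basename(p).split(".")[0]))
                                with open(dst, "wb") as out:
                                    for p in parts:
                                        with open(p, "rb") as f:
                                            out.write(f.read())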

                        • shagie 8 months ago

                          In Unix, that is split (https://en.wikipedia.org/wiki/Split_(Unix)) and its companion cat.

                          The problem is that DOS (and Windows) didn't have the Unix philosophy of a tool that does one thing well, and you couldn't depend on the necessary small tools being available. Thus, each compression tool also included its own file-spanning system.

                          https://en.wikipedia.org/wiki/File_spanning

                          • kd5bjo 8 months ago

                            The key thing you get by integrating the two tools is the ability to more easily extract a single file from a multipart archive: instead of having to reconstruct the entire file, you can look at the part/diskette with the index to find out which other part/diskette you need to get at the file you want.

                            • canucker2016 8 months ago

                              Don't forget that with this two-step method, you also need enough disk space to hold the entire ZIP archive before it's sharded.

                              AFAIK you can create a ZIP archive saved to floppy disks even if your source hard disk has low/almost no free space.

                              Phil Katz (creator of the ZIP file format) had a different set of design constraints.

                          • rakoo 8 months ago

                            The problem seems to be that each individual split part is valid in itself. This means that the entire file, with the central directory at the end, can diverge from each entry. This is the original issue.

                          • murderfs 8 months ago

                            Why do you believe that archiving and compressing belong in the same layer more than sharding does? The unixy tool isn't zip, it's tar | gzip.

                            • edflsafoiewq 8 months ago

                              tar|gzip does not allow random access to files. You have to decompress the whole tarball (up to the file you want).

                              • jonjonsonjr 8 months ago

                                Even worse, in the general case, you should really decompress the whole tarball up to the end because the traditional mechanism for efficiently overwriting a file in a tarball is to append another copy of it to the end. (This is similar to why you should only trust the central directory for zip files.)
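
                                A quick illustration with Python's tarfile module (added here as an example, not from the comment above): appending a second member with the same name effectively overwrites the first, because extractors take the last occurrence.

                                    import io
                                    import tarfile

                                    buf = io.BytesIO()
                                    with tarfile.open(fileobj=buf, mode="w") as tf:
                                        for contents in (b"old contents\n", b"new contents\n"):
                                            info = tarfile.TarInfo("config.txt")
                                            info.size = len(contents)
                                            tf.addfile(info, io.BytesIO(contents))  # same name added twice

                                    buf.seek(0)
                                    with tarfile.open(fileobj=buf) as tf:
                                        # tarfile documents that the last occurrence of a duplicate name wins.
                                        print(tf.extractfile("config.txt").read())  # b'new contents\n'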

                            • chrisweekly 8 months ago

                              I agree!

                              Also, I enjoyed your Freudian slip:

                              single respectability tools

                              ->

                              single responsibility tools

                          • stouset 8 months ago

                            If the point is being able to access some files even if the whole archive isn’t uploaded, why not create 100 separate archives each with a partial set of files?

                            Or use a protocol that supports resume of partial transmits.

                            • thrdbndndn 8 months ago

                              Because sometimes your files are very large, it's not easy to create separate archives of (roughly) even size.

                              A single video can easily be over 20GB, for example.

                              • ZoomZoomZoom 8 months ago

                                This carries the information that all those files are a pack in an inseparable and immutable way, contrary to encoding that in the archive's name or via some parallel channel.

                                • rawling 8 months ago

                                  Presumably it compresses better if it's all one archive?

                                • anthk 8 months ago

                                  nncp, bittorrent...

                                  • SubiculumCode 8 months ago

                                    I recently had to do this with about 700Gb, and yeah OneDrive hated that. I ended up concatenating tars together.

                                • poincaredisk 8 months ago

                                  >there are two mechanisms to identify the contents, the per-file header and the central directory

                                  There is only one right, standard-mandated way to identify the contents (the central directory). For one reason or another many implementations ignore it, but I don't think it's fair to say that the zip format is ambiguous.

                                  • immibis 8 months ago

                                    Sometimes you want to read the file front-to-back in a streaming fashion.

                                    • cxr 8 months ago

                                      That doesn't change anything wrt what the parent commenter said.

                                      Imagine—

                                      Officer: The reason why I pulled you over is that you were doing 45, but this is a 25 mph school zone right now, and even aside from that the posted speed when this is not a school zone is only 35. So you shouldn't be going faster than that, like you were just now.

                                      Motorist: But sometimes you want to go faster than that.

                                  • canucker2016 8 months ago

                                    I don't think you understand the reason for the ZIP archive file design.

                                    Back in the late 1980s, backup media for consumers was limited to mostly floppy disks, some users had tape/another hard disk.

                                    Say you had a variable number of files to compress and write out to a ZIP archive.

                                    If you write out the central directory first, followed by all the individually (possibly) compressed and/or encrypted files, you'd have to enumerate all the files to be archived, process them (compress and/or encrypt), write them out, then go back and update the central directory with the actual compressed sizes and offsets of the ZIP local entries.

                                    Now if you wanted to add files to the ZIP archive, the central directory would grow and push the following compressed/encrypted files further out, and you'd have to update ALL the central directory entries, since each entry includes an offset from the beginning of the disk (if the archive does not span multiple disks, this offset is from the start of the ZIP archive file).

                                    So that's one reason for why the ZIP central directory is placed at the end of the ZIP archive file. If you're streaming the output from a ZIP program, then placing the ZIP central dir at the start of the file is a non-starter since you can't rewind a stream to update the ZIP central directory entries.

                                    Why do some programs ignore the ZIP central directory as the ONE source of truth?

                                    Before SSDs and their minimal seek latency, coders discovered that scanning the ZIP local entries was a faster way to build up the ZIP archive entries; otherwise you're forced to seek all the way to the end of a ZIP archive and work backwards to locate the central directory and proceed accordingly.

                                    If the central directory in the ZIP archive is corrupted or missing, the user could still recover the data for the individual files (if all the ZIP local entries are intact). In this case, ignoring the ZIP central dir and scanning sequentially for ZIP local entries is REQUIRED.

                                    The fault here lies with the security scanners. There's never been any guarantee that the only data in a ZIP archive is valid ZIP local file entries followed by the ZIP central directory. Between ZIP local file entries, one can place any data. Unzip programs don't care.
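
                                    For reference, a minimal sketch (simplified: it ignores ZIP64 and assumes any archive comment fits in the last 64 KiB) of how a reader locates the central directory by working backwards from the end of the file; on a concatenated archive, the last end-of-central-directory record is the one found:

                                        import struct

                                        def locate_central_directory(path):
                                            with open(path, "rb") as f:
                                                data = f.read()
                                            # The EOCD record is 22 bytes plus an optional trailing
                                            # comment of at most 65535 bytes, so only scan the tail.
                                            tail = data[-(22 + 65535):]
                                            pos = tail.rfind(b"PK\x05\x06")  # EOCD signature
                                            if pos == -1:
                                                raise ValueError("no end-of-central-directory record found")
                                            total_entries, cd_size, cd_offset = struct.unpack_from("<HII", tail, pos + 10)
                                            return total_entries, cd_size, cd_offset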

                                    • Spivak 8 months ago

                                      The more general principle is that a single source of truth is not ideal for data storage where you're worried about corruption. There's a backup GPT header at the end of your hard disk, and your ext4 filesystem keeps many backups of its superblock.

                                      When it comes to user data, the natural programmer instinct of "accept exactly what I expect or fail", which is typically good design, gives way to pragmatism: try your hardest not to lose data, because partial results are better than nothing.

                                      • tjoff 8 months ago

                                        Having a backup copy isn't quite the same thing though. It is just a copy of the single source of truth. Not a different implementation or used for a different use case. Also trivial to verify.

                                      • unoti 8 months ago

                                        > coders discovered that scanning the ZIP local entries was a faster way to build up the ZIP archive entries; otherwise you're forced to seek all the way to the end of a ZIP archive and work backwards to locate the central directory

                                        Would this have worked? Reserve a few bytes at a fixed offset from the start of the archive, and say "this is where we will write the offset at which the central directory starts." Then build the whole archive, writing the central directory at the end. Then seek back to that known offset at the start of the file and write the offset to the central directory. When creating the archive, we can write the central directory to a temp file, append it to the end of the file we're building, and fix up the offset.

                                        Seems like this strategy would enable us to both have a number of files in the archive that are known at the beginning, and also allow us to do a high-speed seek to the central directory when reading the archive.

                                        I imagine people thought about this idea and didn't do it for one reason or another. I can imagine why we didn't do that for Unix tar: most tape devices are a one-way write stream and don't have random access. But ZIP was designed for disks; I'm curious why this idea wouldn't have solved both problems.

                                        • canucker2016 8 months ago

                                          You forgot about the streaming case. ZIP creators can stream the archive out and never seek back earlier in the stream (send the data to another tool or a write-only medium).

                                          The central directory at the end of the archive fits that constraint. Any design where a placeholder needs to be updated later won't.

                                      • failbuffer 8 months ago

                                        In a similar vein, HTTP header smuggling attacks exploit differences in header parsing. For instance, a reverse proxy and a web server might handle repetition of headers or the presence of whitespace differently.

                                        • spintin 8 months ago

                                          [dead]

                                        • _pdp_ 8 months ago
                                          • tedk-42 8 months ago

                                            I'm with you.

                                            I've evaded all sorts of scanning tools by base64-encoding data (i.e. binary data) and copy-pasting the text from insecure to highly secured environments.

                                            At the end of the day, these malware databases rely on hashing and detecting known bad hashes, and there are lots of command-line tools, like zip/tar, etc., to help get around that sort of thing.

                                            • trollbridge 8 months ago

                                              I used to have a workflow for updating code inside a very highly secure environment that relied on exactly this:

                                              Run build of prior version, run build of current version, run diff against them, compress with xz -9, base64 encode, generate output, base64 encode, e-mail it to myself, copy text of email, type "openssl base64 -d | unxz | bash", right click.

                                              E-mailing this was completely fine according to the stringent security protocols but e-mailing a zip of the code, etc. was absolutely 100% not. That would have to go on the vendor's formal portal.

                                              (Eventually I just opened my own "portal" to upload binaries to, put the vendor I worked for's logo on it, and issued a statement saying it was an official place to download binaries from the vendor. But sometimes their WAF would still mangle downloads or flag them as a risk, so I made sure builds had options of coming in an obfuscated base64 format.)

                                              • athrowaway3z 8 months ago

                                                rot13 must be outlawed for its use by cyber-criminals!

                                              • cyberax 8 months ago

                                                17 years? We played tricks with zip bombs that used this approach back in the '90s.

                                                • shreddit 8 months ago

                                                  Yeah the 90s are just 17 ye… oh no I’m old

                                              • avidiax 8 months ago

                                                This is sometimes used non-maliciously to concatenate zipped eBooks to a JPEG of the cover art. 4Chan's /lit/ board used to do this, but I can't find any reference to it anymore.

                                                https://entropymine.wordpress.com/2018/11/01/about-that-jpeg...

                                                https://github.com/Anti-Forensics/tucker

                                                • theoreticalmal 8 months ago

                                                  I think the reason it’s not used anymore is because it was used maliciously to share CSAM on other boards and 4chan banned uploading anything that looks like a concatenated zip

                                                  • yapyap 8 months ago

                                                    :/

                                                  • bsimpson 8 months ago

                                                    That first read should be an HN post in its own right.

                                                  • 47282847 8 months ago

                                                    FYI, the blog post describes a zip file embedded in the ICC profile data of a JPEG in order to survive web image transformations, whereas the linked Tucker script is just appending the zip to the image.

                                                  • nh2 8 months ago

                                                    WinRAR does it right, 7zip and Windows Explorer do it wrong according to https://en.m.wikipedia.org/wiki/ZIP_(file_format)

                                                    > only files specified in the central directory at the end of the file are valid. Scanning a ZIP file for local file headers is invalid (except in the case of corrupted archives)

                                                    • leni536 8 months ago

                                                      This contradicts the specification, which explicitly supports stream-processing zip files, which necessarily can't happen if your source of truth is the central directory record. Unless you can wrap your stream processing in some kind of transaction that you can drop once you discover foul play.

                                                      • nh2 8 months ago

                                                        Source what you're referring to / explanation?

                                                        In ZIP, later info wins. I don't see how that isn't always streamable.

                                                        • leni536 8 months ago

                                                          Hmm, it appears that you are right. I vaguely remembered that zip was streamable, but it appears that it only means that it's stream writable, as in you can write zip file contents from a datastream of unknown size, and append the checksum and file size later in the zip file stream.

                                                          However such a zip file is definitely not stream readable, as the local file header no longer contains the size of the following file data, so you can't know where to end a given file. So for reading you definitely have to locate the central directory record.

                                                          In my defense, the spec says[1]:

                                                          > 4.3.2 Each file placed into a ZIP file MUST be preceded by a "local file header" record for that file. Each "local file header" MUST be accompanied by a corresponding "central directory header" record within the central directory section of the ZIP file.

                                                          Then in 4.3.6 it describes the file format, which seems to be fundamentally incompatible with altering zip files by appending data, as the resulting file would not conform to this format.

                                                          So basically some implementations (maybe opportunistically, relying on compressed sizes being available in the local file headers) stream read from the zip file, assuming that it's a valid zip file, but not validating. Some other implementations only use the central directory record at the end, but don't validate the file format either.

                                                          A validating zip file parser should be possible: locate the central directory record and check that the files it refers to, with their metadata, fully cover the contents of the zip file, without gaps or overlaps (roughly as sketched below). But this probably won't win any benchmarks.

                                                          [1] https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
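
                                                          A rough sketch of that kind of validation (simplified: it ignores data descriptors, ZIP64, and encryption headers, so stream-written archives would be flagged as invalid):

                                                              import struct
                                                              import zipfile

                                                              def central_directory_covers_file(path):
                                                                  data = open(path, "rb").read()
                                                                  with zipfile.ZipFile(path) as zf:
                                                                      infos = sorted(zf.infolist(), key=lambda i: i.header_offset)
                                                                  pos = 0
                                                                  for info in infos:
                                                                      if info.header_offset != pos:
                                                                          return False  # gap or overlap before this entry
                                                                      # local header: 30 bytes + filename + extra (lengths at offset 26)
                                                                      name_len, extra_len = struct.unpack_from("<HH", data, pos + 26)
                                                                      pos += 30 + name_len + extra_len + info.compress_size
                                                                  # what follows the last entry should be the central directory itself
                                                                  return data[pos:pos + 4] in (b"PK\x01\x02", b"PK\x05\x06")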

                                                          • nh2 8 months ago

                                                            Good observations!

                                                            Indeed: ZIP files are stream-writable, and some ZIP files are stream-readable, but not both: ZIP files that were stream-written are not stream-readable.

                                                            Also, streaming unzipping always requires that by the time you arrive at the central directory, you delete any so-far-unzipped files that don't have entries in it, as those were "deleted".

                                                            > Then in 4.3.6 it describes the file format, which seems to be fundamentally incompatible to altering zip files by appending data, as the resulting file would not conform to this format.

                                                            My interpretation of the spec and specifically of 4.3.6 is that it is informational for how ZIP files usually look, and that you may store arbitrary data in between files; such data then doesn't count as "files". This reading then does allow appending and concatenation.

                                                            Unfortunately 4.3.6 does not have MUST/MAY wording so we don't really know if this reading was intended by the authors (maybe they will clarify it in the future), but allowing this reading seems rather useful because it permits append-only modification of existing ZIP files. (The wording /'A ZIP file MUST have only one "end of central directory record"'/ somewhat suggests that the authors didn't intend this, but again one could argue that this merely states that there is only one EOCDR by definition, and that any previous ones are just garbage data to be ignored.)

                                                            • cxr 8 months ago

                                                              > Unfortunately 4.3.6 does not have MUST/MAY wording so we don't really know if this reading was intended by the authors

                                                              It's not explicit from the APPNOTE, but that's not the same as saying "we don't really know". We do know: islands of opaque data are allowed, and that's the entire reason the format is designed the way it is. Katz designed ZIP for use on machines with floppy drives, and append-only modification of archives spanning multiple media was therefore baked into the scheme from the very roots. He also produced an implementation that we happen to be able to check, and we know it works that way.

                                                              The only way to arrive at another interpretation is to look at APPNOTE in isolation and devoid of context.

                                                              > one could argue that this merely states that there is only one EOCDR by definition, and that any previous ones are just garbage data to be ignored

                                                              That is the correct interpretation; the wording doesn't suggest otherwise.

                                                      • cxr 8 months ago

                                                        The article says that WinRAR "displays both ZIP structures", so, no, it doesn't do it right. Of the three, only Windows Explorer is close to the correct behavior (showing the contents of the ZIP that was appended closest to the end, and ignoring everything else). The exception to its correctness is that they report it may fail to process the ZIP and identify it as being corrupt, which shouldn't happen so long as the endmost ZIP is well-formed.

                                                        • nh2 8 months ago

                                                          > The article says that WinRAR "displays both ZIP structures", so, no, it doesn't do it right.

                                                          That would be true, but the bleepingcomputer article seems to be misquoting that fact.

                                                          The original research they reference (which I also find to be a more sensible article),

                                                          https://perception-point.io/blog/evasive-concatenated-zip-tr...

                                                          says that WinRAR only shows the last archive's contents, and shows a screenshot of that:

                                                          > WinRAR, on the other hand, reads the second central directory and displays the contents of the second archive

                                                          • cxr 8 months ago

                                                            Thanks, that's a much better article (and published on a site that, while slightly annoying on its own, is infested with fewer rudely intrusive and resource-consuming ads).

                                                            Reading both, it's clear that (a) you are correct, and (b) the submitted link, besides being materially inaccurate, is shameless reblog spam and should be changed.

                                                      • Retr0id 8 months ago

                                                        Related, my two favourite ZIP parser issues:

                                                        https://bugzilla.mozilla.org/show_bug.cgi?id=1534483 "Ambiguous zip parsing allows hiding add-on files from linter and reviewers"

                                                        https://issues.chromium.org/issues/40082940 "Security: Crazy Linker on Android allows modification of Chrome APK without breaking signature"

                                                        The big problem with the ZIP format is that although the "spec" says what a ZIP file looks like, it does not tell you in concrete terms how to parse it, leading to all sorts of ambiguities and divergent implementations. Someone needs to write a "strict ZIP" spec that has explicit and well-defined parsing rules, and then we need to get every existing ZIP implementation to agree to follow said spec.

                                                        • wyldfire 8 months ago

                                                          Or: better yet, just use an archive format for archival and a compression layer for compression. Don't use zip at all.

                                                          • Retr0id 8 months ago

                                                            What non-compressing archive format would you suggest? tar doesn't support random access which is a non-starter for many use cases.

                                                            • acka 8 months ago

                                                              DAR (Disk ARchiver)[1] looks to be a good alternative. It supports random access, encryption, and individual file compression within the archive.

                                                              [1] http://dar.linux.free.fr/

                                                              • Retr0id 8 months ago

                                                                That seems counter to GP's suggestion of doing compression at a separate layer

                                                                • olddustytrail 8 months ago

                                                                  Not really. There's no "dar compression" format. It calls different compression tools just like tar.

                                                                  • Retr0id 8 months ago

                                                                    You could say the same about ZIP (it uses deflate by default but optionally supports things like zstd)

                                                                    • undefined 8 months ago
                                                                      [deleted]
                                                                • undefined 8 months ago
                                                                  [deleted]
                                                            • Jerrrrrrry 8 months ago

                                                              Remember, secure encryption, good compression, and truly random data are indistinguishable.

                                                              It's best to paste that encrypted payload into a JPG with some bullshit magic headers and upload that to a trusted Exfil pivot instead.

                                                              Or, to get SuperMarioKart.rom to work with your chromeApp-XEMU emulator to play during down-time at work, just rename it to SMB.png and email it to yourself.

                                                              • seanhunter 8 months ago

                                                                  > Remember, secure encryption, good compression, and truly random data are indistinguishable.
                                                                
                                                                Yes, and the only reason the bad guys get away with this is the people who trust signature-based scanning at the perimeter to detect all threats.

                                                                One of the hacks I'm most proud of in my whole career came when we were doing a proof of concept at an enterprise client and were being deliberately obstructed by the internal IT group, due to politics between their boss and the boss who sponsored our POC. For unrelated trademark reasons we were prevented by a third party from having the software on physical media, but we had a specific contractual clause agreeing to let us download it for install. So while we had been contractually engaged to provide this software and had a strict deadline to prove value, the enterprise IT group was preventing us from actually getting it through the virus-scanning firewall to install it. What to do?

                                                                The scanner looked for the signature of executable or zipped files and blocked them. It would also block any files larger than a certain size. So what I did was write two shell scripts called "shred" and "unshred". "Shred" would take any files you gave it as input, make them into a tarball, encrypt that to confuse the virus scanner and then split it up into chunks small enough to get through the firewall, and "unshred" would reverse this. This almost worked, but I found that the first chunk was always failing to transmit through the firewall. The scanner noticed some signature that openssl was putting at the front of the file when encrypting it. The solution? Change shred to add 1k of random noise to the front of the file and unshred to remove it.

                                                                Job done. Our files were transmitted perfectly (I got the scripts to check the md5sum on both sides to be sure), and though the process was slow, we could continue.

                                                                The funny thing was the POC was a bake-off versus another (more established) vendor and they couldn't get their software installed until they had done a couple of weeks of trench warfare with enterprise IT. "To keep things fair" the people organising the POC decided to delay to let them have time to install, and eventually the person blocking us from installing was persuaded to change their mind (by being fired), so "shred" and "unshred" could be retired.

                                                                • Jerrrrrrry 8 months ago

                                                                    >This almost worked, but I found that the first chunk was always failing to transmit through the firewall
                                                                  
                                                                  Magic numbers/header scanning or, if it was awhile back, BOM (byte order mark) messing stuff up.
                                                                  • Hendrikto 8 months ago

                                                                    I did basically the same, to get some important CLI tools past the company firewall, just a few months back.

                                                                    Crazy that this is easier than dealing with the bullshit politics, to get some essential tools to do my job. German public service is a joke. I quit since.

                                                                    • Jerrrrrrry 8 months ago

                                                                        >I quit since.
                                                                      
                                                                      apparently they have too :)
                                                                    • nwellinghoff 8 months ago

                                                                      You could have just done ssh reverse shell to a public jump server you control? Might have been easier.

                                                                    • rocqua 8 months ago

                                                                      Good compression should still be cryptographically distinguishable from true randomness right?

                                                                      Sure the various measures of entropy should be high, but I always just assumed that compression wouldn't pass almost any cryptographic randomness test.

                                                                  • kencausey 8 months ago
                                                                    • 486sx33 8 months ago

                                                                      Encrypted ZIP files have long been a way to evade any sort of malware detection during transmission.

                                                                      • slt2021 8 months ago

                                                                      You don't even need to encrypt the zip, since an encrypted ZIP file can trigger tripwires during transmission.

                                                                      An unencrypted zip using the .docx or .xlsx format is the way to go (the best way is to hide the payload inside one of the OpenXML tags or XML comments).

                                                                        • Jerrrrrrry 8 months ago

                                                                          encode it with enough filler as to reduce its "entropy" :)

                                                                        • mmcgaha 8 months ago

                                                                        I doubt it still works, but when I needed to get things through email I would embed them in Word documents.

                                                                          • cogman10 8 months ago

                                                                          Would probably still work. There are just too many formats, which makes it very hard for a content blocker to really stop them all.

                                                                            I pity the programmer that has to decode the 1000 versions of xls to find the binary blob that could be a virus.

                                                                            • telgareith 8 months ago

                                                                              1000? No. There's two. Openxml and the original xls. OpenXML can be scanned for issues like any other XML file.

                                                                              Alas, it's more difficult to get excel to accept that it shouldn't delete leading zeros than it is to check a spreadsheet's sus-o-scale.

                                                                              • cogman10 8 months ago

                                                                              1000 is an exaggeration, but it is not just 2 standards.

                                                                                xls morphed with every version of Microsoft Excel. MS Excel has pretty good backwards compatibility, but making an xls parser is notoriously hard because of the many differences between versions.

                                                                            • mplewis9z 8 months ago

                                                                              A modern word document file (.docx) is literally just a Zip archive with a special folder structure, so unless your company is scanning word document contents I can’t imagine there’s any issue.

                                                                            • 0cf8612b2e1e 8 months ago

                                                                            Pretty common in the corporate world. The email scanner will helpfully drop all manner of useful files sent between staff, so you make an encrypted zip with a simple password.

                                                                            • waltbosz 8 months ago

                                                                          I've done similar stuff. Concat a zip (that keeps throwing false positives) to a jpg and scanners will treat it like a jpg. Then write a script that chops off the jpg to access the zip. All this so I could automate a web app deploy.

                                                                              • ttyprintk 8 months ago

                                                                            Or attach a zip through Exchange

                                                                              • unnouinceput 8 months ago

                                                                                Quote: "To defend against concatenated ZIP files, Perception Point suggests that users and organizations use security solutions that support recursive unpacking."

                                                                          That's the worst advice, actually. You want the hidden shit to stay there, unable to be seen by default programs. That's how you got all the crap in Windows mail starting in the '90s, when Outlook started trying to be "smart" and automatically detect and run additional content. Be dumb and don't discover anything; let it rot in there. The only one that should do this is the antivirus; the rest of the unpackers/readers/whatever should stay dumb.

                                                                                • brabel 8 months ago

                                                                                  I agree. The ZIP definition is extremely clear that the contents of the ZIP are defined by the single Central Directory at the end of the file. Local headers are only valid if pointed to by the Central Directory. Any other local headers are just supposed to be treated as garbage, except by software that is specifically meant to recover corrupted ZIP archive's contents.

                                                                                • mtnGoat 8 months ago

                                                                                  Nice remix of an old technique.

                                                                            I remember packing exes together for fun and profit back in the day.

                                                                                  • user070223 8 months ago

                                                                              Last I checked, VirusTotal doesn't test nested archives for viruses, even though it's an issue as old as modern computing.

                                                                                    • canucker2016 8 months ago

                                                                                      VirusTotal just passes on the file to the actual virus scanners (a for-loop with an api as a front-end) - it's up to each individual virus scanner to scan as they see fit (including scanning any unreferenced holes and compressed zip entries in a zip archive).

                                                                                      I have no idea why those virus scanners don't check nested archives. Maybe time/cpu constraints?

                                                                                    • alkonaut 8 months ago

                                                                                      Why do scanners need to look inside compressed archives at all? If the malicious file is extracted, it can be detected then. If it’s not extracted and instead used directly from the archive, then the offending code that executes malicious payloads from inside archives should be detectable? Is that the role of AutoIt in this scenario?

                                                                                      • rocqua 8 months ago

                                                                                        Because they want to check on the wire, before it hits an endpoint.

                                                                                        Common situation is detecting malware sent through a phishing mail. You want to intercept those before a user can unpack the file.

                                                                                        • alkonaut 8 months ago

                                                                                    Yes, that was my question: "why do you want to do that when it sounds like a futile job?"

                                                                                    If you have any kind of transformation of the payload (however trivial, e.g. ROT or XOR) then there is no chance of detecting it via pattern matching, and if you need a "program" to process it into its malicious form, then how does detection work?

                                                                                          I understand you want to keep malicious payload from reaching endpoints in a form that would risk being consumed (malformed documents, images causing buffer overflows, executable and script files). But beyond that?

                                                                                          • canucker2016 8 months ago

                                                                                            And "Defense In Depth" - the more layers that bad actors have to avoid/circumvent reduces the chance of their success.

                                                                                            see https://www.fortinet.com/resources/cyberglossary/defense-in-...

                                                                                        • rty32 8 months ago

                                                                                          I checked the time to make sure today is in the year of 2024.

                                                                                          I swear this has been widely known at least since the Win 98 era.

                                                                                          • PreInternet01 8 months ago

                                                                                            > To defend against concatenated ZIP files, Perception Point suggests that users and organizations use security solutions that support recursive unpacking

                                                                                      Yeah, or, you know, just outright reject any ZIP file that doesn't start with a file entry, or where a forward scan of the file entries doesn't match the result of the central-directory-based walk.

                                                                                            There is just so much malicious crud coming in via email that you just want to instantly reject anything that doesn't look 'normal', and you definitely don't want to descend into the madness of recursive unpacking, 'cuz that enables another class of well-known attacks.

                                                                                            And no, "but my precious use-case" simply doesn't apply, as you're practically limited to a whole 50MB per attachment anyway. Sure, "this ZIP file is also a PDF is also a PNG is also a NES cartridge which displays its own MD5" (viz https://github.com/angea/pocorgtfo/tree/master/writeups/19) has a place (and should definitely be required study material for anyone writing mail filters!), but business email ain't it.

                                                                                            • exmadscientist 8 months ago

                                                                                              That's fair, but do realize that sometimes people do have to send around archives from the last century (they got archived for a reason!) or created by eldritch-horror tools that just make weird files (which, sometimes, are the gold masters for certain very important outputs...). And it's kind of annoying when these weird but standard files get silently dropped. Especially when that same file went through just fine yesterday, before the duly zealous security settings changed for whatever reason.

                                                                                              All I'm saying is, don't drop my stuff silently because your code couldn't be arsed to deal with (ugly) standard formats. At least give me a warning ("file of type not scannable" or whatever, the actual words are not so important). And then when I have to yell at the Shanghai people I can yell at them for the correct reasons.

                                                                                              • PreInternet01 8 months ago

                                                                                                Oh, nothing gets dropped silently, but bounced right back with `550 5.7.1 Message rejected due to content (Attachment refused: MATCH-code)`.

                                                                                                And for anything oversized, funny or otherwise non-standard, we offer a very convenient file transfer service.

                                                                                                • exmadscientist 8 months ago

                                                                                                  The right way to do it!

                                                                                                  I wish our infrastructure had been so thoughtful.

                                                                                            • undefined 8 months ago
                                                                                              [deleted]
                                                                                              • solatic 8 months ago

                                                                                                Not really a new technique. A long time ago in a galaxy far far away, I needed to get libraries from the internet onto an air-gapped network, and the supported way was to burn them to a disk and bring the disk to be scanned. The scanner never allowed executables so it would always reject the libraries. Can't make this up, but the supported way that InfoSec (!!) explained to us to get past the scanner was to take advantage of WinRAR being available on the network, so split the rar archive a bunch of times (foo.r01, foo.r02,...) and the scanner, being unable to parse them nor execute them, would happily rubber-stamp them and pass them along. As long as the process was followed, InfoSec was happy. Thankfully this was before the days when people were really worried about supply chain security.

                                                                                                Glad to see this bit of security theater recognized as such.

                                                                                                • yieldcrv 8 months ago

                                                                                                  What year is it

                                                                                                  • pjdkoch 8 months ago

                                                                                                    Now?

                                                                                                    • nxnxhxnxhz 8 months ago

                                                                                                      [flagged]