• compressedgas a day ago

    It works. Already implemented: https://rdiff-backup.net/ https://github.com/rdiff-backup/rdiff-backup

    There are also other tools which have implemented reverse incremental backup or backup with reverse deduplication which store the most recent backup in contiguous form and fragment the older backups.

    • datastack a day ago

      Thank you for bringing this to my attention. Knowing that there is a working product using this approach gives me confidence. I'm working on a simple backup app for my personal/family use, so good to know I'm not heading in the wrong direction

      • trod1234 5 hours ago

        These types of projects can easily get sidetracked without an overarching goal. Are you looking to do something specific?

        An app (that requires remote infrastructure) seems a bit overkill, and if you're going through the hassle of doing that, you might as well set up the equivalent of what MS used to call the Modern Desktop Experience, which is how many enterprise-level customers have their systems configured now.

        The core parts are a cloud-based IdP, storage, and a slipstreamed deployment image which, given network connectivity, pulls down the config, sets the desired state, and replicates the workspace down as needed (with OneDrive).

        The backup data layout/strategy/BCDR plan can then be automated from the workspace/IdP/cloud-storage backend with no user interaction or learning curve.

        If hardware fails, you use the deployment image to enroll new hardware, log in, and replicate the user-related state down, etc. Automation for recurring tasks can be matched up to the device lifecycle phases (Provision, Enrollment, Recovery, Migration, Retirement). This is basically what's done in a professional setup with EntraID/Autopilot MDM and MSO365 plans. You can easily set up equivalents, but you have to write your own glue.

        Most of that structure was taken from Linux greybeards ages ago; MS just made a lot of glue and put it in a nice package.

    • ahazred8ta a day ago

      For reference: a comprehensive backup + security plan for individuals https://nau.github.io/triplesec/

      • datastack a day ago

        Great resource in general; I'll look into whether it describes how to implement this backup scheme.

      • dr_kiszonka a day ago

        It sounds like this method is I/O intensive as you are writing the complete image at every backup time. Theoretically, it could be problematic when dealing with large backups in terms of speed, hardware longevity, and write errors, and I am not sure how you would recover from such errors without also storing the first image. (Or I might be misunderstanding your idea. It is not my area.)

        • datastack a day ago

          You can see in steps 2 and 3 that no full copy is written every time. It's only move operations to create the delta, plus copies of new or changed files, so quite minimal on I/O.

        • tacostakohashi 8 hours ago

          Sounds a bit like the NetApp .snapshot directory thing (which is no bad thing).

          • brudgers 3 hours ago

            In principle, deleting archived data is the opposite of backing up.

            It is not clear what problem with existing backup strategies this solves.

            I mean you can use a traditional delta backup tool and make one full copy of the current data separately with less chance for errors.

            It seems too clever by half and it is not clear to me from the question what problem it solves. Good luck.

            • codingdave a day ago

              The low likelihood / high impact edge case this does not handle is: "Oops, our data center blew up." An extreme scenario, but one that this method does not handle. It instead turns your most recent backup into a single point of failure because you cannot restore from other backups.

              • datastack a day ago

                This sounds more like a downside of single-site backups.

                • codingdave 17 hours ago

                  Totally. Which is exactly what your post outlines. You said it yourself: "Only one full copy is needed." You would need to update your logic to have a 2nd copy pushed offsite at some point if you wanted to resolve this edge case.

              • vrighter a day ago

                I used to work on backup software. Our first version did exactly that. It was a selling point. We later switched approach to a deduplication based one.

                • datastack 21 hours ago

                  Exciting!

                  Yes, the deduplicated approach is superior, if you can accept needing dedicated software to read the data, or can rely on a filesystem that supports it (like hard links on Unix).
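
                  Something like the hard-link trick looks roughly like this (a sketch in the spirit of rsnapshot / rsync --link-dest; the layout and function names are made up, and it only works on Unix-style filesystems):

                    # Each snapshot looks like a full copy, but unchanged files are just
                    # hard links into the previous snapshot, so only changed files take space.
                    import filecmp, os, shutil
                    from pathlib import Path

                    def snapshot(source: Path, prev: Path, new: Path) -> None:
                        for src in (p for p in source.rglob("*") if p.is_file()):
                            rel = src.relative_to(source)
                            dst = new / rel
                            dst.parent.mkdir(parents=True, exist_ok=True)
                            old = prev / rel
                            if old.is_file() and filecmp.cmp(src, old, shallow=False):
                                os.link(old, dst)       # unchanged: another name for the same inode
                            else:
                                shutil.copy2(src, dst)  # new or changed: store a real copy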

                  I'm looking for a cross-platform solution that is simple and can restore files without any app (in case I don't keep maintaining my app for the next twenty years).

                  I'm curious whether the software you were working on used a proprietary format, relied on Linux, or used some other method of deduplication.

                • rawgabbit a day ago

                  What happens if, in the process of all this reading, writing, and rewriting, data is corrupted?

                  • datastack a day ago

                    In this algo nothing is rewritten. A diff between source and latest is made, the changed or deleted files are archived to a delta folder, and the latest folder is updated from source, like rsync. No more I/O than any other backup tool. Versions other than the last one are never touched again.
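
                    To make it concrete, one run could look roughly like this (a sketch only; the repo layout, paths, and function names are made up, and there's no error handling):

                      # "latest/" is always a full copy; each run moves old versions of
                      # changed/deleted files into a delta folder, then copies new/changed
                      # files from source into "latest/". Nothing older is ever rewritten.
                      import filecmp, shutil, time
                      from pathlib import Path

                      def backup(source: Path, repo: Path) -> None:
                          latest = repo / "latest"
                          delta = repo / time.strftime("delta-%Y%m%d-%H%M%S")
                          latest.mkdir(parents=True, exist_ok=True)

                          src_files = {p.relative_to(source) for p in source.rglob("*") if p.is_file()}
                          old_files = {p.relative_to(latest) for p in latest.rglob("*") if p.is_file()}

                          for rel in old_files:
                              changed = rel in src_files and not filecmp.cmp(source / rel, latest / rel, shallow=False)
                              if rel not in src_files or changed:
                                  # move the old version into the delta folder (a cheap rename)
                                  (delta / rel).parent.mkdir(parents=True, exist_ok=True)
                                  shutil.move(latest / rel, delta / rel)

                          for rel in src_files:
                              if not (latest / rel).exists():
                                  # new or changed file: copy the current version into "latest"
                                  (latest / rel).parent.mkdir(parents=True, exist_ok=True)
                                  shutil.copy2(source / rel, latest / rel)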

                  • wmf a day ago

                    It seems like ZFS/Btrfs snapshots would do this.

                    • HumanOstrich a day ago

                      No, they work the opposite way using copy-on-write.

                      • wmf a day ago

                        "For files that changed or were deleted: move them into a new delta folder. For new/changed files: copy them into the latest snapshot folder." is just redneck copy-on-write. It's the same result but less efficient under the hood.

                        • datastack a day ago

                          Nice to realize that this boils down to copy-on-write. Makes it easier to explain.

                          • sandreas 15 hours ago

                            Is there a reason NOT to use ZFS or BTRFS?

                            I mean the idea sounds cool, but what are you missing? ZFS even works on Windows these days, and with tools like zrepl you can configure time-based snapshotting, auto-sync, and auto-cleanup.
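
                            (Roughly, what a zrepl-style schedule automates boils down to something like this; the dataset name and retention count are made-up examples:)

                              import subprocess, time

                              DATASET, KEEP = "tank/data", 30   # hypothetical dataset and retention

                              def snapshot_and_prune() -> None:
                                  # take a timestamped snapshot
                                  name = f"{DATASET}@auto-{time.strftime('%Y%m%d-%H%M%S')}"
                                  subprocess.run(["zfs", "snapshot", name], check=True)
                                  # list snapshots oldest-first, then prune excess "auto-" ones
                                  out = subprocess.run(
                                      ["zfs", "list", "-H", "-t", "snapshot", "-o", "name", "-s", "creation"],
                                      check=True, capture_output=True, text=True,
                                  ).stdout.split()
                                  autos = [s for s in out if s.startswith(f"{DATASET}@auto-")]
                                  for old in autos[:-KEEP]:
                                      subprocess.run(["zfs", "destroy", old], check=True)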

                    • jiggawatts a day ago

                      The more common approach now is incrementals forever with occasional synthetic full backups computed at the storage end. This minimises backup time and data movement.
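
                      A simplified sketch of the synthetic-full idea (the on-disk layout here is made up: each incremental holds changed files plus a "deleted.txt" list):

                        import shutil
                        from pathlib import Path

                        def synthesize_full(last_full: Path, incrementals: list[Path], new_full: Path) -> None:
                            # start from the previous full, then replay incrementals oldest-first
                            shutil.copytree(last_full, new_full)
                            for inc in incrementals:
                                deleted = inc / "deleted.txt"
                                if deleted.is_file():
                                    for rel in deleted.read_text().splitlines():
                                        (new_full / rel).unlink(missing_ok=True)
                                for p in inc.rglob("*"):
                                    rel = p.relative_to(inc)
                                    if p.is_file() and rel != Path("deleted.txt"):
                                        dst = new_full / rel
                                        dst.parent.mkdir(parents=True, exist_ok=True)
                                        shutil.copy2(p, dst)   # newer version wins

                      Real systems typically do the folding with block or object references on the storage backend rather than copying whole files, but the effect for the client is the same: it only ever ships deltas.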

                      • datastack 21 hours ago

                        I agree it seems more common. However, backup time and data movement should be equivalent if you follow the algo steps.

                        According to ChatGPT, the forward-delta approach is common because it can be implemented purely append-only, whereas reverse deltas require the last snapshot to be mutable. This doesn't work well for backup tapes.

                        Do you also think that the forward delta approach is a mere historical artifact?

                        Then again, perhaps backup tapes are still widely used; I have no idea, I'm not in this field. If so, the reverse-delta approach would not work in industrial settings.

                        • jiggawatts 20 hours ago

                          Nobody[1] backs up directly to tape any more. It’s typically SSD to cheap disk with a copy to tape hours later.

                          This is more-or-less how most cloud backups work. You copy your “premium” SSD to something like a shingled spinning-rust (SMR) drive that behaves almost like tape for writes but like a disk for reads. Then monthly this is compacted and/or archived to tape.

                          [1] For some values of nobody.