• badlibrarian 2 minutes ago

    No climate control. No backup power. And it's secured by a wireless camera sitting in a potted plant. Bless them, but wow.

    • arjie 26 minutes ago

      This is very cool. One thing I am curious about is the software side of things and the details of the hardware. What filesystem and RAID layer (or lack of one) deals with this optimally? Looking into it a little:

      * power budget dominates everything: I have access to a lot of rack hardware from old connections, but I don't want to put the army of old stuff in my cabinet because it will blow my power budget for not that much performance in comparison to my 9755. What disks does the IA use? One specific model or, like Backblaze, a large variety?

      * magnetic is bloody slow: I'm not the Internet Archive, so I'm just going to have a couple of machines with a few hundred TiB. I'm planning on making them all a big ZFS pool so I can deduplicate, but it seems like if I get a single disk failure I'm doomed to a massive rebuild.

      I'm sure I can work it out with a modern LLM, but maybe someone here has experience with actually running massive storage and the use-case where tomorrow's data is almost the same as today's - as is the case with the Internet Archive where tomorrow's copy of wiki.roshangeorge.dev will look, even at the block level, like yesterday's copy.
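
      To make the block-level point concrete, here is a rough sketch (not how the IA or ZFS actually implements it) that estimates how well two snapshots would deduplicate when split into fixed-size blocks and compared by hash. The names `block_hashes` and `dedup_ratio` are made up for illustration, and the 128 KiB default is an assumption chosen to mirror ZFS's default recordsize.

```python
import hashlib

# Assumption: 128 KiB blocks, mirroring ZFS's default recordsize.
BLOCK_SIZE = 128 * 1024


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks and return the SHA-256 digest of each."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def dedup_ratio(snapshots: list[bytes], block_size: int = BLOCK_SIZE) -> float:
    """Logical blocks divided by unique blocks across all snapshots.

    A ratio of 2.0 means half the blocks would not need to be stored again.
    """
    total = 0
    unique: set[bytes] = set()
    for snap in snapshots:
        hashes = block_hashes(snap, block_size)
        total += len(hashes)
        unique.update(hashes)
    return total / len(unique) if unique else 1.0


# Toy data: eight distinct 4 KiB blocks standing in for yesterday's copy.
yesterday = b"".join(bytes([i]) * 4096 for i in range(8))

# Today's copy differs only in its last block, so seven blocks are shared.
today_edited = yesterday[:-4096] + b"\xff" * 4096

# Today's copy with one byte inserted at the front: every block boundary
# shifts, so fixed-size blocks no longer line up with yesterday's.
today_shifted = b"x" + yesterday

in_place = dedup_ratio([yesterday, today_edited], block_size=4096)
shifted = dedup_ratio([yesterday, today_shifted], block_size=4096)
print(f"in-place edit: {in_place:.2f}, shifted: {shifted:.2f}")
```

      The second case is the caveat worth checking before relying on dedup: with fixed-size blocks, "almost the same as yesterday" only helps when the unchanged bytes stay at the same offsets.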

      The last time I built with multi-petabyte datasets we were still using Hadoop on HDFS, haha!

      • tylerchilds 2 hours ago

        Why’s Wendy’s Terracotta moved?

      • ranger_danger 2 hours ago

        I was hoping an article about IA's storage would go into detail about how their storage currently works, what kind of devices they use, how much they store, how quickly they add new data, the costs etc., but this seems to only talk about quite old stats.

        • jonas21 43 minutes ago

          It does have these details for the current generation hardware. And if you want more, click on the link at the top:

          https://hackernoon.com/the-long-now-of-the-web-inside-the-in...

          • reaperducer 28 minutes ago

            Yeah, this is just blogspam. Some guy re-hashing the Hackernoon article, interspersed with his own comments.

            I wouldn't be surprised if it's AI.

            It's time to come up with a term for blog posts that are just AI-augmented re-hashes of other people's writing.

            Maybe blogslop.

            • dexdal 18 minutes ago

              That pattern shows up when publishing has near-zero cost and review has no gate. The fix is procedural: define what counts as original contribution and require a quick verification pass before posting. Without an input filter and a stop rule, you get infinite rephrases that drown out the scarce primary work.

          • metadat 42 minutes ago

            The Internet Archive's Infrastructure https://news.ycombinator.com/item?id=46613324 - 8 days ago, 124 comments