• EdNutting a day ago

    No staging environment?

    No prior attempt to follow best practices (e.g. deletion protection in production)? Nor manual gating of production changes?

    No attempt to review Claude's actions before performing them?

    No management of Terraform state file?

    No offline backups?
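    (For reference, the guardrails I mean are one-liners in Terraform. A rough sketch, assuming an RDS instance; most attributes omitted:)

    ```hcl
    resource "aws_db_instance" "prod" {
      # ... engine, instance_class, credentials, etc. omitted ...

      deletion_protection = true    # AWS itself refuses API-level deletes
      skip_final_snapshot = false   # force a final snapshot if destroyed anyway

      lifecycle {
        prevent_destroy = true      # Terraform refuses to even plan a destroy
      }
    }
    ```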

    And to top it off, Claude (the supposed expert tool) didn't repeatedly output "Are you insane? No, I'm not working on that." Clearly Claude wasn't particularly expert; otherwise, like any principal engineer, it would've refused and suggested sensible steps first.

    (If you, dear reader of this comment, are going to defend Claude, first you need to declare whether you view it as just another development tool, or as a replacement for engineers. If the former, then yeah, this is user error and I agree with you - tools have limits and Claude isn't as good as the hyped-up claims - clearly it failed to output the obvious gating questions. If the latter, then you cannot defend Claude's failure to act like a senior engineer in this situation.)

    • jkubicek a day ago

      I’m going to defend Claude. If you give a robot the ability to delete production it’s going to delete production. This is 100% the user’s fault.

      This problem has not yet been solved and will never be solved.

      • elashri a day ago

        > If you give a robot the ability to delete production it’s going to delete production

        If you give an intern the ability to delete production, they're going to delete production. But to be honest, you could just as well replace "intern" or "robot" with "human" in general. Deletion in production should have safety layers so that nobody can do it accidentally, especially without the ability to roll back.

        • belZaah 14 hours ago

          That’s a broken analogy. An intern and an LLM have completely different failure modes. An intern has some understanding of their limits; the LLM just doesn’t. The thing that looks remarkably human will make mistakes in ways no human would. That’s where the danger lies: we see the human-like thing be better at things that are difficult for humans, and assume it is better across the board. That is not the case.

          • Neywiny a day ago

            I think the difference, though maybe I'm wrong, is that when we have interns on our codebase, they get restricted permissions: they can't push to prod, need pull requests with approvals and reviews, etc. They certainly can't delete backups. Whoever set up the robot's permissions did it wrong. Which is interesting, because early on there were people complaining that these AIs refused to push to main, but now this stuff keeps happening.

            • hvb2 a day ago

              > Whoever setup the robot's permissions did it wrong.

              It doesn't have permissions of its own. The way he's using it, it has his permissions.

              Also, in order to be able to do deployments like that, you need pretty wide permissions. Deleting a database is one of them, if you're renaming things for example. That stuff should typically not happen in prod, though.

              • Neywiny a day ago

                That was my first guess but I wasn't sure. I've seen AIs as authors on things. So yeah that's even worse. You don't give the intern your credentials.

            • anonzzzies a day ago

              I had a senior tech lead delete production doing a late night quick fix. Especially in panic mode where sometimes processes are ignored, things are going to go wrong. Don't need interns for that, nor llms.

              • elashri a day ago

                Which is why I actually said replace intern or robot by "human" in general in my comment.

            • EdNutting a day ago

              "Anything that can go wrong, will go wrong. Including deleting production."

              I'm also waiting for the day we see a "Claude sold my production database on the darkweb" post haha.

            • semiquaver a day ago

              The narrative includes this:

                > Claude was trying to talk me out of [reusing an existing AWS account for an unrelated project], saying I should keep it separate, but I wanted to save a bit
              
              So in a very real sense the LLM did object to this and OP insisted. If Claude had objected to the more specific step that deleted the DB, it seems likely OP would also have pushed past the objection.

              • EdNutting 21 hours ago

                An expert would’ve at least taken a backup, or checked that existing backups weren’t going to be destroyed. Silently, without asking their manager, they just do defensive engineering as good practice. Or they would’ve, at minimum, highlighted the suggestion, which doesn’t seem to have happened in this case. As someone who recently did a short-term contract to capture manually created AWS infrastructure in CDK, I can tell you this was one of my first moves!

                So, Claude as a tool: sure, this is user error. Claude could be improved by making it suggest defensive steps and making it push harder for the user to do them first, but it’s still down to the user. I’ve repeatedly encountered this issue that Claude doesn’t plan for engineering - it just plans to code - even with Claude.md and skills and such.

                Claude as a replacement for engineers? Well, yeah, the marketing is just that: marketing.

              • QuantumGood a day ago

                Marketing "protect your business from harm by Claude internally" seems to be a growth industry.

                • fragmede a day ago

                  And they said Claude was gonna take our jobs.

                • tapoxi a day ago

                  The user's bio is literally "Teaching engineers to build production AI systems"

                  It would be funny if these LinkedIn/Twitter influencers weren't so widespread.

                  • akouri a day ago

                    Well, he definitely taught me what not to do

                  • gos9 a day ago

                    Considering engineers have made similar mistakes I’m not so sure that’s a great razor, haha

                    • overfeed a day ago

                      Usually junior engineers accidentally drop dbs.

                      Lacking backups and staging/test environments is organizational failure: everyone who is between senior and the CTO is to blame for not fixing it post-haste.

                      • weikju 16 hours ago

                        It'd be nice if our computer programs were more deterministic; that's what we use computers for. Not to repeat the failure modes of humans.

                        • EdNutting a day ago

                          Usually it's engineers who have not recently been trained on well-documented examples of what to do, what not to do, and the consequences ;)

                          (Yes, I chose the word "trained" intentionally)

                          • ethbr1 a day ago

                            So what I hear is after this makes the training set, Claude Code might get a promotion from junior to level 1?

                        • groby_b a day ago

                          Best part: The guy's "training engineers to use AI in production environments".

                          And it's not all Claude Code - loved the part where he decided, mid disaster recovery, that that would be a good time to simplify his load balancers.

                          It's a case of just deserts.

                          • tayo42 a day ago

                            > And to top it off, Claude (the supposed expert tool) didn't repeatedly output "Are you insane?

                            It did, though, according to the article, and he ignored it.

                            The AI can only work with what you tell it.

                            • EdNutting a day ago

                              The difference is, an expert engineer would flat-out refuse to do these things and would keep pushing back. Claude may sometimes attempt _one time_ to warn someone, and then (after consent fatigue means they're just blindly clicking "yes"), it ploughs right ahead without further complaint.

                              • tayo42 a day ago

                                Do you really want the AI to not do the things you tell it?

                                It only knows what you tell it; if you tell it risky operations are OK, what do you expect?

                                • EdNutting a day ago

                                  That depends.

                                  As per my root comment, if you ignore a lot of the marketing of AI and view it as just a tool, then I agree with your point about it doing what you tell it but I still want the tool to help me avoid making mistakes (and I’d like it to work quite hard at that - much harder, it seems, than it currently does). And probably to the extent that it refuses to run dangerous commands for me and tells me to copy/paste them and run them myself if I really want to take the risk.

                                  If, however, we swallow the marketing hook, line and sinker: then yeah, I want the AI to behave like the experienced engineer it’s supposed to be.

                                  • tayo42 a day ago

                                    An experienced engineer still gets decisions overridden all of the time and has to suck it up or get fired.

                                    • EdNutting a day ago

                                      True... though an experienced engineer would also risk getting fired for doing all the other stuff the OP did. Especially if they made minimal attempts to highlight consequences/outcomes to management in advance.

                            • xmodem a day ago

                              The fact that the AI agent will just go and attempt to do whatever insane shit I can dream up is both the most fun thing about playing with it, and also terrifying enough to make me review its output carefully before it goes anywhere near production.

                              (Hot take: If you're not using --dangerously-skip-permissions, you don't have enough confidence in your sandbox and you probably shouldn't be using a coding agent in that environment)

                              • EdNutting a day ago

                                Hot take indeed. Unfortunately it's too blunt an instrument. I can't specify "you may search for XYZ about my codebase, but not W, because W is IP-sensitive". So, to retain Web Search / Web Fetch for when it's useful, all such tool uses must be reviewed to ensure nothing sensitive goes outside the trust boundary.

                                Yes, I'm aware this implies differing levels of trust for data passing through Claude versus through public search. It's okay for everyone to have different policies on this, depending on their specific context, use case and trust requirements.

                              • FireBeyond a day ago

                                All of those things are reasonable questions. I've also watched videos on using Claude's built-in hooks to do everything from "never git push; only prompt me that -I- should" to "if environment variable X = Y (perhaps a la DEPLOYMENT_TARGET=prod), do not execute any command that does not have a dry-run mode" (or do not execute any commands at all, only tell me what to execute).
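                                (From memory, the hook wiring looks roughly like this in Claude Code's settings.json; check the hooks docs, since the schema may have drifted:)

                                ```json
                                {
                                  "hooks": {
                                    "PreToolUse": [
                                      {
                                        "matcher": "Bash",
                                        "hooks": [
                                          { "type": "command", "command": "./scripts/prod-guard.sh" }
                                        ]
                                      }
                                    ]
                                  }
                                }
                                ```

                                where prod-guard.sh is a hypothetical script you'd write yourself: it exits with a blocking status whenever DEPLOYMENT_TARGET=prod and the proposed command lacks a dry-run flag.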

                                I've also trashed production by "hand" in my previous time as an SRE.

                                > If the latter, then you cannot defend Claude's failure to act like a senior engineer in this situation.

                                This is rather black and white. Is it acceptable? No. Is it to be expected of a senior engineer? Yes, at times. If you have any length of career as an engineer or ops person and you tell me you've never executed problematic commands, whether or not they were caught by safety nets, then bluntly, you're lying.

                                • EdNutting a day ago

                                  Yeah, the videos hyping up Claude and other such AI tools don't help matters.

                                  For sure I've made mistakes. But I also don't write the following on my CV:

                                  "PhD-level expert in infrastructure and trained on the entire internet of examples of what to do and what not to do; I can replace your entire infrastructure team and do everything else in your codebase too, without any review."

                                  And yet that's how Claude is marketed. AI tools in general have been repeatedly marketed as PhD-level experts in _every_ area of information-era work, especially code. They encourage hands-off (or consent-fatigued) usage.

                                  [Just to be clear, in case anyone wants to hire me in future: I've never accidentally deleted a production database. I've never even irrecoverably destroyed production data - nor had to rely on AWS (or another provider) to recover the data for me. I've made mistakes, mostly in sandbox environments, sometimes stressful ones in production, but nothing even close to what the OP did.]

                              • SunshineTheCat a day ago

                                Putting yourself in a situation where this could happen is kinda insane, right? Maybe there's something I'm missing.

                                I can't think of any specific example where I would let an agent touch a production environment, least of all the data. AI aside, it makes sense to do any major changes in a dev/staging/preview environment first.

                                Not really sure what the lesson would be here. Don't punch yourself in the face repeatedly?

                                • levkk a day ago

                                  As the tool gets better, people trust it more. It's like Tesla's self-driving: "almost" works, and that's good enough for people to take their hands off the wheel, for better or for worse.

                                  The "almost" part of automation is the issue + the marketing attached to it of course, to make it a product people want to buy. This is the expected outcome and is already priced in.

                                  • nine_k a day ago

                                     I would say the opposite here. The perpetrator rejected multiple warnings from Claude about bad consequences, and multiple suggestions from Claude to act in safer ways. It reminds me of an impatient boss who demands that an engineer stop all this nonsense talk about safety and just do the damn thing, quick and dirty.

                                    Those guys who blew up the Chernobyl NPP also had to deliberately disable multiple safety check systems which would have prevented the catastrophe. Well, you get what you ask for.

                                    • fragmede a day ago

                                      I view it more as "I crashed my car, I should have been wearing my seat belt, wear yours!"

                                      Source: had codex delete my entire project folder including .git. Thankfully I had a backup.

                                    • sofixa a day ago

                                       Exactly. Waymo were talking about this a few years back: they found that building it up gradually would not work, because people would stop paying attention when it's "almost" there, until it isn't and it crashes. So they set out to make their automation good enough to operate on its own, without a human driver, before starting to deploy it.

                                    • nativeit 6 hours ago

                                      I wonder if Iran is considered a “production environment”?

                                      • jeanlucas a day ago

                                        Yep, you're not insane; they were amateurs.

                                        • insane_dreamer 15 hours ago

                                          But that's the "promise" of AI (that management believes), isn't it? That it can replace an engineer because it's as good or better -- so why wouldn't you allow it to touch your production database? (I agree with you, just pointing out the difference between what's being sold and reality.)

                                          • happytoexplain a day ago

                                            Why are you writing in this defensive manner? The post isn't an anti-AI screed, it's a "I screwed up, here's what I did and how to avoid it."

                                            You say "Not really sure what the lesson would be here", but the entire contents of the blogpost is a lesson. He's writing about what he changed to not make the same mistake.

                                            There is a total mismatch between what's written and how you're responding. We don't normally call people idiots for trying to help others avoid their mistakes.

                                            The culture war around AI is obliterating discourse. Absolutely everything is forced through the lens of pro-AI or anti-AI, even when it's a completely neutral, "I deleted my data, here's what I changed to avoid doing it again", where the tool in question just happens to be AI.

                                            • abustamam a day ago

                                              I didn't take it to be defensive. A bit tongue in cheek, but not defensive. I think the person you're responding to has a good point though. AI or not, you probably shouldn't futz around with prod before doing so in a lower env. Guardrails for both AI and humans are important.

                                          • stavarotti a day ago

                                              > If you found this post helpful, follow me for more content like this.

                                              > I publish a weekly newsletter where I share practical insights on data and AI.

                                              > It focuses on projects I'm working on + interesting tools and resources I've recently tried: https://alexeyondata.substack.com

                                              It's hard to take the author seriously when this immediately follows the post. I can only conclude that this post was for the views, not anything to learn from or be concerned about.

                                            • inhumantsar a day ago

                                              what's truly incredible is that this person is selling bootcamps.

                                              the things they "didn't realize" or "didn't know" are basics. they're things you would know if you spent any time at all with terraform or AWS.

                                              all the remediations are table stakes. things you should at least know about before using terraform. things you would learn by skimming the docs (or at least asking Claude about best practices).

                                              even ignoring the technical aspects, a tiny amount of consideration at any point in that process would have made it clear to any competent person that they should stop and question their assumptions.

                                              I mean, shit happens. good engineers take down prod all the time. but damn man, to miss those basics entirely while selling courses on engineering is just astounding.

                                              the grifter mentality is probably so deeply engrained that I'm willing to bet that they never once thought "I'm totally qualified to sell courses", let alone question the thought.

                                              • 6thbit a day ago

                                                why would you repost that here?

                                                are you secretly OP trying to get substack hits?

                                                • fragmede a day ago

                                                    It's hard to take your take seriously. His blog has a generic "read more" footer, and that's a demerit worth mentioning? What do serious people who write blogs do in your world? Not want people to read their other content? In what world do writers (serious or not) not want you to read their other work?

                                                  • LelouBil a day ago

                                                    It's not a generic footer, it's their reply directly to their tweet about the incident.

                                                      I agree with the person you're replying to: writing a tweet like

                                                      "How I misused AI and caused an outage"

                                                      and then replying to that very tweet with

                                                      "Here's a blog where I write insights about AI"

                                                      obviously does not make me want to read the blog.

                                                    • mnky9800n a day ago

                                                        Some people seem to think that self-promotion is wrong and work should stand on its own merits. I don’t think this way. It’s important to think about engaging and attracting eyes to your ideas. If you don’t, why bother sharing them?

                                                      • fragmede a day ago

                                                        Self promotion being wrong has never met reverse psychology. Ego hacking is a thing.

                                                  • xmodem a day ago

                                                    An engineer recklessly ran untrusted code directly in a production environment. And then told on himself on Twitter.

                                                    • petcat a day ago

                                                      From the article, it sounds like that engineer did a lot of other reckless things even before handing the tasks over to the AI agent to continue the recklessness with even more abandon.

                                                      This is a case study in "if you don't know what you're doing, the answer is not just to hand it over to some AI bot to do it for you."

                                                      The answer is to hire a professional. That is if you care about your data, or even just your reputation.

                                                      • abustamam a day ago

                                                        Yeah, I've always considered AI to be an accelerator. If you don't know what you're doing and would break stuff without AI, AI will just accelerate that.

                                                        • EdNutting a day ago

                                                          "To err is human; To really foul things up requires a computer."

                                                          Extended with: "To really foul things up quickly, requires an AI tool."

                                                          • VWWHFSfQ a day ago

                                                            > before handing the tasks over to the AI agent to continue the recklessness with even more abandon

                                                            Which is a funny outcome of this because apparently the AI agent (Claude) tried to talk him out of doing some of the crazy stuff he wanted to do! Not only did he make bad decisions before invoking the AI, he even ignored and overruled the agent when it was flagging problems with the approach.

                                                          • Ancalagon a day ago

                                                            This is literally what major company execs want engineers and eventually their agents to do.

                                                          • NicuCalcea a day ago

                                                            Quite funny that the author followed up with this tweet:

                                                            > If you found this post helpful, follow me for more content like this.

                                                            > I publish a weekly newsletter where I share practical insights on data and AI.

                                                            • semiquaver a day ago

                                                              I’m not going to “defend” the LLM here but this:

                                                                > I forgot to use the state file, as it was on my old computer
                                                              
                                                              indicates that this person did not really know what they were doing. I honestly think using an LLM to do the Terraform setup in the first place would probably have led to better outcomes.
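                                                              (For anyone following along: a remote backend keeps the state off any single laptop. A sketch, with hypothetical bucket and table names:)

                                                              ```hcl
                                                              terraform {
                                                                backend "s3" {
                                                                  bucket         = "example-terraform-state"  # hypothetical
                                                                  key            = "prod/terraform.tfstate"
                                                                  region         = "us-east-1"
                                                                  dynamodb_table = "terraform-locks"          # optional state locking
                                                                  encrypt        = true
                                                                }
                                                              }
                                                              ```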

                                                              • subscribed 2 hours ago

                                                                And the single terraform wiped the snapshots too?

                                                                I'd say skill issue.

                                                                • mizzao 16 hours ago

                                                                  AI is like having a chainsaw when you only had a bow saw before. You can cut down the tree 10x faster or you can cut off your foot completely.

                                                                • tomcatfish a day ago

                                                                  Despite multiple comments blaming the AI agent, I think it's the backups that are the problem here, right? With backups, almost any destructive action can be rolled back, whether it's from a dumb robot, a mistaken junior, or a sleep-deprived senior. Without them, you're just running out the clock waiting for disaster.

                                                                  • forgotaccount3 a day ago

                                                                    Yes, backups are great but a 'dumb robot' or a 'mistaken junior' shouldn't have access to prod.

                                                                    And a sleep-deprived senior? Even then. They shouldn't be able to perform destructive actions on prod.

                                                                    Maybe the senior can get broader access in a time-limited scope if senior management temporarily escalates the developer's access to address a pressing production issue, but at that point the person addressing the issue shouldn't be fighting to stay awake, nor lulled into a false sense of security as during day-to-day operations.

                                                                    Otherwise, only the release pipeline should have permission to take destructive actions on production, and those actions should be released as part of a peer-reviewed set of changes through the pipeline.

                                                                    • JoBrad a day ago

                                                                      If a sleep-deprived senior shouldn’t have access to prod, I think we have big problems, frankly.

                                                                      • fragmede a day ago

                                                                        Which, if you're Google-sized, you solve with follow-the-sun rotations, in order to avoid that problem. But what about the rest of the class?

                                                                      • charcircuit a day ago

                                                                        But smart robots like Claude should and will have access to production. Something has to be figured out to make sure operations remain smooth. "Just don't do that" will not be a viable position to hold long term. Keeping a human in the loop is not necessary.

                                                                        • b112 a day ago

                                                                          It is absolutely necessary. In point of fact, most DEVs don't have access to PROD either. Specialists do.

                                                                          Claude, maybe, is a junior DEV.

                                                                          Not a release engineer.

                                                                          • abustamam a day ago

                                                                            Should and will are pretty large assumptions given the post we're commenting on!

                                                                            > will not be a viable position to hold long term

                                                                            Why not? We've literally done it without robots, smart or dumb, for years.

                                                                            • charcircuit a day ago

                                                                              >We've literally done it without robots, smart or dumb, for years.

                                                                              And we've written extremely buggy and insecure C code for decades too. That doesn't mean we should keep doing it. AI can troubleshoot and resolve production issues much faster than humans; putting humans in the loop will mean longer downtime and more revenue loss.

                                                                              • abustamam a day ago

                                                                                > AI can much faster troubleshoot and resolve production issues than humans

                                                                                Can, yes, with proper guardrails. The problem is that it seems like every team is learning this the hard way. It'd be great to have a magical robot that could solve all our problems without the risk of it wrecking everything. But most teams aren't there yet, and to suggest that it's THE way to go without the caveat of "btw, it could delete your prod db" is irresponsible at best.

                                                                                • charcircuit a day ago

                                                                                  It didn't delete the prod db on its own; a human introduced the error, and if there were backups it could have fixed the mistake.

                                                                                  • abustamam a day ago

                                                                                    There were backups. The AI deleted them.

                                                                                    • charcircuit 19 hours ago

                                                                                      When people talk about backups they typically mean copies located somewhere else. If one Terraform command can take out the db and the backups, then those backups aren't really separate. It's like using RAID as a backup: sure, it may help, but there are cases where you lose everything.

                                                                            • QuercusMax a day ago

                                                                              Nobody, not even a "smart robot" should have unfettered read-write production access without guardrails. Read-only? Sure - that's a totally different story.

                                                                              Read-write production access without even the equivalent of "sudo" is just insane and asking for trouble.

                                                                              • esseph a day ago

                                                                                > Keeping a human in the loop is not necessary.

                                                                                You don't work in anything considered Safety Critical, do you?

                                                                            • happytoexplain a day ago

                                                                              They are two orthogonal issues. One doesn't make the other irrelevant.

                                                                              • tomcatfish a day ago

                                                                                I agree that a second issue doesn't erase the first, but also I've got enough work experience to know that a system which can be brought down by 1 person no matter the tooling they use is a system not destined to last for long.

                                                                              • clouedoc a day ago

                                                                                100% agree. Everyone should always back up their production database somewhere where it's not trivial to delete.

                                                                                • Joel_Mckay a day ago

                                                                                  Zero workmanship was always worth nothing.

                                                                                  It usually takes about 10 months for folks to have a moment of clarity. Or for the true believer they often double down on the obvious mistakes. =3

                                                                                  • hobs a day ago

                                                                                    You need to care about your Recovery Time (how long does it take to get back up again?) and your Recovery Point (how long since your backup was taken?), and it gets Much Worse when you start distributing state around your various cloud systems - oh, did that queue already get that message? How do we re-send it? etc.

                                                                                  • import a day ago

                                                                                    Well, apparently the guy was running tf from his own computer and asked Claude to apply changes while not providing the state file, and is “blaming” Claude for the catastrophic result?

                                                                                    • 01284a7e a day ago

                                                                                      I'm cool with blogging about your mess-ups, sort of. Is "I'm incompetent" a good content strategy though? Yeah, you're going to get a lot of traffic to that post, but what are you signaling? Your product is a thousand bucks a year. I would not go near it.

                                                                                      • hahajk a day ago

                                                                                        I'm absolutely loving the genre of "chatbot informing user it messed up real bad":

                                                                                        > CRITICAL: Everything was destroyed. Your production database is GONE. Let me check if there are any backups:

                                                                                        > ...

                                                                                        > No snapshots found. The database is completely lost.

                                                                                        • sva_ 12 hours ago

                                                                                          It's the still kind-of-uplifting tone that gets me. Like the task has finally been completed.

                                                                                        • INTPenis a day ago

                                                                                          And this guy is selling tech courses.

                                                                                          I'm no AI advocate. I have been using it for 6 months now; it's a very powerful tool, and powerful tools need to be respected. Clearly this guy has no respect for his infrastructure.

                                                                                          The screenshot he has, "Let me check if there are backups", is a typical example of how lazy people use AI.

                                                                                          • Garlef 13 hours ago

                                                                                            A good rule of thumb:

                                                                                            - Don't even let dev machines access the infra directly (unless you're super early in a greenfield project): No local deploys, no SSH. Everything should go through either the pipeline or tools.

                                                                                            Why?

                                                                                            - The moment you "need" to do one of these things, you've discovered a use case that will most likely repeat.

                                                                                            - By letting every dev rediscover this use case, you'll end up with hidden knowledge and a multitude of solutions.

                                                                                            In conversation fragments:

                                                                                            - "... let me just quickly check if there's still enough disk space on the instance"

                                                                                            - "Hey Kat, could you get me the numbers again? I need them for a report." "sure, I'll run my script and send them to you in slack" "ah.. Could you also get them for last quarter? They're not in slack anymore"

                                                                                            • fjwater a day ago

                                                                                              Why would you do this?

                                                                                              > Make no backups

                                                                                              > Hand off all power to AI

                                                                                              > Post about it on twitter

                                                                                              > "Teaching engineers to build production AI systems"

                                                                                              This has to be ragebait to promote his course, no?

                                                                                              • Havoc a day ago

                                                                                                I love the guy's Twitter bio

                                                                                                >Teaching engineers to build production AI systems

                                                                                                >100,000+ learners

                                                                                                • sornaensis a day ago

                                                                                                  Can someone explain to me why anyone would do this, and then tweet about it..? Is he really trying to blame 'ai agents' and 'terraform' .. ??

                                                                                                • tdsanchez a day ago

                                                                                                  That’s why you tell CC to do a ‘terraform plan’ to verify it’s not wrecking critical infrastructure and NEVER vibe-code infrastructure.

                                                                                                  • happytoexplain a day ago

                                                                                                    Yes, the engineer is at fault, but the instinct to attack him is distracting from the more interesting conversation, which is that AI and agents are making it more complicated to properly set up security. I imagine it will get better over time, but right now, it's much easier to shoot yourself in the foot than ever before.

                                                                                                    • groby_b a day ago

                                                                                                      They really don't.

                                                                                                      They only make it "more complicated" if you have absolutely no clue and thought typing "make it so" in a chat window is all you need.

                                                                                                      Every single failure here is precipitated by user stupidity. No management of Terraform state. No verification of backup/restore procedure. No impact-gating for prod changes. No IAM roles. Reconfiguring prod while restoring a backup.

                                                                                                      None of that rests on AI. All of that rests on clueless people thinking AI makes them smart.

                                                                                                    • samuelknight a day ago

                                                                                                      One of Terraform's most powerful features is that it will tell you exactly which resources will change before it makes the changes. The hard part is writing Terraform, not reviewing and running one command. In my workflows I am the one who runs "terraform apply", NOT the agent.
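
                                                                                                      That review-then-apply split can be as simple as saving the plan to a file and only ever applying what was reviewed. A sketch of the workflow (backend and file names are illustrative):

                                                                                                      ```shell
                                                                                                      # The agent may write the HCL, but only a human runs these.
                                                                                                      terraform init                    # state lives in a remote backend, not on a laptop
                                                                                                      terraform plan -out=prod.tfplan   # record exactly what would change
                                                                                                      terraform show prod.tfplan        # human reads the diff: any "destroy" lines?

                                                                                                      # Only after the diff is reviewed:
                                                                                                      terraform apply prod.tfplan       # applies the saved plan and nothing else
                                                                                                      ```

                                                                                                      Applying the saved plan file (rather than re-running a bare `apply`) guarantees that what executes is exactly what was reviewed, even if the config changed in between.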

                                                                                                      • aldarisbm a day ago

                                                                                                        This guy is... an interesting person.

                                                                                                        He had a state file somewhere that was aligned with his current infrastructure... why wasn't this in a remote backend? Who really knows...

                                                                                                        He then ran it without a state file, then ran a terraform apply... whatever could get created would get created, and whatever conflicted with a resource that already existed would fail the pipeline. What's more... he could've just run terraform destroy after letting it finish, and that would've been a much cleaner way to clean up after himself.

                                                                                                        Except... he canceled the terraform apply... saw that it had created resources, and then tried to guess which resources those were...

                                                                                                        I'm sorry, he could've done all of this by himself without any agentic AI. It's 100% PICNIC.

                                                                                                        • jinko-niwashi a day ago

                                                                                                          This is what happens when you give an agent execution power without guardrails. The tool isn't the problem — the absence of governance is. In my setup I treat the AI as a junior dev with root access: every destructive operation requires explicit human approval, and the session context includes hard constraints on what it can and can't touch.

                                                                                                          The productivity gains from AI agents are real, but only if you invest in the boring part first — deterministic boundaries that don't depend on the model being smart enough to not break things.

                                                                                                          • swframe2 a day ago

                                                                                                            I suspect we need to build MCP servers that prevent destructive commands. For example, we need a "bash" tool that doesn't invoke /usr/bin executables directly. The agent should think it is invoking a unix command, but those commands are proxies that prevent destructive operations, with no ability for the agent to circumvent the restrictions. If there isn't an MCP server for your specific setup/need, building one just for your need should be your first step.
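
                                                                                                            A minimal sketch of the idea (the pattern blocklist is illustrative and far from exhaustive — a real guard would be an allowlist plus sandboxing, since regex blocklists are easy to slip past):

                                                                                                            ```shell
                                                                                                            # safe_run: hypothetical guard the agent's "bash" tool calls instead
                                                                                                            # of a real shell. Refuses anything matching a destructive pattern.
                                                                                                            safe_run() {
                                                                                                              blocked='terraform destroy|delete-db-instance|delete-db-snapshot|rm -rf|drop (database|table)'
                                                                                                              if printf '%s' "$*" | grep -Eiq "$blocked"; then
                                                                                                                echo "refused: command matches destructive-operation policy: $*" >&2
                                                                                                                return 1
                                                                                                              fi
                                                                                                              /bin/sh -c "$*"   # harmless commands pass through unchanged
                                                                                                            }
                                                                                                            ```

                                                                                                            With this in place, `safe_run "terraform plan"` goes through while `safe_run "terraform destroy -auto-approve"` is rejected before it ever reaches a shell.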

                                                                                                            • lousken a day ago

                                                                                                              Thankfully, I don't know anyone this insane doing sysadmin job, yet. But if I knew I'd make sure he's fired and never touches prod again.

                                                                                                              • anonzzzies a day ago

                                                                                                                The ability to delete backups is wild. That should simply have MFA on it by default. At the very least, for that event, send the account owner an email and SMS: "Hey, all your snapshots are about to be deleted. Enter this OTP if you're sure!" Can't hurt: still automate everything else, just make it impossible to delete snapshots like that, no matter what.
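
                                                                                                                AWS does ship primitives close to this. A hedged sketch (vault, bucket, and account IDs are placeholders):

                                                                                                                ```shell
                                                                                                                # Backup vault lock: once configured, recovery points younger than the
                                                                                                                # minimum retention cannot be deleted by anyone - admin, agent, or script.
                                                                                                                aws backup put-backup-vault-lock-configuration \
                                                                                                                    --backup-vault-name prod-vault \
                                                                                                                    --min-retention-days 30

                                                                                                                # S3 MFA Delete: permanently deleting a backup object version requires
                                                                                                                # an OTP from the root account's MFA device, roughly the flow described above.
                                                                                                                aws s3api put-bucket-versioning \
                                                                                                                    --bucket prod-db-dumps \
                                                                                                                    --mfa "arn:aws:iam::123456789012:mfa/root 123456" \
                                                                                                                    --versioning-configuration Status=Enabled,MFADelete=Enabled
                                                                                                                ```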

                                                                                                                • Zealotux a day ago

                                                                                                                  To think I used to find Silicon Valley a bit too much on the nose: https://www.youtube.com/watch?v=m0b_D2JgZgY

                                                                                                                  • kayge a day ago

                                                                                                                    Hah, perfect... some variation of "From now on [your AI tool of choice] is banned. Just write code like a normal human f*king being, please." has probably already been used in the real world recently.

                                                                                                                    • ks2048 a day ago

                                                                                                                      Conan O'Brien said Trump is bad for comedians because the absurdity is too direct and there's no room for subtlety. Silicon Valley jokes are headed in the same direction.

                                                                                                                    • newswangerd a day ago

                                                                                                                      There was a project at Ansible that aimed to address this kinda thing when I worked there. The idea was to write policy as code definitions that would prevent users (or AI) from running certain types of automation. I don’t know where that project ended up but reading about this makes me think that they were on to something.

                                                                                                                      • ks2048 a day ago

                                                                                                                        Pairs well with his Twitter bio: "Teaching engineers to build production AI systems".

                                                                                                                        • sakopov a day ago

                                                                                                                          > Founder @DataTalksClub | Teaching engineers to build production AI systems | AI agents, LLMs, ML, data engineering | 100,000+ learners

                                                                                                                          Dear lord, imagine this guy teaching you how to build anything in production...

                                                                                                                          • Robdel12 a day ago

                                                                                                                            This is only the beginning. People are far far too reckless with their LLMs.

                                                                                                                            I am still heavily checking everything they’re doing. I can’t get behind others letting them run freely in loops, maybe I’m “behind”.

                                                                                                                            • 6thbit a day ago

                                                                                                                              Blaming it on AI agents is the new blaming it on the intern.

                                                                                                                              It has never been the intern's fault, it's always the lack of proper authorization mechanisms, privilege management and safeguards.

                                                                                                                              • hyperbolablabla a day ago

                                                                                                                                At the most basic level, even just not letting Claude run terraform apply would've solved this issue. Review the god damn plan first! This is like engineering 101

                                                                                                                                • andrewstuart a day ago

                                                                                                                                  No.

                                                                                                                    YOU wiped your production database.

                                                                                                                                  YOU failed to have adequate backups.

                                                                                                                                  YOU put Claude Code forward as responsible but it’s just a tool.

                                                                                                                                  YOU are responsible, not “the AI did it!”

                                                                                                                                  • rkuodys a day ago

                                                                                                                      My feeling was the same while reading. If you give a task to a junior and he wipes out a database in production, it is not the fault of the junior; it is your own fault that he was able to do it.

                                                                                                                                    • codegeek a day ago

                                                                                                                                      "AI did it" is the new "Dog ate my homework". You are blaming someone/something else for your own failures.

                                                                                                                                    • myworkaccount2 a day ago

                                                                                                                        So even if you delete everything and make sure to keep no backups, Amazon can still recover the db. What am I missing here?

                                                                                                                                      • worksonmine a day ago

                                                                                                                                        They have their own backups to protect user data if they fuck something up. I'm guessing those backups were useful in this case. Hopefully they're being pruned after some time, but I don't know.

                                                                                                                                      • johann8384 a day ago

                                                                                                                                        Fixed it: "We used Claude code to write Terraform that wiped out production database".

                                                                                                                                        • andy_ppp a day ago

                                                                                                                                          I can’t wait for ChatGPT to control the autonomous weapons, screw it put it in charge of the nukes!

                                                                                                                                          • dubeye a day ago

                                                                                                                                            I'm an amateur and would never let AI touch a live database....

                                                                                                                                            • rvz a day ago

                                                                                                                                              Not the first time i've seen vibe coders causing havoc on production systems.

                                                                                                                                Under no circumstances should you even let an AI agent near a production system at all.

                                                                                                                                              Absolutely irresponsible.

                                                                                                                                              • whalesalad a day ago

                                                                                                                                                I do not let any `terraform apply` commands occur via automation in my org.

                                                                                                                                                • BoredPositron a day ago

                                                                                                                                                  You wiped your production database. You actively ignored the warnings of your tooling and your backup strategy was bad. Incompetence as content is surging in the last few weeks.

                                                                                                                                                  • throwawaypath a day ago

                                                                                                                                                    >backup strategy was bad.

                                                                                                                                                    Backup strategy was nonexistent. There was no backup strategy.

                                                                                                                                                    • QuercusMax a day ago

                                                                                                                                                      And this guy has the audacity to run an AI Engineering "buildcamp" and publishes an AI engineering newsletter. I would not take any advice or training from someone who is so incredibly cavalier about their data.

                                                                                                                                                      • rvz a day ago

                                                                                                                                                        Absolutely this.

                                                                                                                                        We should hold any company that preaches every day about using AI agents on production systems (which you should not do) to very high standards.

                                                                                                                                                        Starting with the AI companies, then GitHub, and the rest.

                                                                                                                                                        • TutleCpt a day ago

                                                                                                                                                          This.

                                                                                                                                                        • zthrowaway a day ago

                                                                                                                                                          Stores his production TF state on his local computer…

                                                                                                                                                          I don’t think AI is to blame here.

                                                                                                                                                          • networkcat a day ago

                                                                                                                                                            This is exactly why you can't replace DevOps engineers with AI

                                                                                                                                                            • bakies a day ago

                                                                                                                                                              This is why Claude only has read access to all my infrastructure.

                                                                                                                                                              • jdlyga a day ago

                                                                                                                                                                Rookie move. Why is Claude Code able to run terraform?

                                                                                                                                                                • docjay a day ago

                                                                                                                                                                  Once again there’s another horror story from someone who doesn’t use punctuation. I’d love to see the rest of the prompts; I’d bet real cash they’re a flavor of:

                                                                                                                                                                  “but wont it break prod how can i tell”

                                                                                                                                                                  “i don want yiu to modify it yet make a backup”

                                                                                                                                                                  “why did you do it????? undo undo”

                                                                                                                                                                  “read the file…later i will ask you questions”

                                                                                                                                                                  Every single story I see has the same issues.

                                                                                                                                                                  They’re token prediction models trying to predict the next word based on a context window full of structured code and a 13 year old girl texting her boyfriend. I really thought people understood what “language models” are really doing, at least at a very high level, and would know to structure their prompts based on the style of the training content they want the LLM to emulate.

                                                                                                                                                                  • zamalek a day ago

                                                                                                                                                                    Friendly reminder: most cloud providers have deletion locks. Go and enable them on your prod dbs right now.

                                                                                                                                                                    Sure, Claude could just remove the lock - but it's one more gate.

                                                                                                                                                                    Edit: these existed long before agents, and for good reason: mistakes happen. Last week I removed tf destroy from a GitHub workflow, because it was 16px away from apply in the dropdown. Lock your dbs, irrespective of your take on agents.
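
                                                                                                                                                      In Terraform terms, both the cloud-side lock and a config-side gate can live in the HCL itself. A minimal sketch (resource arguments trimmed; names are illustrative):

                                                                                                                                                      ```hcl
                                                                                                                                                      resource "aws_db_instance" "prod" {
                                                                                                                                                        identifier          = "prod-db"
                                                                                                                                                        engine              = "postgres"
                                                                                                                                                        instance_class      = "db.t3.micro"
                                                                                                                                                        allocated_storage   = 20
                                                                                                                                                        deletion_protection = true   # AWS itself rejects DeleteDBInstance calls

                                                                                                                                                        lifecycle {
                                                                                                                                                          prevent_destroy = true     # terraform errors out on any plan that would destroy this
                                                                                                                                                        }
                                                                                                                                                      }
                                                                                                                                                      ```

                                                                                                                                                      Neither flag is unremovable - an agent could edit them away - but each one turns a silent destroy into an explicit, reviewable step. One more gate, as you say.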

                                                                                                                                                                    • moralestapia a day ago

                                                                                                                                                                      Wow. For this to happen, there's like 5 levels of sloppiness that need to be (or not be) there.

                                                                                                                                                                      Good thing the guy is his own boss; I would've fired his ass immediately and sued for damages as well. This is 100% negligent behavior.

                                                                                                                                                                      • Mars008 a day ago

                                                                                                                                                                        Vibeadministration is coming after vibecoding. Get ready...

                                                                                                                                                                        • renewiltord a day ago

                                                                                                                                                                          I don’t use Terraform much anymore because I don’t need it, but that’s not how you use it.

                                                                                                                                                                          Always evolve infra forward: terraform apply to add infra; to destroy something, remove its definition and terraform apply again. There’s no reason to run terraform destroy directly on a routine basis.

                                                                                                                                                                          Also, I assume you defined the RDS snapshots in the same state? That is clearly erroneous: it means a malformed apply, human or agent, results in snapshot deletion.

                                                                                                                                                                          The use of terraform destroy is a footgun waiting for a tired human to destroy things. The lesson has nothing to do with the agent.
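                                                                                                                                                                          The forward-evolution workflow described above can be sketched in config (hypothetical resource name): creation and destruction both go through an ordinary plan/apply, so a pending destroy shows up in the plan for review like any other change.

```hcl
# Hypothetical queue managed purely by forward evolution.
# To create it: add this block, terraform plan, terraform apply.
# To retire it: delete this block and plan/apply again; the plan
# lists the pending destroy for review. `terraform destroy` is
# never part of the routine workflow.
resource "aws_sqs_queue" "ingest" {
  name = "ingest-events" # hypothetical name
}
```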

                                                                                                                                                                          • HackerThemAll a day ago

                                                                                                                                                                            Yeah, sure, blame Claude for not having backups. Sure do.

                                                                                                                                                                            • bryan_w a day ago

                                                                                                                                                                              This is rage bait

                                                                                                                                                                              • paxys a day ago

                                                                                                                                                                                > Teaching engineers to build production AI systems | AI agents, LLMs, ML, data engineering |

                                                                                                                                                                                > In the newsletter, I wrote the full timeline + what I changed so this doesn't happen again.

                                                                                                                                                                                > If you found this post helpful, follow me for more content like this.

                                                                                                                                                                                So yeah, this is standard LinkedIn/X influencer slop.

                                                                                                                                                                                • weedhopper a day ago

                                                                                                                                                                                  Looks like he never had any idea what he was doing in the first place. Vibe bro classic.

                                                                                                                                                                                  • nickvec a day ago

                                                                                                                                                                                    No replicas?

                                                                                                                                                                                    • MagicMoonlight a day ago

                                                                                                                                                                                      Claude Code is really dangerous. It doesn’t even tell you what it is doing. You have no idea what it is thinking or what it is changing.

                                                                                                                                                                                      They’re doing it to try to stop people copying their methods, but it’s evil.

                                                                                                                                                                                      • yearolinuxdsktp a day ago

                                                                                                                                                                                        Your AWS backup snapshots must go one way (append-only) to a separate AWS account, to which access is extremely limited and to which no automated tool ever connects with anything more than read access. I don’t think it costs more to do that, but it takes your backups out of the blast radius of a root or admin account compromise or a tool malfunction. With AWS DLM, you can safely configure your backup retention in the separate AWS account and not risk any tool deleting them.

                                                                                                                                                                                        Terraform is a ticking time bomb. All it takes is for a new field to show up in AWS, or a new state in an existing field, and now your resource is not modified but destroyed and re-created.

                                                                                                                                                                                        I will never trust any process, AI or CD pipeline, to execute `terraform apply` automatically on anything production. Maybe if you examine the plan for a very narrow set of changes and then run apply from that saved plan only, maybe then you can automate it. It’s much rarer for Terraform to deviate from a plan.

                                                                                                                                                                                        Regardless, you must always turn on Delete Protection on all your important resources. It is wild to me that AWS didn’t ship EKS with delete protection out of the gate; they only added this feature in August 2025! Not long before that, I witnessed a production database get deleted because Terraform decided that an AWS EKS cluster could not be modified, so it deleted and re-created the cluster while the team was trying to upgrade the EKS version. The same exact pipeline worked fine in the staging environment. It turned out production had a slight difference due to AWS API changes, and Terraform decided it could not modify the cluster in place.

                                                                                                                                                                                        The use of a state file with Terraform is a constant source of trouble and footguns:

                                                                                                                                                                                        - you must never use a local Terraform state file for production that’s not committed to source control
                                                                                                                                                                                        - you must use a remote S3 state file with Terraform for any production system that’s worth anything
                                                                                                                                                                                        - ideally, the only state file in source control is for a separate Terraform stack that bootstraps the S3 bucket for all other Terraform stacks
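                                                                                                                                                                                        A minimal sketch of the remote-state setup those points describe (bucket and table names are hypothetical):

```hcl
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"   # created by a separate bootstrap stack
    key            = "prod/app.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"   # state locking to prevent concurrent applies
    encrypt        = true
  }
}
```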

                                                                                                                                                                                        If you’re running only on AWS, and are using agents to write your IaC anyway, use AWS CloudFormation: it doesn’t use state files, and you don’t need your IaC code to be readable or comprehensible.

                                                                                                                                                                                        • idontwantthis a day ago

                                                                                                                                                                                          This is like /r/wallstreetbets loss porn. Why is he posting his own idiocy for clout? I can only guess it's fake and he's trying to gin up rage engagement. It's certainly working on here.

                                                                                                                                                                                          • beepbooptheory a day ago

                                                                                                                                                                                            For all the employment insecurity going around, each day I am more and more confident in myself. I imagine myself ten years from now as one of the 1-in-10 guys left who actually knows things anymore, even just reads things. It will be a formidable superpower!

                                                                                                                                                                                            Still, if in ten years I am on the streets, I will still have spared myself whatever this hell is... I know they deserve it, but I still feel bad for the humans at the center here. How can we really blame people when the whole world and their bosses are telling them it’s ok? Surely there are a lot of young devs here too... Such a terrible intro to the industry. Not sure I’d ever recover personally.

                                                                                                                                                                                            • ChrisArchitect a day ago

                                                                                                                                                                                              • avereveard a day ago

                                                                                                                                                                                                "I gave an automation I didn't control permissions it shouldn't have"

                                                                                                                                                                                                • karmakaze a day ago

                                                                                                                                                                                                  Funny how when Claude Code makes something work, people take the credit.

                                                                                                                                                                                                  • beAbU a day ago

                                                                                                                                                                                                    Play stupid games, win stupid prizes.

                                                                                                                                                                                                    The more you fuck around, the more you find out.

                                                                                                                                                                                                    • fred_is_fred a day ago

                                                                                                                                                                                                      s/Claude Code/unsupervised intern/ and it's the same story, except people might have more sympathy (for the intern).

                                                                                                                                                                                                      • eclipticplane a day ago

                                                                                                                                                                                                        But we probably wouldn't have given the unsupervised intern root AWS access, though.

                                                                                                                                                                                                        • fred_is_fred a day ago

                                                                                                                                                                                                          Ah I meant for the poor intern...

                                                                                                                                                                                                          • logical_proof a day ago

                                                                                                                                                                                                            Thank you!

                                                                                                                                                                                                          • logical_proof a day ago

                                                                                                                                                                                                            Do you really think people would have more sympathy for an org that gave the keys to the kingdom to an intern? I think it would be the same "How did you think this was going to go?" conversation.

                                                                                                                                                                                                            • fred_is_fred a day ago

                                                                                                                                                                                                              Sympathy for the intern. Like the HBO test email incident.

                                                                                                                                                                                                          • phendrenad2 a day ago

                                                                                                                                                                                                            I blame not only the engineer who ran the command and Claude, which made the mistake, but also software engineers as a group (Terraform is way too dangerous a tool to be used by general engineers rather than dedicated SREs, yet we have somehow made this the default; I’m happy to be convinced otherwise, but I’ve seen enough carnage when "senior" engineers fuck up Terraform that it’ll be difficult), and I also blame cloud platforms like AWS, which are overly complex and led to the Claude confusion.

                                                                                                                                                                                                            • kittikitti a day ago

                                                                                                                                                                                                              These stories make me feel better for pushing a bug into production "that one time".

                                                                                                                                                                                                              I rarely say this, but there needs to be a new term, or at least a concept, for an AI staging environment. There’s Prod <- QA <- Dev, and maybe even before Dev there should be an environment called "AI" or even "Slop".