I found myself agreeing with quite a lot of this article.
I'm a pretty huge proponent of AI-assisted development, but I've never found those 10x claims convincing. I've estimated that LLMs make me 2-5x more productive on the parts of my job which involve typing code into a computer, which is itself a small portion of what I do as a software engineer.
That's not too far from this article's assumptions. From the article:
> I wouldn't be surprised to learn AI helps many engineers do certain tasks 20-50% faster, but the nature of software bottlenecks mean this doesn't translate to a 20% productivity increase and certainly not a 10x increase.
I think that's an under-estimation - I suspect engineers that really know how to use this stuff effectively will get more than a 0.2x increase - but I do think all of the other stuff involved in building software makes the 10x thing unrealistic in most cases.
Yeah. I just need to babysit it too much. Take Copilot: it gives good suggestions and sometimes blows me away with a block of code which is exactly what I'd type. But actively letting it code (at least with GPT-4.1 or GPT-4o) just doesn't work well enough for me. Half of the time it doesn't even compile, and after fixing that it's still not really working correctly either. I'd expect it to work like a very junior programmer, but it works like a very drunk senior programmer that isn't listening to you very well at all.
>I'd expect it to work like a very junior programmer, but it works like a very drunk senior programmer that isn't listening to you very well at all.
This seems to be the current consensus.
A very similar quote from another recent AI article:
One host compares AI chatbots to “a very smart assistant who has a dozen Ph.D.s but is also high on ketamine like 30 percent of the time.”
https://lithub.com/what-happened-when-i-tried-to-replace-mys...
To be fair I've known 10x developers who are high on ketamine 100 percent of the time, it boggles my mind that this can work.
Even saying it has a dozen PhDs belies the reality that these things have no relationship with the truth
I find statements like this kind of funny.
If an AI assistant was the equivalent of “a dozen PhDs” at any of the places I’ve worked you would see an 80-95% productivity reduction by using it.
Yeah, we're only seeing a 20% reduction in productivity.
>you would see an 80-95% productivity reduction by using it.
they are the equivalent.
there is already an 80-95% productivity reduction by just reading about them on Hacker News.
yes, you are overestimating phds.
Yes, yes, we're all aware that these are word predictors and don't actually know anything or reason. But these random dice are somehow able to give seemingly well-educated answers a majority of the time, and the fact that these programs don't technically know anything isn't going to slow the train down any.
i just don't get why people say they don't reason. it's crazy talk. the KV cache is effectively a unidirectional Turing machine, so it should be possible to encode "reasoning" in there. and evidence shows that LLMs occasionally do some light reasoning. just because they're not great at it (hard to train for, i suppose) doesn't mean they do none at all.
Would I be crazy to say that the difference between reasoning and computation is sentience? This is an impulse with no justification but it rings true to me.
Taking a pragmatic approach, I would say that if the AI accomplishes something that, for humans, requires reasoning, then we should say that the AI is reasoning. That way we can have rational discussions about what the AI can actually do, without diverting into endless discussions about philosophy.
Eh...
Suppose A solves a problem and writes the solution down. B reads the answer and repeats it. Is B reasoning, when asked the same question? What about one that sounds similar?
fine. prove to me llms aren't sentient. your proof can't just be "vibes"
See: "This is an impulse with no justification." In that sense yes my justification absolutely can be vibes, and it is! Suck it!
i see we are in agreement
"Majority" may be a bit generous, and would highly depend on the context and application.
Totally disagree. The current state of coding AIs is “a level 2 product manager who is a world class biker balancing on a unicycle trying to explain a concept in French to a Spanish genius who is only 4 years old.” I’m not going to explain what I mean, but if you’ve used Qwen Code you understand.
Qwen Code is really not representative of the state of the art though. With the right prompt I have no problem getting Claude to output a complete codebase (e.g. a non-trivial library interfacing with multiple hardware devices) to the specs I want, in modern C++ that builds, runs, and has documentation and unit tests sourced from data sheets and manufacturer specs from the get-go.
Assuming there aren't tricky concurrency issues and the documentation makes sense (i.e., you know which registers to set to configure and otherwise drive the device), device drivers are the easiest thing in the world to code.
There's the old trope that systems programmers are smarter than applications programmers, but SWE-Bench puts the lie to that. Sure, SWE-Bench problems are all in the language of software; applications programmers take badly specified tickets in the language of product managers, testers, and end users, and have to turn that into the language of SWE-Bench to get things done. I am not that impressed with 65% performance on SWE-Bench because those are not the kind of tickets I have to resolve at work; rather, if I want to use AI to help maintain a large codebase at work, I need to break the work down into that kind of ticket.
> device drivers are the easiest thing in the world to code.
Except the documentation lies, and in reality your vendor shipped you a part with timing that is slightly out of sync with what the doc says, and after 3 months of debugging, including using an oscilloscope, you figure out WTF is going on. You report back to your supplier, and after two weeks of them not saying anything they finally reply that the timings you have reverse engineered are indeed the correct timings, sorry for any misunderstandings with the documentation.
As an applications engineer, my computer doesn't lie to me, and memory generally stays at the value I set it to unless I did something really wrong.
Backend services are the easiest thing in the world to write. I am 90% sure that all the bullshit around infra is just artificial job security, and I say this as someone who primarily does backend work nowadays.
I'm not sure if this counts as systems or application engineering, but if you think your computer doesn't lie to you, try writing an nginx config. Those things aren't evaluated at /all/ the way they look like they are.
To some extent, for sure. The fact that electronics engineers that have picked up a bit of software write a large fraction of the world's device drivers does point to it not being the most challenging of software tasks, but on the other hand the real 'systems engineering' is writing the code that lets those engineers do so successfully, which I think is quite an impressive feat.
I was joking! Claude Code is still the best afaik, though I’d compare it more to “sending a 1440p HDR fax of your user story to a 4-armed mime whose mind is then read by an Aztec psychic who has taken just the right amount of NyQuil.”
That exceeds my expectations! I'm willing to change my mind; do you have any cool examples I should look at?
That also wildly exceeds my experience. The documentation + code generated would be enlightening!
Probably the saddest comment I've read all day. Crafting software line-by-line is the best part of programming (maybe when dealing with hardware devices you can instead rely on auto-generated code from the register/memory region descriptions).
How long would that be economically viable when a sufficient number of people can generate high-quality code in 1/10th the time? (Obviously, it will always be possible as a hobby.)
I think eventually the move to "coding with AI" may be like the jump from coding in low level to higher level languages was.
Because people like getting high and look for justifications to do so.
Left, right or center justification, it's all the same.
But you're right.
"Ketamine has been found to increase dopaminergic neurotransmission in the brain"
This property is likely an important driver of ketamine abuse and of it being rather strongly 'moreish', as well as of the subjective experience of strong expectation during a 'trip': i.e. the tendency to develop redose loops approaching unconsciousness in a chase to 'get the message from the goddess' or whatever, which seems just out of reach (because it's actually a feeling of expectation, not a partially installed divine T3 rig).
The “multiple PhDs” thing is interesting. The point of a PhD is to master both a very specific subject and the research skills needed to advance the frontier of knowledge in that area. There’s also plenty of secondary issues, like figuring out the politics of academia and publishing enough to establish a reputation.
I don’t think models are doing that. They certainly can retrieve a huge amount of information that would otherwise only be available to specialists such as people with PhDs… but I’m not convinced the models have the same level of understanding as a human PhD.
It’s easy to test, though: the models simply have to write and defend a dissertation!
To my knowledge, this has not yet been done.
> But actively letting it code (at least with gpt4.1 or gpt4o)
It's funny, GitHub Copilot puts these models in the 'bargain bin' (they are free in 'ask' mode, whereas the other models count against your monthly limit of premium requests), and it's pretty clear why: they seem downright nerfed. They're tolerable for basic questions, but you wouldn't use them if price weren't a concern.
Brandwise, I don't think it does OpenAI any favors to have their models be priced as 'worthless' compared to the other models on premium request limits.
Shhh... the free GPT 4.1 exposed to the VS Code LM API is the only reason I still pay for GitHub Copilot.
>I'd expect it to work like a very junior programmer, but it works like a very drunk senior programmer that isn't listening to you very well at all.
Best analogy I've ever heard and it's completely accurate. Now, back to work debugging and finishing a vibe coded application I'm being paid to work on.
Two other observations I've found working with ChatGPT and Copilot:
First, until I can re-learn boundaries, they are a fiasco for work-life balance. It's way too easy to have a "hmm what if X" thought late at night or first thing in the morning, pop off a quick ticket from my phone, assign to Copilot, and then twenty minutes later I'm lying in bed reviewing a PR instead of having a shower, a proper breakfast, and fully entering into work headspace.
And on a similar thread, Copilot's willingness to tolerate infinite bikeshedding and refactoring is a hazard for actually getting stuff merged. Unlike a human colleague who loses patience after a round or two of review, Copilot is happy to keep changing things up and endlessly iterating on minutiae. Copilot code reviews are exhausting to read through because it's just so much text, so much back and forth, every little change with big explanations, acknowledgments, replies, etc.
I've found this with Claude Code too. It has nonstop energy (until you run out of tokens) and is always a little too eager to make random edits, which means it's somehow very tiring to use even though you're not doing anything.
But it is the most productive intern I've ever pair programmed with. The real ones hallucinate about as often too.
I think there are two factors to this: 1. what to code (longer, more specific prompts are better but take longer to write), and 2. how to code it (specify languages, libraries, APIs, etc.). And if you're trying to write code that uses a newer version of a library that works differently from what's most commonly documented, it's a long uphill battle of constantly reminding the LLM of the new changes.
If you're not specific enough, it will definitely spit out a half-baked pseudocode file where it expects you to fill in the rest. If you don't specify certain libraries, it'll use whatever is featured in the most blogspam. And if you're in an ecosystem that isn't publicly well-documented, it's near useless.
With something like Devin, where it integrates directly with your repo and generates documentation based on your project(s), it's much more productive to use as an agent. I can delegate like 4-5 small tasks that would normally take me a full day or two (or three) of context switching and mental preparation, and knock them out in less than a day because it did 50-80% of the work, leaving only a few fixes or small pivot for me to wrap them up.
This alone is where I get a lot of my value. Otherwise, I'm using Cursor to actively solve smaller problems in whatever files I'm currently focused on. Being able to refactor things with only a couple sentences is remarkably fast.
The more you know about your language's features (and their precise names), and about higher-level programming patterns, the better time you'll have with LLMs, because it matches up with real documentation and examples with more precision.
> Being able to refactor things with only a couple sentences is remarkably fast.
I'm curious, this is js/ts? Asking because depending on the lang, good old machine refactoring is either amazeballs (Java + IDE) or non-existent (Haskell).
I don't do js/ts, so I don't know what the state of machine refactoring is in VS Code ... But if it's as good as Java, then "a couple of sentences" is quite slow compared to a keystroke or a quick dialog box with completion of symbol names.
I'm using TypeScript. In my case, these refactors are usually small, spanning up to 5 files depending on how interdependent things are. The benefit with an agent is its ability to find and detect related side effects caused by the refactor (broken type-safety, broken translation strings, etc.) and to handle renaming of related things, like an actual UI string if it's tied to the naming of what I'm working on and my changes happened to include a rename.
It's not always right, but I find it helpful when it finds related changes that I should be making anyway, but may have overlooked.
Another example: selecting a block that I need to wrap (or unwrap) with tedious syntax, say I need to memoize a value with a React `useMemo` hook. I can select the value, open Quick Chat, type "memoize this", and within milliseconds it's correctly wrapped and saved me lots of fiddling on the keyboard. Scale this to hundreds of changes like these over a week, it adds up to valuable time-savings.
Even more powerful: selecting 5, 10, 20 separate values and typing: "memoize all of these" and watching it blast through each one in record time with pinpoint accuracy.
Is work paying for Devin or you are? How pricey is it to delegate the task example you gave?
Work is. I actually don't have access to our billing, so I couldn't tell you exactly, but it depends on how many ACUs (Agent Compute Units) you've used.
We use a Team plan ($500 /mo), which includes 250 ACUs per month. Each bug or small task consumes anywhere between 1-3 ACUs, and fewer units are consumed if you're more precise with your prompt upfront. A larger prompt will usually use fewer ACUs because follow-up prompts cause Devin to run more checks to validate its work. Since it can run scripts, compilers, linters, etc. in its own VM -- all of that contributes to usage. It can also run E2E tests in a browser instance, and validate UI changes visually.
They recommend most tasks should stay under 5 ACUs before it becomes inefficient. I've managed to give it some fairly complex tasks while staying under that threshold.
So anywhere between $2-6 per task usually.
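Back-of-the-envelope, assuming all of the plan's ACUs actually get used (not exact, but it's where that $2-6 figure comes from):

    # Rough per-task cost from the Team plan numbers above.
    # Assumption: $500/month buys 250 ACUs and you actually burn through them.
    plan_cost = 500        # USD per month
    acus_included = 250
    cost_per_acu = plan_cost / acus_included   # $2.00 per ACU

    for acus in (1, 3, 5):
        print(f"{acus} ACU task ~ ${acus * cost_per_acu:.2f}")
    # -> $2.00, $6.00, $10.00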
complete coding noob here.
if I want to throw a shuriken obeying some artificial, magic Magnus force like in the movie Wanted, both ChatGPT and Claude let me down using pygame. what if I wanted C-level performance, or if I wanted to use Zig? burp.
It works like the average Microsoft employee, like some doped version of an orange-wig wearer who gets votes because his daddies kept the population as dumb as it gets after the dotcom x Facebook era. In essence, the ones to be disappointed by are the Chan-Zuckerbergs of our time. There was a chance, but there also was what they were primed for.
It codes like a junior, has the design sense of a mid, while being a savant at algorithms.
Make that an idiot at algorithms and I believe you.
What does it really mean to know something or understand something? I think AI knows a great deal (associating facts with symbols), confabulates at times when it doesn't know (which is dishonestly called hallucination, implying a conscious agent misperceiving, which AI is not), and understands almost nothing.
The best way to think of chatbot "AI" is as the compendium of human intelligence as recorded in books and online media available to it. It is not intelligent at all on its own, and its judgement can't be better than its human sources because it has no biological drive to synthesize and excel. It's best to think of AI as a librarian of human knowledge, or an interactive Wikipedia, which is designed to seem like an intelligent agent but is actually not.
One cannot learn everything from books and in any case many books contradict each other so every developer is a variation based on what they have read and experienced and thought along the way. How can that get summed up into one thing? It might not even be useful to do that.
I suspect that some researchers with a very different approach will come up with a neural network that learns and works more like a human in future, though. Not the current LLMs, but something with a much more efficient learning mechanism that doesn't require a nuclear power station to train.
To date, I've not been able to effectively use Copilot in any projects.
The suggestions were always unusably bad. The /fix results were always obviously and straight-up wrong unless it was a super silly issue.
Claude Code with Opus model on the other hand was mind-blowing to me and made me change my mind on almost everything wrt my opinion of LLMs for coding.
You still need to grow the skill of how to build the context and formulate the prompt, but the built-in execution loop is a complete game changer, and I didn't realize that until I actually used it effectively on a toy project myself.
MCP in particular was another thing I always thought was massively over hyped, until I actually started to use some in the same toy project.
Frankly, the building blocks already exist at this point to make a vast majority of all jobs redundant (and I'm thinking about all grunt-work office jobs, not coding in particular). The tooling still needs to be created, so I'm not seeing a short-term realization (<2 yrs), but medium term (5+ yrs)?
You should expect most companies to let people go at staggering numbers, with only small amounts of highly skilled people left to administer the agents
> You should expect most companies to let people go at staggering numbers, with only small amounts of highly skilled people left to administer the agents
I don't buy that. The linked article makes a solid argument for why that's not likely to happen: agentic loop coding tools like Claude Code can speed up the "writing code and getting it working" piece, but the software development lifecycle has so much other work before you get to the "and now we let Claude Code go brrrrrrr" phase.
And I completely agree with that!
These are exactly the people that are going to stay, medium term.
Let's explore a fictional example that somewhat resembles my, and I suspect a lot of people's, current day job.
A micro-service architecture: each team administers 5-10 services, and the whole application, which is once again only a small part of the platform as a whole, is developed by maybe 100-200 devs. So something like ~200 micro-services.
The application architects are gonna be completely safe in their jobs. And so are the lead devs in each team - at least from my perspective. Anyone else? I suspect the MBAs in 5 yrs will not see their value anymore. That's the vast majority of all devs, so that's likely going to cost 50% of the devs their jobs. And middle management will be slimmed down just as quickly, because you suddenly need a lot fewer managers.
Let’s take this further - why would the company exist in the first place? The customers of said company pay them because they don’t do the service themselves - but in the future, when it’s laughably easy to vibe code anything your heart desires, their customers will just build the service themselves that they used to outsource!
tl;dr: in the future when vibe coding works 100% of the time, logically the only companies that will exist are the ones that have processes that AI can’t do, because all the other parts of the supply chain can all be done in-house
That scenario is a lot further out compared to what I was talking about.
It's conceivable that that's going to happen, eventually. But that'd likely require models a lot more advanced than what we have now.
The agent approach, with lead devs administering and merging the code the agents made, is feasible with today's models. The missing part is the tooling around the models and the development practices that standardize this workflow.
That's what I'd expect to take around 5 yrs to settle.
Thanks for this perspective, but I am a bit confused by some of your takes: you used "Claude Code with Opus model" in "the same toy project" with great success, which led you to conclude that this will "make a vast majority of all jobs redundant".
Toy project viability does not connect with making people redundant in the process (ever, really) — at least not for me. Care to elaborate where do you draw the optimism from?
I cannot use it on my production code base. I'm working for a company that requires the devs to code from virtual workplaces, which is a fancy term for virtual machines running in the Azure cloud. These are completely locked down, and anything but Copilot is forbidden from use, enforced via firewall and process monitoring. I can still use Sonnet 3.7 through that, but that's a far cry from my experience on my personal time with Claude Code.
I called it a toy project because I'm not earning money with it - hence it's a toy.
It does have medium complexity with roughly 100k loc though.
And I think I need to repeat myself, because you seem to read something into my comment that I didn't say: the building blocks exist doesn't mean that today's tooling is sufficient for this to play out, today.
I very explicitly set a time horizon of 5 yrs.
> You should expect most companies to let people go at staggering numbers, with only small amounts of highly skilled people left to administer the agents
I'm gonna pivot to building bomb shelters maybe
Or stockpiling munitions to sell during the troubles
Maybe some kind of protest-support SaaS. Molotov deliveries as a service: you still have to light them and throw them, but I guarantee next-day delivery and they will be ready to deploy into any data center you want to burn down.
What I'm trying to say is "companies letting people go in staggering numbers" is a societal failure state, not an ideal.
I find it so weird how many engineers seem positively giddy to get replaced by a chatbot that functionally cannot do the job. I'll help your Molotovs-as-a-service startup; free guillotine with every 6th order.
> until I actually started to use some in the same toy project
Thats the key right there. Try to use it in a project that handles PII, needs data to be exact, or has many dependencies/libraries and needs to not break for critical business functions.
So what happens when someone calls in and the "AI" answers (because the receptionist has been fired and replaced by "AI"), and the caller asks to access some company record that should be private? Will the LLM always deny the request? Hint: no, not always.
There are so many flaws in your plan, I have no doubt that "AI" will ruin some companies that try to replace humans with a "tin can". LLMs are being inserted loosey-goosey into too many places by people that don't really understand the liability problems it creates. Because the LLM doesn't think, it doesn't have a job to protect, it doesn't have a family to feed. It can be gamed. It simply won't care.
The flaws in "AI" are already pretty obvious to anyone paying attention. It will only get more obvious the more LLMs get pushed into places they really do not belong.
> Will the LLM always deny the request? Hint: no, not always.
And you are confident that the human receptionist will never fall for social engineering?
I don't think data protection is even close to the biggest problem with replacing all/most employees with bots.
Who buys their crap if you fire everyone?
My biggest takeaway from using AI is that
(1) for my day job, it doesn't make me super productive with creation, but it does help with discovery, learning, getting myself unstuck, and writing tedious code.
(2) however, the biggest unlock is it makes working on side projects __immensely__ easier. Before AI I was always too tired to spend significant time on side projects. Now, I can see my ideas come to life (albeit with shittier code), with much less mental effort. I also get to improve my AI engineering skills without the constraint of deadlines, data privacy, tool constraints etc..
2 heavily resonates with me. Simon Willison made the point early on that AI makes him more ambitious with his side projects, and I heavily agree. Suddenly lots of things that seemed more or less unfeasible are now not only doable, but can actually meet or exceed your own expectations for them.
Being able to sit down after a long day of work and ask an AI model to fix some bug or implement some feature on something while you relax and _not_ type code is a major boon. It is able to immediately get context and be productive even when you are not.
Funny. This is exactly how I use it too. I love to make a UI change prompt and then switch to the browser and watch hot reload incrementally make the changes I assume will happen.
If my work involves doing a bit of tooling, improving the testing, and documenting things, I find myself having much less resistance, and I'm rather happy to hand it off to an AI agent.
I haven't begun doing side projects or projects for self, yet. But I did go down the road of finding out what would be needed to do something I wished existed. It was much easier to explore and understand the components and I might have a decent chance at a prototype.
The alternative to this would have been to ask people around me or formulate extensively researched questions for online forums, where I'd expect to get half-cryptic answers (and a jibe at my ignorance every now and then) at a pace where it would take years before I had something ready.
I see the point for AI as a prototyping and brainstorming tool. But I doubt we are at a point where I would be comfortable pushing changes to a production environment without giving 3x the effort in reviewing. Since there's a chance of the system hallucinating, I have a genuine fear that it would seem accurate, but what it would do is something really really stupid.
> (1) for my day job, it doesn't make me super productive with creation, but it does help with discovery, learning, getting myself unstuck, and writing tedious code
I hear this take a lot but does it really make that much of an improvement over what we already had with search engines, online documentation and online Q&A sites?
It is the best version of fuzzy search I have ever seen: the ultimate "tip of my tongue" assistant. I can ask super vague things like "Hey, I remember seeing a tool that allows you to put actual code in your files to do codegen, what could it be?" and it instantly gives me a list of possible answers, including the thing I'm looking for: Cog.
I know that a whole bunch of people will respond with the exact set of words that will make it show up right away on Google, but that's not the point: I couldn't remember what language it used, or any other detail beyond what I wrote and that it had been shared on Hacker News at some point, and the first couple Google searches returned a million other similar but incorrect things. With an LLM I found it right away.
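For anyone who hasn't run into it: Cog lets you embed Python inside comments, and that Python generates the lines right below it, in place. A minimal sketch of what that looks like (the colour constants are just a made-up example):

    # gen_colors.py -- trivial Cog demo; regenerate in place with: cog -r gen_colors.py
    # [[[cog
    # import cog
    # for name in ["red", "green", "blue"]:
    #     cog.outl(f'{name.upper()} = "{name}"')
    # ]]]
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
    # [[[end]]]

Running `cog -r` executes the Python between the [[[cog ... ]]] markers and rewrites everything up to [[[end]]] with its output, so the generated constants stay in sync with the loop that produces them.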
That's a great example. (Also I love Cog.)
The training cutoff comes into play here a bit, but 95% of the time I'm fuzzy searching like that I'm happy with projects that have been around for a few years and hence are both more mature and happen to fall into the training data.
Yes.
Me, typing into a search engine, a few years ago: "Postgres CTE tutorial"
Me, typing into any AI engine, in 2025: "Here is my schema and query; optimize the query using CTEs and anything else you think might improve performance and readability"
And nowadays if you type that into a search engine you may be overwhelmed with ads or articles of varying quality that you'll need to read and deeply understand to adapt to your use-case.
> you'll need to read and deeply understand to adapt to your use-case
This sort of implies you are not reading and deeply understanding your LLM output, doesn't it?
I am pretty strongly against that behavior
I didn't say that. When you're trying to get a job done, it's time consuming to sift through a long tutorial online because a big part of that time is spent determining whether its garbage and whether its solving the exact problem that you need to solve. IME the LLM helps with both of those problems.
Those things don't really help with getting unstuck, especially if the reason you are stuck is that there's tedious code that you anticipate writing and don't want to deal with.
Exactly. My two worst roadblocks are the beginning of a new feature when I procrastinate way too much (I'm a bit afraid of choosing a design/architecture and committing to it) and towards the end when I have to fix small regressions and write tests, and I procrastinate because I just don't want to. AI solved the second roadblock 100% of the time, and help with design decisions enough to be useful (Claude4 at least). The code in the middle is a plus, but tbh I often do it myself (unless it's frontend code).
Yes. It's so dramatically better it's not even funny. It's not that information doesn't exist out there, it's more that an LLM can give it to you in a few seconds and it's tailored to your specific situation. The second part is especially helpful if the internet answer is 95% correct but is missing something specific to you that ends up taking you 20 minutes to figure out.
> that ends up taking you 20 minutes to figure out
That 20 minutes, repeated over and over across the course of a career, is the difference between being a master and being an amateur.
You should value it, even if your employer doesn't.
Your employer would likely churn you into ground beef if there was a financial incentive to, never forget that
Yeah, I strongly disagree. I want to spend time figuring out the things that are important to me and my career. I couldn't care less about the one regex I write every year, especially when I've learned and forgotten the syntax more times than I can count.
There's a funny quote about regex
"You had a problem. You tried to solve it with regex. Now you have two problems"
1) your original problem 2) your broken regex
I would like to propose an addition
"You had a problem. You tried to solve it with AI generated regex. Now you have three problems"
1) your original problem 2) your broken regex 3) your reliance on AI
> does it really make that much of an improvement over what we already had with search engines, online documentation and online Q&A sites?
This can't be a serious question? 5 minutes of testing will prove to you that it's not just better, it's a totally new paradigm. I'm relatively skeptical of AI as a general purpose tool, but in terms of learning and asking questions on well documented areas like programming language spec, APIs etc it's not even close. Google is dead to me in this use case.
> This can't be a serious question? 5 minutes of testing will prove to you that it's not just better, it's a totally new paradigm
It is a serious question. I've spent much more than 5 minutes testing this, and I've found that your "totally new paradigm" is for morons
In my experience a lot of our "google engineers" now do both. We tend to preach that they go to the documentation first, since that will almost always lead to actual understanding of what they are working on. Eventually most of them pick up that habit, and in my experience, they never really go back to being "google engineers" after that... Where the AI helps with this is that it can search documentation rather well. We do a lot of work with Azure, and while the Microsoft documentation is certainly extensive, it can be rather hard to find exactly what you're looking for. LLMs can usually find a lot of related pages, and then you can figure out which are relevant more easily than you can with Google/Ecosia/DDG. I haven't used Kagi, so maybe that works better?
As far as writing "tedious" code goes, I think the AI agents are great. Where I have personally found a huge advantage is in keeping documentation up-to-date. I'm not sure if it's because I have ADHD or because my workload is basically enough for 3 people, but this is an area I struggle with. In the past, I've often let the code be it's own documentation, because that would be better than having out-dated/wrong documentation. With AI agents, I find that I can have good documentation that I don't need to worry about beyond approving in the keep/discard part of the AI agent. I also rarely write SQL, bicep, yaml configs and similar these days, because it's so easy to determine if the AI agent got it wrong. This requires you're an expert on infrastructure as code and SQL, but if you are, the AI agents are really fast. I think this is one of the areas where they 10x at times. I recently wrote an ingress for an ftp pod (don't ask), and writing all those ports for passive mode would've taken me a while. There are a lot of risk involved. If you can't spot errors or outdated functionality quickly, then I would highly recommend you don't do this. Bicep LLM output is often not up to date, and since the docs are excellent what I do in those situations is that I copy/paste what I need. Then I let the AI agent update things like parameters, which certainly isn't 10x but still faster than I can do it.
Similarly, it's rather good at writing and maintaining automated tests. I wouldn't recommend this unless you're actively dealing with corrupted states directly in your code. But we do fail-fast programming / Design by Contract, so the tests are really just an extra precaution and compliance thing, meaning that they aren't as vital as they would be for more implicit ways of dealing with error handling.
I don't think AIs are good at helping you with learning or getting unstuck. I guess it depends on how you would normally deal with it. If the alternative is "google programming", then I imagine it is sort of similar and probably more effective. It's probably also more dangerous. At least we've found that our engineers are more likely to trust the LLM than a medium article or a Stack Overflow thread.
Are you boycotting AI or something?
If you try it yourself you'll soon find out that the answer is a very obvious yes.
You don't need a paid plan to benefit from that kind of assistance, either.
> Are you boycotting AI or something?
At this point I am close to deciding to fully boycott it yes
> If you try it yourself you'll soon find out that the answer is a very obvious yes
I have tried plenty over the years, every time a new model releases and the hype cycle fires up again I look in to see if it is any better
I try to use it a couple of weeks, decide it is overrated and stop. Yes it is improving. No it is not good enough for me to trust
You asked whether it's really better than "what we already had with search engines, online documentation and online Q&A sites".
How have you found it not to be significantly better for those purposes?
The "not good enough for you to trust" is a strange claim. No matter what source of info you use, outside of official documentation, you have to assess its quality and correctness. LLM output is no different.
> How have you found it not to be significantly better for those purposes
Not even remotely
> LLM output is no different
It is different
A search result might take me to the wrong answer but an LLM might just invent nonsense answers
This is a fundamentally different thing and is more difficult to detect imo
> A search result might take me to the wrong answer but an LLM might just invent nonsense answers
> This is a fundamentally different thing and is more difficult to detect imo
99% of the time it's not. You validate and correct/accept like you would any other suggestion.
Yes, it can.
#2 is the reason I keep paying for Claude Code Pro.
For $20 a month I can get my stupid tool and utility ideas from "it would be cool if I could..." to actual "works well enough for me" tools in an evening - while I watch my shows at the same time.
After a day at work I don't have the energy to start digging through, say, OpenWeather's latest 3.0 API and its nuances and how I can refactor my old code to use the new API.
Claude did it in maybe one episode of What We Do in the Shadows :D I have a hook that makes my computer beep when Claude is done or pauses for a question, so I can get back, check what it did and poke it forward.
Ghostty has native push notification support for Claude Code's "finished" events.
Guys, it's built in
claude config set --global preferredNotifChannel terminal_bell
https://docs.anthropic.com/en/docs/claude-code/terminal-conf...
#2 I expect to wind up as a huge win professionally as well. It lowers the investment for creating an MVP or experimental/exploratory project from weeks to hours or days. That ability to try things that might have been judged too risky for a team previously will be pretty amazing.
I do also believe that those who are often looked at or referred to as 10x engineers will maybe only see a marginal productivity increase.
The smartest programmer I know is so impressive mainly for two reasons: first, he seems to have just an otherworldly memory and seems to kind of have absolutely every little feature and detail of the programming languages he uses memorized. Second, his real power is really in cognitive ability, or the ability to always quickly and creatively come up with the smartest and most efficient yet elegant and clean solution to any given problem. Of course somewhat opinionated but in a good way. Funnily he often wouldn't know the academic/common name for some algorithm he arrived at but it just happened to be what made sense to him and he arrived at it independently. Like a talented musician with perfect pitch who can't read notation or doesn't know theory yet is 10x more talented than someone who has studied it all.
When I pair program with him, it's evident that the current iteration of AI tools is not as quick or as sharp. You could arrive at similar solutions but you would have to iterate for a very long time. It would actually slow that person down significantly.
However, there is such a big spectrum of ability in this field that I could actually see this increasing for example my productivity by 10x. My background/profession is not in software engineering but when I do it in my free time the perfectionist tendencies make me work very slowly. So for me these AI tools are actually cool for generating the first crappy proof of concepts for my side projects/ideas, just to get something working quickly.
I like the quip that AI raises the floor not the ceiling. I think it helps the bottom 20% perform more like the middle 50% but doesn't do much for people at the top.
Maybe to get an impression that they'd be performing like them - but not actually performing.
It helps me being lazy because I have a rough expectation of what the outcome should be - and I can directly spot any corner cases or other issues the AI proposed solution has, and can either prompt it to fix that, or (more often) fix those parts myself.
The bottom 20% may not have enough skill to spot that, and they'll produce superficially working code that'll then break in interesting ways. If you're in an organization that tolerates copy-pasting from Stack Overflow, that might be good enough - otherwise the result is not only useless, but as it provides the illusion of a complete solution, you're also closing off the path of training junior developers.
Pretty much all AI-attributed firings were doing just that: get rid of the juniors. That'll catch up with us in a decade or so. I shouldn't complain, though - that's probably a nice earnings boost just before retirement for me.
I randomly stumbled across Tekwetu, who's made a pretty good step-by-step example of coding with Claude Code, using MCPs, etc. [1]. None of the upsell or gushing. It's a pretty simple app with a backend, with a slightly complicated storage mechanism.
I was watching to learn how other devs are using Claude Code, as on my first attempt I pretty quickly ran into a huge mess, and I was specifically looking for how to debug better with MCP.
The most striking thing is she keeps having to stop it from doing really stupid things. She glosses over those points a little by saying things like "I roughly know what this should look like, and that's not quite right" or "I know that's the old way of installing TailwindCSS, I'll just show you how to install Context7", etc.
But in each 10 minute episodes (which have time skips while CC thinks) it happens at least twice. She has to bring her senior dev skills in, and it's only due to her skill that she can spot the problem in seconds flat.
And after watching much of it, though I skipped a few episodes at the end, I'm pretty certain I could have coded the same app quicker than she did without agentic AI, just using the old chat-window AIs to bash out the React boilerplate and help me quickly scan the documentation for getting offline working. The initial estimate of 18 days the AI came up with in the plan phase would only hold true if you had to do it "properly".
I'm also certain she could have too.
[1] https://www.youtube.com/watch?v=erKHnjVQD1k
It's worth a watch if you're not doing agentic coding yet. There were points I was impressed with what she got it to do. The TDD section was quite impressive in many ways, though it immediately tried to cheat and she had to tell it to do it properly.
Personally I find MCP a bit limiting - I'm using Emacs bindings, and then provide LLMs elisp functions to call.
I posted a demo here a while ago where I try to have it draw turtle graphics:
https://news.ycombinator.com/item?id=44013939
Since then I've also provided enough glue that it can interact with the Arch Linux installer in a VM (or actual hardware, via serial port) - with sometimes hilarious results, but at least some LLMs do manage to install Arch with some guidance:
https://github.com/aard-fi/arch-installer
Somewhat amusingly, some LLMs have a tendency to just keep going with it (even when it fails), with rare hallucinations - while others directly start lying and only pretend they logged in.
maybe, but I find that it makes it much faster to do things that _I already know how to do_, and can only slowly, ploddingly get me to places that I don't already have a strong mental model for, as I have to discover mistakes the hard way
I've only used Copilot, but this is just about exactly right. (I've only used it for Python.)
If I'm writing a series of very similar test cases, it's great for spamming them out quickly, but I still need to make sure they're actually right. It's easier to spot errors because I didn't type them out.
It's also decent for writing various bits of boilerplate for list / dict comprehensions, log messages (although they're usually half wrong, but close enough to what I was thinking), time formatting, that kind of thing. All very standard stuff that I've done a million times but I may be a little rusty on. Basically StackOverflow question fodder.
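To be concrete about the "very similar test cases" bit, this is the sort of thing it's good at spamming out for me (the slugify function here is made up, just to show the shape):

    import pytest

    from mypkg.text import slugify  # hypothetical module under test

    @pytest.mark.parametrize(
        ("raw", "expected"),
        [
            ("Hello World", "hello-world"),
            ("  padded  input ", "padded-input"),
            ("Already-Slugged", "already-slugged"),
            ("", ""),
        ],
    )
    def test_slugify(raw, expected):
        # Copilot happily fills in the table; I still have to check every row is right.
        assert slugify(raw) == expected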
But for anything complex and domain-specific, it's more wrong than it's right.
things backed by Claude Sonnet can get a little further out than Copilot can, and when it’s in agent mode _sometimes_ it will do things like read the library source code to understand the API, or google for the docs
but the principle is the same: if the human isn’t doing theory-building, then no one is
I add to that analogy. AI raises the floor but some of the floor tiles fall away, unpredictably.
I think it's more effective at lowering the floor. The number of people who can't code at all but can now slap something together makes it a huge step forward. Albeit one that mostly steps on a pile of dogshit after it hits any sort of production reality.
It's like WordPress all over again, but with people even less able to code. There are going to be vast amounts of opportunities for people to get into the industry via this route, but it's not going to be a very nice route for many of them. Lots of people who understand software even less than the C-suite holding the purse strings.
AI is strong in different places, and if it keeps on being strong in certain ways then people very soon won't be able to keep up. For example, extreme horizontal knowledge and the ability to digest new information almost instantly. That's not something anyone can do. We don't try to compete against computers on raw calculation, and soon we won't compete on this one either. We simply won't even think to compare.
People keep focusing on general-intelligence-style capabilities, but that is the holy grail. The world could go through multiple revolutions before finding that holy grail, but even before then everything would have changed beyond recognition.
So write an integration over the API docs I just copy-pasted.
Thanks for the comment Simon! This is honestly the first one I've read where it feels like someone actually read the article. I'm totally open to the idea that some people, especially those working on the languages/tools that LLMs are good at, are indeed getting a 2x improvement in certain parts of their job.
Something I have realized about Hacker News is that most of the comments on any given article are from people who are responding to the headline without actually clicking through and reading it!
This is particularly true for headlines like this one which stand alone as statements.
Perhaps that's my fault for making the title almost clickbaity. My goal was to get people who felt anxious about AI turning them into dinosaurs to not feel like they are missing some secret sauce, so hopefully the reach this is getting contributes to that.
Again, appreciate your thoughts, I have a huge amount of respect for your work. I hope you have a good one!
Nah, it’s the way news aggregators have been ever since they have existed.
If you hadn't made the title clickbaity you probably wouldn't have hit the homepage!
The term "10x" occurs 25 times in this article, including in a subhed deep into the piece.
people not Reading The Fine Article is as old as the Web, you're fine!
> most of the comments on any given article are from people who are responding to the headline without actually clicking through and reading it!
Well, the people who quote from TFA have usually at least read the part they quoted ;)
Humans have been hallucinating responses given a prompt long before chatgpt was a thing!
This is a truism across the entire web
The other thing is that I don't believe software developers actually "do their best" when writing the code itself, that is, optimize the speed of writing code. Nor do they need to; they know writing the code doesn't take up time, waiting for CI and a code review and that iteration cycle does.
And does an AI agent doing a code review actually reduce that time too? I have doubts. Caveat, I haven't seen it in practice yet.
I think even claims of 2-5x are highly suspect. It would imply that if your team is using AI then all else equal they accomplish 2-5 times as much in a quarter. I don't know about you but I'm certainly not seeing this and most people on my team use AI.
[And to those saying we're using it wrong... well I can't argue with something that's not falsifiable]
My company is all in on LLMs, and honestly the improvement seems to be like 0.9x to 1.2x depending on the project. None of them are moving at breakneck speed, and many projects are just as bogged down by complexity as ever. Pretty big (3000+) company with a large, mature codebase in multiple languages. For god knows how much money spent on it.
Once a company gets to a certain size, they no longer optimise for product development, instead they optimise for risk mitigation. A lot of processes will be put in place with the sole purpose of slowing down code development.
Do you mean -10 to +20% or 90 to 120%?
I am not allowed to use LLMs at work for work code so I can't tell what claims are real. Just my 80s game reimplementations of Snake and Asteroids.
There's a bunch of open source projects where we can observe that not happening in real time.
This is (I think) a reference to the 10x engineer, another myth of which I have always been highly dubious (https://www.simplethread.com/the-10x-programmer-myth/).
10x sounds nice which is probably why it stuck, but it came from actual research which found the difference was larger than 10x - but also they were measuring between best and worst, not best and average as it's used nowadays.
https://www.construx.com/blog/productivity-variations-among-...
All of this is hard to quantify. How much better than the average engineer is John Carmack, or Rob Pike or Linus? I consider myself average-ish and I don't think there's any world in which I could do what those guys did no matter how much time you gave me (especially without the hindsight knowledge of the creations). So I'd say they're all infinitely better than me.
I guess that makes Newton a 10x scientist. Really puts in perspective how utterly unrealistic it is to be looking to hire exclusively 10x programmers - the true 10x'ers are legends, not just regular devs who type a bit faster.
It would be more sensible if the "10x" moniker was dropped altogether, and we just went back to calling these people what they've always been: "geniuses". Then there might be more realistic expectations of only finding them among 1% of the population.
Newton was a 1000x
I'd argue he was 9.81x
I think it's more important to measure his changing contribution as a function of time.
And how much better are they than your average engineer when plopped into a mediocre organization where they aren’t the political and technical top dog? I would guess they would all quit within a week.
Good engineers don't stay in mediocre organizations, mediocre ones do. Do you think these "top dogs" were at the top of their game from day one? They all learned, just like everyone else; talent just gave them a higher ceiling.
>but also they were measuring between best and worst, not best and average as it's used nowadays.
Depending on the environment, I can imagine the worst devs being net negative.
And they get promoted too. Multiple times I've seen people get promoted for decisions that doom the company years later. Considering all the various departments and people that go into supporting these net negative engineers, more people are net negative than they think.
It highly depends on the circumstances. In over 30 years in the industry I met 3 people that were many times more productive than everyone else around them, even more than 10 times. What does this translate to? Well, there are some extraordinary people around, very rare and you cannot count on finding some and, when you find them, it is almost impossible to retain them because management and HR never agree to pay them enough to stay around.
You don't believe Fabrice Bellard exists?
He doesn't believe there are hundreds of Fabrice Bellard clones who think working at your company wouldn't be a waste of their time. The myth might be that thinking about 10X is useful in any sense. You can't plan around one gracing you with their presence and you won't be able to retain them when they do.
Thinking about it personally, a 10X label means I'm supposedly the smartest person in the room and that I'm earning 1/10th what I should be. Both of those are huge negatives.
I agree. I'm a big fan/proponent of AI assisted development (though nowhere near your amount of experience with it). And I think that 2x-10x speed up can be true, depending on what you mean exactly and what your task is exactly.
This article thinks that most people who say 10x productivity are claiming 10x speedup on end-to-end delivering features. If that's indeed what someone is saying, they're most of the time quite simply wrong (or lying).
But I think some people (like me) aren't claiming that. Of course the end to end product process includes a lot more work than just the pure coding aspect, and indeed none of those other parts are getting a 10x speedup right now.
That said, there are a few cases where this 10x end-to-end is possible. E.g. when working alone, especially on new things but not only - you're skipping a lot of this overhead. That's why smaller teams, even solo teams, are suddenly super interesting - because they are getting a bigger speedup comparatively speaking, and possibly enough of one to be able to rival larger teams.
Programmers are notoriously bad about making estimates. Sure it sped something up 10x, but did you consider those 10 tries using AI that didn't pan out? You're not even breaking even, you are losing time.
I’ve found I do get small bursts of 10x productivity when trying to prototype an idea - much of the research on frameworks and such just goes away. Of course that’s usually followed by struggling to make a seemingly small change for an hour or two. It seems like the 10x number is just classic engineers underestimating tasks - making estimates based on peak productivity that never materializes.
I have found for myself it helps motivate me, resulting in net productivity gain from that alone. Even when it generates bad ideas, it can get me out of a rut and give me a bias towards action. It also keeps me from procrastinating on icky legacy codebases.
> I've estimated that LLMs make me 2-5x more productive on the parts of my job which involve typing code into a computer, which is itself a small portion of that I do as a software engineer.
I think that the key realization is that there are tasks where LLMs excel and might even buy you 10x productivity, whereas some tasks their contribution might even be net negative.
LLMs are largely excellent at writing and refactoring unit tests, mainly because their context is very limited (i.e., write a method in a class that calls this specific method of this specific class in a specific way and check the output) and their output is very repetitive (i.e., isolated methods in standalone classes that aren't called from anywhere else). They also seem helpful when prompted to add logging. LLMs are also effective at creating greenfield projects, serving as glorified template engines. But when lightly pressed on specific tasks like implementing a cross-domain feature... their output starts to be, at best, a big ball of mud.
> engineers that really know how to use this stuff effectively
I guess this is still the "caveat" that can keep the hype going. But I've found at the team-velocity level, with our teams, where everyone is actively using agentic coding like Claude Code daily, we actually haven't seen an increase in team velocity yet.
I'm curious to hear anecdotes from other teams: has your team seen velocity increase since it adopted agentic AI?
Same here. I have a colleague who is completely enamored with these agents. He uses them for everything he can, not just coding: commit messages, opening PRs, Linear tickets, etc. But the productivity gain is just not there. He's about as fast, or rather as slow, as he was before. And to a degree I think this goes for the whole team.

It's the oxymoron of AI: more code, more documentation, more text, more of everything generated than ever, but the effect is more complexity, more PRs to review, more bugs, more stuff to know and understand... We are all still learning how to use these agents effectively. And the particular developer's traits can and do get multiplied, as with everything else with GenAI. Was he a bit sloppy before, not covering various edge cases and using quick-and-dirty shortcuts? Then that remains true for the code he produces using agents.

And to those who claim that "by using more agents I will gain 10x productivity" I say: please read a certain book about how just adding developers to a project makes it even more delayed. The resemblance of the team/project leadership -> developers dynamic is truly uncanny.
My experience with GenAI is that it's a significant improvement to Stack Overflow, and generally as capable as someone hired right out of college.
If I'm using it to remember the syntax or library for something I used to know how to do, it's great.
If I'm using it to explore something I haven't done before, it makes me faster, but sometimes it lies to me. Which was also true of Stack Overflow.
But when I ask it to do something fairly complex on its own, it usually tips over. I've tried a bunch of tests with a bunch of models, and it never quite gets it right. Sometimes it's minor stuff that I can fix if I bang on it long enough, and sometimes it's a steaming pile that I end up tossing in the garbage.
For example, I've asked it to code me a web-based calculator, or a 3D model of the solar system using WebGL, and none of the models I've tried have been able to do either.
I wonder if a better metric would be developer happiness? Instead of being 2x or 5x more productive, what if we looked at what a developer enjoyed doing and figured out how to use AI for everything else?
> I've estimated that LLMs make me 2-5x more productive on the parts of my job which involve typing code into a computer, which is itself a small portion of that I do as a software engineer.
This feels exactly right and is what I’ve thought since this all began.
But it also makes me think maybe there are those that A.I. helps 10x, but more because that code input is actually a very large part of their job. Some coders aren’t doing much design or engineering, just assembly.
Yeah, I hadn't thought about that. If you really are a programmer who gets all of their work assigned to them as detailed specifications maybe you are seeing a 10x boost.
I don't think I've encountered programmer like that in my own career, but I guess they might exist somewhere!
Personally I've found it's very good at writing support tools / shell scripts. I mostly use it to parse the output of other tools that don't have machine-readable output yet.
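(A concrete illustration of that use case, as a minimal sketch; the line format it parses is invented for the example, not the output of any particular tool.)

    import json
    import re
    import sys

    # Matches lines like: "build-frontend OK 4312ms" (an invented report format).
    LINE = re.compile(r"^(?P<name>\S+)\s+(?P<status>OK|FAIL)\s+(?P<ms>\d+)ms$")

    def parse(lines):
        for line in lines:
            m = LINE.match(line.strip())
            if m:
                yield {"name": m["name"], "status": m["status"], "ms": int(m["ms"])}

    if __name__ == "__main__":
        # Turn the human-readable report on stdin into machine-readable JSON.
        print(json.dumps(list(parse(sys.stdin)), indent=2))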
Claude Code (which is apparently the best in general) isn't very good at reviewing existing large projects IME, because it doesn't want to load a lot of text into its context. If you ask it to review an existing project it'll search for keywords instead of just loading an entire file.
That and it really wants to please you, so if you imply you own a project it'll be a lot more positive than it may deserve.
Completely agree.
What will happen is over time this will become the new baseline for developing software.
It will mean we can deliver software faster. Maybe more so than other advances, but it won't fundamentally change the fact that software takes real effort and that effort will not go away, since that effort is much more than just coding this or that function.
I could create a huge list of things that have made developing and deploying quality software easier: linters, static type checkers, code formatters, hot reload, intelligent code completion, distributed version control (i.e., Git), unit testing frameworks, inference schema tools, code from schema, etc. I'm sure others can add dozens of items to that list. And yet there seems to be an unending amount of software to be built, limited only by the people available to build it and an organization's funding to hire those people.
In my personal work, I've found AI-assisted development to make me faster (not sure I have a good estimate for how much faster.) What I've also found is that it makes it much easier to tackle novel problems within an existing solution base. And I believe this is likely to be a big part of the dev productivity gain.
Just as an example, let's say we want to use the strangler pattern as part of our modernization approach for a legacy enterprise app that has seen better days. Unless you have some senior devs who are both experienced with that pattern AND experienced with your code base, it can take a lot of trial and error to figure out how to make it work. (As you said, most of our work isn't actually typing code.)
This is where an AI/LLM tool can go to work understanding the code base and the pattern to create a reference implementation approach and tests. That can save a team of devs many weeks of trial & error (and stress), not to mention provide guidance on where they will run into roadblocks deep in the code base.
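(For readers who haven't used it, a minimal sketch of the strangler idea itself, with entirely hypothetical names: a thin facade routes each call to either the legacy implementation or the new one, so the old system can be replaced one feature at a time. The hard part, which is what the AI helps with, is mapping this onto a real code base.)

    MIGRATED = {"get_invoice"}  # features already re-implemented in the new module

    class LegacyBilling:
        def get_invoice(self, order_id):
            return f"legacy invoice for {order_id}"

        def refund(self, order_id):
            return f"legacy refund for {order_id}"

    class NewBilling:
        def get_invoice(self, order_id):
            return f"new invoice for {order_id}"

    class BillingFacade:
        """Strangler facade: callers never know which implementation served them."""

        def __init__(self):
            self._legacy, self._new = LegacyBilling(), NewBilling()

        def __getattr__(self, name):
            # Route migrated features to the new module, everything else to legacy.
            target = self._new if name in MIGRATED else self._legacy
            return getattr(target, name)

    if __name__ == "__main__":
        facade = BillingFacade()
        print(facade.get_invoice(42))  # served by NewBilling
        print(facade.refund(42))       # still served by LegacyBilling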
And, in my opinion, this is where a huge portion of the AI-assisted dev savings will come from - not so much writing the code (although that's helpful) but helping devs get to the details of a solution much faster.
It's that googling has always gotten us generic references, while AI gets us those references fitted to our solution.
I've basically come to the same 2x to 5x conclusion as you. Problem is that "5x productivity" is really only a small portion of my actual job.
The hardest part of my job is actually understanding the problem space and making sure we're applying the correct solution. Actual coding is probably about 30% of my job.
That means I'm only looking at something like a 30% productivity gain from being 5x as effective at coding.
The thing that I keep wondering about: If the coding part is 2-5x more productive for you, but the stuff around the coding doesn't change... at some point, it'll have to, right? The cost/benefit of a lot of practices (this article talks about code review, which is a big one) changes a lot if coding becomes significantly easier relative to other tasks.
Why would it have to? Most of the job is trying to understand what people really want and that’s not any easier than it was 10 or 30 years ago
Yes, absolutely. Code used to be more expensive to write, which meant that a lot of features weren't sensible to build - the incremental value they provided wasn't worth the implementation effort.
Now when I'm designing software there are all sorts of things where I'm much less likely to think "nah, that will take too long to type the code for".
If the 10x thing was true we’d all be on iOS 24 and the typescript Go rewrite would have been done months ago.
But of course that’s ridiculous.
Speed of shipping software and pace of writing code are different things. Shipping software like iOS has a <50% component of programming so Amdahl's law caps the end-to-end improvement rather low, assuming other parts of the process stay the same.
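(Putting rough, purely illustrative numbers on that cap:)

    # Amdahl's-law arithmetic for the claim above; the fractions are illustrative.
    def overall_speedup(coding_fraction, coding_speedup):
        return 1 / ((1 - coding_fraction) + coding_fraction / coding_speedup)

    print(overall_speedup(0.5, 10))  # ~1.82x end to end, even with 10x faster coding
    print(overall_speedup(0.3, 5))   # ~1.32x, close to the ~30% gain cited upthread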
> and the typescript Go rewrite would have been done months ago.
10x is intended to symbolize a multiplier. As Microsoft fired that guy, 10 × 0 is still 0.
At first I thought becoming “10x” meant outputting 10x as much code. Now that I’m using Claude more as an expensive rubber duck, I’m hoping that I spend more time defining the fundamentals correctly that will lead to a large improvement in outcomes in the long run.
> I think that's an under-estimation
I'm not sure it is and I'll take it a step further:
Over the course of development, efficiency gains trend towards zero.
AI has a better case for increasing surface area (what an engineer is capable of working on) and effectiveness, but efficiency is a mirage.
It lets me try things I couldn't commit the time to in the past, like quickly cobbling together a keystroke macro. I can also put together the outline of a plan in a few minutes. So much more can be 'touched' upon usefully.
I completely agree. I saw the claims about 30% increase in dev productivity a while ago and thought how is that possible when most of my job consists of meetings, SARs, threat modeling, etc.
The problem is, if you don’t achieve 10x the venture case won’t work out properly.
I don't doubt that some people are mistaken or dishonest in their self-reports as the article asserts, but my personal experience at least is a firm counterexample.
I've been heavily leaning on AI for an engagement that would otherwise have been impossible for me to deliver to the same parameters and under the same constraints. Without AI, I simply wouldn't have been able to fit the project into my schedule, and would have turned it down. Instead, not only did I accept and fit it into my schedule, I was able to deliver on all stretch goals, put in much more polish and automated testing than originally planned, and accommodate a reasonable amount of scope creep. With AI, I'm now finding myself evaluating other projects to fit into my schedule going forward that I couldn't have considered otherwise.
I'm not going to specifically claim that I'm an "AI 10x engineer", because I don't have hard metrics to back that up, but I'd guesstimate that I've experienced a ballpark 10x speedup for the first 80% of the project and maybe 3 - 5x+ thereafter depending on the specific task. That being said, there was one instance where I realized halfway through typing a short prompt that it would have been faster to make those particular changes by hand, so I also understand where some people's skepticism is coming from if their impression is shaped by experiences like that.
I believe the discrepancy we're seeing across the industry is that prompt-based engineering and traditional software engineering are overlapping but distinct skill sets. Speaking for myself, prompt-based engineering has come naturally due to strong written communication skills (e.g. experience drafting/editing/reviewing legal docs), strong code review skills (e.g. participating in security audits), and otherwise being what I'd describe as a strong "jack of all trades, master of some" in software development across the stack. On the other hand, for example, I could easily see someone who's super 1337 at programming high-performance algorithms and mid at most everything else finding that AI insufficiently enhances their core competency while also being difficult to effectively manage for anything outside of that.
As to how I actually approach this:
* Gemini Pro is essentially my senior engineer. I use Gemini to perform codebase-wide analyses, write documentation, and prepare detailed sprint plans with granular todo lists. Particularly for early stages of the project or major new features, I'll spend several hours at a time meta-prompting and meta-meta-prompting with Gemini just to get a collection of prompts, documents, and JSON todo lists that encapsulate all of my technical requirements and feedback loops. This is actually harder than manual programming because I don't get the "break" of typing out all the trivial and boilerplate parts of coding; my prompts here are much more information-dense than code.
* Claude Sonnet is my coding agent. For Gemini-assisted sprints, I'll fire Claude off with a series of pre-programmed prompts and let it run for hours overnight. For smaller things, I'll pair program with Claude directly and multitask while it codes, or if I really need a break I'll take breaks in between prompting.
* More recently, Grok 4 through the Grok chat service is my Stack Overflow. I can't rave enough about it. Asking it questions and/or pasting in code diffs for feedback gets incredible results. Sometimes I'll just act as a middleman pasting things back and forth between Grok and Claude/Gemini while multitasking on other things, and find that they've collaboratively resolved the issue. Occasionally, I've landed on the correct solution on my own within the 2 - 3 minutes it took for Grok to respond, but even then the second opinion was useful validation. o3 is good at this too, but Grok 4 has been on another level in my experience; its information is usually up to date, and its answers are usually either correct or at least on the right track.
* I've heard from other comments here (possibly from you, Simon, though I'm not sure) that o3 is great at calling out anti-patterns in Claude output, e.g. its obnoxious tendency to default to keeping old internal APIs and marking them as "legacy" or "for backwards compatibility" instead of just removing them and fixing the resulting build errors. I'll be giving this a shot during tech debt cleanup.
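(Regarding the "fire Claude off with a series of pre-programmed prompts" point above: it doesn't require anything fancy. Here's a sketch of the shape of it; the "agent --prompt-file" command and the directory names are placeholders I made up, not the real interface of Claude Code or any other tool.)

    import subprocess
    from pathlib import Path

    # Queue up one agent run per prompt file and capture each run's output.
    for prompt_file in sorted(Path("sprint_prompts").glob("*.md")):
        log_path = Path("logs") / (prompt_file.stem + ".log")
        log_path.parent.mkdir(exist_ok=True)
        with log_path.open("w") as log:
            subprocess.run(
                ["agent", "--prompt-file", str(prompt_file)],  # placeholder CLI
                stdout=log, stderr=subprocess.STDOUT, check=False,
            )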
As you can see, my process is very different from vibe coding. Vibe coding is fine for prototyping, or for non-engineers with no other options, but it's not how I would advise anyone to build a serious product for critical use cases.
One neat thing I was able to do, with a couple days' notice, was add a script to generate a super polished product walkthrough slide deck with a total of like 80 pages of screenshots and captions covering different user stories, with each story having its own zoomed out overview of a diagram of thumbnails linking to the actual slides. It looked way better than any other product overview deck I've put together by hand in the past, with the bonus that we've regenerated it on demand any time an up-to-date deck showing the latest iteration of the product was needed. This honestly could be a pretty useful product in itself. Without AI, we would've been stuck putting together a much worse deck by hand, and it would've gotten stale immediately. (I've been in the position of having to give disclaimers about product materials being outdated when sharing them, and it's not fun.)
Anyway, I don't know if any of this will convince anyone to take my word for it, but hopefully some of my techniques can at least be helpful to someone. The only real metric I have to share offhand is that the project has over 4000 (largely non-trivial) commits made substantially solo across 2.5 months on a part-time schedule juggled with other commitments, two vacations, and time spent on aspects of the engagement other than development. I realize that's a bit vague, but I promise that it's a fairly complex project which I feel pretty confident I wouldn't have been capable of delivering in the same form on the same schedule without AI. The founders and other stakeholders have been extremely satisfied with the end result. I'd post it here for you all to judge, but unfortunately it's currently in a soft launch status that we don't want a lot of attention on just yet.
If 10x could be believed, we're long enough into having AI-coding assist that any such company that had gone all in would be head and shoulders above their competitors by now.
And we're not seeing that at all. The companies whose software I use that did announce big AI initiatives 6 months ago, if they really had gotten 10x productivity gain, that'd be 60 months—5 years—worth of "productivity". And yet somehow all of their software has gotten worse.
Looking forward to those 20x most productive days out of an LLM. And what are those most productive days? The ones when you can simplify and delete hundreds of lines of code... :-)
> 0.2x increase
1.2x increase
> but I've never found those 10x claims convincing
Who's making these claims?
There was a YC video just a few months ago where a bunch of jerkoffs sat in a circle and talked about engineers being 10 to 100x as effective as before. I'm sure Google will bring it up.
> I'm sure Google will bring it up.
It didn’t.
Found it https://youtu.be/IACHfKmZMr8
So this whole discussion started because some guy got triggered by a thinly veiled advertisement on a podcast run by a VC firm?
I’ll admit this is not helping the case of “but people are saying…”
You asked who’s making these silly claims, I provided one example of YC partners doing it. Not sure who got triggered or what advertising you are talking about, but there you go.
“10x? That’s crazy talk. I’m only 5x more productive, let’s be accurate sirs”
I'm skeptical of the 10x claims for different reasons than the author focuses on. The productivity gains might be real for individual tasks, but they're being measured wrong.
Most of the AI productivity stories I hear sound like they're optimizing for the wrong metric. Writing code faster doesn't necessarily mean shipping better products faster. In my experience, the bottleneck is rarely "how quickly can we type characters into an editor" - it's usually clarity around requirements, decision-making overhead, or technical debt from the last time someone optimized for speed over maintainability.
The author mentions that real 10x engineers prevent unnecessary work rather than just code faster. That rings true to me. I've seen more productivity gains from saying "no" to features or talking teams out of premature microservices (or adopting Kafka :D) than from any coding tool.
What worries me more is the team dynamic this creates. When half your engineers feel like they're supposed to be 10x more productive and aren't, that's a morale problem that compounds. The engineers who are getting solid 20-30% gains from AI (which seems realistic) start questioning if they're doing it wrong.
Has anyone actually measured this stuff properly in a production environment with consistent teams over 6+ months? Most of the data I see is either anecdotal or from artificial coding challenges.
Olympic athletes don't exist because no one at my gym runs that fast.
You are right that typing speed isn't the bottleneck, but wrong about what AI actually accelerates. The 10x engineers aren't typing faster; they're exploring 10 different architectural approaches in the time it used to take to try one, validating ideas through rapid prototyping, and automating the boring parts to focus on the hard decisions.
You can't evaluate a small sample size of people who are not exploiting the benefits well and come to an accurate assessment of the utility of a new technology.
Skill is always a factor.
There’s something ironic here. For decades, we dreamed of semi-automating software development. CASE tools, UML, and IDEs all promised higher-level abstractions that would "let us focus on the real logic."
Now that LLMs have actually fulfilled that dream — albeit by totally different means — many devs feel anxious, even threatened. Why? Because LLMs don’t just autocomplete. They generate. And in doing so, they challenge our identity, not just our workflows.
I think Colton’s article nails the emotional side of this: imposter syndrome isn’t about the actual 10x productivity (which mostly isn't real), it’s about the perception that you’re falling behind. Meanwhile, this perception is fueled by a shift in what “software engineering” looks like.
LLMs are effectively the ultimate CASE tools — but they arrived faster, messier, and more disruptively than expected. They don’t require formal models or diagrams. They leap straight from natural language to executable code. That’s exciting and unnerving. It collapses the old rites of passage. It gives power to people who don’t speak the “sacred language” of software. And it forces a lot of engineers to ask: What am I actually doing now?
I now understand what artists felt when seeing stable diffusion images - AI code is often just wrong - not in the moral sense, but it contains tons of bugs, weirdness, excess and peculiarities you'd never be happy to see in a real code base. Often, getting rid of all of this takes a comparable amount of time to doing the job yourself in the first place.
Now I can always switch to a different model, increase the context, prompt better, etc., but I still feel that genuinely good quality AI code is just out of arm's reach; or, when something clicks and the AI magically starts producing exactly what I want, that magic doesn't last.
Like with stable diffusion, people who don't care as much or aren't knowledgeable enough to know better, just don't get what's wrong with this.
A week ago, I received a bug ticket claiming one of the internal libs i wrote didn't work. I checked out the reporter's code, which was full of weird issues (like the debugger not working and the typescript being full of red squiggles), and my lib crashed somewhere in the middle, in some esoteric minified js.
When I asked the guy who wrote it what's going on, he admitted he vibe coded the entire project.
The comparison to art is apt. Generated art gets the job done for most people. It's good enough. Maybe it's derivative, maybe there are small inaccuracies, but it is available instantly for free and that's what matters most. Same with code, to many people.
And the knock-on effect is that there is less menial work. Artists are commissioned less for the local fair, their friend's D&D character portrait, etc. Programmers find less work building websites for small businesses, fixing broken widgets, etc.
I wonder if this will result in fewer experts, or less capable ones. As we lose the jobs that were previously used to hone our skills, will people go out of their way to train themselves for free, or will we just regress?
Artistic paintings are not technical artwork like computer programs or circuit boards. Nothing falls down if something is out of place.
A schematic of a useless amplifier that oscillates looks just as pretty as one of a correct amplifier. If we just want to use it as a repeated print for the wallpaper of an electronic lab, it doesn't matter.
Shoe cobblers were once a large respected career of expert professionals. They still exist.
> When I asked the guy who wrote it what's going on, he admitted he vibe coded the entire project.
This really irritates me. I’ve had the same experience with teammates’ pull requests they ask me to review. They can’t be bothered to understand the thing, but then expect you to do it for them. Really disrespectful.
At the same time, there's also a huge number of annoying tech-brothers constantly shouting at artists something like, 'Your work was never valuable to begin with; why can't I copy your style? You're nothing but another matrix.'
I think if you're paying any attention to the state of the world, you can see labor is getting destroyed by capital - bad wages, worse working conditions including more surveillance, metrics everywhere, immoral companies, short contracts and unstable companies/career paths, increasing monopolization and consolidation of power. We were so insulated from this for so long that it's easy to not really grasp how bad things are for most workers. Now the precarity of our situation is dawning on us.
omg not everything is late stage capitalism
"Software engineering" will become vibe fixing.
There's many jobs that can be eliminated with software, but haven't because managers don't want to hire SWEs without proven value. I don't think HN realizes how big that market is.
With AI, the managers will replace their employees with a bunch of code they don't understand, watch that code fail in 3 years, and have to hire SWEs to fix it.
I'd bet those jobs will outnumber the ones initially eliminated by having non-technical people deliver the first iteration.
Many of those jobs will be high-skill/impact because they are necessarily focused on fixing stuff AI can't understand.
I try using an LLM for coding now and then, and tried again today, giving a model dedicated to coding a rather straightforward prompt and task.
The names all looked right, the comments were descriptive, and it had test cases demonstrating that the code works. It looked like something I'd expect a skilled junior or a senior to write.
The thing is, the code didn't work right, and the reasons it didn't work were quite subtle. Nobody would have fixed it without knowing how to do it themselves, and it took me nearly as long to figure out why as if I'd just written it in the first place.
I could see it being useful to a junior who hasn't solved a particular problem before and wanted to get a starting point, but I can't imagine using it as-is.
> They don’t require formal models or diagrams.
Nor do they produce those (do they?). That is what I would like to see. Formal models and diagrams are not needed to produce code. Their point is that they allow us to understand code and to formalize what we want it to do. That's what I'm hoping AI could do for me.
You miss the fundamental constraint. The bottleneck in software development was never typing speed or generation, but verification and understanding.
Even if LLMs worked perfectly without hallucinations (they don't and might never), a conscientious developer must still comprehend every line before shipping it. You can't review and understand code 10x faster just because an LLM generated it.
In fact, reviewing generated code often takes longer because you're reverse-engineering implicit assumptions rather than implementing explicit intentions.
The "10x productivity" narrative only works if you either:
- Are not actually reviewing the output properly
or
- Are working on trivial code where correctness doesn't matter.
Real software engineering, where bugs have consequences, remains bottlenecked by human cognitive bandwidth, not code generation speed. LLMs shifted the work from writing to reviewing, and that's often a net negative for productivity.
> Even if LLMs worked perfectly without hallucinations (they don't and might never), a conscientious developer must still comprehend every line before shipping it.
This seems excessive to me. Do you comprehend the machine code output of a compiler?
False analogy.
I must comprehend code at the abstraction level I am working at. If I write Python, I am responsible for understanding the Python code. If I write Assembly, I must understand the Assembly.
The difference is that Compilers are deterministic with formal specs. I can trust their translation. LLMs are probabilistic generators with no guarantees. When an LLM generates Python code, that becomes my Python code that I must fully comprehend, because I am shipping it.
That is why productivity is capped at review speed, you can't ship what you don't understand, regardless of who or what wrote it.
Compilers definitely don't have formal specs. Even CompCert mostly but doesn't entirely have them.
It can actually be worse when they do. Formalizing behavior means leaving out behavior that can't be formalized, which basically means if your language has undefined behavior then the handling of that will be maximally confusing, because your compiler can no longer have hacks for handling it in a way that "makes sense".
I think the analogy holds within the context of the statement I replied to:

> Even if LLMs worked perfectly without hallucinations...
Speaking of irony... did ChatGPT help you write this comment?
Please stop assuming that every comment that includes an em dash is AI. Em dashes are very useful!
Back in the day I used square brackets [but now I think they are too heavy].
Doesn't read like it to me. It has "reddit spacing", but enough information density and colloquialism that it's probably a person.
It doesn't read to me like AI content. It's also against HN guidelines to randomly suggest comments are AI generated.
The use of en dashes and short staccato sentences for rhetorical flourishes is a giveaway. AI writes like a LinkedIn post.
> Why? Because LLMs don’t just autocomplete. They generate. And in doing so, they challenge our identity, not just our workflows.
is what raised flags in my head. Rather than explain the difference between glorified autocompletion and generation, the post assumes there is a difference then uses florid prose to hammer in the point it didn't prove.
I've heard the paragraph "why? Because X. Which is not Y. And abcdefg" a hundred times. Deepseek uses it on me every time I ask a question.
Here’s the thing though…if you read enough of it, you’re gonna start using it a lot more often. It’s not just AI slop, it’s fundamentally rewiring how we as a society think in real time! It’s the classic copycat AI mannerisms cried wolf Problem!
Haha, got a laugh out of me assuming you were intentionally demonstrating by example :)
Alternately, the AI learned it from us, because we write that way.
Which came first...
I definitely didn't "randomly" suggest it, unless you're suggesting all human actions are the result of randomness. I also just re-read the guidelines and I didn't see anything about it in the letter of the law, but I agree it probably goes against the spirit. I'll take the downvotes and keep it to myself next time.
I think it's human-written but meant to sound like GPT cliches. Deliberately laying it on thick, as a pisstake.
Very interesting perspective. Thanks for sharing!
Let's connect on HackedIn!
It kills the magic of coding for sure. The thing is, now with everyone doing it, you get a ton of slop. Computing's become saturated as hell. We don't even need more code as it is. Before LLMs you could pretty much find what you needed on GitHub... now it's even worse.
AI slop.
- it’s not just X, it’s Y
- emdashes everywhere
People objected to CASE tools too, I think.
And while I don't categorically object to AI tools, I think you're selling objections to them short.
It's completely legitimate to want an explainable/comprehensible/limited-and-defined tool rather than an "it just works" tool. Ideally, this puts one in an "I know it's right" position rather than an "I scanned it and it looks generally right and seems to work" position.
lmao well done. can't tell if it's the topvoted reply because people got the joke, or because they didn't.
In many ways this feels like average software engineers telling on themselves. If you know the tech you're building, and you're good at splitting up your work, then you know ahead of time where the complexity is and you can tell the AI what level of granularity to build at. AI isn't magic; there is an upper limit to the complexity of a program that e.g. Sonnet 4 can write at once. If you can grok that limit, and you can grok the tech of your project, then you can tell the AI to build individual components that stay below that threshold. That works really well.
This is tautological. If you keep instructions dumbed-down enough for AI to work well, it will work well.
The problem is that AI needs to be spoon-fed overly detailed dos and don'ts, and even then the output can't be trusted without carefully checking it. It's easy to reach a point where breaking down the problem into pieces small enough for AI to understand takes more work than just writing the code.
AI may save time when it generates the right thing on the first try, but that's a gamble. The code may need multiple rounds of fixups, or end up needing a manual rewrite anyway, after wasting time and effort on instructing the AI. The ceiling of AI capabilities is very uneven and unpredictable.
Even worse, the AI can confidently generate code that looks superficially correct, but has subtle bugs/omissions/misinterpretations that end up costing way more time and effort than the AI saved. It has uncanny ability to write nicely structured, well-commented code that is just wrong.
But the hard part is figuring out the more complex parts. Getting that right is what takes the time, not typing in the more trivial parts.
The point is that good software engineers are good at doing the "hard part". So good that they have a backlog of "trivial" typing tasks. In a well functioning organization they would hand off the backlog of trivial parts to less experienced engineers, who might be herded by a manager. Now we don't need the less experienced engineers or the manager to herd them.
Not typing the trivial parts is pretty great though
I think most developers bypass the typing of the trivial part by just using a library or a framework. And sometimes typing trivial things can be relaxing, especially after an intense bout with a complex thing.
Being forced to type in trivial boilerplate means you're very motivated to abstract it. Not saying this'll offset anything but I can see AI making codebases much more verbose
Until it spends 10 minutes fucking up the trivial part, and then you're 10 minutes down and you still have to do it yourself.
It can be, but if you're familiar with what you're working with and have experience with other systems that have transferrable knowledge, again, it can be an advantage.
I was surprised with claude code I was able to get a few complex things done that I had anticipated to be a few weeks to uncover, stitch together and get moving.
Instead I pushed Claude to consistently present the correct understanding of the problem, structure, and approach to solving things, and only after that was OK was it allowed to propose changes.
True to its shiny-things corpus, it will overcomplicate things because it hasn't learned that less is more. Maybe that reflects the average code in its corpus.
Looking at how folks are setting up their claude.md and agents can go a long way if you haven't had a chance yet.
Is the implication here that you consider yourself an above-average engineer?
Might it not be the other way round? For all we know it's mediocre devs who are relishing the prospect of doing jack shit all day and still being able to submit some auto-generated PRs, being "amazed" at what it produces when someone with higher standards might be less than amazed.
I find it impossible to work out who to trust on the subject, given that I'm not working directly with them, so remain entirely on the fence.
Of course there is an upper limit for AI. There's an upper limit for humans too.
What you need is just boring project management. Have a proper spec, architecture and tasks split into manageable chunks with enough information to implement them.
Then you just start watching TV and say "implement github issue #42" to Claude and it'll get on with it.
But if you say "build me facebook" and expect a shippable product, you'll have a bad time.
I agree, and the fact that their list of scenarios behind these claims never actually mentions that some people really are 10x definitely points to them not being self-aware.
I'd be curious how skill atrophy affects engineers who use AI semi-exclusively for these trivial tasks.
I tried doing all of my work for ~1 month with copilot/claude. It didn’t cause a ton of atrophy, because it didn’t work - I couldn’t avoid actually getting involved in rewriting code
Sure, absolutely. But that's the hard part of software engineering. The dream is composing software from small components and adding new features by just writing more small components.
But nobody has ever managed to get there despite decades of research and work done in this area. Look at the work of Gerald Sussman (of SICP fame), for example.
So all you're saying is it makes the easy bit easier if you've already done, and continue to do, the hard bit. This is one of the points made in TFA. You might be able to go 200mph in a straight line, but you always need to slow down for the corners.
You didn't read the article
I thought this would be another AI hate article, but it made some great points.
One thing that AI has helped me with is finding pesky bugs. I mainly work on numerical simulations. At one point I was stuck for almost a week trying to figure out why my simulation was acting so strange. Finally I pulled up chatgpt, put some of my files into the context and wrote a prompt explaining the strange behavior and what I thought might be happening. In a few seconds it figured out that I had improperly scaled one of my equations. It came down to a couple missing parentheses, and once I fixed it the simulation ran perfectly.
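(The class of bug being described tends to look something like this; a made-up toy example, not the actual simulation code.)

    # Missing parentheses mean only part of the right-hand side gets scaled by dt.
    dt, tau, k, x = 0.01, 2.0, 0.2, 1.0

    correct = x + dt * (-x / tau + k)  # whole derivative scaled by dt   -> ~0.997
    wrong   = x + dt * -x / tau + k    # forcing term k not scaled by dt -> ~1.195

    print(correct, wrong)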
This has happened a few times where AI was easily able to see something I was overlooking. Am I a 10x developer now that I use AI? No... but when used well, AI can have a hugely positive impact on what I am able to get done.
This is my experience. Code generation is OK if uneven, but debugging can be a big boost.
It’s a rubber duck that’s pretty educated and talks back.
Indeed. As a (mostly) hobbyist programmer LLMs have been a godsend for those late night coding sessions when the brain fog is thick.
Yep, same experience here. Saved me an infinite amount of time, so to me that puts me somewhere between 10x and infinity, ha.
I don't consider myself a 10x engineer. The number one thing that I've realized makes me more productive than other engineers at my company is thinking through system design and business needs, reaching for patterns rather than taking badly written product tickets literally.
What I've seen with AI is that it does not save my coworkers from the pain of overcomplicating simple things that they don't really think through clearly. AI does not seem to solve this.
> What I've seen with AI is that it does not save my coworkers from the pain of overcomplicating simple things that they don't really think through clearly. AI does not seem to solve this.
100%. The biggest challenge with software is not that it’s too hard to write, but that it’s too easy to write.
I don't consider myself a 2x engineer; my company tells me that by not paying me 2x what my colleagues make, even though I know (and others believe it too) that I deliver more than 2x their output.
Using AI will change nothing in this context.
Counter: you are looking at it wrong. You can get work done in 1/2 of the time it used to. Now you got 1/2 of the day to just mess around. Socialize or network. It’s not necessarily that you’re producing 2x.
> You can get work done in 1/2 of the time it used to. Now you got 1/2 of the day to just mess around. Socialize or network.
This has never been the case in any company I've ever worked at. Even if you can finish your day's work in, say, 4 hours, you can't just dip out for the other 4 hours of the day.
Managers and teammates expect you to be available at the drop of a hat for meetings, incidents, random questions, "emergencies", etc.
Most jobs I've worked at eventually devolve into something like "Well, I've finished what I wanted to finish today. I could either stare at my monitor for the rest of the day waiting for something to happen, or I could go find some other work to do. Guess I'll go find some other work to do since that's slightly less miserable".
You also have to delicately "hide" the fact that you can finish your work significantly faster than expected. Otherwise the expectations of you change and you just get assigned more work to do.
I'll go even further. I've been the guy who gets all his work done way faster than others. You know what happens? I get assigned way more work than most people until I am overflowing and the literal bottleneck is my peers being able to code review everything I do. Yet, I am still blamed for overproducing then too cause now I am creating too much work for my peers!
Literally unwinnable scenarios. Only way to succeed is to just sit your ass in the chair. Almost no manager actually cares about your actual output - they all care about presentation and appearances.
That's corporate jobs for you. It's about appearance, not results. That's why you make a big deal out of everything you work on.
If you're remote, you can. This is the crux of why a lot of developers love remote work and management hates it.
Remote is dead. They clawed it back as soon as they could. I'd argue it's even harder to get a remote position now than it was before COVID.
> If you're remote, you can
Uh, no?
It's easier, since you don't have to stare at your monitor for 4 hours straight. But still, people expect availability since you're paid for 8 hours.
No, but you can go onto hn and shitpost every 30 minutes, instead of only being able to do it twice a day previously.
This is the way.
I had a task to do a semi-complex UI addition, the whole week was allocated for that.
I sicced the corp-approved GitHub Copilot (with 4o and Claude 3.7) on it, and it was done in an afternoon. It's ~95% functionally complete, but ugly as sin. (The model didn't understand our specific Tailwind classes.)
Now I can spend the rest of the week on polish.
The first red flag there is "2x their output". You can find many an anecdote where a good engineer produced a better solution in fewer lines of code (or sometimes, by removing code — the holy grail).
So always aim for outcomes, not output :)
At my company, we did promote people quickly enough that they are now at close to double the salaries they started with a year or so ago, due to their added value as engineers on the team. It gets tougher as they get into senior roles, but even there, there's quite a bit of room for differentiation.
Additionally, since this is a market, you should not even expect to be paid twice for 2x value provided — then it makes no difference to a company if they get two 1x engineers instead, and you are really not that special if you are double the cost. So really, the "fair" value is somewhere in between: 1.5x to equally reward both parties, or leaning one way or the other :)
When I go to buy 2 bottles of milk I am never offered to get it for 1.x the price of one bottle. I don't see any way it is fair to deliver double and get just 1.5x, in a hypothetical scenario just for the sake of the discussion. The suggestion to work 50% of the time and relax, socialize and network the other 50% is way more reasonable, when possible (not in my case).
this is pedantry but isn't that literally what BOGO or similar coupons do? so you're probably offered it a lot
No one said anything about lines of code. I would assume output here means features completed, tickets knocked out, tasks completed etc.
Even so, tickets munched out or tasks completed is still "output" — sometimes you could provide more value by avoiding tickets that are not bringing benefits to customers or business, solving things customers need and not what they think they need, suggesting solutions which are 5% of work yet provide 90% of the value, etc.
My job is to do what's asked of me. Do the stories, knock out the tickets. It helps me do that faster. That's a crazy far goalpost move from you.
This article sets a ludicrous bar ("10x"), then documents the author's own attempt over some indeterminate time to clear that bar. As a result, the author has classified all the AI-supporters in the industry into three categories: (1) people who are wrong in good faith, (2) people who are selling AI tools, and (3) evil bosses trying to find leverage in programmer anxiety.
That aside: I still think complaining about "hallucination" is a pretty big "tell".
> I still think complaining about "hallucination" is a pretty big "tell".
The conversation around LLMs is so polarized. Either they’re dismissed as entirely useless, or they’re framed as an imminent replacement for software developers altogether.
Hallucinations are worth talking about! Just yesterday, for example, Claude 4 Sonnet confidently told me Godbolt was wrong wrt how clang would compile something (it wasn’t). That doesn’t mean I didn’t benefit heavily from the session, just that it’s not a replacement for your own critical thinking.
Like any transformative tool, LLMs can offer a major productivity boost but only if the user can be realistic about the outcome. Hallucinations are real and a reason to be skeptical about what you get back; they don’t make LLMs useless.
To be clear, I’m not suggesting you specifically are blind to this fact. But sometimes it’s warranted to complain about hallucinations!
That's not what people mean when they bring up "hallucinations". What the author apparently meant was that they had an agent generating Terraform for them, and that Terraform was broken. That's not surprising to me! I'm sure LLMs are helpful for writing Terraform, but I wouldn't expect that agents are at the point of being able to reliably hand off Terraform that actually does anything, because I can't imagine an agent being given permission to iterate Terraform. Now have an agent write Java for you. That problem goes away: you aren't going to be handed code with API calls that literally don't exist (this is what people mean by "hallucination"), because that code wouldn't pass a compile or linter pass.
Are we using the same LLMs? I absolutely see cases of "hallucination" behavior when I'm invoking an LLM (usually sonnet 4) in a loop of "1 generate code, 2 run linter, 3 run tests, 4 goto 1 if 2 or 3 failed".
Usually, such a loop just works. In the cases where it doesn't, often it's because the LLM decided that it would be convenient if some method existed, and therefore that method exists, and then the LLM tries to call that method and fails in the linting step, decides that it is the linter that is wrong, and changes the linter configuration (or fails in the test step, and updates the tests). If in this loop I automatically revert all test and linter config changes before running tests, the LLM will receive the test output and report that the tests passed, and end the loop if it has control (or get caught in a failure spiral if the scaffold automatically continues until tests pass).
It's not an extremely common failure mode, as it generally only happens when you give the LLM a problem where it's both automatically verifiable and too hard for that LLM. But it does happen, and I do think "hallucination" is an adequate term for the phenomenon (though perhaps "confabulation" would be better).
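(For anyone who hasn't seen one of these loops, a stripped-down sketch of the shape is below. The model call is a placeholder, and ruff/pytest are just example lint/test commands, not a claim about any particular project's setup.)

    import subprocess

    def passes(cmd):
        # Run a command and report whether it exited cleanly.
        return subprocess.run(cmd, capture_output=True).returncode == 0

    def generate_patch(feedback):
        # Placeholder: call your model/agent here, passing along the last failure.
        pass

    feedback = ""
    for attempt in range(5):                               # cap the retries
        generate_patch(feedback)                           # 1. generate code
        if passes(["ruff", "check", "."]) and passes(["pytest", "-q"]):
            break                                          # 2 + 3. lint and tests pass
        feedback = "lint or tests failed"                  # 4. go to 1 with feedback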
Aside:
> I can't imagine an agent being given permission to iterate Terraform
Localstack is great and I have absolutely given an LLM free rein over terraform config pointed at localstack. It has generally worked fine and written the same tf I would have written, but much faster.
With terraform, using a property or a resource that doesn't exist is effectively the same as an API call that does not exist. It's almost exactly the same really, because under the hood terraform will try to make a gcloud/aws API call with your param and it will not work because it doesn't exist. You are making a distinction without a difference. Just because it can be caught at runtime doesn't make it insignificant.
Anyway, I still see hallucinations in all languages, even javascript, attempting to use libraries or APIs that do not exist. Could you elaborate on how you have solved this problem?
> Anyway, I still see hallucinations in all languages, even javascript, attempting to use libraries or APIs that do not exist. Could you elaborate on how you have solved this problem?
Gemini CLI (it's free and I'm cheap) will run the build process after making changes. If an error occurs, it will interpret it and fix it. That will take care of it using functions that don't exist.
I can get stuck in a loop, but in general it'll get somewhere.
Yeah, again, zero trouble believing that agents don't reliably produce sane Terraform.
As if a compiler or linter is the sole arbiter of correctness.
Nobody said anything about "correctness". Hallucinations aren't bugs. Everybody writes bugs. People writing code don't hallucinate.
It's a pretty obvious rhetorical tactic: everybody associates "hallucination" with something distinctively weird and bad that LLMs do. Fair enough! But then they smuggle more meaning into the word, so that any time an LLM produces anything imperfect, it has "hallucinated". No. "Hallucination" means that an LLM has produced code that calls into nonexistent APIs. Compilers can and do in fact foreclose on that problem.
Speaking of rhetorical tactics, that's an awfully narrow definition of LLM hallucination designed to evade the argument that they hallucinate.
If, according to you, LLMs are so good at avoiding hallucinations these days, then maybe we should ask an LLM what hallucinations are. Claude, "in the context of generative AI, what is a hallucination?"
Claude responds with a much broader definition of the term than you have imagined -- one that matches my experiences with the term. (It also seemingly matches many other people's experiences; even you admit that "everybody" associates hallucination with imperfection or inaccuracy.)
Claude's full response:
"In generative AI, a hallucination refers to when an AI model generates information that appears plausible and confident but is actually incorrect, fabricated, or not grounded in its training data or the provided context.
"There are several types of hallucinations:
"Factual hallucinations - The model states false information as if it were true, such as claiming a historical event happened on the wrong date or attributing a quote to the wrong person.
"Source hallucinations - The model cites non-existent sources, papers, or references that sound legitimate but don't actually exist.
"Contextual hallucinations - The model generates content that contradicts or ignores information provided in the conversation or prompt.
"Logical hallucinations - The model makes reasoning errors or draws conclusions that don't follow from the premises.
"Hallucinations occur because language models are trained to predict the most likely next words based on patterns in their training data, rather than to verify factual accuracy. They can generate very convincing-sounding text even when "filling in gaps" with invented information.
"This is why it's important to verify information from AI systems, especially for factual claims, citations, or when accuracy is critical. Many AI systems now include warnings about this limitation and encourage users to double-check important information from authoritative sources."
What is this supposed to convince me of? The problem with hallucinations is (was?) that developers were getting handed code that couldn't possibly have worked, because the LLM unknowingly invented entire libraries to call into that don't exist. That doesn't happen with agents and languages with any kind of type checking. You can't compile a Rust program that does this, and agents compile Rust code.
Right across this thread we have the author of the post saying that when they said "hallucinate", they meant that if they watched they could see their async agent getting caught in loops trying to call nonexistent APIs, failing, and trying again. And? The point isn't that foundation models themselves don't hallucinate; it's that agent systems don't hand off code with hallucinations in it, because they compile before they hand the code off.
If I ask an LLM to write me a skip list and it instead writes me a linked list and confidently but erroneously claims it's a skip list, then the LLM hallucinated. It doesn't matter that the code compiled successfully.
Get a frontier model to write an slist when you asked for a skip list. I'll wait.
Hi there! I appreciate your comment, and I remember reading your article about AI and some of the counterarguments to it helped me get over the imposter syndrome I was feeling.
To be clear, I did not classify "all the AI-supporters" as being in those three categories, I specifically said the people posting that they are getting 10x improvements thanks to AI.
Can you tell me about what you've done to no longer have any hallucinations? I notice them particularly in a language like Terraform, the LLMs add properties that do not exist. They are less common in languages like Javascript but still happen when you import libraries that are less common (e.g. DrizzleORM).
Can you help me understand which articles you're referring to? A link to the biggest "AI made me a 10x developer" article you've read would certainly clear this up.
My goal here was not to publicly call out any specific individual or article. I don't want to make enemies and I don't want to be cast as dunking on someone. I get that that opens me up to criticism that I'm fighting a strawman, I accept that.
Your article does not specifically say 10x, but it does say this:
> Kids today don’t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They’ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.
> “I’m sipping rocket fuel right now,” a friend tells me. “The folks on my team who aren’t embracing AI? It’s like they’re standing still.” He’s not bullshitting me. He doesn’t work in SFBA. He’s got no reason to lie.
That's not quantifying it specifically enough to say "10x", but it is saying no uncertain terms that AI engineers are moving fast and everyone else is standing still by comparison. Your article was indeed one of the ones I specifically wanted to respond to as the language directly contributed to the anxiety I described here. It made me worry that maybe I was standing still. To me, the engineer you described as sipping rocket fuel is an example both of the "degrees of separation" concept (it confuses me you are pointing to a third party and saying they are trustworthy, why not simply describe your workflow?), and the idea that a quick burst of productivity can feel huge but it just doesn't scale in my experience.
Again, can you tell me about what you've done to no longer have any hallucinations? I'm fully open to learning here. As I stated in the article, I did my best to give full AI agent coding a try, I'm open to being proven wrong and adjusting my approach.
I believe that quote in Thomas’ blog can be attributed to me. I’ve at least said something near enough to him that I don’t mind claiming it.
I _never_ made the claim that you could call that 10x productivity improvement. I’m hesitant to categorize productivity in software in numeric terms as it’s such a nuanced concept.
But I’ll stand by my impression that a developer using ai tools will generate code at a perceptibly faster pace than one who isn’t.
I mentioned in another comment the major flaw in your productivity calculation, is that you aren’t accounting for the work that wouldn’t have gotten done otherwise. That’s where my improvements are almost universally coming from. I can improve the codebase in ways that weren’t justifiable before in places that do not suffer from the coordination costs you rightly point out.
I no longer feel like my peers are standing still, because they’ve nearly uniformly adopted ai tools. And again, you rightly point out, there isn’t much of a learning curve. If you could develop before them you can figure out how to improve with them. I found it easier than learning vim.
As for hallucinations, I don’t experience them effectively _ever_. And I do let agents mess with terraform code (in code bases where I can prevent state manipulation or infrastructure changes outside of the agent’s control).
I don’t have any hints on how. I’m using a pretty vanilla Claude Code setup. But I’m not sure how an agent that can write and run compile/test loops could hallucinate.
Appreciate the comment!
> I mentioned in another comment the major flaw in your productivity calculation, is that you aren’t accounting for the work that wouldn’t have gotten done otherwise. That’s where my improvements are almost universally coming from. I can improve the codebase in ways that weren’t justifiable before in places that do not suffer from the coordination costs you rightly point out.
I'm a bit confused by this. There is work that apparently is unlocking big productivity boosts but was somehow not justified before? Are you referring to places like my ESLint rule example, where eliminating the startup costs of learning how to write one allows you to do things you wouldn't have previously bothered with? If so, I feel like I covered this pretty well in the article and we probably largely agree on the value that productivity boost. My point is still stands that that doesn't scale. If this is not what you mean, feel free to correct me.
Appreciate your thoughts on hallucinations. My guess is the difference between what we're experiencing is that in your code hallucinations are still happening but getting corrected after tests are run, whereas my agents typically get stuck in these write-and-test loops and can't figure out how to solve the problem, or it "solves" it by deleting the tests or something like that. I've seen videos and viewed open source AI PRs which end up in similar loops as to what I've experienced, so I think what I see is common.
Perhaps that's an indication that we're trying to solve different problems with agents, or using different languages/libraries, and that explains the divergence of experiences. Either way, I still contend that this kind of productivity boost is likely going to be hard to scale and will get tougher to realize as time goes on. If you keep seeing it, I'd really love to hear more about your methods to see what I'm missing. One thing that has been frustrating me is that people rarely share their workflows after making big claims. This is unlike previous hype cycles, where people would share descriptions of exactly what they did ("we rewrote in Rust, here's how we did it", etc.). Feel free to email me at the address in my about page[1] or send me a request on LinkedIn or whatever. I'm being 100% genuine that I'd love to learn from you!
> but getting corrected after tests are run, whereas my agents typically get stuck in these write-and-test loops
This may be a definition problem, then. I don’t think “the agent did a dumb thing that it can’t reason out of” is a hallucination. To me a hallucination is a pretty specific failure mode: it invents something that doesn’t exist. Models still do that for me, but the build/test loop sets them right on that nearly perfectly. So I guess the model is still hallucinating but the agent isn’t, so the output is unaffected. So I don’t care.
For the "agent is dumb" scenario, I aggressively delete and reprompt. This is something I've actually gotten much better at with time and experience, both so it doesn't happen often and so I can course-correct quickly. I find it works nearly as well for teaching me about the problem domain as my own mistakes do, but is much faster to get to.
But if I were going to be pithy: aggressively deleting work output from an agent is part of their value proposition. They don't get offended and they don't need explanations why. Of course they don't learn well either; that's on you.
What I'm saying is that the model will get into one of these loops where it needs to be killed, and when I look at some of the intermediate states and the reasons for failure, they are because it hallucinated things, ran tests, and got an error. Does that make sense?
Deleting and re-prompting is fine. I do that too. But even one cycle of that often means the whole prompting exercise takes me longer than if I just wrote the code myself.
I think maybe this is another disconnect. A lot of the advantage I get does not come from the agent doing things faster than me, though for most tasks it certainly can.
A lot of the advantage is that it can make forward progress when I can’t. I can check to see if an agent is stuck, and sometimes reprompt it, in the downtime between meetings or after lunch before I start whatever deep thinking session I need to do. That’s pure time recovered for me. I wouldn’t have finished _any_ work with that time previously.
I don't need to optimize my time around babysitting the agent. I can do that in the margins. Watching the agents is low-context work. That adds the capability to generate working solutions during time that was previously barred from that.
I've done a few of these hands-off, go-to-a-meeting style interactions. It has worked a few times, but I tend to find that they overdo it or cause issues. Like you ask them to fix an error and they add a try/catch, swallow the error, and call it a day. Or the PR has 1,000 lines of changes when it should have two.
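To make that concrete, here is a minimal sketch of the kind of "fix" I mean (hypothetical function and config key, Python purely for illustration):

    # Asked: "fix the crash when retry_limit is missing from the config"
    def load_retry_limit(config: dict) -> int:
        try:
            return int(config["retry_limit"])
        except Exception:
            # The error disappears, and so does the signal: a missing or
            # malformed key silently becomes 0 and the bug moves downstream.
            return 0

    # What I actually wanted: surface the problem explicitly.
    def load_retry_limit_strict(config: dict) -> int:
        if "retry_limit" not in config:
            raise KeyError("config is missing required key 'retry_limit'")
        return int(config["retry_limit"])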
Either way, I'm happy that you are getting so much out of the tools. Perhaps I need to prompt harder, or the codebase I work on has just deviated too much from the stuff the LLMs like and simply isn't a good candidate. Either way, appreciate talking to you!
> One thing that has been frustrating me is that people rarely share their workflows after making big claims
Good luck ever getting that. I've asked that about a dozen times on here from people making these claims and have never received a response. And I'm genuinely curious as well, so I will continue asking.
People share this stuff all the time. Kenton Varda published a whole walkthrough[1], prompts and all. Stories about people's personal LLM workflows have been on the front page here repeatedly over the last few months.
What people aren't doing is proving to you that their workflows work as well as they say they do. You want proof, you can DM people for their rate card and see what that costs.
Thanks for sharing and that is interesting to read through. But it's still just a demo, not live production code. From the readme:
> As of March, 2025, this library is very new, prerelease software.
I'm not looking for personal proof that their workflows work as well as they say they do.
I just want an example of a project in production with active users depending on the service for business functions that has been written 1.5/2/5/10/whatever x faster than it otherwise would have without AI.
Anyone can vibe code a side project with 10 users or a demo meant to generate hype/sales interest. But I want someone to actually have put their money where their mouth is and give an example of a project that would have legal, security, or monetary consequences if bad code was put in production. Because those are the types of projects that matter to me when trying to evaluate people's claims (since those are what my paycheck actually depends on).
Do you have any examples like that?
Dude.
That code tptacek linked you to? It's part of our (Cloudflare's) MCP framework. Which means all of the companies mentioned in this blog post are using this code in production today: https://blog.cloudflare.com/mcp-demo-day/
There you go. This is what you are looking for. Why are you refusing to believe it?
(OK fine. I guess I should probably update the readme to remove that "prerelease" line.)
Lol misunderstanding a disclaimer in a readme is not refusing to believe something. But my apologies and appreciate the clarification.
Yeah OK fair that line in the readme is more prominent than I remember it being.
I never look at my own readmes so they tend to get outdated. :/
Fixing: https://github.com/cloudflare/workers-oauth-provider/pull/59
See, I just shared Kenton Varda describing his entire workflow, and you came back asking that I please show you a workflow that you would find more credible. Do you want to learn about people's workflows, or do you want to argue with them that their workflows don't work? Nobody is interested in doing the latter with you.
I don't think you understood me at all. I don't care about the actual workflow. I just want an example of a project that:
1. Would have legal, security, or monetary consequences if bad code was put in production
2. Was developed using an AI/LLM/agent/etc. that made the development many times faster than it otherwise would have been (as so many people claim)
I would love to hear an example where someone says: "I used Claude to develop this hosting/ecommerce/analytics/inventory management service that is used in production by 50 paying companies. Using an LLM we deployed the project in 4 weeks where it would normally take us 4 months." Or: "We updated an out-of-date code base for a client in half the time it would normally take and have not seen any issues since launch."
At the end of the day I code to get paid. And it would really help to be able to point to actual cases where both money and negative consequences of failure are on the line.
So if you have any examples please share. But the more people deflect the more skeptical I get about their claims.
Seems like I understand you pretty well! If you wanted to talk about workflows in a curious and open way, your best bet would have been finishing that comment with something other than "the more people deflect the more skeptical I get". Stay skeptical! You do you.
Sorry if I came off as prickly, but it wasn't exactly like your parent comment was much kinder.
I mean it's pretty simple - there are a lot of big claims that I read but very few tangible examples that people share where the project has consequences for failure. Someone else replied with some helpful examples in another thread. If you want to add another one feel free, if not that's cool too.
It almost feels like sealioning. People say nobody shares their workflow, so I share it. They say well that's not production code, so I point to PRs in active projects I'm using, and they say well that doesn't demonstrate your interactive flow. I point out the design documents and prompts and they say yes but what kind of setup do you do, which MCP servers are you running, and I point them at my MCP repo.
At some point you have to accept that no amount of proof will convince someone that refuses to be swayed. It's very frustrating because, while these are wonderful tools already, its clear that the biggest thing that makes a positive difference is people using and improving them. They're still in relative infancy.
I want to have the kind of conversations we had back at the beginning of web development, when people were delighted at what was possible despite everything being relatively awful.
I don't care about your workflow, that can be figured out from the 10,000 blog posts all describing the same thing. My issue is with people claiming this huge boost in productivity only to find out that they are working on code bases that have no real consequence if something fails, breaks, or doesn't work as intended.
Since my day job is creating systems that need to be operational and predictable for paying clients - examples of front end mockups, demos, apps with no users, etc don't really matter that much at the end of the day. It's like the difference between being a great speaker in a group of 3 friends vs standing up in front of a 30 person audience with your job on the line.
If you have some examples, I'd love to hear about them because I am genuinely curious.
Sure, I'm working on a database proxy in Rust at the moment; if you hop on GitHub, same username. It's not pure AI in the PRs, but I know approximately no Rust, so AI support has been absolutely critical. I added support for parsing binary timestamps from PG's wire format, as an example.
I spent probably a day building prompts and tests and getting an example of the failing behavior in Python, and then I wrote pseudocode and had it implement and write comprehensive unit tests in Rust. About three passes and manual review of every line. I also have an MCP that calls out to o3 for a second-opinion code review and passes it back in.
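If you're wondering what "binary timestamps in PG's wire format" involves, here's a rough sketch in Python (not my actual Rust code, just an illustration): in binary result mode a timestamp is an int64 count of microseconds since 2000-01-01, sent big-endian.

    import struct
    from datetime import datetime, timedelta, timezone

    PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

    def parse_pg_binary_timestamp(buf: bytes) -> datetime:
        # 8 bytes, network byte order, microseconds since the Postgres epoch.
        (micros,) = struct.unpack(">q", buf)
        return PG_EPOCH + timedelta(microseconds=micros)

    # 2024-01-01 is 8766 days (757,382,400 seconds) after 2000-01-01.
    example = struct.pack(">q", 757_382_400 * 1_000_000)
    assert parse_pg_binary_timestamp(example) == datetime(2024, 1, 1, tzinfo=timezone.utc)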
Very fun stuff
I use agentic flows writing code that deals with millions of pieces of financial data every day.
I rolled out a PR that was a one shot change to our fundamental storage layer on our hot path yesterday. This was part of a large codebase and that file has existed for four years. It hadn’t been touched in 2. I literally didn’t touch a text editor on that change.
I have first hand experience watching devs do this with payment processing code that handles over a billion dollars on a given day.
Thanks, it's quite helpful to hear examples like that.
When you say you didn't touch a text editor, do you mean you didn't review the code change or did you just look at the diff in the terminal/git?
I reviewed that PR in the GitHub web gui and in our CI/CD gui. It was one of several PRs that I was reviewing at the time, some by agents, some by people and some by a mix.
Because I was the instigator of that change a second code owner was required to approve the PR as well. That PR didn't require any changes, which is uncommon but not particularly rare.
It is _common_ for me to only give feedback to the agents via the GitHub gui, the same way I do with humans. Occasionally I have to pull the PR down locally and use the full powers of my dev environment to review, but I don't think that is any more common than with people. If anything it's less common because of the tasks the agents get: typically they either do well or I kill the PR without much review.
> But I’ll stand by my impression that a developer using ai tools will generate code at a perceptibly faster pace than one who isn’t.
And this is the problem.
Masterful developers are the ones you pay to reduce lines of code, not create them.
Every. Single. Time. When you say on the internet that you get productivity gains from AI tools, someone will tell you that you weren't good at your job before the AI tooling.
Perhaps start from the assumption that I have in fact spent a fair bit of time doing this job at a high level. Where does that mental exercise take you with regard to your own position on AI tools?
In fact, you don’t have to assume I’m qualified to speak on the subject. Your retort assumes that _everyone_ who gets improvement is bad at this. Assume any random proponent isn’t.
I think what GP is saying is that in most cases generating a lot of code is not a good thing. Every line of LLM-generated code has to be audited because LLMs are prone to hallucinations, and auditing someone else's code is much more difficult and time consuming than auditing your own. A lot of code also requires more maintenance.
The comment is premised on the idea that Kasey either doesn't know what a "masterful developer" is or needs to be corrected back to it.
It's a commentary on one of the things I perceive as a flaw with LLMs, not you.
One of the most valuable qualities of humans is laziness.
We're constantly seeking efficiency gains, because who wants to carry buckets of water, or take laundry down to the river?
Skilled developers excel at this. They are "lazy" when they code - they plan for the future, they construct code in a way that will make their life better, and easier.
LLMs don't have this motivation. They will gleefully spit out 1000 lines of code when 10 will do.
It's a fundamental flaw.
Now, go back and contemplate what my feedback means if I am well versed in Larry Wall-isms.
Wait, now you're saying I set the 10x bar? No, I did not.
> Wait, now you're saying I set the 10x bar? No, I did not.
I distinctly did not say that. I said your article was one of the ones that made me feel anxious. And it's one of the ones that spurred me to write this article. I demonstrated how your language implies a massive productivity boost from AI. Does it not? Is this not the entire point of what you wrote? That engineers who aren't using AI are crazy (literally the title) because they are missing out on all this "rocket fuel" productivity? The difference between rocket fuel and standing still has to be a pretty big improvement.
The points I make here still apply, there is not some secret well of super-productivity sitting out in the open that luddites are just too grumpy to pick up and use. Those who feel they have gotten massive productivity boosts are being tricked by occasional, rare boosts in productivity.
You said you solved hallucinations, could you share some of how you did that?
I asked for an example of one of the articles you'd read that said that LLMs were turning ordinary developers into 10x developers. You cited my article. My article says nothing of the sort; I find the notion of "10x developers" repellant.
If you really need some, there are links in another comment. Another one that made me really wonder if I was missing the bus, and that makes 10x claims repeatedly, is this YC podcast episode[1]. But again, I'm not trying to write a point-by-point counter of a specific article or video, but of a general narrative. If you want that for your article, Ludicity does a better job eviscerating your post than I ever could: https://ludic.mataroa.blog/blog/contra-ptaceks-terrible-arti...
I'm trying to write a piece to comfort those that feel anxious about the wave of articles telling them they aren't good enough, that they are "standing still", as you say in your article. That they are crazy. Your article may not say the word 10x, but it makes something extremely clear: you believe some developers are sitting still and others are sipping rocket fuel. You believe AI skeptics are crazy. Thus, your article is extremely natural to cite when talking about the origin of this post.
You can keep being mad at me for not providing a detailed target list; I've said several times that that's not the point of this. You can keep refusing to actually elaborate on how you use AI day to day and solve its problems. That's fine. I don't care. I care a lot more about talking to the people who are actually engaging with me (such as your friend) and helping me to understand what they are doing. Right now, if you're going to keep not actually contributing to the conversation, you're just kinda being a salty guy with an almost unfathomable 408,000 karma going through every HN thread every single day and making hot takes.
how much faster does an engine on rocket fuel go, than one not on rocket fuel?
The article in question[0] has the literal tag line:
> My AI Skeptic Friends Are All Nuts
how much saner is someone who isn't nuts to someone who is nuts? 10x saner? What do the specific numbers matter given you're not writing a paper?
You're enjoying the clickbait benefits of using strong language and then acting offended when someone calls you out on it. Yes, maybe you didn't literally say "10x", but you said or quoted things in exactly that same ballpark, and it's worthy of a counterpoint like the one the OP has provided. They're both interesting articles with strong opinions that make the world a more interesting place, so idk why you're trying to disown the strength with which you wrote your article.
I'm not complaining about "strong language", I'm saying: my post didn't say anything about "10x developers", and was just cited to me as the source of this post's claims about 10x'ing.
I'm not offended at all. I'm saying: no, I'm not a valid cite for that idea. If the author wants to come back and say "10x developer", a term they used twenty five times in this piece, was just a rhetorical flourish, something they conjured up themselves in their head, that's great! That would resolve this small dispute neatly. Unfortunately: you can't speak for them.
10x is a meme in our industry that relates to developer productivity and I think it well reflects the sort of productivity gain that someone would be "nuts" to be skeptical about. You might not have specifically said "10x" but I imagine many people left your article believing that agentic AI is the "next 10x" productivity boost.
They used it 25 times in their piece, and your piece stated that being interested in "the craft" is something people should do on their own time from now on. That strongly implies, if not outright states, that the processes and practices we've refined over the past 70 years of software engineering need to move aside for the new hotness that has only been out for 6 months. Sure, you never said "10x", but to me it read entirely like you're doing the "10x" dance. It was a good article and it definitely has inspired me to check it out.
No. There's all sorts of software engineering craft that usually has no place on the job site; for instance, there's a huge amount of craft in learning pure-functional languages like Haskell, but nobody freaks out when their teams decide people can't randomly write Haskell code instead of the Python and Rust everyone else is writing. You're extrapolating because you're trying to defend your point, but the point you're trying to make is that I meant to communicate something in my own article that I not only never said, but also find repellant.
Sure, I'm extrapolating what I read as strong language in your article as being a direct attack on making the code precise and flexible over good enough to ship (mediocre code, first-pass, etc). I imagine this might continue to be a battleground as adoption increases, especially at orgs with less engineering culture, in order to drive down costs and increase agentic throughput.
However there is a bit of irony in that you're happy to point out my defensiveness as a potential flaw when you're getting hung up on nailing down the "10x" claim with precision. As an enjoyer of both articles I think this one is a fair retort to yours, so I think it a little disappointing to get distracted by the specifics.
If only we could accurately measure 1x developer productivity, I imagine the truth might be a lot clearer.
Again, as you've acknowledged, there's a whole meme structure in the industry about what a "10x" programmer is. I did not claim that LLMs turn programmers into "10x programmers", because I do not believe in "10x" programmers to begin with. I'm not being defensive, I'm rebutting a (false) factual claim. It's very clearly false; you can just read the piece and see for yourself.
> I'm not being defensive, I'm rebutting a (false) factual claim.
You're rebutting a claim about your rant that, if it ever did exist, has been backed away from and disowned several times.
From [0]
> > Wait, now you're saying I set the 10x bar? No, I did not.
>
> I distinctly did not say that. I said your article was one of the ones that made me feel anxious. And it's one of the ones that spurred me to write this article.
and from [1]
> I'm trying to write a piece to comfort those that feel anxious about the wave of articles telling them they aren't good enough, that they are "standing still", as you say in your article. That they are crazy. Your article may not say the word 10x, but it makes something extremely clear: you believe some developers are sitting still and others are sipping rocket fuel. You believe AI skeptics are crazy. Thus, your article is extremely natural to cite when talking about the origin of this post.
A cursory scroll on X, LinkedIn, etc... will show you.
That seemed to me to be the author's point.
His article resonated with me. After 30 years of development and dealing with hype cycles, offshoring, no-code "platforms", endless framework churn (this next version will make everything better!), coder tribes ("if you don't do TypeScript, you're incompetent and should be fired"), endless bickering, improper tech adoption following the FAANGs (your startup with 0 users needs Kubernetes?), and a gazillion other annoyances we're all familiar with, this AI stuff might be the thing that makes me retire.
To be clear: it's not AI that I have a problem with. I'm actually deeply interested in it and actively researching it from the math up.
I'm also a big believer in it, I've implemented it in a few different projects that have had remarkable efficiency gains for my users, things like automatically extracting values from a PDF to create a structured record. It is a wonderful way to eliminate a whole class of drudgery based tasks.
No, the thing that has me on the verge of throwing in the towel is the wholesale rush towards devaluing human expertise.
I'm not just talking about developers, I'm talking about healthcare providers, artists, lawyers, etc...
Highly skilled professionals that have, in some cases, spent their entire lives developing mastery of their craft. They demand a compensation rate commensurate to that value, and in response society gleefully says "meh, I think you can be replaced with this gizmo for a fraction of the cost."
It's an insult. It would be one thing if it were true; then my objection could safely be dismissed as the grumbling of a buggy whip manufacturer. However, this is objectively, measurably wrong.
Most of the energy of the people pushing the AI hype goes towards obscuring this. When objective reality is presented to them in irrefutable ways, the response is inevitably: "but the next version will!"
It won't. Not with the current approach. The stochastic parrot will never learn to think.
That doesn't mean it's not useful. It demonstrably is, it's an incredibly valuable tool for entire classes of problems, but using it as a cheap replacement for skilled professionals is madness.
What will the world be left with when we drive those professionals out?
Do you want an AI deciding your healthcare? Do you want a codebase that you've invested your life savings into written by an AI that can't think?
How will we innovate? Who will be able to do fundamental research and create new things? Why would you bother going into the profession at all? So we're left with AIs training on increasingly polluted data, and relying on them to push us forward. It's a farce.
I've been seriously considering hanging up my spurs and munching popcorn through the inevitable chaos that will come if we don't course correct.
> That aside: I still think complaining about "hallucination" is a pretty big "tell".
And I think that sentence is a pretty big tell, so ...
The embittered tone and obvious bad faith of your comments here make it clear how seriously we should take your opinions about AI.
That bar is industry standard in the hype machine. Altman and others have set it:
https://www.windowscentral.com/software-apps/sam-altman-ai-w...
https://brianchristner.io/how-cursor-ai-can-make-developers-...
https://thenewstack.io/the-future-belongs-to-ai-augmented-10...
I'm getting a lot of side-quest productivity out of AI. There's always a bunch of things I could do, but they are tedious. Yet they are still things I wish I could get done. Those kinds of things AI is fantastic at. Building a mock, making tests, abstracting a few things into libraries, documentation.
So it's not like I'm delivering features in one day that would have taken two weeks. But I am delivering features in two weeks that have a bunch of extra niceties attached to them. Reality being what it is, we often release things before they are perfect. Now things are a bit closer to perfect when they are released.
I hope some of that extra work that's done reduces future bug-finding sessions.
> making tests
What I'm about to discuss is about me, not you. I have no idea what kind of systems you build, what your codebase looks like, use case, business requirements etc. etc. etc. So it is possible writing tests is a great application for LLMs for you.
In my day to day work... I wish that developers where I work would stop using LLMs to write tests.
The most typical problem with LLM-generated tests on the codebase where I work is that the test code is almost always extremely tightly coupled to the implementation code. Heavy use of test spies is a common anti-pattern. The result is a test suite that is testing implementation details rather than "user-facing" behaviour (where the user could be a code-level consumer of the thing you are testing).
The problem with that type of test is that it is fragile. One of the key benefits of automated tests is that they give you a safety net to refactor the implementation to your heart's content without fear of having broken something. If you change an implementation detail and the "user-facing" behaviour does not change, your tests should pass. When tests are tightly coupled to the implementation, they will fail, and now your tests, in the worst of cases, might actually be creating negative value for you: every code change requires you to keep tests up to date, even when the thing you actually care about ("is this thing working correctly?") hasn't changed.
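To make that concrete, here's a minimal, hypothetical sketch (Python/unittest.mock, not our actual codebase) of the spy-style test I keep seeing next to the behaviour-level test I'd rather see:

    import unittest
    from unittest.mock import patch

    class PriceCalculator:
        def total(self, items):
            return sum(self._line_total(i) for i in items)

        def _line_total(self, item):  # non-public implementation detail
            return item["price"] * item["qty"]

    class FragileTest(unittest.TestCase):
        def test_total_calls_line_total(self):
            calc = PriceCalculator()
            # Spies on a private helper and asserts it was called: this breaks
            # the moment _line_total is renamed or inlined, even though the
            # user-facing behaviour is unchanged.
            with patch.object(calc, "_line_total", wraps=calc._line_total) as spy:
                calc.total([{"price": 5, "qty": 1}])
                spy.assert_called_once()

    class BehaviourTest(unittest.TestCase):
        def test_total_sums_line_items(self):
            # Exercises only the public interface; survives any refactor that
            # keeps the behaviour intact.
            calc = PriceCalculator()
            self.assertEqual(calc.total([{"price": 5, "qty": 2}, {"price": 3, "qty": 1}]), 13)

    if __name__ == "__main__":
        unittest.main()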
The root of this problem isn't even the LLM; it's just that the LLM makes it a million times worse. Developers often feel like writing tests is a menial chore that needs to be done after the fact to satisfy a code coverage policy. Few developers, at many organizations, have ever truly worked TDD or learned testing best practices, how to write easy-to-test implementation code, etc.
There are some patterns you can use that help a bit with this problem. The lowest-hanging fruit is to tell the LLM that its tests should test only through public interfaces where possible. The next step after that is to add a "check whether any non-public interfaces were used in the not-yet-committed tests in places where a public interface exposes the same functionality; if so, rewrite the tests to use only publicly exposed interfaces" step to the workflow. You could likely also add linter rules, though sometimes you genuinely need to test something, like error conditions, that can't reasonably be tested only through public interfaces.
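As a very rough sketch of the linter-rule idea (hypothetical script; a real setup would be an AST-based custom ruff/flake8 rule rather than a regex):

    import pathlib
    import re
    import sys

    # Flag test lines that reach into single-underscore (non-public) attributes.
    # Dunder access like obj.__class__ doesn't match this pattern.
    PRIVATE_ACCESS = re.compile(r"\._[a-zA-Z]\w*")

    failures = []
    for path in pathlib.Path("tests").rglob("test_*.py"):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if PRIVATE_ACCESS.search(line):
                failures.append(f"{path}:{lineno}: {line.strip()}")

    if failures:
        print("\n".join(failures))
        sys.exit(1)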
Oh don't get me wrong. I'm sure that an LLM can write a decent test that doesn't have the problems I described. The problem is that LLMs are making a preexisting problem much, MUCH worse.
That problem statement is:
- Not all tests add value
- Some tests can even create dis-value (ex: slow to run, thus increasing CI bills for the business without actually testing anything important)
- Few developers understand what good automated testing looks like
- Developers are incentivized to write tests just to satisfy code coverage metrics
- Therefore writing tests is a chore and an afterthought
- So they reach for an LLM because it solves what they perceive as a problem
- The tests run and pass, and they are completely oblivious to the anti-patterns just introduced and the problems those will create over time
- The LLMs are generating hundreds, if not thousands, of these problems
So yeah, the problem is 100% the developers who don't understand how to evaluate the output of a tool that they are using.
But unlike functional code, these tests are, in many cases, arguably creating dis-value for the business. At least the functional code is a) more likely to be reviewed and have code quality problems addressed, and b) even if not, it's still providing features for the end user and thus adding some value.
Force the LLM to write property-based tests (whether good libraries are available depends on the language you use -- but if they are, 100% make use of them). Iterate with the LLM on the invariants.
Forcing the discussion of invariants and property-based testing seems to improve on the issues you're mentioning (when using e.g. Opus 4), especially when combined with the "use the public API" rule or interface abstractions.
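For anyone who hasn't tried it, a minimal sketch of the shape of such a test in Python with the hypothesis library (hypothetical function, purely illustrative):

    from hypothesis import given, strategies as st

    # Hypothetical function under test: collapse runs of whitespace.
    def normalize_spaces(s: str) -> str:
        return " ".join(s.split())

    # Invariants agreed on up front, instead of hand-picked example cases.
    @given(st.text())
    def test_normalize_is_idempotent(s):
        assert normalize_spaces(normalize_spaces(s)) == normalize_spaces(s)

    @given(st.text())
    def test_no_double_spaces_and_trimmed(s):
        out = normalize_spaces(s)
        assert "  " not in out
        assert out == out.strip()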
Side-quest productivity is a great way to put it... It does feel like AI effectively enables the opposite of "death by a thousand cuts" (life by a thousand bandaids?)
I like that "side quests" framing.
For much of what I build with AI, I'm not saving two weeks. I'm saving infinity weeks — if LLMs didn't exist I would have never built this tool in the first place.
The expectations are higher than reality, but LLMs are quite useful in many circumstances. You can characterize their use by "level of zoom", from "vibe coding" on the high end, to "write this function given its arguments and what it should return" at the low end. The more 'zoomed in' you are, the better it works, in my experience.
Plus there are use-cases for LLMs that go beyond augmenting your ability to produce code, especially for learning new technologies. The yield depends on the distribution of tasks you have in your role. For example, if you are in lots of meetings, or have lots of administrative overhead to push code, LLMs will help less. (Although I think applying LLMs to pull request workflow, commit cleanup and reordering, will come soon).
Say you want to create a web app, but you don't know any web dev. You spend a couple of months reading up on front-end and back-end dev and incrementally creating something, and at the end you've made a web app you like. Say you spent 4 hours a day, 5 days a week, for 6 weeks going from zero to a functional web app: 120 hours in total.
Now let's say you use Claude code, or whatever, and you're able to create the same web app over a weekend. You spend 6 hours a day on Saturday and Sunday, in total 12 hours.
That's a 10x increase in productivity right there. Did it make you a 10x better programmer? Nope, probably not. But your productivity went up tenfold.
And at least to me, that's sort of how it has worked. Things I didn't have motivation or energy to get into before, I can get into over a weekend.
However, in the first case you learned something which probably is useful when you want to change said app in any way or make another project...
Depends on how you learn.
For me it's 50-50 reading other people's code and getting a feel for the patterns and actually writing the code.
I'm not sure that math makes sense over the long run. Sure, at first you scaffold together an app from scratch, but I suspect that over time the LLM's capability of maintaining it drops precipitously. At some point, you will likely reach a productivity level of zero, as your application has become too complex to fit in a context window and you have no idea how it actually works. So what is the productivity multiplier then?
The issue is that it'll absolutely _suck_. If I tell Claude Code to scaffold a web app from 0 outside of React, it's terrible.
So no, imho, people with no app dev skills cannot just build something over a weekend, at least not something that won't break when the first user logs in.
They will build it, deploy it, get hacked and leak user data.
But at the same time you've basically outsourced your brain and any learning that would have come from the exercise. So while you now have an app, you've experienced close to zero learning or growth along the way.
You're going to push that straight to production? C'mon man, it's not the same thing, not by a long shot. That's a crap measure. I don't think we can even reliably measure 1x developer output, which makes multiplying it even more nonsensical.
This I agree with 100%. I was able to get at least 2 apps, 2 SaaS products, out by pairing up with AI. I was able to learn as I went and get an app running in a matter of hours rather than months. Great for prototype to production: learn -> fix -> ship -> learn more -> fix things -> ship more.
That being said, I am a generalist with 10+ years of experience who can spot the good parts from the bad parts and can wear many hats. Sure, I do not know everything, but hey, did I know everything when AI was not there? I took help from SO, Reddit, and other places. Now I go to AI, see if it makes sense, apply the fix, learn, and move on.
This is true, for small projects, one-offs and prototypes AI is great, it will save you loads of time.
However most paid jobs don't fall into this category.
I don't believe that the literal typing of code is the limiting factor in development work. There is the research and planning and figuring out what it even is you need to develop in the first place. By the time you know what questions to even ask an LLM, you are not saving much time, in my opinion. On top of that, you introduce the risk of LLM hallucination when you could have looked it up with a normal web search yourself in slightly more time.
Overall it feels negligible to me in its current state.
> There is the research and planning and figuring out what it even is you need to develop in the first place.
This is where I have found LLMs to be most useful. I have never been able to figure out how to get it to write code that isn't a complete unusable disaster zone. But if you throw your problem at it, it can offer great direction in plain English.
I have decades of research, planning, and figuring things out under my belt, though. That may give me an advantage in guiding it just the right way, whereas the junior might not be able to get anything practical from it, and thus that might explain their focus on code generation instead?
I think it depends a lot on the task. While you’re right that just typing is rarely a bottleneck, I would say that derivative implementations often are.
Things like: build a settings system with org, user, and project level settings, and the UI to edit them.
A task like that doesn’t require a lot of thinking and planning, and is well within most developers’ abilities, but it can still take significant time. Maybe you need to create like 10 new files across backend and frontend, choose a couple libraries to help with different aspects, style components for the UI and spend some time getting the UX smooth, make some changes to the webpack config, and so on. None of it is difficult, per se, but it all takes time, and you can run into little problems along the way.
A task like that is like 10-20% planning, and 80-90% going through the motions to implement a lot of unoriginal functionality. In my experience, these kinds of tasks are very common, and the speedup LLMs can bring to them, when prompted well, is pretty dramatic.
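To illustrate the ratio: the core resolution logic for a task like that is tiny, and the real time goes into the ten-ish files of UI, API, and config plumbing around it. A hypothetical, deliberately simplified sketch:

    from typing import Any, Optional

    class Settings:
        """Hypothetical layered settings: project overrides user overrides org."""

        def __init__(self, org: dict, user: dict, project: dict):
            self._layers = [project, user, org]  # most specific first

        def get(self, key: str, default: Optional[Any] = None) -> Any:
            for layer in self._layers:
                if key in layer:
                    return layer[key]
            return default

    settings = Settings(
        org={"theme": "light", "locale": "en"},
        user={"theme": "dark"},
        project={"locale": "de"},
    )
    assert settings.get("theme") == "dark"           # user overrides org
    assert settings.get("locale") == "de"            # project overrides everything
    assert settings.get("timezone", "UTC") == "UTC"  # falls back to default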
I'm consistently baffled as to why software engineering is the only engineering discipline to obsess over a mythical "10x" contributor. Mechanical, electrical, civil, and chemical engineers do not have this concept.
What makes an excellent engineer is risk mitigation and designing systems under a variety of possible constraints. This design is performed using models of the domains involved and understanding when and where these models hold and where they break down. There's no "10x". There is just being accountable for designing excellent systems to perform as desired.
If there were a "10x" software engineer, such an engineer would prevent data breaches from occurring, which is a common failure mode in software to the detriment of society. I want to see 10x less of that.
They might not have that concept, but they absolutely have those people. I worked with many mechanical and electrical engineers building complex machines. Some people are just much better than others. This might not happen with more cookie-cutter work, but any creative work in these domains absolutely has 10x or even 100x engineers. That said, collaboration really helps here, i.e. the good engineer can help others solve problems and be more productive than they'd be on their own. In software this seems to be harder for various reasons, one of which is that it's easier to demonstrate/see a good solution in the other domains, whereas software tends to be a lot fuzzier.
>Mechanical, electrical, civil, and chemical engineers do not have this concept.
>What makes an excellent engineer is risk mitigation and designing systems under a variety of possible constraints.
I take it that those fields also don't live by the "move fast and break things" motto?
In a week, Claude Code and I have built a PoC Rails app for a significant business use case. I intend to formally demo it for buy-in tomorrow, after already doing a short "is this kind of what you're looking for?" walkthrough last week. From here, I intend to "throw it over the fence" to my staff, RoR and full-stack devs, to pick it apart and/or improve what they want to in order to bring it from 80% to 100% over the next two months. If they want to rewrite it from scratch, that's on the table.
It's not a ground-breaking app, just CRUD and background jobs and CSV/XLSX exports and reporting, but I found that I was able to "wireframe" with real code and thus come up with unanswered questions, new requirements, etc. extremely early in the project.
Does that make me a 10x engineer? Idk. If I weren't confident working with CC, I would have pushed back on the project in the first place unless management was willing to devote significant resources to it, i.e. "is this really a P1 project or just a nice-to-have?" If these tools didn't exist, I would have written specs and Excalidraw or Sketch/Figma wireframes that would have taken me at least the same amount of time or more, but there'd be less functional code for my team to use as a resource.
If you think your CC wireframe has taken approx as much time as it'd have taken you with another tool like Figma + spec-writing, and one of your engineering team's options is "rewrite it from scratch" (without a spec), has the use of CC saved your company any time at all?
It reads like this project would have taken your company 9 weeks before, and now will take the company 9 weeks.
I think the comment was showing that the project takes 9 weeks either way, but coming to that determination was much more confident and convincing with a functional demo versus a hand-wavy figma + guesstimate.
> was much more confident and convincing with a functional demo versus a hand-wavy figma + guesstimate.
Except it also blurs the lines and sets incorrect expectations.
Management often sees code being developed quickly (without full understanding of the fine line between PoC and production-ready), and soon they expect it to be done with CC in half the time or less.
Figma on the other hand makes it very clear it is not code.
Which is why I like balsamiq. It looks like hand sketches but can be interactive. I can create any UI for brainstorming in a matter of minutes with it. Once the discussion is settled, we can move to figma for actual UI design (colors, spacing,…).
Yeah. The prototyping is neat. But in past lives I would literally sketch the "POC" on paper.
I sort of want to get back to that... it was really good at getting ideas across.
I use AI all the time. Usually I'm a curmudgeon but I decided to go all in on LLM AI stuff and have used ChatGPT and other models extensively to write code. Having thought about it a lot, I think the magic here is that AI combines three things:
1. googling stuff about how APIs work
2. writing boilerplate
3. typing syntax correctly
These three things combined make up a huge amount of programming. But when real cognition is required I find I'm still thinking just as hard in basically the same ways I've always thought about programming: identifying appropriate abstractions, minimizing dependencies between things, pulling pieces together towards a long term goal. As far as I can tell, AI still isn't really capable of helping much with this. It can even get in the way, because writing a lot of code before key abstractions are clearly understood can be counterproductive and AI tends to have a monolithic rather than decoupled understanding of how to program. But if you use it right it can make certain tasks less boring and maybe a little faster.
This article nails it. The 10x claim is, in my opinion, one of those tactics used by large corporations to force engineers into submission. The idea that you could be replaced by an AI is frightening enough to keep people in check when you're negotiating your salary. AI is a wonderful tool that I use every day, and I have been able to implement stuff that I would have considered too cumbersome to even start working on. But it doesn't make you a 10x more efficient engineer. It gives you an edge when you start a new project, which is already a lot. But don't expect your whole project of 100,000 lines to be handled by the machine. It won't happen any time soon.
Funnily, you probably won't see in the news the idea that a 10x increase in productivity should lead to a 10x increase in compensation (with the exception of CEOs and very top engineers, who get an even bigger multiplier).
A few things need to happen very soon (if the signs are not here already):
1. Tech companies should be able to accelerate and supplant the FAANGs of this world. Even if 10x were discounted to 5x, it would mean 10 human-years of work getting shrunk down to 2 to make multi-billion dollar companies. This is not happening right now. If it does not start happening with the current series of models, Murphy's law (e.g. an interest rate spike at some point) or just brutal "show me the money" questions will tell people whether it is "working".
2. I think Anthropic's honcho did a back-of-the-envelope calculation that $600 for every human in the US (I think it was just the US) was necessary to justify Nvidia's market cap. This should play out by the end of this year or in the Q3 report.
Extremely anecdotal, but all I keep seeing is relatively stable services (the Google one comes to mind) having major outages. I assume it's not AI-related, or at least not directly, but you'd think these outages would be less common if AI was adding so much value.
This was the best insight in the article: Do 10x engineers actually exist? "This debate isn't something I want to weigh in on but I might have to. My answer is sometimes, kinda. When I have had engineers who were 10x as valuable as others it was primarily due to their ability to prevent unnecessary work. Talking a PM down from a task that was never feasible. Getting another engineer to not build that unnecessary microservice. Making developer experience investments that save everyone just a bit of time on every task. Documenting your work so that every future engineer can jump in faster. These things can add up over time to one engineer saving 10x the time company wide than what they took to build it."
So true, a lot of value and gains are had when tech leads can effectively negotiate and creatively offer less costly solutions to all aspects of a feature.
The existence of 10x engineers is something no one believes until they meet one, and they are extremely rare so I can believe many people have never met one.
The co-founder of a company I worked at was one for a period (he is not a 10xer anymore - I don't think someone can maintain that output forever with life constraints). He literally wrote the bulk of a multi-million line system, most of the code is still running today without much change and powering a unicorn level business.
I literally wouldn't believe it, but I was there for it when it happened.
Ran into one more who I thought might be one, but he left the company too early to really tell.
I don't think AI is going to produce any 10x engineers because what made that co-founder so great was he had some kind of sixth sense for architecture, that for most of us mortals we need to take more time or learn by trial and error how to do. For him, he was just writing code and writing code and it came out right on the first try, so to speak. Truly something unique. AI can produce well specified code, but it can't do the specifying very well today, and it can't reason about large architectures and keep that reasoning in its context through the implementation of hundreds of features.
> He literally wrote the bulk of a multi-million line system, most of the code is still running today without much change and powering a unicorn level business
I've been a bit of that engineer (though not at the same scale), like, say, writing 70% of a 50k+ LOC greenfield service. But I'm not sure it really means I'm 10x. Sometimes this comes from just being the person allowed to do it: the one who doesn't get questioned on design choices or decisions about how to structure and write the code, and who doesn't get any pushback on having massive PRs that others almost just rubber-stamp.
And you can really only do this in the greenfield phase, when things are not yet in production and there's so much baseline stuff that's needed in the code.
But it ends up being the 80/20 rule: I did 80% of the work in 20% of the time it'll take to go to prod, because the remaining 20% will eat up 80% of the time.
> Talking a PM down from a task that was never feasible
One of our EMs did this just this week. He did a lot of homework: he spoke to quite a few experts and pretty soon realised this task was too hard for his team to ever accomplish, if it was even possible. He lobbied the PM, a VP, and a C-level, and managed to stop a lot of wasted work from being done.
Sometimes the most important language to know as a dev is English*
s/English/YourLanguageOfChoice/g
An aside, but I am curious: as an old hat, I now find that using Perl RE syntax (though some of it lives on through sed) the way "we used to do back in the day" in regular communications confuses most people. People are usually unfamiliar with it, so I am slowly phasing it out.
What's your experience? And what do the "kids" use these days to indicate alternative options (as above — though for that, I use bash {} syntax too) or to signal "I changed my mind" or "let me fix that for you"?
I've never used Perl and I am not confused. It's just an eyeroll-inducing referential joke, and ironically a perfect example of OP's point. See also: $BIGCORP, Day_Job, etc
They could have just said "the most important language [...] is spoken language".
/s/ is kind of a skeuomorph for me. I have never used sed but I understand this syntax.
The Fred Brooks insight about 10x was that the best programmers were 10x as productive as the worst programmers, not the average programmer.
I guess this leaves open the question of the distribution of productivity across programmers and the difference between the min and the mean. Is productivity normally distributed? Log-normal? Some kind of power law?
The worst programmers I have worked with have negative productivity, in that they leave a mess of work for everyone else, so 10x programmers must be the reason any of us are employed!
The way that I have seen 10x engineers in my career is kind of like this:
Junior: 100 total lines of code a day
Senior: 10,000 total lines of code a day
Guru: -100 total lines of code a day
No one is writing 10,000 lines of code every day nor should they be.
I’m not sure if 10x engineers exist, but I do know 0.1x engineers exist. Being on a team with them makes a typical engineer seem like they’re driving 10x the expected impact.
Totally agree, the best work isn't very visible.
AI is making me 100x productive in some tasks, and 2-4x in others, and 0x in some. Knowing which tasks AI is great at and delegating is like 95% of the battle.
>and 0x in some
As in, it's now completely preventing you from doing things you could have before?
Not preventing, but def wasting time trying to do something that I think it's able to do, but isn't.
You're doing it wrong. AI is making me 2375600x productive in all tasks.
Including writing comments on HN.
I beg to differ. My diffs are 10x bigger than before, though I don't have any more time to review them.
This is the point that the author is making. 10x bigger diff is probably not leading to 10x productivity.
[In fact you can sometimes find that 10x bigger diff leads to decreased productivity down the line...]
It took me a month, despite having done prompt engineering for work for the 2 years prior, to hit the real starting line of Claude Code productivity.
Basically, the ability to order my thoughts into task lists long and clear enough for the LLM to follow that I can be working on 3 or so of these in parallel, and maybe email. Any individual run may be faster or slower than doing it manually, but critically, they take less total human time and attention. No individual technique is fundamentally tricky here, but it is still a real skill.
If you read the article, the author is simply not there, and sees what they know as only 1 week's worth of knowledge. So at their learning rate, maybe they need 3x longer of learning and experience?
The other day I asked chatgpt (o3) to help me compare a bunch of task orchestration systems and arrange them according to some variables I care about (popularity, feature richness, durability, whether can be self-hosted, etc.). I ended up using https://www.inngest.com/ -- which was new to me -- and that single tool sped up my particular task by at least 10x for the week. That was a one-off project, so it won't generalize in a clean way, but I keep finding individual cases where the particular strengths of LLMs can save me a whole bunch of time. (another example: creating and evaluating responses to technical interview questions). I don't expect that these are easy to quantify, but they are significant.
This is not to disagree with the OP, but to point out that, even for engineers, the speedups might not appear where you expect. [EDIT I see like 4 other comments making the same point :)]
Hi there, I was wondering if you'd be willing to share that ChatGPT chat with me (or everyone). I'm the CEO of a competing product (DBOS) and I'm just curious what your question and the responses were that led you elsewhere. Thanks!
Such an insightful article. The tools are allowing us to 10x-100x productivity in shorter bursts, which makes total sense. There's a lot more to software engineering beyond those bits, and that's where the 10x-engineer imposter syndrome comes from.
I am a dinosaur but still feel strongly enough to post this PSA: please go back and read "No Silver Bullet" (and his follow up) again. You should probably schedule a re-read every 2-5 years, just to keep your sanity in these crazy, exhausting times.
I believe his original thesis remains true: "There is no single development, in either technology or management technique, which by itself promises even one order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity."
Over the years this has been misrepresented or misinterpreted to suggest it's false, but it sure feels like "Agentic Coding" is a single development promising a massive multiplier in improvement that is, once again, another accidental tool that can be helpful but is definitely not a silver bullet.
I used to agree with this, with one exception: sitting and working right beside your end user(s). If you can colocate with them, it is a silver bullet.
I'm not sure about agentic coding. Need another month at it.
Agreed. Here's one of many HN posts on "No Silver Bullet": https://news.ycombinator.com/item?id=32423356
> What LLMs produce is often broken, hallucinated, or below codebase standards.
With enough rules and good prompting this is not true. The code I generate is usually better than what I'd do by hand.
The reason the code is better is that all the extra polish and gold plating is essentially free.
Everything I generate comes out commented, with great error handling, logging, SOLID structure, and unit tests that use established patterns in the code base.
> The code I generate is usually better than what I'd do by hand.
I'm always baffled by this. If you can't do it that well by hand, how can you discriminate its quality so confidently?
I get there is an artist/art-consumer analogy to be made (i.e. you can see a piece is good without knowing how to paint), but I'm not convinced it is transferrable to code.
Also, not really my experience when dealing with IaC or (complex) data related code.
You have misinterpreted GP's "better than what I would do" as "better than what I could do".
That would be a more plausible explanation. I'm not sure that disambiguation can be inferred from the comment, though.
It’s clear from the original comment that’s what they mean. (Literally that’s what the comment says)
I can do it by hand but it takes time.
With AI the extra quality and polish is basically free and instantaneous.
Ah, alright, that makes a lot more sense; like another poster said, I read "'d" as "could".
The point still remains for junior and semi-senior devs though, or any dev trying to leap over a knowledge barrier with LLMs. Emphasis on good pipelines and human (eventually maybe also LLM-based) peer reviews will be very important in the years to come.
You underestimate how lazy people are. I always take shortcuts and skip taking edge cases into account. LLMs have no problem writing tedious guards and creating abstractions without hacks, which means the code becomes more robust than if I wrote it by hand.
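A toy example of the tedious guards I mean (hypothetical function, Python just for illustration): left to myself I'd write the first version, while the LLM writes the second without being asked.

    # What I'd write in a hurry:
    def parse_port(value):
        return int(value)

    # What the LLM tends to produce unprompted:
    def parse_port_guarded(value) -> int:
        if value is None:
            raise ValueError("port is required")
        try:
            port = int(str(value).strip())
        except ValueError:
            raise ValueError(f"port must be an integer, got {value!r}")
        if not 1 <= port <= 65535:
            raise ValueError(f"port must be between 1 and 65535, got {port}")
        return port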
What an odd question. For the exact same reason people who write prose professionally usually have someone else edit their work: because editing your own work is harder, and everybody slips up sometimes.
I didn't find it odd at all and it seems more odd to liken an LLM to a human editor.
I'm not getting this analogy. Editors can't normally discriminate if the content itself is good (after all, the writer is the SME), but rather, only perfect its form (syntax, grammar, etc).
Well-written bullshit in perfect prose is still bullshit.
ehhhhhhh yeah but this is like hiring Reddit to do your prose editing, considering generated code is slightly worse than what you'd find on r/programming
You can believe that or not believe that without changing the implication of the previous question, which was that someone who routinely slips while writing code would be incapable of determining whether the LLM got it right. Obviously not.
You're forgetting that code quality also requires time. Developers make tradeoffs all the time on how much time to invest into improving the quality of what they write, for both new and existing code. When someone claims that LLMs can produce higher-quality code it can include quality levels that may be unjustifiably slow to hand-craft depending on constraints and needs.
Related - agentic LLMs may be slow to produce output but they are parallelizable by an individual unlike hand-written work.
I get that. I'm exclusively talking about code quality verification after it being coded by a human or an LLM, in fact I don't really care by whom. Mainly because I do care about introducing tech debt and/or hidden balloning costs.
I am pattern-matching your last statement with what I've seen from my teammates who are more AI-oriented: I suspect this is a matter of making the metrics the goal. I would rather maintain something that is simple, works, and has targeted comments than something messy that meets the metrics you list.
The code I generate with LLMs is clean and looks as good if not better than what I'd write by hand.
I don't get all the prompt vibe coding going around. I don't use prompts to generate code.
I use "tab-tab" auto complete to speed through refactorings and adding new fields / plumbing.
It's easily a 3x productivity gain. On a good day it might be 10x.
It gets me through boring tedium. It gets strings and method names right for languages that aren't statically typed. For languages that are statically typed, it's still better than the best IDE AST understanding.
It won't replace the design and engineering work I do to scope out active-active systems of record, but it'll help me when time comes to build.
I use tab autocomplete, and I think it's a 5% productivity gain. On a good day, maybe 10%. I haven't put much effort into optimizing the setup or learning advanced usage patterns or anything. I'm using stock Copilot, provided by my employer. If I had to pay for it, I wouldn't be using it, as it doesn't justify the cost.
Really, what are you making that a 5% increase in productivity doesn’t justify a Copilot subscription?
That's not a rigorously measured number.
The 5% is an increase in straight-ahead code speed. I spend a small fraction of my time typing code. Smaller than I'd like.
And it very well might be an economically rational subscription. For me personally, I'm subscription averse based on the overhead of remembering that I have a subscription and managing it.
> For languages that are statically typed, it's still better than the best IDE AST understanding.
This is emphatically NOT my experience with a large C++ codebase.
I can't attest to C++, but we've got a large Rust monorepo, and it's magical.
It expands match blocks against highly complex enums from different crates, then tab completes test cases after I write the first one. Sometimes even before that.
We may be at different levels of "large" (and "gnarly") - this code-base has existed in some form since 1985, through various automated translations Pascal -> C -> C++.
Just by virtue of Rust being relatively short-lived I would guess that your code base is modular enough to live inside reasonable context limits, and written following mostly standard practice.
One of the main files I work on is ~40k lines of code, and one of the main proprietary API headers I consume is ~40k lines of code.
My attempts at getting the models available to Copilot to author functions for me have often failed spectacularly - as in I can't even get it to generate edits at prescribed places in the source code, or follow examples from prescribed places. And the hallucination issue is EXTREME when trying to use the big C API I alluded to.
That said Claude Code (which I don't have access to at work) has been pretty impressive (although not what I would call "magical") on personal C++ projects. I don't have Opus, though.
Prompts are worth mastering. AI autocomplete is better than older autocomplete systems but of course it only works based on what you started to type.
Prompts are especially good for building a new template of structure for a new code module or basic boilerplate for some of the more verbose environments. eg. Android Java programming can be a mess, huge amounts of code for something simple like an efficient scrolling view. AI takes care of this - it's obvious code, no thought, but it's still over 100 lines scattered in XML (the view definitions), resources, and in multiple Java files.
Do you really want to be copying boilerplate like this across to many different files? Prompts that are well integrated to the IDE (they give a diff to add the code) are great (also old style Android before Jetpack sucked) https://stackoverflow.com/questions/40584424/simple-android-...
Do you have a link to some of the code that you have produced using this approach? I am yet to see a public or private repo with non-trivial generated code that is not fundamentally flawed.
This one was a huge success:
https://github.com/micahscopes/radix_immutable
I took an existing MIT licensed prefix tree crate and had Claude+Gemini rewrite it to support immutable quickly comparable views. The execution took about one day's work, following two or three weeks thinking about the problem part time. I scoured the prefix tree libraries available in rust, as well as the various existing immutable collections libraries and found that nothing like this existed. I wanted O(1) comparable views into a prefix tree. This implementation has decently comprehensive tests and benchmarks.
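For anyone wondering what "O(1) comparable views" means in practice, here's a toy Python sketch of the underlying idea (the real crate is Rust and considerably more involved; every name here is illustrative): each immutable node carries a structural hash computed at construction, so two views with equal contents can be compared by comparing a single integer.

from dataclasses import dataclass, field
from typing import Mapping, Optional

@dataclass(frozen=True, eq=False)
class Node:
    value: Optional[str] = None
    children: Mapping[str, "Node"] = field(default_factory=dict)

    def __post_init__(self):
        # Children are immutable and already hashed, so the whole subtree's
        # hash can be computed once, up front.
        h = hash((self.value,
                  tuple(sorted((k, c.struct_hash) for k, c in self.children.items()))))
        object.__setattr__(self, "struct_hash", h)

    def insert(self, key: str, value: str) -> "Node":
        # Path-copying insert: only nodes along `key` are rebuilt; the rest
        # of the tree is shared with the previous version.
        if not key:
            return Node(value, dict(self.children))
        head, rest = key[0], key[1:]
        new_children = dict(self.children)
        new_children[head] = self.children.get(head, Node()).insert(rest, value)
        return Node(self.value, new_children)

a = Node().insert("cat", "1").insert("car", "2")
b = Node().insert("car", "2").insert("cat", "1")
print(a.struct_hash == b.struct_hash)  # True: equal contents, compared in O(1)

(A real implementation also has to deal with hash collisions; the Rust crate linked above is the place to look for that.)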
No code for the next two but definitely results...
Tabu search guided graph layout:
https://bsky.app/profile/micahscopes.bsky.social/post/3luh4d...
https://bsky.app/profile/micahscopes.bsky.social/post/3luh4s...
Fast Gaussian blue noise with wgpu:
https://bsky.app/profile/micahscopes.bsky.social/post/3ls3bz...
In both these examples, I leaned on Claude to set up the boilerplate, the GUI, etc, which gave me more mental budget for playing with the challenging aspects of the problem. For example, the tabu graph layout is inspired by several papers, but I was able to iterate really quickly with claude on new ideas from my own creative imagination with the problem. A few of them actually turned out really well.
https://github.com/wglb/gemini-chat Almost entirely generated by gemini based on my english language description. Several rounds with me adding requirements.
(edit)
I asked it to generate a changelog: https://github.com/wglb/gemini-chat/blob/main/CHANGELOG.md
Not the OP, not my code. But here is Mitchel Hashimoto showing his workflow and code in Zig, created with AI agent assistance: https://youtu.be/XyQ4ZTS5dGw
I think this is still some kind of 'fight' between assisted and more towards 'vibe'. Vibe for me means not reading the generated code, just trying it; the other extreme is writing it all without AI. I don't think people here are talking about assisted: they are talking about vibe or almost-vibe coding. And it's fairly terrible if the LLM does not have tons of info. It can loop, hang, remove tons of features, break random things etc, all while being cheerful and saying 'this is production code now, ready to deploy'. And people believe it. When you use it to assist, it is great imho.
That's disingenuous or naive. Almost nobody decides to expressly highlight the section of code (or whole files) generated by AI; they just get on with the job when there are real deadlines and it's not about coding for the sake of the art form...
If the generated implementation is not good, you're trading short-term "getting on with the job" and "real deadlines" for mid-to-long-term slowdown and missed deadlines.
In other words, it matters whether the AI is creating technical debt.
If you're creating technical debt, you're creating technical debt.
That has nothing to do with AI/LLMs.
If you can't understand what the tool spits out, then either learn, throw it away, or get it to make something you can understand.
Do you want to clarify your original comment, then? I just read it again, and it really sounds like you're saying that asking to review AI-generated code is "disingenuous or naive".
I am talking about correctness, not style. Coding isn't just about being able to show activity (code produced), but rather about producing a system that correctly performs the intended task.
Yes, and frankly you should be spending time writing large integration tests correctly, not microscopic tests that forget how tools interact.
It's not about lines of code or quality; it's about solving a problem. If the solution creates another problem then it's bad code. If it solves the problem without causing that, then great. Move on to the next problem.
Same as pretending that vibe coding isn't producing tons of slop. "Just improve your prompt bro" doesn't work for most real codebases. The recent TEA app leak is a good example of vibe coding gone wrong, I wish I had as much copium as vibe coders to be blind to these things, as most of them clearly are like "it happened to them but surely won't happen to ME."
> The recent TEA app leak is a good example of vibe coding gone wrong
Weren't there 2 or 3 dating apps launched before the "vibecoding" craze that became extremely popular and got extremely hacked weeks/months in? I also distinctly remember a social network having Firebase global tokens on the clientside, also a few years ago.
So that's an excuse for AI getting it wrong? It should know better if its so much better.
Not an excuse, no. I agree it should be better. And it will get better. Just pointing out that some mistakes were systematically happening before vibecoding became a thing.
We went from "this thing is a stochastic parrot that gives you poems and famous people styled text, but not much else" to "here's a fullstack app, it may have some security issues but otherwise it mainly works" in 2.5 years. People expect perfection, and move the goalposts. Give it a second. Learn what it can do today, adapt, prepare for what it can do tomorrow.
No one is moving the goalposts. There are a ton of people and companies trying to replace large swathes of workers with AI. So it's very reasonable to point out ways in which the AI's output does not measure up to that of those workers.
I thought the idea was that AI would make us collectively better off, not flood the zone with technical debt as if thousands of newly minted CS/bootcamp graduates were unleashed without any supervision.
LLMs are still stochastic parrots, though highly impressive and occasionally useful ones. LLMs are not going to solve problems like "what is the correct security model for this application given this use case".
AI might get there at some point, but it won't be solely based on LLMs.
> "what is the correct security model for this application given this use case".
Frankly I've seen LLMs answer better than people trained in security theatre so be very careful where you draw the line.
If you're trying to say they struggle with what they've not seen before: yes, provided that what is new isn't within the phase space they've been trained over. Remember there are no photographs of cats riding dinosaurs, but SD models can generate them.
Saying that they aren't worse than an incompetent human isn't a ringing endorsement.
LLMs are not meant to be infallible; they're meant to be faster.
Repeat after me, token prediction is not intelligence.
I've heard this multiple times (Tea being an example of problems with vibe coding) but my understanding was that the Tea app issues well predated vibe coding.
I have experimented with vibe coding. With Claude Code I could produce a useful and usable small React/TS application, but it was hard to maintain and extend beyond a fairly low level of complexity. I totally agree that vibe coding (at the moment) is producing a lot of slop code, I just don't think Tea is an example of it from what I understand.
Easily 99% of comments generated by LLMs are useless.
That's how I detect who is using LLMs at work.
# loop over the images
for filename in images_filenames:
    # download the image
    image = download_image(filename)
    # resize the image
    resize_image(image)
    # upload the image
    upload_image(image)
I've noticed this too. They are often restatements of the line in verbal form, or intended for me, the reader of the LLM's output, as commentary on the prompt, rather than for a code maintainer.
Not what I have found with gemini.
What is particularly useful are the comments reasoning about new code added at my request.
Very often comments generated by humans are also useless. The reason for this is mandated comment policies, e.g., 'every public method should have a comment'. An utterly disgusting practice. One should only have a comment if one has something interesting to say. In a not-overly-complex code base there should be a comment perhaps every 100 lines or so. In many cases it makes more sense to comment the unit tests than the code.
I think the rule of commenting every public method exists so you can use something like doxygen to extract the reference. And most IDEs can display them upon hovering. And comments can remind the caller of pre- and post-conditions.
I am pretty far to one end of the spectrum on need for comments. Very rarely is a comment useful to help you/another developer decipher the intent and function of a piece of code.
Then tell it to write better comments...
Ah, so it's good enough to write code on its own without time-consuming, excessive hand-holding. But it's not good enough to write comments on its own.
If you put in the work to write rules and give good prompts you get good results just like every other tool created by mankind.
How often do you use coding LLMs?
I can't speak to comment rules specifically, but I am a heavy user of "agentic" coding and use rules files, and while they help they are simply not that reliable. For something like comments that's probably not that big of a deal, because some extra bad comments aren't the end of the world.
But I have rules that are quite important for successfully completing a task by my standards and it's very frustrating when the LLM randomly ignores them. In a previous comment I explained my experiences in more detail but depending on the circumstances instruction compliance is 9/10 times at best, with some instructions/tasks as poor as 6/10 in the most "demanding" scenarios particularly as the context window fills up during a longer agentic run.
They're often repetitive if you're reading the code, but they're useful context that feeds back into the LLM. Often once the code is clear enough I'll delete them before pushing to production.
Do you have proof of this being useful for the LLM? Wouldn't you rather it re-read the actual code it generated, instead of trusting a potentially wishful or stale comment that could lead it astray?
it reads both, so with the comments it more or less parrots the desired outcome I explained... and it sometimes catches the mismatch between code and comment itself before I even mention it
I read and understand 100% of the code it outputs, so I'm not so worried about falling too far astray...
being too prescriptive about it (like prompting "don't write comments") makes the output worse in my experience
99% of comments are not needed as they just re-express what the code below does.
I prefer to push for self documenting code anyway, never saw the need for docs other than for an API when I'm calling something like a black box.
I think it's because LLMs are often trained on data from code tutorial sites and forums like Stack Overflow, and not always production code.
They comment on the how, not the why.
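A contrived before/after in Python (the scenario is invented): the first version's comments restate the code line by line, while the second's single comment records a "why" the code cannot express on its own.

import time

def save_with_retry_how(save):
    # loop three times
    for attempt in range(3):
        # try to save
        try:
            return save()
        # sleep and retry on error
        except IOError:
            time.sleep(0.5 * (attempt + 1))

def save_with_retry_why(save):
    # The object store returns transient 503s for about a second after a
    # bucket is created, so a short retry avoids a spurious user-facing error.
    for attempt in range(3):
        try:
            return save()
        except IOError:
            time.sleep(0.5 * (attempt + 1))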
Could you share an example?
These conversations on AI code good, vs AI code bad constantly keep cropping up.
I feel we need to build a cultural norm to share examples places of succeeded, and failures, so that we can get to some sort of comparison and categorization.
The sharing also has to be made non-contentious, so that we get a multitude of examples. Otherwise we’d get nerd-sniped into arguing the specifics of a single case.
Let’s talk about rules and docs, shall we? What makes a good rule for AI to keep it on task? What are your setups for docs and attaching them to the context (do you need to? Or just the location?)
Let’s boil this down to an easy set of reproducible steps any engineer can take to wrangle some sense from their AI trip.
The company I work at (https://getunblocked.com) is built to give tools like Claude Code and Cursor context based on all your docs, issues, code, and chat threads from Slack and soon Teams. Happy to give you a demo sometime if you're interested!
There are tons of these guides around the internet. I'm only using what other people have already published.
Aka, let's train people how to use the tool...
You seem to be against the idea. Yet you yourself were trained. Weird.
Let's check that the claim matches the evidence first!
In my experience, unit tests and logging code generated by LLMs tend to be overly verbose, miss meaningful assertions, and often produce boilerplate that looks correct but doesn’t test or log anything useful. It’s easy to get misled by the surface structure.
I do think a lot of the discourse in this space can be summed up as: people are arguing about two non-overlapping segments of a distribution having no idea the other segment even exists; instead they just assume the other side is [hype/pessimistic].
It makes me wince a little
Can you point to an example repo with enough rules and good prompts?
Yeah that fucker Claude is tireless when it comes to checking return types, checking for null, etc etc
I agree. With plenty of prompts (leave them in documents) you can get pretty good results.
First thing I do is tell llm to stop writing useless docstrings and comments and instead follow clean code principles where each variable is a noun and function call a verb.
Here's my workflow (if I feel like using claude)
Me: Here's the relevant part of the code, add this simple feature.
Opus: here's the modified code blah blah bs bs
Me: Will this work?
Opus: There's a fundamental flaw in blah bleh bs bs here's the fix, but I only generate part of the code, go hunt for the lines to make the changes yourself.
Me: did you change anything from the original logic?
Opus: I added this part, do you want me to leave it as it was?
Me: closes chat
Sorry to be that guy, but you're using it wrong. The best flows right now are architect -> act -> test. First you have a session in "architect" / "plan" mode (depending on your ide/tool) where you discuss, ask questions, etc. Then, when everything is clear in "chat" mode, you ask the model to make a plan. You verify the plan, and then you tell it to start implementing it. You still get to approve tools, calls, tests, etc. You can also provide feedback on the way if you missed something (i.e. use uv instead of pip, etc).
Coding in a chat interface, and expecting the same results as with dedicated tools is ... 1-1.5 years old at this point. It might work, but your results will be subpar.
Nah it's good thanks for your input. I saw people use plan.md and todo.md and ide/commandline for this before. manus.ai demonstrates this via its chat interface as well.
> With enough rules and good prompting this is not true.
There are at least 10 posts on HN these days going around in circles with the same discussion:
1. AI sucks at code
2. you are not using my magic prompting technique
It's not magic. The techniques are well established and widely shared.
Yeah there's so many now it's hard to settle on one. YouTube is littered with them. Agent OS, amp.code, BMAD. I'm probably trying BMAD in earnest next ...
Each of the "tools" does things slightly differently but the techniques to use them effectively are largely the same now (rules, planning, context management, good prompting).
You know, like when the loom came out there were probably quite a few models, but using them was similar. Like cars are now.
I've been finding actual human-written bugs and correcting them with Claude, so I find the "often broken" claims a load of nonsense... I've been fixing dozens of minor bugs in our codebase that no one's been arsed to fix for years due to bigger priorities (which tbh is generating more features and tech debt).
It may change in the future, but AI is without a doubt improving our codebase right now. Maybe not 10X but it can easily 2X as long as you actually understand your codebase enough to explain it in writing.
What a scary time it is for devs. We spent all this time learning this obscure skill and now when I play with claude or even chatgpt it makes really good code. I just asked it to write me a video game and it did it. Perfect godot code. I was stunned it didn't hallucinate and when I asked for clarification on a snippet of code, it perfectly answered.
I think it's only a matter of time until our roles are commoditized and vibe-coding becomes the norm in most industries.
Vibe coding being a dismissive term for developing a new skillset. For example, we'll be doing more planning and testing and such instead of writing code. The same way, say, sysadmins just spin up k8s instead of racking servers, or car mechanics read diagnosis codes from readers and, often, just replace an electric part instead of hand-tuning carbs or gapping spark plugs and such. That is to say, a level of skill is being abstracted away.
I think we just have to see this, most likely, as how things will get done going forward.
Could you at least mention what the video game was, or why it was such a good implementation? Also, what was "perfect" about the code? "Perfect" is not a word I would ever use to describe code.
This reads like empty hype to me, and there's more than one claim like this in these threads, where AI magically creates an app, but any description of the app itself is always conspicuously missing.
Especially in a world where creating a repo in GitHub (or other forges) is frictionless.
Let me guess... Flappy Bird or Pong. Whoa, it one-shotted it, amazing!
Yes, I'm exaggerating and it's not writing a AAA game from a prompt, but I asked it to make a game like Zelda and it figured it out and walked me through all the aspects of it. That's a lot more than I expected. I'm not a games programmer, so I'm probably a lot more impressed than I should be, but I went from not knowing anything about Godot to having a framework up to build a 2d rpg-esque game fairly quickly, learning as it gave me the code. Note, I used the new ChatGPT study mode, so that may be different than just regular prompts. I fully expected just broken code and random AI musings, but instead I got a very solid implementation of a game, albeit a simple one. Or at least as simple as I asked for; I imagine I can keep building out more with its help.
I also have never used godot before, and I was surprised at how well it navigated and taught me the interface as well.
At least the horror stories about "all the code is broken and hallucinations" aren't really true for me and my uses so far. If LLMs succeed anywhere, it will be in the overly logical and predictable worlds of programming languages, but that's just a guess on my part. Thus far, whenever I reach out for code from LLMs, it's been a fairly positive experience.
Thanks for elaborating, this puts things into perspective, although the complexity of the end product is still unclear to me.
I do still disagree with your assessment. I think the syntactic tokens in programming languages have a kind of impedance mismatch with the tokens that LLMs operate on, and that the formal semantics of programming languages are a bad fit for the fuzzy, statistical LLMs. I firmly believe that increased LLM usage will drive software safety and quality down, simply because a) no semblance of semantic reasoning or formal verification has been applied to the code and b) a software developer will have an incomplete understanding of code not written by themselves.
But our opinions can co-exist, good luck in your game development journey!
I'm playing with it still and now am adding more scenes and more logic. I think the complexity here is whatever my goals are. I'm not sure what the practical limits are, or at least they exceed my own ability in games development right now. This is just a toy game, but as I reach into Claude and GPT, I can keep going, which is nice. I already have coding experience, so I'm not exactly a 'vibe coder', but professionally I don't think people with zero coding experience are getting dev roles; instead the role will change, like my example of the modern mechanic or modern sysadmin above.
As far as QA goes, we then circle back to the tool itself being the cure for the problems the tool brings in, which is typical in technology. The same way agile/'break things' programming's solution to QA was to fire the 'hands on' QA department and then programmatically do QA. Mostly for cost savings, but partly because manual QA couldn't keep up.
I think like all artifacts in capitalism, this is 'good enough,' and as such the market will accept it. The same way my laggy buggy Windows computer would be laughable to some in the past. I know if you gave me this Win11 computer when I was big into low-footprint GUI linux desktop, I would have been very unimpressed, but now I'm used to it. Funny enough, I'm migrating back to kubuntu because Windows has become unfun and bloaty and every windows update feels a bit like gambling. But that's me. I'm not the typical market.
I think your concerns are real and correct factually and ideologically, but in terms of a capitalist market will not really matter in the end, and AI code is probably here to stay because it serves the capital owning class (lower labor costs/faster product = more profit for them). How the working class fares or if the consumer product isn't as good as it was will not matter either unless there's a huge pushback, which thus far hasn't happened (coders arent unionizing, consumers seem to accept bloaty buggy software as the norm). If anything the right-wing drift of STEM workers and the 'break things' ideology of development has primed the market for lower-quality AI products and AI-based workforces.
So you're not a developer any more but a tenant who pays rent. Speaking of which, I have a bridge to sell you…
I'm not even sure what this is supposed to say.
I've had days where it really does feel like 5x or 10x...
Here's what the 5x to 10x flow looks like:
1. Plan out the tasks (maybe with the help of AI)
2. Open a Git worktree, launch Claude Code in the worktree, give it the task, let it work. It gets instructions to push to a Github pull request when it's done. Claude gets to work. It has access to a whole bunch of local tools, test suites, and lots of documentation.
3. While that terminal is running, I go start more tasks. Ideally there are 3 to 5 tasks running at a time.
4. Periodically check on the tabs to make sure they're not stuck or lost their minds.
5. Finally, review the finished pull requests and merge them when they are ready. If they have issues then go back to the related chat and tell it to work on it some more.
With that flow it's reasonable to merge 10 to 20 pull requests every day. I'm sure someone will respond "oh just because there are a lot of pull requests, doesn't mean you are productive!" I don't know how to prove to you that the PRs are productive other than just say that they are each basically equivalent to what one human does in one small PR.
A few notes about the flow:
- For the AI to work independently, it really needs tasks that are easy to medium difficulty. There are definitely 'hard' tasks that need a lot of human attention in order to get done successfully.
- This does take a lot of initial investment in tooling and documentation. Basically every "best practice" or code pattern that you want to use in the project must be written down. And the tests must be as extensive as possible.
Anyway, the linked article talks about the time it takes to review pull requests. I don't think it needs to take that long, because you can automate a lot:
- Code style issues are fully automated by the linter.
- Other checks like unit test coverage can be checked in the PR as well.
- When you have a ton of automated tests that are checked in the PR, that also reduces how much you need to worry about as a code reviewer.
With all those checks in place, I think it can be pretty fast to review a PR. As the human you just need to scan for really bad code patterns, and maybe zoom in on highly critical areas, but most of the code can be eyeballed pretty quickly.
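For anyone curious, a rough sketch of how kicking off those parallel tasks can look (the repo path, branch naming, and agent invocation are all illustrative; any agent CLI that accepts a task prompt non-interactively would work the same way):

import subprocess
from pathlib import Path

REPO = Path("~/src/myapp").expanduser()  # hypothetical repo path

def start_task(slug: str, prompt: str) -> subprocess.Popen:
    # One branch + worktree per task so the agents never step on each other.
    worktree = REPO.parent / f"myapp-{slug}"
    subprocess.run(
        ["git", "worktree", "add", str(worktree), "-b", f"agent/{slug}"],
        cwd=REPO, check=True,
    )
    # Fire and forget; check in on it periodically, then review the PR.
    return subprocess.Popen(["claude", "-p", prompt], cwd=worktree)

tasks = {
    "fix-null-avatar": "Fix the crash when a user has no avatar; add a regression test.",
    "paginate-audit-log": "Add cursor pagination to the audit log endpoint.",
}
procs = [start_task(slug, prompt) for slug, prompt in tasks.items()]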
What type of software are you building with this workflow? Does it handle PII, need data to be exact, or have any security implications?
Because I might just not have a great imagination, but it's very hard for me to see how you basically automate the review process on anything that is business critical or has legal risks.
Mainly working on a dev tool / SaaS app right now. The PII is user names & email.
On the security layer, I wrote that code mostly by hand, with some 'pair programming' with Claude to get the Oauth handling working.
When I have the agent working on tasks independently, it's usually working on feature-specific business logic in the API and frontend. For that work it has a lot of standard helper functions to read/write data for the current authenticated user. With that scaffolding it's harder (not impossible) for the bot to mess up.
It's definitely a concern though, I've been brainstorming some creative ways to add extra tests and more auditing to look out for security issues. Overall I think the key for extremely fast development is to have an extremely good testing strategy.
I appreciate the helpful reply, honestly. One other question - are people currently using the app?
I think where I've become very hesitant is that a lot of the programs I touch have customer data belonging to clients with pretty hard-nosed legal teams. So it's quite difficult for me to imagine not reviewing the production code by hand.
No this app isn't launched yet. And yeah, customer data is definitely a very valid thing to be concerned about.
Interesting that the title of this post was changed. I think I have seen this happen for the 2nd time now. It seems Hacker News does not favor AI-negative narratives.
Has happened to me before. It seems they change anything that has a negative connotation to try to take something more positive out of it. I don't love that they do that without asking or confirming with the author. But this title is also fine with me. I actually thought about naming it "Curing your AI 10x Imposter Syndrome", but it felt like a stretch that someone would understand what the content would be about.
I think AI is going to make senior engineers at big tech companies 10x more productive.
A lot of senior engineers in the big tech companies spend most of their time in meetings. They're still brilliant. For instance, they read papers and map out the core ideas, but they haven't been in the weeds for a long time. They don't necessarily know all the day-to-day stuff anymore.
Things like: which config service is standard now? What's the right Terraform template to use? How do I write that gnarly PromQL query? How do I spin up a new service that talks to 20 different systems? Or in general, how do I map my idea to deployable and testable code in the company's environment?
They used to have to grab a junior engineer to handle all that boilerplate and operational work. Now, they can just use an AI to bridge that gap and build it themselves.
The core value of LLMs is simple: sometimes you need to write code, but what you really want is to design, experiment, or just get something usable.
Even when you do write code, you often only care about specific aspects—you just want to automate the rest.
This is hard to reconcile with modern business models. If you tell someone that a software engineer can also design, they’ll just fire the designer and pile more work on the engineer. But it doesn’t change the underlying truth: a single engineer who can touch many parts of the software with low cognitive friction is simply a better kind of engineer.
I find myself largely agreeing with this post.
In some cases, LLMs can be a real speed boost. Most of the time, that has to do with writing boilerplate and prototyping a new "thing" I want to try out.
Inevitably, if I like the prototype, I end up re-writing large swaths of it to make it even half way productizable. Fundamentally, LLMs are bad at keeping an end goal in mind while working on a specific feature and it's terrible at holding enough context to avoid code duplication and spaghetti.
I'd like to see them get better and better, but they really are limited to whatever code they can ingest from the internet. A LOT of important code is just not open for consumption in sufficient quantities for them to learn from. For this reason, I suspect LLMs will really never be all that good for non-web-based engineering. Where's all the training data gonna come from?
Claude max is $200 a month.
Consider a fully loaded cost of $200k for an engineer, or $16,666 per month. They only have to be a >1.012x engineer for the "AI" to be worth it. Of course that $200 per month is probably VC-subsidized right now, but there is lots of money on the table for a <2x improvement.
This is precisely what I suggest, companies should pay for team plans like the one you are describing and see what comes of it.
My MacBook depreciates at more than $200/month.
One thing I've been wondering recently: has the experience of using software (specifically web apps) been getting better? It seems like a natural extension of significantly increased productivity would lead to fewer buggy websites and apps, more intuitive UIs, etc.
Linear was a very early-stage product I tested a few months after their launch where I was genuinely blown away by the polish and experience relative to their team size. That was in 2020, pre-LLMs.
I have yet to see an equally polished and impressive early-stage product in the past few years, despite claims of 10x productivity.
There was a recent study concluding that AI made experienced developers 20% SLOWER to complete tasks rather than any faster!
https://arxiv.org/abs/2507.09089
Obviously it depends on what you are using the AI to do, and how good a job you do of creating/providing all the context to give it the best chance of being successful in what you are asking.
Maybe a bit like someone using a leaf blower to blow a couple of leaves back and forth across the driveway for 30 sec rather than just bending down to pick them up.... It seems people find LLMs interesting, and want to report success in using them, so they'll spend a ton of time trying over and over to tweak the context and fix up what the AI generated, then report how great it was, even though it'd have been quicker to do it themselves.
I think agentic AI may also lead to this illusion of, or reported, AI productivity ... you task an agent to do something and it goes off and 30 min later creates what you could have done in 20 min while you are chilling and talking to your workmates about how amazing this new AI is ...
It depends how they used them. You can say a similar thing about having junior developers on the team that you have to delegate tasks to. It takes time to explain to them what needs to be done, nudge them toward the right solution, check their work, etc.
But maybe another thing is not considered - while things may take longer, they ease cognitive load. If you have to write a lot of boilerplate or you have a task to do, but there are too many ways to do it, you can ask AI to play it out for you.
What benefit I can see the most is that I no longer use Google and things like Stack Overflow, but actual books and LLMs instead.
I don't think the junior developer comparison holds up too well ...
1) The junior developer is able to learn from experience and feedback, and has a whole brain to use for this purpose. You may have to provide multiple pointers, and it may take them a while to settle into the team and get productive, but sooner or later they will get it, and at least provide a workable solution if not what you may have come up with yourself (how much that matters depends on how wisely you've delegated tasks to them). The LLM can't learn from one day to the next - it's groundhog day every day, and if you have to give up with the LLM after 20 attempts it'd be the exact same thing tomorrow if you were so foolish to try again. Companies like Anthropic apparently aren't even addressing the need for continual learning, since they think that a larger context with context compression will work as an alternative, which it won't ... memory isn't the same thing as learning to do a task (learning to predict the actions that will lead to a given outcome).
2) The junior developer, even if they are only marginally useful to begin with, will learn and become proficient, and the next generation of senior developer. It's a good investment training junior developers, both for your own team and for the industry in general.
Valid points but one could argue OPUS learned by going from 4 to 4.1 today.
Yes, but pre-training of any sort is no substitute for being able to learn how to act from your own experience, such as learning on the job.
An LLM is an auto-regressive model - it is trying to predict continuations of training samples purely based on the training samples. It has no idea what were the real-world circumstances of the human who wrote a training sample when they wrote it, or what the real-world consequences were, if any, of them writing it.
For an AI to learn on the job, it would need to learn to predict its own actions in any specific circumstance (e.g. circumstance = "I'm seeing/experiencing X, and I want to do Y"), based on its own history of success and failure in similar circumstances... what actions led to a step towards the goal Y? It'd get feedback from the real world, same as we do, and therefore be able to update its prediction for next time (in effect "that didn't work as expected, so next time I'll try something different", or "cool, that worked, I'll remember that for next time").
Even if a pre-trained LLM/AI did have access to what was in the mind of someone when they wrote a training sample, and what the result of this writing action was, it would not help, since the AI needs to learn how to act based on what is in its own (ever-changing) "mind", which is all it has to go on when selecting an action to take.
The feedback loop is also critical - it's no good just learning what action to take/predict (i.e. what actions others took in the training set), unless you also have the feedback loop of what the outcome of that action was, and whether that matches what you predicted to happen. No amount of pre-training can remove the need for continual learning for the AI to correct its own on-the-job mistakes, and learn from its own experience.
> When I have had engineers who were 10x as valuable as others it was primarily due to their ability to prevent unnecessary work. Talking a PM down from a task that was never feasible. Getting another engineer to not build that unnecessary microservice. Making developer experience investments that save everyone just a bit of time on every task. Documenting your work so that every future engineer can jump in faster. These things can add up over time to one engineer saving 10x the time company wide than what they took to build it.
What about just noticing that coworkers are repeatedly doing something that could easily be automated?
I'd call that "Making developer experience investments that save everyone just a bit of time on every task."
Ah, I guess I missed your meaning, then.
The article is spot on; however, who is claiming a 10x speedup from AI? I have heard many crazy claims so far, but nothing that bad.
In addition to the article, I'd like to add that most DEV jobs I have been in had me coding only 50% of my time at most. The rest of the time was spent in meetings, gathering requirements and investigating Prod issues.
There is no doubt in my mind that AI makes me more productive and gets me back at least to the level Google did when it still worked.
Don't tell that to the investors investing hundreds of billions in AI
I'm actually infinitely more productive because I wouldn't start projects without AI (just lazy and burned out from the tedium of coding), but I'm enjoying it if the AI does a lot of the tedium.
Yep, I feel you. Let me just explain what I want in detail and one little piece at a time, and AI, make my words become code and I will watch you do it to make sure you don't mess up.
LLMs still leave something to be desired for DevOps-related work: infrastructure code. There is still not really enough context available when crossing the division between the hardware, OS, and software.
For Terraform, specifically, Claude 4 can get thrown into infinite recursive loops trying to solve certain issues within the bounds of the language. Claude still tries to add completely invalid procedures into things like templates.
It does seem to work a bit better for standard application programming tasks.
It's not surprising to me that it struggles with a language where there aren't billions of lines of code available to use as training data.
I wonder if that's all it is, or if the lack of context you mention is a more fundamental issue.
The problem with the 10x engineer myth is that there is no baseline to what an engineer is, as much as there is no baseline for what a human is.
Any tool can be shown to increase performance in closed conditions and within specific environments, but when you try to generalize things do not behave consistently.
Regardless, I would always argue that trying new tech / tools / workflows is better than being set in your ways, regardless of the productivity results. I do like holding off on new things until they mature a bit before trying them, though.
The only people who get 10x productivity are:
- people working on solo projects
- people at startups with few engineers doing very little intense code review, if any at all
- people who don't know how to code themselves.
Nobody else is realistically able to get 10x multipliers. But that doesn't mean you can't get a 1.5-2x multiplier. I'd say even I, at a large company that moves slowly, have been able to realize this type of multiplier on my work using Cursor/Claude Code. But as mentioned in the article, the real bottleneck becomes processes and reviews. These have not gotten any faster - so in real terms, time to ship/deliver isn't much different than before.
The only attempt that we should make at minimizing review times is by making them a higher priority than development itself. Technically this should already be the case, but in my experience almost no engineer outside of really disciplined companies, not even in FAANG, actually makes reviews a high priority, because unfortunately code reviews are not usually part of someone's performance review and they slow down your own projects. And usually your project manager couldn't give two shits about someone else's work being slow.
Processes are where we can make the biggest dent. Most companies as they get large have processes that get in the way of forward velocity. AI first companies will minimize anything that slows time to ship. Companies simply utilizing AI and expecting 10x engineers without actually putting in the work to rally around AI as a first class citizen will fall behind.
10x has always been an exaggeration, but I know from repeated experience it is possible to complete projects far quicker than 2x the speed of the typical team on a modern web stack. The way you do it is by writing less code. Typically this is done by using mature software as a starting point, rather than screwing around with the hot new thing. Seems fairly obvious when stated plainly, and yet so many teams make the same mistake of building from near scratch. Even worse, what teams come up with is usually slower to iterate with than existing software, because they approach it from the perspective of building a singular app rather than designing something to build generalized solutions upon.
The important thing to remember about these claims/articles is that LLMs are useful for a wide variety of tasks. An engineer doesn't only code, but also has to learn, search, gather / define requirements, write tests, troubleshoot, read/review other people's code, deal with project management tools, and document (both for developers and for customers).
Also, one underestimated aspect is that LLMs don’t get writer’s block or get tired (so long as you can pay to keep the tokens flowing).
Also, one of the more useful benefits of coding with LLMs is that you are explicitly defining the requirements/specs in English before coding. This effectively means LLM-first code is likely written via Behavior Driven Development, so it is easier to review, troubleshoot, upgrade. This leads to lower total cost of ownership compared to code which is just cowboyed/YOLOed into existence.
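A minimal sketch of what "spec first, then code" can look like (the feature and every name here are made up; in a real project the class under test would be production code rather than an in-memory stand-in):

class Projects:
    def __init__(self):
        self.active, self.archived = [], []

    def create(self, name):
        self.active.append(name)
        return name

    def archive(self, name):
        self.active.remove(name)
        self.archived.append(name)

# Spec, written in English before any code: "When a user archives a project,
# it leaves their active list but can still be found in their archive."
def test_archiving_moves_project_out_of_active_list():
    # Given a user with one active project
    projects = Projects()
    report = projects.create("Q3 report")
    # When they archive it
    projects.archive(report)
    # Then it is gone from the active list but present in the archive
    assert report not in projects.active
    assert report in projects.archived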
things that might actually make me several x faster:
* if my Github actions ran 10x faster, so I don't start reading about "ai" on hackernews while waiting to test my deployment and not noticing the workflow was done an hour ago
* if the Google cloud console deployment page had 1 instead of 10 vertical scroll bars and wasn't so slow and janky in Firefox
* if people started answering my peculiar but well-researched stackoverflow questions instead of nitpicking and discussing whether they belong on superuser vs unix vs ubuntu vs hermeneutics vs serverfault
* if MS Teams died
anyway, nice to see others having the same feeling about llm's
I largely agree with the gist of this article but its calculation about productivity is very flawed as it doesn’t account for the time things sat on the backlog, or the things that wouldn’t have been done at all.
Where I see major productivity gains are on small, tech debt like tasks, that I could not justify before. Things that I can start with an async agent, let sit until I’ve got some downtime on my main tasks (the ones that involve all that coordination). Then I can take the time to clean them up and shepherd them through.
The very best case of these are things where I can move a class of problem from manually verified to automatically verified as that kick starts a virtuous cycle that makes the ai system more productive.
But many of them are boring refactors that are just beyond what a traditional refactoring tool can do.
It's allowed me to be even more of a perfectionist. I'm quite enjoying it, and I review and revise everything that's produced line by line.
I doubt that's the commonly desired outcome, but it is what I want! If AI gets too expensive overnight (say 100x), then I'll be able to keep chugging along. I would miss it (claude-code), but I'm betting that by then a second tier AI would fit my process nearly as well.
I think the same class of programmers that yak shave about their editor, will also yak shave about their AI. For me, it's just augmenting how I like to work, which is probably different than most other people like to work. IMO just make it fit your personal work style... although I guess that's problematic for a large team... look, even more reasons not to have a large team!
Only vibe-coding influencers were ever talking about 10x multipliers.
Internally we expected 15%-25%. A big-3 consultancy told senior leadership "35%-50%" (and then tried to upsell an AI Adoption project). And indeed we are seeing 15%-35% depending on which part of the org you look and how you measure the gains.
What are you measuring?
Anytime you start talking about massive speedups, it's important to go re-read Amdahl's law.
The key isn't how much you can speed up the scalable/parallelizable portions, it's how limited you are by the non-scalable/parallelizable aspects.
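A quick back-of-the-envelope version (the numbers are illustrative): if coding is a fraction p of the job and AI speeds up only that part by a factor s, the overall speedup is 1 / ((1 - p) + p / s).

def overall_speedup(p: float, s: float) -> float:
    # Amdahl's law: the un-accelerated (1 - p) share dominates as s grows.
    return 1 / ((1 - p) + p / s)

print(overall_speedup(p=0.2, s=10))   # ~1.22x if coding is 20% of the job
print(overall_speedup(p=0.2, s=1e9))  # ~1.25x even with an "infinite" speedup
print(overall_speedup(p=0.5, s=3))    # 1.5x if coding is half the job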
> If listening to a 70 year old disk makes you happier, just do it. You'll listen to more music if you do that than you would by forcing yourself to use the more "productive" streaming service.
Ironically, when I listen to vinyl instead of streaming, I listen to less music.
If I'm in the zone, I will often go minutes between flipping the record or choosing another one; even though my record player is right next to me.
Listening to a good album is an immersive experience. I often don't have the urge to directly play another one. If I do, they're often similar thematically or by the same artist.
> Listening to a good album is an immersive experience.
That's when/if you're giving it your full attention. I used to do that when I was younger, but much less frequently now.
That being said, there's something hypnotic about watching a record spin, and seeing the needle in the groove. I don't do it now that I'm older, but my kids used to specifically ask me to play a record just so they could see it spin.
AI never says it doesn't know. It'll always have an answer, even if it's wrong or misleading.
This really depends on the prompting. I've experienced multiple times that Claude Code couldn't figure out how to fix a bug and just gave up, instead of getting stuck in an infinite loop.
Usually I get the loop, which runs me out of compute, and have to wait till tmrw. Glad there is actually a way to stop it from confidently “fixing” the same issue.
> When you write code, how much of your time do you truly spend pushing buttons on the keyboard? It's probably less than you think. Much of your prime coding time is actually reading and thinking
Totally agree, IMO there's a lot of potential for these tools to help with code understanding and not just generation. Shameless plug for a code understanding tool we've been working on that helps with this: https://github.com/sourcebot-dev/sourcebot
Thanks colton. Man, you just made me feel 10x better :) And ahh yes I said 10x. :P
I'm happy to hear that! A lot of people posting their hot takes here about how AI is actually great or actually awful, but I was hoping to have more conversations like this in the comments. I'm glad I can help people feel better.
Personally, he made me feel 100x better!
The central theme is very hard to disagree with; self-reported claims of productivity increases are often misleading. The way forward is math and meaningful measurement; this bears repeating.
I find that getting from zero to 80-90% functionality on just about anything software these days is exceedingly easy. So, I wonder if AI just rides that wave. Software development is maturing now such that making software with or without AI feels 10-100x faster. I suspect it is partially due to the profound leap that has been made with collaborative tools, compilers, languages, and open source methodology, etc..
> It's not good at keeping up with the standards and utilities of your codebase.
Not my experience.
You can instruct Claude Code to respect standards and practices of your codebase.
In fact I noticed that Claude Code has forced me to do a few genuinely important things, like documenting more, writing more E2E tests, and tracking architectural and style changes.
Not only am I forcing myself into a consistent (and well-thought-out) style, but I also need it later to feed to the AI itself.
Seriously, I don't want to offend anyone, but if you believe that AI doesn't make you more productive, you've got skill issues in adopting and using new tools at what they are good at.
Yes, I'm forced to be a real senior dev now: rigid specs, documentation, enforced coverage. It was easier before, just hiring really smart people who didn't need to have everything spelled out for them.
I’ve mentioned this elsewhere, but I think a better question is: do you find AI makes your coworkers N times better.
It makes everyone “produce more code” but your worst dev producing 10X the code is not 10X more productive.
There's also a bit of a Dunning-Kruger effect where the most careless people are the most likely to YOLO thousands of lines of vibecode into prod, while a more meticulous engineer might take a lot more time to read the changes, figure out where the AI is wrong, and remove unnecessary code. But the second engineer would be seen as much, much less productive than the first in this case.
> Thus, AI's best use case for me remains writing one-off scripts. Especially when I have no interest in learning deeper fundamentals for a single script, like when writing a custom ESLint rule.
Perfectly put. I've been using a lot of AI for shell scripting. Granted I should probably have better knowledge of shell but frankly I think it's a terrible language and only use it because it enjoys wide system support and is important for pipelining. I prefer TS (and will try to write scripts and such in it if I can) and for that I don't use AI almost at all.
Takes on dev-focused AI are so divided right now. It appears some people just don't understand the workflows that are effective. It actually takes a lot of work to set it up. It's not as simple as typing in prompts.
Would you say they're holding it wrong?
You are meant to hold it upside down with the screen facing away, otherwise you cant possibly expect it to work.
Yea. I feel like it's a waste of time and energy to read/argue about it. Either it works or it doesn't. Everyone already has exposure to it and opinions on it. There's no need to convince anyone. Reality will bear out the results.
It's like monads. You either know how to use it or you don't. Those who claim to know seem unable to explain it so that those who don't can reproduce the success.
Do you have an example FOSS project you can share with the requisite AI guardrail files, and example prompts?
It's an illusion that you have discovered some golden prompting workflows. If you had, then you would share it instead of being handwavy and secretive.
Couldn't agree more. I'm so fed up with these stupid marketers talking about how they built a SaaS solution in days and solved all their technical problems, AI magically putting everything together. All you need is rules files. I can't even get it to build a consistent backend, and I'm OK with any technology it likes. Once you hit a bug, it struggles to fix it without breaking 10 other things. I hope some technique comes along that can help you manage what it outputs. In its current state I would be very careful with what is being pushed to production.
I don’t think AI makes me 10x more productive. It does make me close to 10x less bored though.
Much of production software engineering is writing boilerplate, building out test matrices and harnesses, and scaffolding structure. And often, it's for very similarly shaped problems at their core, regardless of the company, organization, or product.
AI lets me get a lot of that out of the way and focus on more interesting work.
One might argue that’s a failure of tools or even my own technique. That might be true, but it doesn’t change the fact that I’m less bored than I used to be.
I'm happy to hear that! I hope you felt seen by this line from the article:
> Oh, and this exact argument works in reverse. If you feel good doing AI coding, just do it. If you feel so excited that you code more than ever before, that's awesome. I want everyone to feel that way, regardless of how they get there.
100%. It's made me like dev again because my head can be used for things other than remembering arcania - this may be a curse of using languages like Ruby and Elixir which mostly don't have great tooling.
I enjoyed the article, fwiw. Twitter was insufferable before Elon bought it, but the AI bro scene is just...wow. An entire scene who only communicate in histrionics.
This mostly mimics my own experience. I’ve mostly gotten value out of handing off planned/scoped coding tasks to LLMs. It’s faster to have the LLM generate code and the quality is usually fine if the task is properly scoped.
Actually writing software was only like 15-20% of my time though, so the efficiency wins from having an LLM write the code are somewhat limited. It's still another tool that makes me more productive, but I've not figured out a way to really multiplicatively increase my productivity.
> 10x productivity means ten times the outcomes, not ten times the lines of code. This means what you used to ship in a quarter you now ship in a week and a half.
This assumes the acceleration happens on all tasks. Amdahl's law says the overall speedup is limited by the share of the work that actually gets accelerated. Probably it's just unclear whether "engineer" or "productivity" means the programming part or the overall process.
Based on my own experience and reading a ton of HN posts, I would summarize it as:
- vibe coding is fun, but not production-ready software engineering
- LLMs like CC today moderately boost your performance. A lot of attention is still needed.
- some TDD style is needed for the AI tool to converge
- based on the growth of the last few months, it is quite likely that these tools will increase IC productivity substantially
- fully autonomous agentic coding will take more time as the error rate needs to decline significantly
I recently used Google's Gemini to review and debug some code that was performing some runtime code generation in .NET. It pointed out some issues I was aware of, some cases I hadn't considered but would have eventually hit with some testing, and then helped debug why some tests were failing. Pretty impressive actually. Probably wasn't a 10x savings, but definitely 2x or more in some cases, and some of the tasks it eliminated were the tedious ones, which is a big help for motivation.
I could not have built EACL[^1] without AI. We use EACL at work at now, so arguably AI has made me 10x more productive, but only because I know how to write specs for AI to get what I want.
Context: EACL is an embedded, SpiceDB-compatible ReBAC authorization library, built in Clojure and backed by Datomic.
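For readers who haven't met ReBAC before, here's a generic toy sketch of the model it implements (this is not EACL's API; the tuples, naming scheme, and single level of group expansion are all made up for illustration): authorization is a set of (resource, relation, subject) relationships, and a check walks that graph.

relationships = {
    ("doc:readme", "owner", "user:alice"),
    ("doc:readme", "viewer", "group:eng#member"),
    ("group:eng", "member", "user:bob"),
}

def check(resource, permission, user):
    # Direct relationship?
    if (resource, permission, user) in relationships:
        return True
    # Relationship granted via group membership (one level only, for brevity).
    for res, rel, subj in relationships:
        if res == resource and rel == permission and subj.endswith("#member"):
            group = subj.split("#")[0]
            if (group, "member", user) in relationships:
                return True
    return False

print(check("doc:readme", "viewer", "user:bob"))    # True, via group:eng#member
print(check("doc:readme", "viewer", "user:carol"))  # False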
If writing code - or unfamiliar tasks - is the constraint (often true in greenfield dev), then congratulations: with AI that constraint is gone.
Because AI gets you to the next constraint even faster :)
Is anybody (who has the data) actually claiming use of AI will make a single average engineer 10x faster/better?
Or the data showing something else... possibly, a company starts telling engineers to use AI, then RIFs a huge portion, and expects the remaining engineers to pick up the slack. They now claim "we're more efficient!" when they've just asked their employees to work more weekends.
I have points completed over a six-week period at a 4.3-per-day average; similar architecture was at 3.5 points per week before Claude Code. The average with Claude is slowing though; I need a couple more months to make any conclusions, but management won't let me stop now lol.
Does it need to enable 10x productivity? Part of the job of a developer is the constant pursuit of new tools to make you more efficient. The developers who do not evolve all the time are eventually passed by a younger generation. If it makes you more productive you should use it. Obviously there is going to be a ton of hype, just ignore it.
I’ve noticed it slowing our developers down and causing some brain rot when they have to work unassisted. It’s frustrating and sad to see.
I was making a VB script for Excel to merge individual workbooks into a single workbook. Normally I would design the script myself, but this time I used Copilot to do it. It would normally take me 30 minutes to an hour; with Copilot it took 15 minutes, a lot fewer brain cells, and less skill.
It is not making us 10x productive. It is making it 10x easier.
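For what it's worth, the equivalent task is only a few lines outside VBA as well; a minimal Python sketch using pandas, with made-up file names, that copies each workbook's first sheet into one merged workbook:

    import pandas as pd

    # Hypothetical input workbooks; each becomes one sheet in the merged output.
    inputs = ["january.xlsx", "february.xlsx", "march.xlsx"]

    with pd.ExcelWriter("merged.xlsx") as writer:
        for path in inputs:
            df = pd.read_excel(path)  # reads the first sheet by default
            df.to_excel(writer, sheet_name=path.removesuffix(".xlsx"), index=False)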
> 10x productivity means ten times the outcomes, not ten times the lines of code. This means what you used to ship in a quarter you now ship in a week and a half.
Exactly. I spend less than 20% of my time writing code. If LLMs 1,000,000,000-x'd my code writing, it would make me 1.25x as efficient overall, not 10x as efficient. It's all influencer hype nonsense, just like pair programming and microservices and no-code companies and blockchain.
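That 1.25x is exactly the Amdahl ceiling: with coding at 20% of the time, even an infinite speedup on the coding part can't beat 1/(1 - 0.2).

    # Overall speedup with 20% of time spent coding and a coding speedup of s:
    for s in (10, 1_000, 1_000_000_000):
        print(s, round(1 / (0.8 + 0.2 / s), 4))  # approaches 1.25 as s grows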
> You can't compress the back and forth of 3 months of code review into 1.5 weeks.
If your organization is routinely spending 3 months on a code review, it sounds like there's probably a 10 to 100x improvement you can extract from fixing your process before you even start using AI.
I think I may have worded this poorly. I mean the total amount of code-review time that goes into 3 months of work (likely spread across hundreds of PRs) can't be compressed into 1.5 weeks with the same portion of time allocated to code review. Each code review has a "floor": a minimum time cost from context switching, reading, writing, etc.
When my friend bullied me into using Cursor and I got that vscode fork set up with good enough vim bindings to not make me rip my hair out, it was like a first hit of a good drug. I couldn't believe how productive I was, and how much brain power I was saving by chilling at my desk and watching youtube while Cursor agented its way through some code that I would occasionally check in on and tweak. I got new modals done, new scientific charts (that I'd been terrified to implement since it was my job to engineer them, though they were chemistry charts so I didn't really understand them all that well), a full design rewrite, new components, oh man it felt great.
Then it came time to make a change to one of the charts. Team members were asking me questions about it. "How can we make this axis display only for existing data rather than range?" I'm scrolling through code in a screenshare that I absolutely reviewed, I remember doing it, I remember clicking the green arrow in Cursor, but I'm panicking because this doesn't look like code I've ever seen, and I'm seeing gaping mistakes and stupid patterns and a ton of duplicated code. Yeah I reviewed it, but bit by bit, never really all at once. I'd never grokked the entire file. They're asking me questions to which I don't have answers, for code "I'd just written." Man it was embarrassing!
And then to make the change, the AI completely failed at it. Plotly.js's type definitions are super out of date and the Python library is more fleshed out, so the AI started hallucinating things that exist on Python and not in JS - so now I gotta head to the docs anyway. I had to get much more manual, and the autocomplete of cursor was nice while doing so, but sometimes I'd spend more time tab/backspacing after realizing the thing it recommended was actually wrong, than I'd have spent just quickly typing the entire whatever thing.
And just like a hit, now I'm chasing the dragon. I'd love to get that feeling back of entering a new era of programming, where I'm hugely augmented. I'm trying out all the different AI tools, and desperately wishing there was an autocomplete as fast, as multi-line, and as good at jumping around as Cursor's, available in nvim. But they all let me down. Now that I'm paying more attention, I'm realizing the code really isn't good at all. I think it's still very useful to have Claude generate a lot of boilerplate, or come in and make some tedious changes for me, or just write all my tests, but beyond that, I don't know. I think it's improved my productivity maybe 20%, all things considered. Still amazing! I just wish it was as good as I thought it was when I first tried it.
It turns out, everyone was already a 10-100x engineer in a greenfield project.
> When I have had engineers who were 10x as valuable as others it was primarily due to their ability to prevent unnecessary work.
Interesting observation. I am inclined to agree with this myself. I'm more of a 10^0 kind of developer though.
Whatever productivity gains I get usually come from replacing documentation discovery and lookup rather than typing code.
What makes this "AI will replace you with a 10x AI-based engineer" narrative a non-starter is actually having a 10x non-AI engineer on the team.
It is finally starting to feel like the craze and hype bubble is being rightfully questioned left and right.
Sorry if this is grumpy, but I'm tired of seeing so many blogposts making the same conclusion from the dev's side
LLMs make writing code quick, that's it. There's nothing more to this. LLMs aren't solutioning nor are they smart. If you know what you want to build, you can build quick. Not good, quick.
That said, if managers don't care about code quality (because customers don't care either), then who am I to judge them. I don't care.
I'm on the edge of just blacklisting the word AI from my feed.
I measured quite carefully on a greenfield project and saw 4.3x for the first three weeks which was incredible. Now it's about 2x, I really need to improve my context wrangling.
Agree with a lot of arguments of the author.
When you're not sure if what someone says makes sense, trust common sense, your own experience, and your thinking.
Google became Experts Exchange became Stack Overflow became Google again became ChatGPT became Google again. Every time, so much faster to get your boilerplate.
The last month on our team: https://s.h4x.club/RBuDv0jd
The full year is just the more of the above.
AI reduces the time required to complete certain tasks, but that time is then re-allocated to additional validation that would not have been considered necessary otherwise. It also increases the quantity and rapidity of output expected in a set amount of time and makes me "lazier", i.e. I sit and watch the code get produced instead of diverting my attention elsewhere.
It’s true the traditional software development team structure won’t scale 10x
You have to change the organization.
- no peer code review; you review the AI output and that's enough
- devs need authority to change code anywhere in the company. No more team A owns service A and team B owns service B
- every dev and ops person needs to be colocated, no more waiting for timezones
- PMs and engineers are the same role now
Will it work for every company? No. If you are building a pacemaker, don't use AI. Will things break? Yes, sometimes, but you can roll back.
Will things be somewhat chaotic? Yes, somewhat, but what did you think going 10x would feel like?
As someone who's been coding most of his life, I have to admit... LLMs kind of killed the magic of programming for me. It's not as cool when I create something with the LLM; it just feels like you had someone else do the work for you. It's kind of sad...
Learning to use AI well feels like a whole new job—it's not just coding anymore, it's prompting, testing, and debugging the AI too.
I wonder if the gap between perception and reality, from that recent study, is because AIs are still so slow? The modern equivalent of https://xkcd.com/303/ might be "Waiting for the agent to complete!" So then you know you're more productive (and can spend more ten minute chunks of time reading HN) but your boss doesn't see it....
Worst of all, we used to spend time thinking about real issues. Now 70% of thinking, blogging, and (in some companies) programming time is spent on how to make inferior and nondeterministic tools accomplish something.
It's like discussing in a gaming guild how to reach the next level. It isn't real.
Was pleased to see Austen Allred catching strays in this article. May he never live it down.
For me the main benefit is that AI chat provides at least 10x better results than Google ~search~ Ad results
Ad-free LLM output won't last. Or at least you'll pay a premium for it. Personally, I do pay for a search engine subscription (Kagi) in an attempt to align my interests with my search results.
I often wonder how much of 10x engineers is circumstance vs talent/skill. Separate from the issue of LLMs.
If I can write blue sky / green field, code. Brand new code in a new repo, no rules just write code, I can write tons of code. What bogs me down are things like tests. It can take more time to write tests than the code itself in my actual work project. Of course I know the tests are important and maybe the LLM can help here. I'm just saying that they slow me down. Waiting for code reviews slows me down. Again, they're super useful but coming from a place where the first 20-25 years of my career I didn't have them they are a drag on my performance. Another is just the size of the project I'm on. > 500 programmers on my current large project. Assume it's an OS. It's just hard to make progress on such a large project compared to a small one. And yet another which is part of the first, other people's code. If I write the whole thing or most of it, then I know exactly what to change. I've written features in code I know in days that someone who was not familiar with the code I believe would have taken months. But, me editing someone else's code without the entire state of the code base in my head is 10x slower.
That's a long way of saying, many 10xers might just be in the right circumstance to provide 10x. You're then compared against them but you're not in the same circumstance so you get different results.
Nah, I've worked with maybe 2 people I'd say were "10x", or at least 5x, and it was definitely skill.
I used to not really believe people like that existed but it turned out they're just rare enough that I hadn't worked with any yet. You could definitely go a whole career without ever working with any 10x engineers.
And also it's not like they're actually necessary for a project to succeed. They're very good but it's extremely unlikely that a project will succeed on the back of one or two very good engineers. The project I worked with them on failed for reasons nothing to do with us.
Human goals are more important. I think conceptually the idea should always be strong goals set by humans and then sub-goals, each with a particularly well defined *plan* for meeting them. This needs to be the conceptual basis; if you are having to plan for 50% or 75% (gasp) of the time for a feature and then AI just writes the code, that is not intelligence, much less a 10x engineer.
My use case is not for a 10x engineer but instead for *cognitive load sharing*. I use AI in a "non-linear" fashion. Do you? Here is what that means:
1. Brainstorm an idea and write down detailed enough plan. Like tell me how I might implement something or here is what I am thinking can you critique and compare it with other approaches. Then I quickly meet with 2 more devs and make a design decision for which one to use.
2. Start manual coding and let AI "fill the gaps": write these tests for my code, or follow this already-existing API and create the routes from this new spec. This is non-linear because I complete 50-75% of the feature and let the rest be completed by AI.
3. I am tired and about to end my shift and there is this one last bug. I go read the docs, but I also ask the AI to read my screen and come up with some hypotheses. I decide which hypotheses are most promising after some reading and then ask the AI to just test that (not fix it in auto mode).
4. Voice mode: I have a shortcut that triggers claude code and uses it like a quick "lookup/search" in my code base. This avoids context switching.
The part I agree about: Software engineering is about more than writing code, so accelerating coding by 10X doesn't accelerate a software engineer by 10X.
The part I disagree about: I've never worked at a company that has a 3-month cycle from code-written to code-review-complete. That sounds insane and dysfunctional, and AI won't fix an organization like that.
Perhaps I was not clear here. My point isn't that one PR gets merged in 3 months. My point is that if, let's say, 15 PRs from one dev got merged per quarter in the old days, a 10x productivity boost means roughly 15 PRs now get merged every 7 business days. My point is simply that the amount of time that goes into the basic lag cycle of code review can't be compressed into 7 days.
I think focusing on end-to-end time confuses things more than it helps. A system can have 10X throughput with the latency being unchanged. You don't need to reduce latency or cycle time to have a 10X increase in throughput.
The better argument is that software engineers spend a lot of time doing things that aren't writing code and aren't being accelerated by any AI code assistant.
While I agree with some components of this blog, I also think that the author is speaking from a specific vantage point. If you are working at a large company on a pre-existing codebase, you likely have to deal with complexity that has compounded over many product cycles, pull requests, and engineer turnover. From my experience, AI has increased my performance roughly by 20%. This is primarily due to LLMs bypassing much of the human slop that has accumulated over the years on Google.
For newer languages, packages, and hardware-specific code, I have yet to use a single frontier model that has not slowed me down by 50%. It is clear to me that LLMs are regurgitating machines, and no amount of thinking will save the fact that the transformer architecture (all ML really) poorly extrapolates beyond what is in the training canon.
However, on zero-to-one projects that are unconstrained by my mag-seven employer, I am absolutely 10x faster. I can churn through boilerplate code, have faster iterations across system design, and generally move extremely fast. I don't use agentic coding tools as I have had bad experiences in how the complexity scales, but it is clear to me that startups will be able to move at lightning pace relative to the large tech behemoths.
I love days where someone else has written down what I wish I could. This is a brilliantly written and sober look into using LLMs as a software engineer.
I've been using Claude Code professionally for the past 2 months, with limited agent use prior to that (via Windsurf). I would say I've seen a 30% boost in productivity overall, with significant spikes in particular types of work.
Where CC has excelled:
- New well-defined feature built upon existing conventions (10x+ boost)
- Performing similar mid-level changes across multiple files (10x+ boost)
- Quickly performing large refactors or architecture changes (10x+ boost)
- Performing analysis of existing codebases to help build my personal understanding (10x+ boost)
- Correctly configuring UI layouts (makes sense: this is still pattern-matching, but the required patterns can get more complex than a lot of humans can quickly intuit)
Where CC has floundered or wasted time:
- Anything involving temporal glitches in UI or logic. The feedback loop just can't be accomplished yet with normal tooling.
- Fixing state issues in general. Again, the feedback loop is too immature for CC to even understand what to fix unless your tooling or descriptive ability is stellar.
- Solving classes of smallish problems that require a lot of trial-and-error, aren't covered by automated tests, or require a steady flow of subjective feedback. Sometimes it's just not worth setting up the context for CC to succeed.
- Adhering to unusual or poorly-documented coding/architecture conventions. It's going to fight you the whole way, because it's been trained on conventional approaches.
Productivity hacks:
- These agents are automated, meaning you can literally have work being performed in parallel. Actual multitasking. This is actually more mentally exhausting, but I've seen my perceived productivity gains increase due to having 2+ projects going at once. CC may not beat a single engineer for many tasks, but it can literally do multiple things at once. I think this is where the real potential comes into play. Monitoring multiple projects and maintaining your own human mental context for each? That's a real challenge.
- Invest in good context documents as early as possible, and don't hesitate to ask CC to insert new info and insights in its documents as you go. This is how you can help CC "learn" from its mistakes: document the right way and the wrong way when a mistake occurs.
Background: I'm a 16yoe senior fullstack engineer at a startup, working with React/Remix, native iOS (UIKit), native Android (Jetpack Compose), backends in TypeScript/Node, and lots of GraphQL and Postgres. I've also had success using Claude Code to generate Elixir code for my personal projects.
Number of people that read the article before commenting: 2
I did read it, but anyone who didn't isn't missing anything.
To summarize, LLM agents are not the silver bullet those promoting them suggest they are. The headline is all that was needed.
I thought AI is making actual experienced developers 19% less productive?
https://www.businessinsider.com/ai-coding-tools-may-decrease...
It's optimizing the part that is easy at the cost of the part that is hard.
Nitpick (since I couldn't leave a comment there, hopefully author reads it):
> It tends to struggle with languages like Terraform
The language is called HCL (HashiCorp Configuration Language).
I tested Gemini Pro 2.5 yesterday on a function I'd had trouble with. It wasn't something I couldn't do, just one of those things that are easy to get wrong, which I had postponed because I lacked focus that day due to a heat wave. The AI spit out a perfect function with working tests after the first prompt.
Now I don't want to sound like a doomsayer, but it appears to me that application programming and the corresponding software companies are likely to disappear within the next 10 years or so. We're now in a transitional phase where companies who can afford enough AI compute time have an advantage. However, this phase won't last long.
Unless there is some fundamental obstacle to further advances in AI programming, not just simple functions but whole apps will be created with a prompt. And it is not going to stop there. Soon there will be no need for apps in the traditional sense: end users will use AI to manipulate and visualize data, and operating systems will integrate the AI services needed for this. "Apps" can be created on the fly and constantly adjusted to the users' needs.
Creating apps will not remain a profitable business. If there is an app X someone likes, they can prompt their AI to create an app with the same features, but perhaps with these or those small changes, and the AI will create it for them, including thorough tests and quality assurance.
Right now, in the transitional phase, senior engineers might feel they are safe because someone has to monitor and check the AI output. But there is no reason why humans would be needed for that step in the long run. It's cheaper to have 3 AIs quality-test and improve the outputs of one generating AI. I'm sure many companies are already experimenting with this, and at some point the output of such iterative design procedures will have far fewer bugs than any code produced by humans. Only safety-critical essential features such as operating systems and banking will continue to be supervised by humans, though perhaps mostly for legal reasons.
Although I hope it's not, to me the end of software development seems a logical long-term consequence of current AI development. Perhaps I've missed something; I'd be interested in hearing from people who disagree.
It's ironic because in my great wisdom I chose to quit my day job in academia recently to fulfill my lifelong dream of bootstrapping a software company. I'll see if I can find a niche, maybe some people appreciate hand-crafted software in the future for its quirks and originality...
You just said we won't need to develop apps anymore... so what will the AI need to create at all?
Isn't this a strawman? I follow the AI coding space, and I've never found anyone claiming it made them 10x as productive.
I think this is a strawman argument that is conflating uses for AI. I posted a video not long ago where Andrew Ng makes the claim to the AI Startup school that in testing they are seeing ~10x improvement for greenfield prototypes and 30%-50% improvement in existing production code bases.
So two groups are talking past one another. Someone has a completely new idea, starts with nothing and vibe codes a barely working MVP. They claim they were able to go from 0 to MVP ~10x faster than if they had written the code themselves.
Then some seasoned programmer hears that claim, scoffs and takes the agent into a legacy code base. They run `/init` and make 0 changes to the auto-generated CLAUDE.md. They add no additional context files or rules about the project. They ask completely unstructured questions and prompt the first thing that comes into their minds. After 1 or 2 days of getting terrible results they don't change their usage or try to find a better way, they instead write a long blog post claiming AI hype is unfounded.
What they ignore is that even the maximalists are stating: 30%-50% improvement on legacy code bases. And that is if you use the tool well.
This author gets terrible results and then says: "Dark warnings that if I didn't start using AI now I'd be hopelessly behind proved unfounded. Using AI to code is not hard to learn." How sure is the author that they actually learned to use it? "A competent engineer will figure this stuff out in less than a week of moderate AI usage." One of the most interesting things about learning is that some skills are easy to learn and hard to master. You can teach a child chess; it is easy to learn, but it is hard to master.
Has any company done its yearly release (like Apple's or Google's keynote events) where it released 10x as many products, or the releases were 10x as ambitious?
Really enjoyed this post. I think this is the best mindset to have around the future of AI programming.
Here's a thesis:
Maybe LLMs make you 10x faster at using boilerplate-heavy things like Shadcn/ui or Tanstack.
...which is still only about half as fast as using a sane ecosystem.
IMO this is why there's so many diverging opinions about the productivity of AI tools.
...But all of these so-called "green" companies are using thousands of gallons of water and sucking down clean energy that could have been used to take coal plants offline.
When I asked ChatGPT about this topic it claimed that AI can make a software developer up to about 50% more productive on average. Sounds more reasonable to me. I often write custom tools to generate code. Sometimes when stars are aligned I get that 100x feeling. And sometimes I regret it so hard a couple of years later.
if you are working in a domain you know well, ai will not save you any time. if you are not, you will have to carefully review the ai's output while not building any intuition or background sources anyway, so you end up not saving time and developing a new dependency.
but every company is going to enshittify everything they can to pigeonhole ai use to justify the grifters' costs
i look forward to years from now when these companies trying to save money at any cost have to pay senior developers to rip all this garbage out
Maybe 1x engineer is not going to 10x.
But is a 10x going to 100x?
AI is making experienced developers with architecture experience quite a bit faster.
Ingesting legacy code, understanding it, looking at potential ways to rework it, and then putting in place the axioms to first work with it yourself, and then for others to join in, has gone from months to weeks and days.
For greenfield development from scratch, statically typed languages seem to work a bit better than not.
Putting enough information around the requirements and how to structure and undertake them is critical, or it can turn into cowboy coding pretty easily; by default the AI leans toward the average of its corpus, not the best. That's where the developer comes in.
If you wanted to make me 10x as productive in the past, the best thing you could have done would have been to quit forcing use of shared services and infrastructure I don't have control over.
Any codebase that's difficult for me to read would be way too large to use an LLM on.
This is not a surprising conclusion for anyone that works in the field and uses the current gen of LLMs.
The problem is...
1. There is an enormous investment of $$$, which produces a too-big-to-fail scenario where extravagant claims will be made regardless.
2. Leadership has made big promises around productivity and velocity for eng.
The end result of this is going to be a lot of squinting at the problem, ignoring reality, and declaring victory. These AI tools are very useful in automating chore and grunt tasks.
Product guys with no technical experience are getting one-shotted by VC dollars making them think they can create projects themselves. It's an admirable goal but will never happen.
Now for senior developers, AI has been tremendous. Example: I'm building a project where I hit the backend in LiveView, and internally I have to make N requests to different APIs in parallel and present the results back. My initial version to test the idea had no loading state, waiting for all requests to finish before sending anything back.
I knew that I could use Phoenix Channels, and Elixir Tasks, and websockets to push the results as they came in. But I didn't want to write all that code. I could already taste it and explain it. Why couldn't I just snap my fingers?
Well AI did just that. I wrote what I wanted in depth, and bada bing, the solution I would have written is there.
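Not the commenter's Elixir/Phoenix code, but the shape of that solution in a minimal Python asyncio sketch (the fetch and push functions are stand-ins): kick off the N requests in parallel and push each result as soon as it completes, instead of waiting for all of them.

    import asyncio
    import random

    async def fetch(api: str) -> str:
        # Stand-in for a real HTTP call; each API responds at its own pace.
        await asyncio.sleep(random.uniform(0.1, 1.0))
        return f"result from {api}"

    async def push_to_client(result: str) -> None:
        # Stand-in for pushing over a websocket/channel to update the UI.
        print("pushed:", result)

    async def main() -> None:
        apis = ["api-a", "api-b", "api-c"]
        tasks = [asyncio.create_task(fetch(api)) for api in apis]
        # as_completed yields each task as it finishes, not in submission order,
        # so the client sees partial results immediately instead of one big wait.
        for finished in asyncio.as_completed(tasks):
            await push_to_client(await finished)

    asyncio.run(main())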
Vibe coders are not gonna make it.
Engineers are having the time of their lives. It's freeing!
If you’re writing the same code you would have written 2 years ago, you won’t see much speedup
But if your system records internal state in english and generates code while handling requests, complex systems can become much simpler. You can build things that were impossible before
I don't use AI code generation tools; I just use Claude as a search engine. It hasn't changed the output rate of my code, but I believe it has improved the quality by exposing me to patterns and features that I otherwise may not have seen. I used to take a very object-oriented approach to code, but when I would ask Claude to look at my code and critique it, it would often lead me toward more functional patterns, with result-type returns and eliminating global state. I've completely stopped using exceptions, and functional programming has GREATLY increased the confidence I have in my code, to the point where I write 2000 lines at a time and get a successful first test nearly every time.
The credit lies with a more functional style of C++ and TypeScript (the languages I use for hobbies and work, respectively), but Claude has sort of taken me out of the bubble I was brought up in and introduced new ideas to me.
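For readers unfamiliar with the pattern, here is the result-type idea sketched in Python rather than the commenter's C++/TypeScript: failure is a value the caller must handle, not a hidden exception path.

    from dataclasses import dataclass

    @dataclass
    class Ok:
        value: float

    @dataclass
    class Err:
        message: str

    def divide(a: float, b: float) -> Ok | Err:
        # Failure is part of the return type, not an exception.
        if b == 0:
            return Err("division by zero")
        return Ok(a / b)

    match divide(10, 0):
        case Ok(value):
            print("result:", value)
        case Err(message):
            print("error:", message)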
However, I've also noticed that LLM products tend to reinforce your biases. If you don't ask them to critique you or push back, they often tell you what a great job you did and how incredible your code is. You see this with people who have gotten into a kind of psychotic feedback loop with ChatGPT and who now believe they can escape the matrix.
I think LLMs are powerful, but only for a handful of use cases. The majority of what they're marketed for right now is techno-solutionism, and there's an impending collapse in VC funding for companies that are plugging ChatGPT APIs into everything from insurance claims to medical advice.
> I don't use AI code generation tools
Then unfortunately you're leaving yourself at a serious disadvantage.
Good for you if you're able to live without a calculator, but frankly the automated tool is faster and leaves you less exhausted so you should be taking advantage of it.
That's a very narrow perspective. Will the tool deskill me? Will the tool lower my work quality? Will the tool make a bigger share of my work reviewing vs. thinking and creating? Will the tool make the work overall less interesting (which is what motivates me)? Etc. And this is even assuming the FOMO is justified; so far, studies don't show this is the case, but things might change.
Is typing speed a bottleneck for people? Because otherwise you're offloading thinking to the LLM. Unless you can understand code faster than you can write it (which I've never experienced; best case scenario, I can understand as fast as I read).
As a junior I used to think it was ok to spend much less time on the review than the writing, but unless the author has diligently detailed their entire process a good review often takes nearly as long. And unsurprisingly enough working with an AI effectively requires that detail in a format the AI can understand (which often takes longer than just doing it).
> Is typing speed a bottleneck for people?
Yes. And if it isn't, you're being overpaid in the view of a lot of people. Step out of the way and let an expert use the keyboard.
How can you be unable to read and understand code, yet spend time writing it? In that situation, that's bad code.
Source: try working with assembly and binary objects only which really do require working out what's going on. Code is meant to be human readable remember...
They are using them, just in a curated and deliberate way.
I use it similar to the parent poster when I am working with an unfamiliar API, in that I will ask for simple examples of functionality that I can easily verify are correct and then build upon them quickly.
Also, let me know when your calculator regularly hallucinates. I find it exhausting to have an LLM dump out a "finished" implementation and have to spend more time reviewing it than it would take to complete it myself from scratch.
Writing code is the only part of my job that I like. I'm not exhausted by coding, because I love coding. I'm exhausted by pointless meetings, talking to clients, and trying to bring non-technical people up to speed with what I'm working on. Allowing a machine to write code for me and then manually editing it sounds like a really, really miserable way to spend the precious little time I have on this earth.
> 10x productivity means ten times the outcomes, not ten times the lines of code. This means what you used to ship in a quarter you now ship in a week and a half.
Not really? That's defining productivity as latency, but it's at least as valid to define productivity as throughput.
And then all the examples that are just about time spent waiting become irrelevant. When blocked waiting on something external, you just work on other things.
I mean throughput, not latency. As in, if you shipped 10 meaningful changes in a month before, you now ship 100.
My point about waiting for things like code review is that it creates a natural time floor; the context switching takes time and slows down other work. If you have 10x as much stuff to get reviewed, all the time lost to context switching is multiplied by 10x.
Words of wisdom:
There is no secret herbal medicine that prevents all disease sitting out in the open if you just follow the right Facebook groups. There is no AI coding revolution available if you just start vibing. You are not missing anything. Trust yourself. You are enough.
Oh, and don't scroll LinkedIn. Or Twitter. Ever.
Or shit you know what… Hacker News for that matter…
hopefully we all remember Amdahl's law and reflect on how much time a software engineer actually spends on the "typing code" part of the job of "delivering software that solves some business need".
Those who are not aware of The Mythical Man-Month's silver bullet are condemned to rewrite it.
> The amount of product ideation, story point negotiation, bugfixing, code review, waiting for deployments, testing, and QA that go into what was traditionally 3 months of work is now getting done in 7 work days? For that to happen, each and every one of these bottlenecks has to also have seen 10x productivity gains.
AI is making 10x developers 10x more productive and is making 0.1x devs 0.1x as productive.
When I use Claude Code on my personal projects, it's like it can read my mind. As if my project is coding itself. It's very succinct and consistent. I just write my prompt and then I'm just tapping the enter key; yes, yes, yes, yes.
I also used Claude Code on someone else's code and it was not the same experience. It kept trying to implement dirty hacks to fix stuff but couldn't get very far with that approach. I had to keep reminding it "Please address the root cause" or "No hacks" or "Please take a step back and think harder about this problem." There was a lot of back-and-forth where I had to ask it to undo stuff and I had to step in and manually make certain changes.
I think part of the issue is that LLMs are better at adding complexity than at removing it. When I was working on the bad codebase, the times I had to manually intervene, the solution usually involved deleting some code or CSS. Sometimes the solution was really simple and just a matter of deleting a couple of lines of CSS but it couldn't figure it out no matter how I wrote the prompt or even if I hinted at the solution; it kept trying to solve problems by adding more code on top.
It's making everyone faster.
That means that good developers are more productive, and bad developers create more work for everyone else at a very rapid pace.
>Oh, and don't scroll LinkedIn. Or Twitter. Ever.
This is all you need to take away from this article. Social media is a cesspool of engagement farmers dropping BS takes to get you to engage out of FOMO or anger. Every time I'm on there, I am instantly reminded why I quit going there. It's not genuine, and it's designed to pull your attention away from more important things.
I've been using LLMs on my own for the past few years, and we just recently started our own first-party model that we can now use for work. I'm starting to get into agentic actions where I can integrate with Confluence, GitHub, Jira, etc. It's a learning curve for sure, but I can see where it will lead to some productivity gains. The roadblocks are still real, though, especially when working with other teams: whether you're waiting for feedback or for a ticket to be worked on, the LLM might speed-run you to a solution, but you'd better be ready with the next thing and the next thing while you're waiting.