The main thrust of the article is that codebases can grow too large to be manageable by LLMs.
> It simply will not fit the context window, and README files are of limited use.
I think many useful applications can be built without hitting current context window limits, which will certainly grow. Besides, Claude Code and Codex use many tricks to get around this problem, such as context compaction and sharding a task across many agents.
If you want proof that there's a serious long-term issue with vibe-coding, all you need to do is use Claude Code and watch how every release either creates 5 new bugs or re-introduces 5 they've already patched 15 times over the last year.
The creators of Claude can't even vibe-code well. Claude Code is one of the sloppiest, least stable tools I've ever used. Anthropic has proudly boasted that Claude Code is entirely vibe-coded and vibe-maintained. That's not a flex. It's a signal not to trust it.
If they want to stop feature development and focus on stability, they can.
But given how few people are working on Claude Code and how many features it keeps on adding (https://news.ycombinator.com/item?id=47495527), I think Claude Code is doing fine.
Any specific example? Because in regular use, it's hassle free.
As a user of Claude Code...
I wouldn't use it to write anything mission critical like calculating the trajectory of a NASA spacecraft or something.
But there's tons of software out there that's equal or lower quality to Claude Code itself, or to what you can make with Claude Code, that is useful and serving a purpose. "Quality" is always relative to what is needed, what the market will bear, etc.
I think the existing comments already cover most of it. Also, I would argue that we are seeing a new group of coders emerge into the realm of programming, and we are judging them at their worst while comparing them to our best. It is quite insane to me to expect someone who just started to fully build google.com and all of its infra, security, etc.
You're right that a new group of coders is coming in, which opens its own new can of worms. However, even experienced coders are still producing slop. The difference between slop and quality seems to be how much you baby the LLM: carefully paying attention to its outputs and its behavior, and stringently testing everything it produces. The more auto-pilot, the worse the result. The larger the codebase, the worse the result. LLMs are death by a thousand cuts unless you take the effort to manually comb through the code and remove the tech debt at checkpoints.
My mental model of "Claude is a 19 year old intern who never gets tired but is very overconfident" never fails.
Would you hand off some of your well defined tasks to your diligent 19 year old intern? Sure! Would you check their work? Of course!
Would you hand off all of a major tech company to be entirely built by interns? Of course not!
Kurt von Hammerstein-Equord:
I distinguish four types. There are clever, hardworking, stupid, and lazy officers. Usually two characteristics are combined. Some are clever and hardworking; their place is the General Staff. The next ones are stupid and lazy; they make up 90 percent of every army and are suited to routine duties. Anyone who is both clever and lazy is qualified for the highest leadership duties, because he possesses the mental clarity and strength of nerve necessary for difficult decisions. One must beware of anyone who is both stupid and hardworking; he must not be entrusted with any responsibility because he will always only cause damage.
Stupid but industrious and prolific. Also very confident.
> I would argue that we are seeing a new emerging group of coders come into the realm of programming and we are judging them at their worst and comparing them to our best.
Maybe, but the world seems to be inviting this comparison by acting as though they are going to disrupt and replace the established, experienced coders.
The judgement and pushback is pretty warranted.
It's a little more nuanced than this. Claude can't actually replace an experienced coder, but in two steps:
1. Claude makes every experienced coder more productive.
2. The industry decides to hire fewer experienced coders to get the same level of productivity.
We have now accomplished putting some large percentage of experienced coders out of work without actually replicating what they do.
It is, however, making me a bit crazy that the industry's response to (presumed!) increased productivity has been to cut costs rather than invest more broadly and deeply in software.
I have long questioned the mantra that software needs more investment, or even that it produces positive returns when it gets it. Just look at what many of these big companies with untold resources and investment have actually produced in recent years... Maybe cost optimisation and freeing up capital for something else is the correct move.
I like to think of the Facebook iOS app which had 18,000 classes in it. It was so large that it couldn't be loaded into Xcode. Imagine if those programmers had Claude back then, they could have produced ten times as much code.
> It is, however, making me a bit crazy that the industry's response to (presumed!) increased productivity has been to cut costs rather than invest more broadly and deeply in software
It's almost like they don't actually believe (or care if) it is increasing productivity and are just using it as an excuse to cut costs
Still doesn't make sense to me then, even ignoring AI -- why are they cutting costs while making record revenues and profits?
Because they're seeing the writing on the wall: the economy is going to shit.
Their "record revenues and profits" just come from squeezing their customers, already at breaking point, to the max, to just move the needle in the stock market. They know this is not sustainable at all.
> I would argue that we are seeing a new emerging group of coders come into the realm of programming and we are judging them at their worst and comparing them to our best
Nyes. I think what's actually happening is that these new guys are coming in, using AI, and telling us how super awesome and powerful they are because of it, and that nothing could ever go wrong.

> It is quite insane to me to expect someone who just started to fully build google.com and all of its infra, security, etc.
But it's not us expecting them to do it. It's them telling us they can do it coz they have AI. Look, I've been using Claude and Codex agents for about 6 months now, essentially full time for coding (when I code), coz I can't ask my people to use a tool I have no experience with myself. So I purposefully forced myself to use the agents, and only the agents, as much as I could bear, only resorting to manual changes in very, very few instances. And there have been many, many frustrations, believe me.
The number of times that even Seniors have pasted Claude analyses to me verbatim as truth, when it was apparent after a first read-through of the output that it wasn't true, is amazing. How we expect juniors, who have much less developed "spidey senses", to successfully navigate that is beyond me. Most people are trusting by default. They shouldn't be, but it's human nature for most of us. For some it isn't, like myself. I'm already the dude who asks too many questions of humans when they're not clear on what they assumed vs. what they have verified.
For example, I recently showed one of my Seniors a full-page analysis in a Slack thread (made by some other Senior) and asked him to tell me where it shows that it's BS and not true. He couldn't do it. He tried over and over and was unable to. I read it, and the second paragraph out of many was BS and just not true. Easy to verify. Claude didn't have access to the actual information (because of various circumstances) but just made something up: it said the relevant code was deployed, thus XYZ was true. It then listed lots of extra analysis, which sounded reasonable and probably was, if the premise had been correct. It just wasn't. The code had never been released at that point.
I've been making the same kind of "spidey senses are tingling" comments and questions back to people for many, many years, and others are usually not good at that sort of thing (exceptions prove the rule), coz people do the same kind of BS-ing that Claude et al. do. Claude is actually generally "better" about questioning its judgement than people, who in many cases have feelings attached to their investigations (even when, pre-AI, they very blatantly didn't check something and just assumed it all by themselves).
$EMPLOYER has decided longevity matters so little that we no longer "do reliability"; ship it, boys.
Sounds like zenthewayisthegoalcoding
Wake up babe! New system prompt just dropped!
time will tell. you can set reasonable constraints and review the code. unless you are disqualifying that as vibecoding.
I think, definitionally, "vibe coding" means you feel out of control; in fact, I would say Karpathy is deliberately trying to bring out these feelings in people.
If you are using an AI assistant with your feet on the ground, like as a coding buddy that you pair with, you're not "vibe coding"
[dead]
> If you keep vibe-adding features, and somehow keep getting customers to pay for this thing, what happens once the codebase becomes so complex that an LLM cannot fit it inside its “brain”?
You realize this point is well, well beyond what a human can "fit" in their brain too? You start making shorthands and assumptions about your systems once they get too large.
One of the main weaknesses of current AI is that it doesn't modularize unless you explicitly say so in the prompt; or it will modularize but "forget" it already included a feature in file B, so it redundantly retypes it in file A, causing features to break further down the line.
Modularizing code is important and a lot of devs will learn this. I once had 2k-line files at the beginning of my career (this was before AI), and I now usually keep files between 100 and 500 lines (but not just because of AI).
While I rarely use AI on my code, if I want to paste my program into a local LLM that only has an 8-32k context window (depending on the LLM), I need to keep it small to leave space for my prompt and other things.
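To make that budgeting concrete, here's a rough sketch. The function names are made up, and the ~4 characters-per-token ratio is just a common rule of thumb for English text and code; real tokenizers vary.

```python
# Rough check of whether a source file plus prompt fits a small context
# window. ~4 chars/token is an approximation, not a real tokenizer.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_context(source: str, prompt: str, context_tokens: int = 8192,
                 reply_reserve: int = 1024) -> bool:
    """Leave room for the model's reply, not just the input."""
    budget = context_tokens - reply_reserve
    return approx_tokens(source) + approx_tokens(prompt) <= budget

small_file = "def add(a, b):\n    return a + b\n" * 50  # ~1.6k chars
print(fits_context(small_file, "Review this code."))                    # fits
print(fits_context("x" * 40000, "Review.", context_tokens=8192))        # too big
```

Keeping files in the 100-500 line range means even an 8k-context model can see a whole module plus your instructions at once.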
Even as a human it's much easier to edit code when it's modular. I used to like having everything in one file, but not anymore: with a modular codebase you can import a function into 2 different files, so changing it in one place changes it everywhere.
TLDR: Modularizing your code makes it easier for both you (as a human) and an AI assistant to review your codebase, and reduces the risk of redundant development, which AI frequently introduces unknowingly.
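A minimal single-file sketch of the failure mode described above (all names are hypothetical; imagine the two `_signup`/`_billing` functions living in separate files):

```python
# Anti-pattern: the AI "forgot" it already wrote this helper in file B,
# so it retyped it in file A -- and the two copies silently drift apart.

def normalize_email_signup(raw: str) -> str:   # imagine this in file A
    return raw.strip().lower()

def normalize_email_billing(raw: str) -> str:  # imagine this in file B
    return raw.lower()                         # drifted: forgot strip()

# Fix: one shared helper that both files import instead.
def normalize_email(raw: str) -> str:
    """Single source of truth: fix a bug here, every caller gets it."""
    return raw.strip().lower()

# The duplicated copies now disagree on whitespace handling:
print(normalize_email_signup("  A@B.com "))   # "a@b.com"
print(normalize_email_billing("  A@B.com "))  # "  a@b.com "
print(normalize_email("  A@B.com "))          # "a@b.com"
```

The duplicated versions break "further down the line" exactly as described: each copy gets bug-fixed independently until they disagree.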
There needs to be a better harness than what we have. It feels like we're in the stone age with Claude Code etc.
Having control over the harness locally, and combining it with local inference and analysis, seems to be the way forward.
Failing at modularization and at maintaining the abstractions is the main thing that results in slop. Getting those right also requires deterministic memory, though.