The space is moving so fast that, if I wrote down my workflows and workarounds just two months ago, so much of it would be stale today. I think all these recommendations need to list the models and harnesses being described front and center.
> The space is moving so fast that, if I wrote down my workflows and workarounds just two months ago, so much of it would be stale today.
There is also the problem that none of these workflows have been validated or verified. Anyone is free to go on social media or a personal blog and advertise their snake oil. So when a workflow turns out to be lacking, the perceived staleness may actually have been ineffectiveness all along, dressed up as self-promotion.
Yeah. I have Claude 4 correcting code that Claude 3.5 wrote a few months ago.
I haven’t tried Claude 4. Maybe I’d give it a spin to see if it can improve my design document
hehehe - same :)
Very important comment. My workflow changed dramatically with the increased capabilities of these tools.
I think none of these offer much useful insight beyond the overarching idea of peer programming beating just vibe coding.
The best structure I've found which leverages this idea is called BMAD, and treats the LLM as though it were a whole development team in an orchestrated way that you have full control over.
https://youtu.be/E_QJ8j74U_0 https://github.com/bmadcode/BMAD-METHOD
Looks like an elevated vibe coding method for UI development. Does this work for non-web/UI development?
You're always limited by the knowledge gaps of the underlying LLM. The method is otherwise the most coherent way to work to the strengths of the LLM through disciplined context and role management. Nothing about it is UI focused, and leans more on general agile team structures than anything else.
is it really more efficient to have an LLM generate code, then review that code, fix errors and spend some time to fully understand it? I wish there were tangible stats and metrics around this. Is it really more efficient than just writing the code yourself, but using LLMs to look up things or demo solutions?
Lately I look for opportunities to have the LLM do some easy small work while I go work on some harder task in parallel.
I've also tried asking the LLM to come up with a proposed solution while I work on my own implementation at the same time.
LLMs can also be much faster if a task requires some repetitive work. When I recognize a task like that, I try coding the first version and then ask the LLM to follow my pattern for the other areas where I need to repeat the work.
How did we end up here, just accepting that some coding tasks require repetitive work, and turning to a probabilistic text synthesizer that requires massive training data to automate that? We're having this discussion on a site whose founder famously wrote, over 20 years ago, that succinctness is power, and even wrote this site in a programming language that he designed to take that principle as far as he could. Now, two decades later, why have we so completely retreated from that dream?
I bear some responsibility for this, since I was one of the people who basically said, in the 2010s, that we should just give up and use popular languages like JavaScript because they're popular. I regret that now.
We haven't retreated from that dream. We're doing both things in parallel. But I think it will always be the case that some things are repetitive, even as we continuously expand the frontier of eliminating those things. It's good to have tools that help automate repetitive tasks, and it's also good to create more powerful abstractions. There is no contradiction.
Some things are better with more code. Sets of data in particular (strongly typed). Sometimes you need to modify those sets of data and it doesn’t require enough work to write a whole script for, but you still don’t really want to spend the time manually modifying everything. LLM’s are really nice in those instances.
Plain data in all glory but I’d rather (deterministically) generate and modify it than having big blobs checked in. If I have too much copy pasted data I often forget to modify until the test case runs and I realize it. This creates a false sense of confidence since there could be tests that pass and nobody ever checks they’re wrong. Essentially, minimize the number of human steps to verify it looks right which typically means optimizing for the least amount of code.
Flashback to when I committed a suite of tests in Python that were indented one tab too much, resulting in them not running at all. This passed code review (at a FAANG company) and was discovered months later from an unrelated bug. The point is even unit tests have a very human element to them.
> I bear some responsibility for this, since I was one of the people who basically said, in the 2010s, that we should just give up and use popular languages like JavaScript because they're popular. I regret that now.
In the 2010s the move was towards more concise languages and programming techniques in, e.g., the JavaScript scene too. CoffeeScript is a prime example of this.
But then came the enterprise software people pushing their Javaisms and now we have verbose bondage and ceremony mess like TypeScript and ES6 modules.
And in a tragicomic turn, after we made formally expressing programmer intent so difficult, we have turned to writing bad novels for LLMs in a crapshoot of trial and error, hoping they write the pointless boilerplate correctly.
Agree with each of these points so much!
That’s why I really like copilot agent and codex right now.
Even more parallel stuff and from my phone when I’m just thinking of ideas.
> is it really more efficient to have an LLM generate code, then review that code, fix errors and spend some time to fully understand it?
The answer is always "it depends". There are some drudge work tasks that are brilliantly done by LLMs, such as generating unit tests or documentation. Often the first attempt is not great, but iterating over them is so fast that you can regenerate everything from scratch a dozen times before you spend as much time as you would do if you wrote them yourself.
It also depends on what scope you're working on. Small iterations have better results than grand redesigns.
Context is also critical. If your codebase is neatly organized with squeaky clean code, then LLMs generate better recommendations. If your codebase is a mess of inconsistent styles and spaghetti code, then your prompts tend to generate more of the same.
Depends on the code, but often yes. The less you care about that specific result, the more efficient it is. One-off tools under 2k lines where you can easily verify the result? Why would I not generate that and save time for more interesting stuff?
One-off tools seem to be more ops than dev, to me.
I haven't heard the "scripting is not programming" or similar take since newsgroups. It's really time to let it die.
One of my pet hypotheses is that the top X% of Excel users vastly outperform the bottom X% of programmers in getting to usable results.
Totally agree! I know a guy whose main job is basically to be an Excel expert (nominally he works in logistics), and seeing some of his work convinced me that he's actually a programmer who uses excel as an IDE - akin to visual programming.
We should adopt the term “tabular programmers” or “tabular engineers”!
That’s not what I wrote. It’s just that in my dev work I only rarely have the need for one-off tools.
That used to be the case with me too before LLMs.
That was because writing one-off tools took time, and a tool needed to do more for it to be worth that time.
Now a lot more are getting written because it takes a lot less effort. :-)
It just depends on the environment. Some areas get to experiment with different approaches more than others. On one extreme, if I was writing yet another CRUD app, it's unlikely I'd need any such tools. On the other, when dealing with data processing/visualisation/ML, it's small experimental tools all over the place.
Do you not debug, optimize, analyze ...? Copilot is especially valuable for throwaway (or nearly throwaway) log parsing/analysis, for example.
I do, but I have all the software I need for that. I may use LLMs in that context, but for the actual analysis, not for generating one-off tools.
More like it's time to bring it back...
I’ve used Gemini Pro 2.5 to generate code syntax rewrite tools for me. I asked it to use a compiler SDK that I know from experience is fiddly and frustrating to use. It gave me a working tool in about 10 minutes, while I was actively listening to a meeting.
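For a sense of what such a rewrite tool looks like, here's a minimal sketch of the same idea using Python's built-in `ast` module rather than that compiler SDK; `old_api` and `new_api` are just placeholder names for whatever call you're migrating:

```python
import ast
import sys

class RenameCall(ast.NodeTransformer):
    """Rewrite calls to old_api(...) into calls to new_api(...)."""

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite any nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "old_api":
            node.func = ast.copy_location(
                ast.Name(id="new_api", ctx=ast.Load()), node.func
            )
        return node

def rewrite(path):
    with open(path) as f:
        tree = ast.parse(f.read())
    tree = ast.fix_missing_locations(RenameCall().visit(tree))
    return ast.unparse(tree)  # ast.unparse needs Python 3.9+

if __name__ == "__main__":
    print(rewrite(sys.argv[1]))
```

Run it as `python rewrite.py some_module.py` and it prints the rewritten source to stdout.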
Not all scripts are “ops”!
What if every piece of software any consumer needed could be described this way? Outside of system code this could be everything we ever need. This world is nearly upon us and it is super exciting.
(Disclaimer: I built and sell a product around that workflow)
It often is, if you pick the right tasks (and more tasks fall into that bucket every few weeks).
You can get a simple but fully-working app out of a single prompt, though quality varies widely unless you’re very specific.
Once you have a codebase, agent output quality comes down to architecture and tests.
If you have a scalable architecture with well-separated concerns, a solid integration test harness with examples, and good documentation (features, stack, procedures, design constraints), then getting the exact change you want is a matter of how well you can articulate what you want.
One more asterisk, the development environment has to support the agent: like a human, agents work well with compiler feedback, and better with testing tools and documentation/internet access (yes my agents have these).
I use CheepCode to work on itself, but I am still building up the test library and preview environments to de-risk merging non-trivial PRs that I haven’t pulled down and run locally. I also use it to work on other apps that I'm building, and since those are far more self-contained / easier to test, I get much better results there.
If you want to put less effort into describing what you want, have a chat with an AI to generate tickets. Then paste those tickets into Linear and let CheepCode agents rip through them. I’ve got tooling in the works that will make that much easier, but I can only be in so many places at once as a bootstrapped founder :-)
If you can identify blocks of code you need to write that are easy to define reasonably well, easy to review/verify that it is written correctly but still burdensome to actually write, LLMs are your new best friend. I don't know about how other people think/write, but I seem to have a lot of that kind of stuff on my table. The difficult part to outsource to LLMs is how to connect these easy blocks, but luckily thats the part I find fun in coding, not so much writing the boring simple stuff.
LLMs tend to type a little faster than humans... so for straightforward code, yes?
Depends how good you are at reading and reviewing code. If you're already good at that LLMs are a huge productivity boost.
Yes, but the bar for skepticism is higher than that, because LLMs also compile code and catch errors, and generate and run tests; compile errors and assertion failures are just more prompts to an LLM agent.
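In rough terms the loop looks like this (a sketch only; `ask_llm` and `apply_patch` are placeholders for whatever model call and edit mechanism your harness actually uses, and pytest stands in for whatever build or test command applies):

```python
import subprocess

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whatever model/agent API you use."""
    raise NotImplementedError

def apply_patch(patch: str) -> None:
    """Placeholder for applying the model's proposed edit to the working tree."""
    raise NotImplementedError

def agent_loop(task: str, max_iterations: int = 5) -> bool:
    prompt = task
    for _ in range(max_iterations):
        apply_patch(ask_llm(prompt))
        # Compile errors and failing assertions become the next prompt.
        result = subprocess.run(
            ["python", "-m", "pytest", "-x", "-q"],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return True
        prompt = f"{task}\n\nThe tests failed with:\n{result.stdout}{result.stderr}"
    return False
```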
When used that way they also regularly get into loops where they lose track of what they were supposed to do.
The last time I set Cursor on something without watching it very very closely it spun for a while fixing tests and when it finally stopped and I looked what it had done it had coded special cases in to pass the specific failing tests in a way that didn't generalize at all to the actual problem. Another recent time I had to pull the plug on it installing a bunch of brand new dependencies that it decided would somehow fix the failing tests. It had some kind of complete rewrite planned.
Claude Code is even worse when it gets into this mode because it'll do something totally absurd like that and then at the end you have to `git reset` and you're also on the hook for the $5 of tokens that it managed to spend in 5 minutes.
I still find them useful, but it takes a lot of practice to figure out when they'll be useful and when they'll be a total waste of time.
It happens to me every once in a while, but I'm not sure why I would care. I usually set it off on some question and tab away to something else while it flails. When I come back, I have a better-than-average shot at a workable solution, which is a science fiction result.
When I first began programming as a teenager, one of the mental hurdles I had to get over was asking the computer to do "too much"; like, I would feel bad writing a nested loop --- that can't possibly be the right answer! What a chore for the computer! It didn't take me too long to figure out that was the whole point of computers. To me, it's the same thing with LLMs spinning on something. Who gives a shit? It's not me wasting that time.
It can be a science fiction result and still not actually result in time saved for the human operator on the whole. For me the jury is definitely still out on whether it results in net time saved and it's not for lack of trying.
Whether it ends up getting good enough in the near future that it does become a net positive both isn't the question being discussed and still remains to be seen.
I’m now at the point where I tell it to do something _and then drive to Wisconsin_, giving feedback at the Mars Cheese Castle and continuing my trip.
I feel like you might know where I'm coming from being confused at people's reaction to this stuff. This is science fiction. I think it's just not sinking in with people. If I could bet on this, I would bet everything I could on "skill with LLMs" being the high order bit of being an effective software developer 5 years from now.
I don't deny that, under the right circumstances, these tools can produce results that feel indistinguishable from magic, or like science fiction as you put it. But I don't think it's worth the costs. To me, the two most concerning costs are the unreliability, and the massive amounts of stolen training data and underpaid labor (the RLHF process) required for these models. I'm not comfortable relying on a tool built on such foundations.
My bet, and I realize this might just be wishful thinking, is that the high order bit for being an effective software developer in the near future will be skill at using more reliable and non-exploitative automation tools, such as programming languages with powerful macro systems and other high-level abstractions, to stay competitive with developers who sling LLM-generated code. So I'd better get started developing that skill myself.
To be fair, TDD has three steps: Red, Green, Refactor. Sounds like you got to Green. /s
Citation needed. Here is copilot “fixing” a failing test: https://github.com/dotnet/runtime/commit/fe173fc8f44dbd0a9ed...
It rewrote some comments, changed the test name and added extra assertions to the test. Baby sitting something like that seems like an absolute waste of time.
You want a citation for things so many people are doing daily with LLMs?
Just because they can't fix most failures doesn't mean they can't fix many.
It’s a failure that it created and when told to fix it did this. It’s beyond bad.
> It's a failure that it created and when told to fix it did this. It’s beyond bad.
No one's disputing this was bad. People are merely claiming it can also be good. I've dealt with plenty of humans this bad - it's not an argument that humans can't program.
It seems like the underlying issue is trust. We can give a task to a competent programmer - even a junior - and trust them to finish it correctly. It might take multiple tries, and they may ask for clarification, but since they’re human, we trust they are intelligent.
There are some people who fall into the bucket that we can’t trust them to finish the task correctly, or within a time frame or level of effort on our part to make the task offloading exercise have a positive benefit.
If we view LLMs in the same light, IMO they currently fall into the “not trust” category: we can’t really give them a task and trust them to finish it correctly while being confident we don’t need to understand their implementation.
If one day LLMs or some other solution reaches that point, then it definitely won’t look like a bubble, but a real revolution.
Very well put. The trick is to do either of the following:
1. Find simpler tasks for which the trust in LLMs is high.
2. Give tasks to the LLMs that have a very low cost to verify (even when the task is not simple) - particularly one off scripts.
I once had a colleague who was in the "not trust" bucket for the work we were doing. So we found something he was good at that was a pain for me to do, and re-assigned him to do those things and take that burden off of us.
In the last few months I've had the LLM solve (simple) problems via code that had been in my head for years. At any point I could have done them, but they were a chore. If the LLM failed for one of these tasks - it's not a big deal - not much time was lost. But they tend to succeed fairly often, because they are simple tasks.
I almost never let the LLM write production code, because of the extra burden that you and others allude to. But I do let it write code I rely on in my personal life, because frankly I tend to write pretty poor code for my personal use - I can't justify the time it would take to write things well - life is too busy. I welcome the code quality I get from Sonnet or Gemini 2.5 Pro.
That's my point in this thread. Writing code is a pretty diverse discipline, and many are dismissing it simply because it doesn't do one particular use case (high quality production code) well.
I didn't take LLM coding seriously until I found well-respected, well-known SW engineers speaking positively about them. Then I tried it and ... oh wow. People dismissing them are dismissing not only a lot of average developers' reality, but also a lot of experts' daily reality.
Just look at the other submission:
https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...
He used an LLM to find a security vulnerability in the kernel. To quote him:
> Before I get into the technical details, the main takeaway from this post is this: with o3 LLMs have made a leap forward in their ability to reason about code, and if you work in vulnerability research you should start paying close attention. If you’re an expert-level vulnerability researcher or exploit developer the machines aren’t about to replace you. In fact, it is quite the opposite: they are now at a stage where they can make you significantly more efficient and effective. If you have a problem that can be represented in fewer than 10k lines of code there is a reasonable chance o3 can either solve it, or help you solve it.
I use the LLM as a glorified search engine. Instead of googling, I ask it stuff. It's fine for that, but it's hit or miss. Often the output is garbage and it's better to just use Google.
I don't use it much to generate code; I ask it higher-level questions more often, like when I need a math formula.
My most common use is similar: when I'm working on problems in a somewhat unfamiliar domain finding out what their "terms of art" are. The chances that I've just come up with a completely new concept are pretty low so it's just a matter of helping me formulate my questions in the language of the existing body of knowledge.
You should go a step further and integrate search (Tavily, SearXNG, etc.) into your flow. You'll get better results, and you can refine sources and gradually build a scored list of trusted sources.
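For example, a self-hosted SearXNG instance exposes a JSON search endpoint you can query before prompting the model. A rough sketch, assuming an instance at localhost:8888 with the JSON output format enabled in its settings:

```python
import requests

def web_search(query, instance="http://localhost:8888", max_results=5):
    """Query a local SearXNG instance and return (title, url, snippet) tuples."""
    resp = requests.get(
        f"{instance}/search",
        params={"q": query, "format": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])[:max_results]
    return [(r.get("title"), r.get("url"), r.get("content")) for r in results]

# The snippets can then be pasted into the prompt as context for the LLM.
for title, url, snippet in web_search("python ast NodeTransformer example"):
    print(f"{title}\n{url}\n{snippet}\n")
```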
I've been experimenting with LLMs for coding for the past year - some wins, plenty of frustrations. Instead of writing another "AI will change everything" post, I collected practical insights from other senior engineers who've figured out what actually works. No hype, just real experiences from people in the trenches.
I would have said that Harper Reed's workflow (brainstorm spec, then co-plan a plan, then execute using LLM codegen) is basically best practice today and I'm surprised that the author adds that "I’ve not been successful using this technique to build a complete feature or prototype."
Here's an example of using this pattern with Brokk to solve a real world bug: https://www.youtube.com/watch?v=t_7MqowT638
This is showing the workflow of your tool quite well, but would be way more convincing & impressive if you had actually fixed the bug and linked to the merged PR.
> Peer Programming with LLMs, For Senior+ Engineers
> [...] a collection of blog posts written by other senior or staff+ engineers exploring the use of LLM in their work
It seems to be by senior engineers, if anything. I don't see anything in the linked articles indicating they're for senior engineers; programmers of all seniority could find them useful, if they find LLMs useful.
Yes, although those who are not senior engineers will not preemptively see the value in the documented approaches. One has to be a senior to preemptively appreciate the value in them.
I write a lot of “defensive” C# code in my day job, expecting that someone very inexperienced / offshore will be working with it in the future (and that I will be reviewing it four months later when I'm no longer on the project). I call it “corporate coding”. Lots of interfaces that must be adhered to, IoC, injection, and annoyingly strong patterns: anything that makes going off the rails a lot of work (the path of most resistance) and glaring in code reviews. But key logic stays concentrated in a few taller files, so there's no drilling through abstractions and it's easy for a newbie to comprehend. I want to take some time pairing a defensive coding approach with LLMs, particularly scoping the model to a certain project or folder in a layered architecture. Why let it know about the front end, back end, and database all at once? Of course it’ll get discombobulated.
I’ve also been experimenting with giving an LLM coins and a budget: “You have 10 coins to spend doing x, you earn coins if you m,n,o and lose coins if you j,k,l.” This has reduced slop and increased succinctness. It will come back and recount what it’s done, explaining the economy and its spending. I’ve had it ask, “All done boss, I have 2 left, how can I earn some more coins?” It’s fun to spy on the thinking model working through the choices: “if I do this it’ll cost me this, so maybe I should do this instead in 1 line of code and I’ll earn 3 coins!”
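If anyone wants to try the same trick, the whole economy can live in the system prompt. A rough sketch of how it can be phrased (the specific costs and rewards below are placeholders, not a tested recipe):

```python
# A minimal sketch of the "coin budget" system prompt described above.
# The rules and numbers here are illustrative placeholders.
COIN_BUDGET_PROMPT = """
You start with 10 coins. Every action has a cost:
- Each new file you create costs 2 coins.
- Every 20 lines of code you write costs 1 coin.
- Adding a new dependency costs 3 coins.
You earn coins by:
- Reusing an existing helper instead of writing a new one (+1 coin).
- Deleting dead code (+1 coin).
When you finish, report your remaining balance and itemize what you spent
and earned. If you run out of coins, stop and ask for more.
"""

def build_messages(task: str) -> list[dict]:
    """Prepend the coin-budget rules to a coding task for a chat-style LLM API."""
    return [
        {"role": "system", "content": COIN_BUDGET_PROMPT},
        {"role": "user", "content": task},
    ]
```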
Thanks for sharing pmabanugo, a couple of those posts are new to me too. If you’re taking submissions, I’ve been exploring how to make the most of these tools for the past few months, here’s my latest post:
https://blog.scottlogic.com/2025/05/08/new-tools-new-flow-th...
My main feeling is that it's great as long as I constrain it to working in a conceptual boundary that I can reason about, like a single system component where I am telling it the API. That way I have an understanding of each piece that gets built up. If you try to let it go too wide, it starts to make mistakes and I lose my mental model.
Well put. That’s my challenge too - losing the mental model of my entire codebase. Sometimes it feels like the time I saved using an LLM I then give right back when reassembling the mental model.
Though I haven't tried it, I would probably enjoy peer programming with an LLM more than I do with a real person (which I have tried and hated).
I could assign the LLM the simple drudgery that I don't really want to do, such as writing tests, without feeling bad about it.
I could tell the LLM "that's the stupidest fucking thing I've ever seen" whereas I would not say that to a real person.
I really want the LLM to do the opposite. To tell me that’s the stupidest fucking thing it’s ever seen. They’re surprisingly bad at that though.
That’s what the recent Copilot feature on GitHub can do. You assign it tasks and it comes back with a PR. You could also assign it to review a PR.
It seems like we need to use forceful language with these things now. I've had Copilot censor everything I asked it. Finally I had to say, "listen you cracked up piece of shit, help me generate a uuid matcher."
We’ve blocked your response because it matches public code.
(Site is unreadable for me on Firefox 138, but the text is still there if you select all. Qutebrowser based on Chromium 130 doesn't render it either.)
No problems here, both the normal view and reader mode seem to work well.
What are some of the differences between Peer Programming with LLMs and Vibe Coding?
> What are some of the differences between Peer Programming with LLMs and Vibe Coding?
"Vibe Coding" is specifically using the LLM instead of programming anything, barely caring about the output. If something is wrong, don't even open the file, just ask the LLM. Basically "prompting while blindfolded" I guess you could say.
Peer programming with an LLM would be to use it as another tool in the toolbox. You still own your program and your code. Edit away, let the LLM do some parts that are either too tricky, or too trite to implement, or anything in-between. Prompts usually are more specific, like "Seems X is broken, look into Y and figure out if Z could be the reason".
I think the consensus boils down to: you're vibe coding if you don't understand the code before you merge it.
This is the origin of vibe coding: https://x.com/karpathy/status/1886192184808149383
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. (...) I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. (...)
Pair programming still very much deals with code and decisions.
So, pair programming continues to emphasize software quality (especially with LLMs) but "vibe coding" is more of a "whoo, I'm a reckless magician" (in a less risky application domain) sort of thing?
But doesn't a 'vibe-coding' attitude of "we'll just sort out the engineering challenges later" ensure that there will be re-work and thus less overall efficiency?
I would say that the difference is taking an engineering approach to the process itself. Iterating on the context, putting the system into various states, etc. Treating the AI like a very knowledgeable intern who also has a very fixed short term memory and can’t form new long term memories but can be taught to write things down like in Memento. The thing is, though, it has a much much larger short term memory than me.
I want to note that the headlines gave me an idea for a nonprofit: "Peer Programming with LLM's for Seniors."
Somebody jump on that. It's yours. :)
Re-reading the title makes me feel like I used the wrong title.
Could be a good idea for a non-profit like you said. I know someone who’s exploring something similar but for disabled folks who aren’t tech-savvy (for-profit)
That's kind of them. I'll pray their effort succeeds.