• mortsnort 31 minutes ago

    It says they can be "fine tuned," but it looks like the agents are all using the same model with different system prompts? This would be more intriguing if they trained a debugger model from the ground up that could be used for the debugger agent. I suspect we'll get there eventually.

    • pjm331 11 hours ago

      I've made a few attempts at manually doing this w/ mcp and took a brief look at "claude swarm" https://github.com/parruda/claude-swarm - but in the short time I spent on it I wasn't having much success - admittedly I probably went a little too far into the "build an entire org chart of agents" territory

      the main problem I have is that the agents just aren't used

      For example, I set up a code reviewer agent today and then asked claude to review code, and it went off and did it by itself without using the agent

      in one of anthropic's own examples they are specifically telling claude which agents to use which is exactly what I don't want to have to do:

      > First use the code-analyzer sub agent to find performance issues, then use the optimizer sub agent to fix them

      My working theory is that while Claude has been extensively trained on tool use and is often eager to use whatever tools are available, agents are just different enough that they don't quite fit - maybe asking another agent to do something "feels" very close to asking the user to do something, which is counter to their training

      but maybe I just haven't spent enough time trying it out and tweaking the descriptions

      • conception 11 hours ago

        Roo code does this really well with their orchestration mode, there’s probably a way to have a claude.md to do this as well. The only issue with roo is it’s “single threaded” but you do get the specific loaded context and rules for a specific task which is really nice.

        • oc1 8 hours ago

          the same problem with mcp, as well as claude md. most of the time they aren't used when it would be appropriate. what's the point of these agents and standards when you can't make your model use them reliably?

        • bomewish 17 hours ago

          Has CC become much stupider in recent weeks, or is it me? Any anecdata out there?

          • _--__--__ 15 hours ago

            People speculate somewhat seriously that Claude (especially given its French name) picked up at some point that you aren't supposed to work as hard in July and August.

            • sunaookami 8 hours ago

              That one guy on Twitter that posted this wrote it as a joke and everyone took it seriously. It's not true. It works the same for me.

              • oc1 8 hours ago

                How do you know? It acts much lazier in the recent summer months for me..

                • stavros 6 hours ago

                  How have you disproved the hypothesis that it recently got dumber and it just happens to be summer?

                  • AbstractH24 5 hours ago

                    Clearly, it compared performance to last summer

                    (Just to be clear, I have no idea what on this thread to take seriously and what not, or who is being serious. I'm joking at least.)

                    • stavros 5 hours ago

                      That won't do it, though, you'd have to observe it being dumber on June 1 and smart again on September 1 for years.

              • madrox 14 hours ago

                How long before we hire psychiatrists instead of engineers to debug AI

                • OrsonSmelles 13 hours ago

                  Well, we could start with some ELIZA instances.

                  • lubujackson 11 hours ago

                    I see that you feel we could start with some ELIZA instances. Can you tell me more about that?

                  • taneq 5 hours ago

                    Robopsychologists, you say?

                    • nialse 9 hours ago

                      To be frank psychiatrists, being MDs, would likely prescribe medication and I’m not sure how that would help. As a licensed psychologist I have ideas on how to debug AI though.

                      • AbstractH24 5 hours ago

                        Why, we'll just have specialized agents for ingesting Prozac and that'll magically solve everything.

                  • nico 16 hours ago

                    I don’t know about stupider, but definitely less reliable/available

                    A couple days ago I was getting so many api errors/timeouts I decided to upgrade from the $20 to the $100 plan (as I was also regularly hitting rate limits)

                    It seemed to fix the issue immediately. But today, the errors came back for about half an hour

                    • SOLAR_FIELDS 15 hours ago

                      It goes down usually around 1400-1500 UTC. Europeans are still awake and once the west coast joins in the fray Anthropic falls over.

                      Pretty rare to get a 529 outside of that time window in my personal experience, at least during the USA day.

                      • data-ottawa 11 hours ago

                        Their status page for the week is rough. They’re down to 98% uptime.

                        Hopefully they work out whatever issue is going on.

                        https://status.anthropic.com/

                      • illusive4080 17 hours ago

                        Not for me. It gets worse when context is nearly full. I like to compact or clear context more often than it does automatically.

                        • nico 16 hours ago

                          Do you do this via settings or just keep track of it and manually ask it to do it more often?

                          • furyofantares an hour ago

                            (Not the person you're responding to, but) It says how close it is to compacting in bottom right, once it's getting close at least (30% left or something?)

                            Whenever I see that I think about whether I can find a good point to compact or clear. I also just try to clear whenever it makes sense to avoid getting there and try to give smaller tasks that can be cleared after they're done when possible.

                            Oh, I guess one thing I do is sometimes have it write a file with what was done, if I'm not actually sure if I want to clear or might want to come back to it. I also sometimes do this rather than compact during a large task - document status and clear.

                        • audinobs 4 hours ago

                          I think it's like a gambling game where you get on hot and cold streaks, runs based on chance.

                          The model feels like it has got stupid when you get on a cold streak after a hot hand.

                          • laborcontract 13 hours ago

                            Insert something to the tune of: “never read files in slices. Instead, whenever accessing a file, you must read a file in entirety[..]” at the beginning of every conversation or whenever you’re down to burn more credits/get better results.

                            A great deal of claude stupidity is due to context engineering, specifically due to the fact that it tries its hardest to pick out just the slice of code it needs to fulfill the task.

                            A lot of the annoying “you’re absolutely right!” replies come from CC incrementally discovering that you have more than 10 lines of code in that file that pertain to your task.

                            I don’t believe conspiracies about dumbed down models. It’s all context pruning.

                            • oc1 8 hours ago

                              so claude code does the same shit as cursor?

                            • slantaclaus 13 hours ago

                              I feel like it’s gotten better recently

                            • Garlef 10 hours ago

                              One nice realization I had when using a similar feature in roo:

                              You don't need a full agent library to write LLM workflows.

                              Rather: A general purpose agent with a custom addition to the system prompt can be instructed to call other such agents.

                              (Of course explicitly managing everything is the better choice depending on your business case. But I think it would always be cheaper to at least build a prototype using this method.)
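
                              A minimal sketch of that idea, assuming nothing about roo's or Claude Code's internals: every "agent" is the same model plus a different system prompt, and a router agent picks which one gets the task. `Agent`, `orchestrate`, and `fake_model` are hypothetical names made up for illustration, and `fake_model` is a deterministic stub standing in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in for a real model call:
# (system_prompt, user_message) -> reply text.
ModelFn = Callable[[str, str], str]


@dataclass
class Agent:
    name: str
    system_prompt: str  # the custom addition that specializes a general agent


def run_agent(agent: Agent, task: str, model: ModelFn) -> str:
    """Run one agent: same model, different system prompt."""
    return model(agent.system_prompt, task)


def orchestrate(task: str, agents: Dict[str, Agent], router: Agent,
                model: ModelFn) -> str:
    """Ask the router which sub-agent fits, then delegate the task to it."""
    roster = ", ".join(sorted(agents))
    question = f"Agents: {roster}\nTask: {task}\nReply with one agent name."
    choice = run_agent(router, question, model).strip()
    # If the router names an unknown agent, it handles the task itself.
    worker = agents.get(choice, router)
    return run_agent(worker, task, model)


def fake_model(system_prompt: str, message: str) -> str:
    """Deterministic stub so the sketch runs without any API access."""
    if "router" in system_prompt:
        # Only look at the task line, not the roster, when routing.
        task_line = next(
            (line for line in message.splitlines() if line.startswith("Task:")),
            message,
        )
        return "reviewer" if "review" in task_line.lower() else "debugger"
    return f"[{system_prompt}] handled: {message}"


agents = {
    "reviewer": Agent("reviewer", "You review code."),
    "debugger": Agent("debugger", "You debug failures."),
}
router = Agent("router", "You are a router. Pick the best agent.")
result = orchestrate("Please review this diff", agents, router, fake_model)
```

                              Swapping `fake_model` for a real API call is the only change a prototype would need; the routing logic stays a few lines of plain code rather than a dependency on an agent library.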

                              • lvl155 6 hours ago

                                Here's my main problem with sub-agents WITHIN Claude Code. They don’t allow you to use other models. Let’s be honest, it’s 99% Sonnet.

                                • furyofantares 21 minutes ago

                                  I haven't used them yet but it says they can use MCPs. The only MCP server I use is zen-mcp-server for routing stuff to o3 and gemini.

                                  • stillsut 3 hours ago

                                    Great point, I've found Sonnet really can't be beat on many tasks, but increasingly finding Gemini-Pro and o3 handle the tough bugs and refactors best.

                                    That's why I've been using agro to launch agents from each of the main LLM vendors and checking their results when I'm stuck: https://github.com/sutt/agro/blob/master/docs/index.md

                                  • Dlanv 13 hours ago

                                    I wonder if this is also a good way to create experts for specific tasks/features of a codebase.

                                    For example, a sub-agent for adding a new stat to an RPG. It could know how to integrate with various systems like items, character stats component, metrics, and so on without having to do as much research into the codebase patterns.

                                    • T0Bi 19 hours ago

                                      So everything claude-flow¹ already does but worse (I guess?).

                                      ¹ https://github.com/ruvnet/claude-flow

                                      • jampa 13 hours ago

                                        > IMPORTANT: Claude Code must be installed first:

                                        > [...]

                                        > # 2. Activate Claude Code with permissions

                                        > claude --dangerously-skip-permissions

                                        Bypassing all permissions and connecting with MCPs, can't wait for "Claude flow deleted all my files and leaked my CI credentials" blog post

                                        • T0Bi 8 hours ago

                                          There are already several such blog posts.

                                          I use the .devcontainer¹ from the claude-code repository. It works great with VSC and lets you work in your docker container without any issues. And as long as you use some sort of version control (git) you cannot really lose anything.

                                          ¹ https://github.com/anthropics/claude-code/tree/main/.devcont...

                                          • data-ottawa 11 hours ago

                                            I would like a simple tool to run Claude in a container with only read/write access to provided folders.

                                            I’ve set it up bespoke but the auth flow gets broken.

                                            • SOLAR_FIELDS 43 minutes ago

                                              Claudebox is what I was playing with. You need to mount the oauth access token in as an env. It’s not some crazy vibe coded framework, just around 1k lines of shell helpers to set it up.

                                              • T0Bi 8 hours ago

                                                I use the .devcontainer¹ from the claude-code repository. It works great with VSC and lets you work in your docker container without any issues. And as long as you use some sort of version control (git) you cannot really lose anything.

                                                ¹ https://github.com/anthropics/claude-code/tree/main/.devcont...

                                                • oarsinsync 10 hours ago

                                                  Have you considered asking Claude code to write this for you?

                                              • SOLAR_FIELDS 15 hours ago

                                                That guy doesn't even understand how his own software works. Is anyone actually using this thing and putting their code into production?

                                                • lubujackson 11 hours ago

                                                  It's extreme dogfooding where he is making a mashed potato volcano where Claude agents are the potatoes and your sanity is the gravy.

                                                  • AbstractH24 5 hours ago

                                                    Not only are people using them, they are building startups based on them. And then selling said startups.

                                                  • dchuk 17 hours ago

                                                    I’ll admit this looks comprehensive, but man oh man does this seem complicated and over doing it

                                                    • nazgul17 17 hours ago

                                                      Except it's not in alpha phase

                                                      • dazzaji 10 hours ago

                                                        Ruv (of Claude Flow) seems to like the new Claude Agents a lot, and already is leveraging them in Claude Flow. He waxes positively on the topic here: https://www.linkedin.com/posts/reuvencohen_spent-the-afterno...

                                                        • himeexcelanta 15 hours ago

                                                          This looks like a yarn ball (not in a good way)

                                                          • lvl155 6 hours ago

                                                            What did you make me read. Right off the bat, it says v2 alpha.

                                                            Bro…