• Areena_28 3 hours ago

    We hit the same thing building internal security tooling. Our model kept formatting output like documentation, not like something anyone would actually read in a terminal at 2am during an incident.

    I'm a bit curious: did you find this behavior consistent across models, or is it more pronounced with certain ones?

    • noemit 2 hours ago

      I ran into it while building - I should have tested different temperatures too - I was just trying to get CLI-style tool calls to be more reliable.

    • shomp 10 hours ago

      Great observation. The brain of a programmer is still a "black box" to a feed-forward network of nodes. But in theory, if you pumped a lot of live-coding videos from somewhere like YouTube into the training process, you could get a bit of that "what's your approach"-ism to bleed into the model. There might not be enough material there to truly "train it to think", but it would be interesting to try to fill those black-box gaps in the LLM with supplemental "here was the process that got us there" video feeds.

      The next natural move might be recording thousands of hours of footage of developers working with LLMs directly, in Cursor or another IDE with live LLM pair programming - maybe calling it "pair programming" is generous - but it could be a reasonable foray into teaching the next generation of LLMs the "thought process" behind things. In reality you'd be teaching it which files to inspect, which windows to open and close, which tools to switch to and focus on. And while that might be imperfect, it might just be enough.

      • acters 3 hours ago

        Instead of telling the LLM that "run" works like a CLI, maybe just tell it that "run" will execute sh/bash/zsh/etc. scripts?
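
        Something like this in the tool definition, instead of "behaves like a CLI" (rough sketch using an OpenAI-style function schema; the names and wording are just illustrative):

        ```python
        # Hypothetical "run" tool described as a script executor rather than
        # a CLI, so the model stops imitating terminal transcripts.
        run_tool = {
            "type": "function",
            "function": {
                "name": "run",
                "description": (
                    "Executes the given string as a bash script and returns "
                    "stdout/stderr. Send only the script body: no shell "
                    "prompt, no backticks, no commentary."
                ),
                "parameters": {
                    "type": "object",
                    "properties": {
                        "script": {
                            "type": "string",
                            "description": "The bash script to execute.",
                        }
                    },
                    "required": ["script"],
                },
            },
        }
        ```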

        • noemit 3 hours ago

          I tried over 20 variations of the system prompt. Once I changed my tool to expect the colon, it also felt like it was calling tools faster, but I need to do a larger test to be sure.
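
          "Expect the colon" mostly meant normalizing before parsing instead of fighting the model in the prompt - roughly this (simplified sketch; the real version does more validation):

          ```python
          import re

          def normalize_tool_call(raw: str) -> str:
              """Strip decorations the model adds when imitating terminal output."""
              cmd = raw.strip()
              # Remove markdown backticks it sometimes wraps around the call
              cmd = re.sub(r"^`+|`+$", "", cmd).strip()
              # Accept the trailing colon instead of prompting against it
              return cmd.rstrip(":").rstrip()

          assert normalize_tool_call("`run ls -R /tmp:`") == "run ls -R /tmp"
          ```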

        • seertaak 4 hours ago

          Is that really true? I would have expected that by now AI companies are doing RL on git histories, not just on HEAD.

          • noemit 2 hours ago

            I also expected this. Please run some experiments - maybe other models are different.

          • mpalmer 6 hours ago

            The novice came to the master. "I have figured it out, the rules for how LLMs understand CLIs. It gives the right commands, but adds colons. It was trained on the visual shape of terminals, not keystrokes."

            "Clear the session," the master said. "Run the same prompt again."

            The novice pressed return. The model output: `ls -R /tmp`

            "The colons are gone," the novice said. "But my theory explained them perfectly."

            "You built a cage for a cloud," the master said. "Do not mistake a single roll of the dice for the rulebook."

            • noemit 2 hours ago

              I ran tests of 100 attempts with different prompt/scenario combinations. Each attempt/theory had 3 different system prompt wordings. Most of the prompts did not mention a colon, but it kept appearing. When I added negative instructions against using a colon, the quality went down: most of the tool calls were malformed, and one common issue was markdown ticks in front. It was only when my system prompt acted like colons were normal that I consistently got 100/100 correct tool calls. I ranked my system prompts by which returned the most consistent commands.
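
              The harness was roughly this shape (simplified; the prompt texts, the model stub, and the well-formedness check are placeholders for the real ones):

              ```python
              from collections import Counter

              # Three wordings per theory; real prompt texts elided
              PROMPTS = {"baseline": "...", "anti_colon": "...", "colon_ok": "..."}
              N_TRIALS = 100

              def get_tool_call(system_prompt: str) -> str:
                  # Stub standing in for the real model API call
                  return "run ls -R /tmp:"

              def well_formed(call: str) -> bool:
                  # The real check compares against the expected command; this
                  # just catches common failures (markdown ticks, empty output)
                  return bool(call.strip()) and not call.lstrip().startswith("`")

              scores = Counter()
              for name, prompt in PROMPTS.items():
                  for _ in range(N_TRIALS):
                      scores[name] += well_formed(get_tool_call(prompt))

              # Rank prompt wordings by how many of the 100 calls came back clean
              for name, ok in scores.most_common():
                  print(f"{name}: {ok}/{N_TRIALS}")
              ```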

            • Art9681 7 hours ago

              Is "how programmers work" a useful and provable metric? No? Then it belongs in philosophy discussions. How you work and how I work is different. Your work may have ended up in the LLM training and my work did not. Or vice versa.

              Can you objectively analyze how VSCode adapts to your way of working, without your own interference?

              Did you test your theory with actual frontier LLMs (which Kimi K2.5 is not, BTW)?