Show HN: Cursor for Userscripts (github.com), submitted by mifydev 4 hours ago
  • _false 17 minutes ago

    Love the decision to edit DOM directly. More LLM tools should carefully consider their training environments instead of treating LLMs like AI Gods.

    • rahimnathwani an hour ago

      It would be cool if you could make this work with Gemini Flash, with keys from AI Studio. I imagine that would expand the set of people who would try it out, because they could use 'free' keys and not worry about unexpected bills.

      • mifydev 44 minutes ago

        That's a good point, I'll add support for other models shortly.

      • Akranazon 2 hours ago

        I'm working on a version of this (https://www.quillmonkey.com/), so you got ahead of me. I imagine there are many versions of this coming. Interesting to see what set of tools you went with.

        • mifydev 2 hours ago

          Oh that's cool! I just used wxt to package the extension for Firefox and Chrome, with TypeScript and the plain Anthropic API. My goal is to make this run fully inside the browser, without any helper binaries like I've seen with others.

          • Akranazon an hour ago

            Your project seems pretty close to where mine was a couple weeks ago, where I was focused on a BYOK solution (user-entered Anthropic API key). I saw there was another similar extension already released in the app store (RobotMonkey) which hooks up to their own backend service, and offers subscriptions. For my project, I think that's the right way to go.

            It's funny what details about our designs are similar through accident. And what other things are completely different. I can show you my design potentially.

            Representing websites in a virtual filesystem is creative and definitely makes it easier for the agent to collect information about the page. But I'm confused between the `Bash` and the `Edit` tools. It seems like one uses the chrome executeScript API, and the other updates the file system. But if it's just doing file writes, are those edits visible in the browser, and persistent across sessions?

            • mifydev an hour ago

              A backend service is definitely the way to go if you want to serve models for the user.

              The Bash and Edit tools are a bit weird: the Bash tool is essentially JS execution, and the Edit tool automatically generates a script that performs the edits on the page. The model needs these tools to explore the page; whatever it does along the way, at the end it creates a separate script that is applied on page load.
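
A minimal sketch of the "Edit tool generates a script" idea described above, assuming edits are recorded as plain data and then turned into JavaScript source. The `PageEdit` shape and the `editToStatement` and `generateScript` helpers are hypothetical names for illustration; the thread does not say how the extension actually represents edits.

```typescript
// Hypothetical edit record; illustrative only, not the extension's real API.
interface PageEdit {
  selector: string;
  action: "remove" | "setText" | "appendHtml";
  value?: string;
}

// Turn one recorded edit into a single JavaScript statement.
// JSON.stringify is used so selectors and values are safely quoted.
function editToStatement(edit: PageEdit): string {
  const target = "document.querySelectorAll(" + JSON.stringify(edit.selector) + ")";
  const value = JSON.stringify(edit.value === undefined ? "" : edit.value);
  if (edit.action === "remove") {
    return target + ".forEach((el) => el.remove());";
  }
  if (edit.action === "setText") {
    return target + ".forEach((el) => { el.textContent = " + value + "; });";
  }
  return target + '.forEach((el) => el.insertAdjacentHTML("beforeend", ' + value + "));";
}

// Wrap all statements in an IIFE so the generated script does not leak globals.
function generateScript(edits: PageEdit[]): string {
  const body = edits.map((e) => "  " + editToStatement(e));
  return ["(() => {"].concat(body, ["})();"]).join("\n");
}

// Example: hide an ad container and rewrite a heading.
const demoScript = generateScript([
  { selector: ".ads", action: "remove" },
  { selector: "h1", action: "setText", value: 'Say "hi"' },
]);
console.log(demoScript);
```

The generated string is the kind of artifact that could be stored and re-run on each page load, which is one way edits could persist across sessions.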

              • Akranazon 23 minutes ago

                Oh neat. So the Edit tool is like a convenient API/wrapper for it to, e.g., add HTML to some element? I guess theoretically that can be achieved through Bash as well, but the tool fits closer to an interface we know existing agents are good at.

        • Esophagus4 2 hours ago

          Awesome! So the agent has access to the DOM/JS running in the browser?

          That’s one of my biggest headaches writing user scripts currently: I write the script in an IDE with Claude then copy it to the browser / manually test it in the browser, then copy the results back to Claude or tell it what went wrong.

          Looking forward to trying this.

          • Zekio an hour ago

            To my knowledge, all the major userscript extensions at least allow watching for file changes, so you don't have to copy it manually and can just refresh the page to test.

            • mifydev 2 hours ago

              Yup, full access to the DOM! It still needs a lot of optimizations, but the trick is that the agent reads the DOM as a file, so it can grep parts of it naturally.
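
A rough sketch of the "DOM as a file" trick, assuming the page is serialized to one element per line so that line-based search works on it. `VNode`, `serialize`, and `grep` are hypothetical names, and the tiny node type stands in for real DOM nodes so the example runs outside a browser; the actual extension may serialize the page quite differently.

```typescript
// Hypothetical minimal node shape; a real extension would walk actual DOM nodes.
interface VNode {
  tag: string;
  attrs?: { [name: string]: string };
  text?: string;
  children?: VNode[];
}

// Serialize the tree to indented lines, one element per line.
function serialize(node: VNode, depth: number = 0): string[] {
  let attrs = "";
  const attrMap = node.attrs || {};
  for (const key of Object.keys(attrMap)) {
    attrs += " " + key + '="' + attrMap[key] + '"';
  }
  let indent = "";
  for (let i = 0; i < depth; i++) indent += "  ";
  const text = node.text ? " " + node.text : "";
  let lines: string[] = [indent + "<" + node.tag + attrs + ">" + text];
  for (const child of node.children || []) {
    lines = lines.concat(serialize(child, depth + 1));
  }
  return lines;
}

// Search in the style of `grep -n`: return "lineNumber:trimmed line" per match.
function grep(lines: string[], pattern: RegExp): string[] {
  const hits: string[] = [];
  lines.forEach((line, i) => {
    pattern.lastIndex = 0; // keeps results stable if the pattern has the g flag
    if (pattern.test(line)) hits.push(i + 1 + ":" + line.trim());
  });
  return hits;
}

// Example page: a sidebar ad and a main heading.
const demoPage: VNode = {
  tag: "body",
  children: [
    { tag: "div", attrs: { id: "sidebar", class: "ads" }, text: "Sponsored" },
    { tag: "main", children: [{ tag: "h1", text: "Title" }] },
  ],
};
const demoLines = serialize(demoPage);
console.log(grep(demoLines, /class="ads"/));
```

Returning line numbers means the agent could follow up by reading just the surrounding lines, rather than loading the whole page into context.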