• saadn92 2 days ago

    I built something like this but much worse. No extension, no recording: I literally sit there with Chrome DevTools open, do the action manually, copy the 3-4 network requests into a Python script, and replay them with urllib and a cookie jar.

    It's absurd but it works. Take Gumroad's cover image upload: their actual API can't do it, but the browser makes 3 requests (presign to their Rails Active Storage endpoint, PUT the binary to S3, POST the signed_blob_id to attach it). I captured those once in April and have been replaying them since. I uploaded covers and thumbnails to 9 products today without opening a browser.

    Obviously falls apart the second they change anything.
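
    A minimal sketch of that three-request replay, using only urllib and a cookie jar. The host, the paths, and the JSON field names (`blob`, `direct_upload`, `signed_id`) are assumptions for illustration, loosely modeled on how Active Storage direct uploads tend to look. They are not Gumroad's real endpoints, and a real capture would also need whatever session cookies and CSRF headers the browser sent.

```python
import base64
import hashlib
import json
import urllib.request
from http.cookiejar import CookieJar

BASE = "https://shop.example"  # hypothetical host, not a real endpoint


def make_opener(jar=None):
    """Build an opener that stores and resends cookies across requests."""
    jar = CookieJar() if jar is None else jar
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def presign_request(filename, data, content_type):
    """Step 1: ask the app for a presigned upload URL (path is assumed)."""
    blob = {
        "filename": filename,
        "byte_size": len(data),
        "content_type": content_type,
        "checksum": base64.b64encode(hashlib.md5(data).digest()).decode(),
    }
    return urllib.request.Request(
        BASE + "/direct_uploads",
        data=json.dumps({"blob": blob}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload_request(presigned, data):
    """Step 2: PUT the raw bytes to the storage URL returned by step 1."""
    target = presigned["direct_upload"]  # assumed response shape
    return urllib.request.Request(
        target["url"], data=data, headers=target["headers"], method="PUT"
    )


def attach_request(product_id, signed_blob_id):
    """Step 3: attach the uploaded blob to a product (path is assumed)."""
    return urllib.request.Request(
        f"{BASE}/products/{product_id}/covers",
        data=json.dumps({"signed_blob_id": signed_blob_id}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replay(opener, product_id, filename, data, content_type):
    """Run the three requests in order; urllib raises on any HTTP error."""
    with opener.open(presign_request(filename, data, content_type)) as resp:
        presigned = json.load(resp)
    with opener.open(upload_request(presigned, data)):
        pass
    with opener.open(attach_request(product_id, presigned["signed_id"])) as resp:
        return resp.status
```

    Keeping each request in its own builder function means that when the site changes one step, only that function needs to be re-captured.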

    • arjunchint 2 days ago

      Yes, exactly! Imagine: now anyone, even non-technical people, can just prompt and interact with this hidden, deeper layer of the web, all in their regular browser!

      • nearestnabors 2 days ago

        Oh yes indeed

    • rvz 3 days ago

      Aren't there many ways for the website to just break the automation?

      Does this work on sites that have protection against LLMs such as captchas, LLM tarpits and PoW challenges?

      I just see this as a never ending cat and mouse game.

      • acoyfellow 3 days ago

        It is. They are saying “we are willing to chase the mouse for you for money”.

        • arjunchint 2 days ago

          The bigger goal is to build and maintain a global library of popular automations. Users can also quickly re-record to regenerate a script when it needs updating.

          Since it runs inside your own browser, there should be no captchas or challenges. On failure it can fall back to our regular web agent, which can solve captchas.

          Big picture, with the launch of Mythos it might just become impossible for websites to keep up, and they will have to go the way of Salesforce and just expose APIs for everything.

        • tim-projects 2 days ago

          If you could take this recording and turn it into a playwright script - that would be a massive time saver.

          Having to redo recordings once they break sounds like too much hassle.
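
          A rough sketch of what such an export could look like: a function that turns a list of recorded steps into the source text of a Playwright (Python) script. The recording format here (dicts with `action`, `url`, `selector`, `value`) is a made-up assumption, not rtrvr's actual format. The emitted script relies only on Playwright's sync API (`sync_playwright`, `page.goto`, `page.click`, `page.fill`).

```python
def to_playwright_script(steps, headless=True):
    """Render recorded steps as the source of a Playwright (Python) script.

    `steps` is a hypothetical recording format: a list of dicts such as
    {"action": "goto", "url": ...} or {"action": "fill", "selector": ..., "value": ...}.
    """
    lines = [
        "from playwright.sync_api import sync_playwright",
        "",
        "with sync_playwright() as p:",
        f"    browser = p.chromium.launch(headless={headless!r})",
        "    page = browser.new_page()",
    ]
    for step in steps:
        action = step["action"]
        if action == "goto":
            lines.append(f"    page.goto({step['url']!r})")
        elif action == "click":
            lines.append(f"    page.click({step['selector']!r})")
        elif action == "fill":
            lines.append(f"    page.fill({step['selector']!r}, {step['value']!r})")
        else:
            # Fail loudly rather than silently dropping a recorded step.
            raise ValueError(f"unsupported action: {action!r}")
    lines.append("    browser.close()")
    return "\n".join(lines) + "\n"


recorded = [
    {"action": "goto", "url": "https://example.com/login"},
    {"action": "fill", "selector": "#email", "value": "me@example.com"},
    {"action": "click", "selector": "button[type=submit]"},
]
script = to_playwright_script(recorded)
```

          The output is plain text, so it can be checked into a repo and fixed by hand when a selector changes, which is where the time saving would come from.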

          • arjunchint 2 days ago

            Hey, that's a great idea, we'll look into this export option. But how would it save time by being a Playwright script?

            Right now, since we have a custom sandbox to re-execute the code in, we use our own syntax and exposed methods. So even now you can edit the generated script.

          • JSR_FDED 2 days ago

            Maybe there’s a middle ground where a small local model can roll with the variations in a site that would break a script, while saving the per-token costs?

            • arjunchint 2 days ago

              We found Gemini Flash to be the sweet spot for both agentic actions as well as writing code. Even Flash-Lite is too hit or miss.

              We are thinking through self-healing mechanisms, like falling back to a live web agent and rewriting the script.
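
              One way to picture that fallback, as a sketch only: run the cheap recorded script first, and if it raises, hand the task to a live agent and ask for a rewritten script to store for next time. The three callables (`run_script`, `run_agent`, `rewrite_script`) are hypothetical placeholders, not part of any real rtrvr API.

```python
def run_with_self_healing(task, script, run_script, run_agent, rewrite_script):
    """Try the recorded script; on failure, fall back to an agent and repair.

    Returns (result, script_to_store). All three callables are hypothetical:
      run_script(script)            -> result, raises if the site changed
      run_agent(task)               -> result from a live (costlier) agent
      rewrite_script(script, error) -> updated script for future runs
    """
    try:
        return run_script(script), script
    except Exception as error:  # broad on purpose: any breakage triggers fallback
        result = run_agent(task)
        return result, rewrite_script(script, error)


# Example with stand-in callables: the script "breaks", the agent succeeds.
def _broken(script):
    raise RuntimeError("selector #upload not found")


result, new_script = run_with_self_healing(
    task="upload cover",
    script="v1",
    run_script=_broken,
    run_agent=lambda task: f"agent did: {task}",
    rewrite_script=lambda script, error: f"{script}+patched",
)
```

              The per-token cost is only paid on the failure path, which is the trade-off the parent comment is asking about.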

            • amelius 2 days ago

              The problem: I don't trust extensions one bit.

              • quarkcarbon279 2 days ago

                We open-sourced our client-side code so that you can trust putting rtrvr's DOM intelligence in your web apps - https://github.com/rtrvr-ai/rover/tree/main . Our monetization is straightforward, a subscription - https://www.rtrvr.ai/pricing . Extensions that ship anything or sell user data tend to be built as side gigs; we have poured more than a year into building a highly accurate automation engine. We have cloud sandboxes too, if you prefer running the same intelligence in the cloud rather than on your own device.

                PS: Also, our data policy if you are interested: https://www.rtrvr.ai/blog/rtrvr-ai-privacy-security-how-we-h...

                • arjunchint a day ago

                  We don't sell user data, and we are in it to build a generational company.

                  We already have 25k+ users and have an open-source extension as well: https://github.com/rtrvr-ai/rover/tree/main/apps/preview-hel...

                  • notepad0x90 2 days ago

                    Auditing the code is fairly straightforward if it isn't obfuscated, so long as it doesn't execute dynamic code. But the big issue is that you can't control when the extension itself gets an update (to my knowledge). And it isn't uncommon for authors to sell browsing data, or the extension itself, to someone shadier than the original author down the road.

                    • amelius 2 days ago

                      Yes, this exactly.

                  • daylab 2 days ago

                    Oh, this is clever. Running in the main world dodges a lot of the usual scraping pain. How do you handle sites with a strict CSP that blocks inline scripts? Is the extension somehow exempt?

                    • arjunchint 2 days ago

                      We execute the code in a sandbox and proxy the fetch calls through the main world!