Comments Page - Launch HN: Browser Use (YC W25)

« Back Launch HN: Browser Use (YC W25) – open-source web agentsgithub.comSubmitted by MagMueller 5 months ago

arjunchint 5 months ago
Have you inspected or thought through the security of your open source library?
You are using debugger tools such as CDP, launching playwright without a sandbox, and guiding users to launch Chrome in debugger mode to connect to browser-use on their main browser.
The debugging tools you use have active exploits that Google doesn't fix because they are supposed to be for debugging and not for production/general use. This combined with your other two design choices let an exploit to escalate and infect their main machine.
Have you considered not using all these debugging permissions to productionize your service?
- h4ny 5 months ago
  Thank you! It's constructive, helps the people who are making things while giving them the benefit of the doubt, keeps users safe, and educate those who have enough technical understanding on the topic.
- rainforest 5 months ago
  Could you go into a bit more detail about this? Why is exposing devtools to the agent a problem? What's the attack vector? That the agent might do something malicious to exfil saved passwords?
  arjunchint 5 months ago
  Forget the agent, browser-use's published setup instructions to use with your own Chrome profile and passwords [https://docs.browser-use.com/customize/real-browser, https://github.com/browser-use/browser-use/blob/495714e2dd38...] launches a Chrome session with Remote Debugging enabled.
  These tools they are guiding users to setup and execute are "inherently insecure" [https://issues.chromium.org/issues/40056642].
  So if you go to a site that can take advantage of these loopholes then your browser is likely to be compromised and could escalate from their.
  rainforest 5 months ago
  Thanks, for the benefit of others the risk is that the devtools port has no Auth so is vulnerable to XSS.
  I would surmise that this will stop being a problem if you switch to using a unix socket for the CDP.
- jerpint 5 months ago
  If they’re launching a cloud-based service, doesn’t this effectively remove the risk of running it locally ?
  arjunchint 5 months ago
  Their key offering is an open source solution that you can run on your own laptop and Chrome browser, but their approach to doing this presents a huge security risk.
  They do have a cloud offering that should not have these risks but then you have to enter your passwords into their cloud browser environment, presenting a different set of risks. Their cloud offering is basically similar to SkyVerne or even a higher cost tier subscription we have at rtrvr.ai
  mohsen1 5 months ago
  Cross origin is still a problem when those settings are off
- 101008 5 months ago
  This is very very important. It's completely unusable if this isn't solved. The agent could access a website that takes control of your machine.
- gregpr07 5 months ago
  how would that work? Can you control the browser without debug mode? Especially in production the browsers are anyway running on single instance docker containers so the file system is not accesible... are there exploits that can do harm from a virtual machine?
  gloosx 5 months ago
  Yes, you can control the browser without debug mode, and the common way to do it is ChromeDriver[1].
  [1] https://developer.chrome.com/docs/chromedriver/get-started
  arjunchint 5 months ago
  Yes, I was able to figure out a secure way to control the browser with AI Agents at rtrvr.ai without using debugger permissions/tools so it is most definitely possible.
  I meant by in production in the sense how you are advising your users to setup the local installation. Even if you launch browser use locally within a container but your restarting the user's Chrome in debug mode and controlling it with CDP from within the container, then the door is wide open to exploits and the container doesn't do anything?!
  shridharxp 5 months ago
  Injecting JS into the page or controlling it using extension APIs is not a secure way to control the browser.
  arjunchint 5 months ago
  I never mentioned injecting JS into the page, and besides injecting LLM generated code or generally remote code won't be approved by the Chrome Store https://developer.chrome.com/docs/extensions/develop/migrate....
  Your claim is analogous to saying that Apple's app store is not secure. We had to go through stringent vetting and testing by Google to list in the Chrome Store. Any basis or reasoning you can provide for your claim?
  Regardless, its a wild leap to claim a Chrome Store Chrome Extension is more insecure than this arbitrary binary?
  gregpr07 5 months ago
  Yeah, sorta feels like docker on a new instance is safer than connecting to actual browsers and injecting js code there… would love to skip cdp protocol though, it’s quite restrictive
  arjunchint 5 months ago
  Are you making a straw man argument? I am not injecting js code, we solved this problem in a secure way with minimal permissions taken by our Chrome Extension, which runs in safe and secure sandbox within the browser.
  Perhaps we are talking past each other, your literally giving instructions to your users to connect to their actual browsers: https://docs.browser-use.com/customize/real-browser Where under the hood your launching Chrome with debugging mode but with the user's credentials and passwords. This browser is then controlled via CDP by a highly insecure browser-use binary running in a container. Your users are bound to get pwned with this setup! https://github.com/browser-use/browser-use/blob/70ae758a3bfa...
  lostmsu 5 months ago
  Embed a WebView instead of launching browser?
  TingPing 5 months ago
  I don’t know what problems they wanted to solve but one issue is browsers are trusted, views are not, so you have to workaround fingerprinting.
cookiengineer 5 months ago
I've been following your progress for a while now and I'm super impressed how far you've got already.
Are you working on unifying the tools that the LLM uses with the MCP / model context protocol?
As far as I understand, lots of other providers (like Bolt/Stackblitz etc) are migrating towards this. Currently, there's not many tools available in the upstream specification other than File I/O and some minor interactions for system-use - but it would be pretty awesome if tools and services (like say, a website service) could be reflected there as it would save a lot of development overhead for the "LLM bindings".
Very interesting stuff you're building!
- gregpr07 5 months ago
  hmm, I though about this a lot. But tbh I think MCP is sort of a gimmick... probably the better way is for agents just to understand the http apis directly. Maybe I'm wrong, very happy to be convinced differently. Do you think MCP server for the cloud version would be useful?
  ec109685 5 months ago
  MCP seems nicer than requiring LLM hosts execute arbitrary curl calls to endpoints since it packages a tool into a dedicated plugin that users can opt into.
  ec109685 5 months ago
  E.g. this puppeteer MCP: https://x.com/windsurf_ai/status/1894553166609617032
  nostrebored 5 months ago
  strong agree with this -- I don't understand outside of integration with Claude Desktop why to use MCP rather than a dedicated API endpoint.
  fallinditch 5 months ago
  I believe MCP helps here because it standardizes integrations, enhances user control via opt-in plugins, and improves security by avoiding direct endpoint calls.
  @gregpr07 wouldn't adoption of MCP open up Browser-use to more use cases?
  gregpr07 5 months ago
  It probably would yeah. I was very against it but this HN post sorta points me to “people want mcp”
  anonzzzies 5 months ago
  Hmm, I don't mean this negatively, but MCP is mentioned only in this thread and 2 comments up you say 'I thought about this a lot' and now you are convinced based on 2 people saying they would like it? Isn't standardising something people want always? And MCP seems to have easily won that 'battle' user wise -> what have you been thinking about exactly?
  gregpr07 5 months ago
  What’s your take - how can we expose Browser Use to as many use cases as possible? Is there easier way than openapi config?
  elyase 5 months ago
  I want to use browser-use in Cursor but I am using another option because it doesn't support MCP integration which is the common language they support for external tools
  nsonha 5 months ago
  as building blocks we of course prefer APIs. However, interfacing directly with the browser (or desktop) can enable end users to do way more things without having integration built by devs, in theory at least. In reality, LLM may not have reached that point yet and there are security concerns.
- rahimnathwani 5 months ago
  You can write a wrapper to use it with MCP, or use one someone else has created:
  https://github.com/Saik0s/mcp-browser-use/blob/main/README.m...
agotterer 5 months ago
I’m excited about the space and intend to keep an eye on you guys. I actually gave the opened source version of browser-use a try last week and ran into two problems:
The first, it refused to correctly load the browser tab and would get stuck in a loop trying. I was able to manually override this behavior for the purpose of prototyping.
The second, it hallucinated form input values. I provided it strict instructions on how to fill out a form and when it didn’t know what to do with an address field, it just wrote 123 Main St instead of not being able to complete the form.
The thing I really want and haven’t found in any of the browser agents I’ve tried, is a feedback loop. I don’t personally know what the final format looks like. But I want to know that what I expected to happen inside the browser, actually happened, and I want it to be verifiable. Otherwise I feel like I'm sending request into a black hole.
- gregpr07 5 months ago
  Yeah - one version had that weird bug - we fixed it already (I think, super happy to tell me how to replicate if not).
  Hmm really? Maybe you could use the sensitive data api to make it more deterministic? https://docs.browser-use.com/customize/sensitive-data
  How would you imagine a perfect feedback loop?
tnolet 5 months ago
How are you different from https://www.browserbase.com/ and their Stagehand framework? [0]
[0]https://github.com/browserbase/stagehand
- undefined 5 months ago
  [deleted]
- baal80spam 5 months ago
  From the first glance, browser-use is compatible with more models, and has (much) more github stars ;)
  Coincidentally I played with it over the last weekend using Gemini model. It's quite promising!
  gregpr07 5 months ago
  Yeah, we are much bigger and work on a higher level. stagehand work step by step, we are trying to make end to end web agents.
alex1115alex 5 months ago
Awesome job launching guys! We used Browser Use last week to order burgers from our smart glasses:
https://x.com/caydengineer/status/1889835639316807980
One thing I'm hoping for is an increase in speed. Right now, the agent is slow for complex tasks, so we're still in an era where it might be better to codify popular tasks (eg: sending a WhatsApp message) instead of handling them with browser automation. Have yall looked into Groq / Cerberus?
- MagMueller 5 months ago
  One option could be for the main apps like WhatsApp to have defined custom actions, which are almost like an API to the service. I think the interplay between LLM and automation scripts will succeed here:
  Agent call 1: Send WhatsApp message (to=Magnus, text=hi) Inside, you open WhatsApp and search for Magnus (without LLM)
  Agent call 2: Select contact from all possible Magnus contacts Script 3: Type the message and click send
  So in total, 2 calls - with Gemini, you could already achieve this in 10-15 seconds.
- gregpr07 5 months ago
  That was such a cool demo man! We are working on speed, we are already 3-4x faster than operator with gpt4o
OsrsNeedsf2P 5 months ago
Does anyone have experience comparing this to Skyvern[0]? I originally thought the $30/month would be the killer feature, but it's only $30 worth of credits. Otherwise they both seem to have the same offering
[0] https://www.skyvern.com/
- gregpr07 5 months ago
  I think our cloud is much simpler (just one prompt and go). But it's also sort of a different service. The main differences come from the open source side - we are essentially building more of a framework for anytime to use and they are just a web app.
jackienotchan 5 months ago
AI agents have lead to a big surge in scraping/crawling activity on the web, and many don't use proper user agents and don't stick to any scraping best practices that the industry has developed over the past two decades (robots.txt, rate limits). This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported on HN.
Do you have any built-in features that address these issues?
- MagMueller 5 months ago
  Yes, some hosting services have experienced a 100%-1000% increase in hosting costs.
  On most platforms, browser use only requires the interactive elements, which we extract, and does not need images or videos. We have not yet implemented this optimization, but it will reduce costs for both parties.
  Our goal is to abstract backend functionality from webpages. We could cache this, and only update the cache if eTags change.
  Websites that really don't want us will come up with audio captchas and new creative methods.
  Agents are different from bots. Agents are intended as a direct user clone and could also bring revenue to websites.
  erellsworth 5 months ago
  >Websites that really don't want us will come up with audio captchas and new creative methods.
  Which you or other AIs will then figure a way around. You literally mention "extract data behind login walls" as one of your use cases so it sounds like you just don't give a shit about the websites you are impacting.
  It's like saying, "If you really don't want me to break into your house and rifle through your stuff, you should just buy a more expensive security system."
  gregpr07 5 months ago
  imo if the website doesn't want us there the long term value is anyway not great (maybe exception is SERP apis or sth which live exlusively because google search api is brutally expensive).
  > extract data behind login walls
  We mean this more from a perspective of companies wanting it, but there is a login wall. For example (actual customer) - "I am a compliance company that has system from 2001 and interacting with it really painful. Let's use Browser Use to use the search bar, download data and report back to me".
  I believe in the long run agents will have to pay for the data from website providers, and then the incentives are once again aligned.
  erellsworth 5 months ago
  > imo if the website doesn't want us there the long term value is anyway not great
  Wat? You're saying if a website doesn't want your scraping their data then that data has low long-term value? Or are you saying something else because that makes no fucking sense.
  gregpr07 5 months ago
  Haha no, I am saying that if websites don’t want you there they will find a way to block you in the long run, so betting the product on extracting data from those websites is a bad business model (probably)
  xena 5 months ago
  It would be really nice if you made some easy way for administrators to tell that a client is using browser use so they can administratively block that tool. I mean, unless you want to pay for the infrastructure improvements to the websites your product is assaulting.
- deadfece 5 months ago
  In my experience these web agents are relatively expensive to run and are very slow. Admittedly I don’t browse HN frequently but I’d be interested to read some of these agent abuse stories, if any stand out to you. (I’ve been googling for ai agent website abuse stories and not finding anything so far)
- undefined 5 months ago
  [deleted]
kimoz 5 months ago
A UI based on browser-use:
https://github.com/browser-use/web-ui
rvz 5 months ago
> On the open-source side, browser use remains free. You can use any LLM, from Gemini to Sonnet, Qwen, or even DeepSeek-R1. It’s licensed under MIT, giving you full freedom to customize it.
As this project is MIT, that means companies like Amazon can deploy a managed version and can compete against you with prices going close to zero in their free-tier and with a higher quotas than what you are offering.
I predict that this project is likely going to change to AGPL or a new business license to combat against this.
- MagMueller 5 months ago
  Have you seen in the past that Amazon did that against other projects?
  rvz 5 months ago
  Previous well known victims include: Elasticsearch, MongoDB and Redis.
mritchie712 5 months ago
How do you keep your service from being blocked on LinkedIn?
LinkedIn's API sucks. I run an analytics platform[0] that uses it and it only has 10% of what our customers are asking for. It'd be great to use browser-use, but in my experience, you run into all sort of issues with browser automation on LinkedIn.
0 - https://www.definite.app/
- MagMueller 5 months ago
  If you run it locally, you can connect it to your real browser and user profile where you are already logged in. This works for me for LinkedIn automation, e.g., to send friend requests or answer messages.
  A bigger problem on LinkedIn for us is all the nested UI elements and different scrolling elements. With some configuration in our extraction layer in buildDomTree.js and some custom actions, I believe someone could build a really cool LinkedIn agent.
  edoceo 5 months ago
  Make a separate profile and launch that for the scrape. Don't have to gum up your primary profile
- undefined 5 months ago
  [deleted]
echelon 5 months ago
Gregor and Magnus, the only thing I want in the world is an agent that removes all of the bad from the web:
- No more ads. No more banner ads. No more Google search ads. No more promoted stories. No more submarine ads.
- No more spam. Low quality content is nuked.
- No more clickbait. Inauthentic headlines from dubious sources are removed.
- No more rage comments. Angry commenters are muted. I have enough to worry about in my day.
- No more low-information comments. All the "Same." and "Nice." and low informational comments are removed to help me focus.
An agent of the future is there to preserve my precious time and attention. It will safeguard it to a level never before seen, even if it threatens the very business model the internet's consumer tools are based on. You work for me. You help me. Google, Reddit, et al. are adversarial relationships.
In the future, advertisers pay me for the privilege of pitching me. No ad reaches me without my consent or payment.
Please fix the internet. We've waited thirty years for this.
- MagMueller 5 months ago
  We see people replacing UIs and using browser-use to fill out the real UI. So there could be a world where everyone has their own UI, and you could have that filter option.
  Furthermore, valid point: if Pepsi spends $1M on ads, why don't you get a piece of it if they pitch to you?
anhner 5 months ago
This looks very promising. Thank you for making it open source! At first glance, I couldn't find if you can run it using local models (ollama?). Is that at all possible?
Edit: for anyone else looking for this, it seems that you can: https://github.com/browser-use/browser-use/blob/70ae758a3bfa...
- MagMueller 5 months ago
  Yes! People love Deepseek-Chat / R1 and the new Qwen versions. It works with ChatOllama. However, Llama itself does not work very well with our tool calling and is often confused by the structured output format.
p0deje 5 months ago
Have you experimented with using text-only models and DOM/accessibility tree for interaction with a ? I'm currently working on the open-source test automation tool (https://alumnium.ai) and the accessibility tree w/o screenshots works pretty well as long as the website provides decent support for ARIA attributes or at least has proper HTML5 structure.
- MagMueller 5 months ago
  On most pages, we don't need vision, and the DOM alone is sufficient. We have not worked with the accessibility tree yet, but it's a great idea to include that. Do you have any great resources on where to get started?
  p0deje 5 months ago
  > On most pages, we don't need vision, and the DOM alone is sufficient.
  I misunderstood looking at demo videos, it seemed like you constantly update elements with borders/IDs so I assumed that's what is then passed to vision.
  > Do you have any great resources on where to get started?
  A great place to start is https://chromium.googlesource.com/chromium/src/+/main/docs/a....
undefined 5 months ago
[deleted]
darepublic 5 months ago
The title says make your website more accessible for agents... But then the quick start seemingly just acts from the agentic side to find a post on Reddit. So I didn't fully grok what this is about. My initial guess is you use agents on a website, allow them to think long, then come up with some selectors to speed up subsequent tries. But it's really not clear to me
- MagMueller 5 months ago
  We extract all the interactive elements from a page like id 1. button id 2. drop-down id 3. textarea.
  Then we present this list to the LLM with the task and the LLM outputs input_text(id 3, Hello World).
  Finally, we execute the Playwright code to perform the actual action of inputting text into this element.
- hakaneskici 5 months ago
  Could it be referring to the semantic correctness of HTML behind the scenes so AI can reasoned about the content structure? I think this is similar to a11y standards of assigning roles etc to UI elements, so non-AI agents like screen-readers can make sense of the contents.
hakaneskici 5 months ago
What is your overall vision and roadmap about automated testing for web apps by bringing value from AI into the process? When I worked on the accessibilityinsights.io team, dealing with inconsistent or complicated xPaths was also an issue. Is AI vision helping there much?
- MagMueller 5 months ago
  It could be useful to run a prompt/test once, get the xPaths, and rerun it deterministically. When it breaks, you know something is wrong, and the LLM could be used as a fallback to fix the script.
35mm 4 months ago
Are you planning on adding any stealth measures?
The first site (Idealista) I tried it on flagged and blocked me / my home IP as a bot within 10 seconds.
eob 5 months ago
This looks fantastic — congrats on the launch.
As an armchair observer, the agents + browser space feels like it’s waiting for someone to make the open source framework that everyone piles on to.
Proxy rotation sounds like a solid way to monetize for businesses.
- gregpr07 5 months ago
  Yeah that’s precisely why we introduced the cloud version!
dogman123 5 months ago
i tried the reddit quickstart example in the repo and it seemed to be incapable of completing the task.
https://pastebin.com/PnLnQ3kY
- gregpr07 5 months ago
  hmm interesting - sometimes it definitely fails yes. Will take a look!
  btw - our biggest challenge is exactly this, solving thousands of issues that arise on the fly.
  dogman123 5 months ago
  fwiw, i had it do something _far_ more complex that i am currently dealing with at work and it performed perfectly in my few test cases. i see very heavy use of this tool in my future. just figured i'd give a shot about the quickstart not functioning as planned :)
xnx 5 months ago
Is it possible to mix browser-use with traditional DOM/XPath/CSS-selector automation? e.g. Have certain automation steps that are more fuzzy/AI like "click on the image of a cat"
- gregpr07 5 months ago
  We are experimenting with this. Currently the library api is very raw but technically possible (we introduced this notion of initial actions, which are just deterministic actions before the LLM kicks in) - https://github.com/browser-use/browser-use/blob/main/example....
  The other way to achieve this with Browser Use is to save the history from `history = agent.run()` and rerun it with `agent.rerun_history(history)`.
  I'd love to see if this can of any use to you!
  cdolan 4 months ago
  Would be fantastic to convert the agent history into deterministic crawlers/scrapers (playwright typescript, etc)
  Ran into a tool called Promptwright on the Discord that was an example of this
throw03172019 5 months ago
This looks very useful for web apps. We have a use case for legacy Windows apps. How feasible is this kind of technology for performing agentic workflows in legacy native apps?
- MagMueller 5 months ago
  For Windows, Pig (https://github.com/pig-dot-dev/pig-python) or AskUI (https://github.com/askui/vision-agent) could be interesting.
- cadamsdotcom 5 months ago
  What about OmniParser ? https://www.microsoft.com/en-us/research/articles/omniparser...
  orliesaurus 5 months ago
  OmniParser is pretty amazing! Thanks for sharing! It parsed the SpaceJam 1996 website pretty well, despite that website being extremely out of date
kingkulk 5 months ago
So excited for this to be the future! Spectacular job!
shrisukhani 5 months ago
Congrats on launching guys!
So many people building with / on top of browser-use now - spawned a whole cottage industry! :)
djyaz1200 5 months ago
Congrats on the launch!
I just about fell out of my chair laughing at your cloud hosted tier with the tagline "We have to eat somehow™" aka "please pay us"
I signed up for the paid tier and I'm hopeful this can help us integrate legacy CRM's with our company's unified communication sales tool.
Either way good luck!
ramirond 5 months ago
Looks really cool! Shared it with my team so we can try it.
hugs 5 months ago
congrats on the launch! (selenium project creator here.) when/if you want to automate mobile devices (ios/android), let's chat!
orliesaurus 5 months ago
this is the best thing to ever happen to the RPA world
- MagMueller 5 months ago
  What do you think is the main problem it solves there?
  The cool thing is that we can extract xPaths from the agent runs and re-run these scripts deterministically. I think that's a big advantage over pure vision-based systems like Operator.
- undefined 5 months ago
  [deleted]
pcranaway 5 months ago
use-browser would be an amazing name too!
- nsonha 5 months ago
  from javascript? Everytime I see some function named such I assume it's something that could have been plain js but made into a hook because of amateur React devs.
- MagMueller 5 months ago
  I use browser-use. I use use-browser. I use mac-use. I use use.
moralestapia 5 months ago
Ha!
I just saw this win an AI Hackaton in Toronto but they said it was their own thing, quite dishonest. Everyone was rightfully impressed, me as well not gonna lie. I was a bit sus someone could come up with something like this in a weekend, but they were from U of Waterloo, Vector Institute and whatnot, so I said "maybe". Now I know they were just a bunch of scammers, sad.
Anyway, this is a great project, congratulations. It's so good it's making other people win already, lol. I have so many use cases for this. I truly wish you the best!
Edit: Downvote me all you want, if you love scammers so much I can send you their contact so you can "invest" in their trash. Lol.
- MagMueller 5 months ago
  For me, it simply demonstrates how easy and fast you can build these tools now. We have many fellow YC founders who build great products on top of browser-use. They don't have to quote us. I think it's awesome to enable so many new startup ideas.
- undefined 5 months ago
  [deleted]
- giarc 5 months ago
  Did they claim the project as their own, or did they use the open source to build a project?
  moralestapia 5 months ago
  They claimed the project as their own, with a title like "AI agents that do things for you".
  One of the judges explicitly asked if they actually made this thing or was it something else like "a video" showing what it would be like.
  One of the team members confidently replied it was real and that they made it all during the weekend.
  It was a bit too good to be true.
  Edit: I found a video of the thing. I initially posted it here but decided to delete it, the reason for that is I don't think they deserve to be publicly shamed. We were all having fun and they probably got a little carried away. If any of them sees this just don't do that next time. Play fair.
  baal80spam 5 months ago
  The audacity. Imagine if someone googled and exposed them.
- taytus 5 months ago
  Most hackathons are like that.
- abrichr 5 months ago
  Which hackathon?
tsvoboda 5 months ago
pretty sick stuff guys, excited to see what you accomplish
jlpom 5 months ago
Hi, great contribution! Can you highlight how does it compare to https://github.com/TaxyAI/browser-extension which also uses the DOM?
anyekwest 5 months ago
These guys are goated