• muzani 12 hours ago

    It has been done:

    https://youtube.com/watch?v=shnW3VerkiM

    https://youtube.com/watch?v=VQhS6Uh4-sI

    First one is more impressive looking. Second one more reliable.

    I think the real hard part is that nobody wants to maintain these, and nobody really wants to pay to use them either. It's a lot of work and not something people do for free. It's no surprise these emerged (and won) in hackathons.

    All the major operating systems are dedicating their full efforts to this, so it doesn't make much sense to raise money and do it independently.

    • simne 7 hours ago

      As people have already said, this is a complex problem, and at the simplest level it needs to be decomposed into a few simpler problems.

      I will list some of those simpler problems:

      1. Some sort of reliable screen reading, capable of handling all sorts of screen output (not just HTML-like or other already-structured markup).

      2. Some sort of universal optimizer, capable of solving any task a human could solve in a simplified computer environment.

      3. Some sort of reliable "Understanding Engine" that accepts queries in a simplified language that is easy for humans to use, which we could theoretically build in a few different ways (I list only the two best known).

      3a. Some deep-learning AI.

      3b. Some large implementation of semantic AI.

      • jfbfkdnxbdkdb 15 hours ago

        Because an LLM != a generally intelligent mind...

        Whilst they are a massive step forward... we still have a long way to go for that...

        Why not try it yourself with Ollama, a large model, and some rented hardware... you will get something... but it will not be consistent...

        • jfbfkdnxbdkdb 15 hours ago

          Not to doubt that LLMs are powerful... it's just that every time I try them, they don't do what I want...

          • mu53 15 hours ago

            Have you tried adding "please"? I found that it works wonders.

            • collingreen 13 hours ago

              I can't tell if this is serious or tongue in cheek and I find that both funny and deeply discouraging about the state of the world. For some reason it's giving me Rick and Morty butter robot vibes.

              • jfbfkdnxbdkdb 12 hours ago

                Tried that... but competently writing Rust is just not a priority for the LLMs I chat with.

          • tacostakohashi 18 hours ago

            Because the websites want to serve ads to humans, upsell you, and get you to sign up for their credit card too, so their implementations are highly obfuscated and dynamic.

            If they wanted to be easy to work with, they'd offer a simple API, or plain HTML form interface.
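
            To illustrate the contrast, here is a minimal Python sketch using only the standard library. The markup is hypothetical (not from any real site): a plain HTML form exposes stable field names any parser can target, while obfuscated, JavaScript-assembled markup leaves automation nothing to hold on to.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration -- not from any real site.
PLAIN_FORM = """
<form action="/transfer" method="post">
  <input name="amount">
  <input name="recipient">
</form>
"""

# Obfuscated variant: randomized class names, fields wired up by
# JavaScript at runtime, so there is nothing stable for a bot to target.
OBFUSCATED = '<div class="x9f2"><div class="q1" data-k="a7"></div></div>'

class FormFields(HTMLParser):
    """Collect the name attributes of <input> elements."""
    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            for key, value in attrs:
                if key == "name":
                    self.fields.append(value)

def extract_fields(html):
    """Return the submittable field names found in the given markup."""
    parser = FormFields()
    parser.feed(html)
    return parser.fields

print(extract_fields(PLAIN_FORM))   # ['amount', 'recipient']
print(extract_fields(OBFUSCATED))   # []
```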

            • wavemode 15 hours ago

              There are technical limitations, sure (getting an AI to parse a screen and interact with it via mouse and keyboard is harder than it sounds - and it sounds hard to start with), but the main limitation is still economic. Does it really make sense to train a multi-billion-parameter AI to click buttons if you could instead just make an API call?

              There's an intersection between "high accuracy" and "low cost" that AI has not quite reached yet for this sort of task, when compared to simpler and cheaper alternatives.
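
              A back-of-envelope sketch of that gap; every number below is an assumption chosen purely for illustration, not real vendor pricing:

```python
# All figures are assumed, for illustration only -- not real pricing.
ASSUMED_TOKENS_PER_STEP = 2_000    # one screenshot + instructions per UI step
ASSUMED_USD_PER_1K_TOKENS = 0.01   # illustrative LLM inference price
STEPS_PER_TASK = 10                # clicks/typing needed to finish the task

# Cost of having an LLM drive the UI, step by step.
llm_cost = STEPS_PER_TASK * ASSUMED_TOKENS_PER_STEP / 1_000 * ASSUMED_USD_PER_1K_TOKENS

# Cost of the equivalent direct API call: effectively free by comparison.
api_cost = 0.0001

print(f"LLM-driven UI task: ${llm_cost:.2f}")   # $0.20 under these assumptions
print(f"Direct API call:    ${api_cost:.4f}")   # $0.0001
```

Even with generous assumptions, the per-task cost of screen-driving sits orders of magnitude above a plain API call.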

              • hildolfr 15 hours ago

                People are using huge capable LLMs to answer things like "what's five percent of 250"; I don't see a big leap in using them to skip APIs.

                On the other hand, a lot of user-facing access methods are more capable than their API equivalents; people already use tools like AutoHotkey to work around such limitations. If people are already working around things that way, that must indicate the presence of some sort of market.

              • louisfialho 16 hours ago

                Thanks for the answers. Even the unexpected patterns like pop-ups feel pretty structured to me - I would expect models to generalize and navigate any of them. I could see more websites blocking agents in the future, but it seems like we're so early that this is not a limiting factor yet.

                • louisfialho 18 hours ago

                  If someone is actively working on this and believes there is a path, please reach out to me.

                  • 42lux 18 hours ago

                    Because reality has a lot of details.

                    • MattGaiser 18 hours ago

                      I have experience with a tiny part of this problem: accessing the various websites and figuring out where to click.

                      Presently, doing this requires a fair bit of continuous work.

                      Many websites don't want bots on them and actively use countermeasures that would block operators in the same way they block scrapers. There is a ton of stuff a website can do to break those bots, and they do it. Some even feed back "phantom" data to make the process less reliable.
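
                      One common countermeasure of that kind is a "honeypot" form field, sketched below in Python with hypothetical field names. The field is hidden from humans via CSS, so only a bot that auto-fills every input will ever submit a value there; the server can then answer with phantom data instead of a hard block.

```python
# Hedged sketch of a honeypot check; field names are hypothetical.
def classify_submission(form_data: dict) -> str:
    """Return 'phantom' for suspected bots, 'real' otherwise."""
    # "email_confirm_hp" is hidden via CSS, so humans never fill it in;
    # a value here suggests a bot auto-filled every field it found.
    if form_data.get("email_confirm_hp"):
        return "phantom"  # serve plausible-looking fake data
    return "real"

print(classify_submission({"query": "flights", "email_confirm_hp": "x@y.z"}))  # phantom
print(classify_submission({"query": "flights"}))                               # real
```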

                      There are a lot of businesses out there where the business model breaks if someone else can see the whole board.

                      • pestatije 18 hours ago

                        You mean Alexa?