• tonyww a day ago

    One clarification since a few comments from coworkers/friends are circling this: Amazon isn’t the point here.

    We used it because it’s a dynamic, hostile UI, but the design goal is a site-agnostic control plane. That’s why the runtime avoids selectors and screenshots and instead operates on pruned semantic snapshots + verification gates.

    If the layout changes, the system doesn’t “half-work” — it fails deterministically with artifacts. That’s the behavior we’re optimizing for.
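
    To make "pruned semantic snapshot" concrete, here is a minimal sketch of one way such pruning could work. It is an assumption for illustration, not the project's actual code: the element shape (`role`, `text`, `visible`), the `INTERACTIVE_ROLES` set, and `prune_snapshot` are all hypothetical names.

```python
# Hypothetical sketch: reduce a raw element list to a small, stable view
# that an agent can reason about without CSS selectors or screenshots.
# The element fields and the role set below are assumptions.

INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def prune_snapshot(elements):
    """Keep only visible, interactive elements and give each a small integer id."""
    pruned = []
    for el in elements:
        if not el.get("visible", False):
            continue  # hidden elements cannot be acted on
        if el.get("role") not in INTERACTIVE_ROLES:
            continue  # layout-only nodes are noise for action selection
        pruned.append({
            "id": len(pruned),  # id is local to this snapshot
            "role": el["role"],
            "text": el.get("text", "").strip(),
        })
    return pruned


raw = [
    {"role": "button", "text": " Add to Cart ", "visible": True},
    {"role": "generic", "text": "wrapper", "visible": True},
    {"role": "link", "text": "Hidden promo", "visible": False},
    {"role": "link", "text": "Cart", "visible": True},
]
snapshot = prune_snapshot(raw)
```

    The agent would then refer to elements by `id` and `role` rather than by a selector tied to the current markup.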

    • tomhow 14 hours ago

      Can you please clarify: is this project something that "people can play with"? I.e., can users download the code and sample data and try it out for themselves, or play with it some other way?

      That's a prerequisite for Show HN.

      I'm removing the Show HN prefix for now, until we get clarity. Then we can consider re-upping the post once we know exactly how to present it.

    • ares623 14 hours ago

      > If the layout changes, the system doesn’t “half-work” — it fails deterministically with artifacts. That’s the behavior we’re optimizing for.

      how is this different from building a traditional scraper script?

      • tonyww an hour ago

        Good question. On the surface it does look very similar to a traditional scraper or script, but there's a subtle difference in where the logic lives and how failures are handled.

        A traditional scraper/script hard-codes selectors and control flow up front. When the layout changes, it usually breaks at an arbitrary line and you debug it manually.

        In this setup, the agent chooses actions at *runtime* from a bounded action space, and the system uses built-in predicates (e.g. url_changes, drawer_appeared) to verify the outcome of each action. When it fails, it fails at a specific semantic assertion with artifacts, not at a missing selector.

        So it’s less “replace scripts” and more “apply test-style verification and recovery to AI-driven decisions instead of static code.”
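
        A rough sketch of what such a verification gate could look like, under stated assumptions. The predicate names url_changes and drawer_appeared come from the comment above; everything else (the snapshot shape, `ALLOWED_ACTIONS`, `run_step`, `VerificationError`, the artifact file layout) is hypothetical and not the project's real API. For brevity the "after" snapshot is passed in rather than produced by a real browser.

```python
import json
import tempfile
from pathlib import Path

# Assumed snapshot shape: {"url": str, "elements": [{"role": str, "text": str}]}

ALLOWED_ACTIONS = {"click", "type", "scroll"}  # the bounded action space (assumed)


def url_changes(before, after):
    """Predicate: the action navigated to a different URL."""
    return before["url"] != after["url"]


def drawer_appeared(before, after):
    """Predicate: a drawer is present after the action but was not before."""
    def has_drawer(snap):
        return any(el["role"] == "drawer" for el in snap["elements"])
    return not has_drawer(before) and has_drawer(after)


class VerificationError(Exception):
    """Raised when a semantic assertion fails; points at the saved artifacts."""

    def __init__(self, predicate_name, artifact_path):
        super().__init__(f"{predicate_name} failed; artifacts at {artifact_path}")
        self.predicate_name = predicate_name
        self.artifact_path = artifact_path


def run_step(action, before, after, predicate, artifact_dir):
    """Check one agent-chosen action against a predicate, or fail with artifacts."""
    if action["kind"] not in ALLOWED_ACTIONS:
        raise ValueError(f"action outside the bounded space: {action['kind']}")
    if predicate(before, after):
        return  # outcome verified, the agent may continue
    # Persist everything needed to debug the failure, then stop deterministically.
    path = Path(artifact_dir) / f"{predicate.__name__}_failure.json"
    path.write_text(json.dumps({
        "predicate": predicate.__name__,
        "action": action,
        "before": before,
        "after": after,
    }, indent=2))
    raise VerificationError(predicate.__name__, path)


# Usage: a click that was expected to open a cart drawer but changed nothing.
before = {"url": "https://example.com/item", "elements": []}
with tempfile.TemporaryDirectory() as tmp:
    try:
        run_step({"kind": "click", "target": 0}, before, before, drawer_appeared, tmp)
    except VerificationError as err:
        failed = err.predicate_name
        saved = json.loads(Path(err.artifact_path).read_text())
```

        The point of the sketch is the failure mode: the error names the assertion that did not hold and carries the before/after state, instead of surfacing as an exception from a selector lookup somewhere in the middle of a script.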

        • blibble 12 hours ago

          it costs a lot more

      • cjbarber 17 hours ago

        looks interesting, though note:

        > Show HN is for something you've made that other people can play with.

        > Off topic: blog posts, sign-up pages, newsletters, lists, and other reading material. Those can't be tried out, so can't be Show HNs. Make a regular submission instead.

        https://news.ycombinator.com/showhn.html

        • tonyww an hour ago

          Sorry for the misunderstanding. I intended to post it as a news or engineering article, which is why I didn't include *Show HN* in the title.