• visarga 2 hours ago

    Interesting, a computer-use environment. I made a CUA benchmark too: 200 web tasks with internal code-based evaluation. You can integrate them if you want.

    https://github.com/UiPath/uipath_enterprise_benchmark

    https://arxiv.org/abs/2511.17131

    • frabonacci an hour ago

      Hey visarga - I'm the founder of Cua. I think we might have met at the CUA ICML workshop? The OS-agnostic VNC approach of your benchmark is smart and would make integration easy. We're open to collaborating - want to shoot me an email at f@trycua.com?