• jbellis 2 hours ago

    Love to see people leveraging static analysis for AI agents. Similar to what we're doing in Brokk but we're more tightly coupled to our own harness. (https://brokk.ai/) Would love to compare notes; if you're interested, hmu at [username]@brokk.ai.

    Quick comparison: Auditor does framework-specific stuff that Brokk does not, but Brokk is significantly faster (~1M loc per minute).

    • ThailandJohn 44 minutes ago

      Would be really cool to compare notes :D Sent from a "non tech" company email so it doesn't get filtered lol.

      My speed really depends on language and what needs indexing. On pure Python projects I get around 220k loc/min, but for deeper data flow in Node apps (TypeScript compiler overhead + framework extraction) it's roughly 50k loc/min.

      Curious what your stack is and what depth you're extracting to reach 1M/min - those are seriously impressive numbers! :D

    • digdugdirk 3 hours ago

      Cool! I've been playing with the same code -> graph concept for LLM work. Why did you decide to go for a pseudo-compiler with a ton of custom rules rather than try to interact with the AST itself?

      • ThailandJohn 2 hours ago

        Hi! It comes down to the limitations of Tree-sitter: it's insanely fast and easy to use, but it stops at syntax/nodes. The TypeScript compiler provides semantic information, with full type checking and cross-module resolution. It's a small nightmare, as I have to write every extractor and parser for it myself (which is why I call it a "pseudo compiler"). But it's a necessity for getting full call-chain provenance across callee/caller, framework and validations, which is a "hard" requirement for the taint analysis to work. If you want to dig into the code: the top layer is ast_parser.py, which routes to a few places. Taking JS/TS as an example, look at data_flow.ts / javascript.py, which show the AST/extraction/analysis layers that capture it and make sense of it in the database. :)
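
        To make the "syntax only" vs "resolved" distinction concrete, here is a toy sketch. It is not taken from ast_parser.py or data_flow.ts; it uses Python's built-in ast module as a stand-in, and the function name resolve_calls plus the sample module names are made up for illustration. A syntax-only pass sees a call to `q`; following the import binding is what tells you it is really `db.run_query`. It does no type checking, which a real semantic layer would add.

```python
import ast


def resolve_calls(code):
    """Pair each call's local name with the qualified name it was imported as.

    Hypothetical helper for illustration only: handles plain names and
    one-level attribute calls, and ignores scoping and reassignment.
    """
    tree = ast.parse(code)

    # Pass 1: build a table of local name -> qualified name from imports.
    bindings = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                local = alias.asname or alias.name
                bindings[local] = f"{node.module}.{alias.name}"
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    bindings[alias.asname] = alias.name
                else:
                    # "import a.b" binds only the top-level name "a".
                    top = alias.name.split(".")[0]
                    bindings[top] = top

    # Pass 2: resolve every call site against that table.
    resolved = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            resolved.append((func.id, bindings.get(func.id, func.id)))
        elif isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            base = bindings.get(func.value.id, func.value.id)
            resolved.append((f"{func.value.id}.{func.attr}", f"{base}.{func.attr}"))
    return resolved


SAMPLE = (
    "from db import run_query as q\n"
    "import subprocess as sp\n"
    "q(user_input)\n"
    "sp.run(cmd)\n"
)

if __name__ == "__main__":
    for local, qualified in resolve_calls(SAMPLE):
        print(local, "->", qualified)
```

        The left-hand column is roughly what a node-level pass gives you; the right-hand column is the extra step that call-chain provenance depends on.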

      • esafak 2 hours ago

        Lots of formal methods and verification submissions this week!

        • doganugurlu 3 hours ago

          Great idea!

          Did you consider using Tree-sitter instead of the pseudo compiler?

          • ThailandJohn an hour ago

            Hey! Yes, I did. I started with Tree-sitter, tbh, and for Go, Rust, Bash and HCL I still use it. In my naive beginnings I really had no idea how complex things "were supposed to be", so I was never really deterred by it and kept building piece by piece. Very quickly (because I wanted "everything") I hit hard limitations with Tree-sitter, not only for "taint resolution" but in what I could check and do overall...

            It "starts with symbols": you get the basic starter kit, but it quickly became "this proves it exists" but "not what it does". That meant taint couldn't work properly, because you want to track assignments, function call arguments etc. to see how the data actually flows. Same thing with the rules engine: without tracking object literals, XSS detection becomes very shallow, with tons of false positives, because Tree-sitter won't be able to tell you about property assignments or method calls.
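
            As a rough illustration of "tracking assignments and call arguments", here is a minimal taint-propagation sketch. It is an assumption-laden toy, not the project's taint engine: it uses Python's built-in ast module, only looks at top-level statements in order, and the SOURCES/SINKS sets and the find_tainted_sinks name are hypothetical.

```python
import ast

SOURCES = {"input"}        # hypothetical: calls whose result is untrusted
SINKS = {"eval", "exec"}   # hypothetical: calls that must not get tainted data


def names_in(node):
    """Return every variable name that appears inside an expression."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def call_name(node):
    """Return the plain function name for a call like f(...), else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None


def find_tainted_sinks(code):
    """Walk top-level statements in order, propagating taint via assignments.

    Returns (line number, sink name, tainted argument names) tuples.
    """
    tainted = set()
    findings = []
    for stmt in ast.parse(code).body:
        # Call arguments: does any sink receive a currently tainted name?
        for node in ast.walk(stmt):
            name = call_name(node)
            if name in SINKS:
                for arg in node.args:
                    hit = names_in(arg) & tainted
                    if hit:
                        findings.append((stmt.lineno, name, sorted(hit)))
        # Assignments: taint flows from sources and from tainted names.
        if isinstance(stmt, ast.Assign):
            value = stmt.value
            from_source = any(call_name(n) in SOURCES for n in ast.walk(value))
            from_tainted = bool(names_in(value) & tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if from_source or from_tainted:
                        tainted.add(target.id)
                    else:
                        # Overwritten with clean data, so no longer tainted.
                        tainted.discard(target.id)
    return findings


SAMPLE = (
    "user = input()\n"
    "query = 'SELECT ' + user\n"
    "safe = 'constant'\n"
    "eval(query)\n"
    "eval(safe)\n"
)

if __name__ == "__main__":
    print(find_tainted_sinks(SAMPLE))  # [(4, 'eval', ['query'])]
```

            Knowing only that `query` and `eval` exist as symbols would not be enough here; the finding depends on following `user` into `query` and `query` into the call argument.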

            And it feels like it keeps going like that forever, with various aspects and things I wanted to know and track. So all in all, moving away from Tree-sitter and taking on the "mountain" allowed me (after losing weeks of sanity lol) to incrementally build out virtually anything I wanted to extract or check. It does sadly leave some "money on the table" for other languages. Take Rust as an example: because it still goes through Tree-sitter, the taint engine there has no cross-module resolution or type checking. So that's why :)