• obblekk 7 hours ago

    The open source ranking library is really interesting. It's using a type of merge sort where the comparator is an LLM call, but batching more than two items per comparison to reduce the number of calls (rough sketch below).

    Reducing problems to document ranking is effectively a type of test-time search - also very interesting!

    I wonder if this approach could be combined with GRPO to create more efficient chain of thought search...

    https://github.com/BishopFox/raink?tab=readme-ov-file#descri...
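
    If I'm reading it right, the merge step looks something like this (a minimal sketch, not raink's actual code; llm_rank_batch is a hypothetical helper that makes one LLM call to order a small batch best-first, and items are assumed unique):

        def llm_rank_batch(window):
            """Hypothetical: one LLM call returning `window` ordered best-first."""
            raise NotImplementedError

        def merge(a, b, batch=8):
            # a and b are each already sorted, so the next batch//2 items of
            # the merged output must come from the first batch//2 of each list.
            k = batch // 2
            out = []
            while a and b:
                ranked = llm_rank_batch(a[:k] + b[:k])  # one call per chunk
                for item in ranked[:k]:                 # commit the top half
                    (a if item in a else b).remove(item)
                    out.append(item)
            return out + a + b

        def llm_merge_sort(items):
            if len(items) <= 1:
                return items
            mid = len(items) // 2
            return merge(llm_merge_sort(items[:mid]), llm_merge_sort(items[mid:]))

    The batch size trades the number of calls against how much the model has to get right in a single call.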

    • rahimnathwani 6 hours ago

      The article introducing the library explains that pairwise comparisons are the most reliable approach (i.e. for each pair of items you ask an LLM which it prefers) but computationally expensive, while a single LLM call ("rank all of these items in order") is much less reliable. So they do something in between that yields enough pairwise comparisons to produce a more reliable list (rough sketch below).

      https://news.ycombinator.com/item?id=43175658
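
      Something in that spirit, as I understand it (my rough sketch of the general idea, not the library's actual code; llm_rank_batch is a hypothetical one-call "order these best-first" helper):

          import random
          from collections import defaultdict

          def llm_rank_batch(group):
              """Hypothetical: one LLM call returning `group` ordered best-first."""
              raise NotImplementedError

          def rank(items, passes=10, batch=8):
              # Each pass: shuffle, split into small batches, let the LLM order
              # each batch, and credit every item by its position. Items meet
              # different neighbors on every pass, so the summed scores encode
              # many implicit pairwise comparisons in far fewer calls.
              scores = defaultdict(float)
              for _ in range(passes):
                  shuffled = random.sample(items, len(items))
                  for i in range(0, len(shuffled), batch):
                      group = shuffled[i:i + batch]
                      for pos, item in enumerate(llm_rank_batch(group)):
                          scores[item] += pos / len(group)  # lower total = better
              return sorted(items, key=lambda x: scores[x])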

    • hexator 5 hours ago

      This furthers an idea I've had recently: we (and the media) are focusing too much on creating value by building ever more complex LLMs, while vastly underestimating creative applications of current-generation AI.

      • noperator 4 hours ago

        Agree. I think LLMs are usually not "harnessed" correctly for complex, multi-step problems—hence the `raink` CLI tool: https://github.com/noperator/raink

      • rfurmani 6 hours ago

        Very cool! This is also one of my beliefs in building tools for research: if you can solve the problem of predicting and ranking the top references for a given idea, then you've learned a lot about problem solving and about decomposing problems into their ingredients. I've been pleasantly surprised by how well LLMs can rank relevance, compared to supervised training of a relevancy score. I'll read the linked paper (shameless plug: here it is on my research tools site: https://sugaku.net/oa/W4401043313/)

        • antirez 2 hours ago

          One interesting thing about LLMs, which is also related to why chain of thought works so well, is that they are good at sampling (saying a lot of things about a problem) and, when shown N solutions, good at pointing at the potentially better one. They do both better than zero-shot "tell me how to do that". So CoT is basically search inside the representation space plus ranking. This idea leverages something LLMs are clearly able to do pretty well.
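
          A minimal best-of-N sketch of that sample-then-rank loop (llm_sample and llm_pick_best are hypothetical helpers standing in for the two kinds of calls):

              def llm_sample(prompt):
                  """Hypothetical: one sampled completion."""
                  raise NotImplementedError

              def llm_pick_best(prompt):
                  """Hypothetical: index of the best numbered candidate."""
                  raise NotImplementedError

              def solve(problem, n=8):
                  # Sampling is cheap and diverse; judging N visible solutions
                  # is easier for the model than producing the best zero-shot.
                  candidates = [llm_sample(f"Propose a solution:\n{problem}")
                                for _ in range(n)]
                  listing = "\n\n".join(f"[{i}] {c}"
                                        for i, c in enumerate(candidates))
                  return candidates[llm_pick_best(
                      f"Which numbered solution is best?\n\n{listing}")]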

          • marcosdumay an hour ago

            Hum... The gotcha is that LLMs can rank for subject relevance, but not for most other kinds of quality.

            • mskar 5 hours ago

              Great article, I've had similar findings! LLM-based "document-chunk" ranking is a core feature of PaperQA2 (https://github.com/Future-House/paper-qa) and part of why it works so well for scientific Q&A compared to traditional RAG systems that rank purely by embedding similarity.

              • noperator 5 hours ago

                That's awesome. Will take a closer look!

              • adamkhakhar 4 hours ago

                I'm curious - why is LLM ranking preferred over cosine similarity from an embedding model (in the context of this specific problem)?

                • panarky 4 hours ago

                  Because the question "does Diff A fix Vuln B" is not answered by the cosine distance between vector(Diff A) and vector(Vuln B).
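
                  To make the contrast concrete (a sketch; embed and llm_yes_no are hypothetical calls):

                      import numpy as np

                      def embed(text):
                          """Hypothetical: embedding call returning a vector."""
                          raise NotImplementedError

                      def llm_yes_no(prompt):
                          """Hypothetical: LLM judgment returning True/False."""
                          raise NotImplementedError

                      def compare(diff_a, vuln_b):
                          def cosine(u, v):
                              return float(np.dot(u, v) /
                                           (np.linalg.norm(u) * np.linalg.norm(v)))

                          # Bi-encoder: the texts are embedded separately, so
                          # the score only reflects topical similarity.
                          similarity = cosine(embed(diff_a), embed(vuln_b))

                          # LLM judge: reads both texts together and can reason
                          # about whether the diff actually addresses the vuln.
                          fixes = llm_yes_no("Does this diff fix this vuln?\n\n"
                                             f"DIFF:\n{diff_a}\n\nVULN:\n{vuln_b}")
                          return similarity, fixes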

                • noperator 7 hours ago

                  A concept that I've been thinking about a lot lately: transforming complex problems into document ranking problems to make them easier to solve. LLMs can assist greatly here, as I demonstrated at the inaugural DistrictCon this past weekend.

                  • lifeisstillgood 6 hours ago

                    So would this be 1600 commits, one of which fixes the bug (which might be easier with commit messages)? Or is this a diff between two revisions, split into 1600 chunks, each chunk a "document"?

                    I am trying to grok why we want to find the fix. Is it to understand what was done, so we can exploit unpatched instances in the wild?

                    Also, re "identifying candidate functions for fuzzing targets": if every function is a document, I get where the list of documents comes from, but what is the query? How do I say "find me the function most suitable for fuzzing"?

                    Apologies if that's brusque, just trying to fit new concepts in my brain :-)

                    • noperator 4 hours ago

                      Great questions. For commits or revision diffs as documents—either will work. Yes, I've applied this to N-day vulnerability identification to support exploit development and offensive security testing. And yes, for fuzzing, a sensible approach would be to dump the exported function attributes (names, source/disassembled code, other relevant context, etc.) from a built shared library, and ask, "Which of these functions most likely parses complex input and may be a good candidate for fuzzing?" I've had some success with that specific approach already.
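
                      For the fuzzing case, that might look roughly like this (a sketch with hypothetical helpers; the ranking step would be the batched LLM ranking discussed upthread):

                          def exported_functions(library):
                              """Hypothetical: yields name/signature/code per
                              exported symbol, e.g. parsed from `nm -D` output
                              plus source or disassembly."""
                              raise NotImplementedError

                          def rank_documents(query, docs):
                              """Hypothetical: LLM-backed listwise ranking,
                              best candidates first."""
                              raise NotImplementedError

                          def fuzzing_candidates(library):
                              docs = [f"name: {fn.name}\n"
                                      f"signature: {fn.signature}\n"
                                      f"code:\n{fn.code}"
                                      for fn in exported_functions(library)]
                              query = ("Which of these functions most likely "
                                       "parses complex, attacker-controlled "
                                       "input and would make a good fuzzing "
                                       "target?")
                              return rank_documents(query, docs)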

                  • Everdred2dx 6 hours ago

                    Very interesting application of LLMs. Thanks for sharing!

                    • westurner 6 hours ago

                      Ranking (information retrieval) https://en.wikipedia.org/wiki/Ranking_(information_retrieval...

                      awesome-generative-information-retrieval > Re-ranking: https://github.com/gabriben/awesome-generative-information-r...

                      • m3kw9 5 hours ago

                        That title hurts my head to read

                        • moralestapia 4 hours ago

                          Minor nitpick,

                          Should be "document ranking reduces to these hard problems",

                          I never knew why the convention is like that; it seems backwards to me as well, but that's how it is.

                          • dwringer 3 hours ago

                            "Document ranking reduces to these hard problems" would imply that document ranking is itself an instance of a certain group of hard problems. That's not what the article is saying.

                            • moralestapia an hour ago

                              I know it's counterintuitive, as I explained in my comment, but that's the correct terminology in the CS world.