Query Your Python Lists (github.com)
Submitted by mkalioby 5 days ago
  • dsp_person 2 hours ago

    Interesting... I've been playing with the idea of embedding more Python in my C, no Cython or anything, just using <Python.h> and <numpy/arrayobject.h>. From one perspective it's just "free" C bindings to a lot of optimized packages. Against some of the C libraries I've tried, the Python code is often faster. Python almost becomes C's package manager.

    E.g. sorting 2^23 random 64-bit integers: qsort: 850ms, custom radix sort: 250ms, ksort.h: 582ms, np.sort: 107ms (including PyArray_SimpleNewFromData and PyArray_Sort). NumPy uses Intel's x86-simd-sort there, I believe.
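
    For reference, the np.sort side is easy to reproduce from plain Python as well; the 107ms above went through the C API, and exact timings will of course vary by machine and NumPy build:

        import time
        import numpy as np

        rng = np.random.default_rng(0)
        a = rng.integers(0, 2**63, size=2**23, dtype=np.int64)  # 8,388,608 random 64-bit ints

        t0 = time.perf_counter()
        b = np.sort(a)  # np.sort copies then sorts; PyArray_Sort sorts in place, but the kernels are the same
        print(f"np.sort: {(time.perf_counter() - t0) * 1e3:.0f} ms")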

    E.g. inserting 8M entries into a hash table (random 64-bit keys and values): MSI-style hash table: ~100ns avg insert/lookup, cc_map: ~95ns avg insert/lookup, a Python dict via <Python.h>: 91ns insert, 60ns lookup.

    I'm curious if OP's tool might fit in similarly. I've found lmdb to be quite slow even in tmpfs with no sync, etc.

    • sevensor 4 hours ago

      Having seen a lot of work come to grief because of the decision to use pandas, anything that’s not pandas has my vote. Pandas: if you’re not using it interactively, don’t use it at all. This advice goes double if your use case is “read a CSV”; the Python standard library has you covered there.
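
      For the read-a-CSV case, something like this is usually all you need (file name and field are made up):

          import csv

          with open("orders.csv", newline="") as f:               # hypothetical file
              rows = list(csv.DictReader(f))                      # one dict per row
              late = [r for r in rows if r["status"] == "late"]   # filter with a plain comprehension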

      • c0balt 4 hours ago

        Both duckdb and especially polars should also be mentioned here. Polars in particular is quite good IME if you want a pandas-alike interface (and its API is also saner).
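
        A rough sketch of the Polars flavour, with made-up data and column names:

            import polars as pl

            df = pl.DataFrame({"name": ["a", "b", "c"], "price": [3, 12, 7]})

            # expression-based filtering/selection instead of boolean-mask indexing
            out = df.filter(pl.col("price") > 5).select(["name", "price"])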

        • ttyprintk 4 hours ago

          Since DuckDB can read and write Pandas DataFrames in memory, a team with varying Pandas fluency can benefit from learning DuckDB.
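
          For example, DuckDB resolves an in-memory DataFrame by its variable name and can hand a DataFrame back (the data here is made up):

              import duckdb
              import pandas as pd

              df = pd.DataFrame({"customer": ["a", "b", "a"], "amount": [10, 5, 7]})

              # DuckDB finds `df` in the local scope; .df() converts the result back to pandas
              totals = duckdb.sql("SELECT customer, SUM(amount) AS total FROM df GROUP BY customer").df()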

        • glial 2 hours ago

          Interesting work. I'd be curious to know the timing relative to list comprehensions for similar queries, since that's the common standard library alternative for many of these examples.
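
          i.e. the stdlib baseline I'd want numbers against is just a comprehension (fields made up):

              people = [{"name": "ada", "age": 36}, {"name": "bob", "age": 17}]

              # filter + project in one comprehension
              adults = [p["name"] for p in people if p["age"] >= 18]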

          • abdullahkhalids 2 hours ago

            I don't understand why numeric filters are included. The library is written in Python, so shouldn't a lambda-based filter be roughly as fast but much easier/clearer to write?
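
            Something like this stays plain Python (data made up):

                rows = [{"price": 3}, {"price": 12}, {"price": 7}]

                # the same numeric filter as a lambda, no query DSL involved
                cheap = list(filter(lambda r: r["price"] < 10, rows))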

            • MathMonkeyMan 10 minutes ago

              I'm not the author, but this implementation has the benefit of being a JSON-compatible DSL that you can serialize. Maybe that's intentional, maybe not.
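
              i.e. a filter expressed as data can go over the wire or into a config file, which a lambda can't; the exact shape of OP's DSL may differ, this is just the idea:

                  import json

                  # a query as plain data: serializable, inspectable, loggable
                  query = {"field": "price", "op": "lt", "value": 10}
                  wire = json.dumps(query)

                  # the equivalent lambda can't be round-tripped through JSON
                  predicate = lambda r: r["price"] < 10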

              It does look like Python's comprehensions would be a better choice if you're writing them by hand anyway.

            • tempcommenttt 2 hours ago

              It’s nice that it’s fast at 10k dictionary entries, but how does it scale?