Query Your Python Lists (github.com)
Submitted by mkalioby 5 days ago
  • dsp_person 2 hours ago

    Interesting... I've been playing with the idea of embedding more Python in my C, no Cython or anything, just using <Python.h> and <numpy/arrayobject.h>. From one perspective it's just "free" C bindings to a lot of optimized packages. Against some of the C libraries I've tried, the Python code is often faster. Python almost becomes C's package manager.

    E.g. sorting 2^23 random 64-bit integers: qsort: 850ms, custom radix sort: 250ms, ksort.h: 582ms, np.sort: 107ms (including PyArray_SimpleNewFromData and PyArray_Sort). NumPy uses Intel's x86-simd-sort there, I believe.
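
    For reference, the np.sort side is easy to reproduce from plain Python as well; the 107ms above went through the C API, and exact timings will of course vary by machine and NumPy build:

        import time
        import numpy as np

        rng = np.random.default_rng(0)
        a = rng.integers(0, 2**63, size=2**23, dtype=np.int64)  # 8,388,608 random 64-bit ints

        t0 = time.perf_counter()
        b = np.sort(a)  # np.sort copies then sorts; PyArray_Sort sorts in place, but the kernels are the same
        print(f"np.sort: {(time.perf_counter() - t0) * 1e3:.0f} ms")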

    E.g. inserting 8M entries into a hash table (random 64-bit keys and values): MSI-style hash table: ~100ns avg insert/lookup, cc_map: ~95ns avg insert/lookup, a Python dict via <Python.h>: 91ns insert, 60ns lookup.

    I'm curious if OP's tool might fit in similarly. I've found lmdb to be quite slow even in tmpfs with no sync, etc.

    • sevensor 4 hours ago

      Having seen a lot of work come to grief because of the decision to use pandas, anything that’s not pandas has my vote. Pandas: if you’re not using it interactively, don’t use it at all. This advice goes double if your use case is “read a CSV”; the Python standard library has you covered there.
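
      For the read-a-CSV case, something like this is usually all you need (file name and field are made up):

          import csv

          with open("orders.csv", newline="") as f:               # hypothetical file
              rows = list(csv.DictReader(f))                      # one dict per row
              late = [r for r in rows if r["status"] == "late"]   # filter with a plain comprehension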

      • c0balt 4 hours ago

        Both duckdb and especially polars should also be mentioned here. Polars in particular is quite good IME if you want a pandas-alike interface (and its API is also saner).
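
        A rough sketch of the Polars flavour, with made-up data and column names:

            import polars as pl

            df = pl.DataFrame({"name": ["a", "b", "c"], "price": [3, 12, 7]})

            # expression-based filtering/selection instead of boolean-mask indexing
            out = df.filter(pl.col("price") > 5).select(["name", "price"])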

        • ttyprintk 4 hours ago

          Since DuckDB can read and write Pandas DataFrames in memory, a team with varying Pandas fluency can benefit from learning DuckDB.
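
          For example, DuckDB resolves an in-memory DataFrame by its variable name and can hand a DataFrame back (the data here is made up):

              import duckdb
              import pandas as pd

              df = pd.DataFrame({"customer": ["a", "b", "a"], "amount": [10, 5, 7]})

              # DuckDB finds `df` in the local scope; .df() converts the result back to pandas
              totals = duckdb.sql("SELECT customer, SUM(amount) AS total FROM df GROUP BY customer").df()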

        • glial 2 hours ago

          Interesting work. I'd be curious to know the timing relative to list comprehensions for similar queries, since that's the common standard library alternative for many of these examples.
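
          i.e. the stdlib baseline I'd want numbers against is just a comprehension (fields made up):

              people = [{"name": "ada", "age": 36}, {"name": "bob", "age": 17}]

              # filter + project in one comprehension
              adults = [p["name"] for p in people if p["age"] >= 18]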

          • abdullahkhalids 2 hours ago

            I don't understand why numeric filters are included. The library is written in Python, so shouldn't a lambda-based filter be roughly as fast but much easier/clearer to write?
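
            Something like this stays plain Python (data made up):

                rows = [{"price": 3}, {"price": 12}, {"price": 7}]

                # the same numeric filter as a lambda, no query DSL involved
                cheap = list(filter(lambda r: r["price"] < 10, rows))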

            • MathMonkeyMan 10 minutes ago

              I'm not the author, but this implementation has the benefit of being a JSON-compatible DSL that you can serialize. Maybe that's intentional, maybe not.
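
              i.e. a filter expressed as data can go over the wire or into a config file, which a lambda can't; the exact shape of OP's DSL may differ, this is just the idea:

                  import json

                  # a query as plain data: serializable, inspectable, loggable
                  query = {"field": "price", "op": "lt", "value": 10}
                  wire = json.dumps(query)

                  # the equivalent lambda can't be round-tripped through JSON
                  predicate = lambda r: r["price"] < 10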

              It does look like Python's comprehensions would be a better choice if you're writing them by hand anyway.

            • tempcommenttt 2 hours ago

              It’s nice that it’s fast at 10k dictionary entries, but how does it scale?