Very cool stuff. Love the focus on CPU-first.
Would also love to see some throughput numbers on a basic VM setup.
Edit: there are some latency numbers in the paper https://arxiv.org/pdf/2507.18546
Zero-shot encoder models are so cool. I'll definitely be checking this out.
If you're looking for a zero-shot classifier, tasksource is in a similar vein.
There is another version at:
https://github.com/urchade/GLiNER
Looks like it’s still being maintained too?