• maxpert 18 minutes ago

    Kind of what I've been working on: building tenancy on top of SQLite CDC to make a simple replayable SQLite for Marmot (https://github.com/maxpert/marmot). I personally think we have a synergy here; I'll drop by your Discord.

    • csense 18 minutes ago

      When someone says "stream data over the Internet," my automatic reaction is "open a TCP connection."

      Adding a database, multiple components, and Kubernetes to the equation seems like massively overengineering.

      What value does S2 provide that simple TCP sockets do not?

      Is this for like "making your own Twitch" or something, where streams have to scale to thousands-to-millions of consumers?

      • shikhar 6 minutes ago

        > Is this for like "making your own Twitch" or something, where streams have to scale to thousands-to-millions of consumers?

        Yes, this can be a good building block for broadcasting data streams.

        s2-lite is single node, so to scale to that level, you'd need to add some CDN-ing on top.

        s2.dev is the elastic cloud service, and it supports high fanout reads using Cachey (https://www.reddit.com/r/databasedevelopment/comments/1nh1go...)

        • shikhar 13 minutes ago

          This is a fair question. A stream here == a log. Every write with S2 implementations is durable before it is acknowledged, and it can be consumed in real-time or replayed from any position by multiple readers. The stream is at the granularity of discrete records, rather than a byte stream (although you can certainly layer either over the other).
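          The semantics above can be sketched as a toy append-only log in Python. This is a conceptual illustration only; `DurableLog` and its methods are hypothetical names, not S2's actual API:

```python
import os
import struct
import tempfile

class DurableLog:
    """Toy append-only log of discrete records (illustrative only,
    not S2's API): every append is flushed to disk before it is
    acknowledged, and any reader can replay from any position."""

    def __init__(self, path):
        self.f = open(path, "ab")
        self.records = []  # in-memory index for simple reads

    def append(self, record: bytes) -> int:
        # Length-prefix each record so record boundaries survive on
        # disk: the stream is discrete records, not a raw byte stream.
        self.f.write(struct.pack(">I", len(record)) + record)
        self.f.flush()
        os.fsync(self.f.fileno())  # durable before we acknowledge
        self.records.append(record)
        return len(self.records) - 1  # sequence number of this record

    def read(self, start_seq: int = 0):
        # Replay from any position; multiple readers can each hold
        # their own start_seq and consume independently.
        yield from enumerate(self.records[start_seq:], start=start_seq)

# Usage: the returned sequence number is the record's replay position.
log = DurableLog(os.path.join(tempfile.mkdtemp(), "stream.log"))
assert log.append(b"hello") == 0
assert log.append(b"world") == 1
assert list(log.read(1)) == [(1, b"world")]
```

          This is where it differs from a bare TCP socket: a socket delivers bytes once to one connected peer, while a log assigns every record a position that any number of readers can return to later.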

        • shikhar 2 hours ago

          Shoutout to CodesInChaos for suggesting that instead of a mere emulator, we should have an actually durable open source implementation – that is what we ended up building with s2-lite! https://news.ycombinator.com/item?id=42487592

          And it has the durability of object storage rather than just local. SlateDB actually also lets you use the local FS; we will experiment with plumbing through the full range of options - right now it's just in-memory or an S3-compatible bucket.

          > So I'd try to share as much of the frontend code (e.g. the GRPC and REST handlers) as possible between these.

          Right on, this is indeed the case. The OpenAPI spec is also now generated off the REST handlers from s2-lite. We are getting rid of gRPC; s2-lite only supports the REST API (+ a gRPC-like session protocol over HTTP/2: https://s2.dev/docs/api/records/overview#s2s-spec)

          • michaelmior 18 minutes ago

            > We are getting rid of gRPC

            I'm curious why and what challenges you had with gRPC. s2-lite looks cool!

            • shikhar 2 minutes ago

              We wanted S2 to be one API. We started out with gRPC, then added REST - and realized REST is what is absolutely essential and what most folks care about. gRPC did give us bi-directional streaming for append/read sessions, so we added that as an optional enhancement to the corresponding POST/GET data plane endpoints (the S2S spec I linked to above).

              The gRPC ecosystem is also not very uniform despite its popularity, comes with bloat, and is a bit of a mess in Python. I'm hoping QUIC enables some innovation here.

          • arpinum 31 minutes ago

            It would be useful to have the SlateDB WAL go to Valkey or somewhere else to reduce S3 PUT costs and latency.

          • DTE 2 hours ago

            Love this. Elegant and powerful. Stateful streams are surprisingly difficult to DIY, and as everything becomes a stream of tokens, this is a super useful tool to have in the toolbox.