After a brief skim, it looks like this implementation is highly optimized for throughput and broadcast, whereas a channel has many other use cases.
Consumers subscribing to the same event type are placed in a group. There is a single lock for the whole group. When publishing, the lock is taken once and the event is replicated to each consumer's queue. Consumers take the lock and swap their entire queue buffer, which lets them consume up to 128 events per lock/unlock.
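A minimal sketch of the group-lock-plus-swap scheme described above (all names hypothetical, not the library's actual API):

```go
package main

import (
	"fmt"
	"sync"
)

// group holds a single lock shared by all consumers of one event type.
type group[T any] struct {
	mu        sync.Mutex
	consumers []*consumer[T]
}

type consumer[T any] struct {
	queue []T // written only under the group lock
}

// Publish takes the group lock once and replicates the event
// into every consumer's queue.
func (g *group[T]) Publish(ev T) {
	g.mu.Lock()
	for _, c := range g.consumers {
		c.queue = append(c.queue, ev)
	}
	g.mu.Unlock()
}

// Drain swaps out the consumer's entire queue under the lock,
// so many events are consumed per lock/unlock cycle.
func (g *group[T]) Drain(c *consumer[T]) []T {
	g.mu.Lock()
	batch := c.queue
	c.queue = nil // real implementations recycle a buffer here
	g.mu.Unlock()
	return batch
}

func main() {
	g := &group[int]{}
	c := &consumer[int]{}
	g.consumers = append(g.consumers, c)
	for i := 0; i < 3; i++ {
		g.Publish(i)
	}
	fmt.Println(g.Drain(c)) // [0 1 2]
}
```

The point is that both sides pay one lock acquisition per batch, not per event.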
Since channels each have a lock and only take 1 element at a time, they would require a lot more locking and unlocking.
There is also some frequent polling to maintain group metadata, so this could be less ideal in low-volume workloads where you want CPU usage to go to 0%.
Seems that the trick would be detecting if a queue is building up and dispatching multiple events per lock if so. Double buffering is a common enough solution here: the writer gets one buffer to fill and the reader gets another to drain, and when the read buffer is drained the two are swapped.
For low traffic messages you need only send one message at a time, but if the receiver slows down the sender can avoid resorting to back pressure until the buffer is more than half full.
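A sketch of that double-buffering idea, with the reader recycling its drained buffer back to the writer (illustrative only; type and method names are made up):

```go
package main

import (
	"fmt"
	"sync"
)

// doubleBuffer lets the writer append to one buffer while the
// reader drains another; the two are swapped when the reader
// runs dry.
type doubleBuffer[T any] struct {
	mu    sync.Mutex
	write []T // producer appends here
}

func (d *doubleBuffer[T]) Put(v T) {
	d.mu.Lock()
	d.write = append(d.write, v)
	d.mu.Unlock()
}

// Swap hands the reader the filled buffer and installs the
// reader's drained one, recycled to avoid reallocation.
func (d *doubleBuffer[T]) Swap(drained []T) []T {
	d.mu.Lock()
	full := d.write
	d.write = drained[:0]
	d.mu.Unlock()
	return full
}

func main() {
	var d doubleBuffer[string]
	d.Put("a")
	d.Put("b")
	var readBuf []string
	readBuf = d.Swap(readBuf) // reader now owns ["a" "b"]
	fmt.Println(readBuf)
}
```

Under low traffic each Swap returns at most one event; under load it returns whatever backlog accumulated, which is where the amortization comes from.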
> Generic: Works with any type implementing the Event interface.
Isn’t this the opposite? "Generic" usually implies that any type would do, not just types implementing a specific interface.
OP: the readme could really benefit from a section describing the underlying methodology, and comparing it to other approaches (Go channels, LMAX, etc...)
It’s a fairly standard broadcaster based on sync.Cond.
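For reference, a fairly standard sync.Cond broadcaster looks something like this (a minimal sketch, not the library's code):

```go
package main

import (
	"fmt"
	"sync"
)

// bus is a minimal sync.Cond-based broadcaster: subscribers wait
// on the condition, and Publish wakes all of them at once.
type bus struct {
	mu     sync.Mutex
	cond   *sync.Cond
	events []int
}

func newBus() *bus {
	b := &bus{}
	b.cond = sync.NewCond(&b.mu)
	return b
}

func (b *bus) Publish(ev int) {
	b.mu.Lock()
	b.events = append(b.events, ev)
	b.mu.Unlock()
	b.cond.Broadcast() // wakes every waiting subscriber in one call
}

// WaitAfter blocks until more than n events exist, then returns
// everything past that cursor -- so a subscriber can pick up a
// whole batch per wakeup.
func (b *bus) WaitAfter(n int) []int {
	b.mu.Lock()
	for len(b.events) <= n {
		b.cond.Wait()
	}
	out := b.events[n:]
	b.mu.Unlock()
	return out
}

func main() {
	b := newBus()
	done := make(chan []int)
	go func() { done <- b.WaitAfter(0) }()
	b.Publish(42)
	fmt.Println(<-done) // [42]
}
```

One Broadcast call notifies every subscriber, whereas a channel-based fan-out needs one send per subscriber.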
Why are channels so much slower? I would expect a channel to operate basically like a ring buffer + a semaphore.
Channels allow for many-to-many communication.
To a first approximation, you can assume any decently maintained concurrency primitive is already extremely highly optimized, which means, on the flip side, that no additional capability, like many-to-many thread communication, ever comes for free versus something that doesn't offer that capability. The key to high-performance concurrency is to use as little "concurrency power" as possible.
That's not a Go-specific thing, it's a general rule.
Channels are in some sense like dynamic scripting languages: they prioritize ease of use and flexibility over performance at all costs. They're a very powerful primitive, and convenient in their flexibility, but also a pretty big stick to hit a problem with. As with dynamic scripting languages being suitable for many tasks despite not being the fastest things, in a lot of code they're not the performance problem; but if you are doing a ton of channel operations, and for some reason you can't do the easy thing of just sending more work at a time through them, you may need to figure out how to use simpler pieces to do what you want. A common example: if you've just got a counter of some kind, don't send a message through a channel to another goroutine to increment it; use the atomic increment operations in the sync/atomic package.
(If you need absolute performance, you probably don't want to use Go. The runtime locks you away from the very lowest level things like memory barriers; it uses them to implement its relatively simple memory model but you can't use them directly yourself. However, it is important to be sure that you do need such things before reaching for them.)
What's a concrete example of "concurrency power" here?
I never benchmarked this, so just guessing from principles, take this with a grain of salt. Channel isn't a broadcast mechanism (except when you call close on the channel), so a naive channel-based broadcaster implementation like the one you find in bench/main.go here uses one channel for each subscriber; every event has to be sent on every subscriber channel. Condition variable on the other hand is a native broadcast mechanism. I imagine it's possible to leverage channel close as a broadcast mechanism to achieve similar performance.
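One way to use channel close as a broadcast, as speculated above, is a linked list of "generations" where each publish closes the current generation's channel (a sketch under that assumption, with made-up names):

```go
package main

import (
	"fmt"
	"sync"
)

// event is one generation of the broadcast: val is published,
// and done is closed to wake everyone waiting on this generation.
type event struct {
	val  int
	done chan struct{}
	next *event
}

type broadcaster struct {
	mu   sync.Mutex
	head *event
}

func newBroadcaster() *broadcaster {
	return &broadcaster{head: &event{done: make(chan struct{})}}
}

// Publish fills in the current generation, links the next one,
// and closes the channel -- close() is the broadcast, waking
// every waiter at once.
func (b *broadcaster) Publish(v int) {
	b.mu.Lock()
	cur := b.head
	cur.val = v
	cur.next = &event{done: make(chan struct{})}
	b.head = cur.next
	b.mu.Unlock()
	close(cur.done)
}

func main() {
	b := newBroadcaster()
	b.mu.Lock()
	gen := b.head // subscriber grabs the current generation
	b.mu.Unlock()

	go b.Publish(7)
	<-gen.done          // unblocked by close()
	fmt.Println(gen.val) // safe: close() happens after val is set
}
```

Subscribers then follow `gen.next` to wait for the following event, so no per-subscriber channel is needed.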
Edit: https://news.ycombinator.com/item?id=44416345 seems to have done a much more detailed analysis of the code. There's likely more to this.
You can indeed use channels to implement sync.Cond functionality. I came across this article a while ago: https://blogtitle.github.io/go-advanced-concurrency-patterns... (scroll down to Condition)
The actual code and the actual bench are very short.
Reminds me when Zeromq (scalable networked communications) promoted in-process queues for communicating between components.
It’s always worth discussing what features were thrown out to get the performance boost, whether it’s fair for those features to impose a tax on all users who don’t or rarely use those features, and whether there’s a way to rearrange the code so that the lesser used features are a low cost abstraction, one that you mostly only pay if you use those features and are cheap if not free if you don’t.
There are a lot of spinoff libraries out there that have provoked a reaction from the core team that cuts down the cost of their implementation by 25 to 50%. And that’s a rising tide that lifts all boats.
This might be useful to some if you need a very light pub/sub inside one process.
I was building a small multiplayer game in Go. I started with a channel fan-out but (for no particular reason) wanted to see if I could do better. I put together this tiny event bus to test, and on my i7-13700K it delivers events in 10-40ns, roughly 4-10x faster than the plain channel loop, depending on the configuration.
I recently wrote about something similar: https://gethly.com/blog/lockless-golang
> about 4x to 10x faster than channels.
I'd be interested to learn why/how and what the underlying structural differences are that make this possible.
I didn't look, but I don't think of channels as a pub/sub mechanism. You can have a producer close() a channel to notify consumers of a value available somewhere else, or you can loop through a bunch of buffered channels and do nonblocking sends.
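The nonblocking-send variant mentioned above looks roughly like this (a sketch, not any particular library's API):

```go
package main

import "fmt"

// publish fans an event out to buffered subscriber channels with
// non-blocking sends: a slow subscriber whose buffer is full has
// the event dropped rather than stalling the publisher.
func publish(subs []chan int, ev int) (delivered int) {
	for _, ch := range subs {
		select {
		case ch <- ev:
			delivered++
		default:
			// subscriber's buffer is full; drop the event
		}
	}
	return delivered
}

func main() {
	fast := make(chan int, 1)
	full := make(chan int, 1)
	full <- 99 // simulate a subscriber that stopped draining

	n := publish([]chan int{fast, full}, 1)
	fmt.Println(n, <-fast) // 1 1
}
```

The trade-off is event loss for slow subscribers, which is why purpose-built buses batch or buffer instead.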
A different design, without channels, could improve on those.
I prefer to think of channels as a memory-sharing mechanism.
In most cases where you want to send data between concurrent goroutines, channels are a better primitive, as they allow the sender and receiver to safely and concurrently process data without needing explicit locks. (Internally, channels are protected with mutexes, but that's a single, battle-tested and likely bug-free implementation shared by all users of channels.)
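In that memory-sharing view, sending a value over a channel transfers logical ownership to the receiver, so neither side needs a lock around the data itself (a toy illustration):

```go
package main

import "fmt"

func main() {
	work := make(chan []int)
	done := make(chan int)

	go func() {
		buf := <-work // receiver now owns buf exclusively
		sum := 0
		for _, v := range buf {
			sum += v
		}
		done <- sum
	}()

	work <- []int{1, 2, 3} // sender stops touching the slice here
	fmt.Println(<-done)    // 6
}
```

This is the "share memory by communicating" handoff: the data itself is never accessed concurrently, so it needs no mutex.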
The fact that channels also block on send/receive and support buffering means that there's a lot more to them, but that's how you should think of them. The fact that channels look like a queue if you squint is a red herring that has caused many a junior developer to abuse them for that purpose, but they are a surprisingly poor fit for it. Even backpressure tends to be something you want to control manually (using intermediate buffers and so on), because channels can be fiendishly hard to debug once you chain more than a couple of them. Something forgets to close a channel, and your whole pipeline can stall. Channels are also slow, requiring mutex locking even in scenarios where the data doesn't need locking and could just be passed directly between functions.
Lots of libraries (such as Rill and go-stream) have sprung up that wrap channels to model data pipelines (especially with generics it's become easier to build generic operators like deduping, fan-out, buffering and so on), but I've found them to be a bad idea. Channels should remain a low-level primitive to build pipelines, but they're not what you should use as your main API surface.
> Channels should remain a low-level primitive to build pipelines, but they're not what you should use as your main API surface.
I remember hearing (not sure where) that this is a lesson that was learned early on in Go. Channels were the new hotness, so let's use them to do things that were not possible before. But it turned out that Go was better for doing what was already possible before, but more cleanly.
Interesting, I need to dig into the guts of this because this seems cool.
I'm a bit out of practice with Go but I never thought that the channels were "slow", so getting 4-10x the speed is pretty impressive. I wonder if it shares any design with LMAX Disruptor...
> I wonder if it shares any design with LMAX Disruptor...
I've recently switched from using Disruptor.NET to Channel<T> in many of my .NET implementations that require inter-thread sync primitives. Disruptor can be faster, but I really like the semantics of the built-in types.
https://learn.microsoft.com/en-us/dotnet/core/extensions/cha...
https://learn.microsoft.com/en-us/dotnet/api/system.threadin...
I've never used Disruptor.NET, only the Java version.
I personally will use traditional Java BlockingQueue for about 95% of stuff, since they're built in and more than fast enough for nearly everything, but Disruptor kicks its ass when dealing with high-throughput stuff.
The .NET version is compelling because it has a special variant called ValueDisruptor that can take a struct instead of a class. This gives it a big edge in certain use cases:
https://medium.com/@ocoanet/improving-net-disruptor-performa...
This is pretty neat, code looks minimal as well. At pico.sh we wrote our own pubsub impl in Go that leveraged channels. We primarily built it to use with https://pipe.pico.sh
https://github.com/picosh/pubsub
With this impl can you stream data or is it just for individual events?
> High Performance: Processes millions of events per second, about 4x to 10x faster than channels.
Wow - that’s a pretty impressive accomplishment. I’ve been meaning to move some workers I have to a pub/sub on https://www.typequicker.com.
I might try using this in prod. I don’t really need the insane performance benefits as I don’t have my traffic lol - but I always like experimenting with new open source libraries - especially while the site isn’t very large yet
That's a cool site, you'll see me around more often. btw do some tech twitter promos.
Thank you - appreciate it! :)
> btw do some tech twitter promos.
Yes - I have plans to. Still relatively new to the marketing/SEO world. I recently quit my job to build products (TypeQuicker is the first in line), and up until now I've only ever done software dev work.
How would you suggest to do twitter promos - just post consistently about the app features and such?
"Processes millions of events per second" - yes, sure, when there is nothing to process. But that is not representative of a real app.
Add a database call or some simple data processing and then show some numbers comparing between channels or throughput.
I hate these kind of claims. Similar with web frameworks that shows reqs/s for an empty method.
How else do you compare web frameworks except by comparing their overhead?
Not everyone wants to write a database application. There are absolutely other types of applications in the world. Applications can be CPU and/or memory bound.
I see what you’re getting at, but if you add a database call the I/O blocking time will completely eclipse CPU time. It would not be a useful comparison, similar to if you added a time.Sleep to each event handler.
The reviews by some other people here lead me to believe that it works fairly well both when there is something to process and when it's just chattiness.
If you mean Amdahl's Law (and maybe Little's), messaging is generally considered part of the unavoidable serial cost. However, TCP and this library both seem to be aware that while you cannot eliminate messaging overhead, you can amortize it over dozens and dozens of messages at a time, and that reduces that bit of serialized cost by more than an order of magnitude.