Interesting read. I was comparing some of these tools earlier for small web shop use, though I haven't set any of them up for real yet. I demoed Elastic, SigNoz and Grafana Loki, of which Alloy+Loki seemed to make the most sense for my needs and didn't cause too much headache to set up on a tiny VM, so that I'd at least have collection going and a decent way to grep through the logs.
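For the "grep through it" part, Loki ends up being a plain HTTP API once Alloy ships the logs in. A rough sketch of what that looks like from Python (the label name and the localhost URL are just assumptions for a small setup like mine):

    # Query Loki's query_range endpoint for recent error lines.
    # Assumes Loki on localhost:3100 and a {job="webshop"} label set by Alloy.
    import time
    import requests

    now_ns = int(time.time() * 1e9)
    resp = requests.get(
        "http://localhost:3100/loki/api/v1/query_range",
        params={
            "query": '{job="webshop"} |= "error"',  # LogQL: label match + line filter
            "start": now_ns - 3600 * 10**9,         # last hour, in nanoseconds
            "end": now_ns,
            "limit": 100,
        },
    )
    for stream in resp.json()["data"]["result"]:
        for ts, line in stream["values"]:
            print(ts, line)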
Currently I'm collecting just exception data from services into GlitchTip (a Sentry fork); that seems the most valuable sysadmin-wise, while most security and similar concerns are outsourced to managed hosting companies.
I was left curious what it would take to DIY the anomaly detection methods Elastic has built in <https://www.elastic.co/guide/en/machine-learning/current/ml-...> using data frame / statistics / ML libraries (Clojure Noj).
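I suspect a lot of it could start as plain statistics on counts. A rough Python sketch (synthetic data, and obviously not a reimplementation of Elastic's ML jobs; a Noj version would follow the same shape) that flags spikes in per-minute error counts with a rolling z-score:

    # Flag anomalies in per-minute error counts using a rolling z-score.
    import numpy as np
    import pandas as pd

    # Synthetic per-minute error counts with one injected spike.
    rng = np.random.default_rng(0)
    counts = pd.Series(
        rng.poisson(5, 1440),
        index=pd.date_range("2024-01-01", periods=1440, freq="min"),
    )
    counts.iloc[900] = 60  # the "anomaly"

    # Compare each point against the rolling mean/std of the previous hour.
    rolling = counts.rolling(window=60, min_periods=30)
    zscore = (counts - rolling.mean()) / rolling.std()
    print(counts[zscore > 4])  # prints the injected spike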
I see a lot of hype around ClickHouse these days. A few years ago I remember TimescaleDB making the rounds, arguably a predecessor to this sort of "observability on SQL" thinking. The article has a short paragraph mentioning Timescale, but unfortunately it doesn't really go into comparing it to ClickHouse. How does HN see the situation these days: is ClickHouse simply overtaking Timescale on all axes? That would be a bit of a shame; I have used Timescale a bit and enjoyed it, but only at such a small scale that its operational aspects never really came up.
ClickHouse outperforms TimescaleDB in every aspect on large volumes of data. https://benchmark.clickhouse.com/
If you have small volumes of data (let's say less than a terabyte), then TimescaleDB is OK to use, as long as you are OK with not-so-fast query performance.
ClickHouse has been popular for many years, even before Timescale.
Timescale "doesn't scale" - in a nutshell.
ClickHouse performance is better because it's truly column-oriented and it has powerful partitioning tools.
However, ClickHouse has quirks and isn't great if you need low-latency data updates or if your data is mutable.
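To make both points concrete, here's roughly what a partitioned MergeTree table looks like (table, column names and host are made up, shown via the clickhouse-connect client), and why mutable data is awkward:

    # Sketch of a partitioned MergeTree table in ClickHouse.
    import clickhouse_connect

    client = clickhouse_connect.get_client(host="localhost")

    client.command("""
        CREATE TABLE IF NOT EXISTS logs (
            timestamp DateTime64(3),
            service   LowCardinality(String),
            level     LowCardinality(String),
            message   String
        )
        ENGINE = MergeTree
        PARTITION BY toYYYYMM(timestamp)   -- whole partitions can be dropped cheaply
        ORDER BY (service, timestamp)      -- the sort key drives fast columnar scans
    """)

    # "Updates" are asynchronous mutations that rewrite data parts in the
    # background, which is why mutable data doesn't fit ClickHouse well:
    client.command("ALTER TABLE logs UPDATE level = 'WARN' WHERE level = 'WARNING'")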
I echo others' sentiment: ClickHouse is much more performant than Timescale.
Interestingly, I recently interviewed Samuel Colvin, Pydantic's author, and he said that when designing his observability SaaS called LogFire, he tried multiple backends, including ClickHouse.
But it didn't work out.
One of the reasons is that LogFire lets users fetch their service data with arbitrary SQL queries.
So they had to build their own backend in Rust, on top of DataFusion.
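To give a rough idea of what "building on DataFusion" means in practice, the DataFusion Python bindings let you point arbitrary SQL at Parquet files you control. This is just a generic sketch with made-up data, not anything from LogFire's actual backend:

    # Sketch: run arbitrary SQL over your own Parquet files with DataFusion.
    import pyarrow as pa
    import pyarrow.parquet as pq
    from datafusion import SessionContext

    # Write a tiny fake spans file.
    pq.write_table(
        pa.table({"service": ["api", "api", "worker"], "duration_ms": [12, 480, 33]}),
        "spans.parquet",
    )

    # Register it as a table and query it with SQL.
    ctx = SessionContext()
    ctx.register_parquet("spans", "spans.parquet")
    ctx.sql(
        "SELECT service, approx_percentile_cont(duration_ms, 0.95) AS p95 "
        "FROM spans GROUP BY service"
    ).show()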
I've used ClickHouse myself and it's been nice, but it's easy when you get to decide what schema you need yourself. For small to medium needs, this plus Grafana works well.
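With a schema you control, the Grafana side mostly ends up being plain aggregations over your own columns. Something like this hypothetical per-minute error count (table and column names are made up) is the kind of query that sits behind a panel:

    # The kind of aggregation a Grafana panel runs against a self-designed schema.
    import clickhouse_connect

    client = clickhouse_connect.get_client(host="localhost")
    rows = client.query(
        """
        SELECT toStartOfMinute(timestamp) AS minute, count() AS errors
        FROM logs
        WHERE level = 'ERROR' AND timestamp > now() - INTERVAL 1 HOUR
        GROUP BY minute
        ORDER BY minute
        """
    ).result_rows
    for minute, errors in rows:
        print(minute, errors)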
But I must admit that the plug-and-play aspect of great services like Sentry or LogFire makes it so easy to set up that it's tempting to skip self-hosting entirely. They are not that expensive (unlike Datadog), and maintaining your own observability stack is not free.
What kinds of SQL queries could ClickHouse not handle? Were the limitations about expressivity of queries, performance, or something else? I'm considering using CH for storing observability (particularly tracing) data, so I'm curious about any footguns or other reasons it wouldn't be a good fit.
There is at least one basic factual error in this blog post, which makes me discount the whole thing.
"But if you will use it, keep in mind that [InfluxDB] uses Bolt as its data backend."
Simply not true. The author seems to have confused the storage that Raft consensus uses for metadata with the storage used for the time series data. InfluxDB has its own custom data storage layer for time series data, and has had one for many years. A simple glance at the InfluxDB docs would make this clear.
(I was once part of the core database team at InfluxDB and have edited my comment for clarity.)
Such a PITA. Unless you have a dedicated team to handle observability, you are in for pain, no matter the tech stack you use.
That's not true. There are solutions for logging which are very easy to set up and operate. For example, VictoriaLogs [1] (I'm its author). It is designed from the ground up to be easy to configure and use: a single self-contained executable without external dependencies, which runs well on anything from a Raspberry Pi to a monster machine with hundreds of CPU cores and terabytes of RAM. It accepts logs over all the popular data ingestion protocols [2], and it provides a very easy-to-use query language for typical querying tasks over logs, LogsQL [3]. There's a rough sketch of how this looks over HTTP below the links.
[1] https://docs.victoriametrics.com/victorialogs/
[2] https://docs.victoriametrics.com/victorialogs/data-ingestion...
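The sketch (endpoint paths and parameters simplified from memory; see [2] and the LogsQL docs [3] for the exact API):

    # Push a log line into VictoriaLogs and query it back over HTTP.
    import json
    import requests

    base = "http://localhost:9428"  # default VictoriaLogs port

    # Ingest one JSON log line (newline-delimited JSON stream).
    requests.post(
        base + "/insert/jsonline",
        params={"_stream_fields": "service"},
        data=json.dumps({"_msg": "payment failed", "service": "shop", "level": "error"}) + "\n",
    )

    # Query it back with LogsQL: lines containing "error" from the last 5 minutes.
    resp = requests.post(base + "/select/logsql/query", data={"query": "error _time:5m"})
    print(resp.text)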
Interesting, although the docs are not really user-friendly and don't show many screenshots of the UI to give a sense of what the product can do.
ClickHouse + Grafana is definitely a fantastic choice. Here is another blog post from ClickHouse about dogfooding their own technology and saving millions:
https://clickhouse.com/blog/building-a-logging-platform-with...
(Full disclosure: I work for ClickHouse and love it here!)
Another project I want to give a shout-out to is Databend. It's built around the idea of storing your data in S3-compatible storage as Parquet files and querying it with SQL or other protocols.
Like many popular data lake solutions, but it's open source and written in Rust, which makes it fairly easy to extend for anyone who already knows the language.
Interesting! Makes me wonder how it pans out compared to DataFusion, which seems to have a lot of traction.
Databend performance looks good! https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQi...
It looks like it has slightly worse on-disk data compression than ClickHouse, and slightly worse performance for some query types when the queried data isn't cached by the operating system page cache (e.g. when you query terabytes of data, which doesn't fit in RAM), according to the link above.
Are there additional features beyond S3 storage that could convince a ClickHouse user to switch to Databend?
Completely rewriting a system because you don't like JSON is a bit extreme