With this tool I am wary that I'll encounter system issues that are dramatically more difficult to diagnose and troubleshoot because I'll have drifted from a standard distro configuration. And in ways I'm unaware of. Is this a reasonable hesitation?
Disclaimer: I work for Oracle, which publishes this tool, though I have nothing to do with the org or the engineers who created it.
I've been running this for a while on my laptop. So far I've yet to see any particular weirdness, but I also can't state with any confidence that it has a positive impact either; I've not carried out any benchmarks in either direction.
It logs all the changes it's going to make, including what the values were before. Here's an example from my logs:
bpftune[1852994]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
bpftune[1852994]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (4096 131072 7864320) -> (4096 131072 9830400)
>"bpftune logs to syslog so /var/log/messages will contain details of any tuning carried out." (from OP GitHub readme)
The rmem example seems to allay fears that it will make changes one can't reverse.
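And since the old values are right there in the log line, reverting should be a single sysctl call, e.g. using the rmem example above:

    # restore the values bpftune logged as the previous setting (runtime only)
    sysctl -w net.ipv4.tcp_rmem="4096 131072 7864320"

though presumably bpftune may just tune it again while it's still running.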
It’s not a question of being able to reverse. It’s a question of being able to diagnose that one of these changes was even the problem and, if so, which one.
If they can be reversed individually, you can simply narrow it down by rolling back the changes one by one, no?
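Given the log format shown upthread, enumerating everything it changed should just be a grep away, something like:

    # list every change bpftune applied, with old and new values
    grep bpftune /var/log/messages | grep 'change '

(assuming messages shaped like the rmem example above; use journalctl if your distro only keeps the systemd journal).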
If your staging doesn’t do capacity checks in excess of what production sees, yes.
Yes, it is. IMO, except for learning (which should not be done in prod), you shouldn’t make changes that you don’t understand.
The tool seems to mostly tweak various networking settings. You could set up a test instance with monitoring, throw load at it, and change the parameters the tool modifies (one at a time!) to see how it reacts.
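A minimal sketch of that loop, assuming iperf3 as the load generator and tcp_rmem as the knob under test (server name and values are illustrative):

    # try a few receive-buffer maxima one at a time, measuring throughput each time
    for max in 6291456 9830400 16777216; do
        sysctl -w net.ipv4.tcp_rmem="4096 131072 $max"
        iperf3 -c test-server -t 30    # test-server = your monitored test instance
    done

Worth watching latency and memory alongside throughput, since bigger buffers aren't free.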
I'd run such a tool on prod in "advice mode". It should suggest the tweaks, explaining the reasoning behind them, and listing the actions necessary to implement them.
Then humans would decide if they want to implement that as is, partly, modified, or not at all.
Fair point, though I didn’t see any such option with this tool.
It's developed in the open; we can create a GitHub issue.
Fascinating!
I'd like to hear from people who are running this. Is it effective? Worth the setup time?
> bpftune is designed to be zero configuration; there are no options
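In practice, assuming your package ships the systemd unit, that should mean setup is just:

    systemctl enable --now bpftune

and then watching syslog for what it decides to do.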
On behalf of every junior administrator, overworked IT admin, and security-concerned "cattle" wrangler, thank you.
Having to learn a thousand+ knobs & dials means most will never be touched. I for one welcome automated assistance in this area, even if the results are imperfect.
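For a rough sense of scale (the count varies by kernel and config):

    # count the sysctl knobs exposed on this machine; typically well over a thousand
    sysctl -a 2>/dev/null | wc -l

and that's before the cgroup, block-layer, and scheduler knobs that live outside sysctl.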
I think it’s still important to know what those dials and knobs do, otherwise (as the currently top-voted comment says) when things break, you’ll be lost.
Interesting. But if tuning parameters to their best values were easy, shouldn't the kernel just do that in the first place?
I would reverse the question: if it can be done by a BPF module, why should it be in the kernel?
Distributions turning it on by default is another story. Maybe it deserves to be shipped and enabled all the time, but that's not the same thing as being part of the kernel.
Indeed!
The kernel might already be too monolithic.
This kernel parameter optimisation reminds me of PGO (profile-guided optimisation) in compiled programs.
Yet perhaps the kernel could come with multiple default config files, each being a good base for a different workload: server, embedded, laptop, mobile, database, router, etc.
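Something like drop-in profiles, say a hypothetical /etc/sysctl.d/90-profile-server.conf (names and values purely illustrative):

    # hypothetical "server" profile: bigger accept queue and buffers, less swapping
    net.core.somaxconn = 4096
    net.ipv4.tcp_rmem = 4096 131072 16777216
    vm.swappiness = 10

with -laptop, -router, etc. variants the installer could pick between.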
I’d rather the kernel present a good-enough but extremely stable set of configs. If I’m using a distro like Arch or Gentoo, then sure, maybe run wild (though both of those would probably assume I’m tuning them anyway), but CentOS, Debian, et al.? Stable and boring. If you change something, you’d better know what it is, and why you’re doing it.
This doesn't necessarily find the best parameters, and it doesn't necessarily do it easily. From my reading, it will converge on a local optimum, and it may take some time to do that.
In theory, I don't see why the kernel couldn't have a parameter-auto-tune similar to this. In practice, I think the kernel has to work in so many different domains, it'd be impossible to land on a "globally good enough" set of tuning heuristics.
I'm far from a kernel developer, so I'm ready to be corrected here.
IMO if we ever see something like this deployed widely, it will be because a popular distribution decided to install it by default.
It depends on the workload. This tool arrives at a recommended config for that specific machine's workload: app nodes can get completely different recommendations than database nodes, and a workstation will be different again.
Sure, but the kernel could just do the same. Of course, the kernel is already too big. Is BPF the right level to make it more modular? Just thinking out loud; I don't think I have the answer.
Two words: “feedback loop”.
That was the first thing that came to mind when thinking about what could go wrong; not because of the Linux kernel, BPF, or this program in particular, just because of how it is intended to work. There might be no risk of it happening, there may be controls around it, or any loops that do happen might only converge to a stable state, but it is still something to keep on the map.
> or any loops that do happen might only converge to a stable state
That will always depend on the usage patterns, so the auto-tuner can't guarantee it.
Also, I imagine the risk of the feedback turning positive is related to the system load (not CPU, but the usage of the resources you are optimizing). If so, it will make your computer less able to manage load. But this can still be useful for optimizing for latency.
I wonder how effective this would be in multi-tenant environments like shared k8s clusters. On the one hand, each application running will have a different purpose and will move around between nodes over time, but on the other hand there are likely broad similarities between most applications.
BTW one can use it out of the box with CachyOS.
After installation -> CachyOS Hello -> Apps/Tweaks
Is tuning the TCP buffer size, for instance, worth it?
It depends. At home, probably not. On a fleet of 2000 machines where you want to keep network utilisation close to 100% with maximal throughput, and non-optimal settings translate into a non-trivial amount of $: yes.
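For intuition: a single connection needs roughly the bandwidth-delay product worth of buffer, buffer ≈ bandwidth × RTT. On a 10 Gbit/s path with 50 ms RTT that's 10^10 bit/s × 0.05 s = 5 × 10^8 bits ≈ 62 MB, while the tcp_rmem max in the log upthread is under 10 MB, so a single flow can't fill such a link without tuning.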
TCP parameters are a classic example of where an autotuner might bite you in the ass...
Imagine your tuner keeps making the congestion control more aggressive, filling network links up to 99.99% to get more data through...
But then any other users of the network see super high latency and packet loss and fail because the tuner isn't aware of anything it isn't specifically measuring - and it's just been told to make this one application run as fast as possible.
It's great how BPF grew out of simple packet filtering into tracing and monitoring. It's one of those great tools most people should know. Been using it for years.