Comments Page - Bpftune uses BPF to auto-tune Linux systems

Bpftune uses BPF to auto-tune Linux systems (github.com)
Submitted by BSDobelix 5 hours ago

gausswho 2 hours ago
With this tool I am wary that I'll encounter system issues that are dramatically more difficult to diagnose and troubleshoot because I'll have drifted from a standard distro configuration. And in ways I'm unaware of. Is this a reasonable hesitation?
- Twirrim 16 minutes ago
  Disclaimer: I work for Oracle, who publish this tool, though I have nothing to do with the org or engineers that created it
  I've been running this for a while on my laptop. So far I've yet to see any particular weirdness, but I also can't state with any confidence that it has a positive impact. I've not carried out any benchmarks in either direction.
  It logs all changes that it's going to make, including what the values were before. Here's an example from my logs:
  bpftune[1852994]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
  bpftune[1852994]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (4096 131072 7864320) -> (4096 131072 9830400)
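A rough sketch of how log lines like the ones quoted above could be turned into structured records, so that every change and its previous value is easy to list later. The regular expression is modelled only on the single sample line in this comment; other bpftune messages may be worded differently, so the pattern and the helper name `parse_change` are assumptions, not part of the tool.

```python
import re

# Assumption: pattern derived from the one sample "change ... from (...) -> (...)"
# line quoted in this thread; it is not taken from bpftune's source.
CHANGE_RE = re.compile(
    r"change (?P<tunable>[\w.]+)\([^)]*\) "
    r"from \((?P<old>[^)]*)\) -> \((?P<new>[^)]*)\)"
)


def parse_change(line):
    """Return (tunable, old_values, new_values), or None if the line does not match."""
    match = CHANGE_RE.search(line)
    if match is None:
        return None
    old = [int(v) for v in match.group("old").split()]
    new = [int(v) for v in match.group("new").split()]
    return match.group("tunable"), old, new


# The second log line from the comment above, split only for readability.
sample = (
    "bpftune[1852994]: Due to need to increase max buffer size to maximize "
    "throughput change net.ipv4.tcp_rmem(min default max) from "
    "(4096 131072 7864320) -> (4096 131072 9830400)"
)

print(parse_change(sample))
```

Feeding syslog lines through a parser like this would give a chronological list of (tunable, old, new) entries, which is the raw material for the "which change was it?" question raised further down the thread.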
- pbhjpbhj 2 hours ago
  >"bpftune logs to syslog so /var/log/messages will contain details of any tuning carried out." (from OP GitHub readme)
  The rmem example seems to allay fears that it will make changes one can't reverse.
  admax88qqq 2 hours ago
  It’s not a question of being able to reverse. It’s a question of being able to diagnose that one of these changes even was the problem and, if so, which one.
  nehal3m 13 minutes ago
  If they can be reversed individually you can simply deduce by rolling back changes one by one, no?
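The one-by-one rollback idea could be sketched roughly as follows: given a recorded list of changes, build one `sysctl -w` command per change, newest first, so each can be applied and tested in turn. The helper name `revert_commands` is hypothetical, the first record uses the values from the log quoted earlier in the thread, and the second record is invented purely for illustration. The code only builds command strings and does not run them; it also assumes nothing re-applies the tuning while you test.

```python
def revert_commands(changes):
    """Build one `sysctl -w` command per recorded change, newest change first."""
    commands = []
    for tunable, old_values, _new_values in reversed(changes):
        value = " ".join(str(v) for v in old_values)
        commands.append(f'sysctl -w {tunable}="{value}"')
    return commands


# Changes in the order they were applied: (tunable, old values, new values).
changes = [
    # Values taken from the log excerpt quoted in this thread.
    ("net.ipv4.tcp_rmem", [4096, 131072, 7864320], [4096, 131072, 9830400]),
    # Hypothetical entry, made up for illustration only.
    ("net.ipv4.tcp_wmem", [4096, 16384, 4194304], [4096, 16384, 5242880]),
]

for command in revert_commands(changes):
    print(command)
```

Rolling back in reverse order and re-testing after each step narrows the problem to a single change, at the cost of one test cycle per change.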
- trelliscoded an hour ago
  If your staging doesn’t do capacity checks in excess of what production sees, yes.
- sgarland 2 hours ago
  Yes, it is. IMO, except for learning (which should not be done in prod), you shouldn’t make changes that you don’t understand.
  The tool seems to mostly tweak various networking settings. You could set up a test instance with monitoring, throw load at it, and change the parameters the tool modifies (one at a time!) to see how it reacts.
  nine_k an hour ago
  I'd run such a tool on prod in "advice mode". It should suggest the tweaks, explaining the reasoning behind them, and listing the actions necessary to implement them.
  Then humans would decide if they want to implement that as is, partly, modified, or not at all.
  sgarland an hour ago
  Fair point, though I didn’t see any such option with this tool.
  nine_k an hour ago
  It's developed in the open; we can create a GitHub issue.
  Actually https://github.com/oracle/bpftune/issues/99
bloopernova 5 hours ago
Fascinating!
I'd like to hear from people who are running this. Is it effective? Worth the setup time?
mrbluecoat 2 hours ago
> bpftune is designed to be zero configuration; there are no options
On behalf of every junior administrator, overworked IT admin, and security-concerned "cattle" wrangler, thank you.
Having to learn a thousand+ knobs & dials means most will never be touched. I for one welcome automated assistance in this area, even if the results are imperfect.
- sgarland an hour ago
  I think it’s still important to know what those dials and knobs do, otherwise (as the currently top-voted comment says) when things break, you’ll be lost.
usr1106 4 hours ago
Interesting. But if tuning parameters to their best values were easy, shouldn't the kernel just do that in the first place?
- RandomThoughts3 4 hours ago
  I would reverse the question: if it can be done by a BPF module, why should it be in the kernel?
  Distributions turning it on by default is another story. Maybe it deserves to be shipped and enabled all the time, but that's not the same thing as being part of the kernel.
  jiehong 4 hours ago
  Indeed!
  The kernel might already be too monolithic.
  This kernel parameter optimisation reminds me of PGO compilation in programs.
  Yet, perhaps the kernel could come with multiple default config files, each being a good base for a different workload: server, embedded, laptop, mobile, database, router, etc.
- sgarland an hour ago
  I’d rather the kernel present a good-enough but extremely stable set of configs. If I’m using a distro like Arch or Gentoo, then sure, maybe run wild (though both of those would probably assume I’m tuning them anyway), but CentOS, Debian, et al.? Stable and boring. If you change something, you’d better know what it is, and why you’re doing it.
- onetoo 4 hours ago
  This doesn't necessarily find the best parameters, and it doesn't necessarily do it easily. From my reading, it will converge on a local optimum, and it may take some time to do that.
  In theory, I don't see why the kernel couldn't have a parameter-auto-tune similar to this. In practice, I think the kernel has to work in so many different domains, it'd be impossible to land on a "globally good enough" set of tuning heuristics.
  I'm far from a kernel developer, so I'm ready to be corrected here.
  IMO if we ever see something like this deployed widely, it will be because a popular distribution decided to install it by default.
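The "local optimum" point in the comment above can be shown with a toy example. This is a generic hill-climbing sketch over a made-up score function with two peaks; it is not bpftune's algorithm, and `hill_climb` and `score` are hypothetical names used only for illustration.

```python
def hill_climb(score, start, step=1, max_iters=1000):
    """Move to the better neighbour until neither neighbour improves the score."""
    x = start
    for _ in range(max_iters):
        best = max((x - step, x + step), key=score)
        if score(best) <= score(x):
            break  # no neighbour is strictly better: a (possibly local) optimum
        x = best
    return x


def score(x):
    # Toy "throughput" curve: a small peak at x=2 (height 4)
    # and a taller peak at x=10 (height 9).
    return max(4 - (x - 2) ** 2, 9 - (x - 10) ** 2)


print(hill_climb(score, start=0))  # stops at 2, the smaller peak
print(hill_climb(score, start=7))  # stops at 10, the taller peak
```

Where the search starts decides which peak it finds, which is one reason an incremental tuner may settle on settings that are good but not the best available.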
- nitinreddy88 4 hours ago
  It depends on workload. This tool generates recommended config for that specific machine workload. App Nodes can have completely different recommendations vs Database Nodes. It will be completely different for Workstation.
  usr1106 4 hours ago
  Sure, but the kernel could just do the same. Of course the kernel is already too big. Is BPF the right level to make it more modular? Just thinking, I don't think I have the answer.
gmuslera 3 hours ago
Two words: “feedback loop”.
That was the first idea that jumped out when thinking about what could go wrong, not because of the Linux kernel, or BPF, or this program, but just because of how it is intended to work. There might be no risk of that happening, there may be controls around that, or if feedback loops happen they might only converge to a stable state, but it is still something to keep on the map.
- marcosdumay 2 hours ago
  > or if they happen they might only converge to an stable state
  That one will always be dependent on the usage patterns. So the auto-tuner can't guarantee it.
  Also, I imagine the risk of the feedback turning positive is related to the system load (not CPU, but the usage of the resources you are optimizing). If so, it will make your computer less able to manage load. But this can still be useful for optimizing for latency.
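The feedback-loop concern in this subthread can be illustrated with a toy model. The update rule below is a minimal sketch and has nothing to do with bpftune's actual logic: a value is nudged toward a target by `gain` times the remaining error. A positive, modest gain is negative feedback and settles; a negative gain is positive feedback and runs away. All names and numbers are made up for illustration.

```python
def simulate(gain, target=100.0, start=50.0, steps=20):
    """Repeatedly adjust a value by gain * (target - value) and record each step."""
    value = start
    history = [value]
    for _ in range(steps):
        value += gain * (target - value)
        history.append(value)
    return history


damped = simulate(gain=0.5)    # error halves every step
runaway = simulate(gain=-0.2)  # error grows by 20% every step

print(f"damped final error:  {abs(damped[-1] - 100.0):.6f}")
print(f"runaway final error: {abs(runaway[-1] - 100.0):.1f}")
```

In a real system the effective gain is not a constant chosen by the tuner; it depends on how the workload responds to each change, which is the point made above about stability depending on usage patterns.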
nevon 3 hours ago
I wonder how effective this would be in multi-tenant environments like shared k8s clusters. On the one hand, each application running will have a different purpose and will move around between nodes over time, but on the other hand there are likely broad similarities between most applications.
BSDobelix 4 hours ago
BTW one can use it out of the box with CachyOS.
After installation -> CachyOS Hello -> Apps/Tweaks
robinhoodexe 4 hours ago
Is tuning the TCP buffer size, for instance, worth it?
- viraptor 4 hours ago
  It depends. At home, probably not. On a fleet of 2000 machines where you want to keep network utilisation close to 100% with maximal throughput, and where non-optimal settings translate to a non-trivial value in $, yes.
  londons_explore 2 hours ago
  TCP parameters are a classic example of where an autotuner might bite you in the ass...
  Imagine your tuner keeps making the congestion control more aggressive, filling network links up to 99.99% to get more data through...
  But then any other users of the network see super high latency and packet loss and fail because the tuner isn't aware of anything it isn't specifically measuring - and it's just been told to make this one application run as fast as possible.
bastloing 4 hours ago
It's great how BPF grew out of simple packet filtering into tracing and monitoring. It's one of those great tools most people should know. I've been using it for years.