• jchw 6 hours ago

    io_uring and Linux's many different types of file descriptors are great. I mean, I personally think that the explicit large API surface of WinNT is kinda nicer than jamming a bunch of weird functionality into files and file descriptors like Linux does, but when things work, they do show some nice advantages of unifying everything under some framework, ill-fitting as it may sometimes be. (Though now that I say this, it's not like WinNT Objects are really any different here, they just offer more advanced baseline functionality like ACLs.) io_uring's ability to tie together a lot of pre-existing things in new ways is pretty cool. UNIX never really had a story for async operations, something I will not fault an OS designed 50 years ago for. However, still not having a decent story for async operations today is harder to excuse. I've been excited to learn about io_uring. I've learned a lot listening to conference talk recordings about it. While it has its issues (like the many times it (semi-?)accidentally bypassed security subsystems...) it has some really cool and substantial benefits.

    I'll tell you what I would love to see next: a successor to inotify that does not involve registering one zillion watches to cover a recursive subtree. I'm sure there are valid reasons why it's not easy to just make it happen, but it feels like it would be a major improvement in a lot of use cases. And in many cases, it would probably fix the dreaded problem of users needing to fight against watch limits, especially in text editors like VSCode.
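
    To put a rough number on that cost: inotify is not recursive, so a watcher has to call inotify_add_watch once per directory in the subtree, and the per-user budget lives in the fs.inotify.max_user_watches sysctl. The sketch below only counts directories and reads that limit; it does not create any watches. The helper names (count_dirs, max_user_watches, watch_budget) are made up for illustration.

```python
import os

# Per-user inotify watch limit exposed by the Linux kernel.
MAX_WATCHES_PATH = "/proc/sys/fs/inotify/max_user_watches"


def count_dirs(root):
    """Count directories in the subtree, including root itself.

    os.walk yields one tuple per directory and does not follow
    symlinks by default, so this is the number of watches a
    recursive inotify watcher would need.
    """
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))


def max_user_watches(path=MAX_WATCHES_PATH):
    """Return the watch limit, or None if it cannot be read (e.g. not Linux)."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def watch_budget(root):
    """Return (watches_needed, limit_or_None) for a subtree."""
    return count_dirs(root), max_user_watches()
```

    Running watch_budget on a large checkout (think node_modules) makes it fairly clear why editors run into the limit.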

    I don't have anything of great substance to say about the actual subject of the article. It feels a bit late to finally get this functionality proper in Linux after NT had it basically forever, but any improvement is welcome. Next time I'm doing something where I want to wait on a bunch of FDs I will have to try this approach.
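
    For comparison, here is a minimal sketch of the classic readiness-based way to wait on a bunch of FDs, using Python's standard selectors module (epoll on Linux). This is not io_uring and not the article's approach; the pipes and the wait_for_ready and demo names are just stand-ins for illustration.

```python
import os
import selectors


def wait_for_ready(read_fds, timeout=1.0):
    """Block until at least one fd is readable; return the ready fds, sorted."""
    with selectors.DefaultSelector() as sel:
        for fd in read_fds:
            sel.register(fd, selectors.EVENT_READ)
        # select() returns a list of (SelectorKey, event_mask) pairs.
        return sorted(key.fd for key, _events in sel.select(timeout))


def demo():
    """Two pipes stand in for 'a bunch of FDs'; only the second has data."""
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    try:
        os.write(w2, b"x")
        return wait_for_ready([r1, r2]) == [r2]
    finally:
        for fd in (r1, w1, r2, w2):
            os.close(fd)
```

    This only covers things that already look like pollable file descriptors, which is the limitation the thread keeps circling back to.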

    • hansvm 5 hours ago

      > inotify

      A hack that should be performant enough if properly implemented would be a custom FUSE implementation over the directory. As a one-off it could just do the callbacks you want done, or as a reusable component it could implement the inotify behavior you want.

      • trws 6 hours ago

        An inotify replacement that can work at whole FS level (and doesn’t require root/admin like the existing option) would be amazing. To be honest, I don’t see a reason it would be hard at the whole filesystem or perhaps mount level unless there are security ramifications. Restricting it to a subdirectory might be tricky though.

        • iknowstuff 4 hours ago
          • jchw 3 hours ago

            It feels like last time I looked into this, fanotify was for some reason not suitable for most inotify use cases. Maybe this has changed. Would be great news if so.

        • hosh 4 hours ago

          A discussion thread in the Erlang community that proposes io_uring support for BEAM, covers the security issues, and digresses into a comparison with FreeBSD's kqueue:

          https://erlangforums.com/t/erlang-io-uring-support/765/18?pa...

          • ashvardanian 2 hours ago

            Surprisingly, I only came across Francesco's blog this month. I stumbled upon the 2021 post "Speeding up atan2f by 50x" while searching for others who have to reimplement trigonometry in SIMD every other year. I've also enjoyed "Beating the L1 cache with value speculation" from the same year, as well as the 2013 Agda sorting example.

            Highly recommend checking it out: https://mazzo.li/archive.html

            • KerrAvon 32 minutes ago

              Wikipedia:

              > In June 2023, Google's security team reported that 60% of the exploits submitted to their bug bounty program in 2022 were exploits of the Linux kernel's io_uring vulnerabilities. As a result, io_uring was disabled for apps in Android, and disabled entirely in ChromeOS as well as Google servers.[11] Docker also consequently disabled io_uring from their default seccomp profile.[12]

              Root privilege CVE from earlier this year (2024): https://nvd.nist.gov/vuln/detail/CVE-2024-0582
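
              If you want to check whether a given machine has io_uring switched off, newer kernels expose a kernel.io_uring_disabled sysctl (added in Linux 6.6, if I have the version right). A minimal sketch that reads it follows; the io_uring_status name and the wording of the descriptions are mine, and a missing file just means the kernel predates the knob, not that io_uring is unavailable.

```python
IO_URING_SYSCTL = "/proc/sys/kernel/io_uring_disabled"

# Paraphrased from the kernel sysctl documentation.
MEANINGS = {
    0: "enabled for all processes",
    1: "restricted to privileged processes or the io_uring group",
    2: "disabled for all processes",
}


def io_uring_status(path=IO_URING_SYSCTL):
    """Describe the io_uring_disabled setting, or return None if unreadable."""
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except (OSError, ValueError):
        # Sysctl missing (older kernel, non-Linux) or not a number.
        return None
    return MEANINGS.get(value, f"unknown value {value}")
```

              Note this says nothing about seccomp filters, which is how Docker's default profile blocks it.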

              • 4hg4ufxhy 5 hours ago

                Very interesting, but it's unfortunate that there is no example program. I guess that is left as an exercise for the reader, but it's a bit daunting for a non-systems programmer.

              • refulgentis 4 hours ago

                It took me many io_uring hello world articles to find out it's not really used in production (e.g. Android and ChromeOS both disable it) because it was, and continues to be, a source of an absolutely bonkers, outsized number of security issues.

                I don't remember much more than that*, but I'm dropping it here because I learned a ton more from reading about that than from my Nth io_uring article.

                * for example, the article mentioning relevant buffers are shared with the system made me want to say "aHA, yes, that's what the security articles said was a core issue!" -- but I can't actually remember with 100% confidence

                • MathMonkeyMan 44 minutes ago

                  Someone linked to a kernel mailing list recently; I don't know if it was in a submission or in a comment.

                  The security issue with io_uring, as I understand it, is that it bypasses a lot of Linux's security auditing mechanisms. The problem is that, like with ioctl, if the kernel called out to a security subsystem with "here's something that the user wants to do with this file," the security subsystem would have to know what "something" means for every driver. Impossible; so do you allow most things? Deny them? If you choose the former, now there are gaping security holes. If you choose the latter, then enabling security will break too many things.

                  • loeg 3 hours ago

                    Well, it's not true that it isn't used in production. Google has been burned and at least historically did not use it. But I know some services at Facebook use it in production.

                    Yes, historically it was a big source of security bugs. I think that has tapered off somewhat as the rate of change slows down.

                  • jeffbee 6 hours ago

                    Some of the things that you cannot wait on using io_uring are your kernel actually supporting the feature mentioned in the article, io_uring actually working properly, and io_uring solving its seemingly bottomless supply of local user exploits. In the early days of this feature I was bullish but the way its implementation has emitted CVEs has not been a source of joy, and now many major Linux operators have banned the API internally. Maybe what is needed is a moment of reflection and a scratch reimplementation that learns the lessons of io_uring?

                    • loeg 3 hours ago

                      A new from-scratch implementation would suffer from the same problem early io_uring did (a high rate of code change, which seems to be what drives security bug rates).

                      • KerrAvon 29 minutes ago

                        Isn't something fundamentally broken with either the kernel or the adoption process if that's true? It seems like you should be able to do fast async I/O without the kind of privilege escalation vulnerabilities that are still happening.