Comments Page - Rewriting Every Syscall in a Linux Binary at Load Time

Rewriting Every Syscall in a Linux Binary at Load Time (amitlimaye1.substack.com)
Submitted by riteshnoronha16 6 days ago

xelaboi 2 days ago
You either have a writing style that is uncannily similar to what an LLM generates, or this article was substantially written by an LLM. I don't know what it is about the style, but I just find it a bit exhausting, like an overfit on "engaging writing" that strips away sincerity.
- notepad0x90 a day ago
  I think it's better to just adapt to this. A lot of people write the content their own way, and get AI to rewrite it so that it is more readable and free from errors. Content over appearance and all. I think the problem is you consider this auto-completion tool insincere. Many do as well, because they anthropomorphize LLMs; it feels like a different sentient entity wrote it than the person posting it. But in reality, that isn't the case; it's more like a spellchecker that helped the person communicate their idea.
  The purpose of language is to communicate meaning and intent, not to sound or feel a particular way, unless you're reading for entertainment or enjoyment.
  This is the second post I'm commenting on within a span of like 30 minutes where someone did some really good work and shared it, but the top comments are complaining about AI usage.
  Either LLM-assisted content needs to be banned entirely (might be), or complaining about it should be considered a breach of etiquette at sites like HN that are tech-centric.
  xelaboi a day ago
  Appearance and style is content, and it always was. The way you write is fundamentally a part of how a reader interprets meaning and intent.
  Calling it a spellchecker is simply wrong if you give an LLM some bullet points and then instruct it to write an article. I find it more insincere because it's an extra layer between the author and the reader which substantially affects every aspect of the piece of writing, not just the spelling of individual words, or Microsoft Word nagging you to avoid passive voice.
  If OP is not a native English speaker and is using an LLM to create a reasonable prose, then it might be the best way for them to try and communicate their ideas. It's probably better than Google translate. It affects how the reader interprets the writing, though.
  My other point, which I also stand by, is that I find the default writing style of current LLMs exhausting to read. It feels like a college student has submitted an assignment on engaging writing and decided to use every technique they could find in their textbook, because they want to get top marks. It just feels forced to me.
  --------------------------------
  As an example, I asked claude to make my argument more "clear". See how it wrote it:
  Style isn't separate from content — it is content. The way something is written shapes how a reader interprets its meaning, and that's always been true. Calling an LLM a "spellchecker" only holds if it's catching typos. The moment you hand it bullet points and ask it to produce an article, it's not correcting your writing — it's replacing it. That's a fundamentally different thing.
  I'll grant one exception: if someone isn't a fluent English speaker and uses an LLM to bridge that gap, that's a legitimate trade-off, even if it still changes how the reader experiences the piece.
  But my broader complaint stands independent of that debate: current LLMs produce a recognizable, exhausting prose style. Every sentence is engineered to be "engaging." Every paragraph hits the expected beats. It reads like someone who learned to write from a listicle about writing — technically compliant, but hollow. The effort to sound compelling ends up undercutting any sense that a real person with a real perspective is behind it.
  notepad0x90 21 hours ago
  > If OP is not a native English speaker and is using an LLM to create a reasonable prose, then it might be the best way for them to try and communicate their ideas. It's probably better than Google translate. It affects how the reader interprets the writing, though.
  That's just crazy. Do you think people don't get discriminated against because of that? They'll probably get flagged and blacklisted from HN just for sharing a post riddled with grammar mistakes; it will look like spam to many. If they get lucky, the top comments will be correcting their grammar mistakes, not discussing the content.
  If you didn't talk to me before today, you don't know how I talk. You don't know what sincere is like. The term you're looking for is authentic, not sincere. Questioning the sincerity of the OP is just wrong. You don't like people having control over how what they have to say is conveyed to others, because you have some irrational bias against the usage of a particular tool.
  You argue and even use AI (you don't mind being insincere? I'd like to get your own original arguments, how about that?) to dismiss content because of style, thereby justifying the need for people to be careful of the style of the post they share. Have you considered that had they not used AI, you or others would be dismissing their post for other style-related reasons? because you care about style so much.
  But you're right, style is content, it was wrong of me to claim otherwise. What I meant was probably "meaning". The writing style affects how you read the content, in this case you don't like how it forces you to read it, but the meaning OP is trying to communicate (what I meant by "content") is being glossed over.
  The takeaway for me from this discussion is that people need to use better prompts and better models, not that they shouldn't use an LLM, because even when their grammar and spelling are wrong, they get nitpicked this way.
  > The effort to sound compelling ends up undercutting any sense that a real person with a real perspective is behind it.
  That's a fault and a bias by the reader, in my opinion. I didn't even think it was LLM written, I wasn't looking for it (we tend to find what we're looking for?). My focus was on what was done, validating the claims made, and analyzing the implications. I didn't care how they sounded, because I was able to actually read the content, and understand what they were saying. If it was the other way, and I was the OP, I would want people to focus on what I was saying, and appreciate that I took some action to ensure my post is readable.
  I think they can use better prompts to make it sound and feel better, but it's a real shame that they have to. It is this sort of an interaction that makes me wish we had more LLMs making decisions instead of humans out there. Things like accents, writing styles, even last names, and spelling mistakes decide the fate of many today. The real value people bring, the real human potential is dismissed (not in this case, just making a general observation), cosmetic and performative factors override all else.
  > it's not correcting your writing — it's replacing it. That's a fundamentally different thing.
  It is my writing, in that I agreed the meaning of the rewritten content is what I intended to communicate. People get to have agency on how their meaning is conveyed. You don't have any say over that. Your criticism over how it feels, although I disagree, is legitimate, but your criticism based solely on the fact that AI rewrote the content is entirely invalid.
  Let's imagine OP had a human copy write for them, editing and rewriting the entire content, would that change anything? If not, why are we talking about LLMs instead of the specifics of what bothered you uniquely, so that people reading this thread can use better prompts to avoid those annoying pitfalls?
  I didn't even pick up on this being AI rewritten, I'm only taking yours and others' word for it. My biggest concern these days is that kids are growing up interacting with LLMs a lot, and their original work will be dismissed by older people because it sounds like an LLM. There are many cases of students having their work and exams dismissed, even facing disciplinary actions leading up to lawsuits, where teachers/academics claimed wrongly it was LLM generated content (and why I keep feeling that perhaps LLMs should replace those biased academics and teachers if possible).
  LLM usage isn't going away, perhaps prompts and models will improve, but more likely than not, it is more economical and practical for humans to be forced to adapt one way or the other, to regular LLM usage by other humans. If you skip in 50 year increments and read books or news stories, you'll also see how the writing style and "feel" is very different. There is a very distinctive "feel" to how people on HN write, compared to reddit, gaming discord servers, twitter, bluesky, or the comments section of some conservative site. You'll see some groups use terms like "bro" and "bruh" a lot, others end everything with "lol", others yet include emoji in everything. All this will feel very weird and inappropriate to someone from the 1800s. I am not saying all that to dismiss your observations, but to say that this stuff isn't all that important. If you didn't think the cause of the annoying writing style was an LLM, I doubt you would have commented on it, so don't comment at all about it is my suggestion. There was no egregious writing style offense that was so serious that we need to talk about it, instead of the actual work OP is sharing.
- renewiltord 2 days ago
  It’s clearly LLM written but the idea was interesting enough that I read it. I suspect based on username the writer is cleaning up their voice.
  I think the idea of sharing the raw prompt traces is good. Then I can feed that to an LLM and get the original information prior to expansion.
- nonameiguess 2 days ago
  Name sounds very likely not an English speaker. And the one reply here to a top-level comment is extremely obvious. I think it's unfortunate that people who write English poorly feel the need to do it, but I get it at least. The person behind this probably has a real interest and knowledge in the space but feels they can't communicate it without assistance.
  It is too bad, though. People bad at English will themselves be reading this forever now and think this is the way real people write, speak, or are supposed to.
  It's many things. The relentless enthusiasm about everything. Prefacing any answer to a question with an affirmation that it was a good question first. And yes, sorry, pedants of the web who feel witch-hunted because you knew how to employ keyboard shortcuts and used em-dashes in 2015 and have the receipts to prove it -- you never used 17 in the span of a single page. I think that was the first I can remember using ever and I had to contrive a way to do it where a semi-colon wouldn't clearly work better.
- qbane 2 days ago
  There is even a table copy-pasted into a paragraph without noticing.
  > What’s needed is something different:
  > Requirement ptrace seccomp eBPF Binary rewrite Low overhead per syscall No (~10-20µs) Yes Yes Yes [...]
jmillikin 2 days ago
This might be a very dumb question, but if the process is being run under KVM to catch `int 0x03` then couldn't you also use KVM to catch `syscall` and execute the original binary as-is? I don't understand what value the instruction rewriting is providing here.
- rep_lodsb 2 days ago
  Yes, that seems unnecessary. The overhead of trapping and rewriting every syscall instruction once can't be (much) greater than that required for rewriting them at the start either.
  Even if you disallow executing anything outside of the .text section, you still need the syscall trap to protect against adversarial code which hides the instruction inside an immediate value:
    foo:  mov eax, 0xc3050f   ; return a perfectly harmless constant
          ret
          ...
          call foo+1
  (this could be detected if the tracing went by control flow instead of linearly from the top, but what if it's called through a function pointer?)
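  A minimal sketch of the byte-level problem described above, not taken from the article or the thread. It assumes the snippet assembles to `B8 0F 05 C3 00` (mov eax, imm32) followed by `C3` (ret), and it uses a hypothetical two-entry length table in place of a real x86 decoder. It shows that a linear sweep from the top never sees an instruction starting at `foo+1`, even though the bytes there decode as `syscall; ret`.

```python
# Hand-assembled bytes for the snippet above (x86-64):
#   foo: mov eax, 0x00c3050f -> B8 0F 05 C3 00
#        ret                 -> C3
CODE = bytes([0xB8, 0x0F, 0x05, 0xC3, 0x00, 0xC3])

SYSCALL = b"\x0f\x05"  # encoding of the syscall instruction

# Toy length table covering only the two opcodes used here (an assumption;
# a real rewriter needs a full x86 instruction decoder).
INSN_LEN = {0xB8: 5, 0xC3: 1}


def linear_sweep(code):
    """Return the offsets where instructions start, decoding from offset 0."""
    starts, pos = [], 0
    while pos < len(code):
        starts.append(pos)
        pos += INSN_LEN[code[pos]]
    return starts


def raw_syscall_offsets(code):
    """Return every offset where the bytes 0F 05 appear, aligned or not."""
    return [i for i in range(len(code) - 1) if code[i:i + 2] == SYSCALL]


starts = linear_sweep(CODE)
# Syscall byte patterns that do not sit on an instruction boundary.
hidden = [off for off in raw_syscall_offsets(CODE) if off not in starts]

print("instruction starts:", starts)
print("hidden syscall byte offsets:", hidden)
```

  A rewriter that only patches instructions found by the sweep would leave offset 1 untouched, which is why a runtime trap on `syscall` is still needed as a backstop.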
  rep_lodsb 2 days ago
  Thinking a bit more about it (and reading TFA more carefully), what's the point of rewriting the instructions anyway?
  I first assumed it was redirecting them to a library in user mode somehow, but actually the syscall is replaced with "int3", which also goes to the kernel. The whole reason why the "syscall" instruction was introduced in the first place was that it's faster than the old software interrupt mechanism which has to load segment descriptors.
  So why not simply use KVM to intercept syscall (as well as int 80h), and then emulate its effect directly, instead of replacing the opcode with something else? Should be both faster and also less obviously detectable.
  jacobgorm 2 days ago
  Good point, an int3 is not going to be faster than a syscall, and if they implement the sandboxing policy in guest userspace it seems it would be quite easy to disable.
  jacobgorm 2 days ago
  I think the point here is optimizing for the common case, the untrusted code is still running inside a VM, so you can still trap malicious or corner cases using a more heavy-handed method. The blog post does mention "self-healing" of JIT-generated code for instance.
  It is possible to restrict the call-flow graph to avoid the case you described; the canonical references here are the CFI and XFI papers by Ulfar Erlingsson et al. In XFI they/we did have a binary rewriter that tried to handle all the corner cases, but I wouldn't recommend going that deep. Instead you should just patch the compiler (which funnily we couldn't do, because the MSVC source code was kept secret even inside MSFT, and GCC source code was strictly off-limits due to being GPL-radioactive...)
  amitlimaye 2 days ago
  The follow-on posts describe where I plan to run the binaries. The idea is to run in a guest with no kernel and everything running at ring 0, which makes sysret a dangerous thing to call. We don't have anything running at ring 3. Also, the syscall instruction clobbers some registers. All in all, comparing int3 against the syscall instruction, I counted around 20 extra instructions in my runtime (this is a guess, me trying to figure out what would happen). That is why int3 becomes faster for what I am trying to build. The toolchain approach suffers from the diversity of options you have to support, even if you ignore the stuff you guys encountered. It might be easier with LLVM-based things, but there are still too many things to patch, and the moment you tell people to use my build environment, it meets resistance. I am currently aiming for Python, which is easy to do. The JIT matters when I want to do JavaScript, which I keep pushing out because once I go down there I have to worry about threading as well. It is something I want to chase, but right now I am trying to get something working.
- ghoul2 2 days ago
  Isn't that exactly what gvisor does?
  twic 2 days ago
  Yes: https://gvisor.dev/docs/
  amitlimaye 2 days ago
  gVisor tries to be a complete kernel in userland; we are not trying to. We will consciously choose never to try to support a multi-process environment in the sandbox. The idea is that there are enough people running single-process containers who can benefit from a lighter, more secure runtime. This solution will not try to replace the kernel. For example, the Python test we run for HTTPS to some website ends up needing only 60 syscalls implemented, not 350. I expect to add another 10-20 to support TypeScript, but this will always be strictly single-process. Plus, the performance overhead of gVisor is substantial, 2-10 µs (from what I have read online); for the system I am implementing, it is less than 1 µs on the hot path. Plus there is always the density story: my shim is currently 4 KB, and the Python runtime is shared through memfd. I am working on a demo showing I can run 1000 VMs on 512 MB of RAM, each launching in under 30 ms. Remember, this will never replace or be able to handle generic multi-process sandboxes; it is targeted only at single-process environments where we can make lots of simplifying assumptions.
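  A back-of-the-envelope check of the density target quoted above, as a sketch only. It assumes "512 MB" means 512 MiB and that the 4 KB shim is private to each VM; it ignores hypervisor-side overhead and the memfd-shared Python runtime, neither of which the comment quantifies.

```python
TOTAL_RAM = 512 * 1024 * 1024   # 512 MB, read here as MiB (assumption)
VM_COUNT = 1000                 # number of VMs in the planned demo
SHIM_SIZE = 4 * 1024            # 4 KB shim, assumed private to each VM

per_vm = TOTAL_RAM // VM_COUNT  # average private memory budget per VM, in bytes
after_shim = per_vm - SHIM_SIZE # what is left once the shim is accounted for

print(f"{per_vm / 1024:.0f} KiB per VM, {after_shim / 1024:.0f} KiB after the shim")
```

  Under these assumptions each VM gets roughly half a megabyte of private memory, which is only plausible if the bulk of the runtime is shared between VMs rather than copied.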
coppsilgold 2 days ago
You mentioned SECCOMP_RET_TRACE, but there is also SECCOMP_RET_TRAP[1] which appears to perform better. There is also KVM. Both of these are options for gVisor: <https://github.com/google/gvisor>
[1] <https://github.com/google/gvisor/blob/master/pkg/sentry/plat...>
- monocasa 2 days ago
  There's also SECCOMP_RET_USER_NOTIF, which is typically used by container runtimes for their sandboxing.
  coppsilgold 2 days ago
  SECCOMP_RET_USER_NOTIF seems to involve sending a struct over an fd on each syscall. Do they really use it? Performance ought to suffer.
  Also gVisor (aka runsc) is a container runtime as well. And it doesn't gatekeep syscalls but chooses to re-implement them in userland.
  xuhu 2 days ago
  SECCOMP_RET_USER_NOTIF appears to switch between the tracee and tracer processes for each syscall. Using SECCOMP_RET_TRAP to trigger a SIGSYS for every syscall in IO intensive apps introduces 5% overhead (and avoids a separate tracer).
  I wonder if there's any mechanism that works for intercepting static ELFs like Go programs and such.
  monocasa 2 days ago
  They use a seccomp filter to decide which syscalls get sent to the other process for processing.
Thaxll 2 days ago
It's pretty much what gVisor does.
https://gvisor.dev/
- Thaxll 2 days ago
  So why not use it instead of re-implementing the exact same thing?
CableNinja 6 days ago
I assume this would break observability through existing methods, right? If you were to strace a process that has been patched, would you see regular syscall data (as if it wasn't patched) or would your syscall replacement appear along the way?
- amitlimaye 6 days ago
  Good question. I didn't cover this in the post — the binary doesn't run on the host kernel directly. It runs inside a lightweight KVM-based VM with no operating system. The shim is the only thing handling syscalls inside the guest. So strace on the host wouldn't see anything — no syscalls reach the host kernel from the guest. From the host side, the only visible activity is the hypervisor process making syscalls on behalf of the guest.
  Inside the guest, there's no kernel to attach strace to — the shim IS the syscall handler. But we do have full observability: every syscall that hits the shim is logged to a trace ring buffer with the syscall number, arguments, and TSC timestamp. It's more complete than strace in some ways — you see denied calls too, with the policy verdict, and there's no observer overhead because the logging is part of the dispatch path.
  So existing tools don't work, but you get something arguably better: a complete, tamper-proof record of every syscall the process attempted, including the ones that were denied before they could execute. I'll publish a follow-on tomorrow that details how we load and execute this rewritten binary and what the VMM architecture looks like.
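  A minimal sketch of the kind of trace ring buffer described above, assuming a fixed-capacity buffer that records the syscall number, arguments, a timestamp, and the policy verdict on every dispatch. The names `TraceEntry` and `SyscallTrace` are hypothetical, `time.monotonic_ns()` stands in for the TSC, and the allow-list policy is a placeholder; none of this is the author's actual shim code.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    nr: int          # syscall number
    args: tuple      # raw syscall arguments
    timestamp: int   # stand-in for a TSC read
    verdict: str     # "allow" or "deny"


class SyscallTrace:
    """Fixed-size ring buffer that logs every dispatched syscall."""

    def __init__(self, capacity, allowed):
        self.ring = deque(maxlen=capacity)  # oldest entries are overwritten
        self.allowed = frozenset(allowed)

    def dispatch(self, nr, *args):
        # Logging happens on the dispatch path itself, so denied calls
        # are recorded as well as allowed ones.
        verdict = "allow" if nr in self.allowed else "deny"
        self.ring.append(TraceEntry(nr, args, time.monotonic_ns(), verdict))
        return verdict


# x86-64 syscall numbers: read = 0, write = 1, execve = 59
trace = SyscallTrace(capacity=4, allowed={0, 1})
first = trace.dispatch(1, 1, 0x1000, 5)     # write(1, buf, 5)
second = trace.dispatch(59, 0x2000, 0, 0)   # execve(...), not on the allow list

for entry in trace.ring:
    print(entry.nr, entry.args, entry.verdict)
```

  Because the buffer has a fixed capacity, a long-running process overwrites its oldest entries; a real implementation would need some way to drain the buffer to the host before that happens.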
compsciphd a day ago
This has been done for ages with a simple kernel module that just wraps the real kernel syscall; no binary changes are needed.
Here is an example of how we used it in the early 2000s to implement containerization before Linux namespaces existed:
https://www.usenix.org/legacy/publications/library/proceedin... (note the shepherd and where kubernetes arguably got the pod name from).
and security policies on top of it
https://www.usenix.org/legacy/event/lisa07/tech/full_papers/...
ozgrakkurt 2 days ago
Really informative writing, thank you.
How secure does this make a binary? For example would you be able to run untrusted binary code inside a browser using a method like this?
Then can websites just use C++ instead of javascript for example?
- amitlimaye a day ago
  Yes, that is the goal, though C++ is something I am not targeting in the short term. The idea is to be able to run untrusted binaries in a VM with no kernel. That saves memory and makes for faster loads, and the binary cannot escape the VM, so it can never compromise your host.
- lmz 2 days ago
  They already can use C++ if they want to. Emscripten? Jslinux?
  ozgrakkurt 2 days ago
  I mean just distributing the regular compiled x86_64 binary and then running it as a normal executable on the client side but just using that syscall shim so it is safe.
  direwolf20 2 days ago
  If you think about the fundamentals involved here, what you actually need is for the OS to refuse to implement any syscalls, and not share an address space.
  A process is already a hermetically sealed sandbox. Running untrusted code in a process is safe. But then the kernel comes along and pokes holes in your sandbox without your permission.
  On Linux you should be able to turn off the holes by using seccomp.
  amitlimaye 2 days ago
  seccomp is a very coarse filter with a very limited action set. Think what you could do if you could see the payload of the syscall, or change the output of a read syscall depending on agent identity.
im3w1l 2 days ago
What about int 80h?
- jcalvinowens 2 days ago
  Yeah, I had the same question. But I'd guess they probably disable IA32 completely.
  amitlimaye 2 days ago
  int 80h is a great idea, but int3 is what I landed on when I was looking, and at this point I am just trying to get something working. The good thing about int 80h is that it is a 2-byte instruction, I believe, rather than the int3 + nop that I am doing right now.
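  A minimal sketch of the same-size patch discussed above, assuming the offsets of the `syscall` instructions are already known (for example from a disassembler pass). The function name `patch_syscalls` is hypothetical. It overwrites the 2-byte `syscall` (`0F 05`) with either `int3; nop` (`CC 90`) or `int 0x80` (`CD 80`), so no other instruction moves.

```python
SYSCALL = b"\x0f\x05"    # syscall
INT3_NOP = b"\xcc\x90"   # int3 ; nop (1-byte trap padded to 2 bytes)
INT_80 = b"\xcd\x80"     # int 0x80 (already 2 bytes)


def patch_syscalls(code, offsets, replacement):
    """Return a copy of code with the syscall at each offset overwritten."""
    if len(replacement) != len(SYSCALL):
        raise ValueError("replacement must be exactly 2 bytes")
    out = bytearray(code)
    for off in offsets:
        # Refuse to patch anything that is not actually a syscall instruction.
        if out[off:off + 2] != SYSCALL:
            raise ValueError(f"no syscall instruction at offset {off}")
        out[off:off + 2] = replacement
    return bytes(out)


# mov eax, 39 (getpid on x86-64) ; syscall ; ret
code = bytes([0xB8, 0x27, 0x00, 0x00, 0x00, 0x0F, 0x05, 0xC3])

patched_int3 = patch_syscalls(code, [5], INT3_NOP)
patched_int80 = patch_syscalls(code, [5], INT_80)

print(patched_int3.hex(" "))
print(patched_int80.hex(" "))
```

  Either replacement keeps the code length unchanged, so relative jumps and calls elsewhere in the binary remain valid; the choice only affects which trap the handler receives.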
  im3w1l 2 days ago
  I think you misunderstand my question. int 80h is an alternative legacy way that a program can issue syscalls. So without handling that your system may miss some syscalls. Which may be fine, I'm sure they are not that common. But if someone were to try to sneak a syscall past your monitoring that might be something they might do? Edit: Or maybe since it's running in a vm the outcome might just be that it doesn't work at all which may be fine I suppose.
JSR_FDED 2 days ago
Love the detailed write up, thanks!
This is the kind of foundation that I would feel comfortable running agents on. It’s not the whole solution of course (“yes agent, you’re allowed to delete this email but not that email” can’t be solved at this level)… let me know when you tackle that next :-)
- amitlimaye 2 days ago
  AMA: I am the author of that blog. I have some working code, just not something I want to share right away. Right now I am chasing density, but yes, security is something I will get to eventually. The issue is what to implement first :). This is the first of a series of blog posts I am writing; you can check my Substack. The next step is to show a density and launch-speed demo, hopefully by the middle of next week.
foota 2 days ago
Hah, I've been looking into something amusingly similar to track mmap syscalls for a process :)
- pocksuppet 2 days ago
  Why not just use ptrace?
  amitlimaye a day ago
  ptrace is at least 2 context switches, which will make it pretty slow.
  foota a day ago
  Yeah this wasn't something like "I want to debug a program" but rather I wanted to be able to track mmaping for later cleanup.
  Fortunately libc doesn't mmap that much internally, so I think I can get away alright with interposing libc's mmap call.
hparadiz 2 days ago
I've been thinking of making a kernel patch that disables eBPF for certain processes as a privacy tool. Everyone is using eBPF now.
szmarczak 2 days ago
> It can’t detect the interception
What's stopping the process from reading its own memory and seeing that the syscall was patched?
- amitlimaye a day ago
  Actually, you are right: nothing is stopping it from reading its own memory, but that does not help it escape the kernel. You may be worried about something adversarial that tries to detect that it is in a sandbox, but that is not what we are trying to protect against. The idea is to follow the same model as a container, with something that is more secure and has less surface area to protect or attack.