This seems nowhere near the worst feature of Rust; it just needs some more design work and baking, like lots of things.
The amount of hyperbole in this article makes it a bit hard to take the author all that seriously.
Is there evidence more baking won't happen? While I have my loves and hates about Rust, it has always felt like they have a pretty thorough/careful process for additions like this. If you go into the threads constructively and offer some concerns, you will usually get a reasonable response.
(All processes fail sometimes, of course, so this is not always true, but it's mostly true.)
While I think it's fine to write rants on blogs, and I don't feel everyone has a responsibility to file bugs or whatever before they write a rant, if you actually want to see this "worst feature" fixed, a rant probably won't help very much.
(I.e. you don't have to be constructive, or even helpful, but if you want to be constructive or helpful, this ain't how you do it.)
> Is there evidence more baking won't happen?
No, actually there is a lot of evidence that it will still be worked on.
Normally I would just say look at the issue linked in the nightly docs, but due to an overlap between the tracking PR and the PR moving it from std to core, it's not super useful.
Tracking Issue: https://github.com/rust-lang/rust/issues/78485
If the author has constructive critique, they should probably mention it there, after skimming the discussion to make sure it wasn't already considered and rejected due to subtleties overlooked in the blog post (e.g. that it's a standard/core feature which has to work across all targets and as such can't rely on anything being initialized to 0 by the OS, or that depending on the global allocator used you definitely can't rely on things being zeroed even if the OS only hands out zeroed memory, etc.).
I'm not as much into low-level development as the author seems to be, but the hyperbole made me think: “you have a point, which may well be valid, but aren't you stretching the reasons to fit your sentiment/PoV?”
Honestly, we throw “ergonomics” around as an argument plenty of times, but isn't ergonomics more a feeling of how good an API is to use, rather than something actually demonstrated with examples and design choices?
The more pertinent question to me is: can we implement some new static analysis that understands buffer re-use and can hoist buffer initialization outside the loop? Rather than making the programmer write obfuscated code for efficiency, it is usually better to have the compiler do the heavy lifting.
P.S. Also, folks, don't re-use buffers without zeroing unless you absolutely need the performance and know what you're doing.
Why would re-using a buffer be bad? Assuming you write to it with the contents of the file/stream before it is read.
You just answered your own question
I think they implied you would prevent that.
Why is it particularly more dangerous or likely than other logic errors?
Because the compiler optimizes based on the assumption that consecutive reads yield the same value. Reading from uninitialized memory may violate that assumption and lead to undefined behavior.
(This isn't the theoretical ivory tower kind of UB. Operating systems regularly remap a page that hasn't yet been written to.)
And that's not something you should be depending on a compiler to verify.
I like that direction better but it requires the ability to declare data-flow based contracts whereas Rust’s tools are only lifetime and type contracts. Is there a language that has data-flow based contracts?
That would be easier but is not required. Compilers no longer need hints to unroll loops or hoist invariants, even though doing either incorrectly could change the result. It would take some complicated analysis, but I think it could be done safely in some cases.
I was going to make this argument, but I actually don't think it's true in almost any case.
Most functions could be inferred, but the ultimate source of basically all of these write-only APIs is FFI functions, which in turn make system calls.
You're at least going to need a way to annotate the FFI calls and system calls to describe to the compiler how they access data.
If you're calling FFIs in an inner loop, you have bigger issues than the time it takes to clear the buffer, right?
No? It depends on your definition of inner loop I guess.
If you're doing some sort of zero-copy IO, the time to clear the buffer might be non-trivial (not huge, but non-trivial). It's true that you need a large enough buffer that syscall/ffi overhead doesn't dominate, but that's not unrealistic.
It's rare that we care about this, that's true, that's why generally rust has been fine with "just zero buffers". There are definitely domains that care though.
The loop unrolling & invariant hoisting is a static transformation. What the “read” function does semantically isn’t captured today within that and the compiler wouldn’t be able to automatically infer it. It would have to be told that information and there would need to be unsafe annotations for things like syscalls and FFI boundaries. The other approach is to change the API which is what BorrowedBuf is.
If you can think of a different approach of how the compiler can figure out automatically what memory has become initialized by a random function call I’m all ears.
That's what I glossed over as "complicated analysis". In my mind, if a compiler can understand register and stack use (required for static transformations), it can (theoretically, and with some effort) understand heap use. Am I wrong?
what do you mean by this?
There would need to be contractual declarations on the read method that the compiler is able to enforce that tells it that the input &mut slice has N elements clobbered based on the returned length. That’s basically what BorrowedBuf is accomplishing via the type system and runtime enforcement of the contract. Using a non-existent syntax:
    fn read<T, N: size_t>(&mut self, buf: &mut [MaybeUninit<T>] becomes &[T; N] after call) -> N {
        // … enforces that the body initializes N elements of buf
    }

and then rules that &mut [T] can also be transparently supplied to such functions that today could only accept a &mut [MaybeUninit<T>].

A more likely interface you could write today would look like:

    fn read_uninit<T>(&mut self, buf: &mut [MaybeUninit<T>]) -> (&[T], &[MaybeUninit<T>]) {
        // … enforces that the body initializes N elements of buf
    }

You still have to cast &[T] into &[MaybeUninit<T>] somehow.

> You still have to cast &[T] into &[MaybeUninit<T>] somehow.
    unsafe { std::mem::transmute(slice) }

This is probably the only way that will ever exist, because

    let slice: &mut [NonZeroU8] = ...;
    let slice_uninit: &mut [MaybeUninit<NonZeroU8>] = ...;
    let nonzero_uninit: &mut MaybeUninit<NonZeroU8> = &mut slice_uninit[0];
    *nonzero_uninit = MaybeUninit::zeroed();
    slice[0]; // Undefined behavior for sure by now.

is all safe except for the cast.

I.e. MaybeUninit<T> allows you to write invalid bit-patterns to T, so you can't safely cast a reference to T into a reference to MaybeUninit<T> (and if you do unsafely cast one, you can't soundly write an invalid bit pattern through it). All current forms of safely making a MaybeUninit take ownership of the value they are declaring to be MaybeUninit, for this reason.
I guess at some point we might get methods for this on types that can take on all bit patterns - if/when that's encoded as a trait.
I think an ergonomic way to do that would be to have read return not an integer, but a slice of that integer's length.
The problem would be: how do you express “you can only access the buffer you sent me through the read-only slice I returned, but you have to free that same buffer when you're done calling me”?
I think that can be done using a function creating a read buffer for a given input stream that
- during calls to read is ‘owned for writing’ by that stream (so, it has to borrow a capability that the creator of the buffer doesn’t have. I don’t think Rust currently supports that)
- where stream.read returns a read only slice whose lifetime is bound to that of the buffer
So, the creator of the buffer can only pass it to read to get a slice back that contains precisely the data read.
The stream can write to the entire buffer.
Fair, but note there is a significant subset of Rust-targeted programmers who dislike the compiler doing things like that. They also dislike the compiler doing things like auto-initializing every loop iteration, but two wrongs wouldn't make it right, just less wrong.
Maybe Rust needs another type of reference that's exclusive write only? Right now there's RO (&T) and exclusive RW (&mut T) but WO is missing.
Having a WO reference would allow these read_buf APIs to express they only write and never read so the uninitialized memory is safe to pass directly.
In some sense that's exactly what a `&mut MaybeUninit<T>` is?
Probably more once https://doc.rust-lang.org/beta/std/mem/union.MaybeUninit.htm... is no longer nightly-only.
everyone just tells you to use mpsr in this case
I think the answer is that in a case when you need that speed, you hoist the stack allocation & zeroing and unsafe that buffer in the loop if need be. Test well. I am a huge Rust fan but also it is actually possible to write correct unsafe code.
If I am interacting with IO space, I would much rather write the interaction code myself for the machine at hand than farm it out to an array of third-party crates. ::shrug::
getting the machinery to let it properly be hoisted smoothly and safely would be nice, but it isn't required.
personally I think rust macros are very painful and the "worst feature", but that's speaking as someone who did a fair bit of Common Lisp.
> While replacing the array of zeros by an array of uninitialised values may work in specific circumstances, the code is unsound. Change to the compiler, its options, modification of unrelated parts of the code or using the function for a different Read trait implementation may break the program in unpredictable ways.
Why? It seems the only thing on that list that will cause UB is using the function with a different reader (one that inspects the uninitialized bytes). Why would any of the other listed possible changes break it?
> Even an obvious optimisation of moving the buffer declaration outside of the loop isn’t available to the compiler.
Why? Can't the programmer just do this himself?
The compiler cannot assume that the read call won't read from the mutable reference (well, it might be able to given a sufficiently sophisticated optimizer and/or aggressive inlining).
The programmer, on the other hand, can do this, but the point is to make this implicitly possible by making it more explicit that read does not read from the buffer (and therefore allowing it to accept uninitialized memory).
And I don't think it can ensure that all the bits were written. I've been bit by people trying to reuse buffers/objects like this that were not fully rewritten in one of the possible re-uses. It's a bit puzzling how a change that just adds a new continue leads to memory corruption.
Zig has a couple features to help with that. I assume Rust should too (probably not directly applicable to BorrowedBuf, but for the case of a reusable pool of objects)?
It comes down to a piece of syntactic sugar, plus "result location semantics" guaranteeing that you won't have a copy. E.g.:
    my_ptr.* = .{
        .x = 42,
        .y = 53,
    };

No matter how you choose to construct the intermediate fields (x and y in this example) with continues or other control flow, the very last step should be something that sets every field at once. If you miss one, the compiler will yell at you. If it compiles, the assembly is as if you filled in each field by hand.

It's cool to have syntax for that, but I feel the optimisation there works at too high a level.

I'd like the backend to notice that a variable is being dropped and "re-created" on every iteration, and then figure out how to initialise over a "dropped" zombie value. It'd be nice to have something like this, because I'm pretty sure people don't do this kind of optimisation all the time (it's annoying to leak a variable that should only survive one iteration outside of the loop when it's not mentioned anywhere else).
Because the compiler doesn’t know what read/write are doing to the buffer. And since it’s declared as [0; 4096], the compiler wouldn’t be able to do anything other than zeroing the entire 4 KiB region on every read, instead of only what’s dirtied, if it attempted to hoist automatically. BorrowedBuf is an attempt to let you declare [MaybeUninit::uninit(); 4096], which the compiler could hoist, although there it doesn’t matter either, since allocating the uninit array is just an adjustment of the stack pointer.
> Why?
Because as far as the compiler is concerned it appears to change the behaviour, unless the compiler gets very fancy.
> Can't the programmer just do this himself?
Yes, but it's not really desirable for them to have to (and would arguably make the code less maintainable if they did). Doing the right thing should be easy.
Arguably the buffer belongs in the loop scope because it is only relevant there. It’s probably also safer from wrong use.
This feels like exactly what you want the compiler to think about: a case where optimization comes at the cost of organization.
Not if they are using Rust ... which is why I am not.
This is incorrect. It's trivial and compiles just fine. The argument here is that maybe for reasons the programmer doesn't want to - such as not wanting the buffer to outlive its use inside the loop, and they don't want to have to double-nest:
    {
        let mut buf = [0; 4096];
        loop {
            ...
        }
    }
That accomplishes exactly the same goal, but there's an argument -- not well made in the blog post -- that the compiler should be able to do some form of this hoisting automatically. In C, it would be automatic, because C doesn't make a zero-initialization promise for stack-allocated variables. In Rust it's not, because the array is specified as zero-initialized. Of course, C's behavior comes with certain drawbacks of its own. ;)

Rust's behavior isn't unreasonable. It's just a potential missed optimization, but automating it is challenging.
Adding an extra scope here is slightly annoying, but it's not always possible. I think the example in the blog post was poorly chosen, because the complexity of BorrowedBuf together with MaybeUninit doesn't make much sense when your fix makes for much more readable code.
Out of all problems I have encountered with Rust, this is a particularly minor one.
I see. It is a very bad example indeed. Terrible, terrible example.
Switching off Trump mode for a moment, I don't see why you would want to declare the buffer inside the loop, given that keeping it alive for the entire time of the loop is actually the semantics you want.
If people wrote the most optimal code the first time, we wouldn't need optimizing compilers, and all the undefined behavior that optimization passes necessarily bring along. The whole point of the example is to be poorly written in a way that the compiler "obviously" should be able to fix, but can't.
A less obvious example would be...
- A struct, A, which has an init_from_file method that deserializes data from a Read: R
- Another struct, B, which has its own init_from_file, and a variable number of A as one of its fields. B::init_from_file needs to deserialize by calling A::init_from_file in a tight loop.
This example is the same as the first, except now we've disguised the inefficiency with separation of concerns. A compiler can inline A::init_from_file into B::init_from_file to yield the same code as in the example.
So you are saying that the buffer would be allocated in A::init_from_file? And the compiler would be able to optimise that away by allocating the buffer outside the loop?
If the compiler actually does that, that would be a good example. As long as I don't have to be careful to write my code in a way that some obscure compiler optimisation understands.
Because you want the buffer to go out of scope after the last iteration of the loop. Motivating that requires bringing in more rust - It could be as simple as wanting to reuse the variable name later, but more likely it would be because you were using something that had a reference that you wanted to go away so you could borrow it again without the borrow checker yelling at you.
Ok, then the nested scope is indeed exactly what you want. I don't see how obfuscating this purpose and trying to rely on obscure compiler optimisations and intricate semantics would be a good idea.
Because, as someone else noted, it might be hidden for you because you're using something that's inside another function, macro, struct, etc.
Yes, that is a valid reason. If I get that optimisation for free, why not?
I still would not rely on that optimisation, though. If I think this could be an actual bottle neck, I would make the shared buffer explicit in my code.
What language would automatically be able to hoist the array outside the loop in that kind of code?
C, because reading uninitialized memory is undefined behavior, so the compiler can assume it never happens.
Care to present some proof? Here’s a counterexample showing the compiler isn’t able to reason about the memory in that way: https://godbolt.org/z/x7j8xoMxY
There are cases where C can do loop hoisting, but the cases are a subset of what Rust does and this isn’t one of those.
Your example doesn't show this. Also, the allocation is hoisted out of the loop; the initialization is not, and hoisting it would be invalid in general. It could be eliminated in this case, but that would be dead store elimination.
But wouldn't that change behavior? A zero-initialised array will contain data plus a bunch of zeroes after a read, but if the next read only partially fills the buffer, you end up with a buffer containing data from two reads.
In this specific example there's no issue because the result of read() is being used to only write as much data as was read, but to me this seems like a pretty complicated and unlikely assumption to write optimizations for.
Reading uninitialized memory is not undefined behavior.
An explanation of why: https://stackoverflow.com/a/11965368/814422
There are some important caveats, though, around trap or non-value representations. Basically, the value held by the storage for a variable may not correspond to a valid value of the variable's type.
For example, a bool variable usually takes a full byte but only has 2 valid representations in many ABIs (0 for false, 1 for true). That leaves 254 trap representations with 8-bit bytes, and trying to read any of these is undefined behavior.
Furthermore, a variable may be stored in a register (unless you take its address with &), and registers can store values wider than the variable type--e.g., even though int has no trap representations in memory of the same size, nowadays it's usually smaller than a register--or be in a state that makes them unreadable. Trying to read such a value is also undefined behavior.
So, reading memory in general is defined behavior (just with an indeterminate value) but it has to actually be memory and you have to be reading it into a type that can accept arbitrary bit patterns.
This API has basically been adopted from Tokio. Like most of Rust buffer types, it's "not bad" to use as a caller and "awkward" to use as a consumer.
The pain of paying for buffer init is real, however. The last two projects have both seen perf hits from it.
If this is mainly useful for working with plain/uninterpreted byte arrays, then I wonder why we can't just do `[u8; N]::with_noinit()` method instead of doing the multi-line plus unsafe things listed in the article.
Is the main point that things like `slice_freeze_mut` could also be used for slices of e.g. `struct Coordinate { x: u32, y: u32, z: u32 }`?
It would obviously not work for f64 things, since there, too, not all bit-patterns are valid.
All f64 bit patterns are valid.
The selling point of Rust was that it protects programmers from doing dangerous things.
This is a good first approximation, but it misses something. It's actually that it protects programmers from accidentally doing dangerous things. There's a lot of support in the language for doing dangerous things, you just have to explicitly say "hey I know I'm doing something dangerous, and I promise I'm right here."
I am not yet a Rust programmer but - is it not typical to have a small collection of unsafe functions, carefully reviewed, that in this case seem like they might be easier to maintain than some of these convoluted type-based workarounds?
It's a tradeoff! You have to explore both options to know which side of the tradeoff to take.
Do some Rust types have invalid object representation or trap representation? On SysV x86_64 bool only has two valid representations in memory, the rest are trap representations.
So for an array of bools (if Rust matches SysV) freeze wouldn't be sound, even without the madvise problem.
Yes.
That's what "// SAFETY: u8 has no invalid bit patterns." is discussing: while types in general can have invalid bit patterns, u8 specifically does not (none of the u*/i* integer types do), so freezing a buffer of u8s is sound.
Yes; and furthermore LLVM also has undef, which is sort of a trap representation, but it only exists in the optimizer. (There's also poison for overflow, which is a strictly less defined value than undef.)
I suspect even reading an array of uninitialized u8s would cause havoc just from LLVM miscompiles alone.
Why are the frozen semantics actually needed? Is there no way to represent what people actually want here: memory that is entirely uninitialised, for which even a tautology might be false, until written? I.e. something that's a bit like MaybeUninit but more so?
There's probably no way for the compiler to prove safety. Rust is designed to allow 100% safe bare metal development, like a perfectly safe C that still allows you to get close to the hardware, and that's tough.
What does safe mean here? Everything can be interpreted as a [u8], right?
[u8] guarantees to the compiler that two reads through the array at the same location without any intervening writes return the same value.
Turns out that's not the case on freshly returned uninitialized allocations. The first read could return old data (say "1"), and the second read could return a freshly zeroed page ("0").
https://www.ralfj.de/blog/2019/07/14/uninit.html perhaps (the OP also talks about this when linking to a talk about jemalloc)
I'm failing to understand the correlation to "safety" here. Reading a byte for which you don't know the value isn't "unsafe". It's literally (!) the desired behavior of foreign data being read from an external source, which is in fact the use case in the article.
There's no safety problem as long as the arbitrary value is deterministic, which it is, being process RAM. The "uninitialized data read" bugs reported from instrumentation tools in C code are because the code is assuming the value has some semantics. The read itself has no value and is presumably an artifact of the bug, but it is safe.
> There's no safety problem as long as the arbitrary value is deterministic, which it is, being process RAM.
The article discusses how it is in fact, on Linux with memory returned from at least one very common allocator, not deterministic. Ctrl-f tautology.
That's just a terminology collision. All RAM access is deterministic in the sense that the value will not change until written. It's not "predictable" in the sense that the value could be anything.
C code that reads uninitialized data is presumed to be buggy, because it wouldn't be doing that unless it thought the memory was initialized. But the read itself is not unsafe.
Rust is just confused, and is preventing all reads from uninitialized data a-priori instead of relying on its perfectly working type system to tell it whether the uninitialized data is safe to use. And that has performance impact, as described in the linked article, which has then resulted in some terrible API choices to evade.
> All RAM access is deterministic in the sense that the value will not change until written.
Again, the article literally points to how this is not true given modern allocators. The memory that Linux exposes to processes will change without being written to prior to being initialized given how allocators manage it. This isn't a fiction of the C-standard or rust reference, it's what actually happens in the real world on a regular basis.
Rust is not confused, it is correctly observing what is allowed to actually happen to uninitialized memory while the process does nothing to it.
You could change the C/Rust specification of that memory. You could in your C/rust implementation declare that the OS swapping out pages of uninitialized memory counts as a write just like any other, and that it's the programmers (allocators) responsibility to make sure those writes obey the normal aliasing rules. Doing so would be giving up performance though, because the fact that writing to memory has the side-effect of cancelling collection of freed pages is a powerful way for processes to quickly communicate with the OS. (You'd probably also cause other issues with memory mapped IO, values after the end of the stack changing, and so on, but we can just focus on this one issue for now).
You have some misconceptions about C and undefined behavior.
> C code that reads uninitialized data is presumed to be buggy, because it wouldn't be doing that unless it thought the memory was initialized. But the read itself is not unsafe.
The read itself is very much unsafe because it's undefined behavior. The compiler is allowed to assume the programmer never lets a read of uninitialized memory happen, so if the programmer does allow one, any false conclusion can follow from the false assumption.
This is a problem even in trivial cases; see [1] and try commenting out and switching around the calls to foo and bar. The behavior is very unintuitive, because reading from uninitialized memory is unsafe.
> You have some misconceptions about C and undefined behavior.
The discussion is about RAM and Rust, not C. And the particular use of uninitialized data in C that corresponds to the linked article (as a target buffer for a read call) is clearly not undefined behavior.
This is a classic HN tangent, basically. You're making the discussion worse and not better.
The question is "Why can't Rust act like C when faced with empty buffers?", and the answer has nothing to do with undefined behavior or unsafe. They just got it wrong.
Okay, I think I see. In the linked article, due to the behavior of `read` uninitialized memory is never read. The same would be true in equivalent C code.
However, in C, the programmer doesn't need to prove to the compiler that uninitialized memory is never read, they are just expected to prevent it from happening. In this case, it's clear to the programer that there's no undefined behavior.
In Rust though, the compiler must be able to statically verify no undefined behavior can occur (except due to unsafe sections). It's not possible to statically verify this in either the Rust or C case, because not enough information is encoded into the type signature of `read`. The article discusses a couple of ways that information might be encoded so that Rust can be more like C, and discusses their trade-offs. C explicitly sidesteps this by placing the responsibility entirely on the programmer.
So to directly answer your question "Why can't Rust act like C when faced with empty buffers?", it's because the Rust compiler cannot yet statically verify there's no undefined behavior in this case, even though there is in fact no undefined behavior, and one of the primary design goals of Rust is to statically prevent undefined behavior.
And to what's perhaps the initial question, this is discussed using the term "safety" simply because Rust defines things which can't be statically verified to not invoke undefined behavior as "unsafe". Perhaps a better term would be "not yet statically provable as safe", but it's a bit of a mouthful.
> it's because the Rust compiler cannot yet statically verify there's no undefined behavior in this case
Uh... yes it can. It's a memory write to the uninitialized region. Writes are not undefined, nor unsafe, and never have been. They aren't in C, they aren't in hardware. Writes are fine.
The bug here is API design, not verification constraints.
The issue isn't writes to uninitialized memory, it's reads from uninitialized memory. The compiler doesn't know how much of the buffer `read` writes. The docs say it returns an unsigned integer with how many bytes it wrote, so a programmer can know the later read from `buffer[0..num_bytes_written]` is valid, but the compiler doesn't know what the number returned from `read` represents. From the compiler's point of view, the whole buffer needs to be initialized, regardless of what `read` does, for reads from it to be valid. That means it has to be initialized before it's passed to `read`; otherwise the compiler can't prove that the elements later read from the buffer are initialized.
I don’t understand your point and you’re wrong on a couple of things.
> C code that reads uninitialized data is presumed to be buggy, because it wouldn't be doing that unless it thought the memory was initialized. But the read itself is not unsafe. Rust is just confused, and is preventing all reads from uninitialized data a-priori instead of relying on its perfectly working type system to tell it whether the uninitialized data is safe to use
Reads of uninitialized memory is unsafe full stop. That’s literally what Rust’s memory safety is about. If you give that up you’re not in safe land and you can always use unsafe & all the risks that come with that to try to write more optimal code / abstractions.
This article is literally about the mechanisms Rust is trying to stabilize: using the type system to start from a block of uninitialized memory and track the writes occurring, so that it can take a &mut [MaybeUninit<T>] and give you back a &[T] after a call to read which wrote into the slice. But reading uninitialized memory is, by definition, tautologically unsafe. It doesn't mean that the computer will pull out a knife and kill you, but it does mean you're no longer memory safe.
> All RAM access is deterministic in the sense that the value will not change until written
That's correct at a hardware level. It's not correct from a userspace program's perspective when interacting with Linux if the memory in question is uninitialized.
> That's correct at a hardware level.
Not even there, due to memory caches.
> All RAM access is deterministic in the sense that the value will not change until written.
No, this is directly addressed in the article.
RAM access to uninitialized memory is not deterministic and can change. See MADV_FREE.
That's a VM feature. I mean, yes, if you change your process's memory space between accesses you break whatever the compiler might have assumed. You don't need fancy flags, either! Just, y'know, munmap() will do.
This is yet another unrelated tangent. Can you explain why it it you think the presence of MADV_FREE disallows Rust from allowing writes to uninitialized memory?
You’re missing the point. It doesn't matter that this is a VM feature and not a hardware feature. The fact that it exists at all means that reading from uninitialized memory is unsound. You can write to it all you want, but not read from it.
Once more, the use case in question is WRITING to uninitialized memory, not reading from it. Rust is applying the constraints of the latter (and thus requiring "unsafe") to the former, which is not an unsafe operation.
No. Rust is only applying any constraints to reading from it.
The entire discussion is how to make an API which defines which portions of the memory can be read from because they have been written to, so we can expose an API that cannot be misused. There have never been any constraints on writing to uninitialized memory.
what does the term "nightly" mean in this context?
It is the version of rust that is under active development, and also contains experimental features that aren’t in the “stable” rust compiler.
Rust has “nightly” and “stable” toolchain variants called “release channels”. The “nightly” version is literally just the compiler and all the other tools built in the CI/CD pipeline on a nightly basis from the latest branch, including various features that need further development before they go to stable.
The process and reasoning are described here https://doc.rust-lang.org/book/appendix-07-nightly-rust.html
I'm not a Rust person (give me Lisp any day), but this stuck out to me:
> A motivated programmer can try adding necessary support to actively maintained packages [...] but what if one is stuck at an older version of the crate or deals with apparently abandoned crates
One could maybe do some programming? I mean, hell, if most of the work has already been done for you, then what's holding you back? Besides, why would you want to use outdated, bug-ridden libraries filled with vulnerabilities instead of something well-maintained?
The notion that a library must be “bug-ridden” and “filled with vulnerabilities” just because it hasn’t changed in a while is a strange one. If it were true, it would also be true for libraries that do change. It’s not like libraries accumulate new bugs and vulnerabilities by not changing. Libraries that constantly gain new features, on the other hand, are prone to also gain new bugs and vulnerabilities.
Agreed -
It's beyond wrong. For example, at the core of plenty of numerical libraries is 30+ year old fortran code from netlib that works great. It's just done.
It does what it's supposed to. It does not have meaningful bugs. There is no reason to change it.
Obviously, one can (and people do) rewrite it in another language to avoid having to have a fortran compiler around, or because they want to make some other tradeoff (usually performance).
But it's otherwise done. There is no need to make new releases of the code. It does what it is supposed to do.
That's what you want out of libraries past a certain point - not new features, you just want them to do what you want and not break you.
If the author intends for:
> apparently abandoned
To mean:
> hasn't changed in a while
Then I will agree that that doesn't need to mean it has to be laden with problems. I was going more on the basis of it meaning that there are a number of unresolved bug reports as well as a lack of activity. In the Common Lisp community there is generally agreement that software can be simply finished (although there is also an amount of broken abandonware).
My main point is if people are using libraries with problems that aren't getting fixed, they shouldn't be afraid to take over maintenance or to migrate to libraries which don't have those problems.
I like Donald Knuth's approach where his version numbers of TeX and Metafont asymptotically approach pi and e, respectively, emphasizing the slow approach toward perfection.
And communicating nothing to the end user except a sense of whimsy? Nah, that's useless.
What do the version numbers of programs communicate to end users anyway? Basically just that it's newer than the last version.
Libraries are a bit different though, because of semver compatibility, but unfortunately they are wrong often enough that you can't really rely on them anyway.
> It’s not like libraries accumulate new bugs and vulnerabilities by not changing.
That's not actually true for libraries that interact with the platform. For example, macOS changed some type definitions in its header files during the ARM transition, which makes it impossible to build, without any changes, a pre-M1 Rust project on an ARM host if it depends on a contemporary library (like SDL) that interfaces with the OS. You need to update your deps (and potentially the way you consume them) just to be able to build the project, or procure a host supported by your dependency (one that existed when it was written, so an x86 machine).
We could argue all day that this is "not a bug" or the user's fault for using a new host and it was never supported or any other deflection. But it is a concrete example of "this code hasn't changed and the passage of time shifted the ground from under it".
Agree, I think we need to speak of abandoned, versus maintained.
No activity doesn't mean abandoned, it can just mean super stable and few bugs to fix.
Abandoned is a big worry though.
I wonder if we need a canary file, but in the repo. An "I'm still here" file.
I still question that. It's generally as good as the day it came out. Discovered vulnerabilities are going to be in the language, probably not in the library itself. As long as it's not dependent on a language that's so old it's not getting security fixes, it should be fine.
Actually, a big exception is libraries that have external dependencies that change. A client library for an API, for example. Those can break quickly.
Hm, vulns are more common in libraries in general than in the compiler, I'd say. Or am I misunderstanding?
By "discovered vulnerabilities", I mean a security issue that wasn't known when the library was first written but then came to be known. This is what's fixed when a library is maintained.
This entirely depends on the library, but I just generally don't see a lot of security fixes in library updates. But for compiler updates, I do.
I'm speaking super broadly, and this will be very different for a Python graphing library versus a C networking library.
The vulnerability is almost never in the compiler (not never; I have seen a case, but it's very rare). Most attacks are on the library itself. If your library has a buffer overflow, you are vulnerable. If your library has a C-style API with separate buffer and length parameters and you mess them up, is it the library's fault for having such a bad API?
If a library is stable with no major bugs, then I don’t see much of a difference between abandoned and maintained.
The one thing I would look at instead is open issues.
Abandoned means no one around to deal with security issues, or impacting bugs.
Maintained means they are. There may be no new features though.
(these are my definitions, but I think there should be similar concepts in all dev's heads)
It changes nothing about the current version of the library though. It only impacts how quickly or easily one would be able to get a fix in case some issue does come up. But I’m questioning the expectation that such issues will come up, and the associated expectation that one will have to regularly update the library because new issues won’t ever stop coming up.
It's about perception. When I look at an old page on GitHub and there haven't been any code changes in two years, but it does look like the Issues page is addressed quickly, then I can trust the old code. On the other hand, if the Issues page isn't addressed, I can assume that the code will cause me problems in the future and I'll look for a different library.
There are endless examples of libraries being updated, due to bad code in the library. It's not a rare event, it's very, very common.
If a library is abandoned, and no one is around updating the code for vulnerabilities, it's trouble. That's because one day you can use it, and the next you cannot.
(Yes you can personally patch it, but that's an issue that's come up. And how will you know? Look at all the bugtrackers daily to see people yelling "HEY!"?)
Reading through, they're describing kind of "eras" of best practices. Like "hex" and "base64" might never change (so might never update), but those are kind of prime name tokens.
It made me think: `std::hex,base64` vs `boost::hex,base64`, but then assuredly those would be incompatible because $REASONS.
...but what if it was like: `v2025::std::hex,base64` vs `boost::v2025::std::hex,base64` (ie: explicitly stating you're adhering to the v2025 guidelines w.r.t. memory management or parameter naming or whatnot).
It's a roundabout way of saying: `tokio::async_foo`, `boost::async_bar`, `v2026::std::async::foo,bar`, where, as the marketplace of ideas settles on "better" ways of dealing with async (or whatever), there can eventually be compatibility between different object "modalities" (ways of working).
`std::hex,base64` should be quite stable, but having a path for `...::webapp::` and `...::database::` to eventually interop seems really useful for a language to encourage?