Very interesting. I wonder if this is a result of some "Swiss cheese" effect due to constraints around UEFI and NVRAM themselves when updating EFI variables.
NVRAM must maintain atomicity of memory transactions across power failures; its whole purpose is to store data when you turn your computer off. As a result, when deleting an EFI variable you can't manage the individual bytes - you have to delete a whole entry (which can be rather large, based on the EFI specification and the code used for edk2, e.g. https://github.com/tianocore/edk2/blob/83a86f465ccb1a792f5c5...). Deleting these entries might become a problem when you start running up against memory constraints and which slots in memory are actually available; hence a possible fragmentation issue.
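To make the layout concrete, here is a rough sketch of the on-flash record, modeled from memory on edk2's VARIABLE_HEADER (the linked source is the authoritative layout, so treat the exact fields as illustrative):

    #include <stdint.h>

    /* Illustrative sketch of a stored variable record. The point is that the
     * header, the UTF-16 name, and the data form one contiguous record, and
     * "deleting" means invalidating the whole thing, not its individual bytes. */
    typedef struct {
        uint8_t bytes[16];        /* EFI_GUID of the vendor namespace            */
    } efi_guid_t;

    typedef struct {
        uint16_t   start_id;      /* marker for the start of a record            */
        uint8_t    state;         /* added / in-deleted-transition / deleted     */
        uint8_t    reserved;
        uint32_t   attributes;    /* NV, boot-service, runtime-access flags      */
        uint32_t   name_size;     /* bytes of the name that follows the header   */
        uint32_t   data_size;     /* bytes of the payload that follows the name  */
        efi_guid_t vendor_guid;
    } variable_header_t;
    /* Record size = sizeof(variable_header_t) + name_size + data_size, so a
     * large variable ties up a large contiguous chunk until it is reclaimed.    */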
Additionally, I appreciated how short and specific this blog post was. I enjoy this style of post, where someone encounters a problem and solves it.
There's also the fact that an NVRAM variable is never overwritten in place; the new value is written elsewhere, and the pointer is updated to use the address of the new value. This is probably mainly for wear leveling, but I guess it could also introduce fragmentation?
Just an observation from when I was debugging a board that self-destructed when booting a particular EFI file, so I had to dig into the flash contents to figure out why - but I think this particular code was straight from tianocore.
Probably for atomicity. It's likely only a pointer-sized block can be updated atomically, so in order to safely update a value that may be larger, you write it somewhere else and atomically update the pointer. That way you can only observe the old or the new value, and not some intermediate result if power was lost partway through writing the new value. The same techniques are used in journaling file systems.
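A minimal toy model of that pattern, just to illustrate (the names and the slot-based layout are made up, not the edk2 code; the one assumption is that the single state-byte write is atomic):

    #include <stdint.h>
    #include <string.h>

    #define SLOTS   8
    #define VAL_MAX 32

    enum rec_state { REC_FREE = 0xFF, REC_WRITING = 0x7F, REC_VALID = 0x3F, REC_DEAD = 0x00 };

    struct record {
        uint8_t state;
        uint8_t len;
        uint8_t data[VAL_MAX];
    };

    static struct record store[SLOTS];      /* stands in for the flash region   */

    void format_store(void)                 /* erased flash reads back as 0xFF  */
    {
        for (int i = 0; i < SLOTS; i++)
            store[i].state = REC_FREE;
    }

    /* Write the new copy into free space, commit it with one atomic state
     * write, then retire the old copy, which sits there as dead space until
     * garbage collection. A power cut leaves the old OR the new copy valid. */
    int update(struct record *old, const void *val, uint8_t len)
    {
        if (len > VAL_MAX)
            return -1;
        for (int i = 0; i < SLOTS; i++) {
            struct record *r = &store[i];
            if (r->state != REC_FREE)
                continue;                   /* only erased slots are writable   */
            r->state = REC_WRITING;         /* mark "write in progress"         */
            memcpy(r->data, val, len);
            r->len = len;
            r->state = REC_VALID;           /* the single atomic commit point   */
            if (old)
                old->state = REC_DEAD;      /* old copy is now dead space       */
            return i;
        }
        return -1;                          /* no free slot: time for GC/defrag */
    }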
True, I was trying to find the variable storage requirements in the UEFI specification but couldn't (is it Section 3? 8?), so I resorted to linking to the struct definition in the EFI shell package that the author used.
I imagine it's not fragmentation in the strictest sense. More than likely it's just the result of a bug - perhaps garbage collection wasn't being triggered and space wasn't getting freed up. It could also be that the author caused the problem themselves by how they were managing their NVRAM.
I believe it was in the mid-2000s that BIOSes started storing configuration data in the same SPI flash they occupied, and UEFI just continued that trend and expanded on it. That removing the CMOS battery no longer clears it automatically is both a good and bad thing, and another problem it's created is that flash has a limited number of write cycles - and every time the system is booted, and possibly even while running, writes occur.
> another problem it's created is that flash has a limited number of write cycles
SPI flash typically has somewhere between 10k and 100k write cycles. Without wear leveling, let's say a write is made on every boot and the machine is booted 3 times a day - you are still going to get a 9+ year life out of a flash chip rated for only 10k writes.
EDIT: Did a search for a random "replacement SPI flash MSI" and found an MX25L25673G being used. Using that as an example, it's rated for 100k write cycles; at 3 writes a day, that's like 91 years. Basically, flash memory is at the point where it's "good enough" not to worry about.
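(Worked out: 100,000 cycles ÷ (3 writes/day × 365 days/year) ≈ 91 years, and the 10k-cycle case above comes out to ≈ 9 years.)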
I found a point-of-sale mini PC with a very nice integrated touchscreen (Pipo X10). As I kept getting "unrecognized USB device" on a USB-serial adapter, I entered the BIOS and played with a few USB-related settings. Save & Exit, and I discovered I had just disabled ALL my USB ports, with no way to reset the BIOS and no other available port to plug an input device into.
So yeah, the CMOS battery not clearing the BIOS + my own stupidity cost me ~$150.
It's a bad thing. Nothing good about it.
Short, sweet, and to the point. Why isn't this a button in the BIOS, naturally hidden behind a few levels of menus? "Defrag your NVRAM". Other than power failures, what would the other failure modes be? I guess if the NVRAM has bad cells? Guess it's easier to just reset the whole BIOS whenever there's a problem?
So refreshing to read. Most people these days would make a 10+ minute video to promote this.
If something crashes in between the "wipe" and "restore" steps, or something goes wrong with the restore, your computer is now a brick.
A safer alternative could be to reset BIOS settings to factory defaults - that should reset NVRAM as well.
What's the risk of bricking if the machine dies during this? I guess that's somewhat unquantifiable, but I think the warning is apt: if you had a power loss in the middle, you are almost by definition writing to places which matter a LOT across boots.
On the plus side, you can probably get a copy of the state before wiping it, at least as a logical structure. But what kind of fallback boot path you end up on is very specific to what the machine likes to do.
Depends on the motherboard.
A lot of your "gaming" motherboards come with dual BIOS. In such cases, if you toast one, you flip a switch and use the other. In most cases with motherboards that have two BIOS chips, you can boot from the good BIOS, download the latest version of your BIOS from your motherboard manufacturer, flip the selector switch back to the bad BIOS while still booted, and reflash the bad BIOS from within your OS. You would most likely lose any custom EFI vars, but you could easily reconfigure them as needed.
If your motherboard doesn't have dual BIOS or a BIOS recovery system built in, all is not lost, but you're probably going to have to reflash the BIOS via an external programmer. Such tools are dirt cheap these days; heck, you could do it from a Raspberry Pi and an SOP8 test clip if you don't have any other computer to reflash the chip with.
If you want to be super safe, you could dump the contents of the BIOS flash chip using an external programmer before attempting any of this (same tools as you'd need to flash the chip: another computer, and if that other computer isn't an SBC with SPI GPIOs, a cheap USB SPI tool - you could use an RPi Pico for this for just a few bucks).
(Dumping the contents of the chip via the BIOS provider's / motherboard manufacturer's own BIOS flasher tool will often skip over parts of the chip - looking at you, Intel ME! If you're only touching EFI vars, a backup from the BIOS flashing tool should be good enough; just make sure you don't overwrite the areas the tool didn't back up. Which is why I suggested using an external tool to dump the contents - much safer.)
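For what it's worth, a dump from a Raspberry Pi with flashrom looks something like this (illustrative only; the spidev path, speed, and chip detection depend on your wiring and flash part):

    # Read the flash twice and compare, so you know the clip connection was stable
    flashrom -p linux_spi:dev=/dev/spidev0.0,spispeed=8000 -r dump1.bin
    flashrom -p linux_spi:dev=/dev/spidev0.0,spispeed=8000 -r dump2.bin
    cmp dump1.bin dump2.bin && echo "reads match"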
EDIT: As for the risk? Well, again it depends; you've gotta be pretty unlucky to kill the BIOS because of a power failure right as you're messing about with it, but it can happen. I went through a spell of "tinkering in the BIOS" for a while *cough*Windows SLIC*cough*, and only messed up the BIOS once during those years despite doing a fair bit of flashing back then, and I was able to recover it using an external programmer.
Dual BIOS doesn't mean the manufacturer was smart enough to implement two separate storage spaces for efivars.
Except in practice (and as pointed out multiple times here), UEFI and efivars are stored in the same chip, and this is straight-up true if it's an AMI-based system.
Performing the procedure while connected to a UPS will reduce that risk by orders of magnitude. Probably a very good idea!
It's a good warning to not do this unless you're having problems. I'm the type who would do this just to avoid potential future problems.
It’s quite a nerd hobby to go and defrag your bootloader NVRAM.
I had an ASUS Z97-K motherboard that used to do the same thing after I had been playing with the boot entries for a while. I never went as far as figuring out what the issue was. I theorised that it was a case of the motherboard not performing garbage collection of the deleted entries; it didn’t occur to me that it could have been a fragmentation issue. I always fixed it by resetting the BIOS…
I've found ASUS UEFI can hold onto useless boot entries through thick & thin sometimes.
Booting to the UEFI shell, Shellx64.efi, will automatically map the drives and bring you to the Shell command line.
Lots of motherboards did not actually have Shellx64.efi built in; in that case you would see a boot option in the firmware to boot to a UEFI shell found on a boot device (an internal or external drive).
Plus some of the built-in Shellx64.efi don't actually include the BCFG command, so you might need to use an external boot device containing a more complete Shellx64.efi anyway.
At the shell command line, bcfg boot dump -b is what you enter to list the boot entries, starting with entry 00.
bcfg boot rm 04 would then be the shell command to remove entry 04, for instance.
You don't need Windows or Linux for this since you're booting to the UEFI shell to access the firmware directly.
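Put together, a cleanup session looks something like this (the entry number, the fs0: mapping, and the path in the add line are only examples; substitute whatever dump shows on your machine):

    Shell> bcfg boot dump -b
    Shell> bcfg boot rm 04
    Shell> bcfg boot add 0 fs0:\EFI\Microsoft\Boot\bootmgfw.efi "Windows Boot Manager"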
This doesn't do anything to the boot entries in the EFI folder itself; on Windows, the BCD will likely have an entry for every one it has been exposed to, and they can come right back into the firmware unless you also delete them from the BCD.
Remember, a properly crafted EFI folder will ideally boot as expected when there aren't any boot entries in the firmware at all. Then the ideal firmware will auto-include the entries found on the disk, and you might not even need an EFI folder after that. But things are not usually ideal. Not every OS gets this right the first time, and it can change for the better or worse after a while.
Alternatively, if you are on Windows, at the admin command prompt, BCDEDIT /enum All will display all entries, with the firmware ones towards the top. Then you can simultaneously delete an unwanted entry from both the EFI folder and the firmware with Bcdedit /delete {target-guid-here}, or in PowerShell Bcdedit /delete "{target-guid-here}", since PowerShell is still having trouble with the curly braces.
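For example (the GUID here is a placeholder; use whichever identifier the enum output shows for the entry you want gone):

    C:\> bcdedit /enum all
    C:\> bcdedit /delete {01234567-89ab-cdef-0123-456789abcdef}

    PS C:\> bcdedit /delete "{01234567-89ab-cdef-0123-456789abcdef}"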