#1 2021-08-07 14:39:50

childerico
Member
From: Italy
Registered: 2015-11-18
Posts: 67

Investigating corrupted GRUB/EFI

Hi all,
this morning when I powered on my MSI Modern 15, I discovered that GRUB was apparently gone and I could not boot Arch. I easily fixed the issue by reinstalling GRUB from the installation medium.

Now I am quite worried, because I have no idea why this happened, so I cannot be sure it won't happen again.
Yesterday I updated the system as usual, including the kernel. I don't remember changing any boot-related configuration or doing anything relevant at system level.

Any suggestions for investigating this issue? Or should I consider this something that "may happen"?

Thanks

EDIT: It happened again, this time with no upgrades performed the previous day. I don't have Windows installed.


EDIT 2: We found out below that the issue is not actually related to GRUB, but to the EFI variables (hence, the boot configuration) somehow being lost.

EDIT 3: Eventually I figured out that the problem is caused by the SSD occasionally not being detected at boot. In turn, this caused the EFI configuration to be lost, which is what I noticed when rebooting (by then the SSD was detected again, and I found no boot configuration in place).

Last edited by childerico (2021-11-02 08:50:59)


#2 2021-08-07 14:45:34

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,791

Re: Investigating corrupted GRUB/EFI

Was Grub really gone?  Or was it unable to boot Arch Linux?   
What symptoms led you to assert that Grub was gone?


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way


#3 2021-08-07 15:18:49

childerico
Member
From: Italy
Registered: 2015-11-18
Posts: 67

Re: Investigating corrupted GRUB/EFI

At boot, I got an EFI shell instead of GRUB. At first I thought something was wrong with the SSD itself, but then I verified that I could mount and use it normally from a live distro.

In the UEFI settings, the SSD was correctly listed as one of the boot devices, but GRUB was not listed in the BBS Boot Priority settings (it only displayed "Windows Boot Manager").
After reinstalling, I can now see GRUB in the UEFI BBS settings.


#4 2021-08-07 15:47:16

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,791

Re: Investigating corrupted GRUB/EFI

Why do I have a feeling that Windows is the guilty party here?

Is Windows fastboot disabled?  http://wiki.archlinux.org/index.php/Dua … ibernation
Did you boot Windows? 
Did Windows update?


#5 2021-08-07 15:57:38

Slithery
Administrator
From: Norfolk, UK
Registered: 2013-12-01
Posts: 5,776

Re: Investigating corrupted GRUB/EFI

Did you update your motherboard firmware?


No, it didn't "fix" anything. It just shifted the brokeness one space to the right. - jasonwryan
Closing -- for deletion; Banning -- for muppetry. - jasonwryan

aur - dotfiles


#6 2021-08-07 16:23:42

childerico
Member
From: Italy
Registered: 2015-11-18
Posts: 67

Re: Investigating corrupted GRUB/EFI

@Slithery: no, I haven't updated the firmware.

@ewaller: Actually, Windows has never been installed on this laptop, which I received without any OS. So, it was a bit surprising even for me to see "Windows Boot Manager" as a boot entry.
I have a file "EFI/Microsoft/Boot/BootMGFW.efi" that explains why that entry is shown, but to be honest I have no idea where it comes from. It's unlikely, but maybe I copied stuff from the EFI directory of my old laptop when I first installed Arch.

Edit: indeed, when GRUB was not detected, Windows Boot Manager was the only available boot entry, but it was not able to boot anything and I got an EFI shell.

Last edited by childerico (2021-08-07 16:25:01)


#7 2021-08-07 17:02:38

Maniaxx
Member
Registered: 2014-05-14
Posts: 738

Re: Investigating corrupted GRUB/EFI

childerico wrote:

I have a file "EFI/Microsoft/Boot/BootMGFW.efi" that justifies showing that entry

Firing up this file could do "something". Windows is known to delete GRUB EFI entries now and then on some systems.
Another possibility could be a dying CMOS battery.

You could try adding a 'fallback' entry, 'EFI/Boot/BOOTX64.EFI', that might survive if this happens again. Just copy the existing GRUB EFI file there or use 'grub-install --removable' as described in the link below. The fallback EFI file might be run if you select the main HDD entry in the BIOS.
https://wiki.archlinux.org/title/GRUB#D … _boot_path
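A minimal Python sketch of the "just copy the existing GRUB EFI file there" option. It assumes the ESP is mounted at /efi and that GRUB lives at EFI/GRUB/grubx64.efi (the layout mentioned later in this thread); the helper name `install_fallback` is hypothetical, and `grub-install --removable` remains the documented way to do this.

```python
import shutil
from pathlib import Path


def install_fallback(esp: Path, grub_rel: str = "EFI/GRUB/grubx64.efi") -> Path:
    """Copy the existing GRUB EFI binary to the UEFI fallback path.

    Hypothetical helper: `esp` is the ESP mount point (assumed /efi here)
    and `grub_rel` is GRUB's path relative to it.
    """
    src = esp / grub_rel
    if not src.is_file():
        raise FileNotFoundError(f"GRUB EFI binary not found: {src}")
    # Default (removable media) path that firmware may try when no
    # NVRAM boot entry points at the disk.
    dst = esp / "EFI" / "BOOT" / "BOOTX64.EFI"
    dst.parent.mkdir(parents=True, exist_ok=True)
    # copyfile copies contents only; FAT cannot store POSIX permission bits anyway.
    shutil.copyfile(src, dst)
    return dst
```

Run it as root, e.g. `install_fallback(Path("/efi"))`. FAT is case-insensitive, so EFI/BOOT and EFI/Boot name the same directory.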

Last edited by Maniaxx (2021-08-07 17:14:38)


sys2064


#8 2021-08-07 19:46:47

childerico
Member
From: Italy
Registered: 2015-11-18
Posts: 67

Re: Investigating corrupted GRUB/EFI

Thanks for the suggestion @Maniaxx


#9 2021-09-29 07:47:12

childerico
Member
From: Italy
Registered: 2015-11-18
Posts: 67

Re: Investigating corrupted GRUB/EFI

(I am not sure whether starting a new thread would be preferred)

After ~2 months I faced the same issue again. It happened this morning when I powered on my laptop, and yesterday as well. In both cases, the laptop had been shut down cleanly.
At startup, I got the UEFI firmware settings screen.
The hard disk was detected, but it apparently did not provide any valid boot entry. As suggested by @Maniaxx, I have two GRUB installations on the EFI partition (one installed with --removable), but neither of them was listed.

As usual, I fixed the issue using the Arch installation medium. I also verified that grubx64.efi existed in /efi/EFI/GRUB as expected, and reinstalling GRUB did not add any new files there.

I am wondering whether something happens to my EFI partition during the day, e.g., because of bad mount options. This is an excerpt from my fstab:

# /dev/nvme0n1p1 LABEL=EFI
UUID=2479-3C2F      	/efi      	vfat      	rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro	0 2

Thanks for any suggestion to fix this annoying issue.


#10 2021-09-29 08:54:32

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,739

Re: Investigating corrupted GRUB/EFI

If you aren't doing any active writes, that shouldn't matter. MSI has a general track record of bad UEFI implementations, but it "should" at least retain the entry you installed with "removable". However, you usually won't see that entry as GRUB in the firmware menu; instead, you'll want to boot "[UEFI] hard disk name" or similar.


#11 2021-09-29 10:32:49

Maniaxx
Member
Registered: 2014-05-14
Posts: 738

Re: Investigating corrupted GRUB/EFI

You could try removing the battery (for a minute) and trying again. Maybe the EC (embedded controller) can be reset this way and recover the NVRAM.

Before fixing the boot chain you could check/list the nvram entries with 'efibootmgr' first. Maybe this leads to some clues.


#12 2021-09-29 10:41:07

childerico
Member
From: Italy
Registered: 2015-11-18
Posts: 67

Re: Investigating corrupted GRUB/EFI

Thanks for your replies.

In the meantime, I have also noticed that a BIOS update is available for my laptop (when I first opened this thread it was not available, and I had not checked since). I applied it, although the release notes do not mention anything related to UEFI.


#13 2021-09-29 12:31:06

childerico
Member
From: Italy
Registered: 2015-11-18
Posts: 67

Re: Investigating corrupted GRUB/EFI

I used the lunch break to run a test and shut down the laptop. When I turned it back on, I again got the UEFI firmware settings screen.

Some notes:

1- I tried to use the fallback GRUB by simply selecting "Hard Disk" in the boot menu, but it didn't work. Maybe I did not install the removable GRUB correctly, although it looks fine (it is in /efi/EFI/BOOT/BOOTX64.EFI).
2- I ran "efibootmgr" before and after reinstalling GRUB from the installation medium. The first time, it only showed boot entries associated with the USB drive I was using; after reinstalling GRUB, the "grub" entry was added and selected as the default. Indeed, after rebooting I was able to boot Arch.


#14 2021-09-29 12:54:49

Maniaxx
Member
Registered: 2014-05-14
Posts: 738

Re: Investigating corrupted GRUB/EFI

So the NVRAM entries are gone. Did you check the CMOS battery?


#15 2021-09-29 13:15:33

childerico
Member
From: Italy
Registered: 2015-11-18
Posts: 67

Re: Investigating corrupted GRUB/EFI

@Maniaxx: Not yet. The laptop is only a few months old and it is in its warranty period. As far as I understand, replacing the CMOS battery requires the motherboard to be disassembled (at least for other MSI models) and I would prefer to avoid doing so. However, I don't see other options at the moment.


#16 2021-09-29 15:40:11

Maniaxx
Member
Registered: 2014-05-14
Posts: 738

Re: Investigating corrupted GRUB/EFI

The battery is probably OK anyway. I just didn't want to leave this unmentioned in case the CMOS battery is easily accessible.


#17 2021-09-29 16:22:27

childerico
Member
From: Italy
Registered: 2015-11-18
Posts: 67

Re: Investigating corrupted GRUB/EFI

I haven't noticed any other issue that may be related to the battery either (e.g., lost date and time). Checking further, I saw that the battery on my laptop should be easily accessible, so I could try replacing it.

I am also considering the idea of mounting efivars in read-only mode (https://wiki.archlinux.org/title/Unifie … ble_access) to prevent any modification at software level. Does it make sense in your opinion?
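To confirm which state efivars is actually in after such a change, one option is to read /proc/mounts, whose lines have the fields device, mount point, filesystem type, options, dump and pass. A minimal sketch, assuming the EFI variables are exposed through a mount of type `efivarfs` (the usual setup on Linux); the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def efivarfs_mode(mounts_text: str) -> Optional[str]:
    """Return "ro" or "rw" for the first efivarfs mount in /proc/mounts-style text.

    Returns None if no efivarfs mount is listed.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "efivarfs":
            options = fields[3].split(",")
            return "ro" if "ro" in options else "rw"
    return None


def current_efivarfs_mode() -> Optional[str]:
    """Check the running system (Linux only)."""
    return efivarfs_mode(Path("/proc/mounts").read_text())
```

Calling `current_efivarfs_mode()` should return "ro" once the read-only mount is in effect.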


#18 2021-09-29 17:46:47

Maniaxx
Member
Registered: 2014-05-14
Posts: 738

Re: Investigating corrupted GRUB/EFI

If you have a voltmeter, you could measure the battery voltage. Otherwise, I wouldn't spend too much time on this.

Disabling the efivars sounds promising. That's a very good idea in my opinion.

I can remember a Win10 system (i5-3xxx platform) that could corrupt the BIOS/nvram on-demand by enabling WOL (wake-on-lan) in the Windows driver settings. The system could be brought back by resetting the cmos but the primary culprit was a faulty network driver.

Last edited by Maniaxx (2021-09-29 17:50:27)


#19 2021-09-30 08:36:29

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,739

Re: Investigating corrupted GRUB/EFI

Regarding mounting efivars read-only: this will only help if the efivars are being actively modified on the Linux side, which shouldn't happen under normal circumstances. You can try, but I doubt it will be useful.


#20 2021-10-04 08:07:21

childerico
Member
From: Italy
Registered: 2015-11-18
Posts: 67

Re: Investigating corrupted GRUB/EFI

As was probably to be expected, mounting efivars read-only has not solved the issue. Booting went smoothly for 2-3 days after that, but it failed again yesterday and this morning.
Again, the system date and time as shown by the BIOS were not lost.

I don't really know what to do now.

1) I could still try replacing the CMOS battery. Related to this: my laptop is only a few months old and is in its warranty period. Will the warranty be invalidated if I replace the CMOS battery myself?

2) I read here (https://wiki.archlinux.org/title/EFISTUB) that "The UEFI Shell Specification 2.0 establishes that a script called startup.nsh at the root of the ESP partition will always be interpreted and can contain arbitrary instructions; among those you can set a bootloading line. Make sure you mount the ESP partition on /boot and create a startup.nsh script that contains a kernel bootloading line."

The problem with 2) is that my ESP does not coincide with /boot; it is mounted at /efi. As far as I understand, I could not follow those instructions, because I would need to reference a kernel image on a different partition.

Thanks for any advice.

EDIT: After posting, I realized that I could actually use 2) to launch GRUB instead of directly loading the kernel. This should be easier, right?
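A minimal sketch of that idea: write a startup.nsh at the ESP root that launches GRUB rather than the kernel, so the kernel living on another partition no longer matters. Assumptions: the ESP is mounted at /efi, GRUB is at EFI/GRUB/grubx64.efi, and the UEFI shell maps the ESP as FS0: (the mapping can differ per machine; the shell's `map` command lists it). The helper name is hypothetical.

```python
from pathlib import Path, PurePosixPath


def write_startup_nsh(esp: Path,
                      loader_rel: str = "EFI/GRUB/grubx64.efi",
                      volume: str = "FS0:") -> Path:
    """Write <esp>/startup.nsh so the UEFI shell launches `loader_rel`.

    Hypothetical helper; `volume` is the shell's name for the ESP (assumed FS0:).
    """
    parts = PurePosixPath(loader_rel.lstrip("/")).parts
    # UEFI shell paths use backslashes and are rooted at the selected volume.
    uefi_path = "\\" + "\\".join(parts)
    # First line switches to the ESP volume, second runs the EFI binary.
    script = f"{volume}\n{uefi_path}\n"
    target = esp / "startup.nsh"
    target.write_text(script, encoding="ascii")
    return target
```

For the assumed layout, `write_startup_nsh(Path("/efi"))` produces a two-line script: `FS0:` followed by `\EFI\GRUB\grubx64.efi`.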

Last edited by childerico (2021-10-04 08:44:05)


#21 2021-10-04 10:17:34

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,739

Re: Investigating corrupted GRUB/EFI

Regarding the realisation in your edit: indeed, that should work. But really, most of this shouldn't be necessary on a halfway-decent UEFI. Maybe you actually do want to update your firmware (I only skimmed the thread, so sorry if I missed this, but as far as I can see you only denied having done an update in #6, while I'm recommending that you actively do one).


#22 2021-10-06 12:34:31

childerico
Member
From: Italy
Registered: 2015-11-18
Posts: 67

Re: Investigating corrupted GRUB/EFI

Perhaps I found a very ugly workaround.

Inspired by this post (https://github.com/rhboot/efibootmgr/is … -290134093) I manually created a copy of grubx64.efi in EFI/Microsoft/Boot/BootMGFW.efi.

Today, after powering on the laptop, I again got the BIOS setup screen instead of GRUB. Unlike before, when I rebooted from there, GRUB was loaded!
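One caveat with this workaround, stated as an assumption: later `grub-install` runs do not touch the copy in EFI/Microsoft/Boot, so it can drift out of sync with the real grubx64.efi (and with the GRUB modules it loads). A minimal sketch that refreshes the copy only when the contents differ; the helper names are hypothetical and the paths assume the /efi layout used in this thread.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_copy(src: Path, dst: Path) -> bool:
    """Copy src to dst if dst is missing or differs. Returns True if it copied."""
    if dst.is_file() and sha256_of(dst) == sha256_of(src):
        return False  # already identical, nothing to do
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Contents only; FAT cannot store POSIX permission bits anyway.
    shutil.copyfile(src, dst)
    return True
```

After each GRUB reinstall, something like `sync_copy(Path("/efi/EFI/GRUB/grubx64.efi"), Path("/efi/EFI/Microsoft/Boot/BootMGFW.efi"))` (run as root) would keep the two files identical.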

Now, efibootmgr gives the following output:

BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001
Boot0001* Windows Boot Manager

I cannot explain why, but it seems that the entry associated with Windows Boot Manager is retained or re-created, while the GRUB entry (and, I assume, any other entry) is lost.
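To compare the firmware's boot entries before and after a failed boot (as done with efibootmgr in #13), here is a sketch that parses plain efibootmgr output like the one quoted in this post and checks whether a GRUB entry survived. It relies only on the text layout shown here; the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# "Boot0001* Windows Boot Manager": 4 hex digits, optional '*' (active), label.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


@dataclass
class BootConfig:
    current: Optional[str] = None
    order: List[str] = field(default_factory=list)
    entries: Dict[str, str] = field(default_factory=dict)  # number -> label


def parse_efibootmgr(text: str) -> BootConfig:
    """Parse plain `efibootmgr` output into a BootConfig."""
    cfg = BootConfig()
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("BootCurrent:"):
            cfg.current = line.split(":", 1)[1].strip()
        elif line.startswith("BootOrder:"):
            cfg.order = [n.strip() for n in line.split(":", 1)[1].split(",") if n.strip()]
        else:
            m = ENTRY_RE.match(line)
            if m:
                # The label may be followed by a tab and a device path (e.g. with -v);
                # keep only the label.
                cfg.entries[m.group(1).upper()] = m.group(3).split("\t")[0].strip()
    return cfg


def has_entry(cfg: BootConfig, needle: str) -> bool:
    """True if any boot entry label contains `needle` (case-insensitive)."""
    return any(needle.lower() in label.lower() for label in cfg.entries.values())
```

The live output can be fed in with `subprocess.run(["efibootmgr"], capture_output=True, text=True).stdout`; for the output quoted here, `has_entry(cfg, "grub")` is False.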


#23 2021-10-06 17:27:55

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,739

Re: Investigating corrupted GRUB/EFI

That sounds about right and really can only be chalked up to MSI having an absolutely abysmal UEFI implementation.
