Hi,
I have been using Arch on my laptop for many years without any major issues, but yesterday, a few days after my last update, my PC stopped booting.
After my last update (pacman -Syu) the PC rebooted properly, but it no longer does. I didn't make any changes to the bootloader (GRUB). My disk is encrypted with LUKS.
Here is what I am observing: GRUB is extremely slow. It can sit for a few minutes on the "GRUB loading. Welcome to GRUB!" message before continuing the boot process, and then various errors show up. Here are the errors I got over multiple reboots:
1)
Booting 'Arch Linux'
Loading Linux linux
Loading initial ramdisk
XZ-compressed data is corrupt
-- System halted
2)
GRUB loading.
Welcome to GRUB!
error: invalid extend.
Entering rescue mode...
grub rescue>
3)
GRUB loading.
Welcome to GRUB!
error: reloc offset is out of the segment.
Entering rescue mode...
grub rescue>
4)
GRUB loading.
Welcome to GRUB!
error: file '/grub/i386-pc/normal.mod' not found
Entering rescue mode...
grub rescue>
5)
Booting 'Arch Linux'
error: invalid arch-independent ELF magic.
Loading Linux linux
Loading initial ramdisk
Press any key to continue...
XZ-compressed data is corrupt
-- System halted
6)
Booting 'Arch Linux'
error: file '/grub/i386-pc/test.mod' not found
[...]
error: file '/grub/i386-pc/linux.mod' not found
Press any key to continue...
Failed to boot both default and fallback entries
Press any key to continue...
7)
Booting 'Arch Linux'
error: symbol
'grub? <GARBAGE DATA>' not found
Loading Linux linux
error: out of memory
Loading initial ramdisk
unaligned pointer 0x8
Aborted. Press any key to exit.
Also, I am able to boot successfully from a live USB and mount my LUKS partition.
My PC is around 6 years old, and I am wondering whether this could be caused by a hardware problem with the hard drive (corrupted blocks?).
If you know of good tools for identifying hard drive problems, please let me know.
Do you have any idea what the problem might be? Any help is welcome.
Thank you
Last edited by steeve85 (2018-12-21 18:45:43)
Offline
From what you have described, this could just be a loose cable, or it could be that your boot device is dying.
From the liveCD, check the state of the boot device with smartctl. https://wiki.archlinux.org/index.php/S.M.A.R.T.
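For reference, a minimal sketch of the smartctl calls that check could involve. The function names are made up for illustration and /dev/sda is only an assumed device name (check yours with lsblk); run these as root from the live environment.

```shell
# Sketch only: helper names are hypothetical, the smartctl flags are standard.

# Print the overall health verdict (-H) and the vendor attribute table (-A).
smart_overview() {
    smartctl -H -A "$1"
}

# Start a short self-test; it runs on the drive itself and finishes later.
smart_short_test() {
    smartctl -t short "$1"
}

# Show the self-test log once the test has had time to complete.
smart_test_log() {
    smartctl -l selftest "$1"
}

# Example usage (assumed device name):
#   smart_overview /dev/sda
#   smart_short_test /dev/sda
#   smart_test_log /dev/sda
```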
Offline
Your observation seems accurate. Here are the smartctl logs: https://0bin.net/paste/ohlHtWvAsFzsf7kP … 3wiHwI1SlV
After each reboot, the "UDMA_CRC_Error_Count" value increases by 5 to 8 according to my tests, so more than 30% of these errors must be recent, as I have rebooted my laptop many times in the last few days...
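UDMA_CRC_Error_Count generally counts transfer errors on the link between the drive and the host, which is why a growing value tends to point at the cable or connector rather than the platters. A rough way to track the growth across reboots, assuming the usual ten-column `smartctl -A` table where the raw value is the last field (the function and file names below are hypothetical):

```shell
# Read `smartctl -A` output on stdin and print the raw UDMA_CRC_Error_Count.
# Assumes the standard attribute table, where RAW_VALUE is column 10.
crc_error_count() {
    awk '$2 == "UDMA_CRC_Error_Count" { print $10 }'
}

# Print how much the counter grew between two saved readings.
crc_error_delta() {
    before=$(crc_error_count < "$1")
    after=$(crc_error_count < "$2")
    echo $((after - before))
}

# Example usage (file and device names are placeholders):
#   smartctl -A /dev/sda > before.txt
#   ... reboot ...
#   smartctl -A /dev/sda > after.txt
#   crc_error_delta before.txt after.txt
```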
I will open my laptop to see if I can find and fix a loose cable. If not, that might be the end of life for my dear disk/laptop.
But what I don't understand is why I can access the disk from a LiveUSB while my system refuses to boot from the disk...
Last edited by steeve85 (2018-12-20 08:20:22)
Offline
I recently ran into a similar problem, so what happened to me may be of interest to you.
Whilst editing my window manager configuration I managed to lock up the computer. I had to kill the power to reboot the machine. In my case GRUB was running into the same error, unable to load a specific module. I booted into the LiveUSB and I couldn't open the folder concerned. I was running into a file system error.
I then used dd to back up the entire laptop hard drive onto my external 1TB hard drive.
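Something along these lines should work for the backup, assuming GNU dd. The function name and both paths are placeholders, and the target must be a file on a different disk with enough free space.

```shell
# Copy a whole disk (or partition) into an image file.
# conv=noerror,sync keeps going past unreadable blocks and pads them with
# zeros, so offsets in the image stay aligned with the source. A final
# partial block is padded the same way.
image_disk() {
    dd if="$1" of="$2" bs=1M conv=noerror,sync status=progress
}

# Example usage (placeholder paths):
#   image_disk /dev/sdX /mnt/external/laptop-disk.img
```

On a disk that is throwing read errors, a dedicated recovery tool such as ddrescue may cope better than plain dd.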
By checking journalctl I saw a recommendation to use fsck on the boot partition. I did this and said yes to every suggestion fsck made. I then had to delete every GRUB related folder I could find in the boot directory, EFI images and all, and then reinstall GRUB and reconfigure. My machine now boots as it did before.
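Roughly, the repair steps look like the sketch below. This assumes a UEFI setup like mine; the function names, the partition name and the EFI directory are placeholders, not the exact commands I typed. The partition must be unmounted for fsck, and the GRUB steps are meant to be run from a chroot of the installed system.

```shell
# Check and repair an unmounted partition, answering "yes" to every prompt.
repair_boot_fs() {
    fsck -y "$1"
}

# Reinstall GRUB for UEFI and regenerate its config.
# $1 is the mount point of the EFI system partition inside the chroot.
# On a BIOS system the equivalent would be: grub-install --target=i386-pc <disk>
reinstall_grub_efi() {
    grub-install --target=x86_64-efi --efi-directory="$1" --bootloader-id=GRUB
    grub-mkconfig -o /boot/grub/grub.cfg
}

# Example usage (placeholder names):
#   repair_boot_fs /dev/sdX1
#   reinstall_grub_efi /boot
```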
I should point out the above steps took me around half a day from beginning to end.
In my case I was running into the same consistent error, so I imagine the file system must have been corrupted when I powered down the machine. Given the errors you are getting, I suggest you dd your partition onto an external hard drive and then replace your hardware. The different error messages sound like hard drive failure to me.
steeve85 wrote:
"But what I don't understand, is why I can access the disk from a LiveUSB while my system refuse to boot on the disk? ..."
Perhaps the corruption only affects the earliest sectors on your disk. My boot partition is only 260 MB and it was corrupted, but the rest of the hard disk (360 GB) had no errors at all. In my case, at least, the corruption was confined to the beginning of the disk.
Offline
Being able to mount a disk does not imply being able to read it consistently. Since there are no reallocations, no pending sectors and no offline uncorrectables, and the disk has passed 2 self-tests (short + extended), it is indeed most likely the cable, or the controller.
You might want to try the disk attached to a different system.
When was the gravitational incident?
Offline
I did not drop it, but I took several flights the day before it started showing these errors. Some shocks going through TSA might have damaged some old components...
Offline
The issue was coming from the connector. I was able to boot from USB and then boot when the disk was reconnected inside the laptop.
Thanks for the help
Offline