Hi everybody,
My server crashed in a strange way and will no longer boot from the SSD.
Before I realized something was wrong, the server kept working and Plex still ran, but when I logged in over SSH I got a strange prompt and the machine complained about several things:
- "... stale file handle ..."
- no commands worked; "ls" returned "command not found" or something similar
- I had to log in over SSH with my password; the key I usually use for passwordless login didn't work
- it complained about dm-crypt, saying it was now read-only.
Unfortunately I have to write this from memory, as I haven't been able to start the server since I shut it down. I had to hold down the power button to actually power it off.
When I tried to boot it again I got the blinking message "missing OS", and it hasn't booted since.
I booted an Arch image from a USB stick to inspect the SSD that holds the OS. I can mount the EFI partition and the boot partition, decrypt the LUKS (dm-crypt) partition and mount the btrfs root subvolume. But the computer will not boot from the SSD.
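For anyone in the same situation, here is a minimal sketch of those mount steps from the live USB. It assumes the SSD is /dev/sda and that the root and /var subvolumes are named root and var (all hypothetical; check yours with lsblk and btrfs subvolume list). /var is mounted as well because pacman keeps its database under /var/lib/pacman, which matters once you chroot in. Since breaking things is the main worry here, the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Stop at the first failing command.
set -e
# Sketch: mount the installed (encrypted) system from a live USB.
# DISK, ROOT_SUBVOL and VAR_SUBVOL are hypothetical; adjust them to your layout.
DISK=${DISK:-/dev/sda}
ROOT_SUBVOL=${ROOT_SUBVOL:-root}
VAR_SUBVOL=${VAR_SUBVOL:-var}

# Print each command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

mount_installed_system() {
    # Unlock the LUKS container on partition 3; cryptsetup prompts for the passphrase.
    run cryptsetup open "${DISK}3" cryptroot
    # Mount the root subvolume, then the separate /var subvolume on top of it.
    run mount -o subvol="$ROOT_SUBVOL" /dev/mapper/cryptroot /mnt
    run mount -o subvol="$VAR_SUBVOL" /dev/mapper/cryptroot /mnt/var
    # The ext2 /boot partition.
    run mount "${DISK}2" /mnt/boot
}

mount_installed_system
```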
My setup is as follows: Arch installed on a computer with an Asus P9D WS motherboard, intended to work as a server, with a GPT-partitioned SSD containing three partitions:
1. An EFI partition, since I originally booted the Arch image in EFI mode.
2. A boot partition formatted as ext2, since I have a fully encrypted system, with GRUB as my bootloader.
3. A third partition encrypted with dm-crypt/LUKS, containing a btrfs filesystem with separate subvolumes for /root, /home, /var and /var/log.
The server's workload was as follows: it ran zfsonlinux on some other drives, served NFS, and ran Docker with a Plex server.
Now, while trying to find out why my motherboard won't boot from the SSD anymore, I booted an Arch image from a USB drive, and it now boots in BIOS mode instead of UEFI mode as before! I also tried an Ubuntu image, and that booted in BIOS mode too. I cannot for the life of me find anything in the motherboard's firmware setup (the UEFI/BIOS setup interface, to be precise, just to avoid misunderstandings) that toggles UEFI mode on or off. And why that should have changed while the computer was running normally is beyond me.
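A quick way to confirm which mode a live USB actually booted in: the Linux kernel creates /sys/firmware/efi only when it was started by UEFI firmware, so its absence means a legacy BIOS (CSM) boot. A small check (the function name is just for illustration):

```shell
#!/bin/sh
# Report whether the running system was booted via UEFI or legacy BIOS/CSM.
# The kernel creates /sys/firmware/efi only when started by UEFI firmware.
boot_mode() {
    if [ -d /sys/firmware/efi ]; then
        echo UEFI
    else
        echo BIOS
    fi
}

boot_mode
```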
Now, I don't really care about BIOS or UEFI; the machine will only boot once a month or even less often. So I have one possible solution: I can just switch my SSD to BIOS booting instead of UEFI.
But I don't know whether I will break my system by changing from UEFI to BIOS booting. How would I actually do this without changing anything on my btrfs subvolumes /root, /home and so on?
I should also mention that when I booted the Arch image from USB, I checked "journalctl -xe" but found no errors or warnings.
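Note that journalctl -xe on the live USB shows the live system's own journal, not the server's. A sketch for reading the server's journal instead, assuming it was persistent and that the server's /var/log subvolume is mounted at /mnt/var/log (hypothetical path). If the filesystem had already gone read-only, the crash itself may never have reached the disk. The command is printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: read the *installed* system's journal from a live USB.
# JOURNAL_DIR is hypothetical; it assumes /var/log is mounted at /mnt/var/log.
JOURNAL_DIR=${JOURNAL_DIR:-/mnt/var/log/journal}

# Print each command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

show_server_warnings() {
    # -D reads journal files from a directory instead of the running system's journal;
    # -p warning keeps messages of priority "warning" and more severe;
    # -n 200 shows only the last 200 matching entries.
    run journalctl -D "$JOURNAL_DIR" -p warning -n 200 --no-pager
}

show_server_warnings
```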
Should I then follow the Arch wiki installation guide, mount all my partitions, chroot in and reinstall GRUB? Then use gdisk to set the "legacy BIOS bootable" attribute on partition 2? Will that be enough?
By the way, I have tried making partitions 1 and 2 "legacy bootable" separately, but to no avail.
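To check what gdisk actually stored, sgdisk (the scriptable sibling of gdisk) can print each partition's details, including its "Attribute flags"; the legacy BIOS bootable attribute is bit 2, which appears as 0000000000000004. As far as I know, that attribute is read by MBR boot code such as syslinux's gptmbr.bin rather than by GRUB's BIOS boot code, which may be why setting it alone changed nothing. A sketch assuming the SSD is /dev/sda (hypothetical); it prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: show the GPT details, including "Attribute flags", for partitions 1 and 2.
# DISK is hypothetical; adjust it to your SSD.
DISK=${DISK:-/dev/sda}

# Print each command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

show_attributes() {
    for n in 1 2; do
        # --info=N prints partition N's type, GUID, size, attribute flags and name.
        run sgdisk --info="$n" "$DISK"
    done
}

show_attributes
```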
Do I have to destroy partition 1 (EFI), merge partitions 1 and 2 with gdisk, and then mount the system, chroot in and reinstall GRUB?
Or would I actually have to run pacstrap and more or less rebuild the entire system, hopefully without destroying anything? I will of course snapshot my btrfs subvolumes before doing anything like this.
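The snapshot step could look like this sketch. It assumes the LUKS container is already open as /dev/mapper/cryptroot and that the subvolumes (hypothetically named root, home and var) sit at the top level of the btrfs filesystem. Snapshots do not descend into nested subvolumes, so each one has to be listed, and since they live on the same SSD they protect against mistakes, not against the disk failing. Commands are printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Stop at the first failing command.
set -e
# Sketch: read-only snapshots of the top-level subvolumes before touching the bootloader.
# SUBVOLS and TOP are hypothetical; adjust them to your layout.
SUBVOLS=${SUBVOLS:-"root home var"}
TOP=${TOP:-/tmp/btrfs-top}
STAMP=${STAMP:-$(date +%Y%m%d)}

# Print each command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

snapshot_all() {
    run mkdir -p "$TOP"
    # subvolid=5 mounts the top-level subvolume, so every subvolume is visible under $TOP.
    run mount -o subvolid=5 /dev/mapper/cryptroot "$TOP"
    for sv in $SUBVOLS; do
        # -r makes the snapshot read-only, so nothing modifies it by accident.
        run btrfs subvolume snapshot -r "$TOP/$sv" "$TOP/$sv-before-boot-fix-$STAMP"
    done
}

snapshot_all
```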
Have any of you had this strange problem before? Do you have any good suggestions for me?
The reason I'm asking for help is that I don't want to inadvertently break my system, and I suspect there is a really easy solution that I don't see but that some of the knowledgeable Archers will.
Last edited by archphys (2015-10-12 17:02:43)
So if, as you say, you cannot boot in EFI mode, forget about the EFI partitions (FYI, you only ever need one EFI partition) and reinstall GRUB to /boot:
sudo pacman -S grub
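Note that this only reinstalls the GRUB package; grub-install and grub-mkconfig are what actually write GRUB to the disk and to /boot. The kernel image and initramfs in /boot come from the linux package, so "sudo pacman -S linux" is what reinstalls them (and regenerates the initramfs).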
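For reference, a sketch of a BIOS-mode GRUB reinstall from the live USB, assuming the installed system (including /boot) is already mounted at /mnt and the SSD is /dev/sda (hypothetical). One caveat: on a GPT disk, GRUB's i386-pc target needs a small BIOS boot partition (type EF02) to hold its core image, and the three-partition layout described in the first post has none, so grub-install will likely refuse until one is added. Commands are printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Stop at the first failing command.
set -e
# Sketch: reinstall GRUB for legacy BIOS booting from a live USB.
# Assumes the installed system is mounted at /mnt; DISK is hypothetical.
DISK=${DISK:-/dev/sda}

# Print each command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

reinstall_grub_bios() {
    # Write GRUB's BIOS boot code to the disk itself (not to a partition).
    run arch-chroot /mnt grub-install --target=i386-pc "$DISK"
    # Regenerate the menu from /etc/default/grub, where kernel parameters
    # for the encrypted root (such as cryptdevice=) are kept.
    run arch-chroot /mnt grub-mkconfig -o /boot/grub/grub.cfg
}

reinstall_grub_bios
```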
Thanks for the reply. Your suggestion to reinstall GRUB didn't work, but probably only because there was a filesystem error on my EFI partition. I ran fsck on it, and it found an error.
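For anyone landing here with the same symptoms: the EFI system partition is FAT, so the checker is fsck.fat from dosfstools. A sketch assuming the ESP is /dev/sda1 (hypothetical) and is not mounted: check first without writing anything, and only repair if the check reports problems. Commands are printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: check, then repair, the FAT-formatted EFI system partition.
# EFI_PART is hypothetical; the partition must be unmounted.
EFI_PART=${EFI_PART:-/dev/sda1}

# Print each command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

check_esp() {
    # -n: check only, never write to the filesystem.
    run fsck.fat -n "$EFI_PART"
}

repair_esp() {
    # -a: repair automatically without asking.
    run fsck.fat -a "$EFI_PART"
}

check_esp
```

If check_esp reports errors, call repair_esp next.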
To get my server working, I ignored the EFI partition, flagged the boot partition as legacy BIOS bootable in gdisk and installed syslinux, booting in BIOS mode instead of UEFI.
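The syslinux route can be sketched like this, assuming the installed system (root, /var and /boot) is already mounted at /mnt. syslinux-install_update is Arch's helper script from the syslinux package: -i installs syslinux to /boot/syslinux, -a sets the legacy BIOS bootable attribute on the boot partition, and -m writes syslinux's MBR boot code. gptfdisk is installed too, on the assumption that the helper needs sgdisk on GPT disks. The kernel parameters for the encrypted btrfs root (cryptdevice=, root=, rootflags=subvol=) then go on the APPEND line of /boot/syslinux/syslinux.cfg. Commands are printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Stop at the first failing command.
set -e
# Sketch: install syslinux for legacy BIOS booting, assuming the system is mounted at /mnt.

# Print each command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

install_syslinux() {
    # syslinux provides the helper; gptfdisk (sgdisk) is assumed to be needed for GPT.
    run arch-chroot /mnt pacman -S --needed --noconfirm syslinux gptfdisk
    # -i install to /boot/syslinux, -a set the legacy bootable attribute, -m write the MBR code.
    run arch-chroot /mnt syslinux-install_update -i -a -m
}

install_syslinux
```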
In summary, I believe a filesystem error was the cause.
Keep in mind that when you boot in legacy BIOS mode (CSM, Compatibility Support Module), the UEFI firmware no longer exposes itself to the OS: it emulates a legacy BIOS instead, so Linux cannot use UEFI services such as EFI variables. After checking and repairing your disk, reinstall the GRUB EFI file on the EFI partition.
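A sketch of that EFI reinstall, assuming the installed system is mounted at /mnt, the ESP is /dev/sda1 and is meant to be mounted at /boot/efi (all hypothetical). It has to run from a live USB that was itself booted in UEFI mode (i.e. /sys/firmware/efi exists), because grub-install registers the boot entry in the firmware's NVRAM through efibootmgr, which must be installed in the system. Commands are printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Stop at the first failing command.
set -e
# Sketch: reinstall GRUB's EFI image on the EFI system partition.
# DISK and ESP_DIR are hypothetical; /mnt/boot must already be mounted.
DISK=${DISK:-/dev/sda}
ESP_DIR=${ESP_DIR:-/boot/efi}

# Print each command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

reinstall_grub_efi() {
    # Mount the ESP where the installed system expects it.
    run mount "${DISK}1" "/mnt$ESP_DIR"
    # Install the x86_64 EFI image into the ESP and create an NVRAM boot entry.
    run arch-chroot /mnt grub-install --target=x86_64-efi --efi-directory="$ESP_DIR" --bootloader-id=grub
    # Regenerate the GRUB menu.
    run arch-chroot /mnt grub-mkconfig -o /boot/grub/grub.cfg
}

reinstall_grub_efi
```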
Just a tip.