#1 2024-01-07 12:16:20

qi437103
Member
Registered: 2014-09-21
Posts: 3

[SOLVED] Btrfs error preventing boot

Hi, I am facing a weird btrfs error on my server's root partition. I would appreciate any help debugging it.

Hardware: Samsung Evo 980 NVMe SSD. I recently moved it from one M.2 slot to another and have been seeing the error ever since.

Partition layout: this is the only storage device in the system. A single btrfs partition takes up the whole drive apart from the EFI system partition and swap. The partition is mounted automatically by systemd at startup via systemd-gpt-auto-generator.

Boot setup: a unified kernel image (UKI) built with dracut. The boot loader is systemd-boot.

The error: during boot, the kernel log is flooded with messages like:

BTRFS error (device nvme0n1p2: state EA): dev  /dev/nvme0n1p2 errs: wr 147, rd xxx, flush 0, corrupt 0, gen 0

The rd (read error) counter keeps growing, while the wr (write error) counter stays fixed.
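To see at a glance which counter is moving between two of these messages, the two lines can be compared counter by counter. Below is a minimal bash sketch. The helper names btrfs_counters and grown_counters are hypothetical, and the sketch assumes each message ends in the comma-separated "errs: wr N, rd N, flush N, corrupt N, gen N" list shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare two "BTRFS error ... errs: ..." log lines.
# Assumes the line ends with "errs: wr N, rd N, flush N, corrupt N, gen N".

# Print the counters of one log line as "name value" pairs, one per line.
btrfs_counters() {
  local tail=${1##*"errs: "}   # drop everything up to and including "errs: "
  tr ',' '\n' <<<"$tail" | sed 's/^ *//'
}

# Given an earlier and a later log line, print the names of counters that grew.
grown_counters() {
  # paste joins matching lines: "name old name new"; awk compares old vs new.
  paste -d' ' <(btrfs_counters "$1") <(btrfs_counters "$2") |
    awk '$4 > $2 { print $1 }'
}
```

Fed an earlier and a later message in which only the read counter has increased, grown_counters prints just "rd". The same five counters are also kept persistently per device and can be read with "btrfs device stats /" (as write_io_errs, read_io_errs, flush_io_errs, corruption_errs and generation_errs), which avoids scraping the kernel log.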

This makes the system unbootable, apparently because the root partition can't be read.

The strange thing is that if I boot into a live CD, the partition mounts perfectly (and btrfs check reports no errors). I even chrooted into it to do a full upgrade. What puzzles me is what difference between the host system and the live CD could cause this.

Any pointers would be really helpful, thanks!

Last edited by qi437103 (2024-01-14 21:46:18)

#2 2024-01-14 15:56:35

ua4000
Member
Registered: 2015-10-14
Posts: 421

Re: [SOLVED] Btrfs error preventing boot

Different kernel versions between the live CD and the installed system?

And why did you swap M.2 slots? Can you swap back, at least for testing?

#3 2024-01-14 17:18:43

-thc
Member
Registered: 2017-03-15
Posts: 502

Re: [SOLVED] Btrfs error preventing boot

If you inserted a second M.2 SSD into the old slot, the numbering may have changed:

/dev/nvme0n1 -> /dev/nvme1n1

You can boot into an Arch live install medium and use the nvme-cli tool:

nvme list
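On the numbering point: the /dev/nvmeXnY names follow controller enumeration order, while the symlinks under /dev/disk/by-partuuid/ are keyed on the partition's PARTUUID, so they keep pointing at the right partition after a slot change. Here is a small bash sketch that shows which device node a PARTUUID currently resolves to. The name partuuid_to_dev is a hypothetical helper, and its optional second argument (default /dev) exists only so it can be tried against a fake directory tree.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the device node a PARTUUID symlink resolves to.
# $1 = PARTUUID, $2 = optional device root (defaults to /dev; for testing).
partuuid_to_dev() {
  local uuid=$1 devroot=${2:-/dev}
  local link="$devroot/disk/by-partuuid/$uuid"
  if [[ -L $link ]]; then
    # Follow the udev-managed symlink to the current kernel device name.
    readlink -f -- "$link"
  else
    echo "no such PARTUUID: $uuid" >&2
    return 1
  fi
}
```

For example, "lsblk -o NAME,PARTUUID" lists the PARTUUIDs, and "partuuid_to_dev <partuuid>" then prints the node that PARTUUID maps to right now, whichever slot the drive sits in.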

#4 2024-01-14 21:44:37

qi437103
Member
Registered: 2014-09-21
Posts: 3

Re: [SOLVED] Btrfs error preventing boot

I finally solved this by removing a bunch of kernel command-line parameters I had added earlier:

pcie_aspm=off nvme_core.default_ps_max_latency_us=1500 hugepages=16 default_hugepagesz=1G hugepagesz=1G isolcpus=0,1,2,3,4,5,6,7,8,9,10,11
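Because a UKI normally carries its own embedded command line, it is worth confirming after rebuilding it that these parameters are really gone from the running kernel. Below is a minimal bash sketch. The name find_params is a hypothetical helper, and it splits on whitespace only, so it assumes no quoted parameter values containing spaces.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given parameter names appear on a
# kernel command line (normally the contents of /proc/cmdline).
# Prints each matching "key" or "key=value" token, one per line.
find_params() {
  local cmdline=$1
  shift
  local -a tokens
  local key tok
  # read -ra splits on whitespace into an array without glob expansion.
  read -ra tokens <<<"$cmdline"
  for key in "$@"; do
    for tok in "${tokens[@]}"; do
      # Match "key" exactly or "key=..." (so hugepages does not match hugepagesz).
      if [[ $tok == "$key" || $tok == "$key="* ]]; then
        echo "$tok"
      fi
    done
  done
}
```

Usage: find_params "$(< /proc/cmdline)" pcie_aspm nvme_core.default_ps_max_latency_us hugepages isolcpus should print nothing once all of them have been removed.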

I'm not sure exactly which one did the trick, but I strongly suspect pcie_aspm=off is the culprit here. It was added earlier to silence some kernel log errors from the 8086:7abc device, but I no longer see those errors even after removing the parameter.

nvme_core.default_ps_max_latency_us=1500 was added to work around the notorious Samsung Evo 980 firmware bug where the SSD reports a temperature of 84°C. I may try adding this one back later to see whether it breaks anything.
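When re-adding nvme_core.default_ps_max_latency_us (or checking the ASPM setting), the values the running kernel actually uses are exposed under /sys/module/<module>/parameters/. A small bash sketch follows. The name module_param is a hypothetical helper, and its optional third argument only replaces /sys so it can be tested against a fake tree.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a kernel module parameter's current value,
# read from /sys/module/<module>/parameters/<name>.
# $1 = module, $2 = parameter, $3 = optional sysfs root (default /sys).
module_param() {
  local file="${3:-/sys}/module/$1/parameters/$2"
  if [[ -r $file ]]; then
    cat -- "$file"
  else
    # Parameter not exposed (module absent or name misspelled).
    echo "unavailable"
    return 1
  fi
}
```

For example, module_param nvme_core default_ps_max_latency_us should print 1500 if the parameter took effect, and module_param pcie_aspm policy shows the ASPM policy setting.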

The hugepages and isolcpus parameters were added for the Windows KVM guest that used to run on the server and are no longer needed.

Below are some more details and my answers to the questions above, for documentation purposes. Thanks!

---

ua4000 wrote:

different kernel version between live cd and installed system ?

The kernel versions do differ: 6.2 on the live CD and 6.6 on the host. However, I also tried downgrading the host to the 6.1 LTS kernel, and that didn't help either.

ua4000 wrote:

And why did you swap m2 slots ? Can you swap back ? At least for testing ?

Previously I had a Windows KVM guest with a GPU and an NVMe SSD passed through, and the server's root SSD sat on an expansion card. I have now moved all the Windows VM hardware to a new physical PC that I just built, so I moved the server's SSD back to the M.2 slot on the motherboard.

I'd prefer not to swap it back, since it's really hard to pull the server out again and open it up. :P

-thc wrote:

the old slot the numbering may have changed

The numbering does change, but that shouldn't (and doesn't) cause any problems, because I didn't hardcode device names anywhere in the system: all mounting is done by partition UUID. The fact that the filesystem mounted successfully tells me that at least the partition was found. The symptom was that reading the top-level directory structure was fine, i.e.,

ls /

worked, but listing one level deeper did not:

ls /usr
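The same top-level versus one-level-deeper check can be scripted so that several directories are tried in one go. Below is a minimal bash sketch. The name check_dirs is a hypothetical helper, and ls only reads directory entries, not file contents.

```bash
#!/usr/bin/env bash
# Hypothetical helper: try to list each directory and report which ones fail.
# Returns non-zero if any directory could not be listed.
check_dirs() {
  local dir rc=0
  for dir in "$@"; do
    if ls -- "$dir" >/dev/null 2>&1; then
      echo "ok   $dir"
    else
      echo "FAIL $dir"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, check_dirs / /usr /usr/lib /etc. On the broken boot described above, one would expect "ok" for / and "FAIL" for the deeper paths.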
