
#1 2017-11-20 15:09:30

Wild Penguin
Member
Registered: 2015-03-19
Posts: 319

bcache suddenly failed, any way to recover the FS to check the logs?

Hi!

A little bit of background:

I have bcache set up on my box because I wanted a little extra speed. One partition (well, most of my SSD) is set up as the cache partition (see below). There is a 4TB HDD with an Arch Linux root partition and a data partition, both running as bcache backing devices with the same cache.

The story: after running this setup for almost two years, something went wrong while I was away from the computer. I had done a regular upgrade with pacman (IIRC there was no kernel upgrade) and a reboot. When I came back (a few minutes after the reboot!), mounting root had failed and the boot process had bailed out to the emergency shell. I hastily ran fsck, assuming it was some kind of trivial error... After realizing I was being bombarded with seemingly never-ending errors from fsck trying to rescue the filesystem, I interrupted it and shut down the computer. I presumed there was a hard disk failure or something, and thought that running fsck would not make things any better.

This is the situation I am currently in. After checking in the BIOS that there are no obvious HW failures and booting a live installation from USB (the next day, which is today), all HDDs and the SSD seem to be in order according to smartctl. Except that... well, the data partition is recognized after attaching it to bcache but is still full of errors. Moreover, the Arch root partition is not recognized by bcache; instead, the kernel log claims that it has no bcache superblock (which it definitely should have).
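
For anyone wanting to double-check the kernel's "no bcache superblock" complaint without writing anything to the disk, here is a minimal read-only sketch. It assumes the standard bcache on-disk layout as used by the kernel's include/uapi/linux/bcache.h and libblkid's bcache probe: the superblock sits at sector 8 (byte 4096), and its 16-byte magic follows three 64-bit fields (csum, offset, version). The function name `has_bcache_magic` is just illustrative, and the magic constant should be verified against your own kernel headers before trusting a negative result.

```python
# Assumption: bcache on-disk layout per the kernel's uapi bcache.h /
# libblkid's bcache probe. Superblock at sector 8; magic after three u64 fields.
SB_OFFSET = 8 * 512      # byte offset of the bcache superblock
MAGIC_FIELD = 3 * 8      # skip csum, offset and version (u64 each)
BCACHE_MAGIC = bytes([
    0xC6, 0x85, 0x73, 0xF6, 0x4E, 0x1A, 0x45, 0xCA,
    0x82, 0x65, 0xF5, 0x7F, 0x48, 0xBA, 0x6D, 0x81,
])


def has_bcache_magic(path):
    """Return True if the bcache magic is where bcache expects it.

    Opens `path` read-only; never writes to the device.
    """
    with open(path, "rb") as dev:
        dev.seek(SB_OFFSET + MAGIC_FIELD)
        # A device or image that is too short simply yields fewer bytes.
        return dev.read(len(BCACHE_MAGIC)) == BCACHE_MAGIC


if __name__ == "__main__":
    import sys

    # Example: python3 check_bcache_sb.py /dev/sdb1 /dev/sdb2
    for arg in sys.argv[1:]:
        print(arg, has_bcache_magic(arg))
```

On the box described above this would be run against /dev/sdb1 (bcache-arch) and /dev/sdb2 (bcache-pool2); a False for the former would match what the kernel log reports.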

TL;DR: Root was installed on a bcache device, but the bcache superblock is no longer recognized because of some unknown error. Is there any way to recover the FS on the broken bcache device to check the logs for hints about what went wrong?

Here is the relevant part of the partition scheme:

Model: ATA Samsung SSD 850 (scsi)
Disk /dev/sdc: 250GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name     Flags
 1      1049kB  537MB   536MB   fat16           EFI      boot, esp
 2      537MB   16.9GB  16.4GB  linux-swap(v1)  swapssd
 3      16.9GB  250GB   233GB                   bcache

Model: ATA WDC WD40EFRX-68W (scsi)
Disk /dev/sdb: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name          Flags
 1      1049kB  1000GB  1000GB               bcache-arch
 2      1000GB  4001GB  3001GB               bcache-pool2

Here: bcache = cache (on the SSD), bcache-arch = root, bcache-pool2 = data partition. EDIT: Both bcache-arch and bcache-pool2 were formatted as ext4; I can only fsck the latter since the former is not recognized!

There are other HDDs too but those are not running with bcache.

Sorry for the lack of information (such as the fstab), but since the root FS is not accessible ATM, I cannot read it and post it here. FWIW, all filesystems were referenced by their FS UUIDs.

Thanks!

(good thing my data is intact)

Last edited by Wild Penguin (2017-11-20 15:17:49)


#2 2017-11-20 20:22:13

Wild Penguin
Member
Registered: 2015-03-19
Posts: 319

Re: bcache suddenly failed, any way to recover the FS to check the logs?

OK, here is a bit of something I've found while googling and reading documentation:

The filesystem should be available at byte offset 8192 in the backing device node, even if the bcache superblock is corrupted (in case someone else finds themselves in the same boat).
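
A minimal read-only sketch of that check, assuming the default bcache data offset of 8 KiB (16 sectors of 512 bytes) and an ext4 filesystem behind it. The ext2/3/4 superblock starts 1024 bytes into the filesystem, with the magic 0xEF53 stored little-endian at offset 0x38 and the 16-byte filesystem UUID at 0x68. A backing device created with a non-default data offset would need a different value, and if the cache ran in writeback mode, any dirty data still sitting on the SSD is missing from what is found here. `probe_ext4_behind_bcache` is a hypothetical helper name.

```python
import struct
import uuid

# Assumption: default bcache backing-device data offset of 16 sectors (8 KiB).
BCACHE_DATA_OFFSET = 8192
EXT_SUPERBLOCK_OFFSET = 1024   # ext2/3/4 superblock starts 1 KiB into the FS
EXT_MAGIC = 0xEF53
EXT_MAGIC_FIELD = 0x38         # s_magic, little-endian u16
EXT_UUID_FIELD = 0x68          # s_uuid, 16 raw bytes


def probe_ext4_behind_bcache(path, data_offset=BCACHE_DATA_OFFSET):
    """Return the ext2/3/4 UUID found at data_offset, or None.

    Opens `path` read-only; never writes to the device.
    """
    with open(path, "rb") as dev:
        dev.seek(data_offset + EXT_SUPERBLOCK_OFFSET)
        sb = dev.read(EXT_UUID_FIELD + 16)
    if len(sb) < EXT_UUID_FIELD + 16:
        return None  # device or image too short to hold a superblock
    (magic,) = struct.unpack_from("<H", sb, EXT_MAGIC_FIELD)
    if magic != EXT_MAGIC:
        return None
    # Byte order matches how blkid prints the filesystem UUID.
    return str(uuid.UUID(bytes=sb[EXT_UUID_FIELD:EXT_UUID_FIELD + 16]))


if __name__ == "__main__":
    import sys

    # Example: python3 probe_ext4.py /dev/sdb1
    for arg in sys.argv[1:]:
        found = probe_ext4_behind_bcache(arg)
        print(arg, found or "no ext2/3/4 superblock at that offset")
```

The printed UUID can then be compared with the FS UUID used in fstab. If it matches, the filesystem could be attached through a read-only loop device at the same offset (for example `losetup -r -o 8192 -f --show /dev/sdb1`) and mounted read-only to get at the logs.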

I've also noticed that there have been some changes to the bcache code recently. I doubt a serious bug would make it into a released kernel, but I'm using the testing repositories. It is possible something is broken in the current kernel, perhaps in some obscure situation. But I still haven't ruled out a hardware malfunction (although everything seems to be working in the live installation and the SMART data is OK).

But it's too late now to check whether my root FS is still there intact. Will try to mount it RO tomorrow :)

Cheers!


#3 2017-11-21 20:59:13

Wild Penguin
Member
Registered: 2015-03-19
Posts: 319

Re: bcache suddenly failed, any way to recover the FS to check the logs?

OK,

There's a bug in kernel 4.14(.1) affecting bcache and other things too (it is actually in the block layer subsystem, which is quite confusing for a non-developer, and it can cause other breakage, too!).

A bug report is in the works!

In the meantime I'd advise against using the kernel in [testing]! See this Gentoo bug report: https://bugs.gentoo.org/638206


#4 2017-11-21 21:03:14

loqs
Member
Registered: 2014-03-06
Posts: 17,194

Re: bcache suddenly failed, any way to recover the FS to check the logs?

You could file a bug report here asking for https://git.kernel.org/pub/scm/linux/ke … rtno.patch to be pulled in before 4.14.2, as the issue results in data loss.


#5 2017-11-21 21:08:02

frostschutz
Member
Registered: 2013-11-15
Posts: 1,409

Re: bcache suddenly failed, any way to recover the FS to check the logs?

I don't use kernel .0 releases for this kind of reason, but a data corruption issue in a kernel .1? Haven't seen that in a while. Sh:t happens.

Good luck on the recovery.


#6 2017-11-21 21:14:36

Wild Penguin
Member
Registered: 2015-03-19
Posts: 319

Re: bcache suddenly failed, any way to recover the FS to check the logs?

I'd presume there is little I can report upstream (the bug is already known there, as are the data losses affecting some users, and a patch is already queued for 4.14).

Now it is in the Arch bug tracker, too, so the Arch maintainers can take whatever steps they see fit...

If I've lost some data, that will become apparent, and it will be quite minimal in any case. Luckily I had some backups in place, and restoring the installation is trivial (but time-consuming). Originally I just wanted to see the logs, since I was perplexed about what was wrong (HW or SW) and how to prevent it from happening again. I now have the answers I was after...

EDIT: some typos and minor clarifications

Last edited by Wild Penguin (2017-11-21 21:16:49)

