#1 2018-09-23 22:56:24

Brunste
Member
From: United States
Registered: 2018-01-28
Posts: 19

[SOLVED] Unable to fix corrupted file system

My system froze completely earlier this morning and, after being unable to get it to respond again, I (for some very stupid reason) forced it to power off. After rebooting, my system got stuck performing fsck on my drive, spitting out a whole lot of information, which I've included below:

[  74.746788] ata4.00: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x6 frozen
[  74.746788] ata4.00: failed command: READ FPDMA QUEUED
[  74.746788] ata4.00: cmd 60/f8:40:38:9e:9f/00:00:60:00:00/40 tag 8 ncq dma 126976 in
[  74.746788]               res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  74.746788] ata4.00: status {DRDY}
[105.466779] ata4.00: exception Emask 0x0 SAct 0x200 SErr 0x0 action 0x6 frozen
[105.466779] ata4.00: failed command: READ FPDMA QUEUED
[105.466779] ata4.00: cmd 60/f8:48:38:9e:9f/00:00:60:00:00/40 tag 9 ncq dma 126976 in
[105.466779]               res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[105.466779] ata4.00: status {DRDY}
[137.466779] ata4.00: exception Emask 0x0 SAct 0x800 SErr 0x0 action 0x6 frozen
[137.466779] ata4.00: failed command: READ FPDMA QUEUED
[137.466779] ata4.00: cmd 60/f8:58:38:9e:9f/00:00:60:00:00/40 tag 11 ncq dma 126976 in
[137.466779]               res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[137.466779] ata4.00: status {DRDY}
[168.613440] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[168.613440] ata4.00: failed command: READ DMA EXT
[168.613440] ata4.00: cmd 25/00:f8:38:38:9e:9f/00:00:60:00:00/e0 tag 13 dma 126976 in
[168.613440]               res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[168.613440] ata4.00: status {DRDY}

I hand typed all of that from an image I took, so if anything seems strange, I may have typed it wrong.

This output continued for several minutes, alternating between READ DMA EXT and READ FPDMA QUEUED. Every so often, I'd also get a few of the following lines sprinkled in:

 [201.444191] print_req_error: I/O error, dev sdb, sector 1621073464 

Finally, fsck finished with the following output:

[961.269926] print_req_error: I/O error, dev sdb, sector 1289328648
[961.269926] buffer I/O error on dev sdb4, logical block 1, lost async page write
[961.269926] buffer I/O error on dev sdb4, logical block 2, lost async page write
[961.269926] buffer I/O error on dev sdb4, logical block 3, lost async page write
[961.269926] buffer I/O error on dev sdb4, logical block 4, lost async page write
[961.269926] buffer I/O error on dev sdb4, logical block 5, lost async page write
[961.269926] buffer I/O error on dev sdb4, logical block 6, lost async page write
[961.269926] buffer I/O error on dev sdb4, logical block 7, lost async page write
[961.269926] print_req_error: I/O error, dev sdb, sector 1717148888
[961.269926] print_req_error: I/O error, dev sdb, sector 1717148904
[961.269926] print_req_error: I/O error, dev sdb, sector 1717148848
[961.269926] print_req_error: I/O error, dev sdb, sector 1721342480
[961.269926] print_req_error: I/O error, dev sdb, sector 1725536936
ERROR: Bailing out. Run 'fsck /dev/sdb4' manually

I was then dropped into a [rootfs ]# terminal. I tried the fsck command it told me to run, and it responded that there was a bad superblock and suggested I try running fsck with another superblock. I did not know how to find a superblock besides using dumpe2fs, which was not available, so I rebooted my computer and decided to try from my installation USB. After rebooting, I realized the boot option for my Arch Linux install was missing. Only the Windows boot loader (I have a dual boot, which works fine) and my USB were listed. I went ahead and booted into the USB regardless.

I was able to get the superblocks from dumpe2fs, but running fsck with a backup superblock reports that the recovery flag was not set in the backup superblock and that it is running the journal anyway, says that it is recovering the journal, and then just seems to hang for a long time. The /dev/sdb4 partition my install is on is only ~300GB, and I allowed it to run for longer than the fsck that ran at boot. I am also unable to mount /dev/sdb4 using the mount command, as it hangs in the same way.
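
A minimal sketch of the superblock lookup described above, assuming dumpe2fs prints its usual "Backup superblock at N" lines. The function name list_backup_superblocks and the device name /dev/sdXN are placeholders for illustration, and any repair attempt is safer on a copy than on a failing disk:

```shell
#!/bin/sh
# Sketch: list backup superblock locations from dumpe2fs output on stdin.
# Assumes lines of the form:
#   "  Backup superblock at 32768, Group descriptors at 32769-32769"
list_backup_superblocks() {
    sed -n 's/^ *Backup superblock at \([0-9][0-9]*\),.*/\1/p'
}

# Intended use (placeholder device; not run here):
#   dumpe2fs /dev/sdXN 2>/dev/null | list_backup_superblocks
#   e2fsck -b <one of the printed block numbers> /dev/sdXN
```

The printed numbers are the values that e2fsck's -b option expects.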

I would give more specific details of the output from running fsck with the backup superblock, but I did not want to recreate it for fear of making the situation worse. Any suggestions on how to proceed from here? I do not have a large enough storage device to dd a drive image, so that's off the table until I get the chance to order one.

Last edited by Brunste (2018-09-27 17:51:47)

#2 2018-09-23 23:27:13

Slithery
Administrator
From: Norfolk, UK
Registered: 2013-12-01
Posts: 5,776

Re: [SOLVED] Unable to fix corrupted file system

At this point it's usually just quickest and easiest to restore your backup onto a new drive (or the current one once you've ascertained its health status).


#3 2018-09-23 23:44:50

Brunste
Member
From: United States
Registered: 2018-01-28
Posts: 19

Re: [SOLVED] Unable to fix corrupted file system

Slithery wrote:

At this point it's usually just quickest and easiest to restore your backup onto a new drive (or the current one once you've ascertained its health status).

Do you mean the superblock I mentioned? If so, how do I best go about doing that?

If you're talking about a backup located on another drive, let's pretend, hypothetically speaking, that one does not exist.

#4 2018-09-24 06:32:50

seth
Member
Registered: 2012-09-03
Posts: 51,165

Re: [SOLVED] Unable to fix corrupted file system

Stop trying to operate on that disk, you're probably making things worse.
Get a new drive, boot some live distro (e.g. grml is designed for disaster relief) and dd_rescue the broken drive.
Operate on that copy.
Pray.

This disk is most likely toast; you're lucky if you can read anything from it.

#5 2018-09-24 11:20:02

Brunste
Member
From: United States
Registered: 2018-01-28
Posts: 19

Re: [SOLVED] Unable to fix corrupted file system

seth wrote:

Stop trying to operate on that disk, you're probably making things worse.
Get a new drive, boot some live distro (e.g. grml is designed for disaster relief) and dd_rescue the broken drive.
Operate on that copy.
Pray.

This disk is most likely toast; you're lucky if you can read anything from it.

What about the drive suggests it's a goner if my other operating system partition seems to be fine?

Last edited by Brunste (2018-09-24 11:20:24)

#6 2018-09-24 12:18:32

seth
Member
Registered: 2012-09-03
Posts: 51,165

Re: [SOLVED] Unable to fix corrupted file system

The I/O errors in comment #1.
Naturally not all sectors are affected at the same time, but that's a really bad sign.
So #1, you don't trust the device any longer; #2, you can run SMART and badblocks to check how much is actually damaged.
But if this cannot be attributed to an isolated incident (HDD vs. gravity), there'll be overall wear and the bad blocks are going to spread.
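
As a rough sketch of the SMART part of that check, the attributes that most directly reflect bad sectors can be filtered out of "smartctl -A" output. The function name bad_sector_attrs and the device /dev/sdX are placeholders, and the column layout is assumed to be smartctl's usual ATA attribute table with the raw value in the tenth field:

```shell
#!/bin/sh
# Sketch: print the raw values of the sector-health attributes from
# `smartctl -A` output on stdin (attribute name in field 2, raw value in field 10).
bad_sector_attrs() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}

# Intended use (placeholder device; not run here):
#   smartctl -A /dev/sdX | bad_sector_attrs   # only reads the attribute table
#   badblocks -sv /dev/sdX                    # read-only scan of every block
```

Non-zero raw values for these attributes generally mean the drive has already remapped sectors or has sectors it could not read. The badblocks scan reads the whole device, so it is better left until the data has been copied off.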

#7 2018-09-24 12:26:22

Brunste
Member
From: United States
Registered: 2018-01-28
Posts: 19

Re: [SOLVED] Unable to fix corrupted file system

Gotcha. I'll go ahead and order a new drive to see if I can dd_rescue the data to it. Do you think it's safe to run SMART to see just how bad the issue is? If so, I'll do it when I get home tonight and post the results.

#8 2018-09-24 12:29:55

seth
Member
Registered: 2012-09-03
Posts: 51,165

Re: [SOLVED] Unable to fix corrupted file system

The SMART test will attempt to read every sector. The first thing you should do is clone the data, so you don't have to care about further damage to the device. Afterwards you can and should test the hell out of it.

#9 2018-09-27 17:47:53

Brunste
Member
From: United States
Registered: 2018-01-28
Posts: 19

Re: [SOLVED] Unable to fix corrupted file system

Just a follow-up post to say that I ended up buying a new drive and used GNU ddrescue to recover what seems to be all of my data with no problem. I connected the disk to my server machine (also running Arch) and followed the examples included in the ddrescue documentation. I would highly recommend reading the entire manual if you're attempting something similar, because I wasted a lot of time just going off the examples for a quick solution. After reading the entire guide, I figured it out in a quarter of the time I had lost.

For a quick, slightly more detailed explanation: I started out just trying to recover the corrupted /dev/sdwX partition to /dev/sdzY, a new ext4 partition I had just made on my data drive. I set the minimum read rate to 40 MB/s (-a takes bytes per second, hence 40000000). It froze up a few times, so each time I rebooted the system, read the last position from the mapfile, and restarted a little past that position.

ddrescue -f -n -a 40000000 /dev/sdwX /dev/sdzY mapfile
        (Rebooted after it got stuck and read <pos> from the mapfile. Added an offset v to the position to get past the problem sector)
ddrescue -f -n -a 40000000 -i <pos + v> /dev/sdwX /dev/sdzY mapfile
        (Rebooted and restarted every time it got stuck after this, updating <pos> each time)
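
The restart step above could be scripted roughly as follows. This is only a sketch: next_pos is a hypothetical helper, not part of ddrescue, and it assumes the first non-comment line of the mapfile is the status line with the current position as its first field. The 1 MiB skip in the usage comment is an arbitrary example value.

```shell
#!/bin/sh
# Hypothetical helper: print a restart position for ddrescue's -i option.
# Assumes the first non-comment line of the mapfile is the status line and
# that its first field is the current position (usually written in hex).
next_pos() {
    map=$1
    skip=$2   # bytes to jump past the spot where the copy stalled
    pos=$(awk '!/^#/ && NF { print $1; exit }' "$map")
    [ -n "$pos" ] || return 1
    # Shell arithmetic accepts 0x... constants, so no manual conversion is needed
    echo "$(( $pos + $skip ))"
}

# Intended use (same placeholders as the commands above; not run here):
#   ddrescue -f -n -a 40000000 -i "$(next_pos mapfile 1048576)" /dev/sdwX /dev/sdzY mapfile
```

Because the mapfile is reused, areas already copied are kept, and a later run without -i can come back to the skipped region.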

After about 90% was completed, it kept freezing up and removing /dev/sdwX from my system. I tried to continue a few more times, even adding the reverse flag (-R), but to no avail. I decided I would just try to fix what had been recovered onto the new file system.

e2fsck -v -f -y /dev/sdzY

I would probably recommend not including -y if you have something important on your drive. I didn't, and I also didn't want to hold Enter for an hour, so I just threw that in there. After watching numbers flash on my screen for about an hour, it finally stopped and most of my files were there. I've transferred most of them over to my new Arch install already, and they seem to be good.
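
For anyone wary of -y, one option is a read-only pass with -n first, then a look at the exit status before deciding. A small sketch, assuming the exit status bits listed in the e2fsck man page; describe_e2fsck_status is a made-up helper name:

```shell
#!/bin/sh
# Sketch: describe an e2fsck exit status, which is the sum of the
# bit values listed in the e2fsck(8) man page.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in "1:errors corrected" "2:reboot needed" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage error" "32:canceled by user" \
                "128:shared library error"; do
        bit=${pair%%:*}
        text=${pair#*:}
        if [ $((status & bit)) -ne 0 ]; then
            out="${out:+$out, }$text"
        fi
    done
    echo "$out"
}

# Intended use (placeholder partition as above; not run here):
#   e2fsck -n -f /dev/sdzY; describe_e2fsck_status $?
```

With -n the file system is opened read-only and every question is answered "no", so nothing is modified during that pass.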

I'll mark this thread as solved, but I'll come back to throw in the SMART data once I get a chance to run it on the old drive.

Last edited by Brunste (2018-09-27 17:52:52)
