/home partition corrupts randomly

jiaweihli · 2012-03-12 02:31:59

My /home partition 'forgets' that it's ext4 every few restarts. Arch claims it's inconsistent when it tries to read it as an ext2 filesystem.
When I boot up gparted, it says the partition is unallocated.

Does anyone have any idea what's going on? Is my disk going bad? None of my other partitions/OSes have shown any signs of failing.

Last edited by jiaweihli (2012-03-12 02:44:29)

cfr · 2012-03-12 03:15:38

If your disk is not backed up, back it up. This is always a good idea, but especially when a disk problem is even a possibility. (As I proved to myself last week, other problems can very quickly cause massive corruption as a side effect even if the disk is fine.) Do this *first*.

What is the content of /etc/fstab?

If you have an MBR-partitioned disk, what is the output of fdisk -l? If you have a GPT-partitioned disk, what is the output of gdisk -l? (Install gptfdisk if necessary, or use parted to get the same information.) If you aren't sure which, try fdisk. If it warns you the disk is GPT, use gdisk.

Do you have SMART monitoring tools installed? smartmontools can get you a lot of information about your disks, at least if it is an HDD rather than an SSD. (I'm not sure about SSDs.) I'd check the info from the disk first, then run an extended test on the disk and examine the results. You can also schedule various tests etc., depending on your hardware, just to keep an eye on things.

meph · 2012-03-12 03:44:30

How do you fix it? I mean, if it's every few restarts, then I guess you somehow make it work again every time, so how?

jiaweihli · 2012-03-12 05:41:31

administrator@VS ~ $  > cat /etc/fstab
# 
# /etc/fstab: static file system information
#
# <file system> <dir>   <type>  <options>       <dump>  <pass>
tmpfs           /tmp    tmpfs   nodev,nosuid    0       0
/dev/sda6 / ext4 defaults 0 1
/dev/sda7 /home ext4 defaults 0 1
/dev/sda2 /media/data ntfs defaults,nofail 0 2

administrator@VS ~ $  > sudo fdisk -l /dev/sda

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x6658bd64

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048      206847      102400    7  HPFS/NTFS/exFAT
/dev/sda2       307408896   614402047   153496576    7  HPFS/NTFS/exFAT
/dev/sda3       614402048  1953525167   669561560    7  HPFS/NTFS/exFAT
/dev/sda4          208894   307408895   153600001    5  Extended
/dev/sda5       108752896   204802047    48024576   83  Linux
/dev/sda6          210944   108750847    54269952   83  Linux
/dev/sda7       204804096   307408895    51302400   83  Linux

Partition table entries are not in disk order

As irrelevant as this info might be: sda1 is my boot, sda3 is Windows, and sda5 is Ubuntu.
My home partition corrupted 3 times today, and my data partition corrupted once. I would 'fix' them by reformatting them (this is a fresh install of Arch, so I didn't lose anything in either partition).
Both of them are working now. Is there a log file I can post that would be more helpful for getting to the bottom of this issue?

meph · 2012-03-12 12:26:51

I don't know what is causing it, but since you're reformatting anyway, you might want to try formatting the partition with a different filesystem type as a diagnostic step. Preferably something outside the ext family, for example reiserfs. It's not a solution, but it may help identify the culprit.

Also check your dmesg.log and kernel.log for errors, possibly just after it happens, so you know where to look. You can pastebin the logs, I can check them as well.

Hardware failure is possible but unlikely. The only reason I can imagine why a whole partition would die so quickly is that the journal is stored on bad disk blocks. But ext4 checksums journal data, so it would know if it was corrupted. If you want to make sure, check out 'gsmartcontrol', a handy GUI tool to test the reliability of S.M.A.R.T.-enabled disks.

DSpider · 2012-03-12 20:14:55

Welcome to the forums.

A few small observations: Don't rely too much on SMART. Its accuracy is only slightly better than 50-50. Instead, I think you should try SpinRite or HDD Regenerator to scan for bad sectors.

Is there a reason why you set "nofail" in fstab for /dev/sda2? According to the wiki, that's for external devices... And that "0 2" at the end is for Linux partitions only, not for NTFS. Read the wiki entry on what <pass> does.

cfr · 2012-03-13 00:57:30

Also, only one entry should be "0 1". home should be "0 2".
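For what it's worth, a rough way to spot this kind of entry is to scan the sixth field of fstab. The sketch below is only an illustration: the function name fstab_pass_check is made up here, and it assumes whitespace-separated fields with no spaces inside any field.

```shell
# Print every fstab entry whose <pass> field is 1 but whose mount point is not /.
# Usage: fstab_pass_check [FILE]   (FILE defaults to /etc/fstab)
fstab_pass_check() {
    awk '
        /^[[:space:]]*#/ || NF < 6 { next }      # skip comments, blank and short lines
        $6 == 1 && $2 != "/" { print $1, $2 }    # pass 1 should be used by the root fs only
    ' "${1:-/etc/fstab}"
}
```

Run against the fstab posted earlier in this thread, it should print "/dev/sda7 /home", which is the entry to change to "0 2".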

SMART isn't everything. I only suggested it because it certainly can be helpful. Last year, it gave me time to get everything off my drive before it finally died. However, I would be wary of GUI tools. I have no experience with gsmartcontrol, although it looks quite nice. Just do not be misled by a "PASS" for the drive's health status. The "overall" health of my drive was still "OK" even when it was probably beyond the point at which I could have rescued much data from it. You have to monitor the details it gives you for it to be of any use at all. (Also, I was on a Mac, and lots of the utilities for dealing with bad sectors etc. seemed to be for Linux, for Windows, or expensive!)
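As a rough illustration of "monitoring the details", something like the following could filter the attribute table for the counters that usually matter for bad sectors. This is a sketch, not a recommendation: smart_warn is a made-up name, the attribute list is my own selection, and it assumes the usual `smartctl -A` table layout in which the raw value is the tenth column.

```shell
# Read `smartctl -A` output on stdin and print sector-related attributes
# whose raw value (10th column) is greater than zero.
smart_warn() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
            print $2 "=" $10
        }'
}

# Usage (as root): smartctl -A /dev/sda | smart_warn
```

No output means none of those counters is above zero. It does not mean the disk is healthy, so the full smartctl -a report and the extended self-test result are still worth reading.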

Last edited by cfr (2012-03-13 00:58:09)

thisoldman · 2012-03-13 06:10:11

Most HD manufacturers have free diagnostic tools available for download. You can find the drive model number with 'hdparm -I': My Seagate drive model number begins with the characters 'ST'; the Hitachi drive explicitly identifies itself, the Western Digital model number starts with 'WD'.

# hdparm -I /dev/sda | grep 'Model Number'
        Model Number:       ST500DM002-1BD142
# hdparm -I /dev/sdb | grep 'Model Number'
        Model Number:       Hitachi HDT725032VLA380
# hdparm -I /dev/sdc | grep 'Model Number'
        Model Number:       WDC WD3200AAKS-00VYA0

Links for the diagnostic tools:
Seagate: http://www.seagate.com/ww/v/index.jsp?l … 04090aRCRD
Hitachi: http://www.hitachigst.com/support/downloads/#DFT
Western Digital: http://support.wdc.com/product/download … 30&lang=en

I haven't done this but you could run the 'badblocks' program through 'e2fsck' with the '-c' option on an unmounted partition. I've found the manufacturers' diagnostic tools to be relatively easy to use and their instructions easier to understand than the manpage for 'badblocks'.
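A minimal sketch of that e2fsck route, with a guard so it refuses to touch a mounted partition. The helper names is_mounted and scan_partition are made up for illustration, and the guard assumes the device shows up in /proc/mounts under the same name you pass in.

```shell
# Succeed if DEVICE is listed as a mount source in the mounts table.
# Usage: is_mounted DEVICE [MOUNTS_FILE]   (MOUNTS_FILE defaults to /proc/mounts)
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Run a read-only bad-block scan through e2fsck, but only on an unmounted device.
scan_partition() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it first" >&2
        return 1
    fi
    e2fsck -f -c "$1"    # -f forces a check, -c runs a read-only badblocks scan
}

# Usage (as root): scan_partition /dev/sda7
```

For /home this usually means running it from a live CD or from single-user mode, since the partition is busy during a normal session.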

Last edited by thisoldman (2012-03-13 06:11:17)

jiaweihli · 2012-03-18 04:37:41

I ran into this issue again today - luckily, e2fsck was able to recover the partition.

Could this have anything to do with my hwclock getting set improperly? I'd been using the ntpd daemon, and constantly noticed my windows/linux partitions out of sync.
Also, Ubuntu claimed something about "times in the future" about something or other (vague, I know) that it was fixing.

ewaller · 2012-03-18 05:14:11

Can you post the output of hwclock --debug?

jiaweihli · 2012-03-19 05:20:10

administrator@VS ~ $  > sudo hwclock --debug
hwclock from util-linux 2.21
Using /dev interface to clock.
Last drift adjustment done at 1331688441 seconds after 1969
Last calibration done at 1331688441 seconds after 1969
Hardware clock is on UTC time
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
...got clock tick
Time read from Hardware Clock: 2012/03/19 00:18:10
Hw clock time : 2012/03/19 00:18:10 = 1332116290 seconds since 1969
Sun 18 Mar 2012 07:18:10 PM CDT  -0.844153 seconds

Awkwardly enough, it looks like the hardware clock is set to the correct time, but the time zone shifting causes it to be off by 5 hours.
i.e. 00:18 is correct, not 07:18
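One way to see how the two timestamps in that output relate is to print the epoch value hwclock reported in both zones. This is only a sketch: epoch_in_zone is a made-up helper name, it assumes GNU date (for -d @EPOCH), and America/Chicago is just a guess at a zone that observes CDT.

```shell
# Print an epoch timestamp as wall-clock time in the given time zone.
# Usage: epoch_in_zone EPOCH ZONE
epoch_in_zone() {
    TZ="$2" date -d "@$1" '+%Y-%m-%d %H:%M:%S %Z'
}

# epoch_in_zone 1332116290 UTC               # 2012-03-19 00:18:10 UTC
# epoch_in_zone 1332116290 America/Chicago   # 2012-03-18 19:18:10 CDT
```

Both lines describe the same instant, so the 07:18 PM CDT line is what you get when 00:18 is treated as UTC. If 00:18 was the actual wall-clock time, that suggests the hardware clock holds local time while being read as UTC.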

foppe · 2012-03-19 05:35:08

From the looks of it, you may want to check this page http://linuxconfig.org/linux-wd-ears-advanced-format for issues with Advanced Format on certain WD Caviar Green disks.
