#1 2018-06-27 04:32:32

LeftyAce
Member
Registered: 2012-08-18
Posts: 162

[Solved] Intermittent errors - Hard drive failing?

Hi all,

I've recently received the following error (relating to my hard drive):

Waiting 10 seconds for f4017da9 ...
Waiting 10 seconds for 478ff45c ...

Eventually the waiting fails and I get dumped to an emergency shell. After two reboots everything was back to normal.

Then a day later, the machine booted all the way, but only / (on the SSD) was mounted; /home (on the HDD) was missing. Again, a reboot got things back to normal.

How can I test the HDD to see if it's failing? I ran the smartctl short test which completed without error, and I started the long test but can't tell if it's still running.

Thanks,
Lefty

Last edited by LeftyAce (2018-07-01 22:26:52)
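
For reference, a minimal sketch of how a SMART self-test can be started and its progress checked. The commands in the comments are standard smartctl options; `selftest_remaining` is a hypothetical helper name, and the sample text assumes the usual layout of the "Self-test execution status" lines printed by `smartctl -c`.

```shell
# Start a test (needs root; /dev/sda is the device name used later in this thread):
#   smartctl -t short /dev/sda
#   smartctl -t long /dev/sda
# Check on a running test, or read the results afterwards:
#   smartctl -c /dev/sda | selftest_remaining
#   smartctl -l selftest /dev/sda

# Hypothetical helper: print the "% of test remaining" figure found in
# `smartctl -c` output on stdin; prints nothing if no test is in progress.
selftest_remaining() {
  sed -n 's/^[[:space:]]*\([0-9][0-9]*\)% of test remaining.*/\1/p'
}

# Sample input for illustration (assumed layout of the status lines):
sample='Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.'
printf '%s\n' "$sample" | selftest_remaining
```

If nothing is printed, no test is running and the self-test log shows whether the last one completed or was aborted.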

#2 2018-06-27 06:31:44

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,177

Re: [Solved] Intermittent errors - Hard drive failing?

smartctl -a /dev/sda

Could it be a loose connection? How are the drives wired?

#3 2018-06-27 14:23:45

LeftyAce
Member
Registered: 2012-08-18
Posts: 162

Re: [Solved] Intermittent errors - Hard drive failing?

The machine is a laptop (should have mentioned that), so the SSD is more of a card and the HDD is a regular laptop drive.

I managed to run the SMART full test and it passed. I'll open it up and make sure the HDD is well seated.

#4 2018-06-27 14:55:21

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,177

Re: [Solved] Intermittent errors - Hard drive failing?

Please post the entire output - the table tells more about potential problems than a single test.

#5 2018-06-28 02:27:34

LeftyAce
Member
Registered: 2012-08-18
Posts: 162

Re: [Solved] Intermittent errors - Hard drive failing?

Here's the output of smartctl -a /dev/sda:

Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 181) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x7035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   192   187   021    Pre-fail  Always       -       1400
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2806
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4490
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2717
191 G-Sense_Error_Rate      0x0032   001   001   000    Old_age   Always       -       217
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       67
193 Load_Cycle_Count        0x0032   116   116   000    Old_age   Always       -       252690
194 Temperature_Celsius     0x0022   121   096   000    Old_age   Always       -       26 (Min/Max 23/26)
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
254 Free_Fall_Sensor        0x0032   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%      4490         -
# 2  Extended offline    Completed without error       00%      4490         -
# 3  Extended offline    Aborted by host               90%      4486         -
# 4  Short offline       Completed without error       00%      4486         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
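
The attribute table above can also be screened mechanically. The following is a rough sketch, not a replacement for reading the table: `flag_smart_attributes` is a hypothetical helper that prints the name of any attribute whose WHEN_FAILED column is not "-" or whose normalized VALUE has dropped to or below a non-zero THRESH.

```shell
# Hypothetical helper: read `smartctl -A` style attribute rows on stdin and
# print the names of attributes that look bad.
flag_smart_attributes() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
         value = $4 + 0; thresh = $6 + 0
         # WHEN_FAILED is "-" unless the attribute is failing or has failed before
         if ($9 != "-" || (thresh > 0 && value <= thresh)) print $2
       }'
}

# Usage on a real drive (needs root): smartctl -A /dev/sda | flag_smart_attributes
# Two rows copied from the table above; nothing is printed for healthy rows.
flag_smart_attributes <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
EOF
```

An empty result only means that no attribute crossed its threshold; the raw values are still worth reading by hand.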

#6 2018-06-28 06:13:42

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,177

Re: [Solved] Intermittent errors - Hard drive failing?

That drive looks healthy.
Does this only happen on battery or also on external power supply?

Edit: and do you have a parallel Windows installation?

Last edited by seth (2018-06-28 06:14:10)

#7 2018-06-28 18:37:07

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: [Solved] Intermittent errors - Hard drive failing?

I would look into this: '193 Load_Cycle_Count        0x0032   116   116   000    Old_age   Always       -       252690'.

A raw value of 252690 is high, so you might want to do something to keep the drive from constantly parking its heads.
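
To put that number in perspective, it can be divided by the power-on hours from the same table. A small sketch, where `load_cycles_per_hour` is a hypothetical helper that reads the raw values of attributes 9 and 193:

```shell
# Hypothetical helper: read SMART attribute rows on stdin and print the
# average number of head load cycles per power-on hour.
load_cycles_per_hour() {
  awk '$1 == 9   { hours  = $10 }
       $1 == 193 { cycles = $10 }
       END { if (hours > 0) printf "%.1f\n", cycles / hours }'
}

# Rows copied from the output posted above.
load_cycles_per_hour <<'EOF'
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4490
193 Load_Cycle_Count        0x0032   116   116   000    Old_age   Always       -       252690
EOF
# prints 56.3
```

That works out to roughly one head park per minute of power-on time.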


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

#8 2018-07-01 15:49:31

LeftyAce
Member
Registered: 2012-08-18
Posts: 162

Re: [Solved] Intermittent errors - Hard drive failing?

No parallel Windows installation.

My drive setup is a bit complicated:

/boot is on an external USB.
/ is on the internal SSD.
/home is split on the internal SSD and the internal HDD (using bcache, so the SSD partition is a 32GB cache for the HDD).

Both / and the bcache volume are encrypted using dmcrypt+luks.

R00KIE, where should I start investigating the large number of head parks? Is there some way to find out which process is spinning up the HDD? You're absolutely right that the HDD is parking and unparking frequently; I can hear it. Given the bcache setup, I'm surprised, though: if most reads and writes are served by the SSD, the HDD should just stay off.

#9 2018-07-01 15:54:19

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,731

Re: [Solved] Intermittent errors - Hard drive failing?

#10 2018-07-01 16:18:59

LeftyAce
Member
Registered: 2012-08-18
Posts: 162

Re: [Solved] Intermittent errors - Hard drive failing?

Thanks graysky.

I'm a bit confused. It looks like my disk shouldn't be spinning down at all:

# hdparm -B /dev/sdb

/dev/sdb:
 APM_level      = 128

The wiki says "Values from 1 to 127 permit spin-down, whereas values from 128 to 254 do not."

#11 2018-07-01 20:46:56

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: [Solved] Intermittent errors - Hard drive failing?

I'm going to guess (because you deleted that part of the output of hdparm) that you have a Western Digital drive. For those you probably want to set the APM level to 254 or 255, whichever makes the drive stop parking and unparking the heads.
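
As a rough sketch of the ranges involved: the helper below restates the wiki rule quoted earlier in the thread, plus the hdparm convention that 255 disables APM altogether. `apm_describe` is a hypothetical name, and /dev/sdb is simply the device used in the previous post.

```shell
# Hypothetical helper: describe an APM level as reported by `hdparm -B`.
apm_describe() {
  level=$1
  if [ "$level" -ge 1 ] && [ "$level" -le 127 ]; then
    echo "spin-down permitted"
  elif [ "$level" -ge 128 ] && [ "$level" -le 254 ]; then
    echo "spin-down not permitted"
  elif [ "$level" -eq 255 ]; then
    echo "APM disabled"
  else
    echo "invalid"
  fi
}

apm_describe 128   # the level reported above
apm_describe 254   # the level suggested here

# Query and set the level on the real drive (needs root):
#   hdparm -B /dev/sdb
#   hdparm -B 254 /dev/sdb
```

A level set this way usually does not survive a power cycle, so it has to be reapplied at boot by some other mechanism.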


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

#12 2018-07-01 21:35:07

LeftyAce
Member
Registered: 2012-08-18
Posts: 162

Re: [Solved] Intermittent errors - Hard drive failing?

Thanks R00KIE! I actually didn't delete any of the hdparm output; what I posted was all I got. But you're right, it's a WD HDD. I've set the APM level to 254 with hdparm and so far the parking seems to have stopped.

Question: I'm using tlp (https://wiki.archlinux.org/index.php/TLP), and I can specify hdparm settings for both my drives. Is there any point in setting values for the SSD? Do they do anything?

#13 2018-07-01 22:14:19

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,177

Re: [Solved] Intermittent errors - Hard drive failing?

SSDs have no head to rest ;-)
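
In TLP terms, that could look like the fragment below. This is only a sketch: it assumes sda is the SSD and sdb is the HDD (check with lsblk first), and the file it belongs in depends on the TLP version (/etc/default/tlp on older releases, /etc/tlp.conf on newer ones).

```shell
# Devices the disk settings apply to, in order (assumed names).
DISK_DEVICES="sda sdb"
# One value per device: "keep" leaves the SSD at its hardware default,
# 254 is the level that stopped the head parking on the HDD.
DISK_APM_LEVEL_ON_AC="keep 254"
DISK_APM_LEVEL_ON_BAT="keep 254"
```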

On the original topic: try the "rootwait" or "rootdelay=30" kernel parameters (the latter waits 30 seconds before attempting to mount root) - the decryption probably takes too long. On a hunch, this could be related to the crng issue, https://bugs.archlinux.org/task/58355
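
Where the parameter goes depends on the boot loader, which this thread does not name. To confirm that it actually reached the kernel, the command line can be checked after a reboot. A minimal sketch, with `has_kernel_param` as a hypothetical helper and a made-up command line as sample input:

```shell
# Hypothetical helper: succeed if the kernel command line in $2 contains
# parameter $1, either bare (rootwait) or with a value (rootdelay=30).
has_kernel_param() {
  case " $2 " in
    *" $1 "*|*" $1="*) return 0 ;;
    *) return 1 ;;
  esac
}

# Made-up command line for illustration.
cmdline='root=/dev/mapper/root rw rootdelay=30'
if has_kernel_param rootdelay "$cmdline"; then echo "rootdelay is set"; fi
if ! has_kernel_param rootwait "$cmdline"; then echo "rootwait is not set"; fi

# On the running system:
#   has_kernel_param rootdelay "$(cat /proc/cmdline)" && echo "rootdelay is set"
```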

#14 2018-07-01 22:26:35

LeftyAce
Member
Registered: 2012-08-18
Posts: 162

Re: [Solved] Intermittent errors - Hard drive failing?

Thanks seth. If this happens again (it hasn't happened since I first posted) I'll add the rootdelay parameter.

At this point I don't know if it's solved or not, but I'll mark it [solved] :-)
