Hi all,
I've recently received the following error (relating to my hard drive):
Waiting 10 seconds for f4017da9 ...
Waiting 10 seconds for 478ff45c ...
Eventually the waiting fails and I get dumped to an emergency shell. After two reboots everything was back to normal.
Then a day later, I managed to boot all the way up, only to find that / (SSD) was mounted but /home (HDD) was not. Again, a reboot got things back to normal.
How can I test the HDD to see if it's failing? I ran the smartctl short test which completed without error, and I started the long test but can't tell if it's still running.
Thanks,
Lefty
Last edited by LeftyAce (2018-07-01 22:26:52)
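On the question of whether the long test is still running: a minimal sketch, assuming smartmontools is installed and the drive is /dev/sda as elsewhere in this thread. `selftest_status` is a hypothetical helper name, not part of smartctl; it only filters the "Self-test execution status" lines out of a `smartctl -c` report read from stdin.

```shell
#!/bin/sh
# Hypothetical helper: print the self-test execution status lines
# (status plus the "NN% of test remaining" continuation line, if any)
# from a `smartctl -c` report supplied on stdin.
selftest_status() {
    grep -A 1 'Self-test execution status'
}

# Usage sketch (needs root and a real drive; device name is an assumption):
#   smartctl -t long /dev/sda               # start the extended test
#   smartctl -c /dev/sda | selftest_status  # progress while it runs
#   smartctl -l selftest /dev/sda           # log of finished/aborted runs
```

While a test is in progress, the status lines report the percentage remaining; once it ends, the result shows up in the self-test log.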
smartctl -a /dev/sda
Could it be a loose connection? How are the drives wired?
The machine is a laptop (should have mentioned that), so the SSD is more of a card, the HDD is a regular laptop drive.
I managed to run the SMART full test and it passed. I'll open it up and make sure the HDD is well seated.
Please post the entire output. The attribute table tells more about potential problems than a single test.
Here's the output of smartctl -a /dev/sda:
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 181) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x7035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   192   187   021    Pre-fail  Always       -       1400
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2806
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4490
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2717
191 G-Sense_Error_Rate      0x0032   001   001   000    Old_age   Always       -       217
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       67
193 Load_Cycle_Count        0x0032   116   116   000    Old_age   Always       -       252690
194 Temperature_Celsius     0x0022   121   096   000    Old_age   Always       -       26 (Min/Max 23/26)
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
254 Free_Fall_Sensor        0x0032   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%      4490         -
# 2  Extended offline    Completed without error       00%      4490         -
# 3  Extended offline    Aborted by host               90%      4486         -
# 4  Short offline       Completed without error       00%      4486         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
That drive looks healthy.
Does this only happen on battery or also on external power supply?
Edit: and do you have a parallel windows installation?
Last edited by seth (2018-06-28 06:14:10)
I would look into this: '193 Load_Cycle_Count 0x0032 116 116 000 Old_age Always - 252690'.
A raw value of 252690 is high: over the 4490 power-on hours in the same report, that is roughly 56 head load cycles per hour. You might want to do something to keep the drive from constantly parking its heads.
R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K
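To see how fast Load_Cycle_Count is actually climbing, something like the following could be used. This is only a sketch: `load_cycle_count` and `watch_load_cycles` are hypothetical helper names, and it assumes the attribute table layout shown above, where the raw value is the tenth column.

```shell
#!/bin/sh
# Hypothetical helper: print the raw Load_Cycle_Count value from
# `smartctl -A` output supplied on stdin (raw value = 10th column).
load_cycle_count() {
    awk '$2 == "Load_Cycle_Count" { print $10 }'
}

# Hypothetical helper: report how many load cycles a drive accumulates
# over an interval. Needs root and a real drive.
#   $1 = device (e.g. /dev/sda), $2 = interval in seconds (default 600)
watch_load_cycles() {
    dev=$1
    secs=${2:-600}
    before=$(smartctl -A "$dev" | load_cycle_count)
    sleep "$secs"
    after=$(smartctl -A "$dev" | load_cycle_count)
    echo "$((after - before)) load cycles in $secs seconds on $dev"
}

# Usage sketch: watch_load_cycles /dev/sda 600
```

If the count keeps rising by several cycles over a few idle minutes, the drive's power management is parking the heads aggressively.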
No parallel windows installation.
My drive setup is a bit complicated:
/boot is on an external USB.
/ is on the internal SSD.
/home is split on the internal SSD and the internal HDD (using bcache, so the SSD partition is a 32GB cache for the HDD).
Both / and the bcache volume are encrypted using dm-crypt + LUKS.
R00KIE, where should I start to investigate the large number of head parks? Is there some way to find out which process is spinning up the HDD? You're absolutely right that the HDD is parking and unparking frequently; I can hear it. Given the bcache setup I'm surprised, though: if most reads and writes are coming off the SSD, the HDD should just stay off.
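One way to check whether anything is touching the HDD at all is to compare the kernel's I/O counters for the disk over time. A rough sketch, assuming the HDD is sdb (as in the hdparm output later in the thread); `io_counts` is a hypothetical helper name. It prints the completed-reads and completed-writes fields (the first and fifth fields) of the block device's sysfs stat file.

```shell
#!/bin/sh
# Hypothetical helper: print "reads_completed writes_completed" from the
# contents of /sys/block/<dev>/stat supplied on stdin
# (field 1 = read I/Os completed, field 5 = write I/Os completed).
io_counts() {
    awk '{ print $1, $5 }'
}

# Usage sketch (device name is an assumption):
#   io_counts < /sys/block/sdb/stat; sleep 60; io_counts < /sys/block/sdb/stat
# If the numbers change while the machine is idle, a tool such as
# `iotop -o` (shows only processes currently doing I/O) may help to
# narrow down which process is responsible.
```

Unchanged counters combined with audible parking would suggest the drive's own power management rather than a process.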
Thanks graysky.
I'm a bit confused. It looks like my disk shouldn't be spinning down at all:
# hdparm -B /dev/sdb
/dev/sdb:
APM_level = 128
The wiki says "Values from 1 to 127 permit spin-down, whereas values from 128 to 254 do not."
I'm going to guess (because you deleted that part of the output of hdparm) that you have a Western Digital drive. For those you probably want to set the APM level to 254 or 255, whichever makes the drive stop parking and unparking the heads.
R00KIE
Thanks R00KIE! I actually didn't delete any of the hdparm output, what I posted was all I got. But you're right, it's a WD HDD. I've set the hdparm value to 254 and so far the parking seems to have stopped.
Question: I'm using tlp (https://wiki.archlinux.org/index.php/TLP), and I can specify hdparm settings for both my drives. Is there any point in setting values for the SSD? Do they do anything?
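For reference, the APM change discussed above can be sketched as follows. The device name /dev/sdb is taken from the earlier hdparm output; `apm_level` is a hypothetical helper name that just pulls the number out of `hdparm -B` output. The TLP lines in the comments are an assumption about how the setting could be applied through TLP's configuration; the disk order and the use of "keep" to leave the SSD untouched should be checked against the TLP documentation for the installed version.

```shell
#!/bin/sh
# Hypothetical helper: print the APM level from `hdparm -B` output on stdin.
# Prints a non-numeric word if the drive reports "off" or "not supported".
apm_level() {
    awk '/APM_level/ { print $3 }'
}

# Usage sketch (needs root; device name is an assumption):
#   hdparm -B 254 /dev/sdb          # set APM level 254 on the HDD
#   hdparm -B /dev/sdb | apm_level  # read it back
#
# A value set with hdparm may not survive a power cycle. With TLP,
# the equivalent might look like this in its configuration file
# (assumed layout: first disk is the SSD, second is the HDD):
#   DISK_DEVICES="sda sdb"
#   DISK_APM_LEVEL_ON_AC="keep 254"
#   DISK_APM_LEVEL_ON_BAT="keep 254"
```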
SSDs have no head to rest ;-)
On the original topic: try the "rootwait" or "rootdelay=30" (this will wait 30 seconds before attempting to mount root) parameters - the decryption probably takes too long. On a hunch, this could be related to the crng issue, https://bugs.archlinux.org/task/58355
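A quick way to check whether either parameter is already on the kernel command line, as a sketch: `has_root_wait` is a hypothetical helper name. Where the parameter gets added depends on the boot loader, which this thread does not mention, so the GRUB lines in the comments are an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line supplied on stdin
# contains "rootwait" or a "rootdelay=<seconds>" parameter.
has_root_wait() {
    tr ' ' '\n' | grep -Eq '^(rootwait|rootdelay=[0-9]+)$'
}

# Usage sketch:
#   has_root_wait < /proc/cmdline && echo "rootwait/rootdelay already set"
#
# Assuming GRUB is the boot loader, the parameter could be appended to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, for example
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=30"
# followed by regenerating the config:
#   grub-mkconfig -o /boot/grub/grub.cfg
```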
Thanks seth. If this happens again (it hasn't happened since I first posted) I'll add the rootdelay parameter.
At this point I don't know if it's solved or not, but I'll mark it [solved] :-)