Thinking the HDD is bad but seeking counsel [solved]

graysky · 2017-05-14 20:12:00

I have an older 2TB Western Digital (WD20EARS) HDD with approximately 2 years of power-on hours on it (light use during that time). I tried filling it with random data as I am prepping it for use. A little over 1/3 of the way through the process, dd threw an I/O error that dmesg caught as well. I am thinking this disk is trashed but wanted to see what more experienced folks think.

EDIT: I will repeat the write out connecting the HDD via another controller to confirm it's not the onboard controller to blame.
EDIT2: Wow, there is a massive speed difference using the SI-PEX40064 PCI-e controller card vs. the onboard JMicron controller (about 6x faster)... the MB is 7 years old... let's see if it completes without errors.

% sudo cryptsetup open --type plain /dev/sdb partb --key-file /dev/random
% sudo dd if=/dev/zero of=/dev/mapper/partb status=progress
763866083840 bytes (764 GB, 711 GiB) copied, 34960.3 s, 21.8 MB/s                                 
dd: writing to '/dev/mapper/partb': Input/output error
1491925953+0 records in
1491925952+0 records out
763866087424 bytes (764 GB, 711 GiB) copied, 34970.2 s, 21.8 MB/s
sudo dd if=/dev/zero of=/dev/mapper/partb status=progress  350.27s user 3419.01s system 10% cpu 9:42:50.16 total

% dmesg
...
[39723.923044] ata12.00: exception Emask 0x0 SAct 0x800 SErr 0x0 action 0x0
[39723.923264] ata12.00: irq_stat 0x48000000
[39723.923383] ata12.00: failed command: READ FPDMA QUEUED
[39723.923540] ata12.00: cmd 60/08:58:c0:fb:ec/00:00:58:00:00/40 tag 11 ncq dma 4096 in
                        res 41/40:00:c0:fb:ec/00:00:58:00:00/40 Emask 0x409 (media error) <F>
[39723.924011] ata12.00: status: { DRDY ERR }
[39723.924132] ata12.00: error: { UNC }
[39723.930610] ata12.00: configured for UDMA/133
[39723.930620] sd 11:0:0:0: [sdb] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[39723.930623] sd 11:0:0:0: [sdb] tag#11 Sense Key : 0x3 [current] 
[39723.930625] sd 11:0:0:0: [sdb] tag#11 ASC=0x11 ASCQ=0x4 
[39723.930627] sd 11:0:0:0: [sdb] tag#11 CDB: opcode=0x28 28 00 58 ec fb c0 00 00 08 00
[39723.930629] blk_update_request: I/O error, dev sdb, sector 1491925952
[39723.937147] ata12: EH complete

% sudo smartctl --all /dev/sdb
...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   166   164   021    Pre-fail  Always       -       6658
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       271
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       14552
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       265
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       28
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1871
194 Temperature_Celsius     0x0022   118   104   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     14552         -
# 2  Extended offline    Completed without error       00%     11064         -
# 3  Short offline       Completed without error       00%      8990         -

Last edited by graysky (2017-05-15 19:09:15)

frostschutz · 2017-05-14 20:55:28

It has at least one unreadable sector. I would no longer trust this drive with data, even if it managed to reallocate those sectors. It's probably a hopeless case since you already have I/O errors when writing.

You should run SMART self-tests (-t long or -t select) more regularly on your hard drives.
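
For anyone who wants to check the relevant counters from a script, here is a minimal sketch. It is not taken from the posts above: the function name check_smart_sectors is hypothetical, and it assumes the attribute table layout shown in the first post, where the raw value is the last column and is a plain number. It flags attributes 5, 196, 197 and 198 when their raw value is non-zero.

```shell
#!/bin/sh
# Hypothetical helper: read a smartctl attribute table on stdin and print
# any reallocated/pending/uncorrectable sector counters with a non-zero
# raw value. Exit status is 1 if at least one such counter was found.
check_smart_sectors() {
    awk '
        # $1 is the attribute ID, $2 its name, $NF the raw value.
        $1 ~ /^(5|196|197|198)$/ && $NF + 0 > 0 {
            printf "%s %s raw=%s\n", $1, $2, $NF
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

A possible way to use it would be `sudo smartctl -A /dev/sdb | check_smart_sectors`. Fed the table from the first post, it would report attributes 197 and 198, each with a raw value of 1.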

mich41 · 2017-05-14 20:56:06

hdparm --write-sector 1491925952 --yes-i-know-what-i-am-doing /dev/sdb

Will fix it. Given a crystal ball I could even tell you for how long. One disk gave me a bad sector once and never again, one every few weeks, another every few minutes.

edit:

I will repeat the write out connecting the HDD via another controller to confirm it's not the onboard controller to blame.

Meh, waste of time. It says "media error". And at the same time SMART shows exactly one bad sector. Coincidence? You decide.
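
The numbers in the first post line up, which can be checked with a little arithmetic. This is only a sketch of that check: it assumes the four LBA bytes in the logged CDB (58 ec fb c0) are read as one big-endian hex number, that dd used its default 512-byte block size, and that the plain dm-crypt mapping starts at offset 0 on the disk. The variable names are made up for illustration.

```shell
#!/bin/sh
# LBA bytes from the dmesg line "CDB: opcode=0x28 28 00 58 ec fb c0 00 00 08 00".
cdb_lba_hex=58ecfbc0

# Convert the hex LBA to decimal; dmesg reports the same sector number.
lba=$((0x$cdb_lba_hex))

# dd wrote 512-byte records, so sector N starts at byte N * 512.
sector_size=512
byte_offset=$((lba * sector_size))

echo "LBA from CDB: $lba"          # 1491925952
echo "Byte offset:  $byte_offset"  # 763866087424
```

Both results match the log: dmesg names sector 1491925952, and dd stopped after 763866087424 bytes, i.e. exactly at the start of that sector.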

Last edited by mich41 (2017-05-14 21:16:14)

graysky · 2017-05-14 22:56:50

It just passed 940 GB copied without dd throwing an error or anything in dmesg... still going.

R00KIE · 2017-05-15 10:56:33

I'm not sure I would trust that drive any longer. Even if you test it more thoroughly with badblocks there is no guarantee that it will not throw more errors in the future.

I've experienced firsthand a drive that would pass several full badblocks tests and then fail without any apparent pattern.

Ropid · 2017-05-15 12:17:07

You could still use it for something that's not important, just to see what happens. I had an HDD that started dying and added more broken sectors over the course of a week or two, but then the numbers stopped increasing. The drive then kept working fine for several years, then suddenly was just dead one day.

graysky · 2017-05-15 19:08:37

First, let me say that I agree with much of what has been said here, and thanks to all who replied. After switching controllers, I was able to run the dd command 3 times without an error. Example below. I also see nothing in the SMART attributes to indicate a problem. Proceeding at-risk with non-critical data. Marking solved.

% sudo dd if=/dev/zero of=/dev/mapper/noise bs=4M status=progress
2000393601024 bytes (2.0 TB, 1.8 TiB) copied, 21238 s, 94.2 MB/s  
dd: error writing '/dev/mapper/noise': No space left on device
476933+0 records in
476932+0 records out
2000398934016 bytes (2.0 TB, 1.8 TiB) copied, 21261.4 s, 94.1 MB/s
sudo dd if=/dev/zero of=/dev/mapper/noise bs=4M status=progress  1.35s user 1991.44s system 9% cpu 5:54:21.76 total

And

% sudo smartctl -a /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.13-1-ARCH] (local build)
...
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       14574
...
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     14567         -

R00KIE · 2017-05-15 21:07:58

Just writing to the disk is not going to tell you if the disk is OK. You need to write a known pattern, then read it back and confirm it is correct. The usual tool for the job is badblocks.
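
To illustrate the write-then-verify idea, here is a minimal sketch that does it against a scratch file. It is not how badblocks is implemented, and the function name verify_pattern is hypothetical. It assumes GNU dd (for conv=fsync) and a head that supports -c. Note that reading a file back right after writing it is likely to be served from the page cache, so this only demonstrates the principle; a real disk test has to read from the device itself, which is what badblocks is for. Pointing the function at a block device would destroy its contents.

```shell
#!/bin/sh
# Hypothetical helper: write a known pattern to TARGET, read it back,
# and compare. TARGET is meant to be a scratch file; SIZE_MIB is the
# amount of data to write, in MiB.
verify_pattern() {
    target=$1
    size_mib=$2
    pattern=$(mktemp) || return 2

    # Reference pattern: SIZE_MIB MiB of 0xAA bytes (octal 252).
    head -c "$((size_mib * 1024 * 1024))" /dev/zero | tr '\0' '\252' > "$pattern"

    # Write the pattern and ask dd to flush it to storage before exiting.
    dd if="$pattern" of="$target" bs=1M conv=fsync 2>/dev/null

    # Read the target back and compare it byte for byte with the pattern.
    if cmp -s "$pattern" "$target"; then
        result=0
        echo "OK: $target matches the pattern"
    else
        result=1
        echo "MISMATCH: $target differs from the pattern"
    fi

    rm -f "$pattern"
    return "$result"
}
```

For example, `verify_pattern /tmp/scratch.img 16` would write and verify 16 MiB. For the drive itself, the destructive write-mode test `badblocks -wsv /dev/sdX` does the same kind of thing with several patterns across the whole device.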
