
#1 2013-06-30 17:47:35

jschuster
Member
Registered: 2013-06-30
Posts: 14
Website

Intermittent HDD failures

My hard drive (a WD Blue / SE / SE16 (SATA II)) has been failing off and on. I'll have my system on and running fine for a day or so, then it suddenly seems to lose the connection to the hard drive and can't run anything. At that point, whenever I try to do anything involving that drive, I get an error along the lines of "partition/device not found". Doing "sudo reboot" doesn't seem to help, but doing a full shutdown, waiting a minute or two, and then starting up again seems to make it work again - at least until it loses the connection in the next 24 hours or so. Fortunately I can at least run a few basic tools, since the drive has only /home mounted on it; the root partition is on another drive.

I've run both the short and long tests with smartctl, but they don't detect any errors. Any diagnostic tips?

Offline

#2 2013-06-30 17:52:27

bjornoslav
Member
Registered: 2011-11-01
Posts: 137

Re: Intermittent HDD failures

Replace the SATA power and data cables and try again.


asus ux303la, core i5@1.6ghz, 8 gb ram, 500gb hdd, hd4400 gpu, crux x64 with openbox

Offline

#3 2013-06-30 22:15:22

Roken
Member
From: South Wales, UK
Registered: 2012-01-16
Posts: 1,251

Re: Intermittent HDD failures

Also, look at other drives on the SATA bus, e.g. CD/DVD. I've been burned by one of them failing while the system reported the failure on the HD.


Ryzen 5900X 12 core/24 thread - RTX 3090 FE 24 Gb, Asus Prime B450 Plus, 32Gb Corsair DDR4, Cooler Master N300 chassis, 5 HD (1 NvME PCI, 4SSD) + 1 x optical.
Linux user #545703

Offline

#4 2013-07-01 19:41:54

jschuster
Member
Registered: 2013-06-30
Posts: 14
Website

Re: Intermittent HDD failures

I replaced the SATA cable yesterday, and it's still working today. I'll keep my fingers crossed, but it looks like that was the problem. Thanks for the help.

Offline

#5 2013-07-03 13:54:59

jschuster
Member
Registered: 2013-06-30
Posts: 14
Website

Re: Intermittent HDD failures

I've had the same problem at least a couple of times since my last post. It's not the SATA cable, and I tried switching around the different power connectors (the cable is connected directly to my power supply, so I have no way of using a different cable). The same drive always fails, so it doesn't seem to be a cabling issue.

I haven't had a chance to look at the DVD drive, but I'll check it out. Otherwise, I have a couple of backup drives I may swap in temporarily and see if they do the trick.

Offline

#6 2013-07-03 14:58:30

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,739

Re: Intermittent HDD failures

Anything interesting in the journal or in the output of dmesg regarding the drive after a failure or when it comes back to life?


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way

Offline

#7 2013-07-06 16:36:47

jschuster
Member
Registered: 2013-06-30
Posts: 14
Website

Re: Intermittent HDD failures

Nothing terribly interesting when it boots up - all looks normal. When the drive fails, I see the following from dmesg:

ata4.00: exception Emask 0x50 SAct 0x0 SErr 0x890800 action 0xe frozen
ata4: SError: { HostInt PHYRdyChg 10B8B LinkSeq }
ata4.00: failed command: WRITE DMA
ata4.00: cmd ca/00:08:47:19:c4/00:00:00:00:00/e2 tag 0 dma 4096 out
             res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x54 (ATA bus error)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link down (SStatus 0 SControl 300)
ata4: hard resetting link
ata4: SATA link down (SStatus 0 SControl 300)
ata4: limiting SATA link speed to 1.5 Gbps
ata4: hard resetting link
ata4: SATA link down (SStatus 0 SControl 310)
ata4.00: disabled
ata4.00: device reported invalid CHS sector 0
sd 3:0:0:0: [sdb]

There are a bunch more errors and things after that that look similar - it looks like it's just trying to reset the connection and get it working, but nothing ends up succeeding.
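
For what it's worth, the SErr value in that first line can be decoded by hand. Below is a minimal Python sketch, assuming the usual SATA SError register bit positions for the four flags the kernel printed. The table is deliberately partial (only the flags from this log), and `decode_serr` is a made-up name for illustration, not part of any tool.

```python
# Partial SError bit table: only the four flags named in the posted log.
# Bit positions are assumed from the SATA SError register layout.
SERR_FLAGS = {
    11: "HostInt",    # internal host adapter error
    16: "PHYRdyChg",  # PHY ready state changed
    19: "10B8B",      # 10b-to-8b decode error
    23: "LinkSeq",    # link sequence error
}


def decode_serr(value):
    """Return (names of known set bits, leftover bits not in the table)."""
    names = [name for bit, name in sorted(SERR_FLAGS.items()) if value & (1 << bit)]
    known_mask = sum(1 << bit for bit in SERR_FLAGS)
    return names, value & ~known_mask


if __name__ == "__main__":
    names, unknown = decode_serr(0x890800)
    print(names, hex(unknown))
```

Running it on 0x890800 gives back the same four names the kernel printed, with no bits left over. Three of the four are PHY or link-level conditions, which would be consistent with the connection itself dropping rather than with bad sectors, though it doesn't say which end of the link is at fault.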

Is it possible this could be an overheating issue? It's been really hot around here lately (highs in the high 90s), so perhaps my fans haven't been able to keep up.

I also haven't tried switching to a different port on the motherboard. I'll unplug the DVD drive for a bit and try a different motherboard port and see what happens.

Offline

#8 2013-07-22 15:59:18

Markus00000
Member
Registered: 2011-03-27
Posts: 318

Re: Intermittent HDD failures

I had exactly the same issue (drive goes off till after the next shutdown, no SMART errors etc.) with a WD drive in my notebook.

After this went on for much too long and I failed to make any sense of this behavior, I replaced the drive. The new one has not gone off since.

Offline

#9 2013-07-22 16:05:19

skottish
Forum Fellow
From: Here
Registered: 2006-06-16
Posts: 7,942

Re: Intermittent HDD failures

Any time that you suspect drive issues, go to the manufacturer's site and download their disk utilities. Everyone has an ISO that can thoroughly check the disk. Searching through the logs and such is a waste of time when it comes to disk issues; non-dedicated software doesn't understand hardware issues.

Last edited by skottish (2013-07-22 16:06:30)

Offline

#10 2013-07-23 06:22:45

Markus00000
Member
Registered: 2011-03-27
Posts: 318

Re: Intermittent HDD failures

skottish wrote:

Everyone has an ISO that can thoroughly check the disk.

Do they do more than run SMART tests? For example, Western Digital provides "Data Lifeguard Diagnostics", which lets you do the following:

  • QUICK TEST - performs SMART drive quick self-test to gather and verify the Data Lifeguard information contained on the drive.

  • EXTENDED TEST - performs a Full Media Scan to detect bad sectors. Test may take several hours to complete depending on the size of the drive.

  • WRITE ZEROS - writes zeros to the drive with options of Full Erase and Quick Erase. File system and data will be lost.

  • VIEW TEST RESULT

Except for "write zeros", these sound very much like SMART tests.

Offline

#11 2013-07-23 13:52:21

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: Intermittent HDD failures

Even the "WRITE ZEROS" can be done by issuing a "security erase" or "security erase enhanced", depending on what the drive supports; this can be done with hdparm.


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

Offline

#12 2013-07-23 14:12:30

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: Intermittent HDD failures

I don't know, but it is possible that some manufacturers might provide more informed diagnostics than smartmontools, especially if your drive is not in the database and the tools have trouble interpreting the data it provides. If your disk is well supported by the generic tools, that might be different, I guess.

In any case, have you gotten data from smartctl, examined the smart error log and run the available tests?
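
In case it helps, here is a rough sketch of what that review might look like, written as a small Python helper that only builds and prints the smartctl command lines (it does not run them). The function name `smartctl_checks` is made up for illustration, and /dev/sdb is an assumption based on the [sdb] in the dmesg output above.

```python
import shlex


def smartctl_checks(device):
    """Build the smartctl invocations for a basic health review of `device`."""
    return [
        ["smartctl", "-H", device],              # overall health verdict
        ["smartctl", "-A", device],              # vendor attributes
        ["smartctl", "-l", "error", device],     # SMART error log
        ["smartctl", "-l", "selftest", device],  # results of earlier self-tests
        ["smartctl", "-t", "long", device],      # start an extended self-test
    ]


if __name__ == "__main__":
    # /dev/sdb is assumed from the [sdb] in the posted dmesg output.
    for argv in smartctl_checks("/dev/sdb"):
        print("sudo " + shlex.join(argv))
```

The error log and self-test log are the parts most worth posting, since a drive can report a passing health verdict while still having logged errors.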


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

Offline

#13 2013-07-23 14:19:15

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: Intermittent HDD failures

I use the Seagate and WD diagnostics often. Both give you a result code if the drive fails, which you will need to get an RMA. Even if you're out of warranty you can get a straightforward idea of whether or not the drive is bad, which the output of smartctl doesn't always give you.


But whether the Constitution really be one thing, or another, this much is certain - that it has either authorized such a government as we have had, or has been powerless to prevent it. In either case, it is unfit to exist.
-Lysander Spooner

Offline

#14 2013-07-24 13:44:16

jschuster
Member
Registered: 2013-06-30
Posts: 14
Website

Re: Intermittent HDD failures

I did run the smartctl tests (both the short and long ones) a while back, with no errors. I may run the WD boot diagnostics at some point, but it looks like it might take some doing to create a boot USB stick (I think I'd have to install FreeDOS on it and then add the diagnostic tool).

I'm becoming more and more convinced it was just an overheating issue, though. The drive hasn't failed since July 16th, which is probably around when I dusted out the case, and the weather has started to cool off since then, too. I'll post again if it fails again and I run the diagnostics, but for now I'm crossing my fingers that everything is okay.

Offline
