
#1 2023-04-28 01:58:08

Oreolek
Member
Registered: 2013-11-17
Posts: 5

[Bad SSD] Can't recover the system

I had an outage and it corrupted my root filesystem.
I managed to repair the Pacman database and extract the package list.

The sane thing would be to reinstall the system, right? But now when I try:

 pacman -S --overwrite="*" $(< pkglist) 

the system loses the SSD halfway through. I get I/O errors, then fsck can't even find the superblock. I have to reboot and start over, and it happens again. (It's a lot of packages, so the installation runs for several hours even on an SSD.)
I changed the SATA cable. Drive is only 6 months old. What can it be?
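
For a reinstall that keeps dying partway through, one option is to feed the package list to pacman in small batches, so that each run is short and it is easier to see where the drive drops out. This is only a rough sketch: `install_in_batches` is a hypothetical helper name, and the batch size is an arbitrary choice.

```shell
# Hypothetical helper: run a command once per batch of package names.
# $1 = file with one package name per line, $2 = batch size,
# remaining arguments = the command to run for each batch.
install_in_batches() {
    list="$1"
    size="$2"
    shift 2
    # xargs appends up to $size names from the list to each invocation.
    xargs -n "$size" "$@" < "$list"
}
```

It could be called as `install_in_batches pkglist 20 pacman -S --noconfirm --overwrite='*'`. The `--noconfirm` flag is there because xargs reads the list from standard input, so pacman cannot prompt interactively.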

Last edited by Oreolek (2023-05-01 10:16:40)


#2 2023-04-28 07:02:52

mpan
Member
Registered: 2012-08-01
Posts: 1,188

Re: [Bad SSD] Can't recover the system

A broken SSD or motherboard?


Sometimes I seem a bit harsh — don’t get offended too easily!


#3 2023-04-28 10:52:23

Oreolek
Member
Registered: 2013-11-17
Posts: 5

Re: [Bad SSD] Can't recover the system

SMART says the drive is okay, and a motherboard wouldn't break only 5-6 hours after booting the system, would it?

I'm checking whether it's NCQ (Native Command Queuing).
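
For reference, a minimal sketch of how NCQ can be inspected and switched off at runtime through sysfs, assuming the disk shows up as a SCSI-style block device such as `sdd`. The function names and the `SYSFS_ROOT` override are made up for illustration (the override only exists so the sketch can be tried against a fake directory tree).

```shell
# Print the current queue depth of a disk, e.g. "ncq_depth sdd".
# A depth of 1 means commands are issued one at a time (NCQ effectively off).
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
ncq_depth() {
    disk="$1"
    root="${SYSFS_ROOT:-/sys}"
    cat "$root/block/$disk/device/queue_depth"
}

# Set the queue depth to 1. On a real system this needs root.
ncq_disable() {
    disk="$1"
    root="${SYSFS_ROOT:-/sys}"
    echo 1 > "$root/block/$disk/device/queue_depth"
}
```

The sysfs setting does not survive a reboot. To disable NCQ from boot, the `libata.force=noncq` kernel parameter should do the same job for all ports.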

Last edited by Oreolek (2023-04-28 10:53:09)


#4 2023-04-28 20:30:58

mpan
Member
Registered: 2012-08-01
Posts: 1,188

Re: [Bad SSD] Can't recover the system

S.M.A.R.T. collects information about the storage medium itself⁽¹⁾ and the storage access mechanism. It does not provide information about the link or the device’s electronics.⁽²⁾ A motherboard may fail at any moment, and the failure may be load-related. I’m not saying that either of these is certainly what is happening in your case, but in the absence of indicators other than “fails under load” I see it as a solid guess.

If disabling NCQ helped, please mark the thread as solved.
____
⁽¹⁾ Platters in HDDs, flash memory in SSDs.
⁽²⁾ It does report temperature, but the sensor placement is device-specific. It also provides only the current temperature, not a temperature history.



#5 2023-04-29 11:34:21

Oreolek
Member
Registered: 2013-11-17
Posts: 5

Re: [Bad SSD] Can't recover the system

No, disabling NCQ didn't help. I tried installing packages one by one but suddenly got (retyping this manually):

[22234.507038] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[22234.507785] ata5.00: failed command: WRITE DMA EXT
[22234.508490] ata5.00: cmd 35/00:10:68:96:b7/00:08:11:00:00/e0 tag 11 dma 1056768 out
             res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[22234.509879] ata5.00: status: { DRDY }
ata5: hard resetting link
ata5: link is slow to respond, please be patient (ready=0)
ata5: COMRESET failed (errno=-16)
ata5: hard resetting link
ata5: link is slow to respond, please be patient (ready=0)
ata5: COMRESET failed (errno=-16)
ata5: hard resetting link
ata5: link is slow to respond, please be patient (ready=0)
[...]
ata5: limiting SATA link speed to 1.5 Gbps
ata5: hard resetting link
ata5: COMRESET failed (errno=-16)
ata5: reset failed, giving up
ata5.00: disable device
ata5: EH complete
sd 4:0:0:0: [sdd] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=90s
sd 4:0:0:0: [sdd] tag#13 CDB: Write(10) 2a 00 11 b7 96 68 00 08 10 00
I/O error, dev sdd, sector .... op 0x1:(WRITE) flags 0x0 phys_seg 61 prio class 2
EXT4-fs warning (device sdd1): ext4_end_bio:343: I/O error 10 writing to inode ... starting block ...)
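
A log like the one above gets long quickly, so here is a rough sketch of a filter that counts the libata failure messages per port. `ata_error_summary` and the category labels are names I made up; the matched message strings are the ones that appear in the log above.

```shell
# Read a kernel log on stdin and count libata failure messages per ATA port.
# Output: one line per "port category count", sorted.
ata_error_summary() {
    awk '
        match($0, /ata[0-9]+(\.[0-9]+)?: /) {
            # Reduce "ata5.00: " or "ata5: " to the bare port name "ata5".
            port = substr($0, RSTART, RLENGTH)
            sub(/(\.[0-9]+)?: $/, "", port)
            msg = substr($0, RSTART + RLENGTH)
            if (msg ~ /^failed command/)           count[port " failed_command"]++
            else if (msg ~ /^hard resetting link/) count[port " hard_reset"]++
            else if (msg ~ /^COMRESET failed/)     count[port " comreset_failed"]++
            else if (msg ~ /^reset failed/)        count[port " reset_gave_up"]++
            else if (msg ~ /^disable device/)      count[port " device_disabled"]++
        }
        END { for (k in count) print k, count[k] }
    ' | sort
}
```

Typical use would be `dmesg | ata_error_summary`. If only one port ever shows up, that narrows the problem to that port, its cable, or the drive attached to it, but it cannot tell those three apart.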

UEFI setup now displays the SSD in the list of drives as "SATA2_4: , SN: None Size: 0.0GB, Max. UDMA: 0, Max. Speed: Unknown, S.M.A.R.T: Not Supported"

UPD: well, it's most likely the SATA socket on the motherboard, so this is semi-solved for now.

UPD2: just installed the system on an HDD. The motherboard is fine, the SATA cable is fine, and the new system doesn't have issues, so it was the Netac SSD dying too quickly.

Last edited by Oreolek (2023-05-01 10:17:52)


#6 2023-05-01 11:20:59

mpan
Member
Registered: 2012-08-01
Posts: 1,188

Re: [Bad SSD] Can't recover the system

To be sure, you may test the SSD on another computer. If it fails there too under similar load, there is a very high chance it was the sole cause of the problem.

If it does not fail, unfortunately that tells you nothing conclusive.
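
As a rough idea of what “similar load” could look like without reinstalling a whole system, here is a sketch of a sustained write test. `write_verify` is a hypothetical helper, the sizes are arbitrary, and it assumes a scratch filesystem on the suspect SSD is mounted somewhere writable.

```shell
# Write $2 MiB of random data into directory $1, flush it to disk,
# then read the file back and compare SHA-256 checksums.
# Prints "ok" or "mismatch". The read-back may be served from the page
# cache, so this mostly exercises the write path.
write_verify() {
    dir="$1"
    mib="$2"
    file="$dir/write_verify.$$"
    want=$(head -c "$((mib * 1024 * 1024))" /dev/urandom | tee "$file" | sha256sum | cut -d' ' -f1) || return 1
    sync
    got=$(sha256sum < "$file" | cut -d' ' -f1) || return 1
    rm -f "$file"
    if [ "$want" = "$got" ]; then
        echo ok
    else
        echo mismatch
        return 1
    fi
}
```

Running something like `write_verify /mnt/test 4096` in a loop for a few hours while watching `dmesg -w` in another terminal should show whether the link resets come back on the other machine.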


