I'm trying to determine whether my disk is dying. It's 7 years old, so I wouldn't be surprised if it's got some flaws, but the results are confusing.
smartctl shows nothing wrong, but about 1-2 times a day I get disk hangs where iowait spikes at random times. During a spike everything hangs: for example, if I open a new terminal, the cursor sits at the top and blinks, but the prompt (machine/user name) doesn't appear and I can't get commands through. It can last 10-20 seconds. The journal points to what seem to be disk errors, but I'm not sure how to interpret them. The SMART error log shows no errors, and all of the available self-tests pass.
Is the SMART result unreliable here? I assume the disk might be failing in ways the SMART tests don't cover.
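For anyone wanting to line the hangs up with the journal timestamps, the iowait share can be sampled from /proc/stat. This is only a minimal Python sketch, assuming the usual Linux field order on the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are made up for illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_share(before, after):
    """Fraction of CPU time spent in iowait between two /proc/stat snapshots."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Sum only the first eight fields (user .. steal); guest time is
    # already included in user/nice, so adding it would double count.
    total = sum(deltas[:8])
    # Index 4 is iowait in the assumed field order.
    return deltas[4] / total if total > 0 else 0.0


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return the iowait share."""
    with open(path) as f:
        first = f.read()
    time.sleep(interval)
    with open(path) as f:
        second = f.read()
    return iowait_share(first, second)
```

Calling `sample_iowait()` in a loop and logging the result with a timestamp should make it possible to match a spike against the `ata5` messages in the journal.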
SMART output:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.2-arch1-1] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: LITEON IT L8T-256L9G
Serial Number: 002452106954
Firmware Version: H881202
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
TRIM Command: Available
Device is: Not in smartctl database 7.3/5319
ATA Version is: ATA8-ACS, ATA/ATAPI-7 T13/1532D revision 4a
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Mar 4 18:32:23 2023 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 10) seconds.
Offline data collection
capabilities: (0x15) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0003   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       7591
 12 Power_Cycle_Count       0x0003   100   100   000    Pre-fail  Always       -       4437
177 Wear_Leveling_Count     0x0003   100   100   000    Pre-fail  Always       -       207531
178 Used_Rsvd_Blk_Cnt_Chip  0x0003   100   100   000    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0003   100   100   000    Pre-fail  Always       -       0
182 Erase_Fail_Count_Total  0x0003   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0003   100   100   000    Pre-fail  Always       -       0
188 Command_Timeout         0x0003   100   100   000    Pre-fail  Always       -       202
189 Unknown_SSD_Attribute   0x0003   100   100   000    Pre-fail  Always       -       302
191 Unknown_SSD_Attribute   0x0003   100   100   000    Pre-fail  Always       -       0
192 Power-Off_Retract_Count 0x0003   100   100   000    Pre-fail  Always       -       91
196 Reallocated_Event_Count 0x0003   100   100   000    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0003   100   100   000    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0003   100   100   000    Pre-fail  Always       -       2933
232 Available_Reservd_Space 0x0003   100   100   010    Pre-fail  Always       -       0
241 Total_LBAs_Written      0x0003   100   100   000    Pre-fail  Always       -       574081
242 Total_LBAs_Read         0x0003   100   100   000    Pre-fail  Always       -       604093
SMART Error Log Version: 0
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7590         -
# 2  Extended offline    Completed without error       00%      6974         -
# 3  Extended offline    Completed without error       00%      6184         -
# 4  Short offline       Completed without error       00%      4750         -
# 5  Extended offline    Completed without error       00%      1478         -
# 6  Short offline       Completed without error       00%      1376         -
# 7  Short offline       Completed without error       00%         0         -
# 8  Short offline       Completed without error       00%         0         -
# 9  Short offline       Completed without error       00%         0         -
Selective Self-tests/Logging not supported
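A rough way to pick the link-related counters out of the attribute table above. This is only a sketch: the function names are hypothetical, the choice of IDs 188 and 199 as "link-related" is my own assumption, and since the drive is not in the smartctl database the attribute names are generic labels that may not match what the vendor actually counts.

```python
# Three rows copied from the attribute table above.
SAMPLE = """\
5 Reallocated_Sector_Ct 0x0003 100 100 000 Pre-fail Always - 0
188 Command_Timeout 0x0003 100 100 000 Pre-fail Always - 202
199 UDMA_CRC_Error_Count 0x0003 100 100 000 Pre-fail Always - 2933
"""

# Assumption: attributes that hint at transport trouble rather than flash wear.
LINK_ATTR_IDS = {188, 199}


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for each table row in `text`."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # A data row has ten whitespace-separated columns and a numeric ID.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        try:
            raw = int(parts[9])
        except ValueError:
            # Some drives print composite raw values; skip those rows.
            continue
        attrs[int(parts[0])] = (parts[1], raw)
    return attrs


def nonzero_link_counters(attrs):
    """Return {id: raw} for the assumed link-related attributes that are nonzero."""
    return {i: attrs[i][1] for i in sorted(LINK_ATTR_IDS)
            if i in attrs and attrs[i][1] > 0}


print(nonzero_link_counters(parse_attributes(SAMPLE)))
```

For this table it reports 202 for Command_Timeout and 2933 for UDMA_CRC_Error_Count, while the reallocation and uncorrectable counters stay at 0.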
The logs vary, saying similar yet slightly different things every time.
example journal entry #1:
Mar 04 17:56:27 box kernel: ata5.00: exception Emask 0x0 SAct 0xca001 SErr 0x50000 action 0x6 frozen
Mar 04 17:56:27 box kernel: ata5: SError: { PHYRdyChg CommWake }
Mar 04 17:56:27 box kernel: ata5.00: failed command: READ FPDMA QUEUED
Mar 04 17:56:27 box kernel: ata5.00: cmd 60/00:00:e0:ac:58/01:00:06:00:00/40 tag 0 ncq dma 131072 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 04 17:56:27 box kernel: ata5.00: status: { DRDY }
Mar 04 17:56:27 box kernel: ata5.00: failed command: READ FPDMA QUEUED
Mar 04 17:56:27 box kernel: ata5.00: cmd 60/00:68:e0:ad:58/01:00:06:00:00/40 tag 13 ncq dma 131072 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 04 17:56:27 box kernel: ata5.00: status: { DRDY }
Mar 04 17:56:27 box kernel: ata5.00: failed command: READ FPDMA QUEUED
Mar 04 17:56:27 box kernel: ata5.00: cmd 60/08:78:08:50:91/00:00:14:00:00/40 tag 15 ncq dma 4096 in
res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Mar 04 17:56:27 box kernel: ata5.00: status: { DRDY }
Mar 04 17:56:27 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Mar 04 17:56:27 box kernel: ata5.00: cmd 61/10:90:c0:28:e0/00:00:11:00:00/40 tag 18 ncq dma 8192 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 04 17:56:27 box kernel: ata5.00: status: { DRDY }
Mar 04 17:56:27 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Mar 04 17:56:27 box kernel: ata5.00: cmd 61/30:98:d0:28:e0/00:00:11:00:00/40 tag 19 ncq dma 24576 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 04 17:56:27 box kernel: ata5.00: status: { DRDY }
Mar 04 17:56:27 box kernel: ata5: hard resetting link
Mar 04 17:56:27 box kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 04 17:56:27 box kernel: ata5.00: configured for UDMA/133
Mar 04 17:56:27 box kernel: ata5.00: device reported invalid CHS sector 0
Mar 04 17:56:27 box kernel: sd 4:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=44s
Mar 04 17:56:27 box kernel: sd 4:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current]
Mar 04 17:56:27 box kernel: sd 4:0:0:0: [sda] tag#0 Add. Sense: Unaligned write command
Mar 04 17:56:27 box kernel: sd 4:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 06 58 ac e0 00 01 00 00
Mar 04 17:56:27 box kernel: I/O error, dev sda, sector 106474720 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 2
Mar 04 17:56:27 box kernel: sd 4:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=39s
Mar 04 17:56:27 box kernel: sd 4:0:0:0: [sda] tag#13 Sense Key : Illegal Request [current]
Mar 04 17:56:27 box kernel: sd 4:0:0:0: [sda] tag#13 Add. Sense: Unaligned write command
Mar 04 17:56:27 box kernel: sd 4:0:0:0: [sda] tag#13 CDB: Read(10) 28 00 06 58 ad e0 00 01 00 00
Mar 04 17:56:27 box kernel: I/O error, dev sda, sector 106474976 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 2
Mar 04 17:56:27 box kernel: ata5: EH complete
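The SError value in the exception line is a bitmask, and the names in braces are its decoded bits. Below is a small sketch of that decoding. The bit-to-name table is written from memory of the kernel's libata driver and covers only the diagnostic bits 16-26, so treat it as an assumption and check the kernel source before relying on it.

```python
# Assumed mapping of SError diagnostic bits to the short names libata prints.
SERR_DIAG_BITS = {
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of the diagnostic bits set in an SError value."""
    return [name for bit, name in sorted(SERR_DIAG_BITS.items())
            if value & (1 << bit)]


print(decode_serror(0x50000))
```

With this table, 0x50000 decodes to PHYRdyChg and CommWake, the same two names the kernel printed for this entry. Both describe events on the SATA link itself, not a media error.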
example journal entry #2:
Feb 28 21:01:33 box kernel: ata5.00: exception Emask 0x0 SAct 0x7fcc0070 SErr 0x40000 action 0x6 frozen
Feb 28 21:01:33 box kernel: ata5: SError: { CommWake }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/10:20:10:c5:ef/00:00:11:00:00/40 tag 4 ncq dma 8192 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/18:28:30:c5:ef/00:00:11:00:00/40 tag 5 ncq dma 12288 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/08:30:20:c5:ef/00:00:11:00:00/40 tag 6 ncq dma 4096 out
res 40/00:01:97:10:a9/00:00:02:00:00/40 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/10:90:f0:78:19/00:00:02:00:00/40 tag 18 ncq dma 8192 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/08:98:90:10:a9/00:00:02:00:00/40 tag 19 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/08:b0:a8:50:11/00:00:06:00:00/40 tag 22 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/08:b8:80:50:11/00:00:06:00:00/40 tag 23 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/08:c0:30:50:11/00:00:06:00:00/40 tag 24 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/08:c8:78:50:51/00:00:1a:00:00/40 tag 25 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/08:d0:38:50:91/00:00:15:00:00/40 tag 26 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/08:d8:78:50:51/00:00:15:00:00/40 tag 27 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/08:e0:c0:6f:95/00:00:13:00:00/40 tag 28 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/08:e8:c0:47:94/00:00:13:00:00/40 tag 29 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 28 21:01:33 box kernel: ata5.00: cmd 61/08:f0:28:c5:ef/00:00:11:00:00/40 tag 30 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 28 21:01:33 box kernel: ata5.00: status: { DRDY }
Feb 28 21:01:33 box kernel: ata5: hard resetting link
Feb 28 21:01:33 box kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 28 21:01:33 box kernel: ata5.00: configured for UDMA/133
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5.00: device reported invalid CHS sector 0
Feb 28 21:01:33 box kernel: ata5: EH complete
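SAct in the exception line looks like the bitmask of NCQ tags that were still outstanding when the port froze, one bit per tag. A sketch under that assumption (the function name is hypothetical):

```python
def outstanding_tags(sact):
    """List the NCQ tag numbers whose bits are set in a 32-bit SAct value."""
    return [tag for tag in range(32) if sact & (1 << tag)]


# SAct values taken from the two journal entries in this post.
print(outstanding_tags(0xca001))
print(outstanding_tags(0x7fcc0070))
```

The decoded tags line up with the `tag N` numbers on the "failed command" lines of each entry (0, 13, 15, 18, 19 for the first; 4-6, 18, 19 and 22-30 for the second). That suggests every queued command timed out together, not one particular read or write.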
Last edited by Bumble (2023-03-09 02:57:21)
Looks a lot like a cable issue to me - though I'm very far from an expert on such things (ewaller may chime in).
Open it up and check the cables - or just unplug/replug the drive cables at each end.
I don't know how SMART works under the hood, but it's reasonable to suspect that it ignores link-level errors, where commands from the OS never reach the disk, because it may focus only on how well the disk handles the data that does get to it.
Last edited by Trilby (2023-03-04 23:59:16)
"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman
That wouldn't be so bad, thanks for your input. I'll hold off a bit to see if anyone else has any additional wisdom. I've never cracked open laptops before but I suppose I might need to try soon.
What kind of laptop is it? If it's an IBM or Lenovo, go for it - they're basically industrial-strength Lego kits. If it's an Apple MacBook, don't even consider it - you'd sooner be able to put an egg back together after cooking the omelet. Most other brands are somewhere between these two extremes.
Last edited by Trilby (2023-03-05 01:47:09)
It's an Acer. I've found videos and instructions from people taking apart this exact model. It shouldn't be too bad.
Mar 04 17:56:27 box kernel: ata5: SError: { PHYRdyChg CommWake }
I wasn't using TLP at all, but I've enabled max_performance, so I suppose I'll know in a day or two.
Did you enable ALPM by other means (powertop, laptop-mode-tools, etc)?
What was the value of /sys/class/scsi_host/host*/link_power_management_policy ?
Without TLP it is:
med_power_with_dipm
med_power_with_dipm
med_power_with_dipm
med_power_with_dipm
med_power_with_dipm
med_power_with_dipm
After starting tlp.service, which is what I'm currently trying:
max_performance
max_performance
max_performance
max_performance
max_performance
max_performance
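The values above come from one sysfs file per SATA host. Here is a minimal sketch for collecting them, assuming the path given earlier in this thread; the function name and the `sysfs_root` parameter are hypothetical and only there so the function can be pointed at a test directory. It only reads the policy; changing it needs root and is left out here.

```python
from pathlib import Path


def read_alpm_policies(sysfs_root="/sys/class/scsi_host"):
    """Return {host name: link power management policy} for every host found."""
    policies = {}
    pattern = "host*/link_power_management_policy"
    for path in sorted(Path(sysfs_root).glob(pattern)):
        try:
            policies[path.parent.name] = path.read_text().strip()
        except OSError:
            # A host may be unreadable or disappear between glob and read.
            continue
    return policies


if __name__ == "__main__":
    for host, policy in read_alpm_policies().items():
        print(host, policy)
```

Running it before and after starting tlp.service should show the switch from med_power_with_dipm to max_performance on each host.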
Edit: I missed your first question somehow. No, I'm not aware of any way it would have become enabled; this is a device I keep plugged in, so I don't need power-saving controls.
Edit 2: I haven't run into the problem again, so I'm marking this as solved. This seems to have done the trick, thank you.
Last edited by Bumble (2023-03-09 02:56:21)