#1 2022-09-02 14:55:40

lnumines
Member
Registered: 2022-09-02
Posts: 19

I/O error out of the blue

I use XFCE with Pipewire.
I was AFK with some Firefox tabs open. When I came back, my internet wasn't working, and when I tried to open a terminal it said I/O error.
All icons were gone and I couldn't open any program. I couldn't shut down either, so I halted it.

I don't know why it happened. I tried to read the logs, but I don't think anything was logged because the disk wasn't working (though I'm not sure the disk actually stopped working).
I ran journalctl --verify and it seems to report some corruption:

41afe8: Invalid object: Bad message                              
File corruption detected at xxxx

693ed8: Invalid entry item (15/25) offset: 000000                
693ed8: Invalid object contents: Bad message                     
File corruption detected at xxxx

693ed8: Invalid entry item (15/25) offset: 000000                
693ed8: Invalid object contents: Bad message                     
File corruption detected at xxxx

When I run

last | grep boot

it shows:

reboot   system boot  5.19.5-arch1-1   Mon Aug  8 15:35 - 00:17 (22+08:42)
reboot   system boot  5.19.5-arch1-1   Mon Aug  8 15:35 - 01:07 (21+09:32)
reboot   system boot  5.19.4-arch1-1   Mon Aug  8 15:35 - 00:20 (21+08:45)

I don't think there was a 5.19.5 kernel on Aug 8.

I want to know why it happened. Thanks in advance.

Kernel: 5.19.5-arch1-1
XFCE, Pipewire, Firefox, btop in the background, and otherwise vanilla Arch
I didn't do any partial upgrades.

I don't have any bad sectors on my hdd and smart stats seem good.

Last edited by lnumines (2022-09-02 17:11:28)

#2 2022-09-02 15:36:07

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,492

Re: I/O error out of the blue

lnumines wrote:

I don't have any bad sectors on my hdd and smart stats seem good.

Post the output of "smartctl -a" for the device and check the system journal (or dmesg if you haven't rebooted) for the nature of the IO errors (could be the bus or link, aka "cable")
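A rough way to do that triage on a saved kernel log, sketched in Python. It assumes the log of the affected boot has been exported as text, e.g. journalctl -k -b -1 --no-pager > kernel.log (-b -1 is the previous boot; adjust as needed). The keyword lists are my own assumption, based on the libata messages that typically accompany link trouble versus unreadable sectors; treat them as a heuristic, not a complete catalogue.

```python
import re

# Assumed heuristics, not an exhaustive list: phrases libata prints when the
# link/transport misbehaves, versus phrases that point at unreadable media.
LINK_PATTERNS = [
    r"ATA bus error",
    r"interface fatal error",
    r"BadCRC",
    r"hard resetting link",
    r"limiting SATA link speed",
]
MEDIA_PATTERNS = [
    r"\{ UNC \}",
    r"Medium Error",
    r"Unrecovered read error",
]


def classify_kernel_log(lines):
    """Count kernel log lines that look like link errors vs. media errors."""
    counts = {"link": 0, "media": 0}
    for line in lines:
        if any(re.search(p, line) for p in LINK_PATTERNS):
            counts["link"] += 1
        elif any(re.search(p, line) for p in MEDIA_PATTERNS):
            counts["media"] += 1
    return counts
```

Usage: classify_kernel_log(open("kernel.log", encoding="utf-8")). Many "link" hits and no "media" hits points at the bus/cable side; "media" hits point at the platters.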

#3 2022-09-02 15:48:46

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

seth wrote:

I don't have any bad sectors on my hdd and smart stats seem good.

Post the output of "smartctl -a" for the device and check the system journal (or dmesg if you haven't rebooted) for the nature of the IO errors (could be the bus or link, aka "cable")

Sure:

=== START OF INFORMATION SECTION ===
Model Family:     deleted
Device Model:     deleted
Serial Number:    deleted
LU WWN Device Id: deleted
Firmware Version: deleted
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database deleted
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Sep  2 18:39:27 2022 +03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 (  96) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x1035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   090   006    Pre-fail  Always       -       185549128
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   071   071   020    Old_age   Always       -       29744
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   074   060   030    Pre-fail  Always       -       64881459538
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5525 (13 42 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   020    Old_age   Always       -       3403
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       494
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       4295098372
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   050   045    Old_age   Always       -       38 (Min/Max 29/38)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       1160
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       245
193 Load_Cycle_Count        0x0032   066   066   000    Old_age   Always       -       68130
194 Temperature_Celsius     0x0022   038   050   000    Old_age   Always       -       38 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       8
240 Head_Flying_Hours       0x0000   094   094   000    Old_age   Offline      -       5376 (136 253 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       28419069573
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       52677307617
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 523 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 523 occurred at disk power-on lifetime: 5160 hours (215 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0f 40 97 09  Error: UNC at LBA = 0x0997400f = 160907279

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 20 fc 06 82 40 00      01:30:14.252  READ FPDMA QUEUED
  61 00 00 88 95 0a 4d 00      01:30:14.234  WRITE FPDMA QUEUED
  61 00 00 88 94 0a 4d 00      01:30:14.234  WRITE FPDMA QUEUED
  61 00 00 88 93 0a 4d 00      01:30:14.233  WRITE FPDMA QUEUED
  61 00 00 88 92 0a 4d 00      01:30:14.233  WRITE FPDMA QUEUED

Error 522 occurred at disk power-on lifetime: 5160 hours (215 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0e 40 97 09  Error: UNC at LBA = 0x0997400e = 160907278

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 80 18 64 3e 47 00      01:30:11.136  READ FPDMA QUEUED
  60 00 40 ba 24 95 4b 00      01:30:11.130  READ FPDMA QUEUED
  60 00 40 1a 0b 94 4b 00      01:30:11.126  READ FPDMA QUEUED
  60 00 20 8d c2 97 4b 00      01:30:11.123  READ FPDMA QUEUED
  60 00 40 3a e0 94 4b 00      01:30:11.119  READ FPDMA QUEUED

Error 521 occurred at disk power-on lifetime: 5160 hours (215 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0d 40 97 09  Error: UNC at LBA = 0x0997400d = 160907277

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 12 ce 20 40 00      01:30:08.049  READ FPDMA QUEUED
  60 00 08 a0 8d 4b 4d 00      01:30:08.045  READ FPDMA QUEUED
  60 00 80 28 59 f7 49 00      01:30:08.037  READ FPDMA QUEUED
  60 00 80 48 02 4c 47 00      01:30:08.033  READ FPDMA QUEUED
  60 00 20 b5 b7 97 4b 00      01:30:08.027  READ FPDMA QUEUED

Error 520 occurred at disk power-on lifetime: 5160 hours (215 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0c 40 97 09  Error: UNC at LBA = 0x0997400c = 160907276

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 ca d9 c5 42 00      01:30:04.679  READ FPDMA QUEUED
  60 00 80 80 3b 7c 4d 00      01:30:04.679  READ FPDMA QUEUED
  60 00 80 80 3a 64 4d 00      01:30:04.679  READ FPDMA QUEUED
  61 00 00 c0 ee 41 4d 00      01:30:04.677  WRITE FPDMA QUEUED
  61 00 00 88 13 3f 4d 00      01:30:04.677  WRITE FPDMA QUEUED

Error 519 occurred at disk power-on lifetime: 5160 hours (215 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0b 40 97 09  Error: UNC at LBA = 0x0997400b = 160907275

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 7a 94 93 4b 00      01:30:01.759  READ FPDMA QUEUED
  60 00 80 90 5c 64 4d 00      01:30:01.755  READ FPDMA QUEUED
  61 00 10 00 31 58 4d 00      01:30:01.754  WRITE FPDMA QUEUED
  61 00 70 90 bf f6 49 00      01:30:01.753  WRITE FPDMA QUEUED
  60 00 08 38 a8 47 47 00      01:30:01.750  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Completed without error       00%      5512         -
# 2  Short offline       Completed without error       00%      5512         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Last edited by lnumines (2022-09-02 17:10:56)

#4 2022-09-02 16:49:37

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

I still need help!

#5 2022-09-02 17:00:32

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,492

Re: I/O error out of the blue

Don't bump!
https://wiki.archlinux.org/title/Genera … es#Bumping

Also edit your post and wrap the output in code tags, https://bbs.archlinux.org/help.php#bbcode

187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       494

"Ungood", though seems to have been 365h ago.

seth wrote:

check the system journal (or dmesg if you haven't rebooted) for the nature of the IO errors (could be the bus or link, aka "cable")

But also run an extended self-test.
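The "365h ago" estimate is the difference between the drive's current Power_On_Hours and the power-on lifetime stamped on the newest error-log entry. A minimal sketch with the numbers from the smartctl output above (the function name is my own):

```python
# Values copied from the smartctl -a output earlier in this thread.
POWER_ON_HOURS = 5525             # attribute 9, Power_On_Hours (raw value)
LAST_ERROR_LIFETIME_HOURS = 5160  # "Error 523 occurred at disk power-on lifetime: 5160 hours"


def hours_since_error(power_on_hours: int, error_lifetime_hours: int) -> int:
    """Power-on hours the drive has accumulated since a logged error."""
    return power_on_hours - error_lifetime_hours


age = hours_since_error(POWER_ON_HOURS, LAST_ERROR_LIFETIME_HOURS)
print(f"{age} power-on hours ago, i.e. about {age / 24:.1f} days of powered-on time")
```

These are hours the drive was powered on, not calendar hours, so in wall-clock terms the logged errors are at least about 15 days old and probably older.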

#6 2022-09-02 17:10:12

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

seth wrote:

Don't bump!
https://wiki.archlinux.org/title/Genera … es#Bumping

Also edit your post and wrap the output in code tags, https://bbs.archlinux.org/help.php#bbcode

187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       494

"Ungood", though seems to have been 365h ago.

seth wrote:

check the system journal (or dmesg if you haven't rebooted) for the nature of the IO errors (could be the bus or link, aka "cable")

But also run an extended self-test.

Sorry.

Sure.

Yes, I've seen it too. It was 365 hours ago, so that isn't it.

I'll run extended self-test.

#7 2022-09-02 17:13:37

schard
Forum Moderator
From: Hannover
Registered: 2016-05-06
Posts: 2,657

Re: I/O error out of the blue

I think this

Error 523 occurred at disk power-on lifetime: 5160 hours (215 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0f 40 97 09  Error: UNC at LBA = 0x0997400f = 160907279

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

Is more of an issue.
I'd recommend swapping out the drive asap.
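The LBA that smartctl prints for each entry can be rebuilt from the register columns. A sketch, assuming the usual 28-bit layout of the ATA error log (low nibble of DH, then CH, CL, SN), which reproduces the values smartctl shows for all five entries; the function name is my own:

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA from the SN/CL/CH/DH registers of an ATA error-log entry."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# (SN, CL, CH, DH) copied from Errors 519-523 in the smartctl output above.
ERROR_REGISTERS = {
    519: (0x0B, 0x40, 0x97, 0x09),
    520: (0x0C, 0x40, 0x97, 0x09),
    521: (0x0D, 0x40, 0x97, 0x09),
    522: (0x0E, 0x40, 0x97, 0x09),
    523: (0x0F, 0x40, 0x97, 0x09),
}

lbas = {num: lba28_from_registers(*regs) for num, regs in ERROR_REGISTERS.items()}
for num, lba in lbas.items():
    print(f"Error {num}: LBA {lba:#010x} = {lba}")
```

The five entries come out as consecutive sectors, 160907275 to 160907279, all logged at power-on hour 5160, so they look like a single burst on one small area of the disk rather than errors spread across it.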


Inofficial first vice president of the Rust Evangelism Strike Force

#8 2022-09-02 17:15:56

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

schard wrote:

I think this

Error 523 occurred at disk power-on lifetime: 5160 hours (215 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0f 40 97 09  Error: UNC at LBA = 0x0997400f = 160907279

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

Is more of an issue.
I'd recommend swapping out the drive asap.

I will probably swap the disk, but I just want to be sure whether it's the disk, my fault, or some bug. I'm running an extended self-test at the moment.
I don't think it's the cable, because when I rebooted after the incident the disk was detected without any problem. So it's probably the disk itself or something else. Not sure.

Last edited by lnumines (2022-09-02 17:20:21)

#9 2022-09-02 17:23:01

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,492

Re: I/O error out of the blue

@schard, there was a flurry of those, but look at the timestamps.

seth wrote:

"Ungood", though seems to have been 365h ago.

@lnumines

seth twice before wrote:

check the system journal (or dmesg if you haven't rebooted) for the nature of the IO errors (could be the bus or link, aka "cable")

#10 2022-09-02 17:28:40

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

seth wrote:

@schard, there was a flurry of those, but look at the timestamps.

seth wrote:

"Ungood", though seems to have been 365h ago.

@lnumines

seth twice before wrote:

check the system journal (or dmesg if you haven't rebooted) for the nature of the IO errors (could be the bus or link, aka "cable")

I have a system journal dated 8 Aug 2022 with the 5.19.5 kernel, but 5.19.5 didn't exist at that point. I think it's corrupted, and it contains I/O errors:

Aug 08 15:35:40 archlinux kernel: ata1.00: exception Emask 0x52 SAct 0xe0000 SErr 0xc0c00 action 0x6 frozen
Aug 08 15:35:40 archlinux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 08 15:35:40 archlinux kernel: ata1: SError: { Proto HostInt CommWake 10B8B }
Aug 08 15:35:40 archlinux kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 08 15:35:40 archlinux kernel: ata1.00: cmd 60/00:88:e0:d4:a5/02:00:04:00:00/40 tag 17 ncq dma 262144 in
                                           res 40/00:98:58:1b:82/00:00:1c:00:00/40 Emask 0x52 (ATA bus error)
Aug 08 15:35:40 archlinux kernel: ata1.00: status: { DRDY }
Aug 08 15:35:40 archlinux kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 08 15:35:40 archlinux kernel: ata1.00: cmd 60/08:90:a8:27:75/00:00:12:00:00/40 tag 18 ncq dma 4096 in
                                           res 40/00:98:58:1b:82/00:00:1c:00:00/40 Emask 0x52 (ATA bus error)
Aug 08 15:35:40 archlinux kernel: ata1.00: status: { DRDY }
Aug 08 15:35:40 archlinux kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 08 15:35:40 archlinux kernel: ata1.00: cmd 60/08:98:58:1b:82/00:00:1c:00:00/40 tag 19 ncq dma 4096 in
                                           res 40/00:98:58:1b:82/00:00:1c:00:00/40 Emask 0x52 (ATA bus error)
Aug 08 15:35:40 archlinux kernel: ata1.00: status: { DRDY }
Aug 08 15:35:40 archlinux kernel: ata1: hard resetting link
Aug 08 15:35:40 archlinux kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 08 15:35:40 archlinux kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x100)
Aug 08 15:35:40 archlinux kernel: ata1.00: revalidation failed (errno=-5)
Aug 08 15:35:40 archlinux kernel: ata1: hard resetting link
Aug 08 15:35:40 archlinux kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 08 15:35:40 archlinux kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x100)
Aug 08 15:35:40 archlinux kernel: ata1.00: revalidation failed (errno=-5)
Aug 08 15:35:40 archlinux kernel: ata1: limiting SATA link speed to 3.0 Gbps
Aug 08 15:35:40 archlinux kernel: ata1: hard resetting link
Aug 08 15:35:40 archlinux kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Aug 08 15:35:40 archlinux kernel: ata1.00: configured for UDMA/133
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=11s
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#17 Sense Key : Illegal Request [current] 
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#17 Add. Sense: Unaligned write command
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 04 a5 d4 e0 00 02 00 00
Aug 08 15:35:40 archlinux kernel: I/O error, dev sda, sector 77976800 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=11s
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#18 Sense Key : Illegal Request [current] 
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#18 Add. Sense: Unaligned write command
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#18 CDB: Read(10) 28 00 12 75 27 a8 00 00 08 00
Aug 08 15:35:40 archlinux kernel: I/O error, dev sda, sector 309667752 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Aug 08 15:35:40 archlinux kernel: ata1: EH complete
Aug 08 15:35:40 archlinux kernel: ata1.00: exception Emask 0x10 SAct 0x7020 SErr 0x280100 action 0x6 frozen
Aug 08 15:35:40 archlinux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 08 15:35:40 archlinux kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
Aug 08 15:35:40 archlinux kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 08 15:35:40 archlinux kernel: ata1.00: cmd 60/08:28:18:41:52/00:00:0f:00:00/40 tag 5 ncq dma 4096 in
                                           res 40/00:70:e0:e1:a5/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Aug 08 15:35:40 archlinux kernel: ata1.00: status: { DRDY }
Aug 08 15:35:40 archlinux kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 08 15:35:40 archlinux kernel: ata1.00: cmd 60/18:60:98:8a:5d/00:00:08:00:00/40 tag 12 ncq dma 12288 in
                                           res 40/00:70:e0:e1:a5/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Aug 08 15:35:40 archlinux kernel: ata1.00: status: { DRDY }
Aug 08 15:35:40 archlinux kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 08 15:35:40 archlinux kernel: ata1.00: cmd 60/08:68:00:78:0c/00:00:0f:00:00/40 tag 13 ncq dma 4096 in
                                           res 40/00:70:e0:e1:a5/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Aug 08 15:35:40 archlinux kernel: ata1.00: status: { DRDY }
Aug 08 15:35:40 archlinux kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 08 15:35:40 archlinux kernel: ata1.00: cmd 60/00:70:e0:e1:a5/01:00:04:00:00/40 tag 14 ncq dma 131072 in
                                           res 40/00:70:e0:e1:a5/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Aug 08 15:35:40 archlinux kernel: ata1.00: status: { DRDY }
Aug 08 15:35:40 archlinux kernel: ata1: hard resetting link
Aug 08 15:35:40 archlinux kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Aug 08 15:35:40 archlinux kernel: ata1.00: configured for UDMA/133
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#5 Sense Key : Illegal Request [current] 
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#5 Add. Sense: Unaligned write command
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#5 CDB: Read(10) 28 00 0f 52 41 18 00 00 08 00
Aug 08 15:35:40 archlinux kernel: I/O error, dev sda, sector 257048856 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#12 Sense Key : Illegal Request [current] 
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#12 Add. Sense: Unaligned write command
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#12 CDB: Read(10) 28 00 08 5d 8a 98 00 00 18 00
Aug 08 15:35:40 archlinux kernel: I/O error, dev sda, sector 140348056 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#13 Sense Key : Illegal Request [current] 
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#13 Add. Sense: Unaligned write command
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#13 CDB: Read(10) 28 00 0f 0c 78 00 00 00 08 00
Aug 08 15:35:40 archlinux kernel: I/O error, dev sda, sector 252475392 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#14 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#14 Sense Key : Illegal Request [current] 
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#14 Add. Sense: Unaligned write command
Aug 08 15:35:40 archlinux kernel: sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 04 a5 e1 e0 00 01 00 00
Aug 08 15:35:40 archlinux kernel: I/O error, dev sda, sector 77980128 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
Aug 08 15:35:40 archlinux kernel: ata1: EH complete
Aug 08 15:35:40 archlinux kernel: kauditd_printk_skb: 1 callbacks suppressed
Aug 08 15:35:40 archlinux kernel: ata1.00: exception Emask 0x10 SAct 0xc0080 SErr 0x280100 action 0x6 frozen
Aug 08 15:35:40 archlinux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 08 15:35:40 archlinux kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
Aug 08 15:35:40 archlinux kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 08 15:35:40 archlinux kernel: ata1.00: cmd 60/08:38:30:33:7d/00:00:22:00:00/40 tag 7 ncq dma 4096 in
                                           res 40/00:98:e0:26:a6/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Aug 08 15:35:40 archlinux kernel: ata1.00: status: { DRDY }
Aug 08 15:35:40 archlinux kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 08 15:35:40 archlinux kernel: ata1.00: cmd 60/00:90:e0:25:a6/01:00:04:00:00/40 tag 18 ncq dma 131072 in
                                           res 40/00:98:e0:26:a6/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Aug 08 15:35:40 archlinux kernel: ata1.00: status: { DRDY }
Aug 08 15:35:40 archlinux kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 08 15:35:40 archlinux kernel: ata1.00: cmd 60/00:98:e0:26:a6/01:00:04:00:00/40 tag 19 ncq dma 131072 in
                                           res 40/00:98:e0:26:a6/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Aug 08 15:35:40 archlinux kernel: ata1.00: status: { DRDY }
Aug 08 15:35:40 archlinux kernel: ata1: hard resetting link
Aug 08 15:35:41 archlinux kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Aug 08 15:35:41 archlinux kernel: ata1.00: configured for UDMA/133
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#7 Sense Key : Illegal Request [current] 
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#7 Add. Sense: Unaligned write command
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#7 CDB: Read(10) 28 00 22 7d 33 30 00 00 08 00
Aug 08 15:35:41 archlinux kernel: I/O error, dev sda, sector 578630448 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#18 Sense Key : Illegal Request [current] 
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#18 Add. Sense: Unaligned write command
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#18 CDB: Read(10) 28 00 04 a6 25 e0 00 01 00 00
Aug 08 15:35:41 archlinux kernel: I/O error, dev sda, sector 77997536 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#19 Sense Key : Illegal Request [current] 
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#19 Add. Sense: Unaligned write command
Aug 08 15:35:41 archlinux kernel: sd 0:0:0:0: [sda] tag#19 CDB: Read(10) 28 00 04 a6 26 e0 00 01 00 00
Aug 08 15:35:41 archlinux kernel: I/O error, dev sda, sector 77997792 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
Aug 08 15:35:41 archlinux kernel: ata1: EH complete
Aug 08 15:35:41 archlinux kernel: ata1: limiting SATA link speed to 1.5 Gbps
Aug 08 15:35:41 archlinux kernel: ata1.00: exception Emask 0x10 SAct 0x8000000 SErr 0x280100 action 0x6 frozen
Aug 08 15:35:41 archlinux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 08 15:35:41 archlinux kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
Aug 08 15:35:41 archlinux kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 08 15:35:41 archlinux kernel: ata1.00: cmd 60/00:d8:e0:2b:a6/01:00:04:00:00/40 tag 27 ncq dma 131072 in
                                           res 40/00:d8:e0:2b:a6/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Aug 08 15:35:41 archlinux kernel: ata1.00: status: { DRDY }
Aug 08 15:35:41 archlinux kernel: ata1: hard resetting link
Aug 08 15:35:42 archlinux kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 08 15:35:42 archlinux kernel: ata1.00: configured for UDMA/133
Aug 08 15:35:42 archlinux kernel: sd 0:0:0:0: [sda] tag#27 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Aug 08 15:35:42 archlinux kernel: sd 0:0:0:0: [sda] tag#27 Sense Key : Illegal Request [current] 
Aug 08 15:35:42 archlinux kernel: sd 0:0:0:0: [sda] tag#27 Add. Sense: Unaligned write command
Aug 08 15:35:42 archlinux kernel: sd 0:0:0:0: [sda] tag#27 CDB: Read(10) 28 00 04 a6 2b e0 00 01 00 00
Aug 08 15:35:42 archlinux kernel: I/O error, dev sda, sector 77999072 op 0x0:(READ) flags 0x80700 phys_seg 16 prio class 0
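The CDB: Read(10) lines can be decoded to confirm what each failed request was. A minimal sketch, assuming the standard SCSI READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in blocks in bytes 7-8) and this drive's 512-byte logical sectors; the function name is my own:

```python
from typing import Tuple

READ_10 = 0x28  # SCSI READ(10) opcode


def decode_read10(cdb_hex: str, block_size: int = 512) -> Tuple[int, int, int]:
    """Return (lba, blocks, bytes) for a READ(10) CDB as printed by the kernel."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != READ_10:
        raise ValueError(f"not a READ(10) CDB: {cdb_hex!r}")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks, blocks * block_size


# CDBs copied from the tag#17 and tag#18 failures above.
for cdb in ("28 00 04 a5 d4 e0 00 02 00 00", "28 00 12 75 27 a8 00 00 08 00"):
    print(cdb, "->", decode_read10(cdb))
```

Each decoded LBA matches the sector on the following "I/O error" line, and blocks times 512 matches the "ncq dma" size (262144 and 4096 bytes). The failed reads are scattered across the disk, from around sector 78 million to 578 million, unlike the consecutive LBAs in the SMART error log, which fits a link problem better than a single bad spot (without proving it).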

#11 2022-09-02 17:41:49

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

I don't even think I had Arch Linux installed on 8 Aug.

last | grep boot

reboot   system boot  5.19.5-arch1-1   Fri Sep  2 17:14   still running
reboot   system boot  5.19.5-arch1-1   Mon Aug  8 15:35 - 00:17 (22+08:42)
reboot   system boot  5.19.5-arch1-1   Mon Aug  8 15:35 - 00:17 (22+08:42)
reboot   system boot  5.19.5-arch1-1   Mon Aug  8 15:35 - 01:07 (21+09:32)
reboot   system boot  5.19.4-arch1-1   Mon Aug  8 15:35 - 00:20 (21+08:45)
reboot   system boot  5.19.4-arch1-1   Mon Aug 29 21:10 - 00:20  (03:10)
reboot   system boot  5.19.4-arch1-1   Mon Aug 29 20:38 - 21:10  (00:31)
reboot   system boot  5.19.4-arch1-1   Mon Aug 29 20:12 - 20:16  (00:03)
reboot   system boot  5.19.4-arch1-1   Mon Aug 29 15:56 - 20:16  (04:19)
reboot   system boot  5.19.4-arch1-1   Sun Aug 28 01:03 - 03:31  (02:27)
reboot   system boot  5.19.3-arch1-1   Sat Aug 27 21:56 - 01:03  (03:07)
reboot   system boot  5.19.3-arch1-1   Fri Aug 26 22:57 - 21:14  (-1:42)
reboot   system boot  5.19.3-arch1-1   Thu Aug 25 20:25 - 01:04  (04:38)
reboot   system boot  5.19.3-arch1-1   Thu Aug 25 15:46 - 20:11  (04:25)
reboot   system boot  5.19.3-arch1-1   Mon Aug  8 15:35 - 01:56 (16+10:20)
reboot   system boot  5.19.3-arch1-1   Wed Aug 24 20:20 - 01:56  (05:36)
reboot   system boot  5.19.3-arch1-1   Mon Aug  8 15:35 - 20:19 (16+04:44)
reboot   system boot  5.19.3-arch1-1   Wed Aug 24 17:53 - 19:42  (01:48)
reboot   system boot  5.19.3-arch1-1   Tue Aug 23 23:24 - 03:27  (04:02)
reboot   system boot  5.19.3-arch1-1   Tue Aug 23 18:46 - 23:23  (04:37)
reboot   system boot  5.19.3-arch1-1   Tue Aug 23 18:38 - 18:46  (00:08)
reboot   system boot  5.19.3-arch1-1   Tue Aug 23 18:32 - 18:37  (00:05)
reboot   system boot  5.19.3-arch1-1   Tue Aug 23 18:20 - 18:32  (00:11)
reboot   system boot  5.19.3-arch1-1   Tue Aug 23 17:37 - 18:20  (00:42)
reboot   system boot  5.19.3-arch1-1   Tue Aug 23 17:10 - 17:37  (00:27)
reboot   system boot  5.19.3-arch1-1   Tue Aug 23 16:54 - 17:09  (00:15)
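A quick check on the odd "Mon Aug 8 15:35" entries: adding each duration to that recorded boot time gives the session end time. A sketch, assuming last computes the duration as shutdown time minus the recorded boot time and that the year is 2022 (last does not print it); the function name is my own:

```python
from datetime import datetime, timedelta


def session_end(boot: datetime, days: int, hours: int, minutes: int) -> datetime:
    """Add a `last` duration such as "(22+08:42)" to a recorded boot time."""
    return boot + timedelta(days=days, hours=hours, minutes=minutes)


# Year assumed to be 2022, matching the post dates.
recorded_boot = datetime(2022, 8, 8, 15, 35)

for kernel, (d, h, m) in [
    ("5.19.5", (22, 8, 42)),
    ("5.19.5", (21, 9, 32)),
    ("5.19.4", (21, 8, 45)),
]:
    end = session_end(recorded_boot, d, h, m)
    print(kernel, "session ended", end.strftime("%a %b %d %H:%M"))
```

The end times come out as Aug 31 00:17, Aug 30 01:07 and Aug 30 00:20, all in late August; the last one is the same 00:20 at which the "Mon Aug 29 21:10" boot ended. My reading is that these boots actually happened around Aug 29 to 31 but were stamped with a wrong clock value (Aug 8 15:35) early in boot, which would also explain the journal lines dated Aug 08 15:35. Why the clock was wrong isn't established in this thread.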

#12 2022-09-02 18:50:43

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

seth wrote:
seth wrote:

check the system journal (or dmesg if you haven't rebooted) for the nature of the IO errors (could be the bus or link, aka "cable")

But also run an extended self-test.

I've run it.
Results:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5528         -
# 2  Conveyance offline  Completed without error       00%      5512         -
# 3  Short offline       Completed without error       00%      5512         -

#13 2022-09-02 20:39:50

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,492

Re: I/O error out of the blue

Aug 08 15:35:40 archlinux kernel: ata1: SError: { Proto HostInt CommWake 10B8B }
Aug 08 15:35:40 archlinux kernel: ata1.00: cmd 60/00:88:e0:d4:a5/02:00:04:00:00/40 tag 17 ncq dma 262144 in
                                           res 40/00:98:58:1b:82/00:00:1c:00:00/40 Emask 0x52 (ATA bus error)

The drive doesn't wake up; the errors are on the bus (so it's no surprise the SMART test came back clean).
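The flag names inside the braces of that SError line are libata's short labels for bits of the SATA SError register. As a rough aid for reading similar lines, here is a minimal shell sketch. The function `describe_serror` is hypothetical, and the descriptions reflect my reading of the SATA SError bit definitions, not anything stated in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate the libata SError flag names in a kernel line
# such as "ata1: SError: { Proto HostInt CommWake 10B8B }" into short
# descriptions (assumed from the SATA SError register bit definitions).
describe_serror() {
    local line="$1" flags flag
    # Keep only the text between "{" and "}".
    flags=${line#*\{}
    flags=${flags%\}*}
    for flag in $flags; do
        case "$flag" in
            Proto)     echo "Proto: protocol error" ;;
            HostInt)   echo "HostInt: host adapter internal error" ;;
            CommWake)  echo "CommWake: PHY detected a COMWAKE signal" ;;
            10B8B)     echo "10B8B: 10b-to-8b decode error" ;;
            BadCRC)    echo "BadCRC: CRC error in a received frame" ;;
            PHYRdyChg) echo "PHYRdyChg: PHY ready state changed" ;;
            *)         echo "$flag: (not in this table)" ;;
        esac
    done
}
```

For example, `journalctl -k | grep SError` lists the matching kernel lines, and each one can be passed to `describe_serror` as a single quoted argument.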

Model Family:     deleted
Device Model:     deleted
Serial Number:    deleted
LU WWN Device Id: deleted
Firmware Version: deleted

wtf? Did you somehow edit this?
What kind of drive is this?

I was AFK, with open firefox tabs. I came back

Was the problem a one-off or do you keep hitting these errors?
Try

hdparm -S 0 /dev/sda

to (hopefully) prevent the drive from going into standby.

#14 2022-09-02 21:20:39

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

I removed the HDD name and versions myself.

It was a one-off; it hasn't happened since, and I don't know why.
As for being AFK: I had left the PC for 30-50 minutes multiple times before it happened.

I have TLP installed. Could it be caused by TLP?

Last edited by lnumines (2022-09-02 21:25:40)

#15 2022-09-02 21:22:39

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

Is the hdparm setting persistent?

#16 2022-09-02 21:29:45

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,492

Re: I/O error out of the blue

The actual model could be interesting (check Schard's profile …)
No idea about TLP; it's gonna be hard to tell what caused the wakeup failure.
hdparm settings aren't persistent; they might not even survive an S3 (suspend-to-RAM) cycle.
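Since a plain hdparm call is lost on reboot, one common way to reapply it is a udev rule. A minimal sketch, assuming the rule file name `69-hdparm.rules`, the hypothetical helper `install_hdparm_rule`, and that `-S 0` is the setting you want; it has not been tested against this particular drive:

```shell
#!/usr/bin/env bash
# Sketch: make "hdparm -S 0" apply whenever a rotational disk sd[a-z] is added,
# by writing a udev rule. RULES_DIR defaults to the system rules directory
# (writing there needs root); the file name 69-hdparm.rules is an arbitrary choice.
install_hdparm_rule() {
    local rules_dir="${RULES_DIR:-/etc/udev/rules.d}"
    mkdir -p "$rules_dir" || return 1
    # Quoted 'EOF' keeps %k literal so udev can substitute the kernel name.
    cat > "$rules_dir/69-hdparm.rules" <<'EOF'
# Disable the standby (spin-down) timeout on rotational disks.
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", RUN+="/usr/bin/hdparm -S 0 /dev/%k"
EOF
}
```

After installing the rule as root, `udevadm control --reload` makes udev pick it up; it takes effect the next time the disk is added (for example, at boot). A rule matching `ACTION=="add"` is not guaranteed to run again after resume from S3, and if TLP is installed, its own disk settings (such as `DISK_SPINDOWN_TIMEOUT_ON_AC` in `/etc/tlp.conf`) may override it, so both are worth checking.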

#17 2022-09-02 21:32:00

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

I can give the model: it's ST500LT012-1DG142.

#18 2022-09-03 06:58:34

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,492

Re: I/O error out of the blue

Nope, that's a spinner.
It might have simply failed to spin up because of undervoltage; did this happen on (low) battery or external supply?

#19 2022-09-03 12:19:54

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

seth wrote:

Nope, that's a spinner.
It might have simply failed to spin up because of undervoltage; did this happen on (low) battery or external supply?

I don't remember, to be honest, but it was probably plugged in.

smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Laptop HDD
Device Model:     ST500LT012-1DG142
Serial Number:    S9J23HGA
LU WWN Device Id: 5 000c50 08a7a8062
Firmware Version: 0003SDM1
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database 7.3/5319
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Sep  3 15:19:01 2022 +03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Last edited by lnumines (2022-09-04 13:01:01)

#20 2022-09-03 15:34:38

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

I can provide more details if needed.

#21 2022-09-03 15:53:53

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,492

Re: I/O error out of the blue

What details on what?

You posted an edited smartctl before and now the unaltered head of one.
We know that there've been a bunch of errors, but those date back ~2 weeks of *uptime*.

The incident was, according to you, a one-off and, according to the log, a failure to wake the device.
The journal ahead of the CommWake error may or may not have indications of what led to it.

For the filesystem/journal corruption, you've hopefully run fsck?
The weird dates could be due either to the corruption or to the system/hw clock being unreliable, with you relying on NTP to correct it (are there major drifts in the journal of the running boot?).
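One quick check for the clock theory is to look for sessions in the `last` listing that end before they start, like the `(-1:42)` entry for Aug 26 earlier in the thread. A minimal sketch; `negative_sessions` is a hypothetical name, and it only inspects the last column of each line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `last` output on stdin and print the sessions whose
# duration (last column) is negative, e.g. "(-1:42)". A session that "ends"
# before it starts suggests the clock jumped backwards or wtmp is damaged.
negative_sessions() {
    awk '$NF ~ /^\(-/'
}
```

Typical use would be `last reboot | negative_sessions`; `timedatectl status` additionally shows whether the system clock is currently synchronized via NTP.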

If you want to test the drive's reliability, see https://wiki.archlinux.org/title/Badblocks (the non-destructive test is gonna take a while).
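A non-destructive badblocks run should only be started on a disk whose filesystems are unmounted (for example, from a live USB). A minimal guard sketch, assuming the hypothetical function names below and that checking `/proc/mounts` is enough; it does not look at active swap in `/proc/swaps`:

```shell
#!/usr/bin/env bash
# Return 0 if the disk itself or one of its partitions (sda3, nvme0n1p2, ...)
# appears as a source device in the mounts file. The second argument exists
# only so the check can be tested against a sample file.
disk_is_mounted() {
    local disk="$1" mounts="${2:-/proc/mounts}"
    awk -v d="$disk" '$1 == d || $1 ~ ("^" d "p?[0-9]+$") { found = 1 }
                      END { exit !found }' "$mounts"
}

# Refuse to run while mounted; otherwise start badblocks in
# non-destructive read-write mode (-n) with progress (-s), verbosely (-v).
run_badblocks_nondestructive() {
    local disk="$1"
    if disk_is_mounted "$disk"; then
        echo "$disk (or one of its partitions) is mounted; boot a live USB first." >&2
        return 1
    fi
    badblocks -nsv "$disk"
}
```

Usage would be `run_badblocks_nondestructive /dev/sda` as root from the live environment.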

#22 2022-09-03 16:14:56

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

seth wrote:

What details on what?

You posted an edited smartctl before and now the unaltered head of one.
We know that there've been a bunch of errors, but those date back ~2 weeks of *uptime*.

The incident was, according to you, a one-off and, according to the log, a failure to wake the device.
The journal ahead of the CommWake error may or may not have indications of what led to it.

For the filesystem/journal corruption, you've hopefully run fsck?
The weird dates could be due either to the corruption or to the system/hw clock being unreliable, with you relying on NTP to correct it (are there major drifts in the journal of the running boot?).

If you want to test the drive's reliability, see https://wiki.archlinux.org/title/Badblocks (the non-destructive test is gonna take a while).

I didn't run any tests after that, so I only posted the head of the output; the rest is the same as in my previous comment. Sorry.

I don't think there was any corruption before it happened, but the dates are weird: I wasn't even using Arch on 8 Aug.

There is no major drift in the journal; I looked at the current boot.

Before the CommWake error there is:

Aug 08 15:35:25 archlinux systemd[1]: Finished Load Kernel Module drm.
Aug 08 15:35:25 archlinux kernel: audit: type=1130 audit(1659962125.539:5): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=modprobe@drm comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 08 15:35:25 archlinux kernel: audit: type=1131 audit(1659962125.539:6): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=modprobe@drm comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 08 15:35:25 archlinux systemd[1]: Mounting Kernel Configuration File System...
Aug 08 15:35:25 archlinux systemd[1]: Mounted Kernel Configuration File System.
Aug 08 15:35:25 archlinux systemd[1]: Started Journal Service.
Aug 08 15:35:25 archlinux kernel: audit: type=1130 audit(1659962125.563:7): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 08 15:35:25 archlinux kernel: fuse: init (API version 7.36)
Aug 08 15:35:25 archlinux kernel: audit: type=1130 audit(1659962125.669:8): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=modprobe@fuse comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 08 15:35:25 archlinux kernel: audit: type=1131 audit(1659962125.669:9): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=modprobe@fuse comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 08 15:35:25 archlinux kernel: EXT4-fs (sda3): re-mounted. Quota mode: none.
Aug 08 15:35:25 archlinux kernel: audit: type=1130 audit(1659962125.686:10): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-remount-fs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 08 15:35:25 archlinux kernel: audit: type=1130 audit(1659962125.803:11): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udev-trigger comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 08 15:35:40 archlinux systemd-journald[259]: Received client request to flush runtime journal.
Aug 08 15:35:40 archlinux systemd-journald[259]: File /var/log/journal/6232e0d0bfff4cb2aedfadc862ccb636/system.journal corrupted or uncleanly shut down, renaming and replacing.

After this, the errors start.

#23 2022-09-03 17:41:47

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

Actually, if it's an HDD failure, I don't care; I won't troubleshoot it. I'll just swap it for a good HDD or SSD.
But if it's a software, hardware or driver problem, then let me know. I don't want it to happen again with another disk.

#24 2022-09-03 20:16:34

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,492

Re: I/O error out of the blue

But if it's a software, hardware or driver problem then let me know.

Aug 08 15:35:40 archlinux systemd-journald[259]: File /var/log/journal/6232e0d0bfff4cb2aedfadc862ccb636/system.journal corrupted or uncleanly shut down, renaming and replacing.

There seems to have been "some" incident before that CommWake issue; you may need to look at older journal entries for patterns of I/O issues or filesystem corruption.
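To scan older boots for that kind of pattern, the kernel messages of each boot can be filtered. A minimal sketch; `io_trouble` is a hypothetical name, and the patterns are examples modeled on messages like the ones in this thread, not a complete list:

```shell
#!/usr/bin/env bash
# Hypothetical filter: read log lines on stdin and keep only those that look
# like disk, link or filesystem trouble. Extend the pattern list as needed.
io_trouble() {
    grep -E 'SError|ATA bus error|I/O error|EXT4-fs error|hard resetting link|exception Emask'
}
```

`journalctl --list-boots` shows the available boot offsets, and `journalctl -k -b -2 | io_trouble` would then filter the kernel log of the boot two before the current one.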

seth wrote:

If you want to test the drive's reliability, see https://wiki.archlinux.org/title/Badblocks (the non-destructive test is gonna take a while).

Other than that,

187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       494

is so far the only sign of trouble with the drive, and if the bus error is a HW issue, it's not because of the drive.
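A simple way to see whether that attribute keeps growing, for example before and after a badblocks run, is to record its raw value. A minimal sketch; `smart_raw` is a hypothetical helper that prints the first token of the RAW_VALUE column (column 10 of the `smartctl -A` attribute rows):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `smartctl -A` output on stdin and print the first
# token of RAW_VALUE for the given attribute ID. Exits non-zero if the ID is
# not present.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10; found = 1 } END { exit !found }'
}
```

For example, `smartctl -A /dev/sda | smart_raw 187`; a raw value that climbs beyond the 494 shown above would suggest the drive itself keeps reporting uncorrectable errors.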

#25 2022-09-03 21:35:29

lnumines
Member
Registered: 2022-09-02
Posts: 19

Re: I/O error out of the blue

I tested it with badblocks and it gave me tons of CommWake errors; I couldn't even terminate the process with Ctrl-C. So this means it's an HDD failure, right?
