#1 2019-01-24 23:25:20

jjb2016
Member
From: Oxfordshire
Registered: 2016-02-29
Posts: 73

UDMA_CRC_Error_Count ... ?

Hi All,

I have a hard drive in my system which appears to have an issue, but I'm not sure what it means. I rebooted the system earlier after an update and noticed these lines during boot ...

[justin@IXTREME ~]$ sudo journalctl -b | grep "I/O"
Jan 24 22:58:19 archlinux kernel: APIC: Switch to symmetric I/O mode setup
Jan 24 22:58:19 archlinux kernel: 00:06: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 120
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 136
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 264
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 72
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 520
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 3907028184
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 3907028200
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 3907028224
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 3907028264
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 3907028440
Jan 24 22:58:54 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2080
Jan 24 22:58:54 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2592
Jan 24 22:58:54 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 3907011616
Jan 24 22:58:54 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 3907012128
Jan 24 22:58:55 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2080
Jan 24 22:58:55 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2592
Jan 24 22:58:55 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 3907011616
Jan 24 22:58:55 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 3907012128
Jan 24 22:58:56 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2080
Jan 24 22:58:56 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2592
Jan 24 22:58:59 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 3907012128
Jan 24 22:59:00 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2592
Jan 24 22:59:00 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 3907012128
Jan 24 22:59:00 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2080
Jan 24 22:59:01 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2592
Jan 24 22:59:01 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 3907011616
Jan 24 22:59:01 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 3907012128
Jan 24 22:59:01 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2080
Jan 24 22:59:02 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2592
Jan 24 22:59:02 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 3907011616

and smartctl shows this for the drive in question (/dev/sdd) ...

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.8-arch1-1-custom] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba P300
Device Model:     TOSHIBA HDWD120
Serial Number:    37O718YAS
LU WWN Device Id: 5 000039 fe5df6271
Firmware Version: MX4OACF0
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jan 24 23:07:52 2019 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85)	Offline data collection activity
					was aborted by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 249)	Self-test routine in progress...
					90% of test remaining.
Total time to complete Offline 
data collection: 		(14535) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 243) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   140   140   054    Pre-fail  Offline      -       68
  3 Spin_Up_Time            0x0007   147   147   024    Pre-fail  Always       -       258 (Average 257)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       73
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   124   124   020    Pre-fail  Offline      -       33
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       11061
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       64
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       78
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       78
194 Temperature_Celsius     0x0002   142   142   000    Old_age   Always       -       42 (Min/Max 21/54)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       701

SMART Error Log Version: 1
ATA Error Count: 701 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 701 occurred at disk power-on lifetime: 11061 hours (460 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  2f 00 01 10 00 00 00 00   3d+19:07:40.069  READ LOG EXT
  60 e0 00 20 08 00 40 00   3d+19:07:40.065  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+19:07:40.053  READ LOG EXT
  60 e0 00 20 08 00 40 00   3d+19:07:40.015  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+19:07:40.003  READ LOG EXT

Error 700 occurred at disk power-on lifetime: 11061 hours (460 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 b1 4f 08 00 00  Error: ICRC, ABRT at LBA = 0x0000084f = 2127

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 00 20 08 00 40 00   3d+19:07:40.065  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+19:07:40.053  READ LOG EXT
  60 e0 00 20 08 00 40 00   3d+19:07:40.015  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+19:07:40.003  READ LOG EXT
  60 e0 00 20 08 00 40 00   3d+19:07:39.976  READ FPDMA QUEUED

Error 699 occurred at disk power-on lifetime: 11061 hours (460 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 81 7f 08 00 00  Error: ICRC, ABRT at LBA = 0x0000087f = 2175

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 00 20 08 00 40 00   3d+19:07:40.015  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+19:07:40.003  READ LOG EXT
  60 e0 00 20 08 00 40 00   3d+19:07:39.976  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+19:07:39.965  READ LOG EXT
  60 e0 00 20 08 00 40 00   3d+19:07:39.965  READ FPDMA QUEUED

Error 698 occurred at disk power-on lifetime: 11061 hours (460 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 81 7f 08 00 00  Error: ICRC, ABRT at LBA = 0x0000087f = 2175

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 00 20 08 00 40 00   3d+19:07:39.976  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+19:07:39.965  READ LOG EXT
  60 e0 00 20 08 00 40 00   3d+19:07:39.965  READ FPDMA QUEUED
  61 10 00 10 46 e0 40 00   3d+19:07:39.964  WRITE FPDMA QUEUED
  61 10 00 10 0a 00 40 00   3d+19:07:39.937  WRITE FPDMA QUEUED

Error 697 occurred at disk power-on lifetime: 11061 hours (460 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 c1 3f 08 00 00  Error: ICRC, ABRT at LBA = 0x0000083f = 2111

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 00 20 08 00 40 00   3d+19:07:39.965  READ FPDMA QUEUED
  61 10 00 10 46 e0 40 00   3d+19:07:39.964  WRITE FPDMA QUEUED
  61 10 00 10 0a 00 40 00   3d+19:07:39.937  WRITE FPDMA QUEUED
  60 10 10 10 46 e0 40 00   3d+19:07:39.909  READ FPDMA QUEUED
  61 10 08 10 44 e0 40 00   3d+19:07:39.909  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     11061         -
# 2  Extended offline    Completed without error       00%     10202         -
# 3  Short offline       Completed without error       00%     10195         -
# 4  Short offline       Completed without error       00%     10096         -
# 5  Extended offline    Completed without error       00%     10039         -
# 6  Short offline       Completed without error       00%      2452         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

So I can see that UDMA_CRC_Error_Count is 701, but I don't know what this means or whether it's related to the I/O error messages during boot. The SMART self-tests show no issues. Could this be caused by a bad data cable? Does anybody have any ideas? I should probably mention that this drive is attached, along with 5 other drives, to an LSI SAS HBA ...

01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
	Subsystem: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
	Flags: bus master, fast devsel, latency 0, IRQ 47, NUMA node 0
	I/O ports at e000 [size=256]
	Memory at da6c0000 (64-bit, non-prefetchable) [size=16K]
	Memory at da280000 (64-bit, non-prefetchable) [size=256K]
	Expansion ROM at da200000 [disabled] [size=512K]
	Capabilities: [50] Power Management version 3
	Capabilities: [68] Express Endpoint, MSI 00
	Capabilities: [d0] Vital Product Data
	Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [138] Power Budgeting <?>
	Capabilities: [150] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
	Kernel driver in use: mpt3sas
	Kernel modules: mpt3sas

Any advice would be appreciated. Just need to know if this is more likely a drive fault, or a fault with cable / HBA / something else.

Thanks.

#2 2019-01-25 08:18:32

seth
Member
Registered: 2012-09-03
Posts: 51,143

Re: UDMA_CRC_Error_Count ... ?

You could run an extended self-test, but a climbing UDMA_CRC_Error_Count usually points to the connection (cable, plugs, bus), yes.

Also, the drive is pretty hot (42 °C now, 54 °C max according to attribute 194); upgrade the cooling if you can.
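
On the cable point, here is a minimal Python sketch of that rule of thumb; it is not something smartctl provides. It pulls the raw values out of the attribute table from the first post and checks whether the CRC counter (199) is non-zero while the media-related counters (5, 196, 197, 198) are all zero. The names `raw_values` and `likely_link_problem` are made up for illustration, and the heuristic is an assumption, not a diagnosis.

```python
import re

# A few rows copied from the smartctl output in the first post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       701
"""

# One attribute row: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, then the leading integer of RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+"
    r"\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def raw_values(smartctl_text):
    """Return {attribute_id: raw_value} for every attribute row found."""
    values = {}
    for line in smartctl_text.splitlines():
        match = ROW.match(line)
        if match:
            values[int(match.group(1))] = int(match.group(3))
    return values


def likely_link_problem(values):
    """Assumed heuristic: CRC errors logged, media counters all zero."""
    media_ids = (5, 196, 197, 198)
    media_clean = all(values.get(attr_id, 0) == 0 for attr_id in media_ids)
    return values.get(199, 0) > 0 and media_clean


if __name__ == "__main__":
    print(likely_link_problem(raw_values(SAMPLE)))
```

With the numbers from the first post the check comes out true, which fits a cable or connector fault better than failing media.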

#3 2019-01-25 16:14:23

jjb2016
Member
From: Oxfordshire
Registered: 2016-02-29
Posts: 73

Re: UDMA_CRC_Error_Count ... ?

Yep - Extended self-test returns no issues.  I'll have to go in there and check the cables ....
My machine is not ideally placed in my home, and the case & cooling need upgrading for sure ... another job for another time!
Thanks for the reply.

#4 2019-02-07 20:20:45

jjb2016
Member
From: Oxfordshire
Registered: 2016-02-29
Posts: 73

Re: UDMA_CRC_Error_Count ... ?

Installed new cables - no more I/O error messages during boot and the UDMA_CRC_Error_Count has not gone up again.  Looks like new cables solved the problem - phew.
Does anybody know if the UDMA_CRC_Error_Count can be reset to zero somehow?
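
For checking that the boot log stays clean after the cable swap, a small Python sketch along these lines can count the I/O error lines per device. It assumes kernel messages shaped like the ones in the first post (`I/O error, dev sdd, sector N`); the function name `io_errors_by_device` is hypothetical.

```python
import re
from collections import Counter

# A few journal lines copied from the first post.
SAMPLE = """\
Jan 24 22:58:19 archlinux kernel: APIC: Switch to symmetric I/O mode setup
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 120
Jan 24 22:58:20 archlinux kernel: print_req_error: I/O error, dev sdd, sector 136
Jan 24 22:58:54 IXTREME kernel: print_req_error: I/O error, dev sdd, sector 2080
"""

# Only block-layer error lines carry "I/O error, dev <name>, sector <n>";
# unrelated lines that merely mention "I/O" do not match.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def io_errors_by_device(journal_text):
    """Return {device_name: number_of_I/O_error_lines}."""
    counts = Counter()
    for line in journal_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


if __name__ == "__main__":
    print(io_errors_by_device(SAMPLE))
```

Feeding it the text of `journalctl -b -k` after a reboot should give an empty result if the errors are really gone.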

#5 2019-02-07 23:48:04

mich41
Member
Registered: 2012-06-22
Posts: 796

Re: UDMA_CRC_Error_Count ... ?

This isn't normally possible, though Google returns some instances where particular HDD models can be hacked.
Personally, I just write smartctl output to a file if I need it for future reference, to check whether the disk is deteriorating further.
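
A rough Python sketch of that habit, for anyone who wants the comparison automated: it keeps the raw attribute values in a JSON file and reports which ones changed since the previous snapshot. The helper names (`parse_raw`, `diff_snapshots`, `record`) and the JSON layout are my own assumptions, and the "later" sample below is invented apart from the unchanged CRC count.

```python
import json
import re
from pathlib import Path

# Two rows from the smartctl output in the first post.
BEFORE = """\
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       11061
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       701
"""

# Hypothetical later reading: the hours value is made up for illustration.
AFTER = """\
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       11397
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       701
"""

# ID, name, flag, six more columns, then the leading integer of RAW_VALUE.
ROW = re.compile(r"^\s*\d+\s+(\S+)\s+0x[0-9a-fA-F]+(?:\s+\S+){6}\s+(\d+)")


def parse_raw(smartctl_text):
    """Return {attribute_name: raw_value} from `smartctl -A` style text."""
    values = {}
    for line in smartctl_text.splitlines():
        match = ROW.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def diff_snapshots(old, new):
    """Return {name: (old_value, new_value)} for attributes that differ."""
    return {
        name: (old.get(name), value)
        for name, value in new.items()
        if old.get(name) != value
    }


def record(path, smartctl_text):
    """Save a snapshot to `path`; return changes since the previous one."""
    path = Path(path)
    new = parse_raw(smartctl_text)
    # On the first run there is nothing to compare against.
    old = json.loads(path.read_text()) if path.exists() else new
    path.write_text(json.dumps(new, indent=2))
    return diff_snapshots(old, new)


if __name__ == "__main__":
    print(diff_snapshots(parse_raw(BEFORE), parse_raw(AFTER)))
```

If UDMA_CRC_Error_Count never shows up in the diff, the counter has stopped climbing, which is about as close to a "reset" as most drives allow.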
