
#1 2021-12-26 20:45:06

Silenzium
Member
From: Germany
Registered: 2011-05-03
Posts: 32

[SOLVED]READ FPDMA QUEUED & Reallocated_Sector_Ct++ - New HDD failing?

Hey there,

I just upgraded my NAS with larger HDDs (from WD Red 4 TB to Seagate IronWolf 12 TB). After cloning one of them with Clonezilla, I got these errors in the kernel log after boot:

[  154.812778] ata4.00: invalid checksum 0x2 on log page 10h
[  154.812790] ata4: log page 10h reported inactive tag 0
[  154.813097] ata4.00: exception Emask 0x1 SAct 0x80000006 SErr 0x0 action 0x0
[  154.813412] ata4.00: irq_stat 0x40000008
[  154.813604] ata4.00: failed command: READ FPDMA QUEUED
[  154.813861] ata4.00: cmd 60/00:08:20:26:73/01:00:09:00:00/40 tag 1 ncq dma 131072 in
                        res 40/00:10:00:69:00/00:00:e3:01:00/40 Emask 0x1 (device error)
[  154.814521] ata4.00: status: { DRDY }
[  154.814692] ata4.00: failed command: WRITE FPDMA QUEUED
[  154.814943] ata4.00: cmd 61/40:10:00:69:00/05:00:e3:01:00/40 tag 2 ncq dma 688128 out
                        res 40/00:10:00:69:00/00:00:e3:01:00/40 Emask 0x1 (device error)
[  154.815731] ata4.00: status: { DRDY }
[  154.815935] ata4.00: failed command: READ FPDMA QUEUED
[  154.816187] ata4.00: cmd 60/00:f8:20:25:73/01:00:09:00:00/40 tag 31 ncq dma 131072 in
                        res 40/00:10:00:69:00/00:00:e3:01:00/40 Emask 0x1 (device error)
[  154.816824] ata4.00: status: { DRDY }
[  154.925135] ata4.00: configured for UDMA/133
[  154.925221] sd 3:0:0:0: [sdc] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  154.925238] sd 3:0:0:0: [sdc] tag#1 Sense Key : Illegal Request [current] 
[  154.925253] sd 3:0:0:0: [sdc] tag#1 Add. Sense: Unaligned write command
[  154.925269] sd 3:0:0:0: [sdc] tag#1 CDB: Read(16) 88 00 00 00 00 00 09 73 26 20 00 00 01 00 00 00
[  154.925280] print_req_error: I/O error, dev sdc, sector 158541344
[  154.925697] sd 3:0:0:0: [sdc] tag#31 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  154.925713] sd 3:0:0:0: [sdc] tag#31 Sense Key : Illegal Request [current] 
[  154.925727] sd 3:0:0:0: [sdc] tag#31 Add. Sense: Unaligned write command
[  154.925742] sd 3:0:0:0: [sdc] tag#31 CDB: Read(16) 88 00 00 00 00 00 09 73 25 20 00 00 01 00 00 00
[  154.925751] print_req_error: I/O error, dev sdc, sector 158541088
[  154.926069] ata4: EH complete
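
The sector numbers in the print_req_error lines can be cross-checked against the CDB bytes logged just above them. A minimal sketch, assuming the standard READ(16) layout (opcode 0x88, bytes 2-9 the big-endian LBA, bytes 10-13 the transfer length in blocks); the helper name decode_read16 is made up for illustration:

```python
def decode_read16(cdb_hex):
    """Return (lba, block_count) from a READ(16) CDB given as spaced hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, blocks


# CDB of the failed command with tag 1, copied from the log above
lba, blocks = decode_read16("88 00 00 00 00 00 09 73 26 20 00 00 01 00 00 00")
print(lba, blocks, blocks * 512)  # 158541344 256 131072
```

With 512-byte logical sectors, 256 blocks is 131072 bytes, which matches the "ncq dma 131072 in" size reported for the failed READ FPDMA QUEUED commands.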

SMART data:

# smartctl -a /dev/sdc
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-18-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST12000VN0008-2PH103
Serial Number:    XXX
LU WWN Device Id: 5 000c50 0c9110c5d
Firmware Version: SC61
User Capacity:    12.000.138.625.024 bytes [12,0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec 26 21:35:45 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1104) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   063   061   044    Pre-fail  Always       -       59411605
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       128
  7 Seek_Error_Rate         0x000f   063   060   045    Pre-fail  Always       -       1990119
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       4
 18 Head_Health             0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always       -       1
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   058   049   040    Old_age   Always       -       42 (Min/Max 25/42)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       31
194 Temperature_Celsius     0x0022   042   049   000    Old_age   Always       -       42 (0 25 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Pressure_Limit          0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10h+11m+52.737s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       7845334924
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       32204736

SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 18 hours (0 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 e0 25 73 09  Error: WP at LBA = 0x097325e0 = 158541280

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 40 ff ff ff 4f 00      00:15:00.212  WRITE FPDMA QUEUED
  60 00 00 20 26 73 49 00      00:14:53.918  READ FPDMA QUEUED
  60 00 00 20 25 73 49 00      00:14:53.913  READ FPDMA QUEUED
  61 00 40 ff ff ff 4f 00      00:14:51.855  WRITE FPDMA QUEUED
  61 00 40 ff ff ff 4f 00      00:14:51.855  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        18         -
# 2  Extended offline    Aborted by host               90%        18         -
# 3  Short offline       Completed without error       00%        18         -
# 4  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Reallocated_Sector_Ct (raw value 128) doesn't look good to me, and Current_Pending_Sector and Offline_Uncorrectable are both at 8. Furthermore, the drive is noisier than before. Is my brand-new hard drive already failing?
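
For anyone who wants to check this automatically, here is a rough sketch that pulls the raw values out of the attribute table printed by smartctl -a. Treating attributes 5, 197 and 198 as the ones to watch, and any non-zero raw value as a warning, is an assumption of this sketch and not an official threshold; the function name flag_attributes is hypothetical.

```python
import re

# Attribute IDs treated as warning signs when their raw value is non-zero
# (an assumption of this sketch, not a vendor-defined rule).
WATCHED = {5, 197, 198}

# One row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-f]{4}\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def flag_attributes(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in smart_text.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) in WATCHED and int(m.group(3)) > 0:
            flagged[m.group(2)] = int(m.group(3))
    return flagged


# Rows copied from the first smartctl output above
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       128
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

print(flag_attributes(SAMPLE))
```

Note that the normalized VALUE column still reads 100 for all three attributes and the overall self-assessment says PASSED, so only the raw values show the problem here.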



EDIT: The figures kept getting worse after I switched the SATA cable (the drive now shows up as /dev/sdb). Reallocated_Sector_Ct rose from 128 to 408:

# smartctl -a /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-18-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST12000VN0008-2PH103
Serial Number:    XXX
LU WWN Device Id: 5 000c50 0c9110c5d
Firmware Version: SC61
User Capacity:    12.000.138.625.024 bytes [12,0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec 26 22:15:09 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1104) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   063   061   044    Pre-fail  Always       -       66405997
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       6
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       408
  7 Seek_Error_Rate         0x000f   063   060   045    Pre-fail  Always       -       2007495
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       19
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       5
 18 Head_Health             0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always       -       1
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   049   040    Old_age   Always       -       34 (Min/Max 32/34)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       33
194 Temperature_Celsius     0x0022   034   049   000    Old_age   Always       -       34 (0 25 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Pressure_Limit          0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10h+27m+34.865s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       7852313820
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       32220232

SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 18 hours (0 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 e0 25 73 09  Error: WP at LBA = 0x097325e0 = 158541280

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 40 ff ff ff 4f 00      00:15:00.212  WRITE FPDMA QUEUED
  60 00 00 20 26 73 49 00      00:14:53.918  READ FPDMA QUEUED
  60 00 00 20 25 73 49 00      00:14:53.913  READ FPDMA QUEUED
  61 00 40 ff ff ff 4f 00      00:14:51.855  WRITE FPDMA QUEUED
  61 00 40 ff ff ff 4f 00      00:14:51.855  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        19         -
# 2  Short offline       Completed without error       00%        18         -
# 3  Extended offline    Aborted by host               90%        18         -
# 4  Short offline       Completed without error       00%        18         -
# 5  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Last edited by Silenzium (2021-12-27 09:09:52)

#2 2021-12-26 22:49:38

arch_jsb
Member
Registered: 2010-03-13
Posts: 28

Re: [SOLVED]READ FPDMA QUEUED & Reallocated_Sector_Ct++ - New HDD failing?

Your guess is correct. A Reallocated_Sector_Ct that keeps climbing means the disk can no longer be trusted with any data.

#3 2021-12-27 09:09:22

Silenzium
Member
From: Germany
Registered: 2011-05-03
Posts: 32

Re: [SOLVED]READ FPDMA QUEUED & Reallocated_Sector_Ct++ - New HDD failing?

Reallocated_Sector_Ct is now at 904. I will return that HDD.
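
To put the trend in numbers, here is a small sketch using the three raw values reported in this thread (128, 408, 904). The helper name deltas is made up, and treating "every step is an increase" as the failure signal is a simplification for illustration only.

```python
def deltas(readings):
    """Differences between consecutive readings."""
    return [later - earlier for earlier, later in zip(readings, readings[1:])]


# Raw Reallocated_Sector_Ct values reported in this thread, in order
readings = [128, 408, 904]

growth = deltas(readings)
still_growing = all(step > 0 for step in growth)

print(growth)         # [280, 496]
print(still_growing)  # True
```

The count did not settle after the cable swap; each reading added more reallocated sectors than the one before.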
