#1 2015-12-06 13:24:15

Ba7a7chy
Member
Registered: 2013-05-04
Posts: 45

HD Failures [failed command: WRITE FPDMA QUEUED] [Solved]

I have started getting HD failures lately. When they happen, the whole /home partition (on /dev/sdb*) becomes read-only until restart, and after the restart I have to run fsck on /dev/sdb1 manually or the system won't boot.
I thought it was due to the HD itself, but after replacing the HD I still get the same issues.
The issues come and go, but they are most prominent when the HD is doing heavy I/O.

Notes:

* Using Lenovo T430 laptop
* Two HDs (/dev/sda -- SSD, /dev/sdb -- mechanical 1TB)
* The issues are only on /dev/sdb

HD info for /dev/sdb (hdparm -I output):

/dev/sdb:

ATA device, with non-removable media
	Model Number:       WDC WD10JPVX-22JC3T0                    
	Serial Number:      WD-WXG1E6534FAD
	Firmware Revision:  01.01A01
	Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
	Supported: 9 8 7 6 5 
	Likely used: 9
Configuration:
	Logical		max	current
	cylinders	16383	65535
	heads		16	1
	sectors/track	63	63
	--
	CHS current addressable sectors:    4128705
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors: 1953525168
	Logical  Sector size:                   512 bytes
	Physical Sector size:                  4096 bytes
	Logical Sector-0 offset:                  0 bytes
	device size with M = 1024*1024:      953869 MBytes
	device size with M = 1000*1000:     1000204 MBytes (1000 GB)
	cache/buffer size  = 8192 KBytes
	Nominal Media Rotation Rate: 5400
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, with device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 16
	Advanced power management level: 96
	DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	DOWNLOAD_MICROCODE
	   *	Advanced Power Management feature set
	    	Power-Up In Standby feature set
	   *	SET_FEATURES required to spinup after power up
	    	SET_MAX security extension
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	64-bit World wide name
	   *	IDLE_IMMEDIATE with UNLOAD
	   *	{READ,WRITE}_DMA_EXT_GPL commands
	   *	Segmented DOWNLOAD_MICROCODE
	   *	Gen1 signaling speed (1.5Gb/s)
	   *	Gen2 signaling speed (3.0Gb/s)
	   *	Gen3 signaling speed (6.0Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Host-initiated interface power management
	   *	Phy event counters
	   *	Idle-Unload when NCQ is active
	   *	NCQ priority information
	   *	Host automatic Partial to Slumber transitions
	   *	Device automatic Partial to Slumber transitions
	   *	READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
	   *	DMA Setup Auto-Activate optimization
	    	Device-initiated interface power management
	   *	Software settings preservation
	   *	SMART Command Transport (SCT) feature set
	   *	SCT Write Same (AC2)
	   *	SCT Features Control (AC4)
	   *	SCT Data Tables (AC5)
	    	unknown 206[12] (vendor specific)
	    	unknown 206[13] (vendor specific)
	    	unknown 206[14] (vendor specific)
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
		frozen
	not	expired: security count
		supported: enhanced erase
	182min for SECURITY ERASE UNIT. 182min for ENHANCED SECURITY ERASE UNIT. 
Logical Unit WWN Device Identifier: 50014ee65b80466a
	NAA		: 5
	IEEE OUI	: 0014ee
	Unique ID	: 65b80466a
Checksum: correct

smartctl --all /dev/sdb

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue Mobile
Device Model:     WDC WD10JPVX-22JC3T0
Serial Number:    WD-WXG1E6534FAD
LU WWN Device Id: 5 0014ee 65b80466a
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Sun Dec  6 15:27:28 2015 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(17040) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 191) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x7035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   185   178   021    Pre-fail  Always       -       1733
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3359
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       152
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       34
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   197   197   000    Old_age   Always       -       11949
194 Temperature_Celsius     0x0022   117   101   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       5
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
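
For anyone skimming the attribute table above, here is a minimal Python sketch that pulls out the non-zero raw values of a few attributes. The watch list and the function name are assumptions made for illustration, not anything smartctl itself defines. A non-zero UDMA_CRC_Error_Count is commonly read as a hint of trouble on the link (cable, connector) rather than on the platters, but treat that as a rule of thumb only.

```python
# Attributes worth a second look when their raw value is non-zero.
# Assumption: this watch list is a personal choice, not an official one.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

def nonzero_watched(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw != 0."""
    found = {}
    for line in smart_text.splitlines():
        parts = line.split()
        # Attribute rows have 10 columns:
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(parts) >= 10 and parts[0].isdigit() and parts[1] in WATCHED:
            raw = parts[9]
            if raw.isdigit() and int(raw) != 0:
                found[parts[1]] = int(raw)
    return found

# Two rows copied from the smartctl output in this post.
rows = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       5
"""
print(nonzero_watched(rows))
```

On the two rows shown, only UDMA_CRC_Error_Count (raw value 5) is reported.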

In journalctl -f the errors look like this:

Dec 06 15:12:21 ano kernel: ata2.00: exception Emask 0x10 SAct 0x20000000 SErr 0x400100 action 0x6 frozen
Dec 06 15:12:21 ano kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Dec 06 15:12:21 ano kernel: ata2: SError: { UnrecovData Handshk }
Dec 06 15:12:21 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:21 ano kernel: ata2.00: cmd 61/40:e8:78:5e:04/00:00:3a:00:00/40 tag 29 ncq 32768 out
                                     res 40/00:ec:78:5e:04/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:21 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:21 ano kernel: ata2: hard resetting link
Dec 06 15:12:22 ano kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 06 15:12:22 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:22 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:22 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:22 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:22 ano kernel: ata2.00: configured for UDMA/33
Dec 06 15:12:22 ano kernel: ata2: EH complete
Dec 06 15:12:43 ano kernel: ata2.00: exception Emask 0x10 SAct 0x200 SErr 0x400100 action 0x6 frozen
Dec 06 15:12:43 ano kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Dec 06 15:12:43 ano kernel: ata2: SError: { UnrecovData Handshk }
Dec 06 15:12:43 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:43 ano kernel: ata2.00: cmd 61/80:48:88:69:89/00:00:4e:00:00/40 tag 9 ncq 65536 out
                                     res 40/00:4c:88:69:89/00:00:4e:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:43 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:43 ano kernel: ata2: hard resetting link
Dec 06 15:12:44 ano kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 06 15:12:44 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:44 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:44 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:44 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:44 ano kernel: ata2.00: configured for UDMA/33
Dec 06 15:12:44 ano kernel: ata2: EH complete
Dec 06 15:12:52 ano kernel: ata2.00: exception Emask 0x10 SAct 0x7f8003ff SErr 0x400100 action 0x6 frozen
Dec 06 15:12:52 ano kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Dec 06 15:12:52 ano kernel: ata2: SError: { UnrecovData Handshk }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:00:a0:0b:40/00:00:4f:00:00/40 tag 0 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:08:b0:0f:40/00:00:4f:00:00/40 tag 1 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:10:b8:0f:41/00:00:4f:00:00/40 tag 2 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/10:18:20:08:00/00:00:00:00:00/40 tag 3 ncq 8192 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:20:10:08:00/00:00:00:00:00/40 tag 4 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:28:00:08:00/00:00:00:00:00/40 tag 5 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:30:40:08:00/00:00:00:00:00/40 tag 6 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:38:f0:08:00/00:00:00:00:00/40 tag 7 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:40:10:09:00/00:00:00:00:00/40 tag 8 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:48:38:09:00/00:00:00:00:00/40 tag 9 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:b8:80:08:c0/00:00:43:00:00/40 tag 23 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:c0:78:09:c0/00:00:43:00:00/40 tag 24 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:c8:08:08:00/00:00:4d:00:00/40 tag 25 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:d0:30:09:00/00:00:4d:00:00/40 tag 26 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:d8:10:08:80/00:00:4e:00:00/40 tag 27 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:e0:08:08:40/00:00:4f:00:00/40 tag 28 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:e8:80:08:40/00:00:4f:00:00/40 tag 29 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:f0:c8:09:40/00:00:4f:00:00/40 tag 30 ncq 4096 out
                                     res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2: hard resetting link
Dec 06 15:12:52 ano kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 06 15:12:52 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:52 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:52 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:52 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:52 ano kernel: ata2.00: configured for UDMA/33
Dec 06 15:12:52 ano kernel: ata2: EH complete
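
The hex fields in these lines can be decoded by hand. Below is a minimal Python sketch, assuming the SError bit names that libata prints, that each set bit in SAct marks one outstanding NCQ tag, and that the "cmd" field is laid out as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. The function names are made up for illustration, and the bit table should be checked against your kernel source before relying on it.

```python
# SError bit positions with the short names libata prints in "SError: { ... }".
# Assumption: follows the SATA SError register layout; verify for your kernel.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}

def decode_serror(value):
    """List the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]

def decode_sact(value):
    """List the NCQ tags (0-31) whose bit is set in the SAct mask."""
    return [tag for tag in range(32) if value & (1 << tag)]

def decode_ncq_cmd(field, sector_size=512):
    """Decode the 'cmd' field of a failed NCQ command.

    For NCQ commands the sector count travels in the feature fields and the
    tag sits in the upper five bits of nsect. A count of 0 is not special-cased.
    """
    command, low, high, _device = field.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    sectors = (hob_feature << 8) | feature
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return {
        "opcode": int(command, 16),
        "tag": nsect >> 3,
        "bytes": sectors * sector_size,
        "lba": lba,
    }

print(decode_serror(0x400100))
print(decode_sact(0x7f8003ff))
print(decode_ncq_cmd("61/40:e8:78:5e:04/00:00:3a:00:00/40"))
```

Run against the values above, 0x400100 decodes to UnrecovData and Handshk, SAct 0x7f8003ff to tags 0-9 and 23-30, and the first failed command to tag 29 with 32768 bytes, which agrees with what the kernel printed.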

What should I do? How should I debug this?

Last edited by Ba7a7chy (2015-12-08 09:46:48)

#2 2015-12-06 22:36:17

berny99
Member
From: Canary Islands (Spain)
Registered: 2010-10-05
Posts: 18

Hi, check with dmesg whether you are using SWIOTLB software bounce buffering for PCI-DMA. If you are, see whether putting swiotlb=131072 on the kernel command line in GRUB gets rid of your errors.
If not, your HD might be failing. To test whether the problem is really only on the second HD, run bonnie++ on the first HD as well and see what happens.
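
The dmesg check and the command-line change can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the sample log line is an example of the kind of message to look for rather than output from this machine, and the exact wording of the SWIOTLB message may differ between kernel versions.

```python
import re

def mentions_swiotlb(dmesg_text):
    """Return the kernel log lines that mention the software IO TLB."""
    pattern = re.compile(r"swiotlb|software io tlb", re.IGNORECASE)
    return [line for line in dmesg_text.splitlines() if pattern.search(line)]

def with_swiotlb(cmdline, slabs=131072):
    """Append swiotlb=<slabs> to a kernel command line unless one is already set."""
    if any(arg.startswith("swiotlb=") for arg in cmdline.split()):
        return cmdline
    return f"{cmdline} swiotlb={slabs}".strip()

# Illustrative input only (assumption: message wording varies by kernel).
sample = (
    "ata2: hard resetting link\n"
    "PCI-DMA: Using software bounce buffering for IO (SWIOTLB)\n"
)
print(mentions_swiotlb(sample))
print(with_swiotlb("root=/dev/sda1 rw"))
```

To run it on a live system, pass in the text of dmesg (for example captured with subprocess.run(["dmesg"], capture_output=True, text=True).stdout, which may need root).
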
Cheers :-)

#3 2015-12-08 09:43:38

mich41
Member
Registered: 2012-06-22
Posts: 796

Looks like some kind of SATA communication error.

If the errors stop repeating after the link downshifts to 1.5 Gbps

Dec 06 15:12:52 ano kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

then limiting the link speed of sdb to 1.5 Gbps with the libata.force kernel parameter should get rid of them for good.
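
A rough Python sketch of how to get from the log to the parameter, assuming the documented libata.force format of [ID:]VAL with ID being the port number, and 1.5Gbps or 3.0Gbps as the speed values. The helper names are made up for illustration.

```python
import re

def ports_with_resets(log_text):
    """Collect the ATA port numbers that logged 'hard resetting link'."""
    matches = re.finditer(r"\bata(\d+): hard resetting link", log_text)
    return sorted({int(m.group(1)) for m in matches})

def libata_force(port, speed="1.5Gbps"):
    """Build a libata.force kernel parameter limiting one port's link speed."""
    if speed not in ("1.5Gbps", "3.0Gbps"):
        raise ValueError(f"unsupported link speed limit: {speed}")
    return f"libata.force={port}:{speed}"

# One line copied from the log in the first post.
log = "Dec 06 15:12:21 ano kernel: ata2: hard resetting link\n"
for port in ports_with_resets(log):
    print(libata_force(port))
```

For the log in the first post this gives libata.force=2:1.5Gbps, which would go on the kernel command line in the bootloader configuration.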

#4 2015-12-08 09:46:17

Ba7a7chy
Member
Registered: 2013-05-04
Posts: 45

Apparently the issue was a malfunctioning caddy (the tray for the 2nd HD). After replacing it, everything is working fine.

Thanks for helping out :)
