I have been getting HD failures lately. When they occur, the whole /home partition (on /dev/sdb*) becomes read-only until restart; then, after the restart, I have to manually run fsck on /dev/sdb1 or the system won't boot.
I thought it was due to the HD itself, but after changing the HD I still get the same issues.
The issues come and go but are most prominent when the HD does heavy I/O jobs.
Notes:
* Using Lenovo T430 laptop
* Two HD (/dev/sda -- SSD, /dev/sdb -- Mechanical 1TB )
* The issues are only on /dev/sdb
HD Info for /dev/sdb
/dev/sdb:
ATA device, with non-removable media
Model Number: WDC WD10JPVX-22JC3T0
Serial Number: WD-WXG1E6534FAD
Firmware Revision: 01.01A01
Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Supported: 9 8 7 6 5
Likely used: 9
Configuration:
Logical max current
cylinders 16383 65535
heads 16 1
sectors/track 63 63
--
CHS current addressable sectors: 4128705
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 1953525168
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 953869 MBytes
device size with M = 1000*1000: 1000204 MBytes (1000 GB)
cache/buffer size = 8192 KBytes
Nominal Media Rotation Rate: 5400
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: 96
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
* Advanced Power Management feature set
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* 64-bit World wide name
* IDLE_IMMEDIATE with UNLOAD
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* Idle-Unload when NCQ is active
* NCQ priority information
* Host automatic Partial to Slumber transitions
* Device automatic Partial to Slumber transitions
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
* DMA Setup Auto-Activate optimization
Device-initiated interface power management
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
unknown 206[12] (vendor specific)
unknown 206[13] (vendor specific)
unknown 206[14] (vendor specific)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
182min for SECURITY ERASE UNIT. 182min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee65b80466a
NAA : 5
IEEE OUI : 0014ee
Unique ID : 65b80466a
Checksum: correct
smartctl --all /dev/sdb
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Blue Mobile
Device Model: WDC WD10JPVX-22JC3T0
Serial Number: WD-WXG1E6534FAD
LU WWN Device Id: 5 0014ee 65b80466a
Firmware Version: 01.01A01
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is: Sun Dec 6 15:27:28 2015 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (17040) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 191) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x7035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 185 178 021 Pre-fail Always - 1733
4 Start_Stop_Count 0x0032 097 097 000 Old_age Always - 3359
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 152
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 34
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 4
193 Load_Cycle_Count 0x0032 197 197 000 Old_age Always - 11949
194 Temperature_Celsius 0x0022 117 101 000 Old_age Always - 30
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 5
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
In journalctl -f the errors look like this:
Dec 06 15:12:21 ano kernel: ata2.00: exception Emask 0x10 SAct 0x20000000 SErr 0x400100 action 0x6 frozen
Dec 06 15:12:21 ano kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Dec 06 15:12:21 ano kernel: ata2: SError: { UnrecovData Handshk }
Dec 06 15:12:21 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:21 ano kernel: ata2.00: cmd 61/40:e8:78:5e:04/00:00:3a:00:00/40 tag 29 ncq 32768 out
res 40/00:ec:78:5e:04/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:21 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:21 ano kernel: ata2: hard resetting link
Dec 06 15:12:22 ano kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 06 15:12:22 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:22 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:22 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:22 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:22 ano kernel: ata2.00: configured for UDMA/33
Dec 06 15:12:22 ano kernel: ata2: EH complete
Dec 06 15:12:43 ano kernel: ata2.00: exception Emask 0x10 SAct 0x200 SErr 0x400100 action 0x6 frozen
Dec 06 15:12:43 ano kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Dec 06 15:12:43 ano kernel: ata2: SError: { UnrecovData Handshk }
Dec 06 15:12:43 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:43 ano kernel: ata2.00: cmd 61/80:48:88:69:89/00:00:4e:00:00/40 tag 9 ncq 65536 out
res 40/00:4c:88:69:89/00:00:4e:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:43 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:43 ano kernel: ata2: hard resetting link
Dec 06 15:12:44 ano kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 06 15:12:44 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:44 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:44 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:44 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:44 ano kernel: ata2.00: configured for UDMA/33
Dec 06 15:12:44 ano kernel: ata2: EH complete
Dec 06 15:12:52 ano kernel: ata2.00: exception Emask 0x10 SAct 0x7f8003ff SErr 0x400100 action 0x6 frozen
Dec 06 15:12:52 ano kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Dec 06 15:12:52 ano kernel: ata2: SError: { UnrecovData Handshk }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:00:a0:0b:40/00:00:4f:00:00/40 tag 0 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:08:b0:0f:40/00:00:4f:00:00/40 tag 1 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:10:b8:0f:41/00:00:4f:00:00/40 tag 2 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/10:18:20:08:00/00:00:00:00:00/40 tag 3 ncq 8192 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:20:10:08:00/00:00:00:00:00/40 tag 4 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:28:00:08:00/00:00:00:00:00/40 tag 5 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:30:40:08:00/00:00:00:00:00/40 tag 6 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:38:f0:08:00/00:00:00:00:00/40 tag 7 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:40:10:09:00/00:00:00:00:00/40 tag 8 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:48:38:09:00/00:00:00:00:00/40 tag 9 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:b8:80:08:c0/00:00:43:00:00/40 tag 23 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:c0:78:09:c0/00:00:43:00:00/40 tag 24 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:c8:08:08:00/00:00:4d:00:00/40 tag 25 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:d0:30:09:00/00:00:4d:00:00/40 tag 26 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:d8:10:08:80/00:00:4e:00:00/40 tag 27 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:e0:08:08:40/00:00:4f:00:00/40 tag 28 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:e8:80:08:40/00:00:4f:00:00/40 tag 29 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Dec 06 15:12:52 ano kernel: ata2.00: cmd 61/08:f0:c8:09:40/00:00:4f:00:00/40 tag 30 ncq 4096 out
res 40/00:4c:38:09:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 06 15:12:52 ano kernel: ata2.00: status: { DRDY }
Dec 06 15:12:52 ano kernel: ata2: hard resetting link
Dec 06 15:12:52 ano kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 06 15:12:52 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:52 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:52 ano kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Dec 06 15:12:52 ano kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Dec 06 15:12:52 ano kernel: ata2.00: configured for UDMA/33
Dec 06 15:12:52 ano kernel: ata2: EH complete
What should I do? How should I debug this?
Last edited by Ba7a7chy (2015-12-08 09:46:48)
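One way to start debugging a log like the one above is to condense it per ATA port. The following is a minimal Python sketch, assuming the journal lines have the same shape as those quoted in this post; the function name `summarize_ata_errors` and the regular expressions are illustrative and not part of any existing tool.

```python
import re
from collections import Counter

# Patterns written against the libata messages quoted in the post.
EXCEPTION_RE = re.compile(r"kernel: (ata\d+)\.\d+: exception Emask")
SERROR_RE = re.compile(r"kernel: (ata\d+): SError: \{ ([^}]*)\}")
LINK_RE = re.compile(r"kernel: (ata\d+): SATA link up ([\d.]+ Gbps)")


def summarize_ata_errors(lines):
    """Count exceptions and SError flags, and record link speed, per ATA port."""
    exceptions = Counter()
    serror_flags = Counter()
    link_speeds = {}
    for line in lines:
        m = EXCEPTION_RE.search(line)
        if m:
            exceptions[m.group(1)] += 1
            continue
        m = SERROR_RE.search(line)
        if m:
            for flag in m.group(2).split():
                serror_flags[(m.group(1), flag)] += 1
            continue
        m = LINK_RE.search(line)
        if m:
            # Keep the most recently negotiated speed for the port.
            link_speeds[m.group(1)] = m.group(2)
    return exceptions, serror_flags, link_speeds


# Four lines copied from the journal output in this post.
sample = [
    "Dec 06 15:12:21 ano kernel: ata2.00: exception Emask 0x10 SAct 0x20000000 SErr 0x400100 action 0x6 frozen",
    "Dec 06 15:12:21 ano kernel: ata2: SError: { UnrecovData Handshk }",
    "Dec 06 15:12:22 ano kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)",
    "Dec 06 15:12:43 ano kernel: ata2.00: exception Emask 0x10 SAct 0x200 SErr 0x400100 action 0x6 frozen",
]
exceptions, serror_flags, link_speeds = summarize_ata_errors(sample)
print(exceptions["ata2"], link_speeds["ata2"])
```

If every exception lands on the same port and the SError flags are bus-level ones such as Handshk, that points toward the link (cable, connector, tray) rather than toward the filesystem.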
Hi, check with dmesg whether you are using SWIOTLB software bounce buffering for PCI-DMA. If you are, try putting swiotlb=131072 on the GRUB kernel command line and see if that gets rid of your errors.
If not, your HD might be failing. To test whether the problem is really only on the second HD, run bonnie++ on the first HD as well and see what happens.
Cheers :-)
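The dmesg check suggested above could be scripted roughly as follows. This is only a sketch: the helper name `find_swiotlb_lines` is made up for illustration, and the sample text is an assumed example of what such output might look like, not output from the machine in this thread.

```python
def find_swiotlb_lines(dmesg_text):
    """Return the dmesg lines that mention the software IO TLB (SWIOTLB)."""
    needles = ("swiotlb", "software io tlb")
    return [
        line
        for line in dmesg_text.splitlines()
        if any(needle in line.lower() for needle in needles)
    ]


# Illustrative sample only; the timestamps are invented for this example.
sample_dmesg = """\
[    0.612345] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    1.204567] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
"""

hits = find_swiotlb_lines(sample_dmesg)
print("SWIOTLB in use" if hits else "no SWIOTLB messages found")
```

On a live system the text could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, which may require root depending on the kernel settings.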
Looks like some kind of SATA communication error.
If the errors stop repeating after the downshift to 1.5G
Dec 06 15:12:52 ano kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
then setting the speed to 1.5G on sdb with libata.force should get rid of them for good.
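For reference, libata.force takes entries of the form PORT:VALUE, where the port number matches the ataN name printed in the kernel log. A small Python sketch that derives the parameter from the log name is shown below; the helper `libata_force_param` is hypothetical and only builds the string, it does not change any boot configuration.

```python
import re


def libata_force_param(ata_id, limit="1.5Gbps"):
    """Build a libata.force= kernel parameter from a name such as 'ata2' or 'ata2.00'.

    The link speed limit is applied per port, so any device suffix is dropped.
    """
    m = re.fullmatch(r"ata(\d+)(?:\.\d+)?", ata_id)
    if m is None:
        raise ValueError(f"not an ATA port name: {ata_id!r}")
    if limit not in ("1.5Gbps", "3.0Gbps"):
        raise ValueError(f"unsupported link speed limit: {limit!r}")
    return f"libata.force={m.group(1)}:{limit}"


print(libata_force_param("ata2.00"))  # libata.force=2:1.5Gbps
```

The resulting string would be appended to the kernel command line in the boot loader configuration, next to the other kernel parameters.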
Apparently the issue was a malfunctioning caddy (2nd HD tray); after changing it, everything is working well.
Thanks for helping out!
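In hindsight, the smartctl output earlier in the thread is consistent with a connection problem: the media-related counters (Reallocated_Sector_Ct, Current_Pending_Sector) are 0 while UDMA_CRC_Error_Count is 5, and that attribute counts CRC errors on the interface between drive and controller. Below is a rough Python sketch of that check, assuming the attribute table layout shown above; the function name `parse_smart_raw_values` is illustrative.

```python
def parse_smart_raw_values(text):
    """Map attribute name -> integer raw value from a smartctl attribute table."""
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the tenth column.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                raw[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip rows whose raw value is not a plain integer
    return raw


# Three rows copied from the smartctl output in the first post.
sample = """\
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 5
"""

attrs = parse_smart_raw_values(sample)
media_ok = attrs["Reallocated_Sector_Ct"] == 0 and attrs["Current_Pending_Sector"] == 0
if media_ok and attrs["UDMA_CRC_Error_Count"] > 0:
    print("media counters clean, CRC errors present: check cable, connector or caddy")
```

A CRC count that keeps rising under heavy I/O while the media counters stay at 0 is a hint to look at the tray or connector before replacing the drive again.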