the issue
I've scoured the internet for a resolution to this, but am coming up empty. Now and then my laptop freezes for roughly 10-15 seconds (estimate) before recovering. Alt-tab will switch windows, but X doesn't redraw the contents (empty window), conky stops updating, dmesg waits to output until the freeze is done, etc. Here is the exact error, which is important as there are an immense number of bug reports and posts that don't match this exactly:
[79808.774925] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[79808.774939] ata7.00: failed command: READ DMA EXT
[79808.774943] ata7.00: cmd 25/00:08:98:9f:b3/00:00:1c:00:00/e0 tag 16 dma 4096 in
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[79808.774945] ata7.00: status: { DRDY }
[79808.774947] ata7: hard resetting link
[79809.092578] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[79809.093340] ata7.00: configured for UDMA/100
[79809.093342] ata7.00: device reported invalid CHS sector 0
[79809.093351] ata7: EH complete
Here are two that happened today, with an additional blk_update_request line:
[95928.148608] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[95928.148612] ata7.00: failed command: READ DMA EXT
[95928.148614] ata7.00: cmd 25/00:08:08:d9:ca/00:00:1c:00:00/e0 tag 13 dma 4096 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[95928.148616] ata7.00: status: { DRDY }
[95928.148618] ata7: hard resetting link
[95928.466272] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[95928.466970] ata7.00: configured for UDMA/100
[95928.466973] ata7.00: device reported invalid CHS sector 0
[95928.466980] sd 6:0:0:0: [sda] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[95928.466981] sd 6:0:0:0: [sda] tag#13 Sense Key : 0x5 [current] [descriptor]
[95928.466983] sd 6:0:0:0: [sda] tag#13 ASC=0x21 ASCQ=0x4
[95928.466984] sd 6:0:0:0: [sda] tag#13 CDB: opcode=0x28 28 00 1c ca d9 08 00 00 08 00
[95928.466985] blk_update_request: I/O error, dev sda, sector 483055880
[95928.466994] ata7: EH complete
[96887.421346] sd 11:0:0:0: [sdc] Synchronizing SCSI cache
[96887.453688] usb 3-6: USB disconnect, device number 34
[102694.211262] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[102694.211266] ata7.00: failed command: WRITE DMA EXT
[102694.211269] ata7.00: cmd 35/00:18:50:6b:0f/00:00:1a:00:00/e0 tag 21 dma 12288 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[102694.211270] ata7.00: status: { DRDY }
[102694.211272] ata7: hard resetting link
[102694.528840] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[102694.529542] ata7.00: configured for UDMA/100
[102694.529544] ata7.00: device reported invalid CHS sector 0
[102694.529553] ata7: EH complete
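The `cmd` line in each report is the ATA taskfile that timed out, and it can be decoded to see which sectors were involved. Below is a minimal sketch, assuming the field order libata uses when printing that line (command/feature:nsect:lbal:lbam:lbah, then the five high-order "hob" bytes, then device) and 512-byte logical sectors; the function name `decode_taskfile` is made up for illustration. For the READ DMA EXT report that also has a blk_update_request line, the decoded LBA comes out as the same sector number, which suggests the assumed field order is right.

```python
# Command names taken from the "failed command" lines in the reports above.
COMMANDS = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT"}


def decode_taskfile(cmd_field):
    """Decode the hex 'cmd' field of a libata error report (LBA48 commands).

    Assumed layout:
      command/feature:nsect:lbal:lbam:lbah/
      hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    groups = cmd_field.split("/")
    command = int(groups[0], 16)
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in groups[1].split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in groups[2].split(":")
    )
    # LBA48: the "hob" bytes carry the upper 24 bits of the address.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    sectors = (hob_nsect << 8) | nsect
    return {
        "command": COMMANDS.get(command, hex(command)),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte logical sectors
    }


# Taskfile from the report at timestamp 95928.148614.
info = decode_taskfile("25/00:08:08:d9:ca/00:00:1c:00:00/e0")
print(info["command"], info["lba"], info["bytes"])
# LBA 483055880 matches "blk_update_request: I/O error, dev sda, sector 483055880"
```

The byte count also lines up with the `dma 4096 in` part of the same line, so the timeouts are hitting ordinary small reads and writes rather than anything exotic.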
similar reports
When I search on the various lines above, I often find that one or more of the other lines don't match my errors. Here's a rundown:
| id | matches           |               |          |                             | suggestions |           |       |          |           |        |               | resolution |
|----+-------------------+---------------+----------+-----------------------------+-------------+-----------+-------+----------+-----------+--------+---------------+------------|
|    | failed cmd        | res line      | status   | error                       | noncq       | bad drive | cable | firmware | eth0 down | 3 Gb/s | pcie_aspm=off |            |
|----+-------------------+---------------+----------+-----------------------------+-------------+-----------+-------+----------+-----------+--------+---------------+------------|
|  1 | READ FPDMA QUEUED |               | DRDY     |                             | x           |           | x     |          |           |        |               | ?          |
|  2 | READ DMA          | ATA bus error | DRDY Err | ICRC ABRT                   |             |           |       |          | x         |        |               | eth0 down  |
|  3 | READ FPDMA QUEUED | ATA bus error | DRDY Err | ICRC ABRT                   | x           | x         |       |          |           |        |               | ?          |
|  4 | FLUSH CACHE EXT   | timeout       | DRDY     | RecovComm Persist PHYRdyChg |             | x         |       |          |           |        |               |            |
|  5 | READ FPDMA QUEUED | timeout       | DRDY     |                             |             | x         |       |          |           |        |               | ?          |
|  6 | READ DMA EXT      | media error   | DRDY ERR | UNC                         | x           |           |       |          |           | x      |               | ?          |
1. Ask Ubuntu post
2. Arch ata2.00 exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (post is about freezing on shutdown)
3. Arch ATA bus error
4. Arch Short freeze, SATA error, home remounted readonly.
5. Debian bug report; closed as invalid and probably a hardware issue (as stated by the mod, not the user)
6. Ubuntu bug; no followup to it.
system details
$ lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 04)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:16.3 Serial controller: Intel Corporation 8 Series/C220 Series Chipset Family KT Controller (rev 04)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d4)
00:1c.4 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 (rev d4)
00:1c.6 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #7 (rev d4)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation QM87 Express LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation GK106GLM [Quadro K2100M] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GK106 HDMI Audio Controller (rev a1)
3b:00.0 SATA controller: Marvell Technology Group Ltd. 88SS9183 PCIe SSD Controller (rev 14)
3c:00.0 PCI bridge: Pericom Semiconductor Device 2404 (rev 05)
3d:01.0 PCI bridge: Pericom Semiconductor Device 2404 (rev 05)
3d:02.0 PCI bridge: Pericom Semiconductor Device 2404 (rev 05)
3d:03.0 PCI bridge: Pericom Semiconductor Device 2404 (rev 05)
3e:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
60:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5249 PCI Express Card Reader (rev 01)
$ lsusb
Bus 002 Device 002: ID 8087:8000 Intel Corp.
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 8087:8008 Intel Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 004: ID 04f2:b477 Chicony Electronics Co., Ltd
Bus 003 Device 042: ID 174c:5106 ASMedia Technology Inc. ASM1051 SATA 3Gb/s bridge
Bus 003 Device 003: ID 138a:003f Validity Sensors, Inc. VFS495 Fingerprint Reader
Bus 003 Device 002: ID 0781:5571 SanDisk Corp. Cruzer Fit
Bus 003 Device 005: ID 8087:07dc Intel Corp.
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
$ lsusb -v -d 174c:5106
Bus 003 Device 042: ID 174c:5106 ASMedia Technology Inc. ASM1051 SATA 3Gb/s bridge
Couldn't open device, some information will be missing
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.10
bDeviceClass 0
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x174c ASMedia Technology Inc.
idProduct 0x5106 ASM1051 SATA 3Gb/s bridge
bcdDevice 80.00
iManufacturer 2
iProduct 3
iSerial 1
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 85
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 8 Mass Storage
bInterfaceSubClass 6 SCSI
bInterfaceProtocol 80 Bulk-Only
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 0
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 1
bNumEndpoints 4
bInterfaceClass 8 Mass Storage
bInterfaceSubClass 6 SCSI
bInterfaceProtocol 98
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 0
Data-in pipe (0x03)
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 0
Data-out pipe (0x04)
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 0
Status pipe (0x02)
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x04 EP 4 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 0
Command pipe (0x01)
$ sudo smartctl -l selftest /dev/sdc
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.2.5-1-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 1056 -
# 2 Extended offline Aborted by host 90% 568 -
$ sudo smartctl -a /dev/sdc
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.2.5-1-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 850 EVO 120GB
Serial Number: S21TNWAG408041F
LU WWN Device Id: 5 002538 d7002782d
Firmware Version: EMT01B6Q
User Capacity: 120,034,123,776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Nov 11 14:49:24 2015 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x53) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 64) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1056
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 245
177 Wear_Leveling_Count 0x0013 098 098 000 Pre-fail Always - 23
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail Always - 0
187 Uncorrectable_Error_Cnt 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 067 038 000 Old_age Always - 33
195 ECC_Error_Rate 0x001a 200 200 000 Old_age Always - 0
199 CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
235 POR_Recovery_Count 0x0012 099 099 000 Old_age Always - 69
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 2668294555
SMART Error Log Version: 1
No Errors Logged
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 1056 -
# 2 Extended offline Aborted by host 90% 568 -
Warning! SMART Selective Self-Test Log Structure error: invalid SMART checksum.
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
255 0 65535 Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
I'm a little perplexed by it picking up an ASMedia 1051, as I specifically bought this drive due to reports of it using a UAS-capable ASMedia chip, the 1153E (example, another example).
How might I further diagnose these errors?
Last edited by jwhendy (2015-11-12 19:38:15)
Offline
Try adding in your bootloader cmdline:
libata.force=1.5Gbps
Then reboot, and check that it takes effect by looking at:
cat /proc/cmdline
Offline
@brebs: definitely a solution I've seen proposed. How do I find out what it's currently trying to run at? I've commonly seen people suggest running at 3Gbps, so if my system is trying to do 6, I'd much rather try 3 vs. stepping all the way down to 1.5...
For example, dmesg reports:
[102694.528840] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
But lsusb says:
Bus 003 Device 042: ID 174c:5106 ASMedia Technology Inc. ASM1051 SATA 3Gb/s bridge
While a working system works... as an analytical type, would it be possible to understand why this might solve the issue? For example, what's happening at the current speed that causes hangs? And why might lsusb say 3Gbps when the drive/enclosure is supposed to be 6Gbps capable? Again, I'll definitely try this, but understanding why I need to artificially slow things down would calm my curiosity. Feel free to post a link or other information; I'm happy to read up on this solution.
Offline
Are you sure ata7 is this bridge? Could you run something like dmesg |egrep '(ata)|(sd[a-z])'? One can get lost figuring out which disk is which...
Offline
I'm not sure why - some hardware incompatibility bug, I suppose. Maybe bad cable quality also.
If I were you I'd just try 1.5Gbps first, and see if that's stable. It settled things down on an older system for me.
This is 150 megabytes per second, which is more than a rotational drive will be physically capable of anyway.
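The 150 megabytes per second figure follows from SATA's 8b/10b line coding, where every 8 data bits travel as 10 bits on the wire. A quick sketch of the arithmetic (the function name is only for illustration, and protocol overhead beyond the line coding is ignored):

```python
def sata_payload_mb_per_s(line_rate_gbps):
    """Upper bound on payload throughput for a SATA link with 8b/10b coding."""
    wire_bits_per_second = line_rate_gbps * 1e9
    data_bits_per_second = wire_bits_per_second * 8 / 10  # 8 data bits per 10 wire bits
    return data_bits_per_second / 8 / 1e6  # bits -> bytes -> megabytes


for rate in (1.5, 3.0, 6.0):
    print(f"{rate} Gbps link -> about {sata_payload_mb_per_s(rate):.0f} MB/s")
```

So stepping from 6.0 to 1.5 Gbps cuts the ceiling to a quarter, which matters more for an SSD than for a rotational drive.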
Offline
@mich41: holy crap! Wow, I never figured that it might not be the external USB drive... I get a ton of output from your command, but here it is (truncated a bit as I figure the unique drives involved should all be listed by that point?):
$ dmesg |egrep '(ata)|(sd[a-z])'
[ 0.000000] Command line: BOOT_IMAGE=../vmlinuz-linux root=/dev/mapper/root cryptdevice=UUID=5efd2b85-7d45-46f4-8407-1f08cca9847f:root:allow-discards crypto=sha512:aes-xts-plain64:512:: libata.force=noncq rw initrd=../intel-ucode.img,../initramfs-linux.img
[ 0.000000] BIOS-e820: [mem 0x000000003bf7f000-0x000000003bffefff] ACPI data
[ 0.000000] ACPI: SSDT 0x000000003BFCC000 00042C (v01 HPQOEM SataAhci 00001000 INTL 20130927)
[ 0.000000] ACPI: SSDT 0x000000003BFC0000 000B8F (v01 CpuRef CpuSsdt 00003000 INTL 20130927)
[ 0.000000] ACPI: SSDT 0x000000003BFBE000 000913 (v01 SaSsdt SaSsdt 00003000 INTL 20130927)
[ 0.000000] Kernel command line: BOOT_IMAGE=../vmlinuz-linux root=/dev/mapper/root cryptdevice=UUID=5efd2b85-7d45-46f4-8407-1f08cca9847f:root:allow-discards crypto=sha512:aes-xts-plain64:512:: libata.force=noncq rw initrd=../intel-ucode.img,../initramfs-linux.img
[ 0.000000] Memory: 24334700K/24814704K available (5611K kernel code, 922K rwdata, 1768K rodata, 1180K init, 1152K bss, 480004K reserved, 0K cma-reserved)
[ 4.153620] ACPI : EC: GPE = 0x16, I/O: command/status = 0x66, data = 0x62
[ 4.342977] Write protecting the kernel read-only data: 8192k
[ 5.558738] libata version 3.00 loaded.
[ 5.591662] ata1: DUMMY
[ 5.591663] ata2: DUMMY
[ 5.591666] ata3: SATA max UDMA/133 abar m2048@0xce934000 port 0xce934200 irq 36
[ 5.591669] ata4: SATA max UDMA/133 abar m2048@0xce934000 port 0xce934280 irq 36
[ 5.591669] ata5: DUMMY
[ 5.591670] ata6: DUMMY
[ 5.607445] ata7: SATA max UDMA/133 abar m512@0xce800000 port 0xce800100 irq 37
[ 5.914132] ata3: SATA link down (SStatus 0 SControl 300)
[ 5.927511] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 5.927931] ata7.00: FORCE: horkage modified (noncq)
[ 5.927936] ata7.00: ATA-9: SanDisk SD6PP4M-256G-1006, A200806, max UDMA/100
[ 5.927937] ata7.00: 500118192 sectors, multi 1: LBA48 NCQ (not used)
[ 5.928509] ata7.00: configured for UDMA/100
[ 6.234524] ata4: SATA link down (SStatus 0 SControl 300)
[ 6.236302] sd 6:0:0:0: [sda] 500118192 512-byte logical blocks: (256 GB/238 GiB)
[ 6.236446] sd 6:0:0:0: [sda] Write Protect is off
[ 6.236449] sd 6:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 6.236499] sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 6.243198] sda: sda1 sda2 sda3
[ 6.243717] sd 6:0:0:0: [sda] Attached SCSI disk
[ 7.119858] sd 7:0:0:0: [sdb] 15633408 512-byte logical blocks: (8.00 GB/7.45 GiB)
[ 7.120645] sd 7:0:0:0: [sdb] Write Protect is off
[ 7.120649] sd 7:0:0:0: [sdb] Mode Sense: 43 00 00 00
[ 7.120889] sd 7:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 7.123090] sdb: sdb1 sdb2
[ 7.124060] sd 7:0:0:0: [sdb] Attached SCSI removable disk
[ 19.745020] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[ 20.002464] systemd[1]: Listening on LVM2 metadata daemon socket.
[ 20.037249] EXT4-fs (dm-0): re-mounted. Opts: data=ordered
[ 6095.278788] sd 6:0:0:0: [sda] Synchronizing SCSI cache
[ 6095.279519] sd 6:0:0:0: [sda] Stopping disk
[ 6096.608907] sd 6:0:0:0: [sda] Starting disk
[ 6096.929252] ata4: SATA link down (SStatus 0 SControl 300)
[ 6096.929254] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 6096.930023] ata7.00: configured for UDMA/100
[ 6096.939246] ata3: SATA link down (SStatus 0 SControl 300)
[ 8276.422910] sd 6:0:0:0: [sda] Synchronizing SCSI cache
[ 8276.423588] sd 6:0:0:0: [sda] Stopping disk
[ 8277.726306] sd 6:0:0:0: [sda] Starting disk
[ 8278.046694] ata4: SATA link down (SStatus 0 SControl 300)
[ 8278.046697] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 8278.047541] ata7.00: configured for UDMA/100
[ 8278.053352] ata3: SATA link down (SStatus 0 SControl 300)
[11642.306662] sd 6:0:0:0: [sda] Synchronizing SCSI cache
[11642.307351] sd 6:0:0:0: [sda] Stopping disk
[11643.640100] sd 6:0:0:0: [sda] Starting disk
[11643.960522] ata4: SATA link down (SStatus 0 SControl 300)
[11643.960525] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[11643.961304] ata7.00: configured for UDMA/100
[11644.030555] ata3: SATA link down (SStatus 0 SControl 300)
[11672.351852] sd 6:0:0:0: [sda] Synchronizing SCSI cache
[11672.352791] sd 6:0:0:0: [sda] Stopping disk
[11673.715876] sd 6:0:0:0: [sda] Starting disk
[11674.029582] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[11674.030412] ata7.00: configured for UDMA/100
[11674.036208] ata4: SATA link down (SStatus 0 SControl 300)
[11674.042886] ata3: SATA link down (SStatus 0 SControl 300)
[11702.655366] sd 6:0:0:0: [sda] Synchronizing SCSI cache
[11702.656075] sd 6:0:0:0: [sda] Stopping disk
[11703.934992] sd 6:0:0:0: [sda] Starting disk
[11704.255365] ata4: SATA link down (SStatus 0 SControl 300)
[11704.255367] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[11704.256178] ata7.00: configured for UDMA/100
[11704.262008] ata3: SATA link down (SStatus 0 SControl 300)
[11732.953097] sd 6:0:0:0: [sda] Synchronizing SCSI cache
[11732.953898] sd 6:0:0:0: [sda] Stopping disk
[11734.214368] sd 6:0:0:0: [sda] Starting disk
[11734.527982] ata4: SATA link down (SStatus 0 SControl 300)
[11734.534674] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[11734.535500] ata7.00: configured for UDMA/100
[11734.541327] ata3: SATA link down (SStatus 0 SControl 300)
I also googled around and found this SO post on how to identify drive numbers. I get this:
$ cat /proc/scsi/scsi
Attached devices:
Host: scsi6 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: SanDisk SD6PP4M- Rev: 806
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi7 Channel: 00 Id: 00 Lun: 00
Vendor: SanDisk Model: Cruzer Fit Rev: 1.27
Type: Direct-Access ANSI SCSI revision: 06
Host: scsi14 Channel: 00 Id: 00 Lun: 00
Vendor: StoreJet Model: Transcend Rev: 0
Type: Direct-Access ANSI SCSI revision: 06
So... does that mean this is actually my little SanDisk Cruzer Fit thumb drive causing the issues (i.e. does scsi7 == ata7)?
I use that drive to store a clone of /boot so if I don't have my other external drive I can still boot. It is currently mounted.
Offline
Also, another one-liner in the comments appears to confirm it's /dev/sdb (my thumb drive):
$ ls -l /sys/block/sd* | sed -e 's@.*-> \.\..*/ata@/ata@' -e 's@/host@ @' -e 's@/target.*/@ @'
/ata7 6 sda
lrwxrwxrwx 1 root root 0 Nov 11 16:30 /sys/block/sdb -> ../devices/pci0000:00/0000:00:14.0/usb3/3-2/3-2:1.0 7 sdb
lrwxrwxrwx 1 root root 0 Nov 11 14:11 /sys/block/sdc -> ../devices/pci0000:00/0000:00:14.0/usb3/3-6/3-6:1.0 14 sdc
Maybe it's due to this drive being USB2; I'll definitely try the 1.5Gbps kernel line and rebooting. I'll probably need a couple of days to verify it's not freezing anymore, at which point I'll post an update. Thanks for the suggestions (and good eyes)!
Offline
[ 5.927936] ata7.00: ATA-9: SanDisk SD6PP4M-256G-1006, A200806, max UDMA/100
Looks like an SSD, that's why it's running at 6G.
And nope, scsi7 needn't be ata7. You can find these mappings in full dmesg, it just seems that my grep removed a bit too much.
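Since the SCSI host number and the ATA port number are separate counters, it can help to pull both out of the sysfs path for each disk. Here is a rough sketch, assuming the resolved /sys/block/sdX path contains `ataN` and `hostM` components the way the sed one-liner above expects; the helper name and the PCI portion of the first example path are made up for illustration.

```python
import re


def ata_and_host(sysfs_path):
    """Return (ata_port, scsi_host) parsed from a resolved /sys/block/<dev> path.

    ata_port is None when the path has no ataN component (e.g. USB storage).
    """
    ata = re.search(r"/ata(\d+)/", sysfs_path)
    host = re.search(r"/host(\d+)/", sysfs_path)
    return (
        int(ata.group(1)) if ata else None,
        int(host.group(1)) if host else None,
    )


# Hypothetical resolved paths shaped like the ones on this system.
# The PCI part of the sda path is an assumption; the ata/host parts
# mirror the "/ata7 6 sda" output above.
examples = {
    "sda": "/sys/devices/pci0000:00/0000:00:1c.0/0000:3b:00.0/ata7/host6/"
           "target6:0:0/6:0:0:0/block/sda",
    "sdb": "/sys/devices/pci0000:00/0000:00:14.0/usb3/3-2/3-2:1.0/host7/"
           "target7:0:0/7:0:0:0/block/sdb",
}
for dev, path in examples.items():
    print(dev, ata_and_host(path))

# On a live system the real path could be obtained with something like
# os.path.realpath("/sys/block/sda") and passed to ata_and_host().
```

With these paths, sda maps to ata7 on SCSI host 6, while sdb sits on SCSI host 7 with no ATA port at all, which is the distinction that caused the mix-up.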
Offline
@mich41: I completely missed that, and got hung up on the "sd X" bit later:
### this was sd 7, so I thought sdb == ata7
[ 7.119858] sd 7:0:0:0: [sdb] 15633408 512-byte logical blocks: (8.00 GB/7.45 GiB)
Yes, that's my internal M.2 SATA drive. I'm seeing this in a whole new light. I've been applying things like ignore UAS to my external SSD, when I should have been applying it to the internal drive.
I changed the title to simply reflect the error messages since this appears to be turning out not to be the external drive at all. Thanks for taking a look.
Offline
Actually, I take that back. Not going to be related to UAS, as that's a USB thing. Even though I didn't know which drive it was, I still have the correct throttling in syslinux.cfg since it was based on the device number (assuming one just uses the N in ataN):
LABEL arch-uuid
MENU LABEL arch-uuid
LINUX ../vmlinuz-linux
APPEND root=/dev/mapper/root cryptdevice=UUID=5efd2b85-7d45-46f4-8407-1f08cca9847f:root:allow-discards crypto=sha512:aes-xts-plain64:512:: libata.force=noncq,7:3.0Gbps rw
INITRD ../intel-ucode.img,../initramfs-linux.img
Does that seem reasonable? It produces the 3.0Gbps change.
Before:
[ 5.927511] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
After:
[ 5.897599] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
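The SStatus and SControl numbers in those lines encode the same information. Below is a small sketch for decoding them, assuming the register layout from the SATA specification (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) and that dmesg prints both registers in hexadecimal; the function names are illustrative only.

```python
NEGOTIATED = {0: "none", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
LIMIT = {0: "no limit", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def split_fields(hex_text):
    """Split an SStatus/SControl value (hex string) into DET, SPD and IPM."""
    value = int(hex_text, 16)
    return {
        "det": value & 0xF,         # device detection
        "spd": (value >> 4) & 0xF,  # speed (negotiated or allowed)
        "ipm": (value >> 8) & 0xF,  # interface power management
    }


def link_summary(sstatus, scontrol):
    """Return (negotiated speed, configured speed limit) as text."""
    negotiated = NEGOTIATED.get(split_fields(sstatus)["spd"], "unknown")
    limit = LIMIT.get(split_fields(scontrol)["spd"], "unknown")
    return negotiated, limit


print(link_summary("133", "300"))  # the "Before" line
print(link_summary("123", "320"))  # the "After" line
```

Read this way, the "Before" line is a 6.0 Gbps link with no limit configured, and the "After" line is a 3.0 Gbps link with a matching 3.0 Gbps limit in SControl, so the forced setting is reaching the controller.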
Offline
Given that it is an SSD I would first try only noncqtrim, then maybe either reducing the speed or disabling ncq completely.
R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K
Offline
Rookie, I probably should have posted my syslinux.cfg in the original post, but I've had noncq in my kernel options almost from day 1 after the initial occurrence (it's one of the first suggestions I ran across with similar reports).
You can see it in my last post. I just added the 3gbps bit.
Offline
Then I guess there isn't much more you can disable. You could try reducing the speed once again but I suspect that will not help much. You could try the suggestions given here [1] and see if there are any changes.
One thing that caught my eye now is that in your truncated dmesg output you have lots of starting and stopping of sda; that is your problematic SSD, right? I was wondering if by any chance you are using any of powertop's suggestions for optimizing power consumption, or something else to try to reduce power usage. My personal experience with powertop's suggestions is that they can spell trouble if you don't know what they mean/do and still apply them without testing every change very thoroughly.
My last guess is that the ssd you are using might need some quirk to work properly, you would need to ask for help in the kernel bugzilla to find out which quirk might be needed. You would need to say exactly what hardware you have and what you've tried to make things work and you might get lucky that someone might recognize the problem and know exactly what to do to fix it.
Offline
@Rookie: So far, it's looking good. I got one error, but I believe that's due to trying to pass two libata.force options separately the first time (libata.force=noncq, libata.force=7:3.0Gbps). My current syslinux.cfg is like the above, but I booted off the variant here and haven't rebooted yet. The error is back to "WRITE FPDMA QUEUED," which is what I used to get before noncq, at which point it changed to "WRITE DMA EXT."
I just checked, and I believe those starting/stopping lines are due to suspends. Note that I ran @mich41's grep command, so it wasn't pulling in everything. I just checked dmesg and the number of start/stop is roughly what I'd expect given my number of suspend/resumes. Here's an example:
$ dmesg |grep -b10 -a10 Stopping
72042-[ 6928.757658] usb 3-9: USB disconnect, device number 7
72098-[ 8612.458436] usb 3-6: USB disconnect, device number 4
72154-[ 8634.738559] PM: Syncing filesystems ... done.
72203-[ 8634.742856] PM: Preparing system for sleep (mem)
72255-[ 8634.762826] Freezing user space processes ... (elapsed 0.001 seconds) done.
72334-[ 8634.764036] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
72418-[ 8634.765112] PM: Suspending system (mem)
72461-[ 8634.765128] Suspending console(s) (use no_console_suspend to debug)
72532-[ 8634.765287] wlp62s0: deauthenticating from 84:80:2d:b5:b2:8f by local choice (Reason: 3=DEAUTH_LEAVING)
72639-[ 8634.779450] sd 6:0:0:0: [sda] Synchronizing SCSI cache
72697:[ 8634.780136] sd 6:0:0:0: [sda] Stopping disk
72744-[ 8634.974789] parport_pc 00:04: disabled
72786-[ 8634.975406] e1000e: EEE TX LPI TIMER: 00000011
72836-[ 8635.254745] PM: suspend of devices complete after 488.894 msecs
72903-[ 8635.278258] PM: late suspend of devices complete after 23.482 msecs
72974-[ 8635.280909] ehci-pci 0000:00:1d.0: System wakeup enabled by ACPI
73042-[ 8635.282025] ehci-pci 0000:00:1a.0: System wakeup enabled by ACPI
73110-[ 8635.282108] e1000e 0000:00:19.0: System wakeup enabled by ACPI
73176-[ 8635.282201] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
73244-[ 8635.295189] PM: noirq suspend of devices complete after 16.908 msecs
73316-[ 8635.295436] ACPI: Preparing to enter system sleep state S3
If I go a lot further after looking for "Starting," eventually I get a "PM: Finishing wakeup" line. But definitely a good call to look for that sort of thing. To my knowledge I'm not using any power management except acpi/acpid.
I'll see if this throttling continues to help, and otherwise pursue the kernel bug option (thanks for the link!).
Offline
Just an update on this. I spent a couple of weeks at 3.0 Gbps and even more since then at 1.5 Gbps. It does not seem to be having an effect, and I still get periodic lockups for ~5-10 seconds. Example dmesg from two separate occurrences:
[ 5112.536807] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 5112.536812] ata7.00: failed command: READ DMA EXT
[ 5112.536815] ata7.00: cmd 25/00:08:08:a2:ac/00:00:16:00:00/e0 tag 10 dma 4096 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 5112.536817] ata7.00: status: { DRDY }
[ 5112.536819] ata7: hard resetting link
[ 5112.854468] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 5112.855173] ata7.00: configured for UDMA/100
[ 5112.855177] ata7.00: device reported invalid CHS sector 0
[ 5112.855187] ata7: EH complete
[ 5213.564235] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 5213.564239] ata7.00: failed command: WRITE DMA EXT
[ 5213.564242] ata7.00: cmd 35/00:10:98:cf:b1/00:00:16:00:00/e0 tag 12 dma 8192 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 5213.564243] ata7.00: status: { DRDY }
[ 5213.564262] ata7: hard resetting link
[ 5213.875260] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 5213.876005] ata7.00: configured for UDMA/100
[ 5213.876009] ata7.00: device reported invalid CHS sector 0
[ 5213.876020] ata7: EH complete
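To keep track of how often this happens and which commands are involved, the failures can be tallied from saved dmesg output. This is just a sketch that assumes the line format shown above; the regular expression and function name are mine, and the sample text is copied from the two occurrences.

```python
import re

# Matches lines such as:
# [ 5112.536812] ata7.00: failed command: READ DMA EXT
FAILED = re.compile(r"\[\s*(\d+\.\d+)\] (ata\d+)\.\d+: failed command: (.+)")


def failed_commands(dmesg_text):
    """Return (timestamp, port, command) for every 'failed command' line."""
    return [
        (float(m.group(1)), m.group(2), m.group(3).strip())
        for m in FAILED.finditer(dmesg_text)
    ]


sample = """\
[ 5112.536812] ata7.00: failed command: READ DMA EXT
[ 5112.536819] ata7: hard resetting link
[ 5213.564239] ata7.00: failed command: WRITE DMA EXT
[ 5213.564262] ata7: hard resetting link
"""

events = failed_commands(sample)
for timestamp, port, command in events:
    print(f"{timestamp:>12.3f}s  {port}  {command}")

# Seconds between consecutive failures, to see whether they cluster.
gaps = [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]
print(gaps)
```

Feeding it the full log (for example the text of `dmesg` saved to a file) would show whether reads and writes fail equally often and how the failures are spaced.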
Here's the syslinux.cfg line I've been using for some time:
LABEL arch-uuid
MENU LABEL arch-uuid
LINUX ../vmlinuz-linux
APPEND root=/dev/mapper/root cryptdevice=UUID=5efd2b85-7d45-46f4-8407-1f08cca9847f:root:allow-discards crypto=sha512:aes-xts-plain64:512:: libata.force=noncq,7:1.5Gbps rw
INITRD ../intel-ucode.img,../initramfs-linux.img
Should I move on to a kernel bug? Anything else to look into?
Offline
What is that ",7" in your APPEND line? Maybe the kernel is confused about the options you're giving it.
Try simply:
libata.force=1.5Gbps
Offline
According to http://lxr.free-electrons.com/source/dr … ?v4.3#L697 , it seems that you'll only get this error (or actually warning?) if the disk does not support any LBA mode. It sounds pretty unlikely to me that it could be true for modern HDD or SSD. So can you paste the output of:
smartctl --identify=wb /dev/sda | grep -i lba
(smartctl is provided by the package smartmontools)
Also, try to check if the BIOS/UEFI has an option for switching between CHS and LBA access. Btw I can see that you have two SATA controllers (Intel and Marvell), have you tried switching to another and see if it helps?
Last edited by tom.ty89 (2016-01-02 06:29:35)
Offline
@brebs: the ",7:1.5Gbps" is an additional option to "libata.force=", per the kernel option docs:
libata.force= [LIBATA] Force configurations. The format is comma
separated list of "[ID:]VAL" where ID is
PORT[.DEVICE].
So, you pass options in a comma separated list, optionally preceded by a target device number followed by a colon. Thus, in my line:
libata.force=noncq,7:1.5Gbps
I'm passing `noncq` to all devices and a speed threshold of `1.5Gbps` only to device 7 (per my errors on ata7 in dmesg). Hope that was correct... I could try your suggestion to just pass it globally, but it's now the only SSD I'm using and only partition mounted, so I'm not sure what difference it would make. Lastly, it appears the option is taking, as without it I get messages of 6Gbps, and with `7:3.0Gbps` I see it at 3. So, just adding that as it seems to have the desired effect.
$ dmesg | grep -i gbps
[ 5.593778] ata7: FORCE: PHY spd limit set to 1.5Gbps
[ 5.913958] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
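To make the "[ID:]VAL" format above concrete, here is a minimal Python sketch that splits a `libata.force=` value into its entries. The function name `parse_libata_force` is made up for illustration, and it only mirrors the syntax quoted from the docs; it does not try to reproduce how the kernel itself resolves or applies the entries.

```python
def parse_libata_force(value):
    """Split a libata.force= value into (port, device, setting) tuples.

    Hypothetical helper: follows the documented "[ID:]VAL" syntax where
    ID is PORT[.DEVICE]. port/device are None when no ID was given.
    """
    entries = []
    for item in value.split(","):
        ident, sep, setting = item.partition(":")
        if not sep:
            # No "ID:" prefix, so the whole item is the setting.
            entries.append((None, None, item))
            continue
        port, _, device = ident.partition(".")
        entries.append((int(port), int(device) if device else None, setting))
    return entries


if __name__ == "__main__":
    for entry in parse_libata_force("noncq,7:1.5Gbps"):
        print(entry)
```

For my line, this gives one entry with no ID (`noncq`) and one entry tied to port 7 (`1.5Gbps`), which matches how I read the docs.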
@tom.ty89: hadn't heard of that, but it appears supported:
$ sudo smartctl --identify=wb /dev/sda |grep -i lba
49 9 1 LBA supported
69 5 1 Trimmed LBA range(s) returning zeroed data supported
105 - 0x0010 Max blocks of LBA Range Entries per DS MANAGEMENT cmd
I'll check the BIOS options next time I reboot, though from googling "HP bios chs," I found this list of options listed in the HP docs:
Primary and secondary device submenu:
- Type
- CHS Format
- Cylinders
- Heads
- Sectors
- Maximum Capacity
- LBA Format
- Total Sectors
- Maximum Capacity
- Multi-Sector Transfers
- LBA Mode Control
- 32 Bit I/O
- Transfer Mode
- Ultra DMA Mode
Would you suggest one or the other, or just that I should try changing it to whatever it's not currently set to? How do I select my SATA controller? I have an internal M.2 SSD and an SSD-capable SATA3 expansion slot... could they be on dedicated controllers? I'm wondering about that "Ultra DMA Mode" as well, given the error is "failed command: READ DMA EXT." I'll have to read about DMA to see if there's anything to that.
Thanks for the suggestions!
----------
EDIT: In the list of libata errors, I searched for "DMA" and found this:
ICRC: Interface CRC error during Ultra DMA transfer - often either a bad cable or power problem, though possibly an incorrect Ultra DMA mode setting by the driver
I don't actually have the ICRC error message, but it's interesting that errors and the Ultra DMA setting can be related... I'll definitely try fiddling with that in the BIOS options.
Last edited by jwhendy (2016-01-02 07:38:02)
Offline
I didn't realize it was a laptop. Never mind then, the Marvell pcie ssd controller is probably for the M.2 drive(s).
I doubt that the notebook would actually have all of those options anyway; we'll see what to tune after you have checked what it has.
FWIW, there is:
libata.dma= [LIBATA] DMA control
libata.dma=0 Disable all PATA and SATA DMA
libata.dma=1 PATA and SATA Disk DMA only
libata.dma=2 ATAPI (CDROM) DMA only
libata.dma=4 Compact Flash DMA only
Combinations also work, so libata.dma=3 enables DMA
for disks and CDROMs, but not CFs
though it isn't quite a real solution.
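The `libata.dma=` value quoted above is a bitmask, so the "combinations" are just sums of the listed values. A minimal Python sketch of how a given number decodes, assuming only the three bits shown in that doc excerpt (the names `DMA_FLAGS` and `describe_libata_dma` are made up for illustration):

```python
# Bit values as listed in the kernel parameter docs quoted above.
DMA_FLAGS = {
    1: "PATA and SATA disk",
    2: "ATAPI (CDROM)",
    4: "Compact Flash",
}


def describe_libata_dma(value):
    """Return the device classes that keep DMA enabled for a libata.dma value."""
    return [name for bit, name in DMA_FLAGS.items() if value & bit]


if __name__ == "__main__":
    for value in (0, 1, 3):
        print(value, describe_libata_dma(value))
```

So `libata.dma=3` decodes to disks plus CDROMs, as the docs say, and `libata.dma=0` leaves nothing with DMA enabled.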
You can also set the udma mode to 100 or lower with libata.force=. I doubt that it matters though.
Offline
@tom.ty89: you were right; definitely nothing like those options in BIOS!
I can look into the UDMA mode... hdparm sees it's set to udma5:
$ sudo hdparm -i /dev/sda
[sudo] password for jwhendy:
/dev/sda:
Model=SanDisk SD6PP4M-256G-1006, FwRev=A200806, SerialNo=144958400789
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
BuffType=unknown, BuffSize=unknown, MaxMultSect=1, MultSect=off
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=500118192
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=yes: unknown setting WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-3,4,5,6,7
After digging around in the errors again, I saw this note for the timeout error:
timeout: Controller failed to respond to an active ATA command. This could be any number of causes. Most often this is due to an unrelated interrupt subsystem bug (try booting with 'pci=nomsi' or 'acpi=off' or 'noapic'), which failed to deliver an interrupt when we were expecting one from the hardware.
Currently I'm booted to kernel 4.3.3 and the options: libata.force=noncq pci=nomsi
So far, so good, though I don't know what triggers the issue normally, so it'll just take some time to see. I noticed there are subset msi restrictions that can be used (like pcie_hp=nomsi and pcie_pme=nomsi), so if generally disabling it works I may try one of those to see if I can narrow in on the affected area.
Thanks for the continued suggestions/comments.
Offline
If I recall correctly, I've seen somewhere that for SATA drives the UDMA mode only indicates the equivalent speed of a parallel interface, so changing it (if that is even possible) might not do anything. There is a good chance I'm wrong though, so do confirm whether this is true or not.
R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K
Offline
@R00KIE: I was wondering the same thing when it came up (thinking udma/[100, 66, 33] were analogs for [6.0, 3.0, 1.5]Gbps), but noticed this in the libata option docs:
* SATA link speed limit: 1.5Gbps or 3.0Gbps.
* Transfer mode: pio[0-7], mwdma[0-4] and udma[0-7].
udma[/][16,25,33,44,66,100,133] notation is also
allowed.
Since there are two dedicated options, I took that to mean they were separate things, but am not sure...
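As a side note on the two notations in that excerpt, here is a small Python sketch of how I read them. It assumes the listed rates 16 through 133 correspond to udma0 through udma6 in order (which is consistent with hdparm showing `*udma5` while dmesg says "configured for UDMA/100"). The helper name `udma_mode_index` is made up, and this says nothing about how the setting relates to the SATA link speed.

```python
import re

# Rates from the docs excerpt; the position in the list is assumed
# to be the udma mode number (16 -> udma0, ..., 133 -> udma6).
UDMA_RATES = [16, 25, 33, 44, 66, 100, 133]


def udma_mode_index(name):
    """Map 'udma5', 'udma100' or 'UDMA/100' to a udma mode number."""
    match = re.fullmatch(r"udma(/?)(\d+)", name.lower())
    if not match:
        raise ValueError(f"not a udma mode: {name!r}")
    slash, number = match.group(1), int(match.group(2))
    if number in UDMA_RATES:
        return UDMA_RATES.index(number)
    if not slash and 0 <= number <= 7:
        return number
    raise ValueError(f"unknown udma mode: {name!r}")


if __name__ == "__main__":
    print(udma_mode_index("udma5"), udma_mode_index("UDMA/100"))
```

Both spellings come out as mode 5 here, so they look like two names for the same transfer-mode setting rather than a link speed.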
I currently haven't had a hang since adding pci=nomsi, but it's only been ~3/4 of a day. I see these messages referring to msi in dmesg:
$ dmesg |grep -i msi
[ 4.240632] acpi PNP0A08:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
[ 5.642872] rtsx_pci 0000:60:00.0: rtsx_pci_acquire_irq: pcr->msi_en = 0, pci->irq = 17
[ 17.165503] e1000e 0000:00:19.0 0000:00:19.0 (uninitialized): Failed to initialize MSI interrupts. Falling back to legacy interrupts.
[ 17.200152] iwlwifi 0000:3e:00.0: pci_enable_msi failed(0Xffffffea)
[ 17.339692] snd_hda_intel 0000:01:00.1: Disabling MSI
I wouldn't think the drivers explicitly turning off msi (ethernet, wireless, and sound) would have been the cause of my hangs?? We'll see, but so far so good!
Offline