
#1 2017-05-28 00:50:07

pwz
Member
Registered: 2017-05-27
Posts: 4

SSD appears to be healthy but got blk_update_request: I/O error

I got the following kernel messages:

May 26 14:03:42 ... kernel: sd 0:0:0:0: [sda] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06
May 26 14:03:42 ... kernel: sd 0:0:0:0: [sda] tag#1 CDB: opcode=0x2a 2a 00 04 16 10 d0 00 00 10 00
May 26 14:03:42 ... kernel: blk_update_request: I/O error, dev sda, sector 68554960
May 26 14:03:42 ... kernel: Aborting journal on device dm-1-8.
May 26 14:03:42 ... kernel: EXT4-fs error (device dm-1): ext4_journal_check_start:56: Detected aborted journal
May 26 14:03:42 ... kernel: EXT4-fs (dm-1): Remounting filesystem read-only
May 26 14:03:42 ... kernel: EXT4-fs error (device dm-1): ext4_journal_check_start:56: Detected aborted journal
May 26 14:03:42 ... kernel: EXT4-fs (dm-1): Remounting filesystem read-only
May 26 14:03:42 ... kernel: EXT4-fs (dm-1): ext4_writepages: jbd2_start: 1024 pages, ino 786457; err -30

Something similar also happened in February:

Feb 21 11:33:25 ... kernel: sd 0:0:0:0: [sda] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06
Feb 21 11:33:31 ... kernel: sd 0:0:0:0: [sda] tag#5 CDB: opcode=0x2a 2a 00 04 17 c8 60 00 00 38 00
Feb 21 11:33:32 ... kernel: blk_update_request: I/O error, dev sda, sector 68667488
Feb 21 11:33:32 ... kernel: Aborting journal on device dm-1-8.
Feb 21 11:33:34 ... kernel: EXT4-fs error (device dm-1) in ext4_do_update_inode:4931: Journal has aborted
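
The CDB line in each log can be decoded by hand to see what the kernel was asking the drive to do. Below is a minimal Python sketch, assuming the standard 10-byte SCSI WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The function name decode_write10 is made up for illustration.

```python
def decode_write10(cdb_hex):
    """Decode a 10-byte SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # starting logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # number of blocks to write
    return lba, blocks


# CDB bytes copied from the two kernel logs above
for cdb_hex in ("2a 00 04 16 10 d0 00 00 10 00",
                "2a 00 04 17 c8 60 00 00 38 00"):
    lba, blocks = decode_write10(cdb_hex)
    print(f"WRITE(10): LBA {lba}, {blocks} blocks")
```

The decoded LBAs match the sector numbers in the blk_update_request lines, so both failed requests were writes. As far as I know, driverbyte=0x06 is DRIVER_TIMEOUT in kernels of this era, which would point to a command timeout and not a media error reported by the drive.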

However, badblocks did not reveal any errors. Short and extended SMART tests also completed without errors, and the SMART attributes do not show any problems either:

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   100   100   ---    -    0
  9 Power_On_Hours          -O--CK   098   098   ---    -    8258
 12 Power_Cycle_Count       -O--CK   096   096   ---    -    3228
175 Program_Fail_Count_Chip -O--CK   100   100   ---    -    0
176 Erase_Fail_Count_Chip   -O--CK   100   100   ---    -    0
177 Wear_Leveling_Count     PO--C-   099   099   ---    -    4
178 Used_Rsvd_Blk_Cnt_Chip  PO--C-   076   076   ---    -    938
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   077   077   ---    -    1826
180 Unused_Rsvd_Blk_Cnt_Tot PO--C-   077   077   ---    -    6238
181 Program_Fail_Cnt_Total  -O--CK   100   100   ---    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   ---    -    0
183 Runtime_Bad_Block       PO--C-   100   100   ---    -    0
187 Uncorrectable_Error_Cnt -O--CK   100   100   ---    -    0
195 ECC_Error_Rate          -O-RC-   200   200   ---    -    0
198 Offline_Uncorrectable   ----CK   100   100   ---    -    0
199 CRC_Error_Count         -OSRCK   253   253   ---    -    0
232 Available_Reservd_Space PO--C-   076   076   ---    -    3094
241 Total_LBAs_Written      -O--CK   033   033   ---    -    2875630704
242 Total_LBAs_Read         -O--CK   060   060   ---    -    1734709377

Used_Rsvd_Blk_Cnt_Tot and Used_Rsvd_Blk_Cnt_Chip are non-zero, but I diffed against a smartctl output from March 2015 and those haven't changed at all:

61,62c61,62
<   9 Power_On_Hours          -O--CK   099   099   ---    -    2331
<  12 Power_Cycle_Count       -O--CK   098   098   ---    -    1425
---
>   9 Power_On_Hours          -O--CK   098   098   ---    -    8258
>  12 Power_Cycle_Count       -O--CK   096   096   ---    -    3228
65c65
< 177 Wear_Leveling_Count     PO--C-   099   099   ---    -    2
---
> 177 Wear_Leveling_Count     PO--C-   099   099   ---    -    4
77,78c77,78
< 241 Total_LBAs_Written      -O--CK   014   014   ---    -    3705900715
< 242 Total_LBAs_Read         -O--CK   059   059   ---    -    1736333720
---
> 241 Total_LBAs_Written      -O--CK   033   033   ---    -    2875630704
> 242 Total_LBAs_Read         -O--CK   060   060   ---    -    1734709377
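
The same comparison can be done per attribute instead of per line. This is a rough Python sketch, assuming the column layout of the brief attribute table shown above (ID, name, flags, value, worst, thresh, fail, raw) and plain integer raw values. The helper names are hypothetical.

```python
def parse_attributes(text):
    """Map attribute name -> raw value for lines that start with a numeric ID."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumes the 8-column brief layout; the raw value is the 8th column.
        if len(fields) >= 8 and fields[0].isdigit():
            attrs[fields[1]] = int(fields[7])
    return attrs


def changed_attributes(old_text, new_text):
    """Return {name: (old_raw, new_raw)} for attributes whose raw value differs."""
    old, new = parse_attributes(old_text), parse_attributes(new_text)
    return {name: (old[name], new[name])
            for name in old if name in new and old[name] != new[name]}


# Short excerpts using values quoted in the post (2015 snapshot vs. 2017 snapshot)
old = """\
  9 Power_On_Hours          -O--CK   099   099   ---    -    2331
177 Wear_Leveling_Count     PO--C-   099   099   ---    -    2
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   077   077   ---    -    1826
"""
new = """\
  9 Power_On_Hours          -O--CK   098   098   ---    -    8258
177 Wear_Leveling_Count     PO--C-   099   099   ---    -    4
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   077   077   ---    -    1826
"""
for name, (before, after) in changed_attributes(old, new).items():
    print(f"{name}: {before} -> {after}")
```

Attributes missing from the result, such as Used_Rsvd_Blk_Cnt_Tot here, have the same raw value in both snapshots.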

Is my drive actually failing or am I experiencing some intermittent software issue?


#2 2017-05-28 22:19:36

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: SSD appears to be healthy but got blk_update_request: I/O error

You should post the full output of 'smartctl -a /dev/sda' as it might contain hints about the problem. You should also check if there are any firmware updates for your drive as those might include stability fixes.


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K


#3 2017-05-29 07:21:36

pwz
Member
Registered: 2017-05-27
Posts: 4

Re: SSD appears to be healthy but got blk_update_request: I/O error

Full smartctl output is below.

The drive is an OEM drive from another Dell laptop and corresponds to the Samsung 470 series retail drives. http://en.community.dell.com/support-fo … t/19378224 and http://en.community.dell.com/support-fo … 9#20753649 indicate that Dell does not offer firmware updates and Samsung 470 updates will not work on the PM810.

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.13-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     SAMSUNG SSD PM810 TM 256GB
Firmware Version: AXM08D1Q
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ATA/ATAPI-7 T13/1532D revision 1
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon May 29 02:48:16 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		( 1680) seconds.
Offline data collection
capabilities: 			 (0x53) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  28) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   ---    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   ---    Old_age   Always       -       8279
 12 Power_Cycle_Count       0x0032   096   096   ---    Old_age   Always       -       3237
175 Program_Fail_Count_Chip 0x0032   100   100   ---    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   ---    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0013   099   099   ---    Pre-fail  Always       -       5
178 Used_Rsvd_Blk_Cnt_Chip  0x0013   076   076   ---    Pre-fail  Always       -       938
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   077   077   ---    Pre-fail  Always       -       1826
180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   077   077   ---    Pre-fail  Always       -       6238
181 Program_Fail_Cnt_Total  0x0032   100   100   ---    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   ---    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   ---    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   ---    Old_age   Always       -       0
195 ECC_Error_Rate          0x001a   200   200   ---    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   ---    Old_age   Offline      -       0
199 CRC_Error_Count         0x003e   253   253   ---    Old_age   Always       -       0
232 Available_Reservd_Space 0x0013   076   076   ---    Pre-fail  Always       -       3094
241 Total_LBAs_Written      0x0032   028   028   ---    Old_age   Always       -       3091971816
242 Total_LBAs_Read         0x0032   071   071   ---    Old_age   Always       -       1232492527

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      8255         -
# 2  Extended offline    Completed without error       00%      8254         -
# 3  Short offline       Completed without error       00%      8254         -
# 4  Extended offline    Completed without error       00%      7388         -
# 5  Short offline       Completed without error       00%      7388         -
# 6  Extended offline    Completed without error       00%      7192         -
# 7  Short offline       Completed without error       00%      7192         -
# 8  Extended offline    Completed without error       00%      5574         -
# 9  Extended offline    Completed without error       00%      2407         -
#10  Short offline       Completed without error       00%      2173         -
#11  Extended offline    Completed without error       00%      1449         -
#12  Short offline       Completed without error       00%      1449         -
#13  Short offline       Completed without error       00%        83         -
#14  Extended offline    Completed without error       00%         0         -
#15  Short offline       Completed without error       00%         0         -
#16  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


#4 2017-05-29 08:47:54

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: SSD appears to be healthy but got blk_update_request: I/O error

Can you also post a full dmesg/journal output where you see the problem and also the output of 'lspci -nnk'? Can you trigger the problem on demand? Do you remember what you were doing or which program you were using when you noticed the problem?

The SSD itself has logged no errors, so either the problem is between the SSD and the CPU, or Linux is asking the SSD to do something that it chokes on and times out. The full dmesg output might help in figuring out which it is.



#5 2017-05-29 13:55:04

pwz
Member
Registered: 2017-05-27
Posts: 4

Re: SSD appears to be healthy but got blk_update_request: I/O error

I have attached the output of lspci -nnk and all journal output from around the time the problem occurred.

I cannot trigger the problem on demand. I only noticed it when I opened a terminal and saw that the filesystem had become read-only, but that was several hours after the problem had occurred. Chromium was still working, so I didn't notice that anything had gone wrong at the time. The journal output seems to show that the problem occurred around 15 min after I took the computer out of suspend, at the same moment I started iftop. However, I cannot reproduce the problem with iftop again.

May 26 13:45:35 kernel: ACPI: Low-level resume complete
May 26 13:45:35 kernel: ACPI : EC: EC started
May 26 13:45:35 kernel: PM: Restoring platform NVS memory
May 26 13:45:35 kernel: Suspended for 30597.000 seconds
May 26 13:45:35 kernel: Enabling non-boot CPUs ...
May 26 13:45:35 kernel: x86: Booting SMP configuration:
May 26 13:45:35 kernel: smpboot: Booting Node 0 Processor 1 APIC 0x1
May 26 13:45:35 kernel:  cache: parent cpu1 should not be sleeping
May 26 13:45:35 kernel: CPU1 is up
May 26 13:45:35 kernel: smpboot: Booting Node 0 Processor 2 APIC 0x2
May 26 13:45:35 kernel:  cache: parent cpu2 should not be sleeping
May 26 13:45:35 kernel: CPU2 is up
May 26 13:45:35 kernel: smpboot: Booting Node 0 Processor 3 APIC 0x3
May 26 13:45:35 kernel:  cache: parent cpu3 should not be sleeping
May 26 13:45:35 kernel: CPU3 is up
May 26 13:45:35 kernel: ACPI: Waking up from system sleep state S3
May 26 13:45:35 kernel: thinkpad_acpi: EC reports that Thermal Table has changed
May 26 13:45:35 kernel: acpi LNXPOWER:02: Turning OFF
May 26 13:45:35 kernel: ACPI : EC: interrupt unblocked
May 26 13:45:35 kernel: ehci-pci 0000:00:1d.0: System wakeup disabled by ACPI
May 26 13:45:35 kernel: xhci_hcd 0000:00:14.0: System wakeup disabled by ACPI
May 26 13:45:35 kernel: PM: noirq resume of devices complete after 40.625 msecs
May 26 13:45:35 kernel: PM: early resume of devices complete after 0.358 msecs
May 26 13:45:35 kernel: ACPI : EC: event unblocked
May 26 13:45:35 kernel: e1000e 0000:00:19.0: System wakeup disabled by ACPI
May 26 13:45:35 kernel: sd 0:0:0:0: [sda] Starting disk
May 26 13:45:35 kernel: sd 2:0:0:0: [sdc] Starting disk
May 26 13:45:35 kernel: sd 1:0:0:0: [sdb] Starting disk
May 26 13:45:35 kernel: rtc_cmos 00:02: System wakeup disabled by ACPI
May 26 13:45:35 kernel: rtc_cmos 00:02: Alarms can be up to one month in the future
May 26 13:45:35 kernel: iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
May 26 13:45:35 kernel: iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
May 26 13:45:35 kernel: xhci_hcd 0000:00:14.0: port 6 resume PLC timeout
May 26 13:45:35 kernel: xhci_hcd 0000:00:14.0: port 5 resume PLC timeout
May 26 13:45:35 kernel: iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
May 26 13:45:35 kernel: iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
May 26 13:45:35 kernel: usb 2-8: reset high-speed USB device number 4 using xhci_hcd
May 26 13:45:36 kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 26 13:45:36 kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 26 13:45:36 kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
May 26 13:45:36 kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
May 26 13:45:36 kernel: ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
May 26 13:45:36 kernel: ata2.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
May 26 13:45:36 kernel: ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
May 26 13:45:36 kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
May 26 13:45:36 kernel: ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
May 26 13:45:36 kernel: ata2.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
May 26 13:45:36 kernel: ata2.00: configured for UDMA/133
May 26 13:45:36 kernel: ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
May 26 13:45:36 kernel: ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
May 26 13:45:36 kernel: ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
May 26 13:45:36 kernel: ata3.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
May 26 13:45:36 kernel: ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
May 26 13:45:36 kernel: ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
May 26 13:45:36 kernel: ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
May 26 13:45:36 kernel: ata3.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
May 26 13:45:36 kernel: ata3.00: configured for UDMA/133
May 26 13:45:36 kernel: usb 2-6: reset full-speed USB device number 2 using xhci_hcd
May 26 13:45:36 kernel: psmouse serio1: synaptics: queried max coordinates: x [..5676], y [..4758]
May 26 13:45:36 kernel: psmouse serio1: synaptics: queried min coordinates: x [1266..], y [1096..]
May 26 13:45:36 kernel: usb 2-7: reset full-speed USB device number 3 using xhci_hcd
May 26 13:45:36 kernel: PM: resume of devices complete after 929.569 msecs
May 26 13:45:36 kernel: usb 2-7:1.0: rebind failed: -517
May 26 13:45:36 kernel: usb 2-7:1.1: rebind failed: -517
May 26 13:45:36 kernel: PM: Finishing wakeup.
May 26 13:45:36 kernel: Restarting tasks ... done.
May 26 13:45:36 kernel: Bluetooth: hci0: read Intel version: 370810011003110e00
May 26 13:45:36 kernel: Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.8.10-fw-1.10.3.11.e.bseq
May 26 13:45:36 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 26 13:45:36 kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
May 26 13:45:36 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
May 26 13:45:36 kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
May 26 13:45:36 kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
May 26 13:45:36 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
May 26 13:45:36 kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
May 26 13:45:36 kernel: ata1.00: configured for UDMA/133
May 26 13:45:36 kernel: Bluetooth: hci0: Intel Bluetooth firmware patch completed and activated
May 26 13:45:34 systemd[1]: Time has been changed
May 26 13:45:34 bluetoothd[446]: Unable to get io data for Headset Voice gateway: getpeername: Transport endpoint is not connected (107)
May 26 13:45:34 systemd-logind[417]: Lid closed.
May 26 13:45:34 wpa_actiond[20624]: Interface 'wlp3s0' lost connection to network 'APAR-5G'
May 26 13:45:34 systemd-logind[417]: Lid opened.
May 26 13:45:34 bluetoothd[446]: Endpoint unregistered: sender=:1.12 path=/MediaEndpoint/A2DPSource
May 26 13:45:34 systemd[506]: Time has been changed
May 26 13:45:34 bluetoothd[446]: Endpoint unregistered: sender=:1.12 path=/MediaEndpoint/A2DPSink
May 26 13:45:34 systemd[1]: Starting Load/Save RF Kill Switch Status...
May 26 13:45:34 dbus[420]: [system] Rejected send message, 1 matched rules; type="method_return", sender=":1.12" (uid=1000 pid=580 comm="/usr/bin/pulseaudio --daemonize=no ") interface="(unset)" member="(unset)" error name="(unset)" requested_reply="0" destination=":1.2" (uid=0 pid=446 comm="/usr/lib/bluetooth/bluetoothd ")
May 26 13:45:34 systemd[1]: bluetooth.target: Unit not needed anymore. Stopping.
May 26 13:45:34 dbus[420]: [system] Rejected send message, 1 matched rules; type="error", sender=":1.12" (uid=1000 pid=580 comm="/usr/bin/pulseaudio --daemonize=no ") interface="(unset)" member="(unset)" error name="org.bluez.MediaEndpoint1.Error.NotImplemented" requested_reply="0" destination=":1.2" (uid=0 pid=446 comm="/usr/lib/bluetooth/bluetoothd ")
May 26 13:45:34 systemd[1]: Stopped target Bluetooth.
May 26 13:45:34 dbus[420]: [system] Rejected send message, 1 matched rules; type="error", sender=":1.12" (uid=1000 pid=580 comm="/usr/bin/pulseaudio --daemonize=no ") interface="(unset)" member="(unset)" error name="org.bluez.MediaEndpoint1.Error.NotImplemented" requested_reply="0" destination=":1.2" (uid=0 pid=446 comm="/usr/lib/bluetooth/bluetoothd ")
May 26 13:45:34 systemd-sleep[2188]: System resumed.
May 26 13:45:34 dbus[420]: [system] Rejected send message, 1 matched rules; type="error", sender=":1.12" (uid=1000 pid=580 comm="/usr/bin/pulseaudio --daemonize=no ") interface="(unset)" member="(unset)" error name="org.bluez.MediaEndpoint1.Error.NotImplemented" requested_reply="0" destination=":1.2" (uid=0 pid=446 comm="/usr/lib/bluetooth/bluetoothd ")
May 26 13:45:35 systemd[1]: Started Suspend.
May 26 13:45:34 dbus[420]: [system] Rejected send message, 1 matched rules; type="error", sender=":1.12" (uid=1000 pid=580 comm="/usr/bin/pulseaudio --daemonize=no ") interface="(unset)" member="(unset)" error name="org.bluez.MediaEndpoint1.Error.NotImplemented" requested_reply="0" destination=":1.2" (uid=0 pid=446 comm="/usr/lib/bluetooth/bluetoothd ")
May 26 13:45:35 systemd[1]: sleep.target: Unit not needed anymore. Stopping.
May 26 13:45:37 bluetoothd[446]: Endpoint registered: sender=:1.12 path=/MediaEndpoint/A2DPSource
May 26 13:45:35 systemd[1]: Stopped target Sleep.
May 26 13:45:37 bluetoothd[446]: Endpoint registered: sender=:1.12 path=/MediaEndpoint/A2DPSink
May 26 13:45:35 systemd[1]: Reached target Suspend.
May 26 13:45:35 systemd-logind[417]: Operation 'sleep' finished.
May 26 13:45:35 systemd[1]: suspend.target: Unit is bound to inactive unit systemd-suspend.service. Stopping, too.
May 26 13:45:35 systemd[1]: Stopped target Suspend.
May 26 13:45:35 systemd[1]: Reached target Bluetooth.
May 26 13:45:36 systemd[1]: Started Load/Save RF Kill Switch Status.
May 26 13:45:37 kernel: wlp3s0: authenticate with 66:77:7d:69:19:40
May 26 13:45:37 kernel: wlp3s0: send auth to 66:77:7d:69:19:40 (try 1/3)
May 26 13:45:37 kernel: wlp3s0: authenticated
May 26 13:45:37 kernel: wlp3s0: associate with 66:77:7d:69:19:40 (try 1/3)
May 26 13:45:37 kernel: wlp3s0: RX AssocResp from 66:77:7d:69:19:40 (capab=0x411 status=0 aid=1)
May 26 13:45:37 kernel: wlp3s0: associated
May 26 13:45:37 kernel: wlp3s0: Limiting TX power to 21 (24 - 3) dBm as advertised by 66:77:7d:69:19:40
May 26 13:45:37 wpa_actiond[20624]: Interface 'wlp3s0' reestablished connection to network 'APAR-5G'
May 26 13:47:15 systemd-logind[417]: Lid closed.
May 26 14:01:02 systemd-logind[417]: Lid opened.
May 26 14:03:42 kernel: sd 0:0:0:0: [sda] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06
May 26 14:03:42 kernel: sd 0:0:0:0: [sda] tag#1 CDB: opcode=0x2a 2a 00 04 16 10 d0 00 00 10 00
May 26 14:03:42 kernel: blk_update_request: I/O error, dev sda, sector 68554960
May 26 14:03:42 kernel: Aborting journal on device dm-1-8.
May 26 14:03:42 sudo[3955]:    admin : TTY=pts/0 ; PWD=/home/admin ; USER=root ; COMMAND=/usr/bin/iftop
May 26 14:03:42 sudo[3955]: pam_unix(sudo:session): session opened for user root by (uid=0)
May 26 14:03:42 kernel: EXT4-fs error (device dm-1): ext4_journal_check_start:56: Detected aborted journal
May 26 14:03:42 kernel: EXT4-fs (dm-1): Remounting filesystem read-only
May 26 14:03:42 kernel: EXT4-fs error (device dm-1): ext4_journal_check_start:56: Detected aborted journal
May 26 14:03:42 kernel: EXT4-fs (dm-1): Remounting filesystem read-only
May 26 14:03:42 kernel: EXT4-fs (dm-1): ext4_writepages: jbd2_start: 1024 pages, ino 786457; err -30
May 26 14:03:53 sudo[3955]: pam_unix(sudo:session): session closed for user root
May 26 14:04:11 systemd-logind[417]: Lid closed.
May 26 14:04:15 systemd-logind[417]: Lid opened.
May 26 14:04:18 systemd-logind[417]: Lid closed.
May 26 14:05:11 systemd-logind[417]: Lid opened.
May 26 14:05:18 systemd-logind[417]: Lid closed.
May 26 18:01:12 systemd-logind[417]: Lid opened.
May 26 18:04:51 sudo[19570]:    admin : TTY=pts/2 ; PWD=/home/admin/backups ; USER=root ; COMMAND=/usr/bin/journalctl -xe
May 26 18:04:51 sudo[19570]: pam_unix(sudo:session): session opened for user root by (uid=0)
May 26 18:05:15 sudo[19570]: pam_unix(sudo:session): session closed for user root
May 26 18:07:17 sudo[19862]:    admin : TTY=pts/0 ; PWD=/home/admin ; USER=root ; COMMAND=/usr/bin/gsmartcontrol
May 26 18:07:17 sudo[19862]: pam_unix(sudo:session): session opened for user root by (uid=0)
May 26 18:10:45 sudo[19862]: pam_unix(sudo:session): session closed for user root
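
The gap between resume and the first I/O error can be read off the journal timestamps. Here is a small Python sketch that does this for three lines copied from the excerpt above. The year is an assumption (the journal lines carry none), the helper names are hypothetical, and the month abbreviation is parsed under the default C locale.

```python
from datetime import datetime


def parse_ts(line, year=2017):
    """Parse the leading 'Mon DD HH:MM:SS' timestamp of a journal line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def first_match(lines, needle):
    """Return the timestamp of the first line containing needle, or None."""
    for line in lines:
        if needle in line:
            return parse_ts(line)
    return None


# Lines copied from the journal excerpt above
journal = [
    "May 26 13:45:35 kernel: ACPI: Low-level resume complete",
    "May 26 14:01:02 systemd-logind[417]: Lid opened.",
    "May 26 14:03:42 kernel: blk_update_request: I/O error, dev sda, sector 68554960",
]
resumed = first_match(journal, "Low-level resume complete")
failed = first_match(journal, "blk_update_request: I/O error")
print(f"I/O error {(failed - resumed).total_seconds():.0f} s after resume")
```

For this excerpt the error falls roughly 18 minutes after the resume and under 3 minutes after the lid was opened at 14:01:02.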

lspci -nnk:

00:00.0 Host bridge [0600]: Intel Corporation Broadwell-U Host Bridge -OPI [8086:1604] (rev 09)
	Subsystem: Lenovo Device [17aa:5034]
	Kernel driver in use: bdw_uncore
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 5500 [8086:1616] (rev 09)
	Subsystem: Lenovo Device [17aa:5036]
	Kernel driver in use: i915
	Kernel modules: i915
00:03.0 Audio device [0403]: Intel Corporation Broadwell-U Audio Controller [8086:160c] (rev 09)
	Subsystem: Lenovo Device [17aa:5034]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
00:14.0 USB controller [0c03]: Intel Corporation Wildcat Point-LP USB xHCI Controller [8086:9cb1] (rev 03)
	Subsystem: Lenovo Device [17aa:5034]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
00:16.0 Communication controller [0780]: Intel Corporation Wildcat Point-LP MEI Controller #1 [8086:9cba] (rev 03)
	Subsystem: Lenovo Device [17aa:5034]
	Kernel driver in use: mei_me
	Kernel modules: mei_me
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-LM [8086:15a2] (rev 03)
	Subsystem: Lenovo Device [17aa:2226]
	Kernel driver in use: e1000e
	Kernel modules: e1000e
00:1b.0 Audio device [0403]: Intel Corporation Wildcat Point-LP High Definition Audio Controller [8086:9ca0] (rev 03)
	Subsystem: Lenovo Device [17aa:5036]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
00:1c.0 PCI bridge [0604]: Intel Corporation Wildcat Point-LP PCI Express Root Port #6 [8086:9c9a] (rev e3)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:1c.1 PCI bridge [0604]: Intel Corporation Wildcat Point-LP PCI Express Root Port #3 [8086:9c94] (rev e3)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:1d.0 USB controller [0c03]: Intel Corporation Wildcat Point-LP USB EHCI Controller [8086:9ca6] (rev 03)
	Subsystem: Lenovo Device [17aa:5034]
	Kernel driver in use: ehci-pci
	Kernel modules: ehci_pci
00:1f.0 ISA bridge [0601]: Intel Corporation Wildcat Point-LP LPC Controller [8086:9cc3] (rev 03)
	Subsystem: Lenovo Device [17aa:5034]
	Kernel driver in use: lpc_ich
	Kernel modules: lpc_ich
00:1f.2 SATA controller [0106]: Intel Corporation Wildcat Point-LP SATA Controller [AHCI Mode] [8086:9c83] (rev 03)
	Subsystem: Lenovo Device [17aa:5034]
	Kernel driver in use: ahci
	Kernel modules: ahci
00:1f.3 SMBus [0c05]: Intel Corporation Wildcat Point-LP SMBus Controller [8086:9ca2] (rev 03)
	Subsystem: Lenovo Device [17aa:5034]
	Kernel driver in use: i801_smbus
	Kernel modules: i2c_i801
00:1f.6 Signal processing controller [1180]: Intel Corporation Wildcat Point-LP Thermal Management Controller [8086:9ca4] (rev 03)
	Subsystem: Lenovo Device [17aa:5034]
	Kernel driver in use: intel_pch_thermal
	Kernel modules: intel_pch_thermal
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader [10ec:5227] (rev 01)
	Subsystem: Lenovo Device [17aa:5034]
	Kernel driver in use: rtsx_pci
	Kernel modules: rtsx_pci
03:00.0 Network controller [0280]: Intel Corporation Wireless 7265 [8086:095b] (rev 59)
	Subsystem: Intel Corporation Dual Band Wireless-AC 7265 [8086:5210]
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi


#6 2017-05-29 15:50:22

seth
Member
Registered: 2012-09-03
Posts: 51,299

Re: SSD appears to be healthy but got blk_update_request: I/O error

You could identify the file(s) on the sector(s) and try to trigger the problem by reading them (dd/cp).
See https://wiki.archlinux.org/index.php/Id … aged_files

The reproducibility of certain bad blocks should hint at whether this is a device or a communication (kernel, memory, cable, temperature, ...) issue.


#7 2017-05-29 18:52:44

pwz
Member
Registered: 2017-05-27
Posts: 4

Re: SSD appears to be healthy but got blk_update_request: I/O error

I'm using LVM, so the block number calculations are more complicated, but I followed this post: https://serverfault.com/questions/51089 … 905#510905
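
The arithmetic behind such a mapping can be sketched in Python as below. This is only an illustration under stated assumptions: the logical volume is a single linear segment, and every layout number in the example call (partition start, PV data offset, extent size, first extent) is hypothetical and not taken from this machine.

```python
SECTOR_SIZE = 512  # bytes; matches the logical sector size reported by smartctl


def disk_sector_to_fs_block(disk_sector, part_start, pe_start, extent_sectors,
                            first_pe, fs_block_size=4096):
    """Map an absolute disk sector to a filesystem block inside a logical volume.

    Assumes the LV is one linear segment that begins at physical extent
    first_pe of a PV on the partition starting at part_start.
    All offsets are given in 512-byte sectors.
    """
    lv_start = part_start + pe_start + first_pe * extent_sectors
    offset = disk_sector - lv_start
    if offset < 0:
        raise ValueError("sector lies before the start of the logical volume")
    return offset * SECTOR_SIZE // fs_block_size


# Hypothetical layout, NOT from the thread: partition at sector 2048,
# 1 MiB PV data offset (2048 sectors), 4 MiB extents (8192 sectors),
# LV starting at physical extent 0.
block = disk_sector_to_fs_block(68554960, part_start=2048, pe_start=2048,
                                extent_sectors=8192, first_pe=0)
print(block)
```

The resulting block number is what debugfs expects for its icheck request, and the inode it reports can then be passed to ncheck to look for a path. The real layout values would have to come from the partition table and the LVM tools on the affected system.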

Both of the sectors listed in the OP correspond to inode 8, which, given the low number and the fact that ncheck fails to return a path, indicates that they are part of the journal. I have already fscked the partition and everything appears to be working normally at the moment.
Reading the listed sectors with hdparm or dd did not produce any errors and hdparm returned consistent results.
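
A repeated-read consistency check of this kind could look like the following Python sketch. It is demonstrated on a throwaway image file; the function names are made up, and pointing it at a real device such as /dev/sda would need root. Note that ordinary reads can be served from the page cache, so this is weaker than a tool that reads the sector directly from the drive.

```python
import hashlib
import os
import tempfile

SECTOR_SIZE = 512


def read_sectors(path, first_sector, count):
    """Read count sectors starting at first_sector from a device or image file."""
    with open(path, "rb", buffering=0) as dev:
        dev.seek(first_sector * SECTOR_SIZE)
        return dev.read(count * SECTOR_SIZE)


def reads_are_consistent(path, first_sector, count, repeats=3):
    """Read the same range several times and check that every read hashes the same."""
    digests = {hashlib.sha256(read_sectors(path, first_sector, count)).hexdigest()
               for _ in range(repeats)}
    return len(digests) == 1


# Demonstration on a temporary 64-sector image filled with random data
with tempfile.NamedTemporaryFile(delete=False) as img:
    img.write(os.urandom(64 * SECTOR_SIZE))
print(reads_are_consistent(img.name, first_sector=16, count=16))
os.unlink(img.name)
```

Since the logged failures were writes, clean and consistent reads only show that the data currently stored in those sectors is readable.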


#8 2017-05-29 20:19:16

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: SSD appears to be healthy but got blk_update_request: I/O error

Suspend and hibernate on Linux can be hit or miss. If anything, keep regular backups and keep using your machine. If you manage to find a way to reliably trigger the problem, you will want to report it upstream, as they are the ones who can better debug and solve or work around the problem.


