
#1 2023-06-05 00:35:29

mys_721tx
Member
Registered: 2020-08-13
Posts: 8

NVME disk dropping off (nvme nvme0: I/O 960 (Read) QID 9 timeout)

Over the past year my computer has hit this problem a couple of times; it is essentially identical to what thepanu reported (https://bbs.archlinux.org/viewtopic.php?id=271434). The NVMe drive times out without warning after running for a couple of days, and the filesystem is left read-only.

I have a Samsung 980 Pro as nvme0 and a Samsung 970 Plus as nvme1.

NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda           8:0    0 12.7T  0 disk 
sdb           8:16   0 12.7T  0 disk /mnt/storage
nvme0n1     259:0    0  1.8T  0 disk 
├─nvme0n1p1 259:1    0  260M  0 part 
├─nvme0n1p2 259:2    0   16M  0 part 
├─nvme0n1p3 259:3    0  1.8T  0 part 
└─nvme0n1p4 259:4    0  650M  0 part 
nvme1n1     259:5    0  1.8T  0 disk 
├─nvme1n1p1 259:6    0  260M  0 part /efi
├─nvme1n1p2 259:7    0    1G  0 part [SWAP]
└─nvme1n1p3 259:8    0  1.8T  0 part /

After a reboot, the drive (nvme0) is not detected by UEFI until a cold boot, which is also what thepanu saw. I suspect some EFI parameter is preventing the drive from being detected.

Jun 04 13:05:02 claw kernel: nvme nvme0: I/O 960 (Read) QID 9 timeout, aborting
Jun 04 13:05:02 claw kernel: nvme nvme0: I/O 743 (Read) QID 11 timeout, aborting
Jun 04 13:05:02 claw kernel: nvme nvme0: I/O 453 (Read) QID 17 timeout, aborting
Jun 04 13:05:02 claw kernel: nvme nvme0: I/O 405 (Read) QID 25 timeout, aborting
Jun 04 13:05:02 claw kernel: nvme nvme0: I/O 197 (Read) QID 27 timeout, aborting
Jun 04 13:05:02 claw kernel: nvme nvme0: I/O 16 (Read) QID 31 timeout, aborting
Jun 04 13:05:02 claw kernel: nvme nvme0: I/O 459 (Write) QID 36 timeout, aborting
Jun 04 13:05:02 claw kernel: nvme nvme0: I/O 0 QID 0 timeout, reset controller
Jun 04 13:05:02 claw kernel: nvme nvme0: I/O 20 QID 5 timeout, reset controller
Jun 04 13:05:02 claw kernel: INFO: task jbd2/nvme0n1p3-:656 blocked for more than 122 seconds.
Jun 04 13:05:02 claw kernel:       Tainted: P           OE      6.3.5-arch1-1 #1
Jun 04 13:05:02 claw kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 04 13:05:02 claw kernel: task:jbd2/nvme0n1p3- state:D stack:0     pid:656   ppid:2      flags:0x00004000
Jun 04 13:05:02 claw kernel: Call Trace:
Jun 04 13:05:02 claw kernel:  <TASK>
Jun 04 13:05:02 claw kernel:  __schedule+0x443/0x1400
Jun 04 13:05:02 claw kernel:  ? ll_back_merge_fn+0x16d/0x200
Jun 04 13:05:02 claw kernel:  schedule+0x5e/0xd0
Jun 04 13:05:02 claw kernel:  io_schedule+0x46/0x70
Jun 04 13:05:02 claw kernel:  bit_wait_io+0x11/0x70
Jun 04 13:05:02 claw kernel:  __wait_on_bit+0x46/0x140
Jun 04 13:05:02 claw kernel:  ? __pfx_bit_wait_io+0x10/0x10
Jun 04 13:05:02 claw kernel:  out_of_line_wait_on_bit+0x95/0xc0
Jun 04 13:05:02 claw kernel:  ? __pfx_wake_bit_function+0x10/0x10
Jun 04 13:05:02 claw kernel:  jbd2_journal_commit_transaction+0x118c/0x1a00 [jbd2 c1478f781fd10ba934f65cc9c8e6aae49ec2f390]
Jun 04 13:05:02 claw kernel:  kjournald2+0xad/0x280 [jbd2 c1478f781fd10ba934f65cc9c8e6aae49ec2f390]
Jun 04 13:05:02 claw kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
Jun 04 13:05:02 claw kernel:  ? __pfx_kjournald2+0x10/0x10 [jbd2 c1478f781fd10ba934f65cc9c8e6aae49ec2f390]
Jun 04 13:05:02 claw kernel:  kthread+0xde/0x110
Jun 04 13:05:02 claw kernel:  ? __pfx_kthread+0x10/0x10
Jun 04 13:05:02 claw kernel:  ret_from_fork+0x2c/0x50
Jun 04 13:05:02 claw kernel:  </TASK>
Jun 04 13:05:02 claw kernel: INFO: task chronyd:1277 blocked for more than 122 seconds.
Jun 04 13:05:02 claw kernel:       Tainted: P           OE      6.3.5-arch1-1 #1
Jun 04 13:05:02 claw kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 04 13:05:02 claw kernel: task:chronyd         state:D stack:0     pid:1277  ppid:1      flags:0x00000002
Jun 04 13:05:02 claw kernel: Call Trace:
Jun 04 13:05:02 claw kernel:  <TASK>
Jun 04 13:05:02 claw kernel:  __schedule+0x443/0x1400
Jun 04 13:05:02 claw kernel:  ? xas_load+0x41/0x50
Jun 04 13:05:02 claw kernel:  schedule+0x5e/0xd0
Jun 04 13:05:02 claw kernel:  io_schedule+0x46/0x70
Jun 04 13:05:02 claw kernel:  bit_wait_io+0x11/0x70
Jun 04 13:05:02 claw kernel:  __wait_on_bit+0x46/0x140
Jun 04 13:05:02 claw kernel:  ? __pfx_bit_wait_io+0x10/0x10
Jun 04 13:05:02 claw kernel:  out_of_line_wait_on_bit+0x95/0xc0
Jun 04 13:05:02 claw kernel:  ? __pfx_wake_bit_function+0x10/0x10
Jun 04 13:05:02 claw kernel:  do_get_write_access+0x266/0x410 [jbd2 c1478f781fd10ba934f65cc9c8e6aae49ec2f390]
Jun 04 13:05:02 claw kernel:  jbd2_journal_get_write_access+0x5f/0x80 [jbd2 c1478f781fd10ba934f65cc9c8e6aae49ec2f390]
Jun 04 13:05:02 claw kernel:  __ext4_journal_get_write_access+0x85/0x180 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  ext4_reserve_inode_write+0x61/0xc0 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  __ext4_mark_inode_dirty+0x78/0x240 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  ? jbd2__journal_start+0xfc/0x1f0 [jbd2 c1478f781fd10ba934f65cc9c8e6aae49ec2f390]
Jun 04 13:05:02 claw kernel:  ext4_dirty_inode+0x5b/0x80 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  __mark_inode_dirty+0x5a/0x390
Jun 04 13:05:02 claw kernel:  generic_update_time+0x7c/0xc0
Jun 04 13:05:02 claw kernel:  file_modified_flags+0xe0/0x100
Jun 04 13:05:02 claw kernel:  ext4_buffered_write_iter+0x55/0x140 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  vfs_write+0x239/0x3f0
Jun 04 13:05:02 claw kernel:  ksys_write+0x6f/0xf0
Jun 04 13:05:02 claw kernel:  do_syscall_64+0x60/0x90
Jun 04 13:05:02 claw kernel:  ? do_syscall_64+0x6c/0x90
Jun 04 13:05:02 claw kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Jun 04 13:05:02 claw kernel:  ? do_syscall_64+0x6c/0x90
Jun 04 13:05:02 claw kernel:  ? do_syscall_64+0x6c/0x90
Jun 04 13:05:02 claw kernel:  ? do_syscall_64+0x6c/0x90
Jun 04 13:05:02 claw kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Jun 04 13:05:02 claw kernel: RIP: 0033:0x7f0579671bff
Jun 04 13:05:02 claw kernel: RSP: 002b:00007ffeae93c7a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Jun 04 13:05:02 claw kernel: RAX: ffffffffffffffda RBX: 0000000000000075 RCX: 00007f0579671bff
Jun 04 13:05:02 claw kernel: RDX: 0000000000000075 RSI: 000055c0fd0cfaf0 RDI: 000000000000000e
Jun 04 13:05:02 claw kernel: RBP: 000055c0fd0cfaf0 R08: 0000000000000000 R09: 00007ffeae93bfd0
Jun 04 13:05:02 claw kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000075
Jun 04 13:05:02 claw kernel: R13: 000055c0fd0b0330 R14: 0000000000000075 R15: 00007f057974fca0
Jun 04 13:05:02 claw kernel:  </TASK>
Jun 04 13:05:02 claw kernel: INFO: task LIBUV_WORKER:7660 blocked for more than 122 seconds.
Jun 04 13:05:02 claw kernel:       Tainted: P           OE      6.3.5-arch1-1 #1
Jun 04 13:05:02 claw kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 04 13:05:02 claw kernel: task:LIBUV_WORKER    state:D stack:0     pid:7660  ppid:1      flags:0x00000002
Jun 04 13:05:02 claw kernel: Call Trace:
Jun 04 13:05:02 claw kernel:  <TASK>
Jun 04 13:05:02 claw kernel:  __schedule+0x443/0x1400
Jun 04 13:05:02 claw kernel:  ? __switch_to_asm+0x3e/0x80
Jun 04 13:05:02 claw kernel:  schedule+0x5e/0xd0
Jun 04 13:05:02 claw kernel:  schedule_preempt_disabled+0x15/0x30
Jun 04 13:05:02 claw kernel:  rwsem_down_write_slowpath+0x203/0x690
Jun 04 13:05:02 claw kernel:  ? futex_wait_queue+0x63/0x90
Jun 04 13:05:02 claw kernel:  down_write+0x5b/0x60
Jun 04 13:05:02 claw kernel:  ext4_file_write_iter+0x572/0x8a0 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  ? apparmor_file_permission+0x70/0x170
Jun 04 13:05:02 claw kernel:  vfs_write+0x239/0x3f0
Jun 04 13:05:02 claw kernel:  __x64_sys_pwrite64+0x98/0xd0
Jun 04 13:05:02 claw kernel:  do_syscall_64+0x60/0x90
Jun 04 13:05:02 claw kernel:  ? do_syscall_64+0x6c/0x90
Jun 04 13:05:02 claw kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Jun 04 13:05:02 claw kernel:  ? do_syscall_64+0x6c/0x90
Jun 04 13:05:02 claw kernel:  ? do_syscall_64+0x6c/0x90
Jun 04 13:05:02 claw kernel:  ? do_syscall_64+0x6c/0x90
Jun 04 13:05:02 claw kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Jun 04 13:05:02 claw kernel: RIP: 0033:0x7f6adfc20c17
Jun 04 13:05:02 claw kernel: RSP: 002b:00007f6ab430bad0 EFLAGS: 00000297 ORIG_RAX: 0000000000000012
Jun 04 13:05:02 claw kernel: RAX: ffffffffffffffda RBX: 00007f6ab430c3a8 RCX: 00007f6adfc20c17
Jun 04 13:05:02 claw kernel: RDX: 000000000000c000 RSI: 0000562920baf000 RDI: 000000000000006b
Jun 04 13:05:02 claw kernel: RBP: 00007f6ae05f6658 R08: 0000000000000001 R09: 00000000ffffffff
Jun 04 13:05:02 claw kernel: R10: 000000000041d000 R11: 0000000000000297 R12: 0000000000000001
Jun 04 13:05:02 claw kernel: R13: 000056291b9e6468 R14: 0000000000000000 R15: 0000000000000002
Jun 04 13:05:02 claw kernel:  </TASK>
Jun 04 13:05:02 claw kernel: INFO: task kworker/u256:2:345467 blocked for more than 122 seconds.
Jun 04 13:05:02 claw kernel:       Tainted: P           OE      6.3.5-arch1-1 #1
Jun 04 13:05:02 claw kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 04 13:05:02 claw kernel: task:kworker/u256:2  state:D stack:0     pid:345467 ppid:2      flags:0x00004000
Jun 04 13:05:02 claw kernel: Workqueue: writeback wb_workfn (flush-259:5)
Jun 04 13:05:02 claw kernel: Call Trace:
Jun 04 13:05:02 claw kernel:  <TASK>
Jun 04 13:05:02 claw kernel:  __schedule+0x443/0x1400
Jun 04 13:05:02 claw kernel:  ? __kfence_alloc+0xc0/0x6b0
Jun 04 13:05:02 claw kernel:  ? mempool_alloc+0x89/0x1b0
Jun 04 13:05:02 claw kernel:  ? __pfx_wbt_inflight_cb+0x10/0x10
Jun 04 13:05:02 claw kernel:  ? __pfx_wbt_cleanup_cb+0x10/0x10
Jun 04 13:05:02 claw kernel:  schedule+0x5e/0xd0
Jun 04 13:05:02 claw kernel:  io_schedule+0x46/0x70
Jun 04 13:05:02 claw kernel:  rq_qos_wait+0xc0/0x140
Jun 04 13:05:02 claw kernel:  ? __pfx_rq_qos_wake_function+0x10/0x10
Jun 04 13:05:02 claw kernel:  ? __pfx_wbt_inflight_cb+0x10/0x10
Jun 04 13:05:02 claw kernel:  wbt_wait+0xa6/0x110
Jun 04 13:05:02 claw kernel:  __rq_qos_throttle+0x27/0x40
Jun 04 13:05:02 claw kernel:  blk_mq_submit_bio+0x262/0x5e0
Jun 04 13:05:02 claw kernel:  __submit_bio+0xf5/0x180
Jun 04 13:05:02 claw kernel:  submit_bio_noacct_nocheck+0x332/0x370
Jun 04 13:05:02 claw kernel:  ? submit_bio_noacct+0x7b/0x4d0
Jun 04 13:05:02 claw kernel:  ext4_io_submit+0x24/0x40 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  ext4_do_writepages+0x2ec/0xd10 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  ext4_writepages+0xaf/0x160 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  do_writepages+0xd2/0x1e0
Jun 04 13:05:02 claw kernel:  ? __wb_calc_thresh+0x4b/0x130
Jun 04 13:05:02 claw kernel:  __writeback_single_inode+0x3d/0x360
Jun 04 13:05:02 claw kernel:  writeback_sb_inodes+0x1ed/0x4b0
Jun 04 13:05:02 claw kernel:  __writeback_inodes_wb+0x4c/0xf0
Jun 04 13:05:02 claw kernel:  wb_writeback+0x172/0x2f0
Jun 04 13:05:02 claw kernel:  wb_workfn+0x2b5/0x510
Jun 04 13:05:02 claw kernel:  ? __schedule+0x44b/0x1400
Jun 04 13:05:02 claw kernel:  ? __mod_timer+0x11f/0x370
Jun 04 13:05:02 claw kernel:  process_one_work+0x1c7/0x3d0
Jun 04 13:05:02 claw kernel:  worker_thread+0x51/0x390
Jun 04 13:05:02 claw kernel:  ? __pfx_worker_thread+0x10/0x10
Jun 04 13:05:02 claw kernel:  kthread+0xde/0x110
Jun 04 13:05:02 claw kernel:  ? __pfx_kthread+0x10/0x10
Jun 04 13:05:02 claw kernel:  ret_from_fork+0x2c/0x50
Jun 04 13:05:02 claw kernel:  </TASK>
Jun 04 13:05:02 claw kernel: INFO: task kworker/u256:1:357313 blocked for more than 122 seconds.
Jun 04 13:05:02 claw kernel:       Tainted: P           OE      6.3.5-arch1-1 #1
Jun 04 13:05:02 claw kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 04 13:05:02 claw kernel: task:kworker/u256:1  state:D stack:0     pid:357313 ppid:2      flags:0x00004000
Jun 04 13:05:02 claw kernel: Workqueue: writeback wb_workfn (flush-259:5)
Jun 04 13:05:02 claw kernel: Call Trace:
Jun 04 13:05:02 claw kernel:  <TASK>
Jun 04 13:05:02 claw kernel:  __schedule+0x443/0x1400
Jun 04 13:05:02 claw kernel:  ? __kfence_alloc+0x653/0x6b0
Jun 04 13:05:02 claw kernel:  ? mempool_alloc+0x89/0x1b0
Jun 04 13:05:02 claw kernel:  ? __pfx_wbt_inflight_cb+0x10/0x10
Jun 04 13:05:02 claw kernel:  ? __pfx_wbt_cleanup_cb+0x10/0x10
Jun 04 13:05:02 claw kernel:  schedule+0x5e/0xd0
Jun 04 13:05:02 claw kernel:  io_schedule+0x46/0x70
Jun 04 13:05:02 claw kernel:  rq_qos_wait+0xc0/0x140
Jun 04 13:05:02 claw kernel:  ? __pfx_rq_qos_wake_function+0x10/0x10
Jun 04 13:05:02 claw kernel:  ? __pfx_wbt_inflight_cb+0x10/0x10
Jun 04 13:05:02 claw kernel:  wbt_wait+0xa6/0x110
Jun 04 13:05:02 claw kernel:  __rq_qos_throttle+0x27/0x40
Jun 04 13:05:02 claw kernel:  blk_mq_submit_bio+0x262/0x5e0
Jun 04 13:05:02 claw kernel:  __submit_bio+0xf5/0x180
Jun 04 13:05:02 claw kernel:  submit_bio_noacct_nocheck+0x332/0x370
Jun 04 13:05:02 claw kernel:  ? submit_bio_noacct+0x7b/0x4d0
Jun 04 13:05:02 claw kernel:  ext4_io_submit+0x24/0x40 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  ext4_do_writepages+0x2ec/0xd10 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  ? acpi_ex_access_region+0x2c2/0x510
Jun 04 13:05:02 claw kernel:  ext4_writepages+0xaf/0x160 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 13:05:02 claw kernel:  do_writepages+0xd2/0x1e0
Jun 04 13:05:02 claw kernel:  ? __wb_calc_thresh+0x4b/0x130
Jun 04 13:05:02 claw kernel:  __writeback_single_inode+0x3d/0x360
Jun 04 13:05:02 claw kernel:  writeback_sb_inodes+0x1ed/0x4b0
Jun 04 13:05:02 claw kernel:  __writeback_inodes_wb+0x4c/0xf0
Jun 04 13:05:02 claw kernel:  wb_writeback+0x172/0x2f0
Jun 04 13:05:02 claw kernel:  wb_workfn+0x2b5/0x510
Jun 04 13:05:02 claw kernel:  ? __schedule+0x44b/0x1400
Jun 04 13:05:02 claw kernel:  ? __mod_timer+0x11f/0x370
Jun 04 13:05:02 claw kernel:  process_one_work+0x1c7/0x3d0
Jun 04 13:05:02 claw kernel:  worker_thread+0x51/0x390
Jun 04 13:05:02 claw kernel:  ? __pfx_worker_thread+0x10/0x10
Jun 04 13:05:02 claw kernel:  kthread+0xde/0x110
Jun 04 13:05:02 claw kernel:  ? __pfx_kthread+0x10/0x10
Jun 04 13:05:02 claw kernel:  ret_from_fork+0x2c/0x50
Jun 04 13:05:02 claw kernel:  </TASK>
Jun 04 13:05:02 claw kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Jun 04 13:05:02 claw kernel: nvme nvme0: Abort status: 0x371
Jun 04 13:05:02 claw kernel: nvme nvme0: Abort status: 0x371
Jun 04 13:05:02 claw kernel: nvme nvme0: Abort status: 0x371
Jun 04 13:05:02 claw kernel: nvme nvme0: Abort status: 0x371
Jun 04 13:05:02 claw kernel: nvme nvme0: Abort status: 0x371
Jun 04 13:05:02 claw kernel: nvme nvme0: Abort status: 0x371
Jun 04 13:05:02 claw kernel: nvme nvme0: Abort status: 0x371
Jun 04 13:05:02 claw kernel: nvme nvme0: Abort status: 0x371
Jun 04 13:05:02 claw kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Jun 04 13:05:02 claw kernel: nvme nvme0: Disabling device after reset failure: -19
Jun 04 13:05:02 claw kernel: I/O error, dev nvme0n1, sector 764952864 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Jun 04 13:05:02 claw kernel: I/O error, dev nvme0n1, sector 1194817536 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Jun 04 13:05:02 claw kernel: I/O error, dev nvme0n1, sector 766564136 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Jun 04 13:05:02 claw kernel: I/O error, dev nvme0n1, sector 766604472 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Jun 04 13:05:02 claw kernel: I/O error, dev nvme0n1, sector 725504160 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Jun 04 13:05:02 claw kernel: I/O error, dev nvme0n1, sector 16736152 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Jun 04 13:05:02 claw kernel: I/O error, dev nvme0n1, sector 655206896 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Jun 04 13:05:02 claw kernel: I/O error, dev nvme0n1, sector 1955134152 op 0x1:(WRITE) flags 0x800 phys_seg 12 prio class 2
Jun 04 13:05:02 claw kernel: I/O error, dev nvme0n1, sector 186484736 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 2
Jun 04 13:05:02 claw kernel: EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O error 10 writing to inode 97781872 starting block 23310592)
Jun 04 13:05:02 claw kernel: Buffer I/O error on device nvme0n1p3, logical block 22981632
Jun 04 13:05:02 claw kernel: Aborting journal on device nvme0n1p3-8.
Jun 04 13:05:02 claw kernel: EXT4-fs error (device nvme0n1p3): ext4_journal_check_start:83: comm LIBUV_WORKER: Detected aborted journal
Jun 04 13:05:02 claw kernel: Buffer I/O error on dev nvme0n1p3, logical block 243826688, lost sync page write
Jun 04 13:05:02 claw kernel: JBD2: I/O error when updating journal superblock for nvme0n1p3-8.
Jun 04 13:05:02 claw kernel: Buffer I/O error on dev nvme0n1p3, logical block 0, lost sync page write
Jun 04 13:05:02 claw kernel: EXT4-fs (nvme0n1p3): I/O error while writing superblock
Jun 04 13:05:02 claw kernel: EXT4-fs (nvme0n1p3): Remounting filesystem read-only
Jun 04 13:05:02 claw kernel: nvme0n1: detected capacity change from 3907029168 to 0
Jun 04 13:05:02 claw kernel: EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O error 10 writing to inode 97781833 starting block 158995596)
Jun 04 13:05:02 claw kernel: Buffer I/O error on device nvme0n1p3, logical block 158666636
Jun 04 13:05:02 claw kernel: EXT4-fs error (device nvme0n1p3) in ext4_reserve_inode_write:5914: Journal has aborted
Jun 04 13:05:02 claw kernel: EXT4-fs error (device nvme0n1p3) in ext4_orphan_add:188: Journal has aborted
Jun 04 13:05:02 claw kernel: EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O error 10 writing to inode 97783246 starting block 363859666)
Jun 04 13:05:02 claw kernel: Buffer I/O error on dev nvme0n1p3, logical block 88604683, lost async page write
Jun 04 13:05:02 claw kernel: Buffer I/O error on device nvme0n1p3, logical block 363530706
Jun 04 13:05:02 claw kernel: Buffer I/O error on dev nvme0n1p3, logical block 43, lost async page write
Jun 04 13:05:02 claw kernel: EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O error 10 writing to inode 97781834 starting block 89307443)
Jun 04 13:05:02 claw kernel: Buffer I/O error on device nvme0n1p3, logical block 88978483
Jun 04 13:05:02 claw kernel: Buffer I/O error on dev nvme0n1p3, logical block 0, lost sync page write
Jun 04 13:05:02 claw kernel: EXT4-fs (nvme0n1p3): I/O error while writing superblock
Jun 04 13:05:02 claw kernel: EXT4-fs error (device nvme0n1p3): ext4_dirty_inode:6118: inode #97781834: comm chronyd: mark_inode_dirty error
Jun 04 13:05:02 claw kernel: Buffer I/O error on dev nvme0n1p3, logical block 0, lost sync page write
Jun 04 13:05:02 claw kernel: EXT4-fs (nvme0n1p3): I/O error while writing superblock
Jun 04 13:05:02 claw kernel: EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O error 10 writing to inode 97781106 starting block 24513549)
Jun 04 13:05:02 claw kernel: Buffer I/O error on dev nvme0n1p3, logical block 0, lost sync page write
Jun 04 13:05:02 claw kernel: Buffer I/O error on device nvme0n1p3, logical block 24184589
Jun 04 13:05:02 claw kernel: EXT4-fs (nvme0n1p3): I/O error while writing superblock
Jun 04 13:05:02 claw kernel: Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
Jun 04 13:05:02 claw systemd[1]: systemd-journald.service: Main process exited, code=killed, status=6/ABRT
Jun 04 13:05:02 claw systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
Jun 04 13:05:02 claw systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
Jun 04 13:05:02 claw systemd[1]: Stopped Journal Service.
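
For anyone comparing against their own journal, here is a minimal sketch that pulls the `nvme nvmeX: I/O ... timeout` lines out of kernel log text and counts them per controller and action. The regular expression is only modelled on the lines in the log above, so other kernel versions may word these messages differently; the function names are made up for illustration.

```python
import re
import sys
from collections import Counter

# Modelled on lines from the log above, e.g.
#   nvme nvme0: I/O 960 (Read) QID 9 timeout, aborting
#   nvme nvme0: I/O 0 QID 0 timeout, reset controller
# The "(Read)"/"(Write)" part is optional because some lines omit it.
TIMEOUT_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O (?P<tag>\d+)(?: \((?P<op>\w+)\))? "
    r"QID (?P<qid>\d+) timeout, (?P<action>.+)$"
)


def parse_timeouts(lines):
    """Return one dict per NVMe timeout line found in `lines`."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.search(line.rstrip())
        if match:
            events.append(match.groupdict())
    return events


def summarize(events):
    """Count timeout events per (controller, action) pair."""
    return Counter((e["ctrl"], e["action"]) for e in events)


if __name__ == "__main__":
    # Reads log text from stdin and prints one summary line per pair.
    counts = summarize(parse_timeouts(sys.stdin))
    for (ctrl, action), n in sorted(counts.items()):
        print(f"{ctrl}: {n} x {action}")
```

Assuming the script is saved as `nvme_timeouts.py` (a hypothetical name), it could be fed the kernel messages of the previous boot with `journalctl -k -b -1 | python nvme_timeouts.py`.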

Has anyone else encountered a similar problem?


#2 2023-06-05 06:48:06

seth
Member
Registered: 2012-09-03
Posts: 52,365


#3 2023-06-05 06:55:03

mys_721tx
Member
Registered: 2020-08-13
Posts: 8

Re: NVME disk dropping off (nvme nvme0: I/O 960 (Read) QID 9 timeout)

Thanks! I have updated the kernel parameter. I will let it run for a couple of days and report back.
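
To check after a reboot that a parameter actually reached the running kernel, `/proc/cmdline` can be read back. Below is a minimal sketch; `example.param` is a placeholder rather than the parameter discussed in this thread, and the simple whitespace split does not handle quoted values that contain spaces.

```python
def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (such as "rw" or "quiet") map to None.
    Quoted values containing spaces are not handled by this sketch.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        params = parse_cmdline(f.read())
    # "example.param" is a placeholder; substitute the parameter that was set.
    print(params.get("example.param", "not set"))
```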


#4 2023-06-05 06:56:51

seth
Member
Registered: 2012-09-03
Posts: 52,365

Re: NVME disk dropping off (nvme nvme0: I/O 960 (Read) QID 9 timeout)

NB: there are two things in play here, the timeout and the IOMMU situation. I'd probably test both first and then see whether I can restore the IOMMU.

