You are not logged in.

#1 2021-11-19 18:12:24

thepanu
Member
Registered: 2009-01-15
Posts: 45

NVME disk dropping off (nvme nvme0: I/O 955 QID 3 timeout, aborting)

Background
I recently added an 1TB nvme drive to my system (Crucial CT1000P2SSD8).

It was added alongside existing 1TB sata SSD (Crucial CT1000MX500SSD4 ).

I wanted to make the new drive my main disk mainly due to possibility of any speed gain due to faster disk. So I just dd’d my existing partitions to the new disk, gave them new UUID’s and edited fstab and syslinux.cfg to match.

After the boot everything was ok, although I did have to run fsck on the first boot, but I didn’t think anything about that as it didn't repeat and no data seemed to be missing. Boot went fine and I began shuffling around larger stuff between the disks.

Symptoms
After a few days system began to show unresponsiveness. Especially after being suspended to a disk and resuming. I began seeing following errors in dmesg.

[14934.252070] nvme nvme0: I/O 955 QID 3 timeout, aborting
[14964.332445] nvme nvme0: I/O 955 QID 3 timeout, reset controller
[14994.412008] nvme nvme0: I/O 24 QID 0 timeout, reset controller

As nothing was visible in journalctl from the previous boot I began to pass dmesg output to a file on a separate drive (the original one). This allowed me to catch the error messages even though the drive in question dropped out.

I have tried setting the power control threshold lower as suggested for example in here, even though to my understanding that shouldn’t be needed with current kernel versions.

Additionally I also tried the lts kernel with and without the setting above.

I also tried to see if nvme reset entered immediatly after first symptoms were visible (in dmesg) would  make any difference. It didn’t.

System responds to SysRq-reisub.

When the drive drops out, it occasionally is not detected for booting either, unless full power cycle is done (as opposed to boot with to SysRq-b)

Changed back to my old drive and system has been stable for few days now. And the drive in question has been mounted the whole time as well but no issues so far.


Question
The main question I try to find answer to is if this is an software or hardware issue. Any help is appreciated.


Logs and other information
I have quite lengthy dmesg logs, is there something specific I should search for?

Dmesg after resume (I let this run for a while to see what would come of it)

[14934.252070] nvme nvme0: I/O 955 QID 3 timeout, aborting
[14964.332445] nvme nvme0: I/O 955 QID 3 timeout, reset controller
[14994.412008] nvme nvme0: I/O 24 QID 0 timeout, reset controller
[15112.169492] INFO: task jbd2/nvme0n1p3-:280 blocked for more than 122 seconds.
[15112.169495]       Not tainted 5.10.79-1-lts #1
[15112.169496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15112.169497] task:jbd2/nvme0n1p3- state:D stack:    0 pid:  280 ppid:     2 flags:0x00004000
[15112.169499] Call Trace:
[15112.169504]  __schedule+0x295/0x7f0
[15112.169506]  schedule+0x46/0xb0
[15112.169507]  io_schedule+0x42/0x70
[15112.169510]  wait_on_page_bit_common+0x12e/0x3c0
[15112.169511]  ? file_fdatawait_range+0x20/0x20
[15112.169513]  wait_on_page_writeback+0x25/0x70
[15112.169514]  __filemap_fdatawait_range+0x8c/0xf0
[15112.169516]  filemap_fdatawait_range_keep_errors+0xe/0x40
[15112.169521]  jbd2_journal_commit_transaction+0xd20/0x1b40 [jbd2]
[15112.169527]  kjournald2+0xd8/0x2a0 [jbd2]
[15112.169529]  ? add_wait_queue_exclusive+0x70/0x70
[15112.169532]  ? load_superblock.part.0+0xb0/0xb0 [jbd2]
[15112.169534]  kthread+0x11b/0x140
[15112.169535]  ? kthread_associate_blkcg+0xa0/0xa0
[15112.169537]  ret_from_fork+0x22/0x30
[15112.169539] INFO: task systemd-journal:316 blocked for more than 122 seconds.
[15112.169540]       Not tainted 5.10.79-1-lts #1
[15112.169540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15112.169541] task:systemd-journal state:D stack:    0 pid:  316 ppid:     1 flags:0x00000324
[15112.169542] Call Trace:
[15112.169544]  __schedule+0x295/0x7f0
[15112.169545]  schedule+0x46/0xb0
[15112.169546]  schedule_preempt_disabled+0xa/0x10
[15112.169547]  __mutex_lock.constprop.0+0x15c/0x490
[15112.169553]  reiserfs_write_lock+0x26/0x40 [reiserfs]
[15112.169557]  reiserfs_dirty_inode+0x3b/0xb0 [reiserfs]
[15112.169559]  __mark_inode_dirty+0x192/0x2f0
[15112.169561]  generic_update_time+0x71/0xd0
[15112.169562]  file_update_time+0x135/0x150
[15112.169563]  filemap_page_mkwrite+0x4c/0x150
[15112.169565]  do_page_mkwrite+0x56/0xb0
[15112.169566]  ? _copy_to_user+0x2d/0x40
[15112.169567]  do_wp_page+0x245/0x2f0
[15112.169568]  handle_mm_fault+0x113a/0x1980
[15112.169571]  do_user_addr_fault+0x1b8/0x3f0
[15112.169573]  exc_page_fault+0x64/0x150
[15112.169574]  ? asm_exc_page_fault+0x8/0x30
[15112.169575]  asm_exc_page_fault+0x1e/0x30
[15112.169577] RIP: 0033:0x7f94df1fab00
[15112.169578] RSP: 002b:00007ffd06ca2f10 EFLAGS: 00010246
[15112.169579] RAX: 00007f94dd7d4dc0 RBX: 000055d8d2efcebb RCX: 0000000000038e38
[15112.169580] RDX: 0000000000000000 RSI: 000055d8d2efcebb RDI: 00007f94dda8d128
[15112.169580] RBP: 000055d8d2f0e5a0 R08: 000000000053e0e8 R09: 0000000000000081
[15112.169581] R10: 00007ffd06ce0080 R11: 00007ffd06ce0090 R12: 0000000000000041
[15112.169581] R13: 00000000002847d0 R14: 0000000000000000 R15: 00007ffd06ca2f30
[15112.169588] INFO: task systemd-timesyn:492 blocked for more than 122 seconds.
[15112.169589]       Not tainted 5.10.79-1-lts #1
[15112.169589] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15112.169590] task:systemd-timesyn state:D stack:    0 pid:  492 ppid:     1 flags:0x000003a0
[15112.169591] Call Trace:
[15112.169592]  __schedule+0x295/0x7f0
[15112.169594]  schedule+0x46/0xb0
[15112.169595]  schedule_preempt_disabled+0xa/0x10
[15112.169596]  __mutex_lock.constprop.0+0x15c/0x490
[15112.169597]  ? step_into+0x29d/0x700
[15112.169600]  reiserfs_write_lock+0x26/0x40 [reiserfs]
[15112.169603]  reiserfs_setattr+0x8e/0x3b0 [reiserfs]
[15112.169604]  notify_change+0x343/0x4d0
[15112.169606]  ? vfs_utimes+0x145/0x230
[15112.169607]  vfs_utimes+0x145/0x230
[15112.169608]  do_utimes+0xc1/0x120
[15112.169610]  __x64_sys_utimensat+0x9d/0xe0
[15112.169612]  ? syscall_trace_enter.constprop.0+0x12c/0x1a0
[15112.169613]  do_syscall_64+0x33/0x40
[15112.169614]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[15112.169615] RIP: 0033:0x7f2e15901223
[15112.169616] RSP: 002b:00007ffd9452aa78 EFLAGS: 00000206 ORIG_RAX: 0000000000000118
[15112.169617] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f2e15901223
[15112.169617] RDX: 0000000000000000 RSI: 00007ffd9452aa90 RDI: 00000000ffffff9c
[15112.169618] RBP: 0000000000000010 R08: 00000000ffffffff R09: 00000000000081a4
[15112.169618] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
[15112.169619] R13: 00000000ffffffff R14: ffffffffffffffff R15: 00000000ffffffff
[15112.169622] INFO: task Xorg:608 blocked for more than 122 seconds.
[15112.169623]       Not tainted 5.10.79-1-lts #1
[15112.169624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15112.169624] task:Xorg            state:D stack:    0 pid:  608 ppid:   532 flags:0x00000084
[15112.169625] Call Trace:
[15112.169627]  __schedule+0x295/0x7f0
[15112.169628]  schedule+0x46/0xb0
[15112.169629]  schedule_preempt_disabled+0xa/0x10
[15112.169630]  __mutex_lock.constprop.0+0x15c/0x490
[15112.169633]  reiserfs_write_lock+0x26/0x40 [reiserfs]
[15112.169635]  reiserfs_dirty_inode+0x3b/0xb0 [reiserfs]
[15112.169637]  __mark_inode_dirty+0x192/0x2f0
[15112.169638]  generic_update_time+0x71/0xd0
[15112.169638]  file_update_time+0x135/0x150
[15112.169640]  __generic_file_write_iter+0x96/0x1d0
[15112.169641]  generic_file_write_iter+0x5d/0xb0
[15112.169643]  new_sync_write+0x159/0x1f0
[15112.169645]  vfs_write+0x1b4/0x270
[15112.169646]  ksys_write+0x67/0xe0
[15112.169647]  do_syscall_64+0x33/0x40
[15112.169648]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[15112.169649] RIP: 0033:0x7fe3e4aeb93f
[15112.169649] RSP: 002b:00007ffeafeadd70 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[15112.169650] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fe3e4aeb93f
[15112.169651] RDX: 000000000000000d RSI: 00007ffeafeadf70 RDI: 0000000000000004
[15112.169651] RBP: 00007ffeafeadf70 R08: 0000000000000000 R09: 0000000000000001
[15112.169652] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000d
[15112.169652] R13: 000055dfe0ad1590 R14: 000000000000000d R15: 00007fe3e4bbd700
[15112.169662] INFO: task Cache2 I/O:951 blocked for more than 122 seconds.
[15112.169663]       Not tainted 5.10.79-1-lts #1
[15112.169664] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15112.169664] task:Cache2 I/O      state:D stack:    0 pid:  951 ppid:     1 flags:0x00000080
[15112.169665] Call Trace:
[15112.169666]  __schedule+0x295/0x7f0
[15112.169668]  ? out_of_line_wait_on_bit_lock+0xb0/0xb0
[15112.169668]  schedule+0x46/0xb0
[15112.169669]  io_schedule+0x42/0x70
[15112.169670]  bit_wait_io+0xd/0x50
[15112.169671]  __wait_on_bit+0x2a/0x90
[15112.169672]  out_of_line_wait_on_bit+0x92/0xb0
[15112.169674]  ? var_wake_function+0x20/0x20
[15112.169677]  do_get_write_access+0x261/0x3c0 [jbd2]
[15112.169680]  jbd2_journal_get_write_access+0x63/0x80 [jbd2]
[15112.169690]  __ext4_journal_get_write_access+0x72/0x120 [ext4]
[15112.169701]  __ext4_new_inode+0x4ab/0x16b0 [ext4]
[15112.169713]  ext4_create+0x11c/0x1d0 [ext4]
[15112.169715]  path_openat+0xd56/0xf90
[15112.169716]  do_filp_open+0xa5/0x150
[15112.169718]  do_sys_openat2+0xb0/0x160
[15112.169720]  __x64_sys_openat+0x53/0x90
[15112.169721]  do_syscall_64+0x33/0x40
[15112.169722]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[15112.169722] RIP: 0033:0x7f8e49fa1f04
[15112.169723] RSP: 002b:00007f8e3b245b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[15112.169724] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f8e49fa1f04
[15112.169724] RDX: 0000000000000242 RSI: 00007f8d9030ff08 RDI: 00000000ffffff9c
[15112.169725] RBP: 00007f8d9030ff08 R08: 0000000000000000 R09: 00000000ffffffff
[15112.169725] R10: 0000000000000180 R11: 0000000000000293 R12: 0000000000000242
[15112.169726] R13: 0000000000000180 R14: 00007f8e3b246588 R15: 0000000000000180
[15112.169736] INFO: task mozStorage #1:1123 blocked for more than 122 seconds.
[15112.169737]       Not tainted 5.10.79-1-lts #1
[15112.169737] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15112.169738] task:mozStorage #1   state:D stack:    0 pid: 1123 ppid:     1 flags:0x00000080
[15112.169739] Call Trace:
[15112.169740]  __schedule+0x295/0x7f0
[15112.169741]  ? out_of_line_wait_on_bit_lock+0xb0/0xb0
[15112.169742]  schedule+0x46/0xb0
[15112.169743]  io_schedule+0x42/0x70
[15112.169744]  bit_wait_io+0xd/0x50
[15112.169745]  __wait_on_bit+0x2a/0x90
[15112.169746]  out_of_line_wait_on_bit+0x92/0xb0
[15112.169747]  ? var_wake_function+0x20/0x20
[15112.169750]  do_get_write_access+0x261/0x3c0 [jbd2]
[15112.169753]  jbd2_journal_get_write_access+0x63/0x80 [jbd2]
[15112.169762]  __ext4_journal_get_write_access+0x72/0x120 [ext4]
[15112.169772]  ext4_reserve_inode_write+0x7f/0xb0 [ext4]
[15112.169781]  __ext4_mark_inode_dirty+0x6f/0x230 [ext4]
[15112.169790]  ? __ext4_journal_start_sb+0xf3/0x110 [ext4]
[15112.169799]  ext4_dirty_inode+0x5f/0x80 [ext4]
[15112.169801]  __mark_inode_dirty+0x192/0x2f0
[15112.169802]  generic_update_time+0x71/0xd0
[15112.169802]  file_update_time+0x135/0x150
[15112.169803]  ? generic_write_checks+0x69/0xc0
[15112.169812]  ext4_buffered_write_iter+0x5a/0x160 [ext4]
[15112.169814]  new_sync_write+0x159/0x1f0
[15112.169816]  vfs_write+0x1b4/0x270
[15112.169817]  ksys_write+0x67/0xe0
[15112.169818]  do_syscall_64+0x33/0x40
[15112.169819]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[15112.169820] RIP: 0033:0x7f8e49fa16ff
[15112.169820] RSP: 002b:00007f8dfd151070 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[15112.169821] RAX: ffffffffffffffda RBX: 0000000000000057 RCX: 00007f8e49fa16ff
[15112.169821] RDX: 0000000000000018 RSI: 00007f8dfd151210 RDI: 0000000000000057
[15112.169822] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[15112.169822] R10: 00007ffc39bcc080 R11: 0000000000000293 R12: 00007f8dfd151210
[15112.169823] R13: 0000000000000018 R14: 00007f8e20284308 R15: 00000000001aa7e0
[15112.169825] INFO: task glean.dispatche:1125 blocked for more than 122 seconds.
[15112.169825]       Not tainted 5.10.79-1-lts #1
[15112.169826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15112.169826] task:glean.dispatche state:D stack:    0 pid: 1125 ppid:     1 flags:0x00000080
[15112.169827] Call Trace:
[15112.169829]  __schedule+0x295/0x7f0
[15112.169830]  ? out_of_line_wait_on_bit_lock+0xb0/0xb0
[15112.169831]  schedule+0x46/0xb0
[15112.169831]  io_schedule+0x42/0x70
[15112.169832]  bit_wait_io+0xd/0x50
[15112.169833]  __wait_on_bit+0x2a/0x90
[15112.169834]  out_of_line_wait_on_bit+0x92/0xb0
[15112.169836]  ? var_wake_function+0x20/0x20
[15112.169838]  do_get_write_access+0x261/0x3c0 [jbd2]
[15112.169841]  jbd2_journal_get_write_access+0x63/0x80 [jbd2]
[15112.169850]  __ext4_journal_get_write_access+0x72/0x120 [ext4]
[15112.169860]  ext4_orphan_add+0xd0/0x2c0 [ext4]
[15112.169869]  ext4_setattr+0x870/0x8f0 [ext4]
[15112.169870]  notify_change+0x343/0x4d0
[15112.169879]  ? ext4_file_open+0xd5/0x290 [ext4]
[15112.169881]  ? do_truncate+0x88/0xe0
[15112.169882]  do_truncate+0x88/0xe0
[15112.169883]  path_openat+0xcb7/0xf90
[15112.169884]  do_filp_open+0xa5/0x150
[15112.169886]  do_sys_openat2+0xb0/0x160
[15112.169887]  __x64_sys_openat+0x53/0x90
[15112.169888]  do_syscall_64+0x33/0x40
[15112.169889]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[15112.169890] RIP: 0033:0x7f8e49fa1f04
[15112.169891] RSP: 002b:00007f8df9efe400 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[15112.169891] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8e49fa1f04
[15112.169892] RDX: 0000000000080241 RSI: 00007f8d97242f40 RDI: 00000000ffffff9c
[15112.169892] RBP: 00007f8d97242f40 R08: 0000000000000000 R09: 0000000000000001
[15112.169893] R10: 00000000000001b6 R11: 0000000000000293 R12: 0000000000080241
[15112.169893] R13: 00007f8d8142a5e0 R14: 00007f8df9efe4e0 R15: 00007f8d97242f40
[15112.169895] INFO: task DOM Worker:1157 blocked for more than 122 seconds.
[15112.169896]       Not tainted 5.10.79-1-lts #1
[15112.169897] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15112.169897] task:DOM Worker      state:D stack:    0 pid: 1157 ppid:     1 flags:0x00000080
[15112.169898] Call Trace:
[15112.169899]  __schedule+0x295/0x7f0
[15112.169900]  ? out_of_line_wait_on_bit_lock+0xb0/0xb0
[15112.169901]  schedule+0x46/0xb0
[15112.169902]  io_schedule+0x42/0x70
[15112.169903]  bit_wait_io+0xd/0x50
[15112.169904]  __wait_on_bit+0x2a/0x90
[15112.169905]  out_of_line_wait_on_bit+0x92/0xb0
[15112.169906]  ? var_wake_function+0x20/0x20
[15112.169909]  do_get_write_access+0x261/0x3c0 [jbd2]
[15112.169912]  jbd2_journal_get_write_access+0x63/0x80 [jbd2]
[15112.169921]  __ext4_journal_get_write_access+0x72/0x120 [ext4]
[15112.169930]  ? __ext4_handle_dirty_metadata+0x4b/0x1a0 [ext4]
[15112.169938]  __ext4_new_inode+0x935/0x16b0 [ext4]
[15112.169939]  ? d_splice_alias+0x8b/0x290
[15112.169947]  ext4_create+0x11c/0x1d0 [ext4]
[15112.169948]  path_openat+0xd56/0xf90
[15112.169949]  do_filp_open+0xa5/0x150
[15112.169950]  do_sys_openat2+0xb0/0x160
[15112.169951]  __x64_sys_openat+0x53/0x90
[15112.169952]  do_syscall_64+0x33/0x40
[15112.169953]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[15112.169954] RIP: 0033:0x7f8e49fa1f04
[15112.169954] RSP: 002b:00007f8de24fbf30 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[15112.169955] RAX: ffffffffffffffda RBX: 00007f8e49fa1e30 RCX: 00007f8e49fa1f04
[15112.169955] RDX: 0000000000000641 RSI: 00007f8dd6e478e0 RDI: 00000000ffffff9c
[15112.169956] RBP: 00007f8dd6e478e0 R08: 0000000000000000 R09: 00007f8de506a840
[15112.169956] R10: 0000000000000180 R11: 0000000000000293 R12: 0000000000000641
[15112.169956] R13: 0000000000000003 R14: 00007f8de24fbef0 R15: fffffffffffffffc
[15112.169958] INFO: task mozStorage #4:1235 blocked for more than 122 seconds.
[15112.169958]       Not tainted 5.10.79-1-lts #1
[15112.169959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15112.169959] task:mozStorage #4   state:D stack:    0 pid: 1235 ppid:     1 flags:0x00000080
[15112.169960] Call Trace:
[15112.169961]  __schedule+0x295/0x7f0
[15112.169962]  ? out_of_line_wait_on_bit_lock+0xb0/0xb0
[15112.169963]  schedule+0x46/0xb0
[15112.169963]  io_schedule+0x42/0x70
[15112.169964]  bit_wait_io+0xd/0x50
[15112.169965]  __wait_on_bit+0x2a/0x90
[15112.169966]  out_of_line_wait_on_bit+0x92/0xb0
[15112.169967]  ? var_wake_function+0x20/0x20
[15112.169969]  do_get_write_access+0x261/0x3c0 [jbd2]
[15112.169971]  jbd2_journal_get_write_access+0x63/0x80 [jbd2]
[15112.169978]  __ext4_journal_get_write_access+0x72/0x120 [ext4]
[15112.169986]  ext4_reserve_inode_write+0x7f/0xb0 [ext4]
[15112.169993]  __ext4_mark_inode_dirty+0x6f/0x230 [ext4]
[15112.170001]  ? __ext4_journal_start_sb+0xf3/0x110 [ext4]
[15112.170008]  ext4_dirty_inode+0x5f/0x80 [ext4]
[15112.170009]  __mark_inode_dirty+0x192/0x2f0
[15112.170009]  generic_update_time+0x71/0xd0
[15112.170010]  file_update_time+0x135/0x150
[15112.170011]  ? generic_write_checks+0x69/0xc0
[15112.170018]  ext4_buffered_write_iter+0x5a/0x160 [ext4]
[15112.170019]  new_sync_write+0x159/0x1f0
[15112.170021]  vfs_write+0x1b4/0x270
[15112.170022]  ksys_write+0x67/0xe0
[15112.170022]  do_syscall_64+0x33/0x40
[15112.170023]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[15112.170024] RIP: 0033:0x7f8e49fa16ff
[15112.170024] RSP: 002b:00007f8dfd2160c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[15112.170025] RAX: ffffffffffffffda RBX: 0000000000000035 RCX: 00007f8e49fa16ff
[15112.170025] RDX: 0000000000000018 RSI: 00007f8dfd216260 RDI: 0000000000000035
[15112.170026] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[15112.170026] R10: 00007ffc39bcc080 R11: 0000000000000293 R12: 00007f8dfd216260
[15112.170027] R13: 0000000000000018 R14: 00007f8e49816e48 R15: 0000000000050110
[15112.170029] INFO: task LS Thread:1435 blocked for more than 122 seconds.
[15112.170030]       Not tainted 5.10.79-1-lts #1
[15112.170030] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15112.170031] task:LS Thread       state:D stack:    0 pid: 1435 ppid:     1 flags:0x00000080
[15112.170031] Call Trace:
[15112.170032]  __schedule+0x295/0x7f0
[15112.170033]  ? out_of_line_wait_on_bit_lock+0xb0/0xb0
[15112.170034]  schedule+0x46/0xb0
[15112.170035]  io_schedule+0x42/0x70
[15112.170036]  bit_wait_io+0xd/0x50
[15112.170036]  __wait_on_bit+0x2a/0x90
[15112.170037]  out_of_line_wait_on_bit+0x92/0xb0
[15112.170038]  ? var_wake_function+0x20/0x20
[15112.170040]  do_get_write_access+0x261/0x3c0 [jbd2]
[15112.170043]  jbd2_journal_get_write_access+0x63/0x80 [jbd2]
[15112.170050]  __ext4_journal_get_write_access+0x72/0x120 [ext4]
[15112.170057]  ? __ext4_handle_dirty_metadata+0x4b/0x1a0 [ext4]
[15112.170064]  __ext4_new_inode+0x935/0x16b0 [ext4]
[15112.170072]  ext4_create+0x11c/0x1d0 [ext4]
[15112.170073]  path_openat+0xd56/0xf90
[15112.170075]  do_filp_open+0xa5/0x150
[15112.170076]  do_sys_openat2+0xb0/0x160
[15112.170077]  __x64_sys_openat+0x53/0x90
[15112.170078]  do_syscall_64+0x33/0x40
[15112.170079]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[15112.170079] RIP: 0033:0x7f8e49fa1f04
[15112.170080] RSP: 002b:00007f8dd6cfd7c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[15112.170080] RAX: ffffffffffffffda RBX: 00000000000a0042 RCX: 00007f8e49fa1f04
[15112.170081] RDX: 00000000000a0042 RSI: 00007f8d816e63f5 RDI: 00000000ffffff9c
[15112.170081] RBP: 00007f8d816e63f5 R08: 0000000000000000 R09: 0000000000000000
[15112.170082] R10: 00000000000001a4 R11: 0000000000000293 R12: 00000000000a0042
[15112.170082] R13: 00000000000001a4 R14: 00000000000001a4 R15: 00007f8d816e63f5
[15115.484841] systemd[1]: systemd-timesyncd.service: Watchdog timeout (limit 3min)!
[15115.484884] systemd[1]: systemd-timesyncd.service: Killing process 492 (systemd-timesyn) with signal SIGABRT.
[15133.734554] systemd[1]: systemd-journald.service: State 'stop-watchdog' timed out. Killing.
[15133.734591] systemd[1]: systemd-journald.service: Killing process 316 (systemd-journal) with signal SIGKILL.
[15153.138824] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[15153.209346] nvme nvme0: Abort status: 0x371
[15205.733367] systemd[1]: systemd-timesyncd.service: State 'stop-watchdog' timed out. Killing.
[15205.733397] systemd[1]: systemd-timesyncd.service: Killing process 492 (systemd-timesyn) with signal SIGKILL.
[15223.983136] systemd[1]: systemd-journald.service: Processes still around after SIGKILL. Ignoring.
[15281.286890] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[15281.286895] nvme nvme0: Removing after probe failure status: -19
[15295.981995] systemd[1]: systemd-timesyncd.service: Processes still around after SIGKILL. Ignoring.
[15314.231598] systemd[1]: systemd-journald.service: State 'final-sigterm' timed out. Killing.
[15314.231636] systemd[1]: systemd-journald.service: Killing process 316 (systemd-journal) with signal SIGKILL.
[15386.230514] systemd[1]: systemd-timesyncd.service: State 'final-sigterm' timed out. Killing.
[15386.230551] systemd[1]: systemd-timesyncd.service: Killing process 492 (systemd-timesyn) with signal SIGKILL.
[15404.480119] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[15404.480127] systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
[15404.480184] systemd[1]: systemd-journald.service: Unit process 316 (systemd-journal) remains running after unit stopped.
[15404.480451] audit: type=1131 audit(1637158914.329:129): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[15404.480752] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
[15404.481727] systemd[1]: Stopping Flush Journal to Persistent Storage...
[15409.346309] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[15409.346426] blk_update_request: I/O error, dev nvme0n1, sector 83376896 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
[15409.346429] Buffer I/O error on dev nvme0n1p2, logical block 10165856, lost async page write
[15409.346438] blk_update_request: I/O error, dev nvme0n1, sector 1610516856 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[15409.346439] blk_update_request: I/O error, dev nvme0n1, sector 45848616 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
[15409.346441] Buffer I/O error on dev nvme0n1p2, logical block 5474821, lost async page write
[15409.346444] blk_update_request: I/O error, dev nvme0n1, sector 2050176 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
[15409.346445] Buffer I/O error on dev nvme0n1p2, logical block 16, lost async page write
[15409.346447] blk_update_request: I/O error, dev nvme0n1, sector 141248512 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
[15409.346448] Buffer I/O error on dev nvme0n1p2, logical block 17399808, lost async page write
[15409.346451] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:345: I/O error 10 writing to inode 32743852 starting block 201314609)
[15409.346454] blk_update_request: I/O error, dev nvme0n1, sector 199862784 op 0x1:(WRITE) flags 0x800 phys_seg 42 prio class 0
[15409.346501] blk_update_request: I/O error, dev nvme0n1, sector 1265185424 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[15409.346503] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:345: I/O error 10 writing to inode 34411325 starting block 158148179)
[15409.346505] Buffer I/O error on device nvme0n1p3, logical block 137411922
[15409.346508] blk_update_request: I/O error, dev nvme0n1, sector 1265284920 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[15409.346510] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:345: I/O error 10 writing to inode 34415712 starting block 158160616)
[15409.346511] Buffer I/O error on device nvme0n1p3, logical block 137424359
[15409.346566] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:345: I/O error 10 writing to inode 44652893 starting block 201314609)
[15409.346572] Buffer I/O error on device nvme0n1p3, logical block 180578353
[15409.346576] Buffer I/O error on device nvme0n1p3, logical block 180578354
[15409.346625] blk_update_request: I/O error, dev nvme0n1, sector 1605446120 op 0x1:(WRITE) flags 0x800 phys_seg 4 prio class 0
[15409.346626] JBD2: Detected IO errors while flushing file data on nvme0n1p3-8
[15409.346631] blk_update_request: I/O error, dev nvme0n1, sector 160182768 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[15409.346634] Buffer I/O error on dev nvme0n1p2, logical block 19766590, lost async page write
[15409.346637] REISERFS error (device nvme0n1p2): vs-13050 reiserfs_update_sd_size: i/o failure occurred trying to update [4 81977 0x0 SD] stat data
[15409.346643] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:345: I/O error 10 writing to inode 34280897 starting block 200680769)
[15409.346643] Aborting journal on device nvme0n1p3-8.
[15409.346645] REISERFS (device nvme0n1p2): Remounting filesystem read-only
[15409.346650] EXT4-fs error (device nvme0n1p3) in ext4_reserve_inode_write:5761: Journal has aborted
[15409.346652] EXT4-fs error (device nvme0n1p3) in ext4_reserve_inode_write:5761: Journal has aborted
[15409.346654] EXT4-fs error (device nvme0n1p3) in ext4_create:2634: IO failure
[15409.346655] Buffer I/O error on dev nvme0n1p3, logical block 4227072, lost sync page write
[15409.346659] EXT4-fs error (device nvme0n1p3) in ext4_reserve_inode_write:5761: Journal has aborted
[15409.346661] EXT4-fs error (device nvme0n1p3) in ext4_do_update_inode:5170: Journal has aborted
[15409.346663] EXT4-fs (nvme0n1p3): failed to convert unwritten extents to written extents -- potential data loss!  (inode 32743852, error -30)
[15409.346667] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:345: I/O error 10 writing to inode 32704632 starting block 200680765)
[15409.346668] EXT4-fs (nvme0n1p3): Delayed block allocation failed for inode 34316841 at logical offset 0 with max blocks 42 with error 30
[15409.346669] JBD2: Error -5 detected when updating journal superblock for nvme0n1p3-8.
[15409.346672] Buffer I/O error on device nvme0n1p3, logical block 180578351
[15409.346675] Buffer I/O error on dev nvme0n1p3, logical block 0, lost sync page write
[15409.346676] EXT4-fs (nvme0n1p3): This should not happen!! Data will be lost

[15409.346678] EXT4-fs error (device nvme0n1p3) in ext4_writepages:2822: Journal has aborted
[15409.346680] Buffer I/O error on device nvme0n1p3, logical block 180578352
[15409.346682] EXT4-fs (nvme0n1p3): previous I/O error to superblock detected
[15409.346690] Buffer I/O error on dev nvme0n1p2, logical block 3495, lost sync page write
[15409.346692] Buffer I/O error on dev nvme0n1p3, logical block 0, lost sync page write
[15409.346698] Buffer I/O error on dev nvme0n1p2, logical block 3496, lost sync page write
[15409.346700] EXT4-fs (nvme0n1p3): I/O error while writing superblock
[15409.346705] REISERFS warning (device nvme0n1p2): clm-6006 reiserfs_dirty_inode: writing inode 52333 on readonly FS
[15409.346713] EXT4-fs (nvme0n1p3): Delayed block allocation failed for inode 32743936 at logical offset 0 with max blocks 1 with error 30
[15409.346715] EXT4-fs (nvme0n1p3): This should not happen!! Data will be lost

[15409.346718] EXT4-fs (nvme0n1p3): previous I/O error to superblock detected
[15409.346719] EXT4-fs error (device nvme0n1p3) in ext4_writepages:2822: Journal has aborted
[15409.346728] EXT4-fs error (device nvme0n1p3): ext4_setattr:5482: inode #34342442: comm ThreadPoolForeg: mark_inode_dirty error
[15409.346733] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:345: I/O error 10 writing to inode 33751305 starting block 135457398)
[15409.346736] Buffer I/O error on device nvme0n1p3, logical block 114721141
[15409.346742] EXT4-fs (nvme0n1p3): I/O error while writing superblock
[15409.346745] EXT4-fs error (device nvme0n1p3): ext4_ext_tree_init:829: inode #44667536: comm LS Thread: mark_inode_dirty error
[15409.346754] EXT4-fs (nvme0n1p3): I/O error while writing superblock
[15409.346758] EXT4-fs error (device nvme0n1p3): ext4_check_bdev_write_error:215: comm LS Thread: Error while async write back metadata
[15409.346762] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:345: I/O error 10 writing to inode 33751462 starting block 171255069)
[15409.346768] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:345: I/O error 10 writing to inode 33751462 starting block 171254788)
[15409.346773] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:345: I/O error 10 writing to inode 33751462 starting block 171254530)
[15409.346785] EXT4-fs error (device nvme0n1p3): ext4_journal_check_start:83: Detected aborted journal
[15409.346790] EXT4-fs error (device nvme0n1p3): ext4_journal_check_start:83: Detected aborted journal
[15409.346825] Buffer I/O error on device nvme0n1p3, logical block 135136256
[15409.346828] Buffer I/O error on device nvme0n1p3, logical block 135136257
[15409.346831] EXT4-fs error (device nvme0n1p3): ext4_journal_check_start:83: Detected aborted journal
[15409.346833] Buffer I/O error on device nvme0n1p3, logical block 135136258
[15409.347291] EXT4-fs error (device nvme0n1p3): ext4_journal_check_start:83: Detected aborted journal
[15409.348507] systemd[1]: systemd-journal-flush.service: Control process exited, code=exited, status=1/FAILURE
[15409.348615] systemd[1]: systemd-journal-flush.service: Failed with result 'exit-code'.
[15409.348875] systemd[1]: Stopped Flush Journal to Persistent Storage.
[15409.348924] audit: type=1131 audit(1637158919.199:130): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journal-flush comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[15409.349358] systemd[1]: Stopped Journal Service.
[15409.349372] audit: type=1130 audit(1637158919.199:131): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[15409.349377] audit: type=1131 audit(1637158919.199:132): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[15409.350054] audit: type=1334 audit(1637158919.199:133): prog-id=33 op=LOAD
[15409.350149] audit: type=1334 audit(1637158919.199:134): prog-id=34 op=LOAD
[15409.350179] audit: type=1334 audit(1637158919.199:135): prog-id=35 op=LOAD
[15409.350677] systemd[1]: Starting Journal Service...
[15409.351117] Core dump to |/bin/false pipe failed
[15409.351320] audit: type=1701 audit(1637158919.199:136): auid=1000 uid=1000 gid=100 ses=2 pid=45075 comm="steamerrorrepor" exe="/home/thepanu/.local/share/Steam/linux32/steamerrorreporter" sig=7 res=1
[15409.359996] audit: type=1305 audit(1637158919.209:137): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 res=1
[15409.374822] systemd[1]: systemd-timesyncd.service: Main process exited, code=killed, status=9/KILL
[15409.374826] systemd[1]: systemd-timesyncd.service: Failed with result 'watchdog'.
[15409.375073] audit: type=1131 audit(1637158919.229:138): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-timesyncd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[15409.375410] systemd[1]: systemd-timesyncd.service: Scheduled restart job, restart counter is at 1.
[15409.375497] systemd[1]: Stopped Network Time Synchronization.
[15409.376249] REISERFS error (device nvme0n1p2): zam-7001 reiserfs_find_entry: io error
[15409.376262] systemd[1]: systemd-timesyncd.service: Failed to run 'start' task: Input/output error
[15409.376324] systemd[1]: systemd-timesyncd.service: Failed with result 'resources'.
[15409.376490] systemd[1]: Failed to start Network Time Synchronization.
[15409.376654] systemd[1]: Started Journal Service.
[15409.378397] REISERFS error (device nvme0n1p2): zam-7001 reiserfs_find_entry: io error
[15409.379498] REISERFS error (device nvme0n1p2): zam-7001 reiserfs_find_entry: io error
[15409.380823] REISERFS error (device nvme0n1p2): zam-7001 reiserfs_find_entry: io error
[15409.382084] REISERFS error (device nvme0n1p2): zam-7001 reiserfs_find_entry: io error
[15409.384713] Core dump to |/bin/false pipe failed

lsblk

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    0 931.5G  0 disk 
├─sda1        8:1    0  1000M  0 part /boot
├─sda2        8:2    0 175.8G  0 part /
└─sda3        8:3    0 754.8G  0 part /data/extra
nvme0n1     259:0    0 931.5G  0 disk 
├─nvme0n1p1 259:1    0  1000M  0 part 
├─nvme0n1p2 259:2    0  78.1G  0 part 
└─nvme0n1p3 259:3    0 852.4G  0 part /mnt/siirto

Smartctl -a

=== START OF INFORMATION SECTION ===
Model Number:                       CT1000P2SSD8
Serial Number:                      2113E5915F23
Firmware Version:                   P2CR033
PCI Vendor/Subsystem ID:            0xc0a9
IEEE OUI Identifier:                0x6479a7
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            6479a7 5090000037
Local Time is:                      Fri Nov 19 20:05:51 2021 EET
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     70 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.50W       -        -    0  0  0  0        0       0
 1 +     1.90W       -        -    1  1  1  1        0       0
 2 +     1.50W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3     5000    1900
 4 -   0.0020W       -        -    4  4  4  4    13000  100000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         1
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    0%
Data Units Read:                    837,322 [428 GB]
Data Units Written:                 2,404,262 [1.23 TB]
Host Read Commands:                 5,758,694
Host Write Commands:                7,440,289
Controller Busy Time:               255
Power Cycles:                       44
Power On Hours:                     88
Unsafe Shutdowns:                   20
Media and Data Integrity Errors:    0
Error Information Log Entries:      97
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 16 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0         97     0  0x0001  0x4005  0x028            0     0     -

Drive firmware is on the latest version.

nvme smart-log

Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning			: 0
temperature				: 36 C (309 Kelvin)
available_spare				: 100%
available_spare_threshold		: 5%
percentage_used				: 0%
endurance group critical warning summary: 0
data_units_read				: 837,322
data_units_written			: 2,404,262
host_read_commands			: 5,758,694
host_write_commands			: 7,440,289
controller_busy_time			: 255
power_cycles				: 44
power_on_hours				: 88
unsafe_shutdowns			: 20
media_errors				: 0
num_err_log_entries			: 97
Warning Temperature Time		: 0
Critical Composite Temperature Time	: 0
Thermal Management T1 Trans Count	: 0
Thermal Management T2 Trans Count	: 0
Thermal Management T1 Total Time	: 0
Thermal Management T2 Total Time	: 0

Last edited by thepanu (2021-11-20 08:28:53)

Offline

Board footer

Powered by FluxBB