#1 2023-06-25 15:24:50

gagootron
Member
Registered: 2023-06-25
Posts: 5

NVMe errors with amd_iommu enabled

My initial goal was to get GPU passthrough to a Windows VM working. However, some of my NVMe disks fail when the IOMMU is enabled.

I have a PC with the following hardware:

  • ASUS Prime X670-P Motherboard

  • AMD Ryzen 7 7800X3D CPU

  • AMD RX 7900XTX GPU

  • 2x16 GB G.Skill Trident Z5 DDR5 RAM

  • Samsung 990 PRO 1TB (root disk)

  • 2x Lexar NM620 1TB (somehow with different controllers)

When I enable the IOMMU in the BIOS, one of my Lexar SSDs spams dmesg with the following:

Jun 25 16:37:41.347672 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xffc10000 flags=0x0020]
Jun 25 16:37:41.347787 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xffc10300 flags=0x0020]
Jun 25 16:37:41.347843 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xffc10400 flags=0x0020]
Jun 25 16:37:41.347897 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xffc10500 flags=0x0020]
Jun 25 16:37:41.347947 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xffc10600 flags=0x0020]
Jun 25 16:37:41.347995 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xffc10700 flags=0x0020]
Jun 25 16:37:41.348042 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xffc10800 flags=0x0020]
Jun 25 16:37:41.348088 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xffc10b00 flags=0x0020]
Jun 25 16:37:41.348135 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xffc10c00 flags=0x0020]
Jun 25 16:37:41.348181 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xffc10e00 flags=0x0020]

The SSD works for a few minutes, but then the nvme driver crashes. The other two SSDs show no problems in this setup.
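
To see which device is faulting and over what address range, the AMD-Vi lines above can be summarized with a short script. This is a minimal sketch that assumes the log format is exactly the one shown above; summarize_faults and SAMPLE are hypothetical names, and the two sample lines are copied from the log.

```python
import re

# Matches AMD-Vi IO_PAGE_FAULT lines in the format shown in the log above.
FAULT_RE = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]",
    re.IGNORECASE,
)


def summarize_faults(lines):
    """Group IO_PAGE_FAULT events by PCI address (domain:bus:device.function)."""
    summary = {}
    for line in lines:
        match = FAULT_RE.search(line)
        if match is None:
            continue  # not an AMD-Vi page fault line
        address = int(match["address"], 16)
        entry = summary.setdefault(
            match["bdf"], {"count": 0, "min": address, "max": address}
        )
        entry["count"] += 1
        entry["min"] = min(entry["min"], address)
        entry["max"] = max(entry["max"], address)
    return summary


# Two lines copied from the log above, used as sample input.
SAMPLE = [
    "Jun 25 16:37:41.347672 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged "
    "[IO_PAGE_FAULT domain=0x0015 address=0xffc10000 flags=0x0020]",
    "Jun 25 16:37:41.348181 beast kernel: nvme 0000:0c:00.0: AMD-Vi: Event logged "
    "[IO_PAGE_FAULT domain=0x0015 address=0xffc10e00 flags=0x0020]",
]

for bdf, info in summarize_faults(SAMPLE).items():
    print(f"{bdf}: {info['count']} faults, 0x{info['min']:x}..0x{info['max']:x}")
```

For a full boot, the same function could be fed the lines of a saved kernel log instead of SAMPLE.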

I figured out that setting the kernel parameter iommu=pt fixes the Lexar SSD, but then my Samsung SSD fails instead.
This time there is no warning in dmesg; instead a kernel trace is produced, and then my system crashes.

Jun 04 18:26:13.471177 beast kernel: CPU: 14 PID: 4873 Comm: worker Tainted: G      D W          6.3.5-arch1-1 #1 649d963afc0261175aabf0511660febbb7b06177
Jun 04 18:26:13.471186 beast kernel: Hardware name: ASUS System Product Name/PRIME X670-P, BIOS 1616 05/16/2023
Jun 04 18:26:13.471191 beast kernel: RIP: 0010:nvme_setup_cmd+0x1b6/0x4c0 [nvme_core]
Jun 04 18:26:13.471196 beast kernel: Code: 0f 84 c0 01 00 00 41 89 d4 41 89 d5 41 c1 ec 11 41 83 e4 01 41 c1 e4 0e 41 81 e5 00 01 08 00 0f 85 a2 02 00 00 ba 01 00 00 00 <66> 89 55 00 49 8b 40 48 8b 80 98 01 00 00 48 c7 45 08 00 00 00 00
Jun 04 18:26:13.471202 beast kernel: RSP: 0018:ffffafa9041cfaf8 EFLAGS: 00010246
Jun 04 18:26:13.471207 beast kernel: RAX: 0000000000000001 RBX: ffff8a9492000400 RCX: ffffffffc04c96c0
Jun 04 18:26:13.471214 beast kernel: RDX: 0000000000000001 RSI: ffff8a9492000400 RDI: 0000000000000000
Jun 04 18:26:13.471219 beast kernel: RBP: 0029003c40298028 R08: ffff8a9481035c00 R09: 0000000000000000
Jun 04 18:26:13.471224 beast kernel: R10: ffff8a948c895978 R11: ffff8a9a8739c000 R12: 0000000000000000
Jun 04 18:26:13.471228 beast kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff8a9492000400
Jun 04 18:26:13.471233 beast kernel: FS:  00007f46bd1e66c0(0000) GS:ffff8a9bb8980000(0000) knlGS:0000000000000000
Jun 04 18:26:13.471237 beast kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 04 18:26:13.471242 beast kernel: CR2: 00000191c0190000 CR3: 0000000208142000 CR4: 0000000000750ee0
Jun 04 18:26:13.471247 beast kernel: PKRU: 55555554
Jun 04 18:26:13.471253 beast kernel: Call Trace:
Jun 04 18:26:13.471257 beast kernel:  <TASK>
Jun 04 18:26:13.471262 beast kernel:  ? die+0x36/0x90
Jun 04 18:26:13.471266 beast kernel:  ? do_trap+0xda/0x100
Jun 04 18:26:13.471271 beast kernel:  ? do_error_trap+0x6a/0x90
Jun 04 18:26:13.471276 beast kernel:  ? exc_stack_segment+0x37/0x50
Jun 04 18:26:13.471281 beast kernel:  ? asm_exc_stack_segment+0x26/0x30
Jun 04 18:26:13.471286 beast kernel:  ? nvme_setup_cmd+0x1b6/0x4c0 [nvme_core 326020f8627a4fd401aa10b0ef07483848e1278e]
Jun 04 18:26:13.471291 beast kernel:  ? ktime_get+0x3c/0xa0
Jun 04 18:26:13.471296 beast kernel:  nvme_queue_rqs+0xa9/0x280 [nvme c888d86cb9fd4787541f3b2b7b1071619b219fa6]
Jun 04 18:26:13.471300 beast kernel:  blk_mq_flush_plug_list+0x2e6/0x310
Jun 04 18:26:13.471305 beast kernel:  __blk_flush_plug+0x102/0x160
Jun 04 18:26:13.471310 beast kernel:  blk_finish_plug+0x29/0x40
Jun 04 18:26:13.471316 beast kernel:  ext4_do_writepages+0x491/0xd10 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 18:26:13.471321 beast kernel:  ext4_writepages+0xaf/0x160 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 18:26:13.471326 beast kernel:  do_writepages+0xcf/0x1e0
Jun 04 18:26:13.471331 beast kernel:  filemap_fdatawrite_wbc+0x63/0x90
Jun 04 18:26:13.471335 beast kernel:  __filemap_fdatawrite_range+0x5c/0x80
Jun 04 18:26:13.471340 beast kernel:  file_write_and_wait_range+0x4a/0xb0
Jun 04 18:26:13.471344 beast kernel:  ext4_sync_file+0x101/0x3a0 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 18:26:13.471349 beast kernel:  __x64_sys_fdatasync+0x4c/0x90
Jun 04 18:26:13.471354 beast kernel:  do_syscall_64+0x5d/0x90
Jun 04 18:26:13.471358 beast kernel:  ? do_syscall_64+0x6c/0x90
Jun 04 18:26:13.471364 beast kernel:  ? do_syscall_64+0x6c/0x90
Jun 04 18:26:13.471369 beast kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Jun 04 18:26:13.471374 beast kernel: RIP: 0033:0x7f4b31b8d72a
Jun 04 18:26:13.471379 beast kernel: Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 b3 30 f8 ff 8b 7c 24 0c 89 c2 b8 4b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 13 31 f8 ff 8b 44 24
Jun 04 18:26:13.471384 beast kernel: RSP: 002b:00007f46bd1e56c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Jun 04 18:26:13.471389 beast kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4b31b8d72a
Jun 04 18:26:13.471394 beast kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000d
Jun 04 18:26:13.471399 beast kernel: RBP: 000055820af89140 R08: 0000000000000000 R09: 000055820ad3a6f4
Jun 04 18:26:13.471403 beast kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 000055820ad3a660
Jun 04 18:26:13.471410 beast kernel: R13: 0000558209990ed0 R14: 00007ffe4fbcaad0 R15: 00007f46bc9e6000
Jun 04 18:26:13.471417 beast kernel:  </TASK>
Jun 04 18:26:13.471423 beast kernel: Modules linked in: tun rfcomm snd_seq_dummy snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bridge stp llc uinput cmac algif_hash algif_skcipher af_alg bnep hid_logitech_hidpp mousedev joydev xpad hid_logitech_dj ff_memless vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm snd_hda_intel uvcvideo snd_intel_dspcfg btusb videobuf2_vmalloc snd_usb_audio btrtl crct10dif_pclmul snd_intel_sdw_acpi crc32_pclmul uvc btbcm snd_hda_codec videobuf2_memops btintel snd_usbmidi_lib polyval_clmulni videobuf2_v4l2 polyval_generic btmtk snd_hda_core gf128mul r8169 snd_rawmidi ghash_clmulni_intel snd_hwdep eeepc_wmi videodev snd_seq_device sha512_ssse3 asus_wmi bluetooth aesni_intel snd_pcm realtek sp5100_tco ledtrig_audio snd_timer
Jun 04 18:26:13.471461 beast kernel:  sparse_keymap crypto_simd platform_profile videobuf2_common ecdh_generic mdio_devres snd cryptd libphy rapl rfkill wmi_bmof pcspkr i2c_piix4 ccp mc soundcore gpio_amdpt gpio_generic acpi_cpufreq mac_hid i2c_dev dm_multipath crypto_user fuse loop bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uas usb_storage usbhid amdgpu i2c_algo_bit drm_ttm_helper dm_mod ttm drm_buddy nvme gpu_sched crc32c_intel sr_mod drm_display_helper nvme_core xhci_pci cdrom video nvme_common xhci_pci_renesas cec wmi vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd
Jun 04 18:26:13.471475 beast kernel: ---[ end trace 0000000000000000 ]---
Jun 04 18:26:13.471480 beast kernel: RIP: 0010:nvme_setup_cmd+0x1b6/0x4c0 [nvme_core]
Jun 04 18:26:13.471485 beast kernel: Code: 0f 84 c0 01 00 00 41 89 d4 41 89 d5 41 c1 ec 11 41 83 e4 01 41 c1 e4 0e 41 81 e5 00 01 08 00 0f 85 a2 02 00 00 ba 01 00 00 00 <66> 89 55 00 49 8b 40 48 8b 80 98 01 00 00 48 c7 45 08 00 00 00 00
Jun 04 18:26:13.471490 beast kernel: RSP: 0018:ffffafa90073fb60 EFLAGS: 00010246
Jun 04 18:26:13.471494 beast kernel: RAX: 0000000000000001 RBX: ffff8a9492200200 RCX: 0000000000000000
Jun 04 18:26:13.471499 beast kernel: RDX: 0000000000000001 RSI: ffff8a9492200200 RDI: 0000000000000000
Jun 04 18:26:13.471503 beast kernel: RBP: 0029003c40298028 R08: ffff8a9480e57000 R09: 0000000000000000
Jun 04 18:26:13.471508 beast kernel: R10: 0000000000000000 R11: 0000000000000199 R12: 0000000000000000
Jun 04 18:26:13.471513 beast kernel: R13: 0000000000000000 R14: ffff8a948bd15000 R15: 0000000000000000
Jun 04 18:26:13.471517 beast kernel: FS:  00007f46bd1e66c0(0000) GS:ffff8a9bb8980000(0000) knlGS:0000000000000000
Jun 04 18:26:13.471522 beast kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 04 18:26:13.471526 beast kernel: CR2: 00000191c0190000 CR3: 0000000208142000 CR4: 0000000000750ee0
Jun 04 18:26:13.471531 beast kernel: PKRU: 55555554
Jun 04 18:26:13.471535 beast kernel: ------------[ cut here ]------------
Jun 04 18:26:13.471541 beast kernel: WARNING: CPU: 14 PID: 4873 at kernel/exit.c:814 do_exit+0x8a9/0xaf0
Jun 04 18:26:13.471545 beast kernel: Modules linked in: tun rfcomm snd_seq_dummy snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bridge stp llc uinput cmac algif_hash algif_skcipher af_alg bnep hid_logitech_hidpp mousedev joydev xpad hid_logitech_dj ff_memless vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm snd_hda_intel uvcvideo snd_intel_dspcfg btusb videobuf2_vmalloc snd_usb_audio btrtl crct10dif_pclmul snd_intel_sdw_acpi crc32_pclmul uvc btbcm snd_hda_codec videobuf2_memops btintel snd_usbmidi_lib polyval_clmulni videobuf2_v4l2 polyval_generic btmtk snd_hda_core gf128mul r8169 snd_rawmidi ghash_clmulni_intel snd_hwdep eeepc_wmi videodev snd_seq_device sha512_ssse3 asus_wmi bluetooth aesni_intel snd_pcm realtek sp5100_tco ledtrig_audio snd_timer
Jun 04 18:26:13.471576 beast kernel:  sparse_keymap crypto_simd platform_profile videobuf2_common ecdh_generic mdio_devres snd cryptd libphy rapl rfkill wmi_bmof pcspkr i2c_piix4 ccp mc soundcore gpio_amdpt gpio_generic acpi_cpufreq mac_hid i2c_dev dm_multipath crypto_user fuse loop bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uas usb_storage usbhid amdgpu i2c_algo_bit drm_ttm_helper dm_mod ttm drm_buddy nvme gpu_sched crc32c_intel sr_mod drm_display_helper nvme_core xhci_pci cdrom video nvme_common xhci_pci_renesas cec wmi vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd
Jun 04 18:26:13.471584 beast kernel: CPU: 14 PID: 4873 Comm: worker Tainted: G      D W          6.3.5-arch1-1 #1 649d963afc0261175aabf0511660febbb7b06177
Jun 04 18:26:13.471589 beast kernel: Hardware name: ASUS System Product Name/PRIME X670-P, BIOS 1616 05/16/2023
Jun 04 18:26:13.471594 beast kernel: RIP: 0010:do_exit+0x8a9/0xaf0
Jun 04 18:26:13.471598 beast kernel: Code: 89 ab 18 06 00 00 4c 89 a3 20 06 00 00 48 89 6c 24 10 e9 15 fe ff ff 48 8b bb 00 06 00 00 31 f6 e8 2c d9 ff ff e9 c9 fd ff ff <0f> 0b e9 ce f7 ff ff 4c 89 e6 bf 05 06 00 00 e8 83 16 01 00 e9 6b
Jun 04 18:26:13.471603 beast kernel: RSP: 0018:ffffafa9041cfed8 EFLAGS: 00010282
Jun 04 18:26:13.471608 beast kernel: RAX: 0000000000000000 RBX: ffff8a99deba0000 RCX: 0000000000000000
Jun 04 18:26:13.471613 beast kernel: RDX: 0000000000000001 RSI: 0000000000002710 RDI: ffff8a94fcf02100
Jun 04 18:26:13.471617 beast kernel: RBP: ffff8a954bedd580 R08: 0000000000000000 R09: ffffafa9041cf820
Jun 04 18:26:13.471622 beast kernel: R10: 0000000000000003 R11: ffffffff9d4ca1e8 R12: 000000000000000b
Jun 04 18:26:13.471628 beast kernel: R13: ffff8a94fcf02100 R14: ffffafa9041cfa48 R15: ffff8a99deba0000
Jun 04 18:26:13.471633 beast kernel: FS:  00007f46bd1e66c0(0000) GS:ffff8a9bb8980000(0000) knlGS:0000000000000000
Jun 04 18:26:13.471638 beast kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 04 18:26:13.471643 beast kernel: CR2: 00000191c0190000 CR3: 0000000208142000 CR4: 0000000000750ee0
Jun 04 18:26:13.471647 beast kernel: PKRU: 55555554
Jun 04 18:26:13.471652 beast kernel: Call Trace:
Jun 04 18:26:13.471656 beast kernel:  <TASK>
Jun 04 18:26:13.471661 beast kernel:  ? do_exit+0x8a9/0xaf0
Jun 04 18:26:13.471666 beast kernel:  ? __warn+0x81/0x130
Jun 04 18:26:13.471670 beast kernel:  ? do_exit+0x8a9/0xaf0
Jun 04 18:26:13.471674 beast kernel:  ? report_bug+0x171/0x1a0
Jun 04 18:26:13.471679 beast kernel:  ? handle_bug+0x3c/0x80
Jun 04 18:26:13.471684 beast kernel:  ? exc_invalid_op+0x17/0x70
Jun 04 18:26:13.471688 beast kernel:  ? asm_exc_invalid_op+0x1a/0x20
Jun 04 18:26:13.471693 beast kernel:  ? do_exit+0x8a9/0xaf0
Jun 04 18:26:13.471698 beast kernel:  ? do_exit+0x70/0xaf0
Jun 04 18:26:13.471704 beast kernel:  make_task_dead+0x81/0x170
Jun 04 18:26:13.471709 beast kernel:  rewind_stack_and_make_dead+0x17/0x20
Jun 04 18:26:13.471713 beast kernel: RIP: 0033:0x7f4b31b8d72a
Jun 04 18:26:13.471718 beast kernel: Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 b3 30 f8 ff 8b 7c 24 0c 89 c2 b8 4b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 13 31 f8 ff 8b 44 24
Jun 04 18:26:13.471723 beast kernel: RSP: 002b:00007f46bd1e56c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Jun 04 18:26:13.471727 beast kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4b31b8d72a
Jun 04 18:26:13.471732 beast kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000d
Jun 04 18:26:13.471738 beast kernel: RBP: 000055820af89140 R08: 0000000000000000 R09: 000055820ad3a6f4
Jun 04 18:26:13.471743 beast kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 000055820ad3a660
Jun 04 18:26:13.471747 beast kernel: R13: 0000558209990ed0 R14: 00007ffe4fbcaad0 R15: 00007f46bc9e6000
Jun 04 18:26:13.471752 beast kernel:  </TASK>
Jun 04 18:26:13.471756 beast kernel: ---[ end trace 0000000000000000 ]---
Jun 04 18:26:13.471761 beast kernel: stack segment: 0000 [#3] PREEMPT SMP NOPTI
Jun 04 18:26:13.471765 beast kernel: CPU: 14 PID: 396 Comm: systemd-journal Tainted: G      D W          6.3.5-arch1-1 #1 649d963afc0261175aabf0511660febbb7b06177
Jun 04 18:26:13.471771 beast kernel: Hardware name: ASUS System Product Name/PRIME X670-P, BIOS 1616 05/16/2023
Jun 04 18:26:13.471775 beast kernel: RIP: 0010:nvme_setup_cmd+0x1b6/0x4c0 [nvme_core]
Jun 04 18:26:13.471780 beast kernel: Code: 0f 84 c0 01 00 00 41 89 d4 41 89 d5 41 c1 ec 11 41 83 e4 01 41 c1 e4 0e 41 81 e5 00 01 08 00 0f 85 a2 02 00 00 ba 01 00 00 00 <66> 89 55 00 49 8b 40 48 8b 80 98 01 00 00 48 c7 45 08 00 00 00 00
Jun 04 18:26:13.471785 beast kernel: RSP: 0018:ffffafa90276fad0 EFLAGS: 00010246
Jun 04 18:26:13.471789 beast kernel: RAX: 0000000000000001 RBX: ffff8a9492000600 RCX: ffffffffc04c96c0
Jun 04 18:26:13.471794 beast kernel: RDX: 0000000000000001 RSI: ffff8a9492000600 RDI: 0000000000000000
Jun 04 18:26:13.471798 beast kernel: RBP: 0029003c40298028 R08: ffff8a9481035c00 R09: 0000000000000000
Jun 04 18:26:13.471803 beast kernel: R10: ffff8a948c894078 R11: ffff8a9a8739f000 R12: 0000000000000000
Jun 04 18:26:13.471807 beast kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff8a9492000600
Jun 04 18:26:13.471812 beast kernel: FS:  00007f39e8781200(0000) GS:ffff8a9bb8980000(0000) knlGS:0000000000000000
Jun 04 18:26:13.471818 beast kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 04 18:26:13.471824 beast kernel: CR2: 00007f39e8924010 CR3: 0000000108642000 CR4: 0000000000750ee0
Jun 04 18:26:13.471829 beast kernel: PKRU: 55555554
Jun 04 18:26:13.471835 beast kernel: Call Trace:
Jun 04 18:26:13.471839 beast kernel:  <TASK>
Jun 04 18:26:13.471843 beast kernel:  ? die+0x36/0x90
Jun 04 18:26:13.471848 beast kernel:  ? do_trap+0xda/0x100
Jun 04 18:26:13.471852 beast kernel:  ? do_error_trap+0x6a/0x90
Jun 04 18:26:13.471857 beast kernel:  ? exc_stack_segment+0x37/0x50
Jun 04 18:26:13.471861 beast kernel:  ? asm_exc_stack_segment+0x26/0x30
Jun 04 18:26:13.471866 beast kernel:  ? nvme_setup_cmd+0x1b6/0x4c0 [nvme_core 326020f8627a4fd401aa10b0ef07483848e1278e]
Jun 04 18:26:13.471871 beast kernel:  nvme_queue_rqs+0xa9/0x280 [nvme c888d86cb9fd4787541f3b2b7b1071619b219fa6]
Jun 04 18:26:13.471875 beast kernel:  blk_mq_flush_plug_list+0x2e6/0x310
Jun 04 18:26:13.471880 beast kernel:  __blk_flush_plug+0x102/0x160
Jun 04 18:26:13.471884 beast kernel:  blk_finish_plug+0x29/0x40
Jun 04 18:26:13.471889 beast kernel:  ext4_do_writepages+0x491/0xd10 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 18:26:13.471894 beast kernel:  ext4_writepages+0xaf/0x160 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 18:26:13.471901 beast kernel:  do_writepages+0xcf/0x1e0
Jun 04 18:26:13.471909 beast kernel:  ? __rseq_handle_notify_resume+0xa5/0x4e0
Jun 04 18:26:13.471914 beast kernel:  filemap_fdatawrite_wbc+0x63/0x90
Jun 04 18:26:13.471919 beast kernel:  __filemap_fdatawrite_range+0x5c/0x80
Jun 04 18:26:13.471923 beast kernel:  file_write_and_wait_range+0x4a/0xb0
Jun 04 18:26:13.471928 beast kernel:  ext4_sync_file+0x101/0x3a0 [ext4 a02705cb9706762da0ddc54e14d9aa45178f7020]
Jun 04 18:26:13.471933 beast kernel:  __x64_sys_fsync+0x3b/0x70
Jun 04 18:26:13.471937 beast kernel:  do_syscall_64+0x5d/0x90
Jun 04 18:26:13.471942 beast kernel:  ? exc_page_fault+0x7c/0x180
Jun 04 18:26:13.471947 beast kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Jun 04 18:26:13.471952 beast kernel: RIP: 0033:0x7f39e831666a
Jun 04 18:26:13.471956 beast kernel: Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 73 31 f8 ff 8b 7c 24 0c 89 c2 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 d3 31 f8 ff 8b 44 24
Jun 04 18:26:13.471962 beast kernel: RSP: 002b:00007ffcecc6b120 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jun 04 18:26:13.471967 beast kernel: RAX: ffffffffffffffda RBX: 0000559016dac190 RCX: 00007f39e831666a
Jun 04 18:26:13.471987 beast kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001b
Jun 04 18:26:13.471992 beast kernel: RBP: 000000000000006d R08: 0000000000000001 R09: 00007ffcecc6b3f8
Jun 04 18:26:13.471997 beast kernel: R10: 7e32b2fe6d102153 R11: 0000000000000293 R12: 0000000000000001
Jun 04 18:26:13.472001 beast kernel: R13: 00007ffcecc6b268 R14: 00007ffcecc6b260 R15: 0000559016dac190
Jun 04 18:26:13.472006 beast kernel:  </TASK>
Jun 04 18:26:13.472011 beast kernel: Modules linked in: tun rfcomm snd_seq_dummy snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bridge stp llc uinput cmac algif_hash algif_skcipher af_alg bnep hid_logitech_hidpp mousedev joydev xpad hid_logitech_dj ff_memless vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm snd_hda_intel uvcvideo snd_intel_dspcfg btusb videobuf2_vmalloc snd_usb_audio btrtl crct10dif_pclmul snd_intel_sdw_acpi crc32_pclmul uvc btbcm snd_hda_codec videobuf2_memops btintel snd_usbmidi_lib polyval_clmulni videobuf2_v4l2 polyval_generic btmtk snd_hda_core gf128mul r8169 snd_rawmidi ghash_clmulni_intel snd_hwdep eeepc_wmi videodev snd_seq_device sha512_ssse3 asus_wmi bluetooth aesni_intel snd_pcm realtek sp5100_tco ledtrig_audio snd_timer
Jun 04 18:26:13.472019 beast kernel:  sparse_keymap crypto_simd platform_profile videobuf2_common ecdh_generic mdio_devres snd cryptd libphy rapl rfkill wmi_bmof pcspkr i2c_piix4 ccp mc soundcore gpio_amdpt gpio_generic acpi_cpufreq mac_hid i2c_dev dm_multipath crypto_user fuse loop bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uas usb_storage usbhid amdgpu i2c_algo_bit drm_ttm_helper dm_mod ttm drm_buddy nvme gpu_sched crc32c_intel sr_mod drm_display_helper nvme_core xhci_pci cdrom video nvme_common xhci_pci_renesas cec wmi vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd
Jun 04 18:26:13.472027 beast kernel: ---[ end trace 0000000000000000 ]---
Jun 04 18:26:13.472032 beast kernel: RIP: 0010:nvme_setup_cmd+0x1b6/0x4c0 [nvme_core]
Jun 04 18:26:13.472036 beast kernel: Code: 0f 84 c0 01 00 00 41 89 d4 41 89 d5 41 c1 ec 11 41 83 e4 01 41 c1 e4 0e 41 81 e5 00 01 08 00 0f 85 a2 02 00 00 ba 01 00 00 00 <66> 89 55 00 49 8b 40 48 8b 80 98 01 00 00 48 c7 45 08 00 00 00 00
Jun 04 18:26:13.472042 beast kernel: RSP: 0018:ffffafa90073fb60 EFLAGS: 00010246
Jun 04 18:26:13.472046 beast kernel: RAX: 0000000000000001 RBX: ffff8a9492200200 RCX: 0000000000000000
Jun 04 18:26:13.472051 beast kernel: RDX: 0000000000000001 RSI: ffff8a9492200200 RDI: 0000000000000000
Jun 04 18:26:13.472055 beast kernel: RBP: 0029003c40298028 R08: ffff8a9480e57000 R09: 0000000000000000
Jun 04 18:26:13.472060 beast kernel: R10: 0000000000000000 R11: 0000000000000199 R12: 0000000000000000
Jun 04 18:26:13.472064 beast kernel: R13: 0000000000000000 R14: ffff8a948bd15000 R15: 0000000000000000
Jun 04 18:26:13.472070 beast kernel: FS:  00007f39e8781200(0000) GS:ffff8a9bb8980000(0000) knlGS:0000000000000000
Jun 04 18:26:13.472075 beast kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 04 18:26:13.472080 beast kernel: CR2: 00007f39e8924010 CR3: 0000000108642000 CR4: 0000000000750ee0
Jun 04 18:26:13.472084 beast kernel: PKRU: 55555554
Jun 04 18:26:13.472089 beast kernel: ------------[ cut here ]------------
Jun 04 18:26:13.472093 beast kernel: WARNING: CPU: 14 PID: 396 at kernel/exit.c:814 do_exit+0x8a9/0xaf0
Jun 04 18:26:13.472098 beast kernel: Modules linked in: tun rfcomm snd_seq_dummy snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bridge stp llc uinput cmac algif_hash algif_skcipher af_alg bnep hid_logitech_hidpp mousedev joydev xpad hid_logitech_dj ff_memless vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm snd_hda_intel uvcvideo snd_intel_dspcfg btusb videobuf2_vmalloc snd_usb_audio btrtl crct10dif_pclmul snd_intel_sdw_acpi crc32_pclmul uvc btbcm snd_hda_codec videobuf2_memops btintel snd_usbmidi_lib polyval_clmulni videobuf2_v4l2 polyval_generic btmtk snd_hda_core gf128mul r8169 snd_rawmidi ghash_clmulni_intel snd_hwdep eeepc_wmi videodev snd_seq_device sha512_ssse3 asus_wmi bluetooth aesni_intel snd_pcm realtek sp5100_tco ledtrig_audio snd_timer
Jun 04 18:26:13.472106 beast kernel:  sparse_keymap crypto_simd platform_profile videobuf2_common ecdh_generic mdio_devres snd cryptd libphy rapl rfkill wmi_bmof pcspkr i2c_piix4 ccp mc soundcore gpio_amdpt gpio_generic acpi_cpufreq mac_hid i2c_dev dm_multipath crypto_user fuse loop bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uas usb_storage usbhid amdgpu i2c_algo_bit drm_ttm_helper dm_mod ttm drm_buddy nvme gpu_sched crc32c_intel sr_mod drm_display_helper nvme_core xhci_pci cdrom video nvme_common xhci_pci_renesas cec wmi vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd
Jun 04 18:26:13.472113 beast kernel: CPU: 14 PID: 396 Comm: systemd-journal Tainted: G      D W          6.3.5-arch1-1 #1 649d963afc0261175aabf0511660febbb7b06177
Jun 04 18:26:13.472117 beast kernel: Hardware name: ASUS System Product Name/PRIME X670-P, BIOS 1616 05/16/2023
Jun 04 18:26:13.472122 beast kernel: RIP: 0010:do_exit+0x8a9/0xaf0
Jun 04 18:26:13.472127 beast kernel: Code: 89 ab 18 06 00 00 4c 89 a3 20 06 00 00 48 89 6c 24 10 e9 15 fe ff ff 48 8b bb 00 06 00 00 31 f6 e8 2c d9 ff ff e9 c9 fd ff ff <0f> 0b e9 ce f7 ff ff 4c 89 e6 bf 05 06 00 00 e8 83 16 01 00 e9 6b
Jun 04 18:26:13.472132 beast kernel: RSP: 0018:ffffafa90276fed8 EFLAGS: 00010282
Jun 04 18:26:13.472136 beast kernel: RAX: 0000000400000000 RBX: ffff8a9485e2a700 RCX: 0000000000000000
Jun 04 18:26:13.472141 beast kernel: RDX: 0000000000000001 RSI: 0000000000002710 RDI: ffff8a948b639080
Jun 04 18:26:13.472146 beast kernel: RBP: ffff8a948bc1e780 R08: 0000000000000000 R09: ffffafa90276f800
Jun 04 18:26:13.472151 beast kernel: R10: 0000000000000003 R11: ffffffff9d4ca1e8 R12: 000000000000000b
Jun 04 18:26:13.472155 beast kernel: R13: ffff8a948b639080 R14: ffffafa90276fa28 R15: ffff8a9485e2a700
Jun 04 18:26:13.472160 beast kernel: FS:  00007f39e8781200(0000) GS:ffff8a9bb8980000(0000) knlGS:0000000000000000
Jun 04 18:26:13.472165 beast kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 04 18:26:13.472169 beast kernel: CR2: 00007f39e8924010 CR3: 0000000108642000 CR4: 0000000000750ee0
Jun 04 18:26:13.472173 beast kernel: PKRU: 55555554
Jun 04 18:26:13.472178 beast kernel: Call Trace:
Jun 04 18:26:13.472182 beast kernel:  <TASK>
Jun 04 18:26:13.472187 beast kernel:  ? do_exit+0x8a9/0xaf0
Jun 04 18:26:13.472191 beast kernel:  ? __warn+0x81/0x130
Jun 04 18:26:13.472195 beast kernel:  ? do_exit+0x8a9/0xaf0
Jun 04 18:26:13.472200 beast kernel:  ? report_bug+0x171/0x1a0
Jun 04 18:26:13.472204 beast kernel:  ? handle_bug+0x3c/0x80
Jun 04 18:26:13.472209 beast kernel:  ? exc_invalid_op+0x17/0x70
Jun 04 18:26:13.472214 beast kernel:  ? asm_exc_invalid_op+0x1a/0x20
Jun 04 18:26:13.472218 beast kernel:  ? do_exit+0x8a9/0xaf0
Jun 04 18:26:13.472224 beast kernel:  ? do_exit+0x70/0xaf0
Jun 04 18:26:13.472229 beast kernel:  ? do_syscall_64+0x5d/0x90
Jun 04 18:26:13.472234 beast kernel:  make_task_dead+0x81/0x170
Jun 04 18:26:13.472238 beast kernel:  rewind_stack_and_make_dead+0x17/0x20
Jun 04 18:26:13.472243 beast kernel: RIP: 0033:0x7f39e831666a
Jun 04 18:26:13.472247 beast kernel: Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 73 31 f8 ff 8b 7c 24 0c 89 c2 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 d3 31 f8 ff 8b 44 24
Jun 04 18:26:13.472252 beast kernel: RSP: 002b:00007ffcecc6b120 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jun 04 18:26:13.472257 beast kernel: RAX: ffffffffffffffda RBX: 0000559016dac190 RCX: 00007f39e831666a
Jun 04 18:26:13.472261 beast kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001b
Jun 04 18:26:13.472266 beast kernel: RBP: 000000000000006d R08: 0000000000000001 R09: 00007ffcecc6b3f8
Jun 04 18:26:13.472270 beast kernel: R10: 7e32b2fe6d102153 R11: 0000000000000293 R12: 0000000000000001
Jun 04 18:26:13.472275 beast kernel: R13: 00007ffcecc6b268 R14: 00007ffcecc6b260 R15: 0000559016dac190
Jun 04 18:26:13.472280 beast kernel:  </TASK>
Jun 04 18:26:13.472284 beast kernel: ---[ end trace 0000000000000000 ]---
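
One detail in the trace above may be worth checking. The faulting bytes `<66> 89 55 00` decode to a 16-bit store through RBP (mov %dx,0x0(%rbp)), and RBP holds 0029003c40298028 in each nvme_setup_cmd oops. CR4 in the trace has bit 12 (LA57) clear, so 48-bit virtual addresses are assumed here. Under that assumption the RBP value is not a canonical address, which would be consistent with the "stack segment" exception being reported instead of an ordinary page fault. A minimal sketch of the check follows; is_canonical is a hypothetical helper name, not a kernel function.

```python
def is_canonical(addr, va_bits=48):
    """Return True if bits 63..(va_bits - 1) of a 64-bit address are all equal.

    va_bits=48 assumes 4-level paging (an assumption, see the lead-in).
    """
    top = addr >> (va_bits - 1)          # the bits that must be all 0 or all 1
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


rbp = 0x0029003C40298028  # RBP from the nvme_setup_cmd oops
rsp = 0xFFFFAFA9041CFAF8  # RSP from the same oops, a normal kernel stack address

print("RBP canonical:", is_canonical(rbp))
print("RSP canonical:", is_canonical(rsp))
```

This only shows that the pointer the driver wrote through was already garbage when the store happened; it does not say what corrupted it.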

Disabling the IOMMU by setting amd_iommu=off prevents the NVMe issues, but of course also prevents GPU passthrough. (See below)
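
Since three boot configurations are being compared (default, iommu=pt, amd_iommu=off), it can help to confirm which IOMMU parameters the running kernel actually booted with. This is a minimal sketch that reads /proc/cmdline; iommu_params is a hypothetical helper name, and the example command line in the comment is made up for illustration.

```python
from pathlib import Path

# Kernel parameters mentioned in this post.
IOMMU_KEYS = ("iommu", "amd_iommu")


def iommu_params(cmdline):
    """Collect IOMMU-related key=value parameters from a kernel command line.

    A key may appear more than once, so values are kept in a list per key.
    """
    found = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in IOMMU_KEYS:
            found.setdefault(key, []).append(value)
    return found


# Hypothetical command line, for illustration only:
#   "root=UUID=abc rw iommu=pt"  ->  {"iommu": ["pt"]}
proc_cmdline = Path("/proc/cmdline")
if proc_cmdline.exists():
    print(iommu_params(proc_cmdline.read_text()))
```

An empty result means neither parameter was passed, so the kernel is using its default IOMMU settings.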

The time my Samsung SSD lasts before failing is seemingly random. I once managed to start my VM with GPU passthrough and even play a game for a bit before it failed. Another time I couldn't even log in before crashing.

I'm running the latest updates, both for Arch and the motherboard.

Output of uname -a:

Linux beast 6.3.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 21 Jun 2023 20:46:20 +0000 x86_64 GNU/Linux

Output of lspci -knn:

00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14d8]
	Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
	Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
	Kernel driver in use: pcieport
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
	Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
	Kernel driver in use: pcieport
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
	Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
	Kernel driver in use: pcieport
00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
	Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
	Kernel driver in use: pcieport
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14dd]
	Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
	Kernel driver in use: pcieport
00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14dd]
	Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
	Kernel driver in use: pcieport
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71)
	Subsystem: ASUSTeK Computer Inc. FCH SMBus Controller [1043:8877]
	Kernel driver in use: piix4_smbus
	Kernel modules: i2c_piix4, sp5100_tco
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
	Subsystem: ASUSTeK Computer Inc. FCH LPC Bridge [1043:8877]
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e0]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e1]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e2]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e3]
	Kernel modules: k10temp
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e4]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e5]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e6]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e7]
01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 10)
	Kernel driver in use: pcieport
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 10)
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
	Kernel driver in use: pcieport
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] [1002:744c] (rev c8)
	Subsystem: Sapphire Technology Limited Navi 31 [Radeon RX 7900 XT/7900 XTX] [1da2:471e]
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:ab30]
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:ab30]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd Device [144d:a80c]
	Subsystem: Samsung Electronics Co Ltd Device [144d:a801]
	Kernel driver in use: nvme
	Kernel modules: nvme
05:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f4] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:3328]
	Kernel driver in use: pcieport
06:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:3328]
	Kernel driver in use: pcieport
06:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:3328]
	Kernel driver in use: pcieport
06:0c.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:3328]
	Kernel driver in use: pcieport
06:0d.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:3328]
	Kernel driver in use: pcieport
08:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f4] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:3328]
	Kernel driver in use: pcieport
09:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:3328]
	Kernel driver in use: pcieport
09:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:3328]
	Kernel driver in use: pcieport
09:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:3328]
	Kernel driver in use: pcieport
09:0c.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:3328]
	Kernel driver in use: pcieport
09:0d.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:3328]
	Kernel driver in use: pcieport
0b:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
	DeviceName: Realtek RTL8125BG LAN
	Subsystem: ASUSTeK Computer Inc. RTL8125 2.5GbE Controller [1043:87d7]
	Kernel driver in use: r8169
	Kernel modules: r8169
0c:00.0 Non-Volatile memory controller [0108]: Shenzhen Longsys Electronics Co., Ltd. Device [1d97:5216] (rev 01)
	Subsystem: INNOGRIT Corporation Device [1dbe:5216]
	Kernel driver in use: nvme
	Kernel modules: nvme
0d:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f7] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:1142]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
0e:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f6] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:1062]
	Kernel driver in use: ahci
0f:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f7] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:1142]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
10:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f6] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:1062]
	Kernel driver in use: ahci
11:00.0 Non-Volatile memory controller [0108]: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 [1e4b:1202] (rev 01)
	Subsystem: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 [1e4b:1202]
	Kernel driver in use: nvme
	Kernel modules: nvme
12:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raphael [1002:164e] (rev cb)
	Subsystem: ASUSTeK Computer Inc. Raphael [1043:8877]
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
12:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640]
	Subsystem: ASUSTeK Computer Inc. Rembrandt Radeon High Definition Audio Controller [1043:8877]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
12:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] VanGogh PSP/CCP [1022:1649]
	Subsystem: ASUSTeK Computer Inc. VanGogh PSP/CCP [1043:8877]
	Kernel driver in use: ccp
	Kernel modules: ccp
12:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b6]
	Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
12:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b7]
	Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
12:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller [1022:15e3]
	DeviceName: Realtek ALC897 Audio
	Subsystem: ASUSTeK Computer Inc. Family 17h/19h HD Audio Controller [1043:87fb]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
13:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b8]
	Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci

Any help is greatly appreciated and thanks in advance!

Edit:
It seems the Samsung SSD needs IOMMU to be enabled. It took a few hours, but it failed again. At least I believe it is the Samsung SSD; it is hard to tell from the call trace...

Jun 25 20:44:13.511044 beast kernel: stack segment: 0000 [#1] PREEMPT SMP NOPTI
Jun 25 20:44:13.511141 beast kernel: CPU: 13 PID: 12916 Comm: FS Not tainted 6.3.9-arch1-1 #1 124dc55df4f5272ccb409f39ef4872fc2b3376a2
Jun 25 20:44:13.511152 beast kernel: Hardware name: ASUS System Product Name/PRIME X670-P, BIOS 1616 05/16/2023
Jun 25 20:44:13.511160 beast kernel: RIP: 0010:nvme_setup_cmd+0x38b/0x4c0 [nvme_core]
Jun 25 20:44:13.511167 beast kernel: Code: c1 e4 0e 41 81 e5 00 01 08 00 74 19 81 e2 00 00 08 00 66 41 81 cc 00 80 41 89 d5 41 f7 dd 45 19 ed 41 83 e5 07 b9 02 00 00 00 <66> 89 4d 00 e9 26 fe ff ff 41 89 d4 41 89 d5 41 c1 ec 11 41 83 e4
Jun 25 20:44:13.511175 beast kernel: RSP: 0018:ffffaba7975a79b0 EFLAGS: 00010202
Jun 25 20:44:13.511182 beast kernel: RAX: 0000000000000000 RBX: ffff9ac011df0000 RCX: 0000000000000002
Jun 25 20:44:13.511189 beast kernel: RDX: 0000000000080000 RSI: ffff9ac011df0000 RDI: 0000000000000000
Jun 25 20:44:13.511197 beast kernel: RBP: cac2e2e08ac262e6 R08: ffff9ac00b924400 R09: 0000000000000000
Jun 25 20:44:13.511204 beast kernel: R10: ffff9ac0034e5bf8 R11: ffff9ac006549800 R12: 0000000000008000
Jun 25 20:44:13.511210 beast kernel: R13: 0000000000000007 R14: 0000000000000000 R15: ffff9ac011df0000
Jun 25 20:44:13.511217 beast kernel: FS:  00007f1b40ffb6c0(0000) GS:ffff9ac738940000(0000) knlGS:0000000000000000
Jun 25 20:44:13.511224 beast kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 25 20:44:13.511231 beast kernel: CR2: 00007f2ff1820010 CR3: 0000000107884000 CR4: 0000000000750ee0
Jun 25 20:44:13.511238 beast kernel: PKRU: 55555554
Jun 25 20:44:13.511245 beast kernel: Call Trace:
Jun 25 20:44:13.511251 beast kernel:  <TASK>
Jun 25 20:44:13.511259 beast kernel:  ? die+0x36/0x90
Jun 25 20:44:13.511267 beast kernel:  ? do_trap+0xda/0x100
Jun 25 20:44:13.511273 beast kernel:  ? do_error_trap+0x6a/0x90
Jun 25 20:44:13.511280 beast kernel:  ? exc_stack_segment+0x37/0x50
Jun 25 20:44:13.511288 beast kernel:  ? asm_exc_stack_segment+0x26/0x30
Jun 25 20:44:13.511293 beast kernel:  ? nvme_setup_cmd+0x38b/0x4c0 [nvme_core e7e0b9e519b86368398f73969d289c73c66364e4]
Jun 25 20:44:13.511300 beast kernel:  nvme_queue_rqs+0xa9/0x280 [nvme c06f049177296e604f2c63ca2639a91fa86519df]
Jun 25 20:44:13.511306 beast kernel:  blk_mq_flush_plug_list+0x2e6/0x310
Jun 25 20:44:13.511313 beast kernel:  __blk_flush_plug+0x102/0x160
Jun 25 20:44:13.511320 beast kernel:  blk_finish_plug+0x29/0x40
Jun 25 20:44:13.511326 beast kernel:  read_pages+0x1b4/0x260
Jun 25 20:44:13.511331 beast kernel:  page_cache_ra_unbounded+0x12e/0x180
Jun 25 20:44:13.511338 beast kernel:  filemap_get_pages+0x4da/0x630
Jun 25 20:44:13.511344 beast kernel:  ? atime_needs_update+0xa0/0x120
Jun 25 20:44:13.511351 beast kernel:  filemap_read+0xdf/0x350
Jun 25 20:44:13.511357 beast kernel:  vfs_read+0x23d/0x310
Jun 25 20:44:13.511364 beast kernel:  ksys_read+0x6f/0xf0
Jun 25 20:44:13.511375 beast kernel:  do_syscall_64+0x5d/0x90
Jun 25 20:44:13.511384 beast kernel:  ? ksys_lseek+0x86/0xb0
Jun 25 20:44:13.511394 beast kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Jun 25 20:44:13.511403 beast kernel:  ? do_syscall_64+0x6c/0x90
Jun 25 20:44:13.511412 beast kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Jun 25 20:44:13.511422 beast kernel:  ? do_syscall_64+0x6c/0x90
Jun 25 20:44:13.511430 beast kernel:  ? irqtime_account_irq+0x40/0xc0
Jun 25 20:44:13.511441 beast kernel:  ? __irq_exit_rcu+0x4b/0xf0
Jun 25 20:44:13.511452 beast kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Jun 25 20:44:13.511460 beast kernel: RIP: 0033:0x7f9d2610fb5c
Jun 25 20:44:13.511470 beast kernel: Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 89 9c f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 df 9c f8 ff 48
Jun 25 20:44:13.511484 beast kernel: RSP: 002b:00007f1b40ff94b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Jun 25 20:44:13.511499 beast kernel: RAX: ffffffffffffffda RBX: 00007f1bc4022cc0 RCX: 00007f9d2610fb5c
Jun 25 20:44:13.511513 beast kernel: RDX: 00000000000cf000 RSI: 00007f16d61f09e0 RDI: 0000000000000033
Jun 25 20:44:13.511524 beast kernel: RBP: 00007f9d261ee5a0 R08: 0000000000000000 R09: 00007f16d631deb0
Jun 25 20:44:13.511534 beast kernel: R10: 00007f1b28000790 R11: 0000000000000246 R12: 00007f16d61f09e0
Jun 25 20:44:13.511621 beast kernel: R13: 00000000000cf490 R14: 0000000000000a68 R15: 00007f9d261edca0
Jun 25 20:44:13.511630 beast kernel:  </TASK>
Jun 25 20:44:13.511638 beast kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq uinput cmac algif_hash algif_skcipher af_alg xt_CHECKSUM xt_MASQUERADE hid_logitech_hidpp xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bridge stp llc bnep btusb btrtl btbcm btintel btmtk bluetooth joydev mousedev xpad hid_logitech_dj ecdh_generic ff_memless vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_hda_codec_realtek snd_hda_codec_generic uvcvideo snd_hda_codec_hdmi videobuf2_vmalloc kvm snd_hda_intel uvc snd_intel_dspcfg videobuf2_memops snd_intel_sdw_acpi crct10dif_pclmul snd_usb_audio videobuf2_v4l2 crc32_pclmul snd_hda_codec polyval_clmulni snd_usbmidi_lib polyval_generic videodev eeepc_wmi snd_hda_core gf128mul snd_rawmidi asus_wmi ghash_clmulni_intel snd_seq_device r8169 sha512_ssse3 snd_hwdep ledtrig_audio aesni_intel snd_pcm sparse_keymap snd_timer
Jun 25 20:44:13.511693 beast kernel:  crypto_simd platform_profile realtek videobuf2_common cryptd mdio_devres snd sp5100_tco pcspkr rapl rfkill wmi_bmof i2c_piix4 mc ccp libphy soundcore gpio_amdpt gpio_generic acpi_cpufreq mac_hid i2c_dev dm_multipath sg crypto_user fuse loop bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uas usb_storage usbhid amdgpu dm_mod i2c_algo_bit drm_ttm_helper ttm drm_buddy gpu_sched nvme crc32c_intel drm_display_helper sr_mod nvme_core xhci_pci video cdrom cec xhci_pci_renesas nvme_common wmi vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd
Jun 25 20:44:13.511728 beast kernel: ---[ end trace 0000000000000000 ]---
Jun 25 20:44:13.511745 beast kernel: RIP: 0010:nvme_setup_cmd+0x38b/0x4c0 [nvme_core]
Jun 25 20:44:13.511756 beast kernel: Code: c1 e4 0e 41 81 e5 00 01 08 00 74 19 81 e2 00 00 08 00 66 41 81 cc 00 80 41 89 d5 41 f7 dd 45 19 ed 41 83 e5 07 b9 02 00 00 00 <66> 89 4d 00 e9 26 fe ff ff 41 89 d4 41 89 d5 41 c1 ec 11 41 83 e4
Jun 25 20:44:13.511766 beast kernel: RSP: 0018:ffffaba7975a79b0 EFLAGS: 00010202
Jun 25 20:44:13.511774 beast kernel: RAX: 0000000000000000 RBX: ffff9ac011df0000 RCX: 0000000000000002
Jun 25 20:44:13.511782 beast kernel: RDX: 0000000000080000 RSI: ffff9ac011df0000 RDI: 0000000000000000
Jun 25 20:44:13.511789 beast kernel: RBP: cac2e2e08ac262e6 R08: ffff9ac00b924400 R09: 0000000000000000
Jun 25 20:44:13.511801 beast kernel: R10: ffff9ac0034e5bf8 R11: ffff9ac006549800 R12: 0000000000008000
Jun 25 20:44:13.511814 beast kernel: R13: 0000000000000007 R14: 0000000000000000 R15: ffff9ac011df0000
Jun 25 20:44:13.511825 beast kernel: FS:  00007f1b40ffb6c0(0000) GS:ffff9ac738940000(0000) knlGS:0000000000000000
Jun 25 20:44:13.511851 beast kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 25 20:44:13.511864 beast kernel: CR2: 00007f2ff1820010 CR3: 0000000107884000 CR4: 0000000000750ee0
Jun 25 20:44:13.511877 beast kernel: PKRU: 55555554
Jun 25 20:44:13.511886 beast kernel: ------------[ cut here ]------------
Jun 25 20:44:13.511899 beast kernel: WARNING: CPU: 13 PID: 12916 at kernel/exit.c:814 do_exit+0x8a9/0xaf0
Jun 25 20:44:13.511909 beast kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq uinput cmac algif_hash algif_skcipher af_alg xt_CHECKSUM xt_MASQUERADE hid_logitech_hidpp xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bridge stp llc bnep btusb btrtl btbcm btintel btmtk bluetooth joydev mousedev xpad hid_logitech_dj ecdh_generic ff_memless vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_hda_codec_realtek snd_hda_codec_generic uvcvideo snd_hda_codec_hdmi videobuf2_vmalloc kvm snd_hda_intel uvc snd_intel_dspcfg videobuf2_memops snd_intel_sdw_acpi crct10dif_pclmul snd_usb_audio videobuf2_v4l2 crc32_pclmul snd_hda_codec polyval_clmulni snd_usbmidi_lib polyval_generic videodev eeepc_wmi snd_hda_core gf128mul snd_rawmidi asus_wmi ghash_clmulni_intel snd_seq_device r8169 sha512_ssse3 snd_hwdep ledtrig_audio aesni_intel snd_pcm sparse_keymap snd_timer
Jun 25 20:44:13.511962 beast kernel:  crypto_simd platform_profile realtek videobuf2_common cryptd mdio_devres snd sp5100_tco pcspkr rapl rfkill wmi_bmof i2c_piix4 mc ccp libphy soundcore gpio_amdpt gpio_generic acpi_cpufreq mac_hid i2c_dev dm_multipath sg crypto_user fuse loop bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uas usb_storage usbhid amdgpu dm_mod i2c_algo_bit drm_ttm_helper ttm drm_buddy gpu_sched nvme crc32c_intel drm_display_helper sr_mod nvme_core xhci_pci video cdrom cec xhci_pci_renesas nvme_common wmi vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd
Jun 25 20:44:13.511984 beast kernel: CPU: 13 PID: 12916 Comm: FS Tainted: G      D            6.3.9-arch1-1 #1 124dc55df4f5272ccb409f39ef4872fc2b3376a2
Jun 25 20:44:13.512002 beast kernel: Hardware name: ASUS System Product Name/PRIME X670-P, BIOS 1616 05/16/2023
Jun 25 20:44:13.512016 beast kernel: RIP: 0010:do_exit+0x8a9/0xaf0
Jun 25 20:44:13.512025 beast kernel: Code: 89 ab 18 06 00 00 4c 89 a3 20 06 00 00 48 89 6c 24 10 e9 15 fe ff ff 48 8b bb 00 06 00 00 31 f6 e8 2c d9 ff ff e9 c9 fd ff ff <0f> 0b e9 ce f7 ff ff 4c 89 e6 bf 05 06 00 00 e8 83 16 01 00 e9 6b
Jun 25 20:44:13.512034 beast kernel: RSP: 0018:ffffaba7975a7ed8 EFLAGS: 00010282
Jun 25 20:44:13.512043 beast kernel: RAX: 0000000000000000 RBX: ffff9ac6dce94e00 RCX: 0000000000000000
Jun 25 20:44:13.512052 beast kernel: RDX: 0000000000000001 RSI: 0000000000002710 RDI: ffff9ac00dd898c0
Jun 25 20:44:13.512060 beast kernel: RBP: ffff9ac0145a5a00 R08: 0000000000000000 R09: ffffaba7975a76e0
Jun 25 20:44:13.512068 beast kernel: R10: 0000000000000003 R11: ffffffffb52ca1e8 R12: 000000000000000b
Jun 25 20:44:13.512077 beast kernel: R13: ffff9ac00dd898c0 R14: ffffaba7975a7908 R15: ffff9ac6dce94e00
Jun 25 20:44:13.512086 beast kernel: FS:  00007f1b40ffb6c0(0000) GS:ffff9ac738940000(0000) knlGS:0000000000000000
Jun 25 20:44:13.512094 beast kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 25 20:44:13.512101 beast kernel: CR2: 00007f2ff1820010 CR3: 0000000107884000 CR4: 0000000000750ee0
Jun 25 20:44:13.512113 beast kernel: PKRU: 55555554
Jun 25 20:44:13.512122 beast kernel: Call Trace:
Jun 25 20:44:13.512131 beast kernel:  <TASK>
Jun 25 20:44:13.512138 beast kernel:  ? do_exit+0x8a9/0xaf0
Jun 25 20:44:13.512147 beast kernel:  ? __warn+0x81/0x130
Jun 25 20:44:13.512155 beast kernel:  ? do_exit+0x8a9/0xaf0
Jun 25 20:44:13.512160 beast kernel:  ? report_bug+0x171/0x1a0
Jun 25 20:44:13.512166 beast kernel:  ? handle_bug+0x3c/0x80
Jun 25 20:44:13.512171 beast kernel:  ? exc_invalid_op+0x17/0x70
Jun 25 20:44:13.512177 beast kernel:  ? asm_exc_invalid_op+0x1a/0x20
Jun 25 20:44:13.512182 beast kernel:  ? do_exit+0x8a9/0xaf0
Jun 25 20:44:13.512188 beast kernel:  ? do_exit+0x70/0xaf0
Jun 25 20:44:13.512195 beast kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Jun 25 20:44:13.512201 beast kernel:  make_task_dead+0x81/0x170
Jun 25 20:44:13.512208 beast kernel:  rewind_stack_and_make_dead+0x17/0x20
Jun 25 20:44:13.512215 beast kernel: RIP: 0033:0x7f9d2610fb5c
Jun 25 20:44:13.512220 beast kernel: Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 89 9c f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 df 9c f8 ff 48
Jun 25 20:44:13.512226 beast kernel: RSP: 002b:00007f1b40ff94b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Jun 25 20:44:13.512231 beast kernel: RAX: ffffffffffffffda RBX: 00007f1bc4022cc0 RCX: 00007f9d2610fb5c
Jun 25 20:44:13.512237 beast kernel: RDX: 00000000000cf000 RSI: 00007f16d61f09e0 RDI: 0000000000000033
Jun 25 20:44:13.512241 beast kernel: RBP: 00007f9d261ee5a0 R08: 0000000000000000 R09: 00007f16d631deb0
Jun 25 20:44:13.512248 beast kernel: R10: 00007f1b28000790 R11: 0000000000000246 R12: 00007f16d61f09e0
Jun 25 20:44:13.512253 beast kernel: R13: 00000000000cf490 R14: 0000000000000a68 R15: 00007f9d261edca0
Jun 25 20:44:13.512258 beast kernel:  </TASK>
Jun 25 20:44:13.512264 beast kernel: ---[ end trace 0000000000000000 ]---

Last edited by gagootron (2023-06-25 19:07:19)

Offline

#2 2023-06-25 20:52:30

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 74,267

Re: NVME errors with amd_iommu enabled

https://wiki.archlinux.org/title/Solid_ … nd_support
Do you get away w/ "amd_iommu=fullflush" or "amd_iommu=force_isolation" ?

Online

#3 2023-06-27 05:39:38

gagootron
Member
Registered: 2023-06-25
Posts: 5

Re: NVME errors with amd_iommu enabled

seth wrote:

Do you get away w/ "amd_iommu=fullflush" or "amd_iommu=force_isolation" ?

I just did and the Lexar SSD is still throwing the same errors with either setting.

Offline

#4 2023-06-27 08:11:46

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 74,267

Re: NVME errors with amd_iommu enabled

Tried "nvme_core.default_ps_max_latency_us=0" to disable APST?
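
To confirm after a reboot that the parameter really took effect, the kernel command line can be compared with what the module itself reports. A minimal Python sketch, assuming the usual /proc/cmdline and /sys/module/nvme_core/parameters/default_ps_max_latency_us paths; the helper name parse_cmdline is made up for illustration and does not handle quoted values containing spaces:

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''.

    Simplification: quoted values that contain spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


if __name__ == "__main__":
    key = "nvme_core.default_ps_max_latency_us"
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        params = parse_cmdline(cmdline.read_text())
        print("cmdline:", params.get(key, "<not set>"))
    # Value the loaded module is actually using (path assumed to exist
    # only when nvme_core is loaded).
    sysfs = Path("/sys/module/nvme_core/parameters/default_ps_max_latency_us")
    if sysfs.exists():
        print("nvme_core reports:", sysfs.read_text().strip())
```

If both lines print 0, APST should be disabled for all NVMe controllers handled by nvme_core.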

Online

#5 2023-06-27 17:03:45

gagootron
Member
Registered: 2023-06-25
Posts: 5

Re: NVME errors with amd_iommu enabled

I have now tried "nvme_core.default_ps_max_latency_us=0", once with "iommu=pt", which still caused my Samsung SSD to fail, and once without, which caused the Lexar SSD to fail.

However, when shutting down my PC after the Samsung crash, it produced some new error messages that I hadn't seen before.

nvme nvme0: I/O 776 (Write) QID 9 timeout, aborting
nvme nvme0: I/O 777 (Write) QID 9 timeout, aborting
nvme nvme0: Abort status: 0x0
nvme nvme0: I/O 778 (Write) QID 9 timeout, aborting
nvme nvme0: Abort status: 0x0
nvme nvme0: I/O 779 (Write) QID 9 timeout, aborting
nvme nvme0: Abort status: 0x0
nvme nvme0: I/O 780 (Write) QID 9 timeout, aborting
nvme nvme0: Abort status: 0x0
nvme nvme0: I/O 781 (Write) QID 9 timeout, aborting
nvme nvme0: Abort status: 0x0
nvme nvme0: I/O 782 (Write) QID 9 timeout, aborting
nvme nvme0: Abort status: 0x0

I typed the above by hand from a photo I took of the screen, as no logs were being written anymore, so please excuse any typos.

This is certainly similar to the APST errors described in the wiki. But obviously the workaround didn't work here...
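
To see whether the timeouts stay confined to one controller and queue, lines like the ones quoted above can be tallied with a short script. A rough Python sketch, assuming the message format is exactly as transcribed in this post (other kernel versions may word it differently); the function name tally_timeouts is hypothetical:

```python
import re
from collections import Counter

# Matches lines such as: "nvme nvme0: I/O 776 (Write) QID 9 timeout, aborting"
TIMEOUT_RE = re.compile(r"nvme (nvme\d+): I/O (\d+) \((\w+)\) QID (\d+) timeout")


def tally_timeouts(lines):
    """Count timeout messages per (controller, queue id, operation)."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            ctrl, _tag, op, qid = m.groups()
            counts[(ctrl, int(qid), op)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "nvme nvme0: I/O 776 (Write) QID 9 timeout, aborting",
        "nvme nvme0: Abort status: 0x0",  # ignored: not a timeout line
        "nvme nvme0: I/O 777 (Write) QID 9 timeout, aborting",
    ]
    for key, n in tally_timeouts(sample).items():
        print(key, n)
```

The input could just as well come from saved journal output, one message per line.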

I'm going to try updating the SSD firmware, maybe that will help...

Edit:
The firmware update did not help.

Last edited by gagootron (2023-06-27 18:59:12)

Offline

#6 2023-06-28 06:30:46

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 74,267

Re: NVME errors with amd_iommu enabled

Tried "iommu=soft"?

Online

#7 2023-06-28 19:30:17

gagootron
Member
Registered: 2023-06-25
Posts: 5

Re: NVME errors with amd_iommu enabled

No, the Lexar SSD doesn't work then.

I just tried removing both Lexar SSDs and setting iommu=pt (which usually causes the Samsung SSD to fail). So far my system is running (4+ hours without a crash). It seems to me like something about the Lexar SSDs causes the nvme driver to crash. I will have to test them by themselves in some other systems. Maybe I can figure out what's wrong.

I will report back in a few days whether or not it's working properly.

Offline

#8 2023-06-29 16:54:12

gagootron
Member
Registered: 2023-06-25
Posts: 5

Re: NVME errors with amd_iommu enabled

OK, I think I figured it out.

The Lexar SSD is the root cause.
When it is inserted in an NVMe slot that is connected through the chipset, it acts up.

I tested it all by itself in my PC; these are the results:

  • CPU nvme slot: No issues

  • Slot that goes through one chipset: Only issues when other SSDs are present

  • Slot that goes through two chipsets: causes IO_PAGE_FAULT with iommu enabled but seems to work fine otherwise; crashes after mounting with iommu disabled

I can't tell why this is happening. I guess either the chipset(s) are doing something weird, or the nvme driver can't handle the lower bandwidth/shared PCIe lanes in this specific case.
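
To double-check which slot hangs off which bridges, the sysfs path of a device lists every PCI bridge between the root port and the drive. A minimal Python sketch, assuming the standard /sys/bus/pci/devices layout; the helper names upstream_chain and chain_for are made up here, and 0000:0c:00.0 is the Lexar address from the dmesg output:

```python
import os
import re

# A PCI address looks like 0000:0c:00.0 (domain:bus:device.function).
BDF_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def upstream_chain(sysfs_path: str) -> list:
    """Return the PCI addresses found in a resolved sysfs device path,
    ordered from the root port down to the device itself."""
    return [part for part in sysfs_path.split("/") if BDF_RE.match(part)]


def chain_for(bdf: str) -> list:
    """Resolve /sys/bus/pci/devices/<bdf> and list every bridge above it."""
    return upstream_chain(os.path.realpath(f"/sys/bus/pci/devices/{bdf}"))


if __name__ == "__main__":
    # On a machine without this device the link does not resolve and only
    # the address itself is printed.
    print(" -> ".join(chain_for("0000:0c:00.0")))
```

A longer chain would suggest more bridges between the CPU and the drive, which may help tell the CPU slot apart from the ones behind one or two chipsets.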

So now my question changes:
Should I somehow report this to the maintainers of the nvme driver? And if yes, how do I collect the data needed to debug this, and how do I send a report? I guess this page of the kernel wiki would be a good starting point.

Offline

#9 2023-06-29 20:10:04

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 74,267

Online
