Hello,
I'm not used to asking for help anywhere, so tell me if I haven't included enough information about the issue I'm facing.
For some time now, whenever I try some heavy gaming (Borderlands 4, Arc Raiders), both of my NVMe drives disconnect in an instant (it happens every time).
First the NVMe drive where the game is installed goes down, then milliseconds later the system one does too.
To get a good dmesg output I have to SSH in, because the whole computer hangs and a forced shutdown corrupts the journald file.
After some searching on "controller is down" issues, I tried the power-management kernel setting (pcie_aspm=off).
So here I am, throwing my bottle into the sea, along with some information about my setup.
Kernel boot line :
options root=PARTUUID=1427d43a-9760-4711-89c8-ea350ddd1d05 zswap.enabled=0 rw rootfstype=ext4 init_on_alloc=1 vsyscall=emulate clearcpuid=514 pcie_aspm=off quiet
(init_on_alloc is there for the latest NVIDIA driver, and the problem is there even without it; clearcpuid is there for an error in some Wine programs)
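A minimal sketch, assuming Python 3.9+ is available: the nvme driver message in the dmesg excerpt further down in this post suggests three parameters, and the snippet below compares a kernel command line against them. The function name missing_params and the SUGGESTED table are made up for illustration; the boot line is the one quoted above, without the leading "options" keyword.

```python
# Parameters suggested by the nvme driver message in the dmesg excerpt.
SUGGESTED = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
    "pcie_port_pm": "off",
}

# Boot line from this post (hypothetical variable name).
BOOT_LINE = (
    "root=PARTUUID=1427d43a-9760-4711-89c8-ea350ddd1d05 zswap.enabled=0 rw "
    "rootfstype=ext4 init_on_alloc=1 vsyscall=emulate clearcpuid=514 "
    "pcie_aspm=off quiet"
)


def missing_params(cmdline: str) -> list[str]:
    """Return the suggested 'key=value' settings absent from a command line."""
    present = {}
    for token in cmdline.split():
        # Split on the first '=' only, so 'root=PARTUUID=...' keeps its value.
        key, sep, value = token.partition("=")
        if sep:
            present[key] = value
    return [f"{k}={v}" for k, v in SUGGESTED.items() if present.get(k) != v]


print(missing_params(BOOT_LINE))
# prints ['nvme_core.default_ps_max_latency_us=0', 'pcie_port_pm=off']
```

On a running system the same function could be fed the contents of /proc/cmdline instead of the hard-coded string. For the boot line above, only pcie_aspm=off of the three suggested parameters is set.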
MB : ASUS ROG Strix X570-E GAMING (BIOS v 5031)
CPU : Ryzen 9 5900X
NVMEs :
- system : MP600 1 TB
- games : MP600 2 TB
GPU : NVIDIA 3080 Ti
Linux kernel v: 6.17.5
dmesg : Pastebin
It's not shown in this dmesg because I cut the SSH session too soon, but the system NVMe gets "controller is down" at almost the same time.
small part of it :
[Oct31 16:13] nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[ +0.000004] nvme nvme1: Does your device have a faulty power saving mode enabled?
[ +0.000002] nvme nvme1: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
[ +0.021197] nvme1n1: Read(0x2) @ LBA 30659072, 32 blocks, Host Aborted Command (sct 0x3 / sc 0x71)
[ +0.000004] I/O error, dev nvme1n1, sector 30659072 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
[ +0.011797] nvme 0000:04:00.0: enabling device (0000 -> 0002)
[ +0.000071] nvme nvme1: Disabling device after reset failure: -19
[ +0.015929] Buffer I/O error on dev nvme1n1p1, logical block 73, lost async page write
[ +0.000006] Aborting journal on device nvme1n1p1-8.
[ +0.000003] EXT4-fs error (device nvme1n1p1) in ext4_reserve_inode_write:6313: Journal has aborted
[ +0.000001] Buffer I/O error on dev nvme1n1p1, logical block 9463, lost async page write
[ +0.000001] Buffer I/O error on dev nvme1n1p1, logical block 243826688, lost sync page write
[ +0.000003] JBD2: I/O error when updating journal superblock for nvme1n1p1-8.
[ +0.000000] Buffer I/O error on dev nvme1n1p1, logical block 305659916, lost async page write
[ +0.000002] EXT4-fs error (device nvme1n1p1): ext4_dirty_inode:6517: inode #65544863: comm kworker/u97:7: mark_inode_dirty error
[ +0.000002] Buffer I/O error on dev nvme1n1p1, logical block 306184201, lost async page write
[ +0.000002] EXT4-fs error (device nvme1n1p1) in ext4_dirty_inode:6518: Journal has aborted
[ +0.000009] EXT4-fs error (device nvme1n1p1) in ext4_reserve_inode_write:6313: Journal has aborted
[ +0.000001] EXT4-fs error (device nvme1n1p1): mpage_map_and_submit_extent:2536: inode #65544863: comm kworker/u97:7: mark_inode_dirty error
[ +0.000002] EXT4-fs error (device nvme1n1p1): mpage_map_and_submit_extent:2538: comm kworker/u97:7: Failed to mark inode 65544863 dirty
[ +0.000005] EXT4-fs error (device nvme1n1p1) in ext4_do_writepages:2944: Journal has aborted
[ +0.000002] EXT4-fs warning (device nvme1n1p1): ext4_end_bio:368: I/O error 10 writing to inode 65544863 starting block 306317056)
[ +0.000003] EXT4-fs (nvme1n1p1): failed to convert unwritten extents to written extents -- potential data loss! (inode 65544863, error -5)
[ +0.000004] Buffer I/O error on device nvme1n1p1, logical block 306316800
[ +0.000003] Buffer I/O error on device nvme1n1p1, logical block 306316801
[ +0.000002] Buffer I/O error on device nvme1n1p1, logical block 306316802
[ +0.000010] Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
[ +0.000002] EXT4-fs (nvme1n1p1): I/O error while writing superblock
[ +0.000661] EXT4-fs error (device nvme1n1p1): ext4_journal_check_start:87: comm WebSocketClient: Detected aborted journal
[ +0.000008] Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
[ +0.000003] EXT4-fs (nvme1n1p1): I/O error while writing superblock
[ +0.000002] EXT4-fs (nvme1n1p1): Remounting filesystem read-only
[ +0.010903] EXT4-fs (nvme1n1p1): shut down requested (2)
[ +0.091924] EXT4-fs warning (device nvme1n1p1): htree_dirblock_to_tree:1051: inode #65544780: lblock 0: comm PioneerGame.exe: error -5 reading directory block
[ +0.000060] EXT4-fs warning (device nvme1n1p1): htree_dirblock_to_tree:1051: inode #65544780: lblock 0: comm PioneerGame.exe: error -5 reading directory block
[ +0.006535] EXT4-fs warning (device nvme1n1p1): htree_dirblock_to_tree:1051: inode #65536148: lblock 0: comm PioneerGame.exe: error -5 reading directory block
[ +0.002553] EXT4-fs warning (device nvme1n1p1): htree_dirblock_to_tree:1051: inode #65536148: lblock 0: comm PioneerGame.exe: error -5 reading directory block
[ +1.126885] EXT4-fs warning (device nvme1n1p1): htree_dirblock_to_tree:1051: inode #65536148: lblock 0: comm PioneerGame.exe: error -5 reading directory block
[ +0.002362] EXT4-fs warning (device nvme1n1p1): dx_probe:791: inode #65536265: lblock 0: comm PioneerGame.exe: error -5 reading directory block
[ +0.000014] EXT4-fs warning (device nvme1n1p1): dx_probe:791: inode #65536265: lblock 0: comm PioneerGame.exe: error -5 reading directory block
[ +0.000015] EXT4-fs warning (device nvme1n1p1): dx_probe:791: inode #65536265: lblock 0: comm PioneerGame.exe: error -5 reading directory block
[ +0.000008] EXT4-fs warning (device nvme1n1p1): dx_probe:791: inode #65536265: lblock 0: comm PioneerGame.exe: error -5 reading directory block
smartctl result :
- System NVME
=== START OF INFORMATION SECTION ===
Model Number: Force MP600
Serial Number: 2014823000012856314E
Firmware Version: EGFM11.3
PCI Vendor/Subsystem ID: 0x1987
IEEE OUI Identifier: 0x6479a7
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 6479a7 33525718aa
Local Time is: Fri Oct 31 17:21:52 2025 CET
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d): Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x08): Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 90 Celsius
Critical Comp. Temp. Threshold: 95 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.78W       -        -    0  0  0  0        0       0
 1 +     6.75W       -        -    1  1  1  1        0       0
 2 +     5.23W       -        -    2  2  2  2        0       0
 3 -   0.0490W       -        -    3  3  3  3     2000    2000
 4 -   0.0018W       -        -    4  4  4  4    25000   25000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 40 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 3%
Data Units Read: 158,231,679 [81.0 TB]
Data Units Written: 136,781,795 [70.0 TB]
Host Read Commands: 676,623,709
Host Write Commands: 332,336,534
Controller Busy Time: 2,043
Power Cycles: 3,006
Power On Hours: 17,252
Unsafe Shutdowns: 320
Media and Data Integrity Errors: 0
Error Information Log Entries: 7,323
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 63 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS Message
0 7323 0 0x0014 0x4004 0x028 0 0 - Invalid Field in Command
Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
Num Test_Description Status Power_on_Hours Failing_LBA NSID Seg SCT Code
0 Extended Completed without error 17249 - - - - -
1 Extended Completed without error 17248 - - - - -
2 Short Completed without error 17248 - - - - -
- Game NVME
=== START OF INFORMATION SECTION ===
Model Number: Force MP600
Serial Number: 20178229000128555772
Firmware Version: EGFM11.3
PCI Vendor/Subsystem ID: 0x1987
IEEE OUI Identifier: 0x6479a7
Total NVM Capacity: 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 6479a7 34c241d9dc
Local Time is: Fri Oct 31 17:21:38 2025 CET
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d): Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x08): Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 90 Celsius
Critical Comp. Temp. Threshold: 95 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.78W       -        -    0  0  0  0        0       0
 1 +     6.75W       -        -    1  1  1  1        0       0
 2 +     5.23W       -        -    2  2  2  2        0       0
 3 -   0.0490W       -        -    3  3  3  3     2000    2000
 4 -   0.0018W       -        -    4  4  4  4    25000   25000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 47 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 5%
Data Units Read: 84,246,199 [43.1 TB]
Data Units Written: 92,575,868 [47.3 TB]
Host Read Commands: 1,089,717,566
Host Write Commands: 1,169,340,761
Controller Busy Time: 2,571
Power Cycles: 3,193
Power On Hours: 17,265
Unsafe Shutdowns: 506
Media and Data Integrity Errors: 0
Error Information Log Entries: 8,792
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 63 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS Message
0 8792 0 0x000c 0x4004 0x028 0 0 - Invalid Field in Command
Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
Num Test_Description Status Power_on_Hours Failing_LBA NSID Seg SCT Code
0 Extended Completed without error 17261 - - - - -
1 Extended Completed without error 17261 - - - - -
2 Short Completed without error 17261 - - - - -
It might be a hardware problem, but I ran some controller tests with the BIOS utility and smartctl, and they all pass successfully.
I don't know what I'm missing.
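A minimal sketch, assuming Python 3: it pulls the "controller is down" lines out of saved dmesg text and flags a CSTS value of 0xffffffff. An all-ones value usually means the register read never got an answer from the controller, which would point at the device dropping off the bus rather than reporting a fault itself; treat that reading as an assumption, not a diagnosis. The names DOWN_RE and controller_down_events are made up for illustration, and the sample line is copied from the excerpt above.

```python
import re

# Matches the nvme driver's "controller is down" message as seen in the excerpt.
DOWN_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): controller is down; will reset: "
    r"CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)

SAMPLE = (
    "[Oct31 16:13] nvme nvme1: controller is down; will reset: "
    "CSTS=0xffffffff, PCI_STATUS=0x10"
)


def controller_down_events(log_text: str) -> list[dict]:
    """Return one dict per 'controller is down' message found in log_text."""
    events = []
    for match in DOWN_RE.finditer(log_text):
        csts = int(match["csts"], 16)
        events.append({
            "controller": match["ctrl"],
            "csts": csts,
            "pci_status": int(match["pci"], 16),
            # All 32 bits set: the read most likely did not reach the device.
            "csts_all_ones": csts == 0xFFFFFFFF,
        })
    return events


for event in controller_down_events(SAMPLE):
    print(event["controller"], hex(event["csts"]), event["csts_all_ones"])
# prints: nvme1 0xffffffff True
```

Run against a complete capture, the order of the returned events would show which controller goes down first, and whether both report the same CSTS value.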
To whoever reads this, thank you XD and have a good night
Last edited by Prouk (2025-10-31 17:03:28)
I'll try my best, probably
does the issue also happen with LTS kernel?
does the issue also happen with LTS kernel?
I'll try this today or tomorrow; sadly, I won't have much time to test today.