Recently my system has frozen up several times. After investigating, I noticed the following errors from my NVMe SSD:
nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
kernel: nvme0n1: Read(0x2) @ LBA 160540672, 32 blocks, Host Aborted Command (sct 0x3 / sc 0x71)
I/O error, dev nvme0n1, sector 160540672 op 0x0:(READ) flags 0x880700 phys_seg 4 prio class 2
However, a SMART test does not turn up anything:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 39 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 6%
Data Units Read: 122,448,022 [62.6 TB]
Data Units Written: 77,608,698 [39.7 TB]
Host Read Commands: 3,957,617,815
Host Write Commands: 2,069,244,753
Controller Busy Time: 7,976
Power Cycles: 11,104
Power On Hours: 6,376
Unsafe Shutdowns: 305
Media and Data Integrity Errors: 0
Error Information Log Entries: 7,476
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 39 Celsius
Temperature Sensor 2: 50 Celsius
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num  ErrCount  SQId   CmdId  Status  PELoc  LBA  NSID  VS  Message
  0      7476     0  0x000b  0x4004      -    0     0   -  Invalid Field in Command
  1      7475     0  0x003a  0x4016  0x004    0     1   -  Invalid Namespace or Format
  2      7474     0  0x0008  0x4004      -    0     0   -  Invalid Field in Command
  3      7473     0  0x001c  0x4004      -    0     0   -  Invalid Field in Command
  4      7472     0  0x003a  0x4016  0x004    0     1   -  Invalid Namespace or Format
  5      7471     0  0x0008  0x4004      -    0     0   -  Invalid Field in Command
  6      7470     0  0x201c  0x4004      -    0     0   -  Invalid Field in Command
  7      7469     0  0x003a  0x4016  0x004    0     1   -  Invalid Namespace or Format
  8      7468     0  0x0008  0x4004      -    0     0   -  Invalid Field in Command
  9      7467     0  0x0018  0x4004      -    0     0   -  Invalid Field in Command
 10      7466     0  0x0008  0x4004      -    0     0   -  Invalid Field in Command
 11      7465     0  0x300e  0x4004      -    0     0   -  Invalid Field in Command
 12      7464     0  0x003a  0x4016  0x004    0     1   -  Invalid Namespace or Format
 13      7463     0  0x0008  0x4004      -    0     0   -  Invalid Field in Command
 14      7462     0  0x301a  0x4004      -    0     0   -  Invalid Field in Command
 15      7461     0  0x003a  0x4016  0x004    0     1   -  Invalid Namespace or Format
... (48 entries not read)
Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
Num  Test_Description  Status                   Power_on_Hours  Failing_LBA  NSID  Seg  SCT  Code
0    Extended          Completed without error  6376            -            -     -    -    -
1    Short             Completed without error  5730            -            -     -    -    -
I am not sure whether the issue started with the last kernel update.
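For reference, the Status column in the error log above is a packed NVMe status field. A minimal Python sketch that unpacks it, assuming the error-log layout noted in the comments (phase tag in bit 0, Status Code in bits 8:1, Status Code Type in bits 11:9); `decode_status` and the two-entry lookup table are hypothetical helpers, not part of smartctl or nvme-cli:

```python
# Sketch: decode the "Status" column of smartctl's NVMe Error Information log.
# Assumed layout of the 16-bit value (NVMe error log entry status field):
#   bit 0      phase tag
#   bits 8:1   Status Code (SC)
#   bits 11:9  Status Code Type (SCT)
#   bits 13:12 Command Retry Delay (CRD)
#   bit 14     More (M): more information in the error log
#   bit 15     Do Not Retry (DNR)

# Only the two generic command status codes (SCT 0) seen in the log are named.
GENERIC_STATUS = {
    0x02: "Invalid Field in Command",
    0x0B: "Invalid Namespace or Format",
}


def decode_status(status: int) -> dict:
    """Split a 16-bit error-log status value into its NVMe sub-fields."""
    sc = (status >> 1) & 0xFF
    sct = (status >> 9) & 0x7
    name = GENERIC_STATUS.get(sc) if sct == 0 else None
    return {
        "phase": status & 0x1,
        "sct": sct,
        "sc": sc,
        "crd": (status >> 12) & 0x3,
        "more": bool(status & (1 << 14)),
        "dnr": bool(status & (1 << 15)),
        # Fall back to the raw numbers for codes not in the small table.
        "name": name or f"SCT {sct:#x} / SC {sc:#04x}",
    }


if __name__ == "__main__":
    for value in (0x4004, 0x4016):
        print(hex(value), decode_status(value))
```

Under these assumptions both values in the log decode to generic command status codes (SCT 0) with the More bit set, i.e. commands the controller rejected rather than media errors, which would fit the `Media and Data Integrity Errors: 0` counter.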
The NVMe controller keeps going down periodically:
kernel: nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
kernel: nvme nvme0: 4/0/0 default/read/poll queues
kernel: nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
kernel: nvme nvme0: 4/0/0 default/read/poll queues
I'm not sure whether this is normal.
Any idea what might be going on and how I can further investigate?
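The two hex values in the reset message can also be decoded bit by bit. The following is a sketch assuming the CSTS register layout from the NVMe base specification and the standard PCI Status register layout; the helper names are hypothetical and only a subset of bits is named:

```python
# Sketch: decode the CSTS and PCI_STATUS values printed in the kernel's
# "controller is down; will reset" message. Bit positions follow the NVMe base
# spec (CSTS) and the PCI Local Bus spec (Status register).

CSTS_BITS = {
    0: "RDY (controller ready)",
    1: "CFS (controller fatal status)",
    4: "NSSRO (NVM subsystem reset occurred)",
    5: "PP (processing paused)",
}

# CSTS bits 3:2 are the shutdown status (SHST) field.
SHST = {
    0: "normal operation",
    1: "shutdown processing occurring",
    2: "shutdown processing complete",
    3: "reserved",
}

PCI_STATUS_BITS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}


def set_bits(value: int, names: dict) -> list:
    """Return the names of all named bits that are set in value, lowest first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]


def decode_csts(csts: int) -> dict:
    return {"flags": set_bits(csts, CSTS_BITS), "shst": SHST[(csts >> 2) & 0x3]}


def decode_pci_status(status: int) -> list:
    if status == 0xFFFF:
        # An all-ones config read commonly means the device did not respond.
        return ["all ones: device likely not responding"]
    return set_bits(status, PCI_STATUS_BITS)


if __name__ == "__main__":
    print(decode_csts(0x3))
    print(decode_pci_status(0x10))
```

Read this way, `decode_csts(0x3)` reports RDY and CFS set with normal shutdown status, meaning the controller is signalling Controller Fatal Status, while `decode_pci_status(0x10)` returns only `Capabilities List`, so no PCI error bits are set in that status word.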
SMART itself could be the cause, e.g. if you enabled smartd or something similar on a timer (note that some DEs ship such an integration and might enable a check; I know plasma-disks has an option for this, though it should be disabled by default). Some NVMe drives react badly to the way SMART triggers the test, see e.g. https://bbs.archlinux.org/viewtopic.php … 6#p2213996 and https://bbs.archlinux.org/viewtopic.php?id=306106
Last edited by V1del (Today 13:49:51)
Thanks for the suggestions. I checked and I don't have it enabled. To be honest, I don't know why, but the errors seem to happen sporadically, mostly when I'm using YouTube in Firefox.
Also, I previously had swap off, as I typically have enough RAM to do without it. I turned swap on just to check whether I was running out of memory. I mention this only for completeness; I don't think the problem is caused by swap activity.
https://wiki.archlinux.org/title/Solid_ … leshooting
"nvme_core.default_ps_max_latency_us=0 pcie_aspm=off iommu=soft", https://wiki.archlinux.org/title/Kernel_parameters
Add these to your kernel parameters and see whether the resets stop.
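After rebooting, it is worth confirming that the parameters actually reached the kernel. A minimal Python sketch that parses the running command line (as exposed in /proc/cmdline) and reports which of the suggested parameters are missing or set differently; the helper names are hypothetical, and the comments reflect what the kernel documentation and the Arch wiki say about each parameter:

```python
# Sketch: check that the suggested NVMe/PCIe workaround parameters are active.
from pathlib import Path

SUGGESTED = {
    "nvme_core.default_ps_max_latency_us": "0",  # 0 disables NVMe APST
    "pcie_aspm": "off",                          # disables PCIe ASPM
    "iommu": "soft",                             # software bounce buffering (SWIOTLB)
}


def parse_cmdline(cmdline: str) -> dict:
    """Turn 'a=1 b c=2' into {'a': '1', 'b': None, 'c': '2'} (last value wins)."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so 'root=UUID=...' keeps its value intact.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: dict = SUGGESTED) -> dict:
    """Return the wanted parameters that are absent or set to another value."""
    params = parse_cmdline(cmdline)
    return {k: v for k, v in wanted.items() if params.get(k) != v}


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        missing = missing_params(cmdline_file.read_text())
        print(missing or "all suggested parameters present")
```

The simple whitespace split does not handle quoted values containing spaces, which does not matter for these three parameters.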