Hello Arch Linux Forum! A few days ago I wiped my Windows installation from my M.2 SSD because I felt comfortable enough with my Arch installation. I then copied my partitions to the SSD using GParted; it booted and worked fine. Yesterday I tried to play a game and my system crashed. In fact, every game crashes it, over and over again.
To clarify, the system doesn't just freeze; it goes down component by component: first the panels stop updating, then the windows stop moving, then the cursor gets stuck in one place, and after a few seconds all sound stops. If I wait long enough, I think KDE Plasma tries to restart itself: most of the windows close, and the wallpaper and panels disappear. Switching to a TTY at this point just shows a blinking cursor in the top-left corner.
I was 100% sure it was the drive. Someone on Reddit suggested rebuilding the GPT partition table. I was lucky enough to find an old 500 GB hard drive and copied my partitions to it. Just to check, I booted from that drive and, surprise: all games work fine, no crashes. I thought I had just been unlucky and something went wrong when I moved my partitions to the M.2 SSD. So I booted the live ISO, deleted the partitions, rebuilt the GPT partition table and copied my partitions back to the SSD.
It didn't boot... All I got was the GRUB rescue screen. "It's okay, I'll just reinstall GRUB." Booted into the live ISO again, chroot, grub-install, grub-mkconfig, reboot. It finally booted into the installation. That's where my luck ended: the games still crash.
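For reference, the "chroot, grub-install, grub-mkconfig" repair looks roughly like this on a UEFI system. It is only a sketch, not a verified procedure: the partition paths, the /boot/efi mount point and the GRUB bootloader ID are assumptions to adjust to your own disk, and by default the script only prints the commands (set DRY_RUN=0 to actually run them).

```shell
#!/usr/bin/env bash
# Reinstall GRUB from the Arch live ISO (UEFI assumed).
# ASSUMPTIONS, adjust to your disk: root on /dev/nvme0n1p2, EFI system
# partition on /dev/nvme0n1p1, ESP mounted at /boot/efi inside the install.
# By default the commands are only printed; run with DRY_RUN=0 to execute.
set -euo pipefail

ROOT_PART="${ROOT_PART:-/dev/nvme0n1p2}"   # hypothetical root partition
ESP_PART="${ESP_PART:-/dev/nvme0n1p1}"     # hypothetical EFI system partition

# Execute only when DRY_RUN is exactly 0; otherwise just show the command.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

reinstall_grub() {
    run mount "$ROOT_PART" /mnt
    run mkdir -p /mnt/boot/efi
    run mount "$ESP_PART" /mnt/boot/efi
    run arch-chroot /mnt grub-install --target=x86_64-efi \
        --efi-directory=/boot/efi --bootloader-id=GRUB
    run arch-chroot /mnt grub-mkconfig -o /boot/grub/grub.cfg
}

reinstall_grub
```

Printing first makes it easy to compare the commands against your real layout (lsblk shows the actual partition names) before anything gets mounted.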
At this point, I don't know what else could be wrong. The drive's health is fine (it's even still under warranty), journalctl shows literally no errors, I've tried the LTS kernel, I've tried running games from another drive; nothing helps.
System info:
Operating System: Arch Linux
KDE Plasma Version: 6.1.1
KDE Frameworks Version: 6.3.0
Qt Version: 6.7.2
Kernel Version: 6.6.36-1-lts (64-bit)
Graphics Platform: Wayland
Processors: 12 × AMD Ryzen 5 5600X 6-Core Processor
Memory: 15.6 GiB of RAM
Graphics Processor: AMD Radeon RX 5700 XT
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: B450 GAMING X
Thanks for taking the time to read my post. Any help would be greatly appreciated.
EDIT:
After leaving the system in this "half-shutdown" state for a while, I got some logs in journalctl:
Jul 03 13:32:30 archlinux kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Jul 03 13:32:30 archlinux kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Jul 03 13:32:30 archlinux kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jul 03 13:32:30 archlinux kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Jul 03 13:32:30 archlinux kernel: nvme nvme0: Disabling device after reset failure: -19
I don't think this is a problem with the SSD itself; I hate to say it, but it worked fine on Windows.
EDIT 2: I added the options recommended above to my kernel parameters and verified that they were applied (cat /proc/cmdline), but my system still crashes.
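A way to do the same check as "cat /proc/cmdline" without eyeballing it: this small helper (missing_params is a made-up name) prints every requested parameter that isn't on the running kernel's command line. It is a sketch that assumes parameters are separated by spaces, which is how /proc/cmdline normally looks.

```shell
#!/usr/bin/env bash
# Print every requested kernel parameter that is missing from a command line.
# Returns 0 if all are present, 1 otherwise.
# Usage: missing_params "<cmdline>" param1 [param2 ...]
missing_params() {
    local cmdline=" $1 " p rc=0
    shift
    for p in "$@"; do
        # Pad with spaces so "pcie_aspm=off" does not match "pcie_aspm=offx".
        if [[ "$cmdline" != *" $p "* ]]; then
            echo "$p"
            rc=1
        fi
    done
    return "$rc"
}

# Check the running kernel for the parameters suggested by the nvme log message.
if [ -r /proc/cmdline ]; then
    missing_params "$(cat /proc/cmdline)" \
        nvme_core.default_ps_max_latency_us=0 pcie_aspm=off \
        && echo "all parameters are active"
fi
```

If a parameter still shows up as missing after a reboot, the boot loader config usually wasn't regenerated (for GRUB: grub-mkconfig -o /boot/grub/grub.cfg).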
EDIT 3: Now I can't get the logs to show up in journalctl again; I think it's because the drive is disconnecting. Also, now that the system reboots almost properly, it gets stuck on the screen that usually shows messages like "Broadcast message from X: the system will reboot now" (the init screen, if I'm not mistaken).
Last edited by ilia21 (2024-07-14 08:07:22)
You're using LTS; do you have a specific reason for that?
Have you tried the current standard kernel?
first the panels stop updating, then the windows stop moving, then my cursor gets stuck in one place, after a few seconds all sounds stop playing and that's it. If I wait long enough, I think the KDE plasma will try to restart itself: most of the windows will close, wallpaper and panels will disappear.
That sounds more like an OOM condition; make sure to keep an eye on your memory usage.
Do you have physical swap (partition or file, but not zram or zswap)?
But if you're positive it's the NVMe drive (because the other drive didn't exhibit this in extensive tests), try "iommu=soft" as well.
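To answer the swap question quickly, /proc/swaps lists every active swap area. A minimal sketch (physical_swaps is a hypothetical helper name) that prints only the non-zram entries, so an empty result means there is no physical swap at all:

```shell
#!/usr/bin/env bash
# List active swap areas that are NOT zram, i.e. "physical" swap
# (a partition or a swap file). Reads /proc/swaps by default; pass another
# file to test. Columns: Filename Type Size Used Priority (first line is a header).
physical_swaps() {
    local swaps_file="${1:-/proc/swaps}"
    awk 'NR > 1 && $1 !~ /^\/dev\/zram/ { print $1 }' "$swaps_file"
}

if [ -r /proc/swaps ] && [ -z "$(physical_swaps)" ]; then
    echo "no physical swap: only zram (or nothing) is active"
fi
```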
Hello, thanks for your reply!
As I said in my post, I tried using both LTS and the default kernel to see if my problem was present on both.
Hello, thanks for your reply!
I have swap, but not as a partition; only zram0.
I did some research before making this post; setting iommu=soft was one possible solution, but unfortunately it didn't fix my problem.
I'm having my PC upgraded with a new case today to hopefully solve the GPU overheating issues. There's a chance the SSD is simply overheating too. I monitored it with
watch --interval 0.25 sudo nvme smart-log /dev/nvme0
and it goes up to 65°C under load.
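As an alternative to polling nvme-cli every 0.25 s, the drive temperature is also exposed through the kernel's hwmon interface. A sketch, assuming the nvme driver's hwmon support is enabled (it registers sensors named "nvme" whose temp1_input is in millidegrees Celsius); nvme_temps and warn_hot are made-up helper names, and the 70°C default is an arbitrary placeholder, not a vendor limit:

```shell
#!/usr/bin/env bash
# Print "<hwmonN> <degrees C>" for every hwmon sensor named "nvme".
# Usage: nvme_temps [hwmon_root]   (default /sys/class/hwmon)
nvme_temps() {
    local root="${1:-/sys/class/hwmon}" dir
    for dir in "$root"/hwmon*; do
        [ -r "$dir/name" ] && [ -r "$dir/temp1_input" ] || continue
        if [ "$(cat "$dir/name")" = "nvme" ]; then
            # temp1_input is in millidegrees; integer division gives whole degrees.
            echo "${dir##*/} $(( $(cat "$dir/temp1_input") / 1000 ))"
        fi
    done
}

# Read "<sensor> <temp>" lines on stdin and warn at or above the limit.
# The default of 70 is a placeholder; use your drive's documented limit.
warn_hot() {
    local limit="${1:-70}" name temp
    while read -r name temp; do
        if [ "$temp" -ge "$limit" ]; then
            echo "WARNING: $name at ${temp}C"
        fi
    done
}
```

For example, "nvme_temps | warn_hot 80" in a loop while gaming only prints something when the drive actually reaches that limit.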
I will post an update
M.2 NVMe drives run hotter without issue. The Samsung 980, for example, warns at 82°C and maxes out at 85°C, I believe. Different devices have different limits, but an M.2 drive isn't going to run as cool as a separate 2.5" drive.
If 65°C is the highest you've seen, I personally don't think it's overheating. It might have problems, but probably not temperature-related ones. Heatsinks don't always fix those problems either. Many drives are designed to run without a heatsink, with the label on top acting as a thin heatsink.
Desktop: Ryzen 7 1800X | AMD 7800XT | KDE Plasma
Keep "dmesg -w" running in a visible window; problems with the NVMe drive or its bus would likely show up there.
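Piping the live kernel log through a filter makes the relevant lines stand out. A sketch: nvme_failures is a hypothetical helper, and its patterns only cover the messages from the first EDIT's log plus I/O timeouts, so the list is not exhaustive.

```shell
#!/usr/bin/env bash
# Keep only NVMe controller failure lines from a kernel log stream on stdin.
# Patterns are assumptions based on the messages quoted in this thread.
# --line-buffered makes grep print matches immediately when following "dmesg -w".
nvme_failures() {
    grep --line-buffered -E 'nvme.*(controller is down|reset failure|Disabling device|timeout)'
}
```

Usage: define the function, then run "sudo dmesg -w | nvme_failures". After a crash that forced a reboot, "journalctl -k -b -1 | nvme_failures" shows the previous boot, but only if the journal lives on a disk that stayed online; if the failing NVMe drive holds the journal, the messages may never be written.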
And keep an eye on the RAM: if you're using zram, disable zswap and, if in doubt, add some physical swap.
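To see whether zswap is actually active alongside zram, check its module parameter. A sketch (zswap_enabled is a hypothetical helper name) that reads /sys/module/zswap/parameters/enabled, which contains Y or N:

```shell
#!/usr/bin/env bash
# Succeed if zswap is enabled, i.e. the parameter file reads "Y".
# Pass another file path to test; a missing file counts as "not enabled".
zswap_enabled() {
    local param="${1:-/sys/module/zswap/parameters/enabled}"
    [ -r "$param" ] && [ "$(cat "$param")" = "Y" ]
}

if zswap_enabled; then
    echo "zswap is on; disable it at runtime with:"
    echo "  echo 0 | sudo tee /sys/module/zswap/parameters/enabled"
    echo "or permanently with the kernel parameter zswap.enabled=0"
fi
```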
Okay, after a few days I can confirm that replacing the PSU and moving to a case with fans solved the problem for me. I also replugged everything, including the SSD, just in case. Thanks for all the replies and suggestions.
For people coming here in the future:
- Try adding nvme_core.default_ps_max_latency_us=0 pcie_aspm=off to your kernel parameters.
- Try adding iommu=soft to your kernel parameters.
- Check if your drive is overheating (watch --interval 0.25 sudo nvme smart-log /dev/nvme0)
- Try a different slot if possible
- Check if your SSD is properly connected
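For the kernel-parameter items on that list, the usual GRUB route on Arch is to append them to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and regenerate the config. A small sketch of that edit, assuming the stock single-line, double-quoted form of the variable; add_grub_params is a made-up helper name:

```shell
#!/usr/bin/env bash
# Append kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT, skipping any that
# are already there. Assumes the stock form: GRUB_CMDLINE_LINUX_DEFAULT="..."
# (dots in a parameter act as regex wildcards in the check; harmless here).
# Usage: add_grub_params /etc/default/grub param1 [param2 ...]
add_grub_params() {
    local file="$1" p tmp
    shift
    for p in "$@"; do
        # Already present as a whole word inside the quotes? Then skip it.
        if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"(.* )?${p}( .*)?\"" "$file"; then
            continue
        fi
        tmp="$(mktemp)"
        # Insert " param" right before the closing quote of that line.
        sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${p}\"|" "$file" > "$tmp"
        cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
        rm -f "$tmp"
    done
}
```

After running it as root on /etc/default/grub, regenerate the config with "grub-mkconfig -o /boot/grub/grub.cfg", reboot, and confirm with "cat /proc/cmdline".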
Also, moderators, please move this to Kernel & Hardware.
Thanks for replying with your solution. Many people find a solution and never come back to share it.