I have been experiencing random system hangs for the past month or so, and it seems to be getting more frequent.
Pastebin (0x0.st) links with some relevant logs and system info:
system info (neofetch, installed kernels and headers, GPU drivers and modules)
journalctl logs for boots -0, -1, -2, -3, -4, -5 (boot -0 is full log, others are for priority -p err)
What happens when the system hangs:
Completely unresponsive; ctrl+alt+del, SysRq, ctrl+alt+Fx (for opening terminal login prompt) all do nothing.
The only option is a hard reboot.
Occasionally the system just reboots itself, though this is rarer than the usual complete freeze.
No discernible pattern to when or why it happens; sometimes it is only a few minutes after rebooting, sometimes several hours.
I am not performing any particularly CPU or memory intensive tasks when it happens.
It seems to happen less frequently when I stop using a Unitek USB hub (with Ethernet port); but I strongly suspect this is just a coincidental correlation and not the cause, as before these issues started I would be using that same device nonstop with zero problems.
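Since SysRq did nothing during the hangs, it may be worth ruling out that it is simply restricted by the kernel.sysrq setting. The sketch below decodes that value using the bit meanings from the kernel's SysRq documentation. The function name describe_sysrq is only illustrative. A hard hardware lockup can of course ignore SysRq regardless of this setting.

```shell
# Decode a kernel.sysrq value into the SysRq functions it allows.
# 0 disables SysRq, 1 enables everything, larger values are a bitmask.
describe_sysrq() {
    value="$1"
    case "$value" in
        0) echo "disabled"; return 0 ;;
        1) echo "all functions enabled"; return 0 ;;
    esac
    # Each line of the table below is "<bit> <description>".
    while read -r bit name; do
        if [ $(( value & bit )) -ne 0 ]; then
            echo "$name"
        fi
    done <<'EOF'
2 console log level control
4 keyboard control (SAK, unraw)
8 debugging dumps of processes
16 sync
32 remount read-only
64 signalling of processes (term, kill, oom-kill)
128 reboot/poweroff
256 nicing of all real-time tasks
EOF
}

# Usage on a running system:
#   describe_sysrq "$(cat /proc/sys/kernel/sysrq)"
```

If the output lists only a subset (for example just "sync"), most SysRq combinations would be ignored even on a healthy system.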
Some history about either particular quirks of my laptop, or hardware changes, that may help in troubleshooting the cause (though again, the system hang issues did not start immediately after any of these changes):
The laptop currently has no battery installed.
The battery became severely swollen over time, due to my unawareness that leaving the power supply plugged in constantly and not letting the battery discharge occasionally is bad practice.
It was so swollen that the clicking function of the touchpad had become borderline unusable.
Since removing the battery, the touchpad is back to normal.
The "k" key on the on-board keyboard does not work; I suspect it was damaged by the swollen battery.
The power supply currently used is not the OEM, replaced about 1 year ago.
The OEM power supply had the following specs:
AC Input: 100-240V, 3.6A, 50/60Hz
DC Output: 19.5V, 11.8A
Power: 230W
The replacement has almost identical specs, except the AC Input current rating is 2.9A instead of 3.6A.
The original SSD was replaced about 3 months ago.
Original:
Samsung PM981 512GB SSD (PCIe 3.0 x4 NVMe M.2 2280)
Model: MZ-VLB5120
New:
Samsung 990 Pro 2TB SSD (PCIe 4.0 NVMe M.2 2280)
Model: MZ-V9P2T0BW
I used a live USB arch system to clone the old SSD onto the new one using dd (and a USB SSD enclosure).
During this extended usage of a live system, the laptop ran extremely hot (more so than when running a normal environment; see below).
It ran so hot that the dd cloning would not complete, and the system would power off automatically due to the overheating.
I eventually had to complete the clone in 50-100GB chunks, letting the system cool off in between.
I used cmp to compare the entirety of the cloned drive with the original, to make sure all was okay.
No immediate issues were noticed with the system under the new SSD.
The laptop used to run very hot nearly always.
This turned out to be due to dust.
While I had attempted cleaning it before, I eventually discovered that actually disassembling the two fans entirely revealed a whole bunch more dust that had previously been hidden from view.
After the thorough cleaning, it now runs normal; still slightly warm to the touch, but it now feels like normal laptop temperature to me, and nowhere near the extreme amounts before, where the fans would be going full blast almost constantly.
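For reference, the chunked dd clone described above (copying the SSD in 50-100GB pieces and letting the laptop cool in between) can be sketched roughly as follows. This is a minimal illustration, not the exact commands that were used: the function name copy_chunk, the 1 MiB block size and the device paths in the usage comment are assumptions, and status=none assumes GNU dd.

```shell
# Copy one chunk of SRC to the same offset in DST.
# Arguments: source, destination, 0-based chunk index, chunk size in MiB.
copy_chunk() {
    src="$1"; dst="$2"; index="$3"; chunk_mib="$4"
    offset=$(( index * chunk_mib ))
    # With bs=1048576 (1 MiB), skip/seek/count are all counted in MiB.
    # conv=notrunc stops dd from truncating DST when it is a regular file.
    dd if="$src" of="$dst" bs=1048576 skip="$offset" seek="$offset" \
       count="$chunk_mib" conv=notrunc status=none
}

# Usage (hypothetical device paths, 50 GiB = 51200 MiB per chunk):
#   copy_chunk /dev/SOURCE /dev/TARGET 0 51200   # first chunk
#   copy_chunk /dev/SOURCE /dev/TARGET 1 51200   # second chunk, after cooling off
```

Because each chunk is written at the same offset it was read from, the chunks can be copied in any order and a failed chunk can simply be repeated before the final cmp.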
So while none of the above changes/upgrades seemed to immediately cause the issues with the system hanging, I guess given that the issues started about a month or two after the change of SSD and disassembly of fans for thorough cleaning, it is not beyond the realm of possibility that I may have inadvertently done something during those tasks that started the chain of whatever is causing the current issues.
I have included the logs that I thought would be relevant based on browsing the forums for people having similar issues. But admittedly, while feeling quite comfortable on the command line and scripting for years, the nitty gritty sysadmin and logs/journal perusing and parsing is still a little bit new to me, so if any additional info is needed to help troubleshoot, I would greatly appreciate somebody's assistance in guiding me through what else I need to supply.
Thank you in advance as always for your time and assistance.
List of edits:
Typo in subject
Uploading two more error priority journalctl logs (-0 and -1) after yet more crashes.
Including another instance of the system rebooting itself.
This time the Unitek USB hub was not being used, so more sure now that this has nothing to do with the problem.
Would I be correct in assuming that the "Invalid framebuffer status: GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT" errors, which appear very frequently in the journalctl logs and are the last err priority messages logged in the -1 boot, are likely to be the culprit?
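To put a number on "very frequently", the err priority messages of a boot can be tallied. The sketch below is only a rough aid: the helper name tally_messages is made up for illustration, and a high count shows that a message is noisy, not that it causes the hang.

```shell
# Tally identical lines read from stdin, most frequent first.
# Output format: "<count> <message>".
tally_messages() {
    sort | uniq -c | sort -rn | sed -E 's/^ *([0-9]+) /\1 /'
}

# Usage (journalctl's "-o cat" prints only the message text,
# so identical messages collapse into one line with a count):
#   journalctl -b -1 -p err -o cat | tally_messages | head
```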
Last edited by fractal_sounds (2026-02-28 06:41:03)
This is proving really difficult to troubleshoot. Here are some additional things I've tried, and a couple of other observations that will hopefully help somebody have an "aha!" moment.
Switched to nouveau driver
/etc/mkinitcpio.conf:
#MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)
MODULES=(nouveau)
/usr/lib/modprobe.d/nvidia-utils.conf:
#blacklist nouveau
blacklist nova_core
blacklist nova_drm
/usr/lib/modules-load.d/nvidia-utils.conf:
#nvidia-uvm
lspci -nnk | grep -iA2 vga:
00:02.0 VGA compatible controller [0300]: Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630] [8086:3e9b]
DeviceName: Onboard - Video
Subsystem: Razer USA Ltd. Device [1a58:2002]
--
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU106M [GeForce RTX 2070 Mobile] [10de:1f10] (rev a1)
Subsystem: Razer USA Ltd. Device [1a58:2002]
Kernel driver in use: nouveau
This seems to have stopped the very frequent GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT messages from appearing in journalctl, so I was hopeful this would solve the issue, but alas, the random system hangs continue.
I have tried installing the Xfce DE (by installing the xfce4 and xfce4-goodies groups), thinking that perhaps a lighter DE might alleviate the issue. I would also occasionally see systemd-coredump messages originating from plasmashell in the journalctl logs, so I thought perhaps it's a KDE problem.
The Xfce Wayland session would not work at all, and in fact the system would freeze upon trying to login; I'm unsure whether this is related to whatever is causing the regular system hangs, or because I have not configured things properly in order for Xfce to work with Wayland.
The non-Wayland session would at least allow me to log in, but within a minute or two the system would freeze up completely.
On the occasions when the system reboots itself spontaneously rather than freezing up, I've noticed that it sometimes takes me directly to the BIOS setup (without me actually pressing the F1 key myself during boot up).
On at least one occasion, maybe two, when I selected "reboot without saving" from the BIOS menu, it once again took me back to the BIOS menu without the press of the F1 key, and only after a second subsequent reboot was I able to progress to the normal login window.
A couple of times, the laptop's onboard keyboard has been completely unresponsive (no backlight, no key presses work) after a hard reboot, and I have had to reboot a second time to get keyboard functionality back to allow me to actually type my cryptlvm password.
The system freeze has occurred on several occasions when I have been on a tty; so unless whatever is causing it was happening in the background in the KDE session I also had open, a hardware issue seems more likely to me if it happens even from a tty.
The thing that is confusing me though, is that if it's a hardware issue, why is it so damn unpredictable, when my usage patterns are not varying all that much? Why does it happen barely minutes into a new session sometimes, while other times it can be several hours between occurrences of it?
That's all I can think of for now. I won't post any more logs just yet, as I'm still unsure whether the ones I posted initially were the "right" ones, or if I need to look elsewhere for getting to the bottom of this.
Any advice would be greatly appreciated. While I occasionally have runs of good fortune and can get a couple of hours of use, increasingly frequently I am having instances where it crashes several times in the span of 10-15 minutes, making it very difficult to use it for work (which unfortunately I rely on, as my only other machine is an old MBP that is horrendously outdated and very much on its last legs).
Last edited by fractal_sounds (2026-03-02 10:40:09)
So, bit of an update... I've had a very stable system the past couple of days. Current uptime is 2 days, 5 hours, 32 minutes.
While I'm very glad of course, it bugs me that I don't actually know what was causing it, and whether the cause has actually gone away or is still lurking somewhere.
I learnt about the issue with ddcutil and/or powerdevil through this forum post.
The pacman log of the recent history of these packages,
[2025-07-29T03:48:34+1000] [ALPM] upgraded ddcutil (2.2.0-1 -> 2.2.1-1)
[2025-11-23T10:27:40+1100] [ALPM] upgraded ddcutil (2.2.1-1 -> 2.2.3-1)
[2026-02-12T10:02:47+1100] [ALPM] upgraded ddcutil (2.2.3-1 -> 2.2.5-2)
[2025-12-29T00:28:34+1100] [ALPM] upgraded powerdevil (6.5.3-2 -> 6.5.4-1)
[2026-01-14T20:40:22+1100] [ALPM] upgraded powerdevil (6.5.4-1 -> 6.5.5-1)
[2026-02-23T18:31:41+1100] [ALPM] upgraded powerdevil (6.5.5-1 -> 6.6.0-1)
[2026-02-25T23:10:17+1100] [ALPM] upgraded powerdevil (6.6.0-1 -> 6.6.1-1)
[2026-03-04T19:43:47+1100] [ALPM] upgraded powerdevil (6.6.1-1 -> 6.6.2-1)
shows that while I still have the ddcutil 2.2.5 that was giving problems to others, there has been an upgrade of powerdevil in the last couple of days. I am tentatively hoping that this is what has caused the stability of my system.
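An upgrade history like the one above can be pulled out of the pacman log with a small filter. This is a sketch, not necessarily how the listing above was produced: the function name pkg_upgrades is made up, and package names are inserted into the regular expression unescaped, so names containing characters such as "+" would need escaping.

```shell
# Print the "[ALPM] upgraded" entries for the named packages.
# Arguments: log file, then one or more package names.
pkg_upgrades() {
    log="$1"; shift
    # Join the package names into an alternation: "pkg1|pkg2|..."
    pattern="$(printf '%s|' "$@")"
    pattern="${pattern%|}"
    grep -E "\[ALPM\] upgraded (${pattern}) \(" "$log"
}

# Usage (/var/log/pacman.log is pacman's default log location):
#   pkg_upgrades /var/log/pacman.log ddcutil powerdevil
```

Entries come out in log order, so upgrades of the two packages are interleaved chronologically, which makes it easier to line them up against the dates when the hangs started or stopped.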
The only changes I have made to my system, aside from the regular upgrades, are:
NVIDIA proprietary -> Nouveau
Stopped using the Unitek USB hub (with Ethernet port)
Turned off Bluetooth completely
My current plan is, if I continue to have a stable system for another couple of days, to one-by-one revert the three changes mentioned above, to see if the issue returns. If not, then I can only presume that powerdevil 6.6.1-1 -> 6.6.2-1 is what rectified the issue, and will mark the thread solved.
Last edited by fractal_sounds (2026-03-05 16:20:05)
Well, the first thing I tried reverting was using the Unitek USB hub, with only an Ethernet cable connected to it (no other USB devices). After a system upgrade and reboot (as in, rebooting it myself, not a random reboot), which brought to an end my unprecedented (since the issues started) 3-day uptime, I tried logging into an Xfce session. This DE had been even worse than Plasma (in terms of how quickly the system would freeze), so I was eager to see if the powerdevil upgrade had rectified the issue.
Nope. Again a total system freeze within less than a minute of the session starting. No response from Ctrl+Alt+Del, SysRq, or switching to a tty. After a hard reboot, I tried a Plasma session, still with the USB Ethernet hub connected. Again, very quick system freeze, within a couple of minutes.
I unplugged the USB Ethernet hub. Hard rebooted. Logged back into Plasma. I uninstalled the Xfce DE completely, as it was just adding another unknown variable into the mix. And also, the system freezes always seemed to happen much more quickly, and almost always after the same amount of elapsed time, on Xfce... whereas on Plasma there was more variation. So I couldn't be sure that it was the same issue that was causing the crashes on Xfce, or if there was some other incompatibility from having both DEs installed at the same time.
In any case... I've since had an uptime of 18 hours and counting, on Plasma, minus the USB Ethernet hub.
It makes very little sense to me. As I mentioned, I had used this device for a number of years, with zero issues, so I can't understand how it can all of a sudden be the cause of such volatile system crashes. Moreover, I don't know the exact count, but I have had at least one crash (maybe a few) even without that hub connected. Although, ever since the recent powerdevil upgrade, so far the only times I have had significant uptime (3 days initially, and now 18 hours) with no crashes have been without that hub, and the only time I tried connecting it, I had a very unstable system.
This is both intriguing and infuriating in equal measure.
I can definitely live without that hub (I only use it because the WiFi at my place is a little sketchy at times, and the Ethernet connection was more stable).
Does anyone have any advice on what tests I can run to confirm that it's not just a coincidence, and that it really is this hub that's causing the crashes?
Last edited by fractal_sounds (2026-03-07 14:30:00)