Today I encountered a weird problem on my hybrid AMD iGPU and NVIDIA dGPU laptop where, each time I open the Unity3D engine, after a while my laptop (hard) shuts down - as in, the power just goes off. The delay is random and I'm not even necessarily working in Unity at that point, it's usually just idling in the background.
I've already mostly ruled out a few things:
Temperatures - laptop is mostly idling and cold when it happens.
Stress testing the CPU, didn't trigger it.
Playing games intensively using the GPU using DXVK doesn't trigger it.
Downgrading various packages (NVIDIA 580 to 575, kernel 6.16 to 6.12 LTS, amd-ucode from 20250808-1 to 20250708-1, linux-firmware packages down one version, ...).
Using OpenGL instead of Vulkan in Unity.
I have a Lenovo Legion laptop and use the LenovoLegion kernel driver. I tried updating that as well as searching through its codebase for problems.
Searching the internet for related issues; most of them point to hardware problems, which might be the case here, but the findings below might imply something else.
Viewing logs; there are no useful logs because the reset triggers abruptly.
I have an AMD Zen system, and since kernel 6.16 it's possible for newer models to report the reason for random reboots or shutdowns. Indeed, it's reported in my logs:
aug 22 16:12:25 archlinux kernel: x86/amd: Previous system reset reason [0x00300800]: ACPI power state transition occurred
aug 22 16:12:25 archlinux kernel: x86/amd: Previous system reset reason [0x00300800]: software wrote 0xE to reset control register 0xCF9
That sounds like good and bad news, I guess: looks like it's not necessarily a hardware issue as such but some software wrote 0xE (which apparently means 'do a hard full reset') to the Reset Control Register to trigger it. I'm assuming the ACPI power state transition is a result of the write but I'm not sure. All the shutdowns so far had 0xE and one had a line for 0xE and 0x6 (but not the ACPI one) simultaneously.
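To look at these lines more closely, here is a minimal sketch of two helpers. The function names `reset_reasons` and `set_bits` are made up for illustration; the first only reformats the log lines shown above, and the second is plain bit arithmetic on the bracketed value. Which bit corresponds to which message is defined by the kernel and is not reproduced here.

```shell
# Print "<hex value> <reason text>" for every reset-reason line on stdin.
reset_reasons() {
    sed -n -E 's/.*Previous system reset reason \[(0x[0-9A-Fa-f]+)\]: (.*)$/\1 \2/p'
}

# Print the indices of the bits that are set in a hex value, lowest first.
set_bits() {
    value=$(( $1 ))
    bit=0
    out=""
    while [ "$value" -gt 0 ]; do
        if [ $(( value & 1 )) -eq 1 ]; then
            out="$out $bit"
        fi
        value=$(( value >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "${out# }"
}

# Usage on a live system (kernel messages of the current boot):
#   journalctl -k -b | reset_reasons
#   set_bits 0x00300800
```

For 0x00300800 this lists three set bits, while the kernel printed two messages; the sketch cannot tell what the remaining bit means.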
At first I suspected software update issues since I haven't opened Unity in about 2 months, before which I never had these issues, but as mentioned above downgrades haven't worked so far. The only package I didn't try downgrading yet is systemd 257.8 to 257.x but it seems unlikely this is the culprit given that these are just bugfix releases.
If it's indeed a piece of software that is doing this, how do I figure out what it is? I suppose it must be something elevated to be able to do so (kernel, driver, systemd) and Unity only causing it indirectly. I also don't understand why, at least so far, only having Unity open makes it happen sooner or later.
Last edited by mwohah (2025-10-15 16:57:44)
Thanks for the reply.
I did find those topics before posting, but I'm not sure they're related (but could be wrong); my system doesn't freeze nor even reboot automatically and actually does a full shutdown/power off (the power LED even goes out). The Ryzen hardware error logs on the Arch wiki page aren't present and it would be strange if they suddenly started occurring after having a stable system for about a year (unless software updates, but tried to rule that out already).
The weirdest bit so far is that, at least for now, this problem only occurs after having started Unity at some point. It's probably causing or triggering the problem indirectly somehow. I've also worked with the exact same Unity version on the same device for months without issue and this issue is new, so something must have changed.
There are some good ideas in the thread though; when I experienced it I was at work with different USB peripherals, might be good to try at home with different peripherals to see if there's some kind of weird hardware issue; though it would also be new since nothing on that end changed recently in my configuration.
my system doesn't freeze nor even reboot automatically and actually does a full shutdown/power off
Please post your complete system journal for that boot, e.g.
sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st
for the previous ("-1") one.
Thanks for the reply. Here is a full log of a problematic boot: https://0x0.st/KiEW.txt . Note that you will still find 'software wrote 0xE to reset control register 0xCF9' in this log because the boot before this one also had the shutdown. The log is fairly short, about 10 minutes, because I immediately reproduced after boot by leaving Unity open for a few minutes, starting to do something else, and then it happened.
The delay before it happens has so far ranged from about a minute to ~20 minutes. When it happened I usually wasn't even working in Unity; it was just open in the background while I was using Firefox or writing code. The other set of applications is always the same.
I've also stressed the CPU and GPU yesterday by running some heavyweight games for two hours and it was free of problems.
I'm also making a list of things to test from my end to further narrow down the issue, which I'll mainly post here in case it might help others experiencing similar problems and as reference:
Test at home first. I run the NVIDIA GPU as primary there and have different USB devices. Make no other changes yet to rule out USB device weirdness. Might also point me to AMD iGPU or mesa problems despite Unity running on the NVIDIA GPU (as mutter still copies between GPUs).
Unity is Flatpaked in my case, so it would use the Flatpak mesa if it queries/does things with the iGPU despite it running on the dGPU. Regardless, verify host mesa as mutter runs on the host and still copies between GPUs.
I changed some USB settings in my BIOS a few months ago around not keeping them powered on after shutdown. Try reverting those.
Test if just having UnityHub open causes the problem without even having to start Unity.
I ran tests inside Unity at least once when this happened, try not running tests after starting it.
Try opening a different Unity project.
Try downgrading to kernel 6.15.9 instead of 6.12 LTS (if still possible) as a regression in 6.16 may have been backported.
Try reverting to systemd 257.7-2 over 257.8-1.
Uninstall LenovoLegion kernel driver.
Uninstall VirtualBox and its kernel driver since it was updated to 7.2 last week and may have broken something.
Do SMART tests on two SSDs I have. I have a single filesystem using LVM spanning two SSDs and one may be wonky. A colleague had corruption with this same model of SSD but the symptoms were different and involved filesystem corruption and not being able to mount instead.
Malware inserted through package downloads in the Unity project? Haven't done anything shady from my end recently and seems an unlikely problem for malware to cause since there is little gain to doing something like that. Unity is also Flatpak-ed so that makes exploiting harder.
What happens if I run Unity using the same project in VirtualBox on a Windows VM?
Last edited by mwohah (2025-08-23 09:23:46)
Because it is like top posting.
Why not?
Please don't "-r".
But ftr, the journal ends abruptly at
aug 22 16:11:49 computer wpa_supplicant[1186]: wlp4s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-43 noise=9999 txrate=286700
so not a clean shutdown, the behavior after the power off is dictated by the firmware (uefi) settings - we're very much in the realm of the links in #2
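A quick way to check how a boot's journal ends is to look only at its last line. The sketch below is a rough heuristic, not a definitive test: it assumes that a clean shutdown ends with journald's "Journal stopped" message, which is typical but not guaranteed, and the function name `journal_ending` is made up for illustration.

```shell
# Classify journal text on stdin by its last line.
# ASSUMPTION: a clean shutdown ends with journald's "Journal stopped" message.
journal_ending() {
    last=$(tail -n 1)
    case "$last" in
        *"Journal stopped"*) echo "clean" ;;
        *) echo "abrupt" ;;
    esac
}

# Usage for the previous boot:
#   journalctl -b -1 | journal_ending
```

With the wpa_supplicant line quoted above as the final line, this reports "abrupt".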
Thanks for the ideas and helping me rubber duck :-).
Because it is like top posting.
Why not?
Please don't "-r".
Sorry, habit of wanting to see the latest logs in a terminal ;-).
Because it is like top posting.
But ftr, the journal ends abruptly at
aug 22 16:11:49 computer wpa_supplicant[1186]: wlp4s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-43 noise=9999 txrate=286700
so not a clean shutdown, the behavior after the power off is dictated by the firmware (uefi) settings - we're very much in the realm of the links in #2
That's indeed true. There is still the matter of the 'restart/shutdown reason' reported by the CPU after Linux comes up again from the OP that I find strange, though - it points to a software reason, but I guess 'software' is still broad enough to encompass UEFI/firmware. It's a rather new diagnostic so there is not really much to find about it yet beyond kernel documentation.
I've been doing more testing since my last post and... the good news is that it hasn't happened at home after several hours of working with the same Unity project where I had it at work. I'm not at the office every day but this narrows down the scope from my list a lot already.
One thing that did change recently at the office that I remember now is that a new Wi-Fi AP was installed that supports Wi-Fi 6 with 802.11ax whilst the old one didn't. It's working very well for me otherwise so far (MediaTek MT7922, which supports that), but perhaps a new code path is being hit somewhere now. It's a long shot and still doesn't really explain Unity's involvement yet beyond 'maybe it happens to do a network request to somewhere that happens to trigger a code path in the MT7922 driver that just so happens to have a bug that causes the crash directly or indirectly'.
It's been quite the journey already finding the weirdest issue I've ever experienced, but I tested a bunch of things that ruled out most of my list:
Test at home first. I run the NVIDIA GPU as primary there and have different USB devices. Make no other changes yet to rule out USB device weirdness. Might also point me to AMD iGPU or mesa problems despite Unity running on the NVIDIA GPU (as mutter still copies between GPUs). Doesn't happen at home.
Unity is Flatpaked in my case, so it would use the Flatpak mesa if it queries/does things with the iGPU despite it running on the dGPU. Regardless, verify host mesa as mutter runs on the host and still copies between GPUs. Mesa versions don't influence it.
I changed some USB settings in my BIOS a few months ago around not keeping them powered on after shutdown. Try reverting those. No difference.
Test if just having UnityHub open causes the problem without even having to start Unity. It doesn't (so far).
I ran tests inside Unity at least once when this happened, try not running tests after starting it. It still happens if tests aren't run.
Try downgrading to kernel 6.15.9 instead of 6.12 LTS (if still possible) as a regression in 6.16 may have been backported. Deemed irrelevant because it doesn't happen at home with the same version.
Try reverting to systemd 257.7-2 over 257.8-1. Deemed irrelevant because it doesn't happen at home with the same version.
Uninstall LenovoLegion kernel driver. Deemed irrelevant because it doesn't happen at home with it installed.
Uninstall VirtualBox and its kernel driver since it was updated to 7.2 last week and may have broken something. Deemed irrelevant because it doesn't happen at home with it installed.
Do SMART tests on two SSDs I have. I have a single filesystem using LVM spanning two SSDs and one may be wonky. A colleague had corruption with this same model of SSD but the symptoms were different and involved filesystem corruption and not being able to mount instead. No problems found.
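For the SMART item, a minimal sketch of pulling the overall verdict out of `smartctl -H` output, assuming smartmontools is installed. The function name `smart_health` and the device path in the usage comment are placeholders, not taken from the setup described above.

```shell
# Print the overall health verdict from `smartctl -H` output on stdin.
smart_health() {
    sed -n -E 's/^SMART overall-health self-assessment test result: *(.*)$/\1/p'
}

# Usage (placeholder device path, repeat for each SSD in the LVM volume):
#   sudo smartctl -H /dev/nvme0 | smart_health
```

A passing verdict only says the drive does not consider itself failing; it does not rule out the kind of corruption the colleague saw.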
What I didn't test yet from the list:
Try opening a different Unity project. Still need to test this.
Malware inserted through package downloads in the Unity project? Haven't done anything shady from my end recently and seems an unlikely problem for malware to cause since there is little gain to doing something like that. Unity is also Flatpak-ed so that makes exploiting harder. Hard to verify, still have to test, but testing another project as listed above should help pinpoint this.
What happens if I run Unity using the same project in VirtualBox on a Windows VM? Haven't tested this yet because it doesn't happen at home, so it's not Unity per se; not sure what I would learn from this even if it didn't happen.
Other things I tested at work and ruled out:
Unplugging all USB devices, because I use different ones there. No difference.
Test having internal display enabled at home (ordinarily I have just the secondary monitor) because I use it at work as well. No difference.
Using NVIDIA as primary in mutter like I do at home. No difference.
Test using a Realtek Wi-Fi adapter to connect to (another) Wi-Fi network. Still happened, but my main MediaTek adapter was still scanning at that point. This rules out it being an issue just when I'm connected to a Wi-Fi network on the MediaTek adapter (could still be scanning or the presence of a certain network, though).
Keep Wi-Fi at home enabled to scan whilst working on ethernet. Didn't happen here. Rules out Wi-Fi just being enabled on the MediaTek adapter being the issue.
Actually use Wi-Fi at home to work. Didn't happen here. Rules out using Wi-Fi in general on the MediaTek adapter being an issue. Could still be the presence of a certain network during scanning (at work) though.
Put laptop in airplane mode at work and don't use Wi-Fi at all. Hasn't happened so far with this, implies it is somehow related to Wi-Fi.
Test two other splitter plugs at the office. Didn't help.
Test throwing out a daisy chain of splitter plugs. Didn't help.
Test working on battery. Hasn't happened so far, implies power usage somehow also affects it.
Stress test CPU and GPU at work. Didn't cause it.
Update BIOS as I was running two versions behind. Didn't fix it.
So up until now everything seems to point towards two things: Wi-Fi scanning, but in a specific situation, i.e. the presence of a specific network or AP (e.g. the new Wi-Fi 6 one), and/or the wall power socket as it doesn't happen on battery.
This leaves the following things I think I should test:
Disable the new AP and keep Wi-Fi enabled as usual to see if the new Wi-Fi 6 AP or network is causing it somehow.
Use a different power socket altogether.
Test having the power cable unplugged, but with power saving disabled (otherwise it could just be power saving alleviating the issue somehow).
Try another Unity project to rule out malware or something peculiar about this one.
The other thread pinned it down to libnghttp3 1.11.0-1 - downgrading to 1.10.1-1 seems to fix it.
Thanks for the tip. I have it installed on the host, but I am running Unity in Flatpak and the FreeDesktop 24.08 runtime doesn't seem to contain libnghttp3 at all currently. I tried downgrading it anyway but alas to no avail.
Wait for the response to https://bbs.archlinux.org/viewtopic.php … 5#p2260955 and see whether you can reproduce the setup.
In the end, it turns out libnghttp3 is not the culprit; removing it from my system just mitigated the issue but didn't totally resolve it. Sorry for the misguidance.
Thanks for the updates and ideas. Unfortunately the problem is not yet solved for me, so I did a bunch more tests. I have ruled out some things, but still don't know for sure what the culprit is.
From my last list of things to test:
Disable the new AP and keep Wi-Fi enabled as usual to see if the new Wi-Fi 6 AP or network is causing it somehow. It's not Wi-Fi related. I started enabling my internal laptop screen at home, where I never have Wi-Fi enabled, and now I suddenly have it there too.
Use a different power socket altogether. Done by testing at home as well, where I now also have the problem sometimes.
Test having the power cable unplugged, but with power saving disabled (otherwise it could just be power saving alleviating the issue somehow). It happens equally often with power saving disabled. With it enabled it hasn't happened yet, but I doubt it fully fixes it; it likely just masks the problem somewhat.
Try another Unity project to rule out malware or something peculiar about this one. Had no effect. Tried different Unity versions with different projects, as well as reinstalling the existing Unity versions.
As can be gathered from the above, things seem to point more in the direction of the internal screen, which is wired to the AMD iGPU. At home I have it disabled all the time, but I recently changed something on my external monitor configuration around suspending where it disconnects when it goes in standby, and then GNOME re-enables my internal screen (so I have at least one monitor active). At that point, when I had Unity open as well, I noticed my laptop having rebooted two times at home.
The above brought me onto the path of testing things listed in 'Asus AMD Advanced Optimus Laptop shuts down on amdgpu.', as this seems similar. Unfortunately, though, my cause seems to be different. Things I tested:
Downgrade linux-firmware-amdgpu to 20250613.12fe085f-9 and amd-ucode to a version before late June.
Downgrade linux-firmware-amdgpu to 20250613.12fe085f-8, as apparently bump 9 still included a backport of commit 71920b7a98f7560181d888d7a7319edf2f434c20 that in turn includes supposedly broken commit cbbce56d6dcc1ec8fb485dfb92c68cb9acd51410.
Downgrade linux-firmware-amdgpu to 20250613.12fe085f-3. Couldn't test because I can't get a graphical session any more here (GDM doesn't show up any more).
Upgrade linux-firmware-amdgpu to 20250917.
Test replacing gc_10_3_6_rlc.bin manually with the version from this commit, which is before the supposedly broken commit.
Test replacing gc_10_3_6_rlc.bin manually with a version from January 2024.
Take all AMDGPU firmware files from 89fd8440ad29dc3536e84661464cdfe2e57c38bd (version from January 2025) and downgrade all of them, verifying they are loaded after boot.
Disabling AMD iGPU through BIOS completely and let the NVIDIA GPU take over everything.
Revert FreeDesktop SDK and Runtime 24.08 (used by UnityHub) to versions from around 11 June 2025. This would revert any potential libnghttp3 I might have missed as well as glibc and other libraries in bulk.
None of these fixed the problem. If this was AMDGPU related I would have expected the BIOS mux change to have remedied it, but it didn't.
It also happens without Unity open, but very rarely: over all this time, once at home and twice at work.
Further ideas:
Already tested downgrading amd-ucode to somewhere in June, maybe I should go further back?
Downgrade glibc to an older version? Although I'm using Unity in Flatpak, which isn't in sync with the Arch libraries anyway (and I also tried downgrading those, see above).
Try disabling the internal monitor at work to try and pinpoint it to the display somehow causing problems.
I use the ALHP v4 variants of Arch packages, try switching back to the official Arch repositories?
Try a full system revert of all Arch packages to somewhere a few months ago. Might be painful, though.
I think this is solved. I think it was this one that did it:
I use the ALHP v4 variants of Arch packages, try switching back to the official Arch repositories?
Around August 13th GCC 15.2 entered the repositories. It may be the case that this is when I started experiencing it since there were quite a few weeks that I didn't work with Unity at all.
At that point I recall testing the LTS kernel as well, but I never actually tested reverting the kernel, and it turns out on the same day the LTS build was also bumped and built with the new GCC version.
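To see which compiler built the kernel that is actually running, the banner in /proc/version can be checked. This is a minimal sketch that assumes the banner contains "gcc (GCC) <version>", as the stock Arch builds print it; other builds may format it differently. The function name `kernel_gcc_version` is made up for illustration.

```shell
# Print the GCC version embedded in /proc/version-style text on stdin.
# ASSUMPTION: the banner contains "gcc (GCC) <version>".
kernel_gcc_version() {
    sed -n -E 's/.*gcc \(GCC\) ([0-9]+(\.[0-9]+)*).*/\1/p'
}

# Usage on a live system:
#   kernel_gcc_version < /proc/version
```

Running this on both the regular and the LTS kernel shows whether switching between them actually changed the compiler version.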
It may be that GCC 15.2 generates different code for the processor extensions that the ALHP builds enable and the stock Arch kernels don't, and that this triggers the problem. This may also mean that users of CachyOS can experience the same problem, but I don't know if the flags used are identical to the ones ALHP uses.
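For anyone wondering whether their machine would even run v4 packages: the x86-64-v4 level adds the AVX-512 subsets F, BW, CD, DQ and VL on top of v3. The sketch below checks only for those five flags in a space-separated flag list and does not verify the v2/v3 prerequisites; the function name `has_x86_64_v4_avx512` is made up for illustration.

```shell
# Succeed if the space-separated CPU flag list in $1 contains every
# AVX-512 subset that x86-64-v4 adds (v2/v3 features are NOT checked).
has_x86_64_v4_avx512() {
    flags=" $1 "
    for f in avx512f avx512bw avx512cd avx512dq avx512vl; do
        case "$flags" in
            *" $f "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# Usage on a live system:
#   has_x86_64_v4_avx512 "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)" && echo "AVX-512 subsets for v4 present"
```

This only says the CPU exposes the extensions; it says nothing about whether a given package or kernel build actually uses them.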
Last edited by mwohah (2025-10-15 16:57:31)