EDIT2: *False alarm. The 7900GRE did not resolve this issue. This is now a completely new build experiencing the same problems as before...*
EDIT: *All of this appears to have been due to a faulty 7800xt. The freezing began before the complete shutdowns/reboots, but it slowly degraded over time. I have edited the title to correct the issue and remove the reference to the specific kernel versions I originally attributed to the problem.*
I'm not sure if this is the appropriate section or if it would more accurately fit under Kernel and Hardware, but the issue seems to only occur when playing games. Valheim/Nordic Ashes were the only two I managed to replicate it on before downgrading back to 6.9.5. After somewhere between 30 mins to an hour of gaming on the 6.9.7 kernel I've been consistently getting complete system freezes requiring a hard power cycle. General system specs are:
CPU: Intel(R) Core(TM) i9-10900X (20) @ 4.70 GHz
GPU: AMD Radeon RX 7800 XT @ 0.25 GHz [Discrete]
WM: KWin (Wayland)
DE: KDE Plasma 6.1.1
Kernel: Linux 6.9.7-arch1-1 x86_64
Downgrading to 6.9.5 seems to have resolved the issue, so I'm assuming it's GPU driver related. Similar topics such as https://bbs.archlinux.org/viewtopic.php?id=293400 seem to be specific to Nvidia, so I'm assuming this is something entirely different.
This is more a question of how I can accurately track logs to view what might have been occurring when these freezes happen. journalctl -b # ends mid stream on some random Unity garbage that has nothing to do with the rest of my system (loading/unloading assets and failed connections to cdp.cloud.unity3d.com because it's blocked by my pihole).
Short of just saying "Big number don't work, and small number do", I'm just curious how I could more accurately provide information for proper reporting on situations like this.
Last edited by Sestren (2024-08-28 17:36:48)
Offline
Check whether you can "safely" reboot by enabling sysrq and rebooting with the REISUB sequence (leave a pause of ~4 secs between key presses) and whether that leaves you with more information in the journal due to being able to flush disk contents: https://wiki.archlinux.org/title/Keyboa … el_(SysRq)
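For checking up front whether the whole REISUB sequence is actually permitted, a minimal sketch follows. It decodes the `kernel.sysrq` bitmask using the bit values from the kernel's SysRq documentation; the function name `reisub_allowed` is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Bit values from the kernel's sysrq documentation:
#   4 = keyboard control (R: unraw), 16 = sync (S), 32 = remount read-only (U),
#   64 = signal processes (E, I), 128 = reboot/poweroff (B, O)
REISUB_MASK=$((4 + 16 + 32 + 64 + 128))   # 244

# reisub_allowed VALUE -> exit status 0 if every REISUB key is permitted.
# (Hypothetical helper name, for illustration only.)
reisub_allowed() {
    value=$1
    # 1 is the special "enable everything" value
    [ "$value" -eq 1 ] && return 0
    [ $((value & REISUB_MASK)) -eq "$REISUB_MASK" ]
}

# Typical use on a live system:
#   reisub_allowed "$(cat /proc/sys/kernel/sysrq)" && echo "REISUB fully enabled"
# To enable everything until the next reboot (needs root):
#   sysctl kernel.sysrq=1
```

A value of 16 (sync only) would fail this check, so in that case only the S key would be expected to do anything.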
Offline
Well, I dropped my package lock, upgraded back to 6.9.7 (along with whatever else came in the last day or two) and enabled sysrq. I've been playing one of the affected games for the past hour and a half and nothing has happened. I'll update this thread if I can recreate it again.
This is the worst kind of resolution... It was extremely consistent in its failure 2 nights ago. Happened 4 times back to back.
Offline
Upgraded to 6.9.8 last afternoon and I'm experiencing the full-system freezing again today. REISUB and REISUO are non-responsive, and I just tested to confirm that I had the '1' in sysrq (and tested both commands to make sure they work outside of this situation). The last few entries in my journalctl are still the same Unity stuff, but I'll paste it in just for the sake of clarity and confirming the situation:
Jul 11 17:53:33 sestren-desktop steam[5354]: Unloading 0 Unused Serialized files (Serialized files now loaded: 0)
Jul 11 17:53:33 sestren-desktop steam[5354]: Unloading 9 unused Assets to reduce memory usage. Loaded Objects now: 197425.
Jul 11 17:53:33 sestren-desktop steam[5354]: Total: 96.938196 ms (FindLiveObjects: 16.962672 ms CreateObjectMapping: 10.342861 ms MarkObjects: 69.223216 ms DeleteObjects: 0.408880 ms)
Jul 11 17:54:11 sestren-desktop steam[5354]: Unloading 0 Unused Serialized files (Serialized files now loaded: 0)
Jul 11 17:54:11 sestren-desktop steam[5354]: Unloading 21 unused Assets to reduce memory usage. Loaded Objects now: 199960.
Jul 11 17:54:11 sestren-desktop steam[5354]: Total: 100.080612 ms (FindLiveObjects: 16.781535 ms CreateObjectMapping: 9.976674 ms MarkObjects: 72.821552 ms DeleteObjects: 0.500173 ms)
Jul 11 17:54:23 sestren-desktop steam[5354]: Unloading 0 Unused Serialized files (Serialized files now loaded: 0)
Jul 11 17:54:24 sestren-desktop steam[5354]: Unloading 2 unused Assets to reduce memory usage. Loaded Objects now: 200367.
Jul 11 17:54:24 sestren-desktop steam[5354]: Total: 122.993565 ms (FindLiveObjects: 16.978892 ms CreateObjectMapping: 10.281422 ms MarkObjects: 95.372348 ms DeleteObjects: 0.360397 ms)
Jul 11 17:55:21 sestren-desktop steam[5354]: Unloading 0 Unused Serialized files (Serialized files now loaded: 0)
Jul 11 17:55:21 sestren-desktop steam[5354]: Unloading 16 unused Assets to reduce memory usage. Loaded Objects now: 200575.
Jul 11 17:55:21 sestren-desktop steam[5354]: Total: 124.262664 ms (FindLiveObjects: 17.339272 ms CreateObjectMapping: 10.063017 ms MarkObjects: 96.468312 ms DeleteObjects: 0.391462 ms)
Jul 11 17:56:09 sestren-desktop steam[5354]: Unloading 0 Unused Serialized files (Serialized files now loaded: 0)
Jul 11 17:56:09 sestren-desktop steam[5354]: Unloading 37 unused Assets to reduce memory usage. Loaded Objects now: 202018.
Jul 11 17:56:09 sestren-desktop steam[5354]: Total: 103.577466 ms (FindLiveObjects: 17.035629 ms CreateObjectMapping: 9.934511 ms MarkObjects: 76.215049 ms DeleteObjects: 0.391795 ms)
Jul 11 17:57:23 sestren-desktop steam[5354]: Unloading 0 Unused Serialized files (Serialized files now loaded: 0)
Jul 11 17:57:23 sestren-desktop steam[5354]: Unloading 32 unused Assets to reduce memory usage. Loaded Objects now: 202320.
Jul 11 17:57:23 sestren-desktop steam[5354]: Total: 96.551700 ms (FindLiveObjects: 17.268726 ms CreateObjectMapping: 10.294578 ms MarkObjects: 68.573660 ms DeleteObjects: 0.414236 ms)
Jul 11 17:57:44 sestren-desktop steam[5354]: Unloading 0 Unused Serialized files (Serialized files now loaded: 0)
Jul 11 17:57:44 sestren-desktop steam[5354]: Unloading 0 unused Assets to reduce memory usage. Loaded Objects now: 202453.
Jul 11 17:57:44 sestren-desktop steam[5354]: Total: 95.280102 ms (FindLiveObjects: 16.994303 ms CreateObjectMapping: 9.532151 ms MarkObjects: 68.028660 ms DeleteObjects: 0.724434 ms)
Jul 11 17:58:01 sestren-desktop steam[5354]: Curl error 7: Failed to connect to cdp.cloud.unity3d.com port 443 after 1 ms: Error
As far as I can tell, I'm kind of out of luck in being able to troubleshoot this. If it had been occurring consistently since I bought any component in the computer I would just chalk it up to that, but it seems to be related to specific kernel upgrades (could just be coincidence, I know).
If there are any other suggestions to try to pinpoint anything I'd love to try to contribute something. I know this isn't exactly a popular GPU (or CPU for that matter...).
Offline
I am experiencing very similar symptoms. In my case it has happened while playing video games using Steam (specifically Elden Ring) and once while watching videos on YouTube. In my memory these freezing issues have only started occurring within the past month or so.
In the very same manner, at some point during the game my system will totally freeze with the image staying stuck at whatever happened to be showing at the time and any audio that happens to be playing will freeze on whatever note/tone it was at. Keyboard controls will be totally unresponsive so I am unable to do any SysRq commands or switch to another TTY. I can only power off to work around the freeze. Upon rebooting when I check journalctl, there are zero error messages in the moments leading up to the freeze. There are no coredump files generated according to coredumpctl. Logwise, I cannot find anything to point to what caused the freeze.
I find it notable that we share the same GPU. I suspect it is a GPU driver issue. Has anyone learned anything since the last post?
Specs:
CPU: Ryzen 9 7900X
GPU: XFX Speedster QICK319 Radeon RX 7800 XT
Memory: 32GB Corsair Vengeance 6000MHz DDR5 CL30
Storage: SAMSUNG 980 Pro 1TB M.2 NVMe
Motherboard: MSI X670E Gaming Plus WiFi
Power Supply: Corsair RM850x
Software:
Graphics Platform: Wayland
DE: KDE Plasma 6.1.2
Kernel: 6.9.9-arch1-1 (NOTE: Freezing issue has noticeably been occurring since 6.9.7 by my recollection)
Last edited by TheRemster (2024-07-14 17:16:30)
Offline
Experiencing the same issue for over a month with Elden Ring. I've got an i9-13900HX CPU and 4080 RTX (mobile). Gnome + Wayland + external monitor.
I've tried every combination of 535, 550 and 555 drivers (open and closed) with various kernel/modprobe settings recommended in the wiki or elsewhere. No dice yet.
Last edited by CuriousRubick (2024-07-14 19:43:24)
Offline
Had this happen again today when attempting to play New World after about 20 minutes. REISUB/O still unresponsive. Ran ~4 hours of memtest86+ with no faults (at least 1 full cycle). I also just finished about an hour of memtest_vulkan endless with zero faults.
Given the fact that I can go weeks without any issues so long as I avoid specific problem games, and that stress tests are turning up nothing, I feel that I can safely say this isn't a hardware issue. Journalctl has something new this time, but it also appears to be unrelated pipewire logging:
Jul 24 15:30:07 sestren-desktop pipewire-pulse[1310]: mod.protocol-pulse: 0x5f46e18b5000: [New World] overrun recover read:2974208 avail:17920 max:15360 skip:14080
Jul 24 15:30:07 sestren-desktop pipewire-pulse[1310]: mod.protocol-pulse: 0x5f46e18b5000: [New World] overrun recover read:2999808 avail:20992 max:15360 skip:17152
Jul 24 15:30:07 sestren-desktop pipewire-pulse[1310]: mod.protocol-pulse: 0x5f46e18b5000: [New World] overrun recover read:3132160 avail:17664 max:15360 skip:13824
Jul 24 15:30:09 sestren-desktop pipewire-pulse[1310]: mod.protocol-pulse: 0x5f46e18b5000: [New World] overrun recover read:353280 avail:17408 max:15360 skip:13568
Jul 24 15:30:09 sestren-desktop pipewire-pulse[1310]: mod.protocol-pulse: 0x5f46e18b5000: [New World] overrun recover read:447488 avail:19456 max:15360 skip:15616
Jul 24 15:30:10 sestren-desktop pipewire-pulse[1310]: mod.protocol-pulse: 0x5f46e18b5000: [New World] overrun recover read:701184 avail:21760 max:15360 skip:17920
Jul 24 15:30:10 sestren-desktop pipewire-pulse[1310]: mod.protocol-pulse: 0x5f46e18b5000: [New World] overrun recover read:788224 avail:22784 max:15360 skip:18944
Unlike TheRemster's previous response, I do not get frozen audio when this happens. Sound stops entirely and I lose all input control. Currently on 6.9.10 with the same de/wm/etc as previously. Full GPU vulkaninfo:
GPU0:
apiVersion = 1.3.278
driverVersion = 24.1.4
vendorID = 0x1002
deviceID = 0x747e
deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
deviceName = AMD Radeon RX 7800 XT (RADV NAVI32)
driverID = DRIVER_ID_MESA_RADV
driverName = radv
driverInfo = Mesa 24.1.4-arch1.2
conformanceVersion = 1.3.0.0
deviceUUID = 00000000-1900-0000-0000-000000000000
driverUUID = 414d442d-4d45-5341-2d44-525600000000
I really feel like I have to be missing something obvious here.
Edit:
As to your response CuriousRubick - you at least can potentially chalk this up to the recent Intel 13/14th gen CPU oxidation/voltage issues. I'm on a 10th gen.
Last edited by Sestren (2024-07-25 01:51:58)
Offline
@Sestren, @TheRemster, @CuriousRubick
If the SysRq keys to sync, unmount the disks and reboot the system don't work and you can't do anything else, wait more than 90 seconds (the default systemd service timeout), then press the power button once (don't hold it) and wait another 30 seconds. If the system doesn't power itself off, press and hold the power button until it turns off.
The reason for doing this is to check whether the system has hung completely. If it hasn't, it may save the full journalctl log (which you can post here, so we can see what happened) and shut down gracefully.
I have had situations like this where I thought the system had hung for good, but after following that procedure it powered itself off and saved the journalctl log.
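If the journal does get flushed, the kernel's own messages are usually the interesting part, not whatever userspace application happened to be logging last. A rough sketch of pulling those out is below. The `kernel_lines` helper name is an assumption made up for this example, and the pattern assumes journalctl's default short output format.

```shell
#!/bin/sh
# kernel_lines: read journal text in the default short format on stdin and
# keep only the lines logged by the kernel itself.
# (Hypothetical helper name, for illustration only.)
kernel_lines() {
    # Short-format lines look like "Jul 11 17:58:01 <host> <source>[pid]: msg";
    # kernel messages use the literal source "kernel" with no pid.
    grep -E '^[A-Z][a-z]{2} [ 0-9][0-9] [0-9:]{8} [^ ]+ kernel: ' || true
}

# On a live system the same result comes straight from journalctl:
#   journalctl -k -b -1 --no-pager | tail -n 50   # kernel messages, previous boot
#   journalctl --list-boots                        # confirm which boot is which
```

The helper is mostly useful on a journal that was saved to a text file; on the affected machine, `journalctl -k -b -1` is the more direct route.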
Offline
In case @xerxes_'s tip doesn't work, try setting up kdump. I was able to get a stack trace of the kernel crash by using the AUR package described in subsection 1.1. The wiki article is missing some information: after installing the AUR package you need to enable the systemd service called `kdumpst-init`, then start it and reboot your system. Then check journalctl to make sure kdumpst is running properly. See kdumpst's readme for details.
Offline
Went a few weeks without playing many games and accidentally dodged this issue for a while. Just had it creep up again today in World of Warcraft, and I attempted @xerxes_'s suggestion to no avail. No response from the power button (waited 5 minutes after the "crash" and 5 minutes again after pressing power, then another 5 minutes after pressing it again just in case I didn't hit it hard enough).
I just installed the kdumpst AUR package suggested by @Sidekick and I'm going to see if I can force this to happen again. Just out of curiosity, do you personally use pstore or kdump for log collecting with kdumpst? I'm leaving it on the defaults (pstore) for now, but I figured I'd check while I wait for this to crash again.
Offline
EDIT: ***Start of a different unrelated issue. End is marked at the bottom of the last post related to this. These posts can be ignored in relation to the original topic***
Got another crash today. I don't know if this was because of kdump or if it was something totally unrelated, but this time it actually forcefully rebooted rather than hanging. When I checked for the dump logs I realized that it apparently never loaded properly -
journalctl -b | grep kdump
Aug 14 11:15:08 sestren-desktop systemd[1]: Starting kdumpst loader boot-time service...
Aug 14 11:15:08 sestren-desktop kdumpst-load.sh[836]: Generating grub configuration file ...
Aug 14 11:15:08 sestren-desktop kdumpst-load.sh[872]: Found linux image: /boot/vmlinuz-linux
Aug 14 11:15:08 sestren-desktop kdumpst-load.sh[872]: Found initrd image: /boot/intel-ucode.img /boot/initramfs-linux.img
Aug 14 11:15:09 sestren-desktop kdumpst-load.sh[872]: Found fallback initrd image(s) in /boot: intel-ucode.img initramfs-linux-fallback.img
Aug 14 11:15:09 sestren-desktop kdumpst-load.sh[1051]: Warning: os-prober will not be executed to detect other bootable partitions.
Aug 14 11:15:09 sestren-desktop kdumpst-load.sh[1051]: Systems on them will not be added to the GRUB boot configuration.
Aug 14 11:15:09 sestren-desktop kdumpst-load.sh[1051]: Check GRUB_DISABLE_OS_PROBER documentation entry.
Aug 14 11:15:09 sestren-desktop kdumpst-load.sh[1055]: Adding boot menu entry for UEFI Firmware Settings ...
Aug 14 11:15:09 sestren-desktop kdumpst-load.sh[1065]: done
Aug 14 11:15:09 sestren-desktop root[1067]: kdumpst: kexec won't succeed, no reserved memory in this boot...
Aug 14 11:15:09 sestren-desktop root[1068]: kdumpst: but we automatically set crashkernel for next boot.
Aug 14 11:15:09 sestren-desktop systemd[1]: Finished kdumpst loader boot-time service.
I tried editing the presets for kdumpst as mentioned in the Arch docs to use kdump over pstore. Restarted kdumpst and got the same error (although the docs claim it should fall back anyway if pstore fails to reserve memory for whatever reason). Available memory is definitely not the issue -
free -m
               total        used        free      shared  buff/cache   available
Mem:           96236        5274       89138         157        2901       90961
Swap:          36863        1287       35576
I couldn't find anything else on the subject that might allude to why kexec isn't getting a reserved memory slot. So far as I can tell, kdumpst uses the GRUB_CMDLINE config setting to set the crashkernel size, and this is defaulted to the suggested maximum of 256M.
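One way to narrow this down is to check whether a `crashkernel=` argument actually reached the running kernel, regardless of what the bootloader config says. The sketch below is only an illustration: `crashkernel_arg` is a made-up helper name, and it simply scans a command line string for that argument.

```shell
#!/bin/sh
# crashkernel_arg [CMDLINE]: print the crashkernel= value from a kernel command
# line, or nothing if none was passed. Defaults to the running kernel's.
# (Hypothetical helper name, for illustration only.)
crashkernel_arg() {
    cmdline=${1-$(cat /proc/cmdline)}
    for word in $cmdline; do
        case $word in
            crashkernel=*) printf '%s\n' "${word#crashkernel=}" ;;
        esac
    done
}

# Related checks on a live system:
#   cat /sys/kernel/kexec_crash_size     # bytes actually reserved (0 = none)
#   cat /sys/kernel/kexec_crash_loaded   # 1 once a crash kernel is loaded
```

If this prints nothing on the affected boot, the argument never made it onto the command line, which would be consistent with the "no reserved memory in this boot" message.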
Also checked journalctl again just for shits, but it's just a bunch of random BS that's completely unrelated to the crash as seen below:
Aug 14 10:48:08 sestren-desktop baloo_file_extractor[3133]: Invalid encoding. Ignoring "/home/sestren/.nuget/packages/sharpgen.runtime/2.0.0-beta.13/.signature.p7s"
Aug 14 10:48:18 sestren-desktop teams-for-linux[2588]: [DEBUG] GetSystemIdleState => IdleTimeout: 300s, IdleTimeoutPollInterval: 10s, ActiveCheckPollInterval: 2s, IdleTime: 0s, IdleState: 'active'
Aug 14 10:48:28 sestren-desktop teams-for-linux[2588]: [DEBUG] GetSystemIdleState => IdleTimeout: 300s, IdleTimeoutPollInterval: 10s, ActiveCheckPollInterval: 2s, IdleTime: 0s, IdleState: 'active'
Aug 14 10:48:38 sestren-desktop teams-for-linux[2588]: [DEBUG] GetSystemIdleState => IdleTimeout: 300s, IdleTimeoutPollInterval: 10s, ActiveCheckPollInterval: 2s, IdleTime: 0s, IdleState: 'active'
Aug 14 10:48:41 sestren-desktop baloo_file_extractor[3133]: Invalid encoding. Ignoring "/home/sestren/.nuget/packages/microsoft.win32.primitives/4.3.0/.signature.p7s"
Aug 14 10:48:41 sestren-desktop baloo_file_extractor[3133]: Invalid encoding. Ignoring "/home/sestren/.nuget/packages/vortice.dxgi/2.3.0/.signature.p7s"
Aug 14 10:48:41 sestren-desktop baloo_file_extractor[3133]: Invalid encoding. Ignoring "/home/sestren/.nuget/packages/sharpgen.runtime.com/2.0.0-beta.13/.signature.p7s"
Aug 14 10:48:41 sestren-desktop baloo_file_extractor[3133]: Invalid encoding. Ignoring "/home/sestren/.nuget/packages/vortice.mathematics/1.4.25/.signature.p7s"
Aug 14 10:48:41 sestren-desktop baloo_file_extractor[3133]: Invalid encoding. Ignoring "/home/sestren/.nuget/packages/vortice.directx/2.3.0/.signature.p7s"
The only other thing of note was that dmesg reported that fsck found 4 errors on the boot following the crash, but they were fixed on the next boot and no longer appear. Probably just something that got cut off during the previous crash, but I'm mentioning it for the sake of completeness...
Edit: I should also note that the kdumpst error about failing to allocate the reserved memory for kexec persists even after rebooting and supposedly allowing grub to dedicate that 256M to the duplicate kernel used by kdump. Although the docs kind of suggest that this shouldn't be required anyway.
Last edited by Sestren (2024-08-20 12:56:16)
Offline
I made a few changes today, and I'm writing it here in case it affects any of my previous issues (I was possibly just an idiot the entire time). A while back I replaced two sticks of faulty memory, and only now realized that the timings of these didn't match up. I made the assumption that because they were from the same manufacturer and the same speed/generation (albeit 32g sticks vs 16) that they would match in timings. Boy was I wrong... Set 1 was 16-18-18-36 at 1.35V. Set 2 was 16-20-20-38 at 1.35V. The fact that the voltage matched up and that the timings weren't "too" far off could have potentially been the reason that this issue was so sporadic. I have no clue how Memtest86+ didn't manage to break though.
Set 1 has been removed, set 2 has been moved over and the timings adjusted to their proper values. I also went ahead and flashed my bios to the latest version. I had previously been on the 7B94v13 bios version for the MSI X299 Pro. When I read the release notes for 7B94v14 a while back it didn't look like any of the changes were relevant, but it appears that resizable bar wasn't even an option in the previous release, so I turned that on as well.
I'm hoping that this was just a matter of me having some really shitty memory timing bugs, but I'll update if anything else changes.
EDIT: *** End of unrelated issue. Ignore these two posts in relation to the original topic issue.***
Last edited by Sestren (2024-08-20 12:56:45)
Offline
There were two separate issues going on here. The previous few posts can be ignored as they related to a faulty PSU. It was not actually instability caused by the memory timings.
I recently upgraded my entire build aside from the RX7800XT and I am still getting the same non-logged system freezes that originated around the time of the 6.9.7 kernel update a few months ago.
Swapped mobo from MSI X299 Pro to ASUS B650E
Swapped PSU from some Corsair 750W to Corsair RM850e
Swapped i9-10900X to a 7800X3D
Swapped memory from ddr4-3200 to ddr5-6000
Literally the only components that are the same in this setup are the case/GPU/nvme drives and the screws connecting it to the case standoffs... I spent a few hours testing the video card in a Windows desktop with zero failures. I can safely say at this point that there is a Linux-related issue specific to the 7800XT that arose around the end of June.
Offline
I have been experiencing identical freezes for several months now. Since then, I have also tested or temporarily replaced all possible hardware components and recently even had my processor replaced, as the freezes started exactly after I switched to a 5800X3D and this was apparently the last remaining potential source of error.
Unfortunately, it was not responsible for the problems, as they are still present with the new one.
But I don't think this error is specific to the RX 7800XT, as my configuration is slightly different:
Ryzen 5800X3D, RX 6800, B550 motherboard and 32GB DDR4-3600
It seems to be AMD-related, though, as I cannot reproduce it on a system with an almost identical Arch setup on an i5 8300H and Nvidia 1060.
It also looks like it is a kernel bug that first appeared in the kernel versions around 6.8. I have now switched to the LTS kernel (6.6) and have had no more problems since then.
Could you perhaps test if the LTS kernel fixes the freezes for you too?
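When testing this, it is worth confirming that the LTS kernel is really the one that booted, since the bootloader may still default to the regular one. A small sketch, assuming Arch's `linux-lts` package reports a release string ending in `-lts` (the helper name `is_lts_kernel` is invented for this example):

```shell
#!/bin/sh
# is_lts_kernel [RELEASE]: succeed if the release string looks like Arch's
# linux-lts package (assumption: it ends in "-lts"). Defaults to `uname -r`.
# (Hypothetical helper name, for illustration only.)
is_lts_kernel() {
    release=${1-$(uname -r)}
    case $release in
        *-lts) return 0 ;;
        *)     return 1 ;;
    esac
}

# Installing the LTS kernel alongside the current one (needs root):
#   pacman -S linux-lts
# After rebooting into it:
#   is_lts_kernel && echo "running the LTS kernel"
```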
Offline
Further clarification on the issue. This has been isolated now down to native Linux builds of games on Steam made in Unity (at least in my particular case).
The crashes in New World and World of Warcraft presented differently. The New World crash simply freezes the game, which can then be closed without locking up the entire system. The World of Warcraft crashes were related to the previously mentioned PSU issue, which was separate from the original topic.
I tried reading through the kernel changelog from 6.9.5 up to 6.9.7 when I first encountered this issue, but the topics related to AMD GPU changes are way over my head. It's possible that this arose from a Mesa change around this time, and that my downgrade back to 6.9.5 didn't "actually" fix anything, but unfortunately, I have no idea what version of Mesa/vulkan-radeon I had when this was last functioning properly.
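The package versions from back then can usually be recovered from pacman's own log, which records every install, upgrade and downgrade. A minimal sketch, assuming the default log location `/var/log/pacman.log` and its usual `[ALPM] upgraded name (old -> new)` line format; `package_history` is a made-up helper name:

```shell
#!/bin/sh
# package_history LOGFILE PKG...: print the install/upgrade/downgrade lines
# that pacman logged for exactly the named packages.
# (Hypothetical helper name, for illustration only.)
package_history() {
    log=$1
    shift
    for pkg in "$@"; do
        # The trailing " \(" keeps "mesa" from also matching "mesa-utils".
        grep -E "\[ALPM\] (installed|upgraded|downgraded) $pkg \(" "$log" || true
    done
}

# Typical use:
#   package_history /var/log/pacman.log mesa vulkan-radeon linux
```

Lining up the timestamps of the kernel and Mesa entries against the date of the first freeze should show which of them changed in that window.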
The issue with getting kdumpst working from another previous post was because I was using the systemd bootloader instead of GRUB, and kdumpst only has configuration options for GRUB. I have set up kdump manually, and I'll try to force another crash later tonight, although my hopes aren't very high given that literally nothing gets logged when these freezes occur. I don't see how kexec is going to know when to log anything.
@nitrescov - I can try downgrading to 6.6 later tonight as well once I've attempted to get something out of kdump. I should also mention that I have now tested my current setup with a 3060ti and had zero issues. It's 100% an AMD issue, even if it isn't specific to the 7000 series (not that there is any huge difference between a 6800 and a 7800, xt or otherwise)
Offline
So somehow I have managed to recreate the full system rebooting that I previously thought was due to a faulty PSU... And with the only common powered component being the GPU. After about 2 hours of playing New World today everything froze, sound stopped, monitors went blank after about 5 seconds, and I was forced to hard shut-down (no SysRq response again).
Still nothing in journalctl logs and kdump didn't generate a crash log. I guess I'm at a point where I have to try RMAing the card to see if a new one has the same issues. I've got nothing left to try.
Offline
That sounds really annoying... but your description of the freezes matches exactly what I've been experiencing. And in my case there was also no chance of getting logs, whether via journalctl, SSH from another PC or kdump.
In the last few days I've extensively tested the games and applications that caused the freezes for me. Since switching to the LTS kernel, however, I haven't had a single problem. In my case, the RMA (of the processor) was completely unnecessary.
Have you been able to try whether the LTS kernel also helps in your case?
Offline
I tried downgrading the kernel and couldn't force the freezing "crash" in 3 hours of gaming, but I did experience the reboot issue once. I couldn't get my warranty to cover changing to a different model, and after some casual googling, it appears that there are numerous reports of similar issues with 7800xts across multiple manufacturers.
I bought a 7900gre and I'm just going to see if I can sell the new 7800xt after my rma is done. It'll probably be fine, but I can't trust it now.
Offline
Turns out that this was not the fault of the 7800XT (or at least not entirely). I used the 7900GRE for 3 days with no issues. Today I had the same full system shutdown that I experienced previously with the 7800XT. Journalctl logs are still just full of some random garbage for the few minutes prior to the crash (teams-for-linux idlestate messages in this case).
The fact that this is now occurring with quite literally an *entirely* new set of components has led me to look into other external issues, and I may have found another possible cause...
I have this setup behind a 1000VA/600W UPS. I have never seen it spike above max draw (or even come anywhere close), but if this is a transient power spike issue, it's possible that I'm pulling more than the UPS can provide for a short burst. I'm going to see if I can run out tonight and find a proper 1500VA/900W UPS to replace this with.
Offline
I've just been experiencing the exact same issue: I play for 20 minutes to 3 hours, and the freeze happens. This was while playing World of Warcraft. No logs.
I have also experienced the same issue in Warframe with quite a different outcome. I ran it in gamescope, and whenever the freeze would have happened, it instead started stuttering until I restarted the game.
The difference I see is that I run an RX6950XT.
I upgraded to a new PSU a year ago, and I suspect it might be at fault. How did you discover that it was the issue?
Offline
I don't know if you're responding to my most recent post or one of my incorrect assumptions from earlier. I'm now assuming that this is UPS related (battery backup), not the PSU. I only made the conclusion earlier that it might have been the power supply as I was replacing parts for the build piece by piece and failed to run into a crash for a short while after the power supply was replaced. I'll update this post again in a few days after I've managed some testing with the new 1500VA/900W UPS. At least unless it crashes sooner...
I'm a little upset about having to buy CyberPower instead of APC, but I needed something that would be available tonight. Hopefully I don't burn down the house.
Offline
Ah okay. No, I've just been encountering a similar issue for a while now. I was considering opening a thread myself, but saw that your issue seemed related. I was asking whether you had used any method to deduce that it was the power supply or if it was pure assumption. Thanks
Offline
Welp, it looks like the UPS power output was not the issue. With all other accessories (monitors/usb hub/desk items) on the original UPS and the computer itself as the only connection to the new 900W UPS, I just experienced the full system shutdown/freeze again.
At this point, every single component short of the wall socket itself has been replaced.
Offline
Similar issue on NVIDIA GeForce GTX 1070 with Linux 6.10.7-arch1-1. Playing Age of Empires II: DE, it will freeze at a random time. No mouse control, no input. Funny thing is, even if I cut power to the tower, the monitor still stays on showing the frozen AoE game. Only after putting power back on and starting the computer will the monitor properly unfreeze.
I don't know, maybe in my case it's some hardware issue, though it never happens unless I'm running the game.
Offline
This is probably becoming too much of a combination of separate issues. I can't even be certain that the original freezing issue was directly associated with the 7800XT compared to the 7900GRE, or if it's just a weird combination of power draw differences and other tangential issues.
The shutdown/reboot problem was "possibly" (fingers crossed) because of a shorting USB hub. I disconnected the hub 4 days ago and swapped the important components to the rear of the case. In that time I haven't experienced the shutdown issue yet. It's weird that I was not able to replicate this in Windows though. On the plus side, this means the 7800xt is probably perfectly fine and I can save the money on going through the RMA process for it.
The 600W output UPS compared to the 750W and then later 850W PSU was obviously an issue, but I don't think it ever directly caused any problems. The previous 10900X was probably perfectly fine (although I'm happy with the upgrade to the 7800X3D anyway). The mistimed memory was also an obvious problem, but I don't "think" it ever directly caused any instability. I've just got a pile of semi-decent components to put on eBay now...
Offline