My instant test to know whether I'm hitting this problem is to try running ARK: Survival Evolved (apparently it's very GPU-intensive). There might be other ways to trigger the bug, but this one tells me immediately whether the machine is stable or not, and it causes problems on both Arch and Windows.
XMP definitely plays a role, but maybe it's not the only thing.
1. With default BIOS settings ("default" as in "the way it came out of the box"), I was able to set everything up and use the system without problems, including playing ARK for a few hours.
2. Then, at some point, after reading about XMP, I decided to enable it, because it seemed a pretty safe thing to do (hahaha). Everything seemed fine at first, but I didn't play ARK right away...
3. ... And then, a few days later, I tried to run ARK, and it wouldn't work. On Linux, the video would freeze and unfreeze repeatedly. At first it looked like the machine was dead, but if I waited long enough (anywhere between a few seconds and a minute), the video would unfreeze for a brief moment, then freeze again; rinse and repeat. I could still log in over SSH. The symptoms are very similar to the ones described by @Onyros here.
4. Eventually I tried it on Windows as well, and had a similar issue. When starting the game, the music stops after about 1 second and the display freezes. However, the OS is not dead: I can start Task Manager, kill ARK, and the system recovers.
5. ... After hours of looking around, I found a few forum posts recommending disabling XMP. So I disabled XMP, but that didn't change anything; the game still crashed immediately. However, when I did "Load optimized default settings" in the BIOS, that fixed it! (So it looks like just turning XMP on and then off again leaves something in the BIOS different from the defaults.)
6. ... A couple of days later, the problem happened again. I didn't even think too hard about it: I went into the BIOS, did "Load optimized default settings", rebooted, and it worked. (???)
7. ... A couple of days later, the problem happened AGAIN, but this time "Load optimized default settings" didn't change anything. The game still crashed instantly.
8. ... After a few hours of looking around, I saw folks suggesting a BIOS upgrade. I was running version F10; I downloaded version F13G, upgraded, and... now it works.
Until next time.
This is very much in line with the explanations above, i.e. the GPU being a bit more sensitive to memory timings/voltage than the CPU. I am now wondering:
- Is there a program similar to memtest86, but one that tests memory from the GPU instead of the CPU?
- How can I determine safe/stable timings and voltages?
- Why did my system spontaneously go from "it works" to "it doesn't work"?
(For the last item, I suspect it might have been caused by the Gigabyte software that I installed on Windows to try to get some metrics, such as voltage and temperature, to figure out what was going on. Maybe that software, or maybe the Radeon software, changed the voltage and/or memory timings to "optimize performance".)
I wrote to Gigabyte about this problem, and hopefully they will do something about it, as this error is OS-independent (though Windows recovers from the GPU crash gracefully).
Meanwhile, the solution is to disable XMP, or to set your RAM timings and SoC voltage manually so that they don't fluctuate.
Apparently ASUS already fixed this, so it works for you now with the latest BIOS.
If it would benefit anyone else, I can add the full journalctl output showing the errors, but I believe the relevant lines are the ones in my first post.
All in all, I believe the old BIOS version, which dated from the beginning of the year, might not have fully supported the Ryzen 4750G yet, hence those problems.
Onyros, please post the full journal output from one of the crashes, and also the output of lspci -k.
"GRUB with early KMS, a very vanilla mkinitcpio"
Normally, early KMS is configured in mkinitcpio.conf; please post your MODULES= line.
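For reference, a minimal Python sketch of that check, assuming the common Arch convention that early KMS on AMD means listing amdgpu in the MODULES=(...) array of /etc/mkinitcpio.conf. The helper names modules_line and has_early_kms_module are hypothetical, and the parser only handles the simple single-line array form, not multi-line arrays or shell expansions.

```python
import re
import shlex
from pathlib import Path
from typing import List

# Matches a single-line MODULES=(...) array (simplified, hypothetical parser).
MODULES_RE = re.compile(r"MODULES=\(([^)]*)\)")


def modules_line(conf_text: str) -> List[str]:
    """Return the entries of the last MODULES=(...) assignment, or [] if none."""
    modules: List[str] = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # commented-out example lines don't count
        match = MODULES_RE.match(line)
        if match:
            # mkinitcpio.conf is sourced as bash, so a later assignment wins.
            modules = shlex.split(match.group(1))
    return modules


def has_early_kms_module(conf_text: str, driver: str = "amdgpu") -> bool:
    """True if the given driver is listed in the MODULES array."""
    return driver in modules_line(conf_text)


if __name__ == "__main__":
    conf = Path("/etc/mkinitcpio.conf")
    if conf.exists():
        text = conf.read_text()
        print("MODULES =", modules_line(text))
        print("amdgpu listed for early KMS:", has_early_kms_module(text))
```

If the list comes back empty, the initramfs is not loading amdgpu early, and KMS happens only once the root filesystem's modules are available.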
Some things to check:
- Is microcode updating configured?
- Do you have the latest EFI firmware for your system installed?
- All the programs you mention (Slack, Vivaldi, Chromium) appear to be using Chromium.
Have you tried other browsers that are not Chromium-based, like Firefox (Gecko), Midori (WebKit) or Falkon (qt5-webengine), to see if those also crash?
Rookie mistake: the fix was a BIOS update. I hadn't thought of that, somehow, and didn't really think it could make such a dramatic difference in this case. I did notice that after the BIOS update the GPU seems to perform a wee bit slower: it stutters a bit in 4K video, with dropped frames and all, which was surprising, whereas before it dropped few frames, if any.
The motherboard is an Asus TUF GAMING A520M-PLUS.
I haven't had a freeze since, so this really had nothing to do with the APU itself. I'm going to mark this one as solved. PEBKAC, to a point.
As for the 4750G, I bought it as part of a bundle, since it wasn't available otherwise (it's only sold in bulk or as OEM), and assembled the system myself. I intend to explore moving it from the current setup to an ASRock DeskMini X300, as it's in a mid-tower right now and I'd like a cleaner desk.
From the spectre-meltdown-checker script (https://github.com/speed47/spectre-meltdown-checker):
CPU microcode is the latest known available version: NO (latest version is 0x8600106 dated 2020/06/19 according to builtin firmwares DB v165.20201021+i20200616)
The version reported in the journal is 0x8600104, which is indeed older than the one quoted in the checker output, but I don't know where to get the most recent version.
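To compare the two revisions, the hex values can be treated as plain integers. This is a minimal sketch, assuming the running revision is read from the microcode field of /proc/cpuinfo and that both values belong to the same CPU family (comparing revisions across families means nothing); cpuinfo_microcode, is_outdated and LATEST_KNOWN are hypothetical names, with LATEST_KNOWN copied from the checker output above.

```python
import re
from pathlib import Path

# Latest revision reported by spectre-meltdown-checker in the output quoted above.
LATEST_KNOWN = 0x8600106

MICROCODE_RE = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)", re.MULTILINE)


def cpuinfo_microcode(cpuinfo_text: str) -> int:
    """Return the first microcode revision found in /proc/cpuinfo-style text."""
    match = MICROCODE_RE.search(cpuinfo_text)
    if match is None:
        raise ValueError("no microcode field found")
    return int(match.group(1), 16)  # int() accepts the 0x prefix with base 16


def is_outdated(installed: int, latest: int = LATEST_KNOWN) -> bool:
    # Only meaningful when both revisions are for the same CPU family.
    return installed < latest


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        try:
            rev = cpuinfo_microcode(cpuinfo.read_text())
        except ValueError:
            print("no microcode field in /proc/cpuinfo")
        else:
            print(f"running {rev:#x}, older than {LATEST_KNOWN:#x}: {is_outdated(rev)}")
```

With the journal value quoted above, is_outdated(0x8600104) returns True.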
$ pacman -Q amd-ucode
amd-ucode 20201023.dae4b4c-1
Is the microcode on your 4750G machine up to date according to the spectre checker? I also don't know where more recent microcode for AMD CPUs is available: I had thought the amd-ucode package would contain the latest available version, which could then be loaded during boot by adding the microcode image file to the boot loader configuration. That has always worked for me with Intel CPUs. Is this something worth pursuing?
I'm not running a DE; I'm running plain DWM as the window manager, with no display manager. I startx into my X session as needed (with a customized xinitrc), so it's Xorg.
I'm using GRUB with early KMS and a very vanilla mkinitcpio (the same as on my laptop), with no extra modules or binaries specified and all the standard hooks. I had the same setup on the Ryzen 2500U, and it's the same on the 4500U; the only one giving me trouble is this 4750G, really.
Sometimes it's a rotten combination of Slack, Chromium (with Meet) or Zoom, all of which I need for work, and Vivaldi. I've tried many things: I installed mesa-git, downgraded the kernel, and tried several combinations of (to me) obscure kernel parameters that I found people with similar problems, but older setups, using.
I saw some info about C-states freezing Ryzen systems in the past, and tried that too, just because I've tried so many other things (I tried preventing them from 1, 5, and 6 with kernel params), but the problem doesn't seem to be related to that: the system keeps running, and it's just the graphics that freeze.
I'm at my wits' end, as I'd expected it to be as smooth as on my Lenovo laptop with the 4500U or my older Ryzen mini PC.
I'm getting intermittent freezes on a new system I've built, running a Ryzen 4750G APU on the latest 5.9.8 stable kernel. I'm running DWM 6.2 and the AMDGPU driver (plus vulkan-radeon). I have the latest stable linux-firmware and amd-ucode, too.
The issue usually arises when I'm on conference calls with Zoom or Google Meet, but it's also happened while just watching YouTube videos, so it's been quite unstable. I just moved from another, slightly older Ryzen system with a 2500U APU, and none of these issues happened there.
Usually the graphics freeze while the system keeps running in the background, and it's always unrecoverable. Sometimes it throws me back to the command line and kills X, but when I restart X, performance is very shaky and it eventually freezes again; other times it hangs X completely, to the point where I have to hard reset. (I still hear sound in the background, and people have said they can still see me moving on my webcam.)
I'm getting instances of these in dmesg:
[ 85.250136] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 90.380019] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=15114, emitted seq=15116
and these errors in journalctl:
Nov 12 15:26:12 voidskin kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Nov 12 15:26:12 voidskin kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Nov 12 15:26:12 voidskin kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=890533, emitted seq=890535
Nov 12 15:26:12 voidskin kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process chromium pid 1907 thread chromium:cs0 pid 1933
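When sifting through a longer journal, a small parser can pull out each ring timeout together with the process that owned the hung job. This is a rough sketch, assuming the message format shown in the lines above (it ignores the "Waiting for fences timed out!" lines); GpuHang and parse_hangs are hypothetical names. The pending value is emitted minus signaled, which is assumed to roughly count the jobs submitted to that ring that never completed.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Patterns assume the amdgpu message format quoted in the journal excerpt.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROCESS_INFO = re.compile(r"Process information: process (?P<proc>\S+) pid (?P<pid>\d+)")


@dataclass
class GpuHang:
    ring: str
    signaled: int
    emitted: int
    process: Optional[str] = None
    pid: Optional[int] = None

    @property
    def pending(self) -> int:
        # Last fence emitted on the ring minus the last one the GPU signaled.
        return self.emitted - self.signaled


def parse_hangs(log_text: str) -> List[GpuHang]:
    """Collect ring timeouts, attaching the process line that follows each one."""
    hangs: List[GpuHang] = []
    for line in log_text.splitlines():
        ring = RING_TIMEOUT.search(line)
        if ring:
            hangs.append(GpuHang(ring["ring"], int(ring["signaled"]), int(ring["emitted"])))
            continue
        proc = PROCESS_INFO.search(line)
        # Only attach to the most recent timeout that has no process yet.
        if proc and hangs and hangs[-1].process is None:
            hangs[-1].process = proc["proc"]
            hangs[-1].pid = int(proc["pid"])
    return hangs


if __name__ == "__main__":
    sample = (
        "kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
        "signaled seq=890533, emitted seq=890535\n"
        "kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: "
        "process chromium pid 1907 thread chromium:cs0 pid 1933\n"
    )
    for hang in parse_hangs(sample):
        print(hang.ring, hang.pending, hang.process, hang.pid)  # gfx 2 chromium 1907
```

After a hard reset, it could be fed the saved output of journalctl -k -b -1 (the kernel log from the previous boot) to see which process was on the GPU each time.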
Does anyone have ideas on how to troubleshoot this further, or anything else I could try? I've tried a few kernel parameters, to no avail; probably not the right combination. Has anyone had trouble with the Renoir APUs and managed to get around it?