hey all,
I have been having an issue with my Arch Linux installation where it randomly reboots. I can usually reproduce it easily by playing any movie in VLC: after about 15 minutes, the system instantly reboots.
This also occurs randomly when doing coding sessions but usually takes a couple of hours before it happens.
System Specs:
CPU: Ryzen 9 7950x
GPU: RTX 3080 Ti
Motherboard: Gigabyte Aero B650 G
Motherboard Firmware version: F31 (F33b is out but don't really want to upgrade yet)
RAM: 32GB
WM: sway
Kernel: x86_64 Linux 6.13.8-arch1-1
Nvidia Driver: 570.133.07
Here's the latest journalctl -b -1: https://0x0.st/8eki.txt
And sudo journalctl -b -1: https://0x0.st/8eko.txt
Also worth mentioning that I dual boot windows and arch. They are both on their own separate disks. I do a lot of gaming on my windows and have never experienced any sudden shutdowns on it.
Last thing I tried was disabling DDR5 Autobooster(Auto by default) but no luck. The rest of the mobo settings are still default.
Any help is appreciated. Thanks!
edit: Fixed mobo's firmware version
Last edited by abe22.9 (Yesterday 23:10:43)
Offline
Your journals seem to be cut off mid-stream without any errors or shutdown messages. It's possible that the restart is hardware triggered and that there are messages that didn't get flushed to disk.
Unless someone can spot a known driver or hardware issue, troubleshooting these issues is very time consuming. First thing to check is CPU/GPU temperature/heat: are your fans running? Monitor your CPU/GPU temperatures and see if there's an upward trend.
In the off-chance you might "catch it in the act", you could try this: on one of your monitors, have a couple of terminals running top/htop (at a very fast refresh), the CPU/GPU temps and two running `journalctl --follow` and `sudo journalctl --follow`, making sure you can see the full messages come by, i.e. not truncated. Then fix a phone/camera on either a tripod (if you have one), or fix it between some books or something. And then on a 2nd monitor (or just running in a tiny window) run a movie with VLC and record until you get a crash. Don't try to hand-hold the phone/camera, because it will become blurry. Also, if the movie is showing, make sure it's not offensive content, in case you want to share it via a link on the forums.
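If filming the screen is awkward, a small logger can get at the same information. This is only a minimal sketch, assuming the sensors are exposed under /sys/class/hwmon (the kernel's generic hwmon layout, which lm-sensors reads as well); the function names and the log file name are made up for illustration. It forces every sample to disk with fsync, so the last reading before a hard reset has a better chance of surviving than it does in a buffered journal.

```python
import glob
import os
import time


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"chip/sensor": degrees_celsius} for every temp*_input under root."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)
        sensor = os.path.basename(path)[: -len("_input")]
        try:
            with open(path) as f:
                # hwmon reports temperatures in millidegrees Celsius
                temps[f"{chip}/{sensor}"] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # some sensors fail when read; skip them
    return temps


def format_line(timestamp, temps):
    """One log line: unix timestamp followed by sorted sensor=value pairs."""
    fields = " ".join(f"{k}={v:.1f}" for k, v in sorted(temps.items()))
    return f"{timestamp:.0f} {fields}\n"


def log_temps(logfile, samples, interval=1.0, root="/sys/class/hwmon"):
    """Append `samples` readings to logfile, syncing each one to disk."""
    with open(logfile, "a") as out:
        for _ in range(samples):
            out.write(format_line(time.time(), read_hwmon_temps(root)))
            out.flush()
            os.fsync(out.fileno())  # push the line to disk before sleeping
            time.sleep(interval)
```

Something like `log_temps("temps.log", samples=3600)` would cover an hour at one sample per second; after a reboot, the tail of the file shows the last values that made it to disk.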
Offline
Yes, the fans are running fine and also I have been checking the GPU's temperature and it looks good, but now that you mention it, I haven't been able to configure lm-sensors to get my CPU's temperatures. I will look into that and set up a camera like you suggested.
Thanks for your suggestions.
Offline
Any CPU undervolting?
Offline
There'd better be… https://wiki.archlinux.org/title/Ryzen#Random_reboots
However:
I do a lot of gaming on my windows and have
…hopefully read the 3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot Windows and Linux twice for voodoo reasons.
Otherwise windows might simply be rebooting the system to run "important" updates.
Edit: sarcasm doesn't work on the internet, so adding quotes.
Last edited by seth (2025-03-30 06:18:08)
Offline
Any CPU undervolting?
Not that I know of. I checked my BIOS config and they all appear to be on default settings. The majority of them are on "auto".
Took a few screenshots of the BIOS config:
- Tweaker Settings
- Tweaker Settings > Advanced CPU settings
- Settings > Manual CPU Overclocking
- Settings > Memory Subtimings Settings
Thanks, I will give this a try after a few more tests.
Yesterday I got lm-sensors to work by following these instructions and I have been monitoring temps, load and fan speeds with CoolerControl. So far, I was able to have a coding session without issues while running a movie in the background (this would usually cause a reboot after ~15 minutes). I really don't think this was the issue but it's still odd. I haven't logged into Windows while doing all this, so I might boot Windows > shut down > boot Arch and start playing movies and doing some more tests.
Thanks all for your replies.
Offline
I have been able to crash it 3 times in a row now. I managed to set up a video and monitor journalctl and temperatures but nothing comes up.
This is a screenshot of the video right before the crash. On the left you can see temps, fan speeds, load, etc. On the top right I have sudo journalctl --follow and on the bottom right journalctl --follow.
I could upload the 20s video but it's about 200 MB.
Worth mentioning that I had not booted into windows.
edit: Things I had running:
1. 2 npm run start servers.
2. About 3 zed editor instances
3. Ran a few cargo test commands (This seemed to trigger reboots easier)
4. VLC in the background.
Last edited by abe22.9 (2025-03-30 18:30:20)
Offline
"crash" means cold reboot?
Have you disabled Windows fast-start (and rebooted both OSes twice afterwards)? The voodoo thing is a joke, but the necessity has been reported several times.
Have you increased the voltage to the cores and/or disabled PBO?
Edit: english.
Last edited by seth (2025-03-30 20:17:57)
Offline
"crash" means cold reboot?
Yes, by crash I meant cold reboot, my bad.
- Fast start has been disabled for months now.
- I have not increased the voltage yet
- Precision Boost Overdrive (Enhancement) was disabled and just noticed a different Precision Boost Overdrive setting inside of Tweaker Settings > Advanced CPU Settings. Disabled both now.
- I just rebooted both OSes twice now.
I will try disabling C states and adding idle=nomwait next.
What I just noticed is that when rebooting from Windows, the restart takes a while to complete and the monitors just turn black while the RGBs are still on. The DRAM LED is lit at that moment and after about 2 minutes it shows the BIOS splash screen as usual. After this happens, it recognizes only 1 of my 2 RAM sticks.
I had this happen twice before but I thought I was not placing the ram sticks correctly. This sounds like a similar issue to mine.
edit:
The DRAM led issue occurred after rebooting Windows for the second time and I just tried this:
- After rebooting Windows the 2nd time, DRAM led was on and took ~2mins to get to the bios splash.
- Booted into arch and noticed 16GB out of 32GB ram.
- Rebooted arch
- Wrote this post.
- Performed another reboot, went into BIOS and saw only 16GB being recognized.
- Performed a full shutdown from arch, waited a few seconds and turned it back on.
- Went into BIOS and now the 32GB were recognized. (I did not physically move or touch the RAM sticks)
When this occurred in the past, I tried swapping the RAM sticks as well but the problem still occurred.
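To catch the missing stick without going into the BIOS on every boot, the kernel's view of memory can be checked from /proc/meminfo. This is a rough sketch: the 32 GiB figure comes from the specs above, while the function names and the 0.85 tolerance are arbitrary assumptions (MemTotal is always somewhat below the installed amount because firmware and the kernel reserve memory).

```python
def mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo contents and convert kB (KiB) to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


def looks_short(meminfo_text, installed_gib=32, tolerance=0.85):
    """True if the kernel sees clearly less RAM than is installed."""
    return mem_total_gib(meminfo_text) < installed_gib * tolerance


def check_system(installed_gib=32):
    """Read the live /proc/meminfo and report whether RAM appears to be missing."""
    with open("/proc/meminfo") as f:
        text = f.read()
    return mem_total_gib(text), looks_short(text, installed_gib)
```

Calling `check_system()` from a login script would flag a boot where only one DIMM was detected.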
edit: typos
Last edited by abe22.9 (Yesterday 19:56:44)
Offline
What is the azure/blue line at the top of the graph? I mean the one that's clipped at "100" two times and looks like it was on its way up there again before the restart? I hope that's not a temperature, assuming your temps are in Celsius? If it's system load at 100%, then that's very high for modern CPUs if you're just running a movie and some text editors.
Half your RAM not being recognized in the BIOS is a serious problem. It'll be hard to argue for anything but a hardware problem at that point. Have you done an exhaustive RAM check on those sticks?
Offline
- Blue lines are fan related
  - Top is the fan speed in %
  - There are some in the middle and at the bottom which are also fan speed percentages.
- Red lines are CPU related
  - Middle ones are CPU temps
  - The dotted red lines at the bottom are CPU load %.
- Yellow lines are GPU related
  - Middle one is temp
  - Dotted one is load %
- Orange one is labeled as GPU Temp Edge
I have been checking temps and load and they are pretty normal when the cold reboots occur: nothing above 90C or 100% CPU/GPU load.
About the RAM issues, I just ran Windows Memory Diagnostic with both sticks recognized and no errors were reported. Not sure if you guys could recommend a testing tool for it? Also, I have swapped the memory sticks between slots and the issue still occurs, so it might be something with the motherboard's firmware or the motherboard itself.
I should mention that I recently bought the motherboard, CPU, RAM and PSU. For the PSU, I bought the Corsair RM1000e.
Offline
About the RAM issues […] you guys could recommend a testing tool for it?
memtest86+, you want to run that for hours (or rather days…) to be able to say that your RAM is ok.
If the board (and ultimately the RAM) doesn't get enough voltage/current because of sideload (CPU/GPU) you'll not get any RAM errors there.
Everything's got a dedicated power supply (notably the GPU) and the GPU is slotted in the PEG?
Have you tried removing one of the DIMMs to see whether that stabilizes the system?
Or whether you can crash it by stressing your CPU xor GPU?
Offline
memtest86+, you want to run that for hours (or rather days…) to be able to say that your RAM is ok.
Ok thanks, I'm thinking I could try this if I start experiencing issues on Windows.
Yes, everything has dedicated power supply. I assume it is slotted in the PEG.
I could try removing one of the memory sticks to see what happens and could also try stressing the CPU/GPU.
I just got a few cold reboots after I made these changes:
- Disabled PBO
- Added idle=nowait <-- noticed that it should be "nomwait"
- Added processor.max_cstate=5
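A typo like idle=nowait is easy to miss, since the kernel does not complain loudly about a value it does not recognize. Here is a minimal sketch of a check against /proc/cmdline, assuming the two parameters from the list above are the ones wanted; the helper names are made up, and the parser is simplified (it splits on whitespace, so quoted values containing spaces are not handled).

```python
# Parameters this thread is trying to apply (taken from the post above).
EXPECTED = {"idle": "nomwait", "processor.max_cstate": "5"}


def parse_cmdline(text):
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def check_params(text, expected=EXPECTED):
    """Return a list of human-readable mismatches (empty list means all good)."""
    params = parse_cmdline(text)
    problems = []
    for key, wanted in expected.items():
        found = params.get(key, "<missing>")
        if found != wanted:
            problems.append(f"{key}: expected {wanted!r}, found {found!r}")
    return problems


def check_running_kernel(expected=EXPECTED):
    """Check the command line of the currently booted kernel."""
    with open("/proc/cmdline") as f:
        return check_params(f.read(), expected)
```

Running `check_running_kernel()` after a reboot confirms that the bootloader entry was regenerated and that the parameters reached the kernel as intended.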
I'm wondering if I should try a different distro? Maybe after running some other tests on arch. Thinking about this since I have no issues on Windows.
Offline
I control my CPU's power from PBO with slightly lower than default values for power and current.
If you want to disable C6 from BIOS try "Global C-state Control" and maybe "Power Supply Idle Control" if you have them, under AMD CBS. Try disabling Spread Spectrum.
Check RAM voltage, if on XMP it should have 1.35V. Monitor various voltages, SoC etc with some tool, have no clue what is available for Zen 4. Sometimes SoC + small offset helps depending on your RAM.
Offline
I recently added idle=nomwait and still got the cold reboot. As you recommended, I disabled the Global C-state from BIOS now after the previous crash so I am testing it now. As you can see on the screenshot, I do not get a "disable" option for Power Supply Idle Control.
I have also been running some CPU+memory intensive jobs such as cargo build --release on some medium sized projects along with a movie in the background, Firefox, OBS, and other programs and everything was working fine. Ran the cargo build a couple of times and nothing happened. I left the movie for a while without doing much else and then I got the cold reboot. This is a screenshot that shows CPU load and memory when I was running the load test (not when it crashed).
I also ran strain for a couple of minutes and nothing happened.
edit: Being more specific about screenshot.
edit2: I could also try and upgrade the motherboard's firmware from F31 -> F32?
Last edited by abe22.9 (2025-04-01 03:49:17)
Offline
As you can see on the screenshot, I do not get a "disable" option for Power Supply Idle Control.
Choose Typical Current Idle, that disabled C6 on my system.
You can upgrade your BIOS if the latest one is not a beta version; betas are not good for stability and are mostly meant for important security issues.
Crashes don't happen under load (in my case), they usually happen under light load. Hence disabling C6 might help. Something about cores not getting high enough voltage. If you still have the issue after disabling C6, try going to curve optimizer and adding a positive offset of +4 to all cores. That should up the voltage on all of them, though not by much. This +4 boost might help the weakest ones not crash under low voltage conditions.
Last edited by qu@rk (2025-04-01 05:44:18)
Offline
But what I do not understand is when OP says that under Windows, playing games no less, there is no problem with these restarts. Doesn't that point to a driver issue? How would the hardware load be different if not for the software instructing it?
Edit: is running Windows 32 bits still a thing? That could be a major difference, of course.
Last edited by twelveeighty (2025-04-01 14:15:45)
Offline
The huge red flag here is the disappearing RAM, regardless of OS.
The Ryzen-specific issue doesn't happen under maximum load but during power state changes (corecycler under Windows can trigger it), and the solution is to increase the voltage to the CPU (which will implicitly heat the cores up, restricting the overdrive capacity of single cores): the system will be slower and consume more energy, but be stable.
And that's likely the condition you get under Windows as well, since it's basically a HW issue that gets triggered by overaggressive optimization.
Offline
Quick update: I made the BIOS changes qu@rk suggested. I disabled Global C-state Control and set Power Supply Idle Control to Typical Current Idle.
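For anyone who wants to confirm from Linux that a BIOS change like this took effect, the kernel exposes its idle states under the cpuidle sysfs directory. A minimal sketch follows, with made-up function names. Note the assumption that the names shown there are whatever the cpuidle driver reports (often ACPI-style names such as C1/C2/C3), which do not necessarily map one-to-one to the hardware's C6.

```python
import glob
import os


def _read(path, default=""):
    """Read a small sysfs file, returning `default` if it is missing."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return default


def list_idle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return the idle states the kernel knows for one CPU, in index order."""
    dirs = glob.glob(os.path.join(cpuidle_dir, "state[0-9]*"))
    dirs.sort(key=lambda d: int(os.path.basename(d)[len("state"):]))
    states = []
    for d in dirs:
        states.append({
            "name": _read(os.path.join(d, "name")),
            # "usage" counts how many times this state has been entered
            "usage": int(_read(os.path.join(d, "usage"), "0") or 0),
            "disabled": _read(os.path.join(d, "disable"), "0") != "0",
        })
    return states


def print_idle_states():
    """Print a short table of idle states for cpu0."""
    for s in list_idle_states():
        flag = "disabled" if s["disabled"] else "enabled"
        print(f'{s["name"]:<8} {flag:<9} entered {s["usage"]} times')
```

If the deeper states are gone from the list, or their usage counters stop increasing between two runs, the setting is most likely active.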
I have had arch running the entire day without a single reboot. I have done occasional cargo build --release just to spike CPU usage temporarily while having movies playing all day and everything seems fine.
I haven't used arch today like I usually do for coding so I will keep posting updates during the week.
Thank you all again for the support.
Offline
Please mark your post SOLVED by editing your first post and pre-pending SOLVED to the title. This issue may come up for others and then they know this post may have a solution for them as well.
Offline
Yes, of course. I would like to test the changes for at least 2 more days just to make sure if that's ok with you guys.
Offline
Another update: Had PC running all day again and no reboots occurred. Marking thread as solved. Thanks all for your help!
Offline