
#26 2023-12-15 09:23:31

micke1m
Member
Registered: 2017-08-27
Posts: 13

Re: Random reboots with green screen and mce events : bea0000000000108

Got a reboot now with powersave governor, stock voltage. It feels like it takes longer for reboots to happen with higher voltage or powersave, but it still happens.

Offline

#27 2023-12-15 21:39:31

seth
Member
Registered: 2012-09-03
Posts: 51,462

Re: Random reboots with green screen and mce events : bea0000000000108

And if you increase the voltage along with the powersave governor?
(Though it already sounds as if the problem is the/some cores being undervolted)

Online

#28 2023-12-16 00:34:31

micke1m
Member
Registered: 2017-08-27
Posts: 13

Re: Random reboots with green screen and mce events : bea0000000000108

seth wrote:

And if you increase the voltage along with the powersave governor?
(Though it already sounds as if the problem is the/some cores being undervolted)

I will try during next week. Weird thing is it has worked flawlessly since 2019 until now. I should try using Windows more even though I really don't like it, to see if it happens there.

Offline

#29 2023-12-16 06:53:39

micke1m
Member
Registered: 2017-08-27
Posts: 13

Re: Random reboots with green screen and mce events : bea0000000000108

Actually it rebooted within minutes at 1.262v and powersave governor, so limited to 2.2GHz. Trying 1.3v now but I'm not going higher because I don't want to kill my CPU / shorten lifespan.

Edit: and it rebooted within minutes again at 1.3v + powersave, time to test using Windows.

Last edited by micke1m (2023-12-16 07:02:38)

Offline

#30 2023-12-21 00:41:44

micke1m
Member
Registered: 2017-08-27
Posts: 13

Re: Random reboots with green screen and mce events : bea0000000000108

A progress report from me.
Turns out Windows was stable, stock voltage and all; this is a software issue.
I've been rolling back Arch a bunch to test, and the results are:
linux 6.5.9.arch2-1 (arch linux 2023-11-08) = stable as far as I can tell
linux 6.6.1.arch1-1 (arch linux 2023-11-09) = crash
linux-mainline 6.7rc6-1 = crash

Offline

#31 2023-12-21 14:54:40

seth
Member
Registered: 2012-09-03
Posts: 51,462

Re: Random reboots with green screen and mce events : bea0000000000108

https://wiki.archlinux.org/title/Ryzen#Random_reboots

The wiki wrote:

It seems that out of the box, Windows seems to run the CPUs at higher voltage and lower peak frequencies, compared to the stock linux kernel, which depending on your draw from the silicon lottery could cause a host of random application crashes or hardware errors that lead to reboots … To solve this problem you need to supply higher voltage to your CPU so that it is stable when running at peak frequencies. The easiest way to achieve this is to use the AMD curve optimiser which is accessible via your motherboard's bios.

Since the OP apparently did address this, though, this might rather be about the GPUs (w/ the reboots being a follow-up issue)? See e.g. https://bbs.archlinux.org/viewtopic.php?id=290159 and https://bbs.archlinux.org/viewtopic.php?id=286060

Online

#32 2023-12-21 21:16:05

micke1m
Member
Registered: 2017-08-27
Posts: 13

Re: Random reboots with green screen and mce events : bea0000000000108

In my case this is a software issue. It has nothing at all to do with voltage. My computer has worked flawlessly for almost a year since the last hardware change, and linux 6.5.9 still works with no issues; as soon as I run a kernel newer than 6.5.9 I get reboots.
I will simply run 6.5.9 from now on until it gets fixed, I don't have the knowledge to make it happen.

Offline

#33 2023-12-21 21:34:01

seth
Member
Registered: 2012-09-03
Posts: 51,462

Re: Random reboots with green screen and mce events : bea0000000000108

You might be running into the situation because of the new GPU, or because the newer kernel does other things to the GPU or CPU, but spontaneous reboots come down to
- undervolting
- overheating
- cosmic rays™

A "software issue" does not spontaneously reboot the system, let alone w/ MCE errors.
Also

micke1m wrote:

it takes longer for reboots to happen with higher voltage or powersave

Is the LTS kernel affected?

Online

#34 2023-12-21 21:40:09

micke1m
Member
Registered: 2017-08-27
Posts: 13

Re: Random reboots with green screen and mce events : bea0000000000108

seth wrote:

Is the LTS kernel affected?

I guess LTS is older than 6.5.9 so it probably works.

As I've said, I went all the way up to 1.3 volts and still got reboots. 1.3 volts is very high when not overclocking. Windows doesn't even go over 1.16v no matter the load.

I also have performance boost / precision boost disabled.

Last edited by micke1m (2023-12-21 21:41:32)

Offline

#35 2023-12-21 21:43:09

seth
Member
Registered: 2012-09-03
Posts: 51,462

Re: Random reboots with green screen and mce events : bea0000000000108

Did you change the curve optimizer or just manipulate the core voltage?
I'd rather not ass·u·me the kernel behaviors, but my guess is that the problem is actually the GPU because there're A LOT of recent issues w/ amdgpu and the 7800/7900 cards specifically.
So if you can, I'd try the newer kernel w/ a different GPU.

But, as mentioned before, a GPU issue typically does not reboot the system (and that's also not the pattern we see w/ all the amdgpu related errors)

Online

#36 2023-12-21 22:01:57

micke1m
Member
Registered: 2017-08-27
Posts: 13

Re: Random reboots with green screen and mce events : bea0000000000108

seth wrote:

Did you change the curve optimizer or just manipulate the core voltage?
I'd rather not ass·u·me the kernel behaviors, but my guess is that the problem is actually the GPU because there're A LOT of recent issues w/ amdgpu and the 7800/7900 cards specifically.
So if you can, I'd try the newer kernel w/ a different GPU.

But, as mentioned before, a GPU issue typically does not reboot the system (and that's also not the pattern we see w/ all the amdgpu related errors)

I just changed the core voltage.

My GPU has worked without issue since I got it in January, and my computer still works without issue with 6.5.9, anything newer gives me reboots.

How can it not have anything to do with the kernel when changing it makes all the difference? I will try the 5700 XT when I have the time.

Offline

#37 2023-12-21 22:08:33

seth
Member
Registered: 2012-09-03
Posts: 51,462

Re: Random reboots with green screen and mce events : bea0000000000108

Again: I'm not saying that the kernel making use of hw features or maybe a bug in the amdgpu module doesn't trigger this (like, some GPU access draws too much power there and the CPU starves), but unless it's overheating or the gods really. REALLY. hate you, the system most likely reboots because the GPU is underpowered.
Did you see the ryzen article and did you try to adjust the curve optimizer?

Online

#38 2023-12-22 00:22:29

micke1m
Member
Registered: 2017-08-27
Posts: 13

Re: Random reboots with green screen and mce events : bea0000000000108

seth wrote:

Again: I'm not saying that the kernel making use of hw features or maybe a bug in the amdgpu module doesn't trigger this (like, some GPU access draws too much power there and the CPU starves), but unless it's overheating or the gods really. REALLY. hate you, the system most likely reboots because the GPU is underpowered.
Did you see the ryzen article and did you try to adjust the curve optimizer?

There doesn't seem to be anything called "curve optimizer" in my bios but I tried setting vcore loadline calibration to high and still got a reboot.
I've tested a bit with linux-mainline and 5700 XT and so far it does not seem to reboot at least.
But I will have to go back to 6.5.9 to use my newer GPU, but I will test a bit more with the old one just to be sure it's stable.

Offline

#39 2023-12-30 00:32:19

micke1m
Member
Registered: 2017-08-27
Posts: 13

Re: Random reboots with green screen and mce events : bea0000000000108

Now I got a reboot with the same type of mce error on a completely different computer with 6.6.3-arch1-1.

Ryzen 1800x
Radeon R9 Fury

Offline

#40 2024-01-15 20:34:42

topasiss
Member
Registered: 2022-08-08
Posts: 50

Re: Random reboots with green screen and mce events : bea0000000000108

My machine also had about 5 or more hard reboots with the mce mentioned in the topic title on the following boot.
I don't know if it happened after my recent upgrade to the current system, or a while after, since I upgraded from Intel to AMD in September or October last year and the issue started at the end of the year.

journalctl doesn't go back that far.

The current hardware is

- AMD Ryzen 5 5600X 6-Core Processor
- AMD Navi 23 [Radeon RX 6600/6600 XT/6600M]
- ASRock B450 Fatal1ty Gaming-ITX/AC AMD B450 So.AM4 Dual Channel DDR4
- 32GB Corsair Vengeance LPX DDR4-3200
- PSU 450 Watt Corsair SF Series SF450

of which the CPU is most likely the most important part here.

I'll try to increase the voltage per core (curve optimizer) in BIOS.

micke1m wrote:

My GPU has worked without issue since I got it in January, and my computer still works without issue with 6.5.9, anything newer gives me reboots.

If that fails, I'll test whether this older kernel avoids the issue.

Offline

#41 2024-01-15 21:00:09

topasiss
Member
Registered: 2022-08-08
Posts: 50

Re: Random reboots with green screen and mce events : bea0000000000108

agapito wrote:
Nathan67 wrote:

I typically get this error:

nov. 14 23:56:05 pc kernel: mce: [Hardware Error]: CPU 7: Machine Check: 0 Bank 5: bea0000000000108
nov. 14 23:56:05 pc kernel: mce: [Hardware Error]: TSC 0 ADDR 35913f0e6 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
nov. 14 23:56:05 pc kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1700002559 SOCKET 0 APIC 3 microcode a20120e

That message means your Core 3 needs more voltage.

I need some help with identifying the core that needs to be pushed in voltage.
Why is it core 3 that's affected when the mce states CPU 7?

According to my 6 core CPU

cat /proc/cpuinfo
[...]
processor	: 7
[...]
core id		: 1
[...]
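
A minimal Python sketch of how the processor-to-core mapping could be read out of /proc/cpuinfo instead of counting by hand. It only pairs each "processor" entry with the "core id" that follows it; the function name and the sample text are made up for illustration, and whether the BIOS Curve Optimizer numbers cores the same way as the kernel's "core id" is not established in this thread.

```python
from pathlib import Path


def parse_cpuinfo(text: str) -> dict[int, int]:
    """Map each logical 'processor' number to the 'core id' listed under it."""
    mapping: dict[int, int] = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "core id" and current is not None:
            mapping[current] = int(value)
    return mapping


# Hypothetical excerpt shaped like the output quoted above.
SAMPLE = "processor\t: 7\nmodel name\t: example\ncore id\t\t: 1\n"

if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else SAMPLE
    for cpu, core in sorted(parse_cpuinfo(text).items()):
        print(f"processor {cpu} -> core id {core}")
```

Run on the affected machine, this prints one line per logical CPU, so the "CPU N" from an mce line can be looked up directly.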

Offline

#42 2024-01-26 18:56:06

topasiss
Member
Registered: 2022-08-08
Posts: 50

Re: Random reboots with green screen and mce events : bea0000000000108

I did use the curve optimizer.
I added 10 positive increments for every failing core initially, and every time it failed again I added another 5 increments.

I counted from failing mce CPU number to core number in the curve optimizer using the logical cpu core index. See "PU L#" in the LSTOPO screenshot.
https://i.postimg.cc/pL6rHBvg/Screensho … LSTOPO.png

I ended up with very high values. Core 0 was already at 30 positive increments and was then failing again. There I stopped.

Last edited by topasiss (2024-01-26 18:56:52)

Offline

#43 2024-01-26 19:00:10

topasiss
Member
Registered: 2022-08-08
Posts: 50

Re: Random reboots with green screen and mce events : bea0000000000108

How do you know that the mce uses the logical index for counting CPUs instead of the physical one (PU P#)?
"/proc/cpuinfo" also seems to use the physical index.

Offline

#44 2024-01-27 21:04:18

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 662

Re: Random reboots with green screen and mce events : bea0000000000108

topasiss wrote:

Why is it core 3 that's affected when the mce states CPU 7?

          CPU 0
Core 0   
          CPU 1


          CPU 2
Core 1 
          CPU 3


          CPU 4
Core 2 
          CPU 5


          CPU 6
Core 3   
          CPU 7
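
The layout in the diagram above amounts to integer division by the number of threads per core. A small Python sketch, assuming exactly that adjacent-pair numbering (the function name is illustrative). Note that this encodes only the diagram: the /proc/cpuinfo output quoted earlier in the thread reports processor 7 with core id 1, so the numbering on a given system may differ and is better checked in /sys/devices/system/cpu/cpu7/topology/core_id.

```python
def core_for_cpu(cpu: int, threads_per_core: int = 2) -> int:
    """Core number under the assumption that SMT siblings are numbered adjacently."""
    return cpu // threads_per_core


if __name__ == "__main__":
    # Reproduces the diagram: CPU 0/1 -> Core 0, ..., CPU 6/7 -> Core 3
    for cpu in range(8):
        print(f"CPU {cpu} -> Core {core_for_cpu(cpu)}")
```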

Last edited by agapito (2024-01-29 21:10:13)


Excuse my poor English.

Offline

#45 2024-01-27 21:08:24

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 662

Re: Random reboots with green screen and mce events : bea0000000000108

As long as we are not talking about defective hardware, I can guarantee that if each of the cores of your CPU can get through 10 hours in CoreCycler without errors, you will never again suffer a spontaneous reboot.

Try these settings in CoreCycler using Prime95 mode:

runtimePerCore = 600m
FFTSize = 720-720
suspendPeriodically = 1
mode = SSE

Start testing your best/fav core while you are sleeping.

I had random reboots in all the Zen 3 processors I have used on my motherboard, using the stock-auto settings. After many hours of testing and calibration using CoreCycler and Curve Optimizer I have not had any in almost 2 years.


Excuse my poor English.

Offline

#46 2024-02-17 07:06:52

topasiss
Member
Registered: 2022-08-08
Posts: 50

Re: Random reboots with green screen and mce events : bea0000000000108

I tested a suggestion from the linked kernel discussion: setting a specific GPU feature mask.

With amdgpu.ppfeaturemask=0xffffbffb set, the reboots have been gone for two weeks of testing, which is why it took me so long to reply. I am now trying to find out exactly which bit did it. Two bits were changed.
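
Which bits differ between two masks can be worked out mechanically. A hedged Python sketch: the baseline of 0xffffffff (everything enabled) is only an assumption for illustration, since the thread does not state the previous value; on a running system the active value can be read from /sys/module/amdgpu/parameters/ppfeaturemask and used as the baseline instead.

```python
def diff_mask(baseline: int, mask: int) -> list[tuple[int, str]]:
    """List every bit position (0-31) that differs, and whether it was set or cleared."""
    changes = []
    for bit in range(32):
        before = (baseline >> bit) & 1
        after = (mask >> bit) & 1
        if before != after:
            changes.append((bit, "set" if after else "cleared"))
    return changes


if __name__ == "__main__":
    ASSUMED_BASELINE = 0xFFFFFFFF  # hypothetical; replace with the value from sysfs
    for bit, action in diff_mask(ASSUMED_BASELINE, 0xFFFFBFFB):
        print(f"bit {bit} (0x{1 << bit:x}) {action}")
```

Against that assumed baseline, the mask clears bits 2 and 14. With a different baseline the list changes, so each candidate bit can then be toggled on its own to see which one matters.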

I don't want to say that it cannot still be a CPU undervoltage issue that is simply no longer triggered once the GPU features are disabled. I'll worry about the CPU issue later. Currently I am happy to work around these reboots.

Just to let you know. I am interested in your thoughts about disabling GPU features and the CPU undervolted issue.

Offline

#47 2024-02-17 10:19:53

seth
Member
Registered: 2012-09-03
Posts: 51,462

Re: Random reboots with green screen and mce events : bea0000000000108

Online

#48 2024-02-19 00:31:35

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 662

Re: Random reboots with green screen and mce events : bea0000000000108

topasiss wrote:

Just to let you know. I am interested in your thoughts about disabling GPU features and the CPU undervolted issue.

I think there are at least two causes with the same consequences (random reboot) and that's why nobody finds a definitive solution. I personally have suffered from both.

The first one is the most widespread (Zen 3/4 lack of voltage on some cores) causing a reboot in idle conditions. When this error happens, it usually leaves those typical "kernel: mce: [Hardware Error]: CPU X: Machine Check: 0 Bank" or "mce [Hardware Error]: System Fatal error" and the infamous WHEA-18 error on Windows OS.
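
For reference, a minimal Python sketch that pulls the CPU number, bank and status word out of such a log line and decodes the generic status flags. The flag positions (bits 63 to 57) follow the x86 machine-check status layout as named in the Linux kernel's MCI_STATUS_* definitions; vendor-specific fields are deliberately not decoded, and the function names are illustrative. It does not tell you which core needs more voltage, only what the status word says.

```python
import re

# Generic x86 MCA status flags, highest bits of the 64-bit status word.
STATUS_FLAGS = {
    63: "VAL",    # register contents valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MISC register valid
    58: "ADDRV",  # ADDR register valid
    57: "PCC",    # processor context corrupt
}

MCE_LINE = re.compile(
    r"CPU (\d+): Machine Check: (\d+) Bank (\d+): ([0-9a-fA-F]{16})"
)


def parse_mce_line(line: str) -> dict:
    """Extract cpu, bank and status from a kernel 'mce: [Hardware Error]' line."""
    match = MCE_LINE.search(line)
    if match is None:
        raise ValueError("not an mce status line")
    cpu, _mcg, bank, status = match.groups()
    return {"cpu": int(cpu), "bank": int(bank), "status": int(status, 16)}


def decode_status(status: int) -> dict:
    """Return the set flags and the low 16-bit MCA error code."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {"flags": flags, "error_code": status & 0xFFFF}


if __name__ == "__main__":
    line = ("kernel: mce: [Hardware Error]: CPU 7: Machine Check: 0 "
            "Bank 5: bea0000000000108")
    info = parse_mce_line(line)
    print(info["cpu"], info["bank"], decode_status(info["status"]))
```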

The only way to solve this problem is to calibrate each core individually with the help of the CoreCycler program and Curve Optimizer. If you don't want to spend too much time testing, just add +10 to all the cores in the curve, but I don't recommend that if you want to get the maximum performance out of the CPU.

Some people claim that deactivating the C-States has solved the problem, but this is not entirely true. What happens is that when a core does not shut down completely, it does not need as much voltage when it wakes up. For example, a core that is 100% stable at -20 on the curve with the C-States deactivated may need -12 once you activate the C-States. Even a core that has passed many hours of CoreCycler testing could still fail, so I recommend adding 3 or 4 more points for safety and to anticipate future degradation of the CPU. Taking the above example, I would set that core to -17 or -9, depending on whether or not the C-States are activated.
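
The margin arithmetic in that example, written out as a short Python sketch. The function name and the default margin of 3 points are illustrative choices taken from the numbers above, not anything CoreCycler or the BIOS provides.

```python
def offset_with_margin(stable_offset: int, margin: int = 3) -> int:
    """Back off from the most negative stable curve offset by a safety margin."""
    return stable_offset + margin


if __name__ == "__main__":
    print(offset_with_margin(-20))  # C-States deactivated: -17
    print(offset_with_margin(-12))  # C-States activated: -9
```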

The second one is GPU related: https://bugzilla.kernel.org/show_bug.cgi?id=206903#c262 and not OS related: https://bugzilla.kernel.org/show_bug.cgi?id=206903#c279  With my old AMD Polaris card it was common to find some instability after, or right at the moment of, modifying the frequencies or voltage. My current RDNA2 card does not suffer from these problems.

Memory instability can also cause spontaneous reboots, especially under high temperature conditions. Memory should always be tested with the GPU at 100% utilization.

In short, most of these errors are due to incorrect voltage/frequency settings and not software bugs. I can't even remember the last time I experienced a spontaneous reboot on my AMD CPU and GPU system, and that's because I tested and stressed all my PC components for hours/days in all kinds of situations.

Last edited by agapito (2024-03-18 06:00:14)


Excuse my poor English.

Offline

#49 2024-03-03 22:28:18

Nathan67
Member
Registered: 2023-11-17
Posts: 14

Re: Random reboots with green screen and mce events : bea0000000000108

With the following settings, I haven't had a crash since.
I'm using kernel 6.7.8-arch1-1 and the "corectrl" package:
screenshot of settings in corectrl

Offline
