
#1 2024-04-23 03:11:10

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Random reboots with green screen and MCE events

I have been using Arch Linux on my machine for 3-4 months straight without any problems. But 2 days ago, all of a sudden and for no apparent reason, I got a green screen. I immediately force-restarted my PC, and when it booted it printed some hardware errors on the boot screen, something I hadn't seen before. So I ran "sudo dmesg | grep -i hardware" and this was the result.

$ sudo dmesg | grep -i hardware
[    2.636488] mce: [Hardware Error]: Machine check events logged
[    2.636489] [Hardware Error]: System Fatal error.
[    2.636494] [Hardware Error]: CPU:4 (19:21:2) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000001000108
[    2.636503] [Hardware Error]: Error Addr: 0x00ffffffc0612040
[    2.636506] [Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
[    2.636511] [Hardware Error]: Execution Unit Ext. Error Code: 0
[    2.636512] [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN
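
For reference, the flag list in the MC5_STATUS line can be recomputed from the raw value. This is only a rough sketch: the function names are made up for illustration, and the bit positions (62 Over, 61 UC, 59 MiscV, 58 AddrV, 57 PCC, 55 TCC, 53 SyndV) are the x86/AMD machine-check status bits as I understand them. It reproduces only the first seven fields that the kernel prints.

```shell
# Print NAME if bit N (32..63) of the status word is set, "-" otherwise.
# $1 = upper 32 bits of the status as a number, $2 = bit number, $3 = name
mce_flag() {
    if [ $(( ($1 >> ($2 - 32)) & 1 )) -eq 1 ]; then
        printf '%s' "$3"
    else
        printf '%s' '-'
    fi
}

# Decode the first seven flag fields of an MCx_STATUS value.
# Simplification: the kernel prints "CE" for a corrected error where this
# sketch prints "-" when the UC bit is clear.
decode_mce_status() {
    # Drop an optional 0x prefix and left-pad to 16 hex digits.
    hex=${1#0x}
    hex=$(printf '%16s' "$hex" | tr ' ' '0')
    # All flags of interest sit in the upper 32 bits, so only that half is
    # converted; this avoids overflow in shells with signed 64-bit arithmetic.
    hi_hex=$(printf '%s' "$hex" | cut -c1-8)
    hi=$((0x$hi_hex))
    printf '%s|%s|%s|%s|%s|%s|%s\n' \
        "$(mce_flag "$hi" 62 Over)" \
        "$(mce_flag "$hi" 61 UE)" \
        "$(mce_flag "$hi" 59 MiscV)" \
        "$(mce_flag "$hi" 58 AddrV)" \
        "$(mce_flag "$hi" 57 PCC)" \
        "$(mce_flag "$hi" 55 TCC)" \
        "$(mce_flag "$hi" 53 SyndV)"
}
```

Called as `decode_mce_status 0xbea0000001000108`, this prints `-|UE|MiscV|AddrV|PCC|TCC|SyndV`, matching the first seven fields in the log above. UE stands for an uncorrected error and PCC for "processor context corrupt".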

The first thing I did was reinstall my whole operating system. After that I ran into the same situation, green screen and MCE events, 2 more times.
I did some digging on the internet and found that AMD processors are affected by this kind of bug. The recommended solution according to the Gentoo wiki was to disable C-states, so I disabled Global C-state Control in my BIOS. Before that, the issue used to occur anywhere from 30-40 minutes to 2-3 hours into using the computer. After disabling C-states I got 11 hours 9 minutes of uptime, and then the green screen appeared again, with the MCE hardware error after reboot. If I do not force a reboot when the green screen appears, the screen stays like that for 5-6 seconds and then the machine reboots by itself. The CPU number shown in the log is not the same every time.

[    2.640925] mce: [Hardware Error]: Machine check events logged
[    2.640927] [Hardware Error]: System Fatal error.
[    2.640933] [Hardware Error]: CPU:3 (19:21:2) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000001000108
[    2.640944] [Hardware Error]: Error Addr: 0x00ffffffc0714040
[    2.640947] [Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
[    2.640953] [Hardware Error]: Execution Unit Ext. Error Code: 0
[    2.640954] [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN

I have never faced an issue like this in my life, where the screen randomly goes green and then an MCE error appears. I have not yet tried these solutions that were recommended in other forums:

  • Change "Power Supply Idle Control" to "Typical current idle"

  • Change CPU voltage

  • Change DRAM voltage

I have been using Arch Linux on this machine for a long time and never had an issue. I had not updated my system for 3-4 weeks before this hardware error first appeared, so I was not doing anything that could produce an error like this.

Specifications:
CPU: AMD Ryzen 5 5600
GPU: RX 7700 XT
Motherboard: Gigabyte B550M K (rev 1.1)
RAM: Corsair Vengeance LPX DDR4 2x16 GB
BIOS version: F5 (latest at the time of writing)

Last edited by noisypiano (2024-05-06 00:24:07)

Offline

#2 2024-04-23 08:28:10

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

Just got another green screen, an hour after I changed the CPU voltage offset to +0.042V. At this point, C-state control in my BIOS is turned off and Power Supply Idle Control is set to Auto. I changed the CPU voltage to see whether it helps or not.

[    2.756727] mce: [Hardware Error]: Machine check events logged
[    2.756728] [Hardware Error]: System Fatal error.
[    2.756735] [Hardware Error]: CPU:3 (19:21:2) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000001000108
[    2.756747] [Hardware Error]: Error Addr: 0x00ffffffc063433a
[    2.756750] [Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
[    2.756756] [Hardware Error]: Execution Unit Ext. Error Code: 0
[    2.756757] [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN

EDIT: I increased the wrong CPU voltage in the BIOS (see posts #27 and #28), which means this result is completely misleading.

Last edited by noisypiano (2024-04-25 19:23:51)

Offline

#3 2024-04-23 18:08:44

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

After the last green screen, I increased the CPU voltage offset to +0.054V. After that I ran memtest for 4 hours, and tests #1 and #2 passed without any errors. Then I benchmarked my system with Unigine Superposition. I had used this tool before, and this time I got results similar to what I used to get. The GPU temperature spiked and everything worked fine, so this issue doesn't seem to happen when the machine is under stress.
The only time I get this type of error is when I am doing very basic things: browsing, reading a doc, running vscode alongside firefox, running only vscode, or having only a terminal window open next to firefox so that I can read the docs and modify configurations. I have found no way to reproduce the error. It happens randomly, with no pattern I can connect: sometimes after 30 minutes, sometimes after an hour or two, sometimes after a long time.
I did all these tests with:

  • Global C-state Control > Auto

  • Power supply idle control > Auto

  • CPU Voltage offset > +0.054V

I have already tried disabling Global C-state Control in the BIOS, but it didn't have any effect and my PC ran into the same crash after 11 hours. The only combination I have not tried is C-state control disabled together with Power Supply Idle Control set to Typical Current Idle (which means disabled, according to my BIOS manual). I'll first test with this voltage offset before doing anything else.
I also forgot to mention which kernel I was using. When this issue first happened, I was using the default 6.8.1-arch1-1 kernel. Right now I am using 6.8.7-arch1-1.
One thing that bothers me is that in all the posts I have seen on the internet about the same type of MCE error, the people had either updated their kernel, changed the CPU, or changed the GPU. I did nothing. I also hadn't updated my system for a long time when these crashes started to happen. It was working perfectly until it wasn't.

EDIT: Nothing in this post is misleading except "I have increased the CPU voltage offset to +0.054V". See post #27, #28.

Last edited by noisypiano (2024-04-25 19:17:06)

Offline

#4 2024-04-23 19:27:34

OpusOne
Member
Registered: 2023-05-31
Posts: 86

Re: Random reboots with green screen and MCE events

Note that the "green screen" symptom is due to the GPU, not the CPU. It's a "well known" symptom with AMD-based GPUs, although there can be myriads of causes for that. I ran into it (I have a RX6650XT) a few months ago, occasionally, when the computer was resuming from standby. It got fixed (for me) after some update of the Linux kernel (which included a fix to the amdgpu module), I don't remember which version it was.

It's hard to tell whether it's a hardware or software issue in your case. Is your power supply sufficient for your machine?

You don't tell us how you manage your updates either. Archlinux being a rolling release, the kernel gets updated very, very frequently. So, 'it was working perfectly until it wasn't' is not very helpful. Unless you never update your system, your system must have changed quite a bit between the time when it "worked" and the time when you started having this issue. If you're not able to track down your updates, that'll be a tough one, unless, as I said above, it's a hardware problem.
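
One way to track down those updates is the pacman log, which records every package transaction. A minimal sketch, assuming the default log location /var/log/pacman.log; the function name and the package list (kernel, firmware, microcode, mesa) are only my guess at what is relevant here:

```shell
# List install/upgrade/downgrade entries for packages that could plausibly
# affect this kind of crash. The package list is an assumption; adjust it.
# Usage: recent_core_updates [LOGFILE]
recent_core_updates() {
    log=${1:-/var/log/pacman.log}
    grep -E '\[ALPM\] (installed|upgraded|downgraded) (linux|linux-lts|linux-firmware|amd-ucode|mesa) \(' "$log"
}
```

Running `recent_core_updates | tail -n 20` shows the most recent matching transactions with their timestamps, which can then be compared against the date of the first crash.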

Offline

#5 2024-04-23 21:08:44

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

Hey OpusOne, thanks for your reply.
My PSU is an MSI MPG A650GF, which is 650W, and I think that is more than enough for the GPU I am using.
I usually update my system at long intervals (usually once a month). If I remember correctly, when I first faced this issue my system had not been updated for about 3-4 weeks or a bit less. I update from the terminal using the sudo pacman -Syu command. So that's how I manage my system.

And yeah, it might be a GPU-related issue as you say, but I don't have any information to be sure about it. As I said, I benchmarked with Unigine Superposition, which stresses the GPU a lot. And I forgot to say that I ran it 3 times back to back without any break. Nothing unexpected happened.
Another thing to add is that there is an MCE log, which is well known and mentioned in both the Gentoo and Arch wikis. The MCE events show errors that are specifically related to the CPU, not the GPU. Did you run into the same MCE issue as me, or was it just a green screen? If there were no MCE events, then I don't think your issue is the one that I am having.

Offline

#6 2024-04-23 23:01:04

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

Alright I got the green screen again with this error

[    2.738463] mce: [Hardware Error]: Machine check events logged
[    2.738465] [Hardware Error]: System Fatal error.
[    2.738471] [Hardware Error]: CPU:7 (19:21:2) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000000000108
[    2.738483] [Hardware Error]: Error Addr: 0x00007fd4c743622a
[    2.738487] [Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
[    2.738493] [Hardware Error]: Execution Unit Ext. Error Code: 0
[    2.738494] [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN

I am going to reset the voltage and try the combination of C-state control disabled plus Power Supply Idle Control set to Typical Current Idle, to see if there's any hope.

Offline

#7 2024-04-23 23:04:09

seth
Member
Registered: 2012-09-03
Posts: 51,826

Re: Random reboots with green screen and MCE events

https://wiki.archlinux.org/title/Ryzen#Random_reboots
Have you encountered any more crashes/MCEs since you increased the CPU voltage?
F5…

Last edited by seth (2024-04-23 23:04:40)

Offline

#8 2024-04-23 23:08:51

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

Yeah, the error I just posted happened with the CPU voltage offset set to +0.054V, which is what it is set to right now. I also got the same error with MCE events after I had increased the CPU voltage offset to +0.042V, as described in post #2.

EDIT: I increased the wrong CPU voltage in the BIOS (see posts #27 and #28), which means this result is completely misleading.

Last edited by noisypiano (2024-04-25 19:24:08)

Offline

#9 2024-04-24 05:10:07

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

After trying the last combination, which is disabling C-state control and setting Power Supply Idle Control to Typical Current Idle, I got a green screen again with the following error:

[    2.754742] mce: [Hardware Error]: Machine check events logged
[    2.754743] [Hardware Error]: System Fatal error.
[    2.754749] [Hardware Error]: CPU:0 (19:21:2) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000001000108
[    2.754761] [Hardware Error]: Error Addr: 0x00ffffffc066a33a
[    2.754765] [Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
[    2.754771] [Hardware Error]: Execution Unit Ext. Error Code: 0
[    2.754772] [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN

I am completely out of ideas. I will be trying out a different operating system to see if this persists. The last thing for me to do would be to install Windows and use it for a week to see if anything weird happens. After that, I will just have to take it in for repair or claim the warranty.

Last edited by noisypiano (2024-04-24 05:13:25)

Offline

#10 2024-04-24 07:03:37

seth
Member
Registered: 2012-09-03
Posts: 51,826

Re: Random reboots with green screen and MCE events

Do you have a spare GPU?
(Ideally one w/ less potential power demands)

Offline

#11 2024-04-24 14:02:07

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 664

Re: Random reboots with green screen and MCE events


Excuse my poor English.

Offline

#12 2024-04-24 16:19:42

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

seth wrote:

Do you have a spare GPU?
(Ideally one w/ less potential power demands)

Yeah I do have a spare GPU. I think you haven't read post #1 carefully. I listed my hardware parts clearly in that post, and I also mentioned the PSU I use in post #5.

Offline

#13 2024-04-24 16:41:38

seth
Member
Registered: 2012-09-03
Posts: 51,826

Re: Random reboots with green screen and MCE events

And where does the OP or anything else point out that
1. you have a spare (second, alternate, other) GPU
2. You've tried to replace the GPU?

If it ends up drawing toomuch™ power over the PEG, you can put whatever PSU you want into the system, that's not gonna help you.
The symptoms are all those of the known Ryzen issues (also see agapito's link) except for the "green screen" thing, which raises the GPU as a potential origin of the undervoltage.

Offline

#14 2024-04-24 16:41:57

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

I have already read your post. I don't really think it's a GPU problem because, as I said in the first post, I have been running Arch Linux for 3-4 months straight without any problem. I have also used other OSes/distros without a problem. I haven't really changed any CPU/GPU voltage. The only settings I have ever changed in my BIOS are XMP Profile, Resizable BAR support and disabling CSM, nothing else. I only started playing around with the CPU voltage when I got this problem, which was 3 days ago. Also, the GPU-related bug reports seem to have TSC written in the MCE log, and I don't have TSC in mine. I have also run the Unigine benchmark without any problem, as I said before. And I have already stated that it happens very randomly and only when I am doing very basic stuff, although very basic stuff is all I do, since I haven't been playing games for some time now. With that said, I don't really have a clue whether it's the GPU or the CPU, because the error says CPU and I don't know how this can be related to the GPU. My build is six and a half months old. I don't think there is a hardware problem because after building it I played games all day long for 1-2 months straight without an issue. The only possible reason I can think of is the 6.8.1 kernel, because I did not have this problem before when I was using 6.7 series of kernel. But I am not at all sure about this and I am very skeptical.

Last edited by noisypiano (2024-04-25 19:19:48)

Offline

#15 2024-04-24 16:45:37

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

seth wrote:

And where does the OP or anything else point out that
1. you have a spare (second, alternate, other) GPU
2. You've tried to replace the GPU?

If it ends up drawing toomuch™ power over the PEG, you can put whatever PSU you want into the system, that's not gonna help you.
The symptoms are all those of the known Ryzen issues (also see agapito's link) except for the "green screen" thing, which raises the GPU as a potential origin of the undervoltage.

Oh, I am really sorry. You meant a spare GPU that I can swap in for the current one to test. No, I really don't have spare parts for anything. This is my first build, so I don't have anything else.

Offline

#16 2024-04-24 17:01:38

seth
Member
Registered: 2012-09-03
Posts: 51,826

Re: Random reboots with green screen and MCE events

Yeah, "spare" like Prince Harry :P

only when I am doing very basic stuff … I did not have this problem before when I was using 6.7 series of kernel

In that case you could
a) test the behavior w/ the LTS kernel or downgrade the kernel to 6.7.x (this is ok, but don't forget the headers and/or OOT modules like nvidia or virtualbox)
b) re-enable c-state control but limit it in the OS, "processor.max_cstate=1"

https://wiki.archlinux.org/title/Ryzen# … nd_suspend suggests "Power idle control". Change its value to "Typical current idle"
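
To confirm after a reboot that the limit from b) actually reached the kernel, the running command line can be checked. A small sketch; the helper name is made up, and it only reads /proc/cmdline, it does not change any boot loader configuration:

```shell
# Print the value of a kernel command-line parameter, or nothing if absent.
# Usage: kernel_param NAME [CMDLINE_FILE]
kernel_param() {
    name=$1
    file=${2:-/proc/cmdline}
    # Split on whitespace and compare the key before the first "=" exactly.
    awk -v name="$name" '{
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")
            if (eq > 0 && substr($i, 1, eq - 1) == name) {
                print substr($i, eq + 1)
                exit
            }
        }
    }' "$file"
}
```

If `kernel_param processor.max_cstate` prints 1, the parameter was passed to the running kernel. An empty result means the boot loader entry was not updated or a different entry was booted.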

Offline

#17 2024-04-24 17:11:25

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

seth wrote:

Yeah, "spare" like Prince Harry :P

only when I am doing very basic stuff … I did not have this problem before when I was using 6.7 series of kernel

In that case you could
a) test the behavior w/ the LTS kernel or downgrade the kernel to 6.7.x (this is ok, but don't forget the headers and/or OOT modules like nvidia or virtualbox)
b) re-enable c-state control but limit it in the OS, "processor.max_cstate=1"

https://wiki.archlinux.org/title/Ryzen# … nd_suspend suggests "Power idle control". Change its value to "Typical current idle"

Yes, I have already installed the linux-lts (6.6.28-1-lts) kernel and booted into it. I will try the suggestions in the ArchWiki that you mentioned, but first I will run without any tweaks to see if it happens again.
Currently my BIOS has:

  • XMP Profile > Profile 1

  • Above 4G Decoding > Enabled

  • Re-size BAR Support > Auto

  • CSM Support > Disabled

Everything else is at its default value. Fast boot is disabled too.

Last edited by noisypiano (2024-04-24 17:21:41)

Offline

#18 2024-04-24 17:55:09

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 664

Re: Random reboots with green screen and MCE events

noisypiano wrote:

I have already read your post. I don't really think it's a GPU problem because, as I said in the first post, I have been running Arch Linux for 3-4 months straight without any problem. I have also used other OSes/distros without a problem. I haven't really changed any CPU/GPU voltage. The only settings I have ever changed in my BIOS are XMP Profile, Resizable BAR support and disabling CSM, nothing else. I only started playing around with the CPU voltage when I got this problem, which was 3 days ago. Also, the GPU-related bug reports seem to have TSC written in the MCE log, and I don't have TSC in mine. I have also run the Unigine benchmark without any problem, as I said before. And I have already stated that it happens very randomly and only when I am doing very basic stuff, although very basic stuff is all I do, since I haven't been playing games for some time now. With that said, I don't really have a clue whether it's the GPU or the CPU, because the error says CPU and I don't know how this can be related to the GPU. My build is six and a half months old. I don't think there is a hardware problem because after building it I played games all day long for 1-2 months straight without an issue. The only possible reason I can think of is the 6.8.1 kernel, because I did not have this problem before when I was using 6.7 series of kernel. But I am not at all sure about this and I am very skeptical.

I didn't say it was a GPU problem and I don't think it is either, I just linked that message so you could read the first cause of the random reboots which is the one that affects you and the possible solution.

So, my advice is to enable Curve Optimizer on bios and try to calibrate the CPU manually using CoreCycler on Windows 10/11 Safe Mode, as I mentioned in this message: https://bbs.archlinux.org/viewtopic.php … 8#p2146868

If you haven't flashed a new bios lately, it is possible that your CPU has degraded over time and now needs a little more voltage than before at a certain frequency.


Excuse my poor English.

Offline

#19 2024-04-24 18:41:07

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

agapito wrote:
noisypiano wrote:

I have already read your post. I don't really think it's a GPU problem because, as I said in the first post, I have been running Arch Linux for 3-4 months straight without any problem. I have also used other OSes/distros without a problem. I haven't really changed any CPU/GPU voltage. The only settings I have ever changed in my BIOS are XMP Profile, Resizable BAR support and disabling CSM, nothing else. I only started playing around with the CPU voltage when I got this problem, which was 3 days ago. Also, the GPU-related bug reports seem to have TSC written in the MCE log, and I don't have TSC in mine. I have also run the Unigine benchmark without any problem, as I said before. And I have already stated that it happens very randomly and only when I am doing very basic stuff, although very basic stuff is all I do, since I haven't been playing games for some time now. With that said, I don't really have a clue whether it's the GPU or the CPU, because the error says CPU and I don't know how this can be related to the GPU. My build is six and a half months old. I don't think there is a hardware problem because after building it I played games all day long for 1-2 months straight without an issue. The only possible reason I can think of is the 6.8.1 kernel, because I did not have this problem before when I was using 6.7 series of kernel. But I am not at all sure about this and I am very skeptical.

I didn't say it was a GPU problem and I don't think it is either, I just linked that message so you could read the first cause of the random reboots which is the one that affects you and the possible solution.

So, my advice is to enable Curve Optimizer on bios and try to calibrate the CPU manually using CoreCycler on Windows 10/11 Safe Mode, as I mentioned in this message: https://bbs.archlinux.org/viewtopic.php … 8#p2146868

If you haven't flashed a new bios lately, it is possible that your CPU has degraded over time and now needs a little more voltage than before at a certain frequency.

I have already tried increasing the CPU voltage from my BIOS. I haven't tried calibrating my CPU manually, and I don't want to install Windows right now. I have installed an LTS kernel, and I will install Windows if this also fails.
I am using the latest BIOS.

EDIT: "I have already tried increasing the CPU voltage from my BIOS", I increased the wrong CPU voltage in the BIOS and only this line in this post is misleading. See post #27, #28.

Last edited by noisypiano (2024-04-25 19:24:46)

Offline

#20 2024-04-24 20:49:02

seth
Member
Registered: 2012-09-03
Posts: 51,826

Re: Random reboots with green screen and MCE events

Can you completely disable XMP?

Offline

#21 2024-04-25 05:00:35

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

Okay, I will test that as well, but first let me see what happens with the settings I had when there were no problems.

Last edited by noisypiano (2024-04-25 05:01:04)

Offline

#22 2024-04-25 05:26:00

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

With the BIOS setup from post #17 and the LTS (6.6.28-1-lts) kernel, I am seeing intermittent lags: lag (not a freeze) when scrolling in the web browser, and mouse lag when moving the mouse. I have been using Linux on this machine since the regular 6.6 kernel and did not have any such issue. This might be an LTS kernel issue, but I'm not sure about it. I think I have to do another test later with the 6.7.x series of kernels.

EDIT: One thing I have not mentioned is that the lag is not severe. It happens intermittently and not often, not every 3-4 minutes on average or at an interval that would make the machine very annoying to use. It happens quite rarely.
EDIT: This mouse and random lag issue is related to AMD fTPM. Before booting the LTS kernel I had reset the BIOS because I thought I might have messed something up. What I didn't realize is that resetting the BIOS automatically enables AMD fTPM. Later, when the test with the LTS kernel also failed, I switched back to the recent kernel, but the lag was now present there too, where it hadn't been before. Then I happened to see a YouTube video about the fTPM stuttering issue and realized my problem was very similar. After disabling fTPM, the issue was gone.

Last edited by noisypiano (2024-04-28 16:05:04)

Offline

#23 2024-04-25 07:24:44

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 664

Re: Random reboots with green screen and MCE events

If you really think this is happening to you because of a kernel/software bug, you are completely wrong. You are facing a hardware problem, your log says it clearly.

In fact I can reproduce that MC5_STATUS error whenever I want in my machine. To do so, I only have to assign a light task to a specific core, while another one of them has 2 points less on the curve. So the conclusion is clear: at least one of your processor's cores is not receiving the proper voltage.


Excuse my poor English.

Offline

#24 2024-04-25 09:03:40

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

I don't understand how you assign a light task to a specific core and set another core 2 points lower on the curve to make a crash identical to mine happen. All I know is that my setup wasn't laggy when I was using the regular 6.6 kernel, and it was not laggy when I was using the 6.8.7 kernel either.

Last edited by noisypiano (2024-04-25 09:05:22)

Offline

#25 2024-04-25 09:53:09

seth
Member
Registered: 2012-09-03
Posts: 51,826

Re: Random reboots with green screen and MCE events

https://wiki.archlinux.org/title/Ryzen#Random_reboots

use the AMD curve optimiser which is accessible via your motherboard's bios. Access it and put a positive offset of 4 points, which will increase the voltage your CPU is getting at higher loads

https://stackoverflow.com/questions/339 … cess-linux
https://man.archlinux.org/man/core/util … skset.1.en

But I cannot quantify "light", maybe just "top"?
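
As a rough idea of what "light" could look like with taskset, here is a sketch that keeps one logical CPU waking up briefly and going back to idle. The function name and the 0.1 s wake-up interval are arbitrary choices of mine, not a known reproducer, and it assumes taskset from util-linux plus a sleep that accepts fractional seconds:

```shell
# Run a light, mostly idle loop pinned to one logical CPU.
# Usage: light_load_on_cpu CPU [SECONDS]
light_load_on_cpu() {
    cpu=$1
    secs=${2:-10}
    # Accept only a plain non-negative integer as the CPU number.
    case $cpu in
        ''|*[!0-9]*)
            echo "usage: light_load_on_cpu CPU [SECONDS]" >&2
            return 2
            ;;
    esac
    # taskset -c binds the child shell, and everything it spawns, to that CPU.
    taskset -c "$cpu" sh -c '
        end=$(( $(date +%s) + $1 ))
        while [ "$(date +%s)" -lt "$end" ]; do
            sleep 0.1    # wake up about ten times per second, then idle again
        done
    ' sh "$secs"
}
```

For example, `light_load_on_cpu 3 60` keeps logical CPU 3 lightly busy for a minute. Looping over the CPU numbers that showed up in the MCE logs would be one way to check whether a particular core is more sensitive than the others.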

Offline
