
#76 2024-04-29 16:50:45

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 664

Re: Random reboots with green screen and MCE events

noisypiano wrote:

I understand that you got colored reboots in Windows as well, but the thing you are missing here is, I am not having any in Windows! I am still using it for testing, if it ever happens in Windows as well, I can say it might be the GPU.

noisypiano wrote:

I have been using Arch Linux on my machine for 3-4 months straight without any problems.

Well, it seems your problems started when you moved to kernel 6.8, and since you say this is an exclusively Linux problem, install kernel 6.7 and your system will be stable again. From there, you can start to bisect and help fix the bug.


Excuse my poor English.

Offline

#77 2024-04-29 16:59:01

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

agapito wrote:
noisypiano wrote:

I understand that you got colored reboots in Windows as well, but the thing you are missing here is, I am not having any in Windows! I am still using it for testing, if it ever happens in Windows as well, I can say it might be the GPU.

noisypiano wrote:

I have been using Arch Linux on my machine for 3-4 months straight without any problems.

Well, it seems your problems started when you moved to kernel 6.8, and since you say this is an exclusively Linux problem, install kernel 6.7 and your system will be stable again. From there, you can start to bisect and help fix the bug.

Surely, as I have already said in post #67

noisypiano wrote:

I have tried the 6.6 LTS kernel but it still crashed. I have used Linux on this machine since the regular 6.6 kernel and I had no errors. I think that rather than using an LTS kernel, I have to use the exact kernel on which I had no errors to narrow this down more precisely.

Offline

#78 2024-04-29 17:06:27

seth
Member
Registered: 2012-09-03
Posts: 51,842

Re: Random reboots with green screen and MCE events

Which 6.6: .27, .28 or .29?
There's an amdgpu-related bug that was also backported to the LTS kernels and *breaks* shutdown/reboots for some users (but only after a previous S3/S4) - 6.6.27 should still be OK and the timeline (starting ~April 20th) fits.

Offline

#79 2024-04-29 17:11:26

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

seth wrote:

Which 6.6: .27, .28 or .29?
There's an amdgpu-related bug that was also backported to the LTS kernels and *breaks* shutdown/reboots for some users (but only after a previous S3/S4) - 6.6.27 should still be OK and the timeline (starting ~April 20th) fits.

It was 6.6.28-1-lts.

Offline

#80 2024-04-29 17:20:27

seth
Member
Registered: 2012-09-03
Posts: 51,842

Re: Random reboots with green screen and MCE events

Try to downgrade to 6.6.27 or 6.8.6…
https://bbs.archlinux.org/viewtopic.php?id=295199

Offline

#81 2024-04-29 17:26:14

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

seth wrote:

Try to downgrade to 6.6.27 or 6.8.6…
https://bbs.archlinux.org/viewtopic.php?id=295199

I was affected by the reboot bug as well. I could shut down but couldn't reboot. After rebooting and selecting the GRUB menu, it showed a black screen and nothing else happened. I cannot tell exactly which kernel version it was, but it was quite some time ago. The solution was adding reboot=pci to the kernel parameters. It got fixed later, and sadly I don't recall in which kernel version.
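
For anyone wanting to confirm that a parameter such as reboot=pci actually reached the running kernel, here is a minimal sketch. It assumes the command line is available as plain text (on Linux it can be read from /proc/cmdline); the function name `has_kernel_param` and the sample command line are hypothetical and only for illustration.

```python
def has_kernel_param(cmdline: str, name: str, value=None) -> bool:
    """Return True if `name` (optionally as `name=value`) is on the command line.

    Assumption: parameters are whitespace-separated and values contain no
    quoted spaces, which holds for simple cases such as reboot=pci.
    """
    for token in cmdline.split():
        key, _, val = token.partition("=")
        if key == name and (value is None or val == value):
            return True
    return False


# Hypothetical command line, not taken from the machine discussed here.
SAMPLE_CMDLINE = "BOOT_IMAGE=/vmlinuz-linux root=UUID=abcd rw reboot=pci quiet"

if __name__ == "__main__":
    print(has_kernel_param(SAMPLE_CMDLINE, "reboot", "pci"))
```

On a live system the text of /proc/cmdline would be passed in place of the sample string.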

Can you please give a link I can trust to download older kernel versions? I will be running Windows for some time for testing purposes, so I can't do that right now.

Last edited by noisypiano (2024-04-29 17:26:43)

Offline

#82 2024-04-29 22:02:37

seth
Member
Registered: 2012-09-03
Posts: 51,842

Offline

#83 2024-05-03 00:56:16

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

I have tried the linux-6.6.2, linux-6.6.29-lts, linux-6.6.28-lts and linux-6.8.7 kernels; all of them give a green screen completely at random. In post #53 I described how to reproduce this error, but that only works on the 6.8.7 kernel. I tried the same procedure on the other kernels and they didn't give a green screen. For example, I kept linux-6.6.2 idle for an hour following that procedure and nothing happened. But when I actually used my computer with that kernel, a green screen appeared completely at random. This time, though, I get a different kind of MCE error.

MCE error from linux-6.6.29-LTS

May 02 07:21:45 kernel: mce: [Hardware Error]: Machine check events logged
May 02 07:21:45 kernel: mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 5: bea0000001000108
May 02 07:21:45 kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffc03c710a MISC d012000100000000 SYND 4d000000 IPID 500b000000000
May 02 07:21:45 kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1714612898 SOCKET 0 APIC 8 microcode a20120e

MCE error from linux-6.6.2

May 03 06:31:02 kernel: mce: [Hardware Error]: Machine check events logged
May 03 06:31:02 kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 5: bea0000001000108
May 03 06:31:02 kernel: mce: [Hardware Error]: TSC 0
May 03 06:31:02 kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1714696254 SOCKET 0 APIC 6 microcode a20120e
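
The status word reported for bank 5 in both logs above, bea0000001000108, can be unpacked by hand. This is a minimal sketch, assuming only the architectural flag bits of the x86 MCi_STATUS register (bits 63 down to 57); the function name `decode_mci_status` is hypothetical, and vendor-specific fields are deliberately left undecoded.

```python
# Architectural flag bits of the x86 MCi_STATUS register (bit -> name).
# Vendor-specific fields (for example AMD's extended error code) are
# intentionally not interpreted in this sketch.
MCI_STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status_hex: str) -> dict:
    """Split a hex MCi_STATUS value into its set flags and low 16-bit error code."""
    status = int(status_hex, 16)
    flags = [name for bit, name in MCI_STATUS_FLAGS.items() if (status >> bit) & 1]
    return {"flags": flags, "error_code": status & 0xFFFF}


if __name__ == "__main__":
    print(decode_mci_status("bea0000001000108"))
```

For the value in the logs, the UC and PCC flags are set, which is consistent with the machine resetting instead of recovering. It does not by itself say which component is at fault.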

Right now, I will try linux-6.8.7, where I know how to reproduce the error. Then I will use the amdgpu.ppfeaturemask kernel parameter to see where this goes. I have already used Windows 11 23H2 for some time; I played The Finals on the highest settings and the GPU temperature went as high as 70C. No crash. I don't know where the problem is. I can't recall anything I did prior to this error that might have caused it.

Offline

#84 2024-05-03 06:50:34

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

On linux-6.8.7 this crash happens way too fast. Just tried with the amdgpu.ppfeaturemask=0xffffbffb kernel parameter and got a green screen.

May 03 12:47:20 kernel: mce: [Hardware Error]: Machine check events logged
May 03 12:47:20 kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 5: bea0000001000108
May 03 12:47:20 kernel: mce: [Hardware Error]: TSC 0
May 03 12:47:20 kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1714718833 SOCKET 0 APIC a microcode a20120e
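
To compare reports across kernels without reading them by eye, the journal lines can be reduced to a few fields. This is a rough sketch that assumes the line layout shown in the log above; the regular expressions and the name `summarize_mce` are hypothetical helpers, not part of any existing tool.

```python
import re
from datetime import datetime, timezone

# Patterns for the "mce: [Hardware Error]:" lines as they appear in the journal.
CPU_BANK_RE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]+)")
TIME_RE = re.compile(r"TIME (\d+) SOCKET \d+ APIC [0-9a-f]+ microcode ([0-9a-f]+)")

# The report from linux-6.8.7, copied from the log above.
SAMPLE_LINES = [
    "May 03 12:47:20 kernel: mce: [Hardware Error]: Machine check events logged",
    "May 03 12:47:20 kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 5: bea0000001000108",
    "May 03 12:47:20 kernel: mce: [Hardware Error]: TSC 0",
    "May 03 12:47:20 kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1714718833 SOCKET 0 APIC a microcode a20120e",
]


def summarize_mce(lines):
    """Collect CPU, bank, status, microcode and event time from one MCE report."""
    summary = {}
    for line in lines:
        m = CPU_BANK_RE.search(line)
        if m:
            summary["cpu"] = int(m.group(1))
            summary["bank"] = int(m.group(2))
            summary["status"] = m.group(3)
        m = TIME_RE.search(line)
        if m:
            # TIME is seconds since the Unix epoch; render it in UTC.
            summary["time_utc"] = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
            summary["microcode"] = m.group(2)
    return summary


if __name__ == "__main__":
    print(summarize_mce(SAMPLE_LINES))
```

Run over each report in this thread, the bank and status come out the same every time (bank 5, bea0000001000108) while the CPU number varies.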

Offline

#85 2024-05-03 08:06:57

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 664

Re: Random reboots with green screen and MCE events

In case you hadn't noticed, I was being sarcastic in my last post here.

As I told you earlier, there is no bug in the kernel that produces your errors. It is a hardware problem with your computer. Your hardware is broken, not in the sense that it needs repair, but in the sense that it is not getting the proper voltage, and that needs to be fixed. I'll repeat it again: my Zen 3 CPU and my RDNA GPU have been stable for the last two years. I have not seen an automatic reboot since I calibrated my CPU correctly with the help of CoreCycler and Curve Optimizer. The Linux kernel is fine, and if it doesn't happen in Windows, that is possibly because Windows runs more services in the background and distributes the load differently across your CPU.

In your case, everything pointed to a CPU voltage error, but if you say that CoreCycler does not give you errors (although I do not think you have spent the necessary time testing), then the instability is produced by the GPU or by the voltage settings of your limited motherboard. The only thing you can do right now is install a mainline kernel, which has preferred core mode enabled by default; like Windows, that may reduce or eliminate the consequence, but not the cause. Or you can flash an old motherboard BIOS and cross your fingers.


Excuse my poor English.

Offline

#86 2024-05-03 09:04:28

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

agapito wrote:

In case you hadn't noticed, I was being sarcastic in my last post here.

First you say it's a CPU problem and that even if I boot into Windows I'd face the same problem, which I don't after idling Windows for an hour 3 or 4 times. Then you tell me to run a CoreCycler test and "you will crash in a few minutes or even seconds". I ran CoreCycler for an hour without any errors. After that you say, "No. If your CPU passes CoreCycler without errors, then your CPU is fine. If your CPU is broken, it would also reboot using Windows." And now, "although I do not think you have spent the necessary time testing".
Yeah, thanks for your sarcasm!

agapito wrote:

As I told you earlier, there is no bug in the kernel that produces your errors. It is a hardware problem with your computer. Your hardware is broken, not in the sense that it needs repair, but in the sense that it is not getting the proper voltage, and that needs to be fixed. I'll repeat it again: my Zen 3 CPU and my RDNA GPU have been stable for the last two years. I have not seen an automatic reboot since I calibrated my CPU correctly with the help of CoreCycler and Curve Optimizer. The Linux kernel is fine, and if it doesn't happen in Windows, that is possibly because Windows runs more services in the background and distributes the load differently across your CPU.

In your case, everything pointed to a CPU voltage error, but if you say that CoreCycler does not give you errors (although I do not think you have spent the necessary time testing), then the instability is produced by the GPU or by the voltage settings of your limited motherboard. The only thing you can do right now is install a mainline kernel, which has preferred core mode enabled by default; like Windows, that may reduce or eliminate the consequence, but not the cause. Or you can flash an old motherboard BIOS and cross your fingers.

Alright let me make things clear.
Perhaps, you think I believe it is a linux kernel issue. Which is not true. I haven't said in a single post that it's a linux kernel issue.
I have already made what I think clear in post #67.

noisypiano wrote:

The final conclusion I have come to is that, as said in the ArchWiki, maybe the CPU has degraded and adding some voltage offset to all CPU cores might fix the issue. AMD is probably aware of the situation and made some driver patches that automatically do this on Windows.

To which you replied,

agapito wrote:
noisypiano wrote:

The final conclusion I have come to is that, as said in the ArchWiki, maybe the CPU has degraded and adding some voltage offset to all CPU cores might fix the issue. AMD is probably aware of the situation and made some driver patches that automatically do this on Windows.

No. If your CPU passes CoreCycler without errors, then your CPU is fine. If your CPU is broken, it would also reboot using Windows.

I don't know if you were being sarcastic here, because now you are saying it's a CPU voltage or hardware error again.

Whatever, the thing I wanted to say is, I never said it's a linux kernel issue. The main reason I am trying out different kernels is that I don't have any other options right now. I cannot change my CPU voltage. I'm just messing around, trying to find a solution to a problem I am very worried about. So yeah, I think you get it now. No need to be sarcastic.

Last edited by noisypiano (2024-05-03 09:12:04)

Offline

#87 2024-05-03 12:26:46

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 664

Re: Random reboots with green screen and MCE events

noisypiano wrote:

First you say it's a CPU problem and that even if I boot into Windows I'd face the same problem, which I don't after idling Windows for an hour 3 or 4 times.

I could link you hundreds of thousands of messages from people with the same problem and the solution, which is always the same: change the voltage curve with Curve Optimizer, add a positive voltage offset or disable the C-states.

noisypiano wrote:

Then you tell me to run a CoreCycler test and "you will crash in a few minutes or even seconds"

Given the speed with which your PC restarted it was logical to think that CoreCycler would detect the errors immediately, but as I explained before, Windows distributes the loads differently and that can mask the problem.

noisypiano wrote:

And now, "although I do not think you have spent the necessary time testing"

noisypiano wrote:

I ran CoreCycler for an hour without any errors

You are funny. Let me know when you have tested it properly, or in other words, let me know when you have passed 72 hours without errors.

From another message:

noisypiano wrote:

I don't have the sanity to run a 60+ hours test to find out if all my cores are stable

Oops.

Mine has spent almost 400 hours... Just saying.

When you have tested it properly and made sure that it is not a CPU related problem, that's when you can bet everything on the GPU, because it is the second most common cause of this problem.


noisypiano wrote:

Perhaps, you think I believe it is a linux kernel issue. Which is not true. I haven't said in a single post that it's a linux kernel issue.

Really?

noisypiano wrote:

A broken or misconfigured hardware that works magically on Windows without any issues and doesn't on Linux? Okay!

noisypiano wrote:

I understand that you got colored reboots in Windows as well, but the thing you are missing here is, I am not having any in Windows!

noisypiano wrote:

I have tried the 6.6 LTS kernel but it still crashed. I have used Linux on this machine since the regular 6.6 kernel and I had no errors. I think that rather than using an LTS kernel, I have to use the exact kernel on which I had no errors to narrow this down more precisely.

agapito wrote:

If this is a Linux Ryzen-CPU bug tell me why I haven't seen any unexpected reboot in the last two years. Why my 5950x is not affected by that bug?

noisypiano wrote:

I can't really say for sure.

noisypiano wrote:

Whatever, the thing I wanted to say is, I never said it's a linux kernel issue.

No, you never said it was a bug in it because, as everyone knows, the kernel has nothing to do with CPU or GPU handling, and you were just trying other kernels or operating systems out of boredom and lack of options.

noisypiano wrote:

AMD is probably aware of the situation and made some driver patches that automatically do this on Windows.

Windows does not do black magic or secretly add more voltage to avoid reboots, as you think; voltages are decided by the motherboard and CPU firmware.


I don't know if you are a newbie, a troll or both, but I am not going to answer here anymore, because it is useless. Good luck trying to fix your broken hardware.


Excuse my poor English.

Offline

#88 2024-05-03 20:27:19

noisypiano
Member
Registered: 2024-04-23
Posts: 52

Re: Random reboots with green screen and MCE events

agapito wrote:
noisypiano wrote:

Perhaps, you think I believe it is a linux kernel issue. Which is not true. I haven't said in a single post that it's a linux kernel issue.

Really?

noisypiano wrote:

A broken or misconfigured hardware that works magically on Windows without any issues and doesn't on Linux? Okay!

noisypiano wrote:

I understand that you got colored reboots in Windows as well, but the thing you are missing here is, I am not having any in Windows!

I think you should add more context, because this one seems really confusing.
After I passed the CoreCycler test, you said it was probably my GPU that is broken or misconfigured and causing the issue. I disagreed and said this because, if my GPU were broken or misconfigured, I would get colored reboots or weird artifacts in Windows as well, which I didn't after using it for some time. I was talking about my GPU not being broken.

Offline
