#26 2020-03-21 18:23:10

adrians
Member
From: Latvia
Registered: 2009-05-05
Posts: 18

Re: Instability with Ryzen 9 3950X and updated X570 BIOS

I found this thread while researching hardware errors on my recently built Ryzen 9 3950X system. There seem to be some similarities with the case discussed here.

I am running the following:

CPU: AMD Ryzen 9 3950X
GPU: Nvidia GTX 1070 Ti
MB: Asus ROG Crosshair VIII Hero (WI-FI)
Memory: G.SKILL Trident Z Neo 32GB 3600MHz CL16 DDR4 KIT OF 2 F4-3600C16D-32GTZN
OS: Fedora 31, KDE spin (kernel 5.5.9-200.fc31.x86_64)

Once in a while I encounter the following messages from kernel:

Message from syslogd@grizzly at Mar 21 19:33:53 ...
 kernel:[Hardware Error]: Corrected error, no action required.

Message from syslogd@grizzly at Mar 21 19:33:53 ...
 kernel:[Hardware Error]: CPU:0 (17:71:0) MC27_STATUS[-|CE|MiscV|-|-|-|SyndV|-|-|-]: 0x982000000002080b

Message from syslogd@grizzly at Mar 21 19:33:53 ...
 kernel:[Hardware Error]: IPID: 0x0001002e00000500, Syndrome: 0x000000005a020001

Message from syslogd@grizzly at Mar 21 19:33:53 ...
 kernel:[Hardware Error]: Power, Interrupts, etc. Ext. Error Code: 2, Link Error.

Message from syslogd@grizzly at Mar 21 19:33:53 ...
 kernel:[Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: GEN, part-proc: SRC (no timeout)

I have run MemTest86 and it found no errors. I haven't seen any hard crashes so far; a program might have crashed once, but I'm not sure. The BIOS is updated to the latest version currently available (Version 1302 2020/03/03). I am trying to understand whether this is faulty hardware or something else.
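
For reference, the MC27_STATUS value in the log above can be unpacked by hand. This is a minimal Python sketch, assuming the bit layout used by the Linux kernel's MCE decoding (Val = bit 63, UC = bit 61, SyndV = bit 53, extended error code in bits 21:16, bus-error pattern 0000 1PPT RRRR IILL in the low 16 bits). The function name and dictionary keys are made up for illustration, and the layout should be checked against AMD's documentation for the CPU in question before relying on it.

```python
# Assumed bit positions of the MCA status flags (taken from the layout the
# Linux kernel uses when printing "[Hardware Error]" lines on AMD systems).
STATUS_BITS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60, "MiscV": 59,
    "AddrV": 58, "PCC": 57, "SyndV": 53, "Deferred": 44, "Poison": 43,
}

# Lookup tables for the fields of a bus-error code (0000 1PPT RRRR IILL).
LL_MSGS = ["RESV", "L1", "L2", "L3/GEN"]
II_MSGS = ["MEM", "RESV", "IO", "GEN"]
RRRR_MSGS = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]
PP_MSGS = ["SRC", "RES", "OBS", "GEN"]


def decode_mca_status(status):
    """Split a 64-bit MCA status value into flags and error-code fields."""
    flags = {name: bool((status >> bit) & 1) for name, bit in STATUS_BITS.items()}
    if flags["UC"]:
        severity = "UE"          # uncorrected error
    elif flags["Deferred"]:
        severity = "deferred"
    else:
        severity = "CE"          # corrected error
    result = {
        "flags": flags,
        "severity": severity,
        "ext_error_code": (status >> 16) & 0x3F,
        "bus": None,
    }
    code = status & 0xFFFF
    if (code & 0xF800) == 0x0800:  # bit 11 set, bits 15:12 clear: bus error
        rrrr = (code >> 4) & 0xF
        result["bus"] = {
            "cache_level": LL_MSGS[code & 0x3],
            "mem_io": II_MSGS[(code >> 2) & 0x3],
            "mem_tx": RRRR_MSGS[rrrr] if rrrr < len(RRRR_MSGS) else "RESV",
            "participation": PP_MSGS[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 1),
        }
    return result


if __name__ == "__main__":
    decoded = decode_mca_status(0x982000000002080B)
    print(decoded["severity"], decoded["ext_error_code"], decoded["bus"])
```

Under these assumptions the value decodes to a corrected error with extended error code 2 and a bus-error code of L3/GEN, IO, GEN, SRC, no timeout, which agrees with the lines the kernel printed.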

#27 2020-04-01 12:04:09

daren_k
Member
Registered: 2020-02-13
Posts: 20

Re: Instability with Ryzen 9 3950X and updated X570 BIOS

The issue persists with the F12e BIOS. GPU-related tools like rocminfo still hard-crash the system on the latest kernel.

The hard crashes actually corrupted some of my files. It seems to mostly affect files with open file descriptors, like service logs or the shell history file: their last few bytes are filled with zeros.
The same happened to a bunch of video files, where multiple sections of >= 1MB were all zero, causing skips and artefacts in video playback. Maybe ffmpegthumbnailer was running during the crashes and had them open. Luckily, no important files were affected...
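
One way to check other files for this kind of damage is to scan them for long runs of zero bytes. This is only a rough Python sketch: the function name, the 1 MiB default threshold and the chunk size are assumptions chosen to match the description above. A long zero run is a hint, not proof of corruption, because sparse or preallocated files legitimately contain such regions.

```python
import re

ZERO_RUN = re.compile(rb"\x00+")


def find_zero_runs(path, min_len=1 << 20, chunk_size=1 << 20):
    """Return (offset, length) for every run of zero bytes >= min_len."""
    runs = []    # [start, end) spans; only the last one may still grow
    offset = 0   # absolute file offset of the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for match in ZERO_RUN.finditer(chunk):
                start = offset + match.start()
                end = offset + match.end()
                if runs and runs[-1][1] == start:
                    # The run continues across a chunk boundary.
                    runs[-1][1] = end
                else:
                    # The previous run is finished; drop it if too short.
                    if runs and runs[-1][1] - runs[-1][0] < min_len:
                        runs.pop()
                    runs.append([start, end])
            offset += len(chunk)
    return [(s, e - s) for s, e in runs if e - s >= min_len]


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        for start, length in find_zero_runs(name):
            print(f"{name}: {length} zero bytes at offset {start}")
```

Run it with the suspect files as arguments, for example the video files mentioned above. It reads each file in chunks, so memory use stays small even for large files.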

If I test this in future, I'll make sure to unmount a couple of drives and probably reduce the runlevel to avoid damage.

Should I file a bug report for the linux package for Arch and/or for the upstream kernel?
In that case can anyone tell me where to report a bug for the upstream kernel?

Last edited by daren_k (2020-04-01 12:06:13)

#28 2020-04-02 01:58:45

Ropid
Member
Registered: 2015-03-09
Posts: 856

Re: Instability with Ryzen 9 3950X and updated X570 BIOS

Are things also crashing when running the RAM without overclock and at its default speed? The default speed will be something low like 2400MHz. I looked through the old posts in this thread and I couldn't understand if you tested the RAM at low speed or not. I know that it feels bad to run the RAM at 2400MHz instead of its advertised 3600MHz, but you should still test that because the 3600MHz speed is an overclock. If things run fine without overclock, making a bug report won't help you because there would be no way to replicate things on a different computer.

If you find out that things run fine when not using the 3600MHz RAM speed, you could try asking Gigabyte for help. They might send you an alternative BIOS to try, or they might point you to a BIOS setting you can tweak. The other thing to do would be to try to overclock the memory manually instead of using the XMP profile.

@adrians: I would try playing around with different UEFI/BIOS settings to see if there's a way to make those errors go away. In my experience, I could always find some setting that made those kinds of errors/warnings go away. I don't know if a computer can run stable and still record those warnings in the log, so I don't know how important it is to get rid of them. Personally, I wouldn't let it stay like this.

#29 2020-04-02 10:01:58

daren_k
Member
Registered: 2020-02-13
Posts: 20

Re: Instability with Ryzen 9 3950X and updated X570 BIOS

Happens with clocks at optimized defaults as well (RAM@2133 MHz).

In short:
F1 BIOS + linux (5.5.x) & linux-lts (5.4.x) = all good except RDRAND issue and slightly lower memory bandwidth, rocminfo and Furmark benchmark work
More recent BIOS + linux (5.5.x) = kernel crashes with rocminfo and when starting Furmark benchmark for example, file corruption possible when kernel crashes
More recent BIOS + linux-lts (5.4.x) = all good, running this for weeks with RAM@3600 MHz, rocminfo and Furmark benchmark work

It seems to be related to the amdgpu kernel driver in 5.5.x to me tbh. Something not functioning correctly with more recent BIOSes there.

I already filed a bug report about it at Gigabyte, got a reply that it's being looked at.

#30 2020-04-02 12:29:04

Ropid
Member
Registered: 2015-03-09
Posts: 856

Re: Instability with Ryzen 9 3950X and updated X570 BIOS

Can you force the crash reliably and fast? If you don't have to wait for it to show up, you could "bisect" the 5.5 kernel commits to find the one commit which introduced this crashing behavior.

I never did a bisect so don't know how it actually works. There's an ArchWiki article here:

https://wiki.archlinux.org/index.php/Bi … s_with_Git

I think this git-bisect thing doesn't work with the normal Arch kernel package as a base. The git repository in there only has tags like "v5.5.1" and "v5.4.2" etc. so you can't target the development commits that lead up to the release of 5.5. The ArchWiki article says something about using one of the "...-git" packages from the AUR to work on this. I don't know if that's really the way to do this for the kernel. I remember the user "loqs" writing a post with a guide on how to do a git-bisect on the kernel but I can't find that post.

EDIT:

I found a post by loqs that explains git-bisect for the kernel. See post #8 in this thread here:

https://bbs.archlinux.org/viewtopic.php … 9#p1875819

Last edited by Ropid (2020-04-02 12:31:35)

#31 2020-04-03 08:49:20

adrians
Member
From: Latvia
Registered: 2009-05-05
Posts: 18

Re: Instability with Ryzen 9 3950X and updated X570 BIOS

I lowered the RAM speed to 3200MHz and haven't seen the error since (I have been running this setup for about two weeks now). In my case this seems to be an overclocking issue, as suggested by @Ropid. I falsely assumed that the XMP profile for 3600MHz RAM speed would just work out of the box. It might require some manual tuning to get stability.

#32 2020-04-06 16:35:14

daren_k
Member
Registered: 2020-02-13
Posts: 20

Re: Instability with Ryzen 9 3950X and updated X570 BIOS

I can instantly trigger the crash by calling rocminfo once or a couple of times with linux 5.5.x.

It segfaults only with recent BIOS + linux 5.5.x.

On F1 BIOS and any BIOS with linux-lts 5.4.x I can call it with "while true; do rocminfo; done" and nothing breaks.
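
For anyone who wants a record of which run failed, the shell loop above can be sketched in Python roughly as follows. The function name and the max_runs limit are illustrative assumptions. It only detects user-space failures such as a segfault of the command itself, since a hard kernel crash takes the script down with it.

```python
import signal
import subprocess


def run_until_failure(cmd, max_runs=100):
    """Run cmd repeatedly and report the first failing run.

    Returns (run_number, description) for the first failure, or None if
    all max_runs runs exit with status 0. Raises FileNotFoundError if the
    command does not exist.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        rc = proc.returncode
        if rc < 0:
            # On POSIX, a negative return code means the child process
            # was terminated by signal number -rc.
            try:
                name = signal.Signals(-rc).name
            except ValueError:
                name = f"signal {-rc}"
            return run, f"killed by {name}"
        if rc != 0:
            return run, f"exit status {rc}"
    return None


if __name__ == "__main__":
    # Assumes rocminfo is installed and on PATH.
    print(run_until_failure(["rocminfo"], max_runs=50))
```

A segfault shows up as "killed by SIGSEGV" together with the number of the run on which it happened.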

I will do a bit of testing once linux 5.6.x is released for Arch. Maybe it is fixed by then...

Last edited by daren_k (2020-04-06 16:35:38)

#33 2020-04-10 16:34:54

daren_k
Member
Registered: 2020-02-13
Posts: 20

Re: Instability with Ryzen 9 3950X and updated X570 BIOS

Seems to be fixed with the current linux 5.6.3. I can run rocminfo and GPU benchmarks all I want with no issues.

#34 2020-06-23 15:39:27

Cxpher
Member
Registered: 2016-06-05
Posts: 13

Re: Instability with Ryzen 9 3950X and updated X570 BIOS

I have the exact same issue that daren_k originally had with Chrome tabs and pacman segfaults. Yay also segfaults. Steam webhelper segfaults like crazy.

This is with a Ryzen 9 3950X and an ASRock X570 Taichi motherboard. Still working on the problem.

Last edited by Cxpher (2020-06-23 15:40:15)

#35 2020-06-23 15:43:40

daren_k
Member
Registered: 2020-02-13
Posts: 20

Re: Instability with Ryzen 9 3950X and updated X570 BIOS

Did you update the BIOS?

I am on the F12e BIOS, which has been stable ever since Arch moved to kernel 5.6+, now 5.7.4.

There are also F12f and F20a (with a lower AGESA version for some reason), which I didn't try out.

Edit:
Oh yeah, those BIOS version numbers are for the GIGABYTE board, never mind.

Last edited by daren_k (2020-06-23 15:49:31)

#36 2020-06-23 15:54:23

Cxpher
Member
Registered: 2016-06-05
Posts: 13

Re: Instability with Ryzen 9 3950X and updated X570 BIOS

Yeah. The versions for the ASRock X570 Taichi are different.

https://www.asrock.com/mb/amd/x570%20taichi/#BIOS

I cold flashed to 3.00 earlier but haven't tested it. Before flashing, I ran into some issues where I couldn't get the system to POST.

So now I'm working on that (CMOS battery and the works). It just randomly stopped POSTing after a reboot.

This system has not been overclocked. Just stock.

Last edited by Cxpher (2020-06-23 15:54:41)
