#1 2020-04-06 20:51:24

Roken
Member
From: South Wales, UK
Registered: 2012-01-16
Posts: 1,359

Random shutdown followed by RAM not recognised

I don't know how to title this more descriptively, and I don't know if it's a Linux issue or a BIOS issue.

Occasionally (more than a couple of times over the past fortnight) my computer has, not really shut down, but ceased to operate. The initial symptoms have varied, e.g. the keyboard may just shut down (all the LEDs go off and it's unresponsive), or the monitor will suddenly blank (though any audio will continue for a short while).

It doesn't shut down completely. It looks like a shutdown, but then the fan goes into overdrive and stays there until I force the power off.

Frequently, following this, a reboot results in error beeps telling me there's no RAM installed (there is: 16GB Corsair Vengeance). I don't know if this is part of the problem, or a symptom.

My first suspicion was overheating, but after getting it all back up and running (not a short process) and extensive stress testing (prime95, Blender and Luxcore renders, stressing both CPU and GPU to max), temps always stay well within tolerance. The CPU peaks at 71°C when running all cores/all threads at 100% for 15-20 minutes, and the GPU at 76°C.

Memtest reports no errors at all.

After some searching, it seems there may have been BIOS issues with RAM in the past, and tonight I've dropped the RAM frequency from 3000MHz to 2800MHz to see if it helps.

However, is there a chance that there's a kernel regression causing this?

For info:

MB - Asus ROG Strix B350-F - latest BIOS
CPU - Ryzen 1800X (not overclocked)
RAM - 2x8GB Corsair Vengeance rated at 3000MHz

Now, because when this happens I have to spend an age booting and rebooting, I've never been able to get a log, so I'm hoping that someone can come up with a suggestion based on what I've said.
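
One thing that might help with the missing log: on a systemd system with a persistent journal, the messages from the previous boot can usually be read back after the machine is up again. Below is a minimal sketch in Python that wraps `journalctl`. The function names `previous_boot_cmd` and `read_previous_boot` are made up for illustration, and it assumes `journalctl` is on the PATH and that journal storage is persistent. A hard hang may still lose the last few seconds before the freeze.

```python
import subprocess


def previous_boot_cmd(boots_back=1, priority="err", kernel_only=False):
    """Build a journalctl command line for an earlier boot.

    boots_back=1 means the boot before the current one (journalctl -b -1).
    priority limits output to that syslog level and anything more severe.
    """
    cmd = ["journalctl", "--no-pager", "-b", f"-{boots_back}", "-p", priority]
    if kernel_only:
        cmd.append("-k")  # kernel messages only, similar to dmesg
    return cmd


def read_previous_boot(**kwargs):
    """Run journalctl and return its output as text.

    Raises FileNotFoundError if journalctl is not installed.
    """
    result = subprocess.run(
        previous_boot_cmd(**kwargs),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `read_previous_boot(kernel_only=True)` after a forced power-off would return whatever kernel errors made it to disk before the hang, if any.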

It may be that dropping the RAM frequency will fix it all (suggestion from ASUS forums), but I'm not entirely confident.

Unfortunately, because it's so erratic, confirming a fix is likely to take weeks.
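
Since a fix may take weeks to confirm, a small background logger could at least record how long the machine stayed up and what the sensors read shortly before each hang. This is a rough sketch only, assuming the kernel exposes temperatures under `/sys/class/hwmon` (which sensors appear depends on the board and the loaded drivers). `read_temps`, `log_heartbeat`, `run` and the `heartbeat.log` path are hypothetical names, not an existing tool.

```python
import os
import time
from pathlib import Path


def read_temps(hwmon_root="/sys/class/hwmon"):
    """Return {"<hwmonN>.<chip name>.<sensor file>": degrees C} for readable sensors."""
    temps = {}
    for chip in sorted(Path(hwmon_root).glob("hwmon*")):
        try:
            name = (chip / "name").read_text().strip()
        except OSError:
            name = "unknown"
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                # hwmon temperature inputs are in millidegrees Celsius
                value = int(sensor.read_text()) / 1000.0
            except (OSError, ValueError):
                continue  # sensor present but not readable right now
            temps[f"{chip.name}.{name}.{sensor.name}"] = value
    return temps


def log_heartbeat(log_path, hwmon_root="/sys/class/hwmon"):
    """Append one timestamped line of readings and force it to disk."""
    temps = read_temps(hwmon_root)
    fields = " ".join(f"{key}={value:.1f}" for key, value in temps.items())
    line = f"{time.strftime('%Y-%m-%d %H:%M:%S')} {fields}\n"
    with open(log_path, "a") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())  # so the line survives a sudden hang
    return line


def run(log_path="heartbeat.log", interval_s=10):
    """Log a heartbeat every interval_s seconds until interrupted."""
    while True:
        log_heartbeat(log_path)
        time.sleep(interval_s)
```

After a hang, the last line in the log gives the approximate time of death and the temperatures a few seconds before it, which would help confirm or rule out the overheating theory without another round of stress tests.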

Last edited by Roken (2020-04-06 20:55:40)


Ryzen 5900X 12 core/24 thread - RTX 3090 FE 24 Gb, Asus B550-F Gaming MB, 128Gb Corsair DDR4, Fractal Design Define 7 XL, 5 HD (2 NvME PCI, 4SSD) + 1 x optical.
Linux user #545703

/ is the root of all problems.

#2 2020-04-06 20:58:11

behnamgolds
Member
Registered: 2020-04-06
Posts: 9

Re: Random shutdown followed by RAM not recognised

Have you tried removing the RAM modules one at a time to see if the problem goes away?

Edit:
Do you have any other OS on your system, so you can check whether it is an OS problem or a hardware one?

Last edited by behnamgolds (2020-04-06 21:00:37)

#3 2020-04-06 20:59:42

Roken
Member
From: South Wales, UK
Registered: 2012-01-16
Posts: 1,359

Re: Random shutdown followed by RAM not recognised

I have. When it dies such that the RAM is not recognised, neither stick is recognised on its own. I believe that disconnecting from power for a few minutes and shorting the BIOS reset is what eventually gets me back up.

And for absolute clarity, I haven't just removed each stick; I've also moved them to different slots in the process.

Last edited by Roken (2020-04-06 21:00:44)


#4 2020-04-06 21:00:03

Slithery
Administrator
From: Norfolk, UK
Registered: 2013-12-01
Posts: 5,776

Re: Random shutdown followed by RAM not recognised

How long did you let memtest run for?


No, it didn't "fix" anything. It just shifted the brokeness one space to the right. - jasonwryan
Closing -- for deletion; Banning -- for muppetry. - jasonwryan

#5 2020-04-06 21:01:22

Roken
Member
From: South Wales, UK
Registered: 2012-01-16
Posts: 1,359

Re: Random shutdown followed by RAM not recognised

Not the last time, but the time before that I left it running overnight. I'm happy to try that again tonight.


#6 2020-04-06 21:05:41

Roken
Member
From: South Wales, UK
Registered: 2012-01-16
Posts: 1,359

Re: Random shutdown followed by RAM not recognised

For added info: I do realise I have a lot of drives in this system, and I have tried disconnecting non-critical drives to rule out PSU problems (Corsair 650W). If you think power may be a problem, I could probably rearrange data to strip a couple of drives from the system.

Last edited by Roken (2020-04-06 21:06:14)


#7 2020-04-07 06:05:42

Roken
Member
From: South Wales, UK
Registered: 2012-01-16
Posts: 1,359

Re: Random shutdown followed by RAM not recognised

Well, I left memtest running overnight. Zero errors.


#8 2020-04-07 09:33:06

Ropid
Member
Registered: 2015-03-09
Posts: 1,069

Re: Random shutdown followed by RAM not recognised

On my R7-2700X here, I had problems where the memory would randomly not work right. On most boots of the system, there was no problem. Then on some rare boots, the system either got stuck with fans at 100%, or started beeping and rebooting, or didn't manage to get into the OS, or it crashed within a minute in the OS.

I could fix this through manual tweaking of the "CAD_BUS" resistances in the UEFI/BIOS. That CAD_BUS stuff is a set of four values. I don't know what the typical board uses by default. Looking around in overclocking forums, popular choices for the four values are these sets:

24-24-24-24
20-20-20-20
20-24-40-30
30-30-40-60

The exact names of the four settings are these:

(1) ClkDrvStren
(2) AddrCmdDrvStren
(3) CsOdtDrvStren
(4) CkeDrvStren

I had to battle with this for months because the boot problem sometimes didn't occur for a whole week. For me here, I got to the following end result after randomly changing things after every problematic boot:

20-40-40-40

The second value seems to be the important one for me here because those 20 or 24 or 30 that I've seen other people use didn't work. Or maybe the whole combination of values is important.
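
To spell out which number belongs to which setting, here is a small sketch. It assumes the four numbers in each set follow the order of the setting names listed above, and treats the values as plain integers exactly as written. `CadBus` and `parse_cad_bus` are hypothetical names made up for illustration, not anything from a firmware tool.

```python
from typing import NamedTuple


class CadBus(NamedTuple):
    """One set of CAD_BUS values, in the order the settings were listed."""
    ClkDrvStren: int
    AddrCmdDrvStren: int
    CsOdtDrvStren: int
    CkeDrvStren: int


def parse_cad_bus(text):
    """Turn a string such as '20-24-40-30' into a CadBus tuple."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) != 4:
        raise ValueError(f"expected four values, got {len(parts)}: {text!r}")
    return CadBus(*parts)


# The sets quoted from overclocking forums, and the one that ended up stable here.
FORUM_SETS = ["24-24-24-24", "20-20-20-20", "20-24-40-30", "30-30-40-60"]
STABLE_HERE = parse_cad_bus("20-40-40-40")

for text in FORUM_SETS:
    candidate = parse_cad_bus(text)
    print(text, "AddrCmdDrvStren =", candidate.AddrCmdDrvStren)
print("stable here:", STABLE_HERE)
```

Laid out like this, the second field is the one where the stable set differs most clearly from the forum sets: 40 here against 20, 24 or 30 there.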

This problem only happens for me after increasing the memory speeds. On my 2700X and my RAM, running up to 3000MHz speed seems to be unproblematic, but after that the strange problems start and it needs heavy manual tweaking to run for example 3066 or 3133 or 3200 speeds.

Throughout this adventure, on the good boots I could test the memory forever. I ran stress test programs for 20 hours and such.

#9 2020-04-07 13:25:42

Roken
Member
From: South Wales, UK
Registered: 2012-01-16
Posts: 1,359

Re: Random shutdown followed by RAM not recognised

Thank you, Ropid.

It seems to be a problem inherent in the Ryzen architecture, then. Perhaps, as you had success tweaking the drive strength settings, dropping the overall frequency by a couple of hundred MHz will do the same for me, and to be honest, in normal use, not benchmarking, who would notice?

