
#51 2020-07-02 07:38:30

Gnurou
Member
From: Tokyo
Registered: 2009-07-20
Posts: 25

Re: System randomly freeze

loqs wrote:
Gnurou wrote:

I could also see this issue with 5.6. At the moment I am running 5.4 LTS and will need more uptime to know whether it is good or not.

Without a known-good version, bisection is not viable anyway. Has the system had the issue ever since you installed Arch on it?

This system ran for ~1 year without any issues. These freezes started happening around April, so after 5.4, I'd say.
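To make loqs's point concrete: bisection is a binary search between a version known to be good and one known to be bad, so without a trustworthy good endpoint there is nothing to search between. The minimal Python sketch below assumes both endpoints are known and that every version after the first bad one is also bad; `bisect_first_bad` and the version strings are hypothetical, and in practice each `is_bad` call means booting that kernel and waiting long enough to trust a "no freeze" verdict.

```python
from typing import Callable, Sequence


def bisect_first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad version in an ordered list.

    Assumes versions[0] is known good, versions[-1] is known bad,
    and that every version after the first bad one is also bad.
    """
    if len(versions) < 2:
        raise ValueError("need at least a known-good and a known-bad version")
    good, bad = 0, len(versions) - 1  # invariant: versions[good] is ok, versions[bad] is broken
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):  # in reality: boot this kernel and wait for a freeze
            bad = mid
        else:
            good = mid
    return versions[bad]
```

With `["5.4", "5.5", "5.6", "5.7"]` and a test that flags 5.6 and 5.7 as bad, this returns "5.6" after testing only 5.5 and 5.6. The catch with an intermittent freeze is the "else" branch: a kernel that merely hasn't frozen yet gets marked good, which sends the search in the wrong direction.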

loqs wrote:

Your motherboard only has one PCIe slot, so you cannot test with a different one.
Have you replaced the GPU, mainboard, or CPU?

Nope. I guess that will be the next step if 5.4 also proves unstable.


#52 2020-07-02 17:19:42

mac1202
Member
Registered: 2011-05-24
Posts: 33

Re: System randomly freeze

mac1202 wrote:

I have the same issue with kernels 5.7 to 5.8-rc3 on my laptop with a Ryzen 4500U. I just downgraded to kernel 5.6.9 and will report whether it fixes the freeze.

My system has been stable for a whole day since the downgrade. With kernel 5.7 and later I got several freezes a day. I will stick with 5.6.9 until a proper fix is pushed upstream.

Last edited by mac1202 (2020-07-02 17:20:08)


#53 2020-07-02 17:31:27

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19

Re: System randomly freeze

I still have the motherboard and the CPU, so I can still test, and I will use them once the kernel is fixed.

I'm keeping an eye on this thread; good luck to everyone trying to bisect.

Last edited by automne (2020-07-02 17:32:46)


#54 2020-07-02 19:20:51

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: System randomly freeze

mac1202 wrote:
mac1202 wrote:

I have the same issue with kernels 5.7 to 5.8-rc3 on my laptop with a Ryzen 4500U. I just downgraded to kernel 5.6.9 and will report whether it fixes the freeze.

My system has been stable for a whole day since the downgrade. With kernel 5.7 and later I got several freezes a day. I will stick with 5.6.9 until a proper fix is pushed upstream.

mac1202, please post the kernel messages from a boot with the issue for comparison.
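For anyone collecting these logs: the kernel messages from the boot that froze can be read back with `journalctl -k -b -1`, which only works if the journal is persistent. The Python sketch below is one hypothetical way to run that and keep only the lines that look related to PCIe AER or the GPU; the helper names are made up and the pattern list is a guess, not a definitive set of signatures, so posting the full log is still more useful for comparison.

```python
import re
import subprocess
from typing import List

# Illustrative patterns only: PCIe AER reports, GPU driver names, GPU hangs, machine checks.
SUSPECT = re.compile(
    r"AER|amdgpu|radeon|i915|GPU lockup|ring \S+ timeout|Machine check|Hardware Error",
    re.IGNORECASE,
)


def previous_boot_kernel_log() -> List[str]:
    """Kernel messages from the previous boot (requires a persistent journal)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def suspicious_lines(lines: List[str]) -> List[str]:
    """Keep only the lines matching one of the illustrative patterns."""
    return [line for line in lines if SUSPECT.search(line)]
```

Usage would be something like `print("\n".join(suspicious_lines(previous_boot_kernel_log())))` right after rebooting from a freeze. An empty result does not mean nothing happened; as several posts here note, a hard freeze often leaves no trace in the journal at all.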


#55 2020-07-04 11:32:38

mac1202
Member
Registered: 2011-05-24
Posts: 33

Re: System randomly freeze

I switched back to 5.7 to post a log. There is a new version (5.7.7) in the 5.7 series, and I haven't had a crash with it so far. Could somebody else try this version and report back to confirm?


#56 2020-07-06 12:43:33

eggz
Member
Registered: 2017-09-16
Posts: 8

Re: System randomly freeze

For me personally, the problem started with AMDGPU DRM on Linux 5.7.7. A large batch of AMDGPU code landed in that release, and I think not all of it went well for my Raven Ridge-based system.

I can instantly trigger a full system freeze by playing 4K videos on 5.7.7.

I am kind of surprised that most of you already had freezing problems before 5.7.6, because for me, 5.7.6 fixes every problem.


#57 2020-07-12 10:20:09

archixxx
Member
Registered: 2012-10-17
Posts: 40

Re: System randomly freeze

FYI: Yesterday my PC froze again with kernel 5.7.7, after about 5 days of running 24/7 without issues. It happened while I was running "ls -al" from a remote host on an NFS share on my PC. Sadly, again no errors were reported in the systemd journal.


#58 2020-07-13 07:05:04

digitalone
Member
Registered: 2011-08-19
Posts: 328

Re: System randomly freeze

For me, when a freeze happens, I see GPU lockup errors in the systemd journal.


#59 2020-07-14 01:49:38

Gnurou
Member
From: Tokyo
Registered: 2009-07-20
Posts: 25

Re: System randomly freeze

Just got a freeze with 5.4 LTS :( Going back to 5.7.8...


#60 2020-07-14 01:50:26

Gnurou
Member
From: Tokyo
Registered: 2009-07-20
Posts: 25

Re: System randomly freeze

digitalone wrote:

For me, when a freeze happens, I see GPU lockup errors in the systemd journal.

Could you share these errors so we can compare with what has been posted previously?


#61 2020-07-14 15:56:46

digitalone
Member
Registered: 2011-08-19
Posts: 328

Re: System randomly freeze

Gnurou wrote:
digitalone wrote:

For me, when a freeze happens, I see GPU lockup errors in the systemd journal.

Could you share these errors so we can compare with what has been posted previously?

I keep getting freezes, but less frequently than before: one every 2-3 days. I'm on linux-lts, and now I'm testing with amdgpu rather than radeon to see if the radeon module is the culprit.

https://pastebin.com/fDTdXiiS
https://pastebin.com/fpT018eg

Last edited by digitalone (2020-07-14 16:01:40)


#62 2020-07-15 01:08:09

Gnurou
Member
From: Tokyo
Registered: 2009-07-20
Posts: 25

Re: System randomly freeze

digitalone wrote:
Gnurou wrote:
digitalone wrote:

For me, when a freeze happens, I see GPU lockup errors in the systemd journal.

Could you share these errors so we can compare with what has been posted previously?

I keep getting freezes, but less frequently than before: one every 2-3 days. I'm on linux-lts, and now I'm testing with amdgpu rather than radeon to see if the radeon module is the culprit.

https://pastebin.com/fDTdXiiS
https://pastebin.com/fpT018eg

This looks pretty different from what we have been experiencing here and rather hints at a bug in the graphics stack. I would indeed check whether amdgpu helps, but preferably on the most recent kernel possible.


#63 2020-07-15 06:19:48

digitalone
Member
Registered: 2011-08-19
Posts: 328

Re: System randomly freeze

Gnurou wrote:

This looks pretty different from what we have been experiencing here and rather hints at a bug in the graphics stack. I would indeed check whether amdgpu helps, but preferably on the most recent kernel possible.

That card is old; it runs on the radeon driver by default, but over the last year I used amdgpu, whose support for it is experimental.

I never had issues, except in the last few weeks: I discovered that amdgpu lacks audio over HDMI, so I went back to radeon, which fully supports HDMI, and then I started experiencing random freezes.

Since I didn't have any freezes in the previous months, I'm using older kernel versions, but I'm starting to think that the issue is the radeon driver.


#64 2020-07-20 23:30:42

tolga9009
Member
From: Germany
Registered: 2010-01-08
Posts: 62

Re: System randomly freeze

I'm also getting AER errors on the PCIe GPP bridge that my GPU is attached to. When the freeze happens, my mouse / audio / screen get stuck and the screen turns black after ~3 seconds. The GPU LEDs start blinking. A short press of the power button doesn't shut down the system; a forced shutdown by long-pressing the power button is the only way out. Crashes happen only at idle or under light load.

Specs:
Ryzen 7 2700
ASUS PRIME X470-Pro, latest BIOS (5601)
2x 16GB Crucial Ballistix DDR4-3200, Dual Rank
ASUS ROG STRIX RX480 OC
Samsung 970 Evo M.2
Samsung PM961 M.2
Corsair AX860 860W, 80+ Platinum PSU

Problems started shortly after I installed a second M.2 SSD (970 Evo) and upgraded from 2x 8GB DDR4-3200 Samsung B-Die Single-Rank to 2x 16GB DDR4-3200 Micron E-Die Dual-Rank. Since I upgraded the RAM and SSD at the same time, I don't know what really caused it. The error logs clearly point to the GPU, but I think the GPU failure is just a symptom, not the root cause. I had no stability issues for over 6 months, until I upgraded those components.

I've tried so far:
- AMD CBS -> Power Supply Idle Control -> Typical Current Idle. What this option does: in lowest C-State, your CPU runs at 0.8V, instead of the default 0.4V. Didn't help.
- Instead of XMP DDR4-3200 at 1.35V, I've tried setting my RAM to JEDEC DDR4-2400 at 1.2V. No change.
- Updated all kinds of drivers, firmware, etc. No change.
- System is prime95 stable, Memtest stable and 3D Mark stable. No issues during load.

I'm also dual-booting Windows 10, and the rare freezes happen there as well, so I suspect this is a hardware issue. However, the Windows Event Viewer doesn't show anything useful other than "Sudden Power Loss". The GPU is out of warranty; the CPU, mainboard, SSD and RAM are still under warranty.

Last edited by tolga9009 (2020-07-20 23:43:09)


#65 2020-07-21 00:59:34

Gnurou
Member
From: Tokyo
Registered: 2009-07-20
Posts: 25

Re: System randomly freeze

tolga9009, this is exactly the same problem indeed. What frequency of crash are you getting?

Same observation on my end: load doesn't seem to be the problem; if anything, it seems to happen to me when the system is idle.

The latest thing I am trying is slightly increasing the CPU and DRAM voltages in the BIOS. So far so good (2 days), but since I get crashes weekly on average, it's too early to tell.


#66 2020-07-21 09:44:44

Ropid
Member
Registered: 2015-03-09
Posts: 1,069

Re: System randomly freeze

tolga9009 wrote:

[...]
Specs:
Ryzen 7 2700
ASUS PRIME X470-Pro, latest BIOS (5601)
2x 16GB Crucial Ballistix DDR4-3200, Dual Rank
ASUS ROG STRIX RX480 OC
[...]

I have a Ryzen 2700X, an ASRock X470 board, 2x16GB ECC RAM, and an RX480 graphics card. About PCIe AER errors/warnings: for me, those seem to be caused by the PCIe ASPM ("Active State Power Management") feature. I can make the AER errors go away by disabling ASPM with this kernel command line argument:

pcie_aspm=off

When ASPM is enabled, then I can make AER warnings show up reliably by starting a prime95 torture test. AER warnings then start showing up within minutes.
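To confirm which ASPM setting is actually in effect, here is a small Python sketch, assuming a kernel built with ASPM support: `/proc/cmdline` shows whether `pcie_aspm=off` was passed, and `/sys/module/pcie_aspm/parameters/policy` lists the available policies with the active one in brackets. The helper names are hypothetical, and the command-line parser is deliberately simple (it ignores quoted values containing spaces).

```python
from pathlib import Path
from typing import Dict, Optional

POLICY_FILE = Path("/sys/module/pcie_aspm/parameters/policy")


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Map each kernel parameter to its value; bare flags such as 'quiet' map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def bracketed_choice(text: str) -> Optional[str]:
    """Return the active entry from sysfs text like '[default] performance powersave'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def aspm_summary() -> str:
    params = parse_cmdline(Path("/proc/cmdline").read_text())
    if params.get("pcie_aspm") == "off":
        return "pcie_aspm=off is on the kernel command line"
    if POLICY_FILE.exists():
        return "ASPM policy: " + str(bracketed_choice(POLICY_FILE.read_text()))
    return "no pcie_aspm policy file (kernel may lack ASPM support)"
```

The firmware can also keep ASPM disabled regardless of these settings, so the per-link state reported by `lspci -vv` in the `LnkCtl:` lines ("ASPM Disabled" or the enabled L-states) is the more direct check.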

Windows 10 can track AER warnings as well, in the "Windows Event Viewer" tool in the "administrative events" section. I didn't seem to get AER warnings in Windows, just in Linux. From what I could see, in Windows the PCIe ASPM feature is disabled by default for my PCIe devices. I checked this with the tool "HWiNFO64".

About the memory (and maybe the random freezes):

Does simply enabling XMP work for you with your 2x16GB memory kit? You didn't need to tweak anything else to run 3200MHz memory speed?

For me here with my Ryzen 2700X and 2x16GB dual-rank memory, I need to manually tweak ProcODT and RttNom, RttWr, RttPark resistances in the memory timing section of the BIOS menus. If I use the BIOS defaults for ProcODT and Rtt, the system only wants to do 2933 MHz at most.

The following values are somewhat standard for dual-rank memory from what I could find:

ProcODT = 68 Ohm
RttNom = 34 Ohm = RZQ/7
RttWr = 80 Ohm = RZQ/3
RttPark = 240 Ohm = RZQ/1
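The "RZQ/n" notation refers to the DDR4 ZQ calibration reference resistor, which is 240 Ohm, so each Rtt value is simply 240 divided by the divisor. A tiny Python check of the arithmetic above (the `rzq_ohms` helper is purely illustrative; ProcODT is a CPU-side setting and is not expressed in RZQ units):

```python
RZQ_OHMS = 240.0  # DDR4 ZQ calibration reference resistor


def rzq_ohms(divisor: int) -> float:
    """Resistance selected by an 'RZQ/divisor' setting."""
    return RZQ_OHMS / divisor


for name, divisor in [("RttNom", 7), ("RttWr", 3), ("RttPark", 1)]:
    print(f"{name} = RZQ/{divisor} = {rzq_ohms(divisor):.1f} Ohm")
```

This prints 34.3, 80.0 and 240.0 Ohm, so the "34 Ohm" in the list is RZQ/7 rounded down.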

If ProcODT and Rtt are at default for you right now, perhaps try setting them manually to these values here and see what happens. The difference between values for dual-rank and single-rank is mainly the RttPark value. With single-rank it should be off (or a low value).

There's also another set of four resistances, "CAD_BUS": ClkDrvStr, AddrCmdDrvStr, CsOdtDrvStr, CKEDrvStr. The majority of people don't seem to need to touch those. A normal set of values for CAD_BUS is 24, 24, 24, 24 (Ohm).

I had mysterious problems here where the system was usually running fine and could pass 10 hours of memory stress testing, then on another day it would find errors. Every boot had a chance to be bad. The Rtt and CAD resistance settings in the BIOS apparently can influence this, but I couldn't find any rules about how different values change the odds of a bad boot. I ended up just making a random change to the settings whenever there was a memory problem. I have ECC memory here, so I can just use the PC like normal and memory errors show up in the log.

I'm blaming that mysterious bad boot problem for the freezes I had in the past. It's now a few months since I fixed the random memory problems, and I also didn't have a freeze in months.


#67 2020-07-21 20:02:30

tolga9009
Member
From: Germany
Registered: 2010-01-08
Posts: 62

Re: System randomly freeze

Gnurou wrote:

What frequency of crash are you getting?

About 2 times a month or so. But when it happens, it sometimes happens twice in a row, e.g. system freeze -> forced shutdown -> reboot, and the system freezes again within a few minutes. I had this with my old components on older BIOS versions: everything below AGESA PinnaclePi 1.0.0.6 would crash (I don't know if AER errors showed up back then as well). But PinnaclePi 1.0.0.6 was extremely stable (6 months without a crash). I have a valid BIOS dump and a hardware flasher, so I think I'll downgrade.

Ropid wrote:

About PCIe AER errors/warnings: for me, those seem to be caused by the PCIe ASPM ("Active State Power Management") feature. I can make the AER errors go away by disabling ASPM with this kernel command line argument:
...
Windows 10 can track AER warnings as well in the "Windows Event Viewer" tool in the "administrative events" section.

Sounds very plausible to me. But from what I've heard, disabling ASPM seems problematic for some NVMe devices. I need to test it.

Just looked there, AER is definitely nowhere to be found in Windows. Symptoms are the same though.

Ropid wrote:

Does simply enabling XMP work for you with your 2x16GB memory kit? You didn't need to tweak anything else to run 3200MHz memory speed?

Yes. My Ryzen 7 2700 has a poor IMC and can't overclock memory much (it maxes out at ~3533, IIRC), but it runs DDR4-3200 dual-rank with 2x 16GB without issues.


#68 2020-08-24 16:15:45

turmoni
Member
Registered: 2020-06-15
Posts: 3

Re: System randomly freeze

I think the 5.8 kernel has fixed the problem I was having - at least, I have an uptime of almost two days without it freezing.


#69 2020-08-24 20:51:18

archixxx
Member
Registered: 2012-10-17
Posts: 40

Re: System randomly freeze

The later 5.7.x releases also worked better for me: 5.7.11 ran for about a week before the system froze. I've only had 5.8.2 running for two days now, but that's already 24 hours more than the first 5.7 release ;-)


#70 2020-08-25 01:28:48

Gnurou
Member
From: Tokyo
Registered: 2009-07-20
Posts: 25

Re: System randomly freeze

On my side, I have been enjoying a few weeks of stability (so still before 5.8) after adding the following kernel parameters:

pci=noaer iommu=soft

I need to try again with each one individually to see which one fixes the problem, but since my repro rate is ~1 per week, I have been putting it off and enjoying my stable system. Hope this helps though.
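Why a ~1 per week repro rate makes this slow to test can be put into numbers. Assuming, as a simplification, that freezes arrive independently at a constant rate (a Poisson process), the chance of seeing no freeze in t days even though the bug is still there is exp(-rate * t). The Python sketch below (the function name is hypothetical) solves that for the number of crash-free days needed before "still broken" becomes unlikely at a chosen confidence.

```python
import math


def days_without_crash_needed(crashes_per_week: float, confidence: float = 0.95) -> float:
    """Crash-free days needed before 'the bug is still there' becomes unlikely.

    Assumes crashes follow a Poisson process at the old rate, so
    P(no crash in t days) = exp(-rate_per_day * t); we solve
    exp(-rate_per_day * t) = 1 - confidence for t.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if crashes_per_week <= 0.0:
        raise ValueError("crash rate must be positive")
    rate_per_day = crashes_per_week / 7.0
    return -math.log(1.0 - confidence) / rate_per_day
```

At 1 freeze per week this comes out to roughly 21 days for 95% confidence, whereas at ~3 freezes per day (21 per week) a single crash-free day already gives about the same confidence. Clustered crashes, like the back-to-back freezes described earlier in this thread, break the independence assumption, so these figures are optimistic.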


#71 2020-09-07 19:09:56

tolga9009
Member
From: Germany
Registered: 2010-01-08
Posts: 62

Re: System randomly freeze

Update on my case: after posting my crash issue here, it started to appear almost daily, sometimes even up to twice a day. It's really weird. My workload over the past few weeks was reading PDFs and light web browsing. Turning off ASPM didn't fix the issue.

I now think my GPU (ASUS ROG Strix RX480 OC) was responsible for the issues. A few years ago, ASUS released a VBIOS update ("Optimize memory and improve stability"), which I blindly installed back then. I reverted to the original factory VBIOS about a week ago and haven't seen any freezes since then. It's a bit too early for final conclusions, but it looks promising so far.

//Edit: The GPU wasn't the issue, as the freezes returned. Turns out the culprit was the RAM. I was running Crucial Ballistix Sport LT DDR4-3200 2x 16GB. I RMA'd it and replaced it with Samsung DDR4-2666 ECC memory 2x 16GB about 2 months ago, and haven't seen a single freeze since then. It definitely wasn't ASPM in my case, as it's deactivated by default (and cannot be enabled) on my mainboard.

//Edit 2: Yep, almost 10 months later, no issues. My issue was 100% the RAM.

Last edited by tolga9009 (2021-09-25 01:49:03)


#72 2020-09-12 19:54:32

Amphitryon
Member
Registered: 2013-09-20
Posts: 39

Re: System randomly freeze

I have been getting semi-random freezes too. I think these started some time after mid-July. Initially I thought it might be due to defective RAM, as I added another 16GB around that time; however, MemTest86 did not find any fault with any of the RAM in the system. Yesterday I removed the extra RAM as an experiment, and that did not prevent the freeze, so I have put it back in.

So now I am beginning to suspect it is a software issue. While it can sometimes seem completely random, there are a few things that seem more likely to provoke it:

1. Failing to suspend. In this case the last entry in the system journal is "kernel: PM: suspend entry (deep)", but it doesn't actually suspend. The power LED stays solid, whereas it would usually flash while suspended, and nothing seems able to wake it again.

2. Some video operations. Recently it has crashed during a Zoom call and during an MS Teams call, and my daughters have had trouble watching video on various websites they visit.

3. There was a short period during which VirtualBox would not run at the same time as either Chromium or Epiphany (the GNOME web browser). Attempting to do so would quickly cause a freeze.

For the last two, nothing is logged to the system journal around the time of the freeze, and nothing appears on the screen either. The mouse simply stops moving and the keyboard stops responding, not even to Alt-Fn to switch to other consoles. Connecting to the ssh port via PuTTY results in a blank screen; it doesn't report a timeout, so I am not sure whether the connection is being accepted with nothing spawned. The only way out seems to be holding the power button until it turns off and then turning it on again.

In my case, though, this has nothing to do with AMD graphics, as this is an Intel(R) Core(TM) i5-3570K with integrated graphics (i915 driver). So now I am wondering whether there has been some change to the graphics pipeline that is not AMD-specific.


#73 2020-10-11 10:34:36

demi
Member
Registered: 2020-10-11
Posts: 1

Re: System randomly freeze

Similar, presumably ASPM-related freezes here too after moving beyond 5.4-ish, on a Lenovo X240 (Intel graphics). Has anyone gained deeper insight into this yet?

