#1 2018-10-10 20:19:17

microbug
Member
Registered: 2015-02-07
Posts: 14

System hangs -- troubleshooting help [resolved]

I have been experiencing hangs in my system recently. The screen goes black and the system does not respond. Sometimes SYSRQ+B will force a reboot, other times I have to turn it off at the wall (even holding the power button does not force a shutdown). The hangs often happen when playing intensive games but not always.

I replaced my CPU and motherboard a few days ago, partly because I thought one of the two parts was broken and partly to upgrade to an 8c/16t processor. To be clear, these problems were happening both before and after the upgrade. System specs:

CPU: Ryzen 7 2700 (new, not overclocked)
Motherboard: Gigabyte X470 Gaming 7 WiFi (new)
RAM: Corsair Vengeance 3000MHz (a few years old but passed 24 runs in memtest86)
GPU: Radeon RX Vega 64 (a few months old, also not overclocked)
PSU: Corsair HX1000i (Bought used, this could be the culprit but I'd like to check before I get another. I don't have another PSU around to use.)
Storage: 525GB Crucial SATA SSD
OS: Arch Linux 4.18.12 (system is up to date)

The journalctl logs from around a recent hang are here: https://pastebin.com/T6bWM13Z . I couldn't see anything out of the ordinary but maybe one of you will.

I'm about to lower the RAM speed to 2133MHz (3000MHz is from the XMP profile) but I think I've already tried that and it didn't help.

Any suggestions on how to proceed? Thanks in advance.

Edit: added more details
Edit 2: on rereading my post I realised that 4 memtest86 passes might not be enough, so I'm retesting with 20 passes. I'll report back after.

Last edited by microbug (2019-03-15 14:11:00)

Offline

#2 2018-10-11 03:17:41

Ropid
Member
Registered: 2015-03-09
Posts: 1,069

Re: System hangs -- troubleshooting help [resolved]

That PSU you are worrying about is usually safe to buy used. It's one of those rather expensive models where the manufacturer is confident enough to provide a ten-year warranty. Then again, maybe the person who sold it also had problems, and that's why it was sold?

What was your system like before the upgrade? Did you keep everything else besides the motherboard and CPU?

Maybe try the "linux-lts" kernel and see what happens.

I'm the kind of person that overclocks everything. This means the computer crashes a lot while I experiment. I'm using btrfs, and running its "scrub" tool after a crash rather often finds corruption in files. If your PC wasn't 100% stable in the past, maybe some files are corrupted and are now causing problems that would normally not happen? Maybe the hardware is fine right now and you just have to reinstall all packages to get fresh files? I don't know if there's a command or script for Arch that checks files for corruption. A script or one-liner that compares files with the contents of the packages in pacman's cache should be possible, so someone might have written a script for this.

About that corruption issue, it's possible to reinstall all packages as follows, but I wouldn't run this on a computer that's not stable:

pacman -Qnq | sudo pacman -S -

(this will only reinstall Arch packages, not AUR packages)
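
On the question of checking installed files against their packages: pacman itself has a check mode, `pacman -Qkk`, which as far as I know compares each installed file with the properties recorded in the package's mtree data (size, permissions, modification time and so on). Below is a minimal sketch, assuming the usual "N altered files" summary line per package; the function name `list_altered_packages` is made up for illustration.

```shell
# Hypothetical helper: print only the packages for which pacman reports
# altered files. Per-file warnings go to stderr and are discarded here;
# drop the 2>/dev/null to see which files differ.
list_altered_packages() {
    pacman -Qkk 2>/dev/null | grep -v ' 0 altered files'
}

# Usage: list_altered_packages
```

Expect some noise, for example from files you edited on purpose. Corruption that leaves the checked properties untouched may well slip through, so treat a clean result as a hint rather than proof.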

Last edited by Ropid (2018-10-11 03:19:57)

Offline

#3 2018-10-11 09:41:32

microbug
Member
Registered: 2015-02-07
Posts: 14

Re: System hangs -- troubleshooting help [resolved]

I think the person before me had been using the PSU for mining. As you say, it shouldn't be failing (and if it is, Corsair is usually pretty generous on warranties).

My system was equally unstable before the upgrade. Only the motherboard and CPU were swapped, everything else is the same.

I'm not using BTRFS so I can't do a scrub. I don't have much to lose at this point — this PC is mostly used for learning more about Linux and gaming so all I need to do is back up my save files. I'll try reinstalling all packages as you suggested, then the linux-lts kernel and finally reinstalling Arch.

I'm pretty confident it's not the RAM now. Memtest86 is now on 17 completed passes with no errors, I'll leave it for 20 but I think that rules out the RAM being defective.

Offline

#4 2018-10-11 10:52:19

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,410

Re: System hangs -- troubleshooting help [resolved]

What's your BIOS/AGESA version? There have been some known issues with recent AGESA (1.0.0.4) versions. If you can, I'd try to downgrade that. Also, FWIW, that's a pretty useless journal excerpt on any account. After a freeze and reboot, post the complete output of

journalctl -b-1

make sure it isn't truncated by your pager.
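
One way to do that, sketched under the assumption that the journal is persistent (otherwise there is no previous boot to read): `--no-pager` writes the full text out instead of opening the pager, so nothing gets cut off. The function name and log path are placeholders.

```shell
# Hypothetical helper: dump the journal of the previous boot to a file.
# -b -1 selects the boot before the current one; --no-pager bypasses the pager.
save_prev_boot_log() {
    journalctl -b -1 --no-pager > "$1"
}

# Usage: save_prev_boot_log ~/prev-boot.log
```

If the hang was more than one reboot ago, `journalctl --list-boots` shows which boots the journal still holds.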

Offline

#5 2018-10-11 22:07:17

microbug
Member
Registered: 2015-02-07
Posts: 14

Re: System hangs -- troubleshooting help [resolved]

Thanks for the help everyone. Reinstalling all packages seems to have worked: I haven't seen any hangs since then, whereas normally I'd get 3-5 per day.

FWIW I am on AGESA 1.0.0.4 so if I see further problems I'll downgrade. I'm aware of at least one issue with the latest AGESA (a timeout when starting up that causes a delay of a few seconds) that should be fixed in kernel 4.19 though. And thanks for the tip on journalctl, I didn't realise you could jump to the previous boot like that.

Offline

#6 2018-10-13 11:48:29

microbug
Member
Registered: 2015-02-07
Posts: 14

Re: System hangs -- troubleshooting help [resolved]

So after a few days this is much better, but not completely solved. I think there used to be two failure modes; one where the kernel would still respond to SYSRQ+B and one where it wouldn't. Now I'm only seeing the latter; the machine is completely locked up and can only be reset by removing AC power. This happens often enough to still be very frustrating.

The (improved) Pastebin of journalctl is here: https://pastebin.com/d0sc0qRL.

I've downgraded to AGESA 1.0.0.2 so I'll see if that helps.

Last edited by microbug (2018-10-13 11:48:57)

Offline

#7 2018-10-27 13:08:27

muddy
Member
Registered: 2016-08-10
Posts: 17

Re: System hangs -- troubleshooting help [resolved]

Any updates Microbug? I'm having lockup issues as well with Ryzen 1800x on MSI X470 Carbon Pro and Nvidia 1060.
Upgraded my hardware from old i5-2700k and all I did was update the Nvidia drivers to the newest ones and reinstalled the kernel and headers.
All I get in the journal is lots of messages about vmware dhcp for the vmnet and then my forced reboot by hitting the reset button.
Dmesg shows no issues and the box is updated every day.
Have replaced the memory 3 times now, will try a full forced reinstall of the box when I get home with the new memory today.

Last edited by muddy (2018-10-27 13:08:51)

Offline

#8 2018-10-28 16:32:07

microbug
Member
Registered: 2015-02-07
Posts: 14

Re: System hangs -- troubleshooting help [resolved]

It’s better than it was for me, but not fixed. My journalctl log doesn’t show anything from immediately before my PC freezes, and when it freezes I have to pull the AC power to turn it off. I get about 2 such freezes per week. 

My next plan is to reinstall Arch on BTRFS or ZFS so that I can scrub the filesystem for corruption, as that would let me eliminate one possible cause. (I also want to upgrade from i3 to Sway so instability is a nice excuse to do that without leaving any crud from the old i3 install).

I will also try updating the BIOS if/when a new one is released.

Last edited by microbug (2018-10-28 16:32:46)

Offline

#9 2018-11-06 13:36:38

microbug
Member
Registered: 2015-02-07
Posts: 14

Re: System hangs -- troubleshooting help [resolved]

This problem appears to be well known: https://www.reddit.com/r/Amd/comments/9 … inux_udev/

There are currently two workarounds: switch to the linux-lts kernel or add module_blacklist=ccp to your kernel command line. It sounds like the problem should be fixed in a future AGESA and/or kernel version.
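
To confirm that the kernel parameter actually took effect after a reboot, the running kernel's command line can be read from /proc/cmdline. A small sketch follows; `cmdline_has` is a made-up helper. How the parameter gets added in the first place depends on your bootloader. With GRUB it would typically go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by regenerating grub.cfg, but that part is an assumption about your setup.

```shell
# Hypothetical helper: succeed if the exact parameter $1 appears on the kernel
# command line. $2 is the file to read and defaults to /proc/cmdline.
cmdline_has() {
    tr -s ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Usage:
#   cmdline_has module_blacklist=ccp && echo "ccp is blacklisted"
```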

Offline

#10 2018-11-07 01:07:56

muddy
Member
Registered: 2016-08-10
Posts: 17

Re: System hangs -- troubleshooting help [resolved]

Good to know, thanks so much for the follow up!
Checking on the options you mentioned.

Offline

#11 2018-11-07 12:51:56

Makersmarx
Member
From: Costa Rica
Registered: 2018-04-17
Posts: 24

Re: System hangs -- troubleshooting help [resolved]

As others mentioned, that may help. AGESA only seemed to affect me at boot when I upgraded to 1.0.0.4, so I downgraded. The only solution that worked for me with my 1600 was going into my BIOS and setting Power Supply Idle Control to Typical. Its default is Auto. Here is a link for review from AMD's forums & Linux users. https://community.amd.com/thread/225795

Offline

#12 2018-12-08 23:14:32

microbug
Member
Registered: 2015-02-07
Posts: 14

Re: System hangs -- troubleshooting help [resolved]

Quick update: I have replaced my motherboard and I'm still having problems. I'm now only experiencing hard power offs that leave no error messages in `journalctl`. I think this must mean that the power supply is suspect. I am ordering another power supply and will test it. Frustrating but now I know not to buy used power supplies :(

Last edited by microbug (2018-12-08 23:15:27)

Offline

#13 2018-12-09 00:45:32

qrwteyrutiyoup
Member
From: Canada
Registered: 2017-12-26
Posts: 17

Re: System hangs -- troubleshooting help [resolved]

microbug wrote:

Quick update: I have replaced my motherboard and I'm still having problems. I'm now only experiencing hard power offs that leave no error messages in `journalctl`. I think this must mean that the power supply is suspect. I am ordering another power supply and will test it. Frustrating but now I know not to buy used power supplies :(

Have you checked the "Power Supply Idle Control" setting in the BIOS, as a couple of people suggested?

Offline

#14 2018-12-09 17:35:28

microbug
Member
Registered: 2015-02-07
Posts: 14

Re: System hangs -- troubleshooting help [resolved]

qrwteyrutiyoup wrote:
microbug wrote:

Quick update: I have replaced my motherboard and I'm still having problems. I'm now only experiencing hard power offs that leave no error messages in `journalctl`. I think this must mean that the power supply is suspect. I am ordering another power supply and will test it. Frustrating but now I know not to buy used power supplies :(

Have you checked the "Power Supply Idle Control" setting in the BIOS, as a couple of people suggested?

I think I've tried that with the old motherboard but I'll test it with the new one too before I order another PSU.

Offline

#15 2019-02-12 23:09:09

smiths_cloud
Member
From: Napier, New Zealand
Registered: 2013-11-28
Posts: 5

Re: System hangs -- troubleshooting help [resolved]

We had a similar problem at my work (where we run about 10 developer PCs on Arch Linux) - one PC used to frequently lock up overnight. I tried several things including some of the suggestions here, and a full (4 hour) hardware diagnostic test. There was nothing much in journalctl that indicated anything, just the whole system would freeze, couldn't even get to a console (Alt-Ctrl-2, etc) or log on remotely. The system was completely unresponsive to mouse and keyboard.

In the end the developer asked if we could turn off the screensaver. I had a quick look and couldn't be certain how I'd enabled it in the first place, so I removed 3 packages - xscreensaver mate-screensaver and electricsheep. The problem seems to have disappeared.

Offline

#16 2019-03-15 14:10:42

microbug
Member
Registered: 2015-02-07
Posts: 14

Re: System hangs -- troubleshooting help [resolved]

Ok final update (hopefully). After at least a month of stability I think I've solved it. First I updated everything to the latest possible version. Then I disabled all variable frequency settings in the BIOS (Cool 'n Quiet, P states / C states). This helped a bit but it was still unstable, shutting down every 5 hours or so. Then I set a fixed voltage for the CPU of 1.3V. This is much more than it should need without overclocking, but it's within safe limits. The system has now been stable for several weeks so I think this is resolved. Thanks all for your suggestions.

Offline
