#1 2018-10-10 20:19:17

microbug
Member
Registered: 2015-02-07
Posts: 8

System hangs -- troubleshooting help

I have been experiencing hangs in my system recently. The screen goes black and the system does not respond. Sometimes SYSRQ+B will force a reboot; other times I have to turn it off at the wall (even holding the power button does not force a shutdown). The hangs often happen when playing intensive games, but not always.

I replaced my CPU and motherboard a few days ago, partly because I thought one of the two parts was broken and partly to upgrade to an 8c/16t processor. To be clear, these problems were happening both before and after the upgrade. System specs:

CPU: Ryzen 7 2700 (new, not overclocked)
Motherboard: Gigabyte X470 Gaming 7 WiFi (new)
RAM: Corsair Vengeance 3000MHz (a few years old but passed 24 runs in memtest86)
GPU: Radeon RX Vega 64 (a few months old, also not overclocked)
PSU: Corsair HX1000i (Bought used, this could be the culprit but I'd like to check before I get another. I don't have another PSU around to use.)
Storage: 525GB Crucial SATA SSD
OS: Arch Linux, kernel 4.18.12 (system is up to date)

The journalctl logs from around a recent hang are here: https://pastebin.com/T6bWM13Z . I couldn't see anything out of the ordinary but maybe one of you will.

I'm about to lower the RAM speed to 2133MHz (3000MHz is from the XMP profile) but I think I've already tried that and it didn't help.

Any suggestions on how to proceed? Thanks in advance.

Edit: added more details
Edit 2: on rereading my post I realised that 4 memtest86 passes might not be enough, so I'm retesting with 20 passes. I'll report back after.

Last edited by microbug (2018-10-13 11:41:04)

#2 2018-10-11 03:17:41

Ropid
Member
Registered: 2015-03-09
Posts: 432

Re: System hangs -- troubleshooting help

That PSU you are worrying about is usually safe to buy used. It's one of those rather expensive models where the manufacturer is super confident and provides a ten-year warranty. Then again, maybe the person who sold it also had problems, and that's why it was sold?

What was your system like before the upgrade? Did you keep everything else besides the motherboard and CPU?

Maybe try the "linux-lts" kernel and see what happens.

I'm the kind of person that overclocks everything. This means the computer crashes a lot while I experiment. I'm using btrfs, and running its "scrub" tool after a crash quite often finds corruption in files. If your PC wasn't 100% stable in the past, maybe some files are corrupted and are now causing problems that would normally not happen? Maybe the hardware is fine right now and you just have to reinstall all packages to get fresh files? I don't know if there's a command or script for Arch that checks files for corruption. A script or one-liner that compares files with the contents of the packages in pacman's cache should be possible, so someone might have written a script for this.

About that corruption issue, it's possible to reinstall all packages as follows, but I wouldn't run this on a computer that's not stable:

pacman -Qnq | sudo pacman -S -

(this will only reinstall Arch packages, not AUR packages)
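
On the question of checking installed files before reinstalling: pacman's own `-Qkk` query compares installed files against the metadata stored with each package. The sketch below only filters its warnings. As far as I know this checks properties such as size, modification time and permissions rather than full content checksums, so a clean result is a hint, not proof. The function name `check_installed_files` is made up for illustration.

```shell
#!/bin/sh
# Sketch: show only the warnings from pacman's detailed file check.
# `check_installed_files` is a hypothetical helper name, not a pacman feature.
check_installed_files() {
    # -Q   : query the local package database
    # -kk  : detailed check of each package's files against its mtree data
    # LC_ALL=C keeps the messages in English so the grep pattern matches.
    # pacman prints warnings on stderr, so merge stderr into stdout first.
    # `|| true` keeps the exit status at 0 when grep finds nothing.
    LC_ALL=C pacman -Qkk 2>&1 | grep '^warning' || true
}

# Usage (commented out so that sourcing this file does nothing):
# check_installed_files > pacman-check.txt
```

Files you have edited on purpose may also be reported, so the output needs reading rather than counting.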

Last edited by Ropid (2018-10-11 03:19:57)

#3 2018-10-11 09:41:32

microbug
Member
Registered: 2015-02-07
Posts: 8

Re: System hangs -- troubleshooting help

I think the person before me had been using the PSU for mining. As you say, it shouldn't be failing (and if it is, Corsair is usually pretty generous on warranties).

My system was equally unstable before the upgrade. Only the motherboard and CPU were swapped, everything else is the same.

I'm not using BTRFS so I can't do a scrub. I don't have much to lose at this point: this PC is mostly used for gaming and for learning more about Linux, so all I need to do is back up my save files. I'll try reinstalling all packages as you suggested, then the linux-lts kernel, and finally reinstalling Arch.

I'm pretty confident it's not the RAM now. Memtest86 is on 17 completed passes with no errors. I'll leave it running until 20, but I think that rules out the RAM being defective.

#4 2018-10-11 10:52:19

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 5,454

Re: System hangs -- troubleshooting help

What's your BIOS/AGESA version? There have been some known issues with recent AGESA versions (1.0.0.4). If you can, I'd try to downgrade that. Also, FWIW, that's a pretty useless journal excerpt in any case. After reproducing a freeze and rebooting, post the complete output of

journalctl -b-1

and make sure it isn't truncated by your pager.
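
One way to sidestep the pager is to write the previous boot's journal straight to a file. A minimal sketch, assuming the journal is persistent so the previous boot is still on disk. The helper name `save_previous_boot_log` and the default filename are placeholders.

```shell
#!/bin/sh
# Sketch: dump the journal of the previous boot to a file, bypassing the pager.
# `save_previous_boot_log` and the default filename are illustrative only.
save_previous_boot_log() {
    out="${1:-previous-boot.log}"
    # -b -1      : select the boot before the current one
    # --no-pager : print everything instead of piping through less
    journalctl -b -1 --no-pager > "$out"
}

# Usage (commented out so that sourcing this file does nothing):
# save_previous_boot_log hang.log
```

The resulting file can then be uploaded as a whole instead of copying text out of a terminal.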

#5 2018-10-11 22:07:17

microbug
Member
Registered: 2015-02-07
Posts: 8

Re: System hangs -- troubleshooting help

Thanks for the help, everyone. Reinstalling all packages seems to have worked: I haven't seen any hangs since then, whereas normally I was getting 3-5 per day.

FWIW I am on AGESA 1.0.0.4 so if I see further problems I'll downgrade. I'm aware of at least one issue with the latest AGESA (a timeout when starting up that causes a delay of a few seconds) that should be fixed in kernel 4.19 though. And thanks for the tip on journalctl, I didn't realise you could jump to the previous boot like that.
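
For anyone following along, the installed BIOS version can be read from sysfs on a running system. This sketch only reports the board vendor's BIOS version string; mapping that to an AGESA version still means checking the vendor's release notes. `show_firmware_info` is a hypothetical helper, and its optional directory argument exists only so it can be tried against sample data.

```shell
#!/bin/sh
# Sketch: print board and BIOS identification from the kernel's DMI sysfs files.
# `show_firmware_info` is an illustrative name, not an existing tool.
show_firmware_info() {
    dmi_dir="${1:-/sys/class/dmi/id}"
    for field in board_name bios_version bios_date; do
        # Skip any field that is missing or not readable.
        if [ -r "$dmi_dir/$field" ]; then
            printf '%s: %s\n' "$field" "$(cat "$dmi_dir/$field")"
        fi
    done
}

# Usage (commented out so that sourcing this file does nothing):
# show_firmware_info
```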

#6 2018-10-13 11:48:29

microbug
Member
Registered: 2015-02-07
Posts: 8

Re: System hangs -- troubleshooting help

So after a few days this is much better, but not completely solved. I think there used to be two failure modes; one where the kernel would still respond to SYSRQ+B and one where it wouldn't. Now I'm only seeing the latter; the machine is completely locked up and can only be reset by removing AC power. This happens often enough to still be very frustrating.
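
Whether the kernel honours SYSRQ+B also depends on the `kernel.sysrq` setting, so it may be worth ruling that out before reading too much into the two failure modes. A minimal sketch based on the documented bitmask for `/proc/sys/kernel/sysrq`: a value of 1 enables every function, and otherwise bit 128 permits reboot/poweroff. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if the given kernel.sysrq value permits
# SysRq+B (reboot). With no argument, read the live value from /proc.
# `sysrq_allows_reboot` is a hypothetical helper name.
sysrq_allows_reboot() {
    value="${1:-$(cat /proc/sys/kernel/sysrq)}"
    # 1 means "all SysRq functions enabled".
    [ "$value" -eq 1 ] && return 0
    # Otherwise the value is a bitmask; 128 is the reboot/poweroff bit.
    [ $(( $value & 128 )) -ne 0 ]
}

# Usage (commented out so that sourcing this file does nothing):
# if sysrq_allows_reboot; then echo "SysRq+B permitted"; else echo "SysRq+B blocked"; fi
```

If the setting does permit it and SYSRQ+B still does nothing, the kernel itself is most likely no longer running, which points more towards hardware or firmware than towards userspace.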

The (improved) Pastebin of journalctl is here: https://pastebin.com/d0sc0qRL.

I've downgraded to AGESA 1.0.0.2 so I'll see if that helps.

Last edited by microbug (2018-10-13 11:48:57)
