#1 2022-01-03 16:54:04

theneuralbit
Member
Registered: 2018-02-24
Posts: 5

[SOLVED] Arch headless server freezing after 24-72 hours uptime

I have an Arch server that I typically run headless, although I currently have a monitor and keyboard attached to help debug this issue. A few weeks ago I swapped out the server's hardware, moving from an Intel to an AMD processor, following the Top to Bottom guide on ArchWiki.

Ever since then I've had this stability issue where the system will freeze after 24-72 hours. I can't access it over the network, and now that I have a monitor and keyboard, I can see the command prompt there is frozen too (but still shows the latest output). I've only been able to get the system back with a hard reset when this happens.

Here's a summary of what I've tried so far:
- Uninstalled intel-ucode and installed amd-ucode (I was certain this was the issue when I discovered it, but alas, it didn't fix anything).
- Flashed the latest BIOS for the new motherboard.
- Checked journalctl --boot=-1 (the previous boot), but there's nothing out of the ordinary right before the crash.
- I thought this might be a kernel panic, so I tried to set up a crashdump kernel following the guide on ArchWiki to get more debug information. Unfortunately I haven't been able to get it working: the crashdump kernel fails to boot because it can't find /dev/disk/by-uuid.

I'm not sure what else to try here. Does anyone have any other suggestions? If this does sound like a kernel panic, can anyone advise on why the crashdump kernel is failing to boot?

Last edited by theneuralbit (2022-05-27 14:20:13)

#2 2022-01-03 17:04:47

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,532

Re: [SOLVED] Arch headless server freezing after 24-72 hours uptime

Hopefully this post isn't beating a dead horse, but I'd agree with your first suspicion here:

theneuralbit wrote:

- Uninstalled intel-ucode and installed amd-ucode

So on that, can you confirm that 1) you've updated your bootloader / boot manager config(s) and 2) these new configs are really being used (e.g., add a kernel parameter that makes an obvious / notable change, to confirm the config is in use)?
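As a minimal sketch of check 2): append a harmless marker to the kernel command line in the bootloader config, reboot, and look for it in /proc/cmdline. The marker name `ucode_check` is made up for illustration (it is not a real kernel parameter), and the simple whitespace split assumes no quoted values containing spaces.

```python
from pathlib import Path


def kernel_params(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}, with None for bare flags."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def has_param(cmdline: str, name: str) -> bool:
    """True if the named parameter appears on the command line."""
    return name in kernel_params(cmdline)


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")  # command line of the running kernel
    if proc_cmdline.exists():
        marker = "ucode_check"  # hypothetical marker added to the bootloader config
        print(f"{marker} present: {has_param(proc_cmdline.read_text(), marker)}")
```

If the marker is missing after a reboot, the system is probably booting from a different config than the one that was edited.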

Last edited by Trilby (2022-01-03 17:05:04)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

#3 2022-01-03 18:26:11

Zod
Member
From: Hoosiertucky
Registered: 2019-03-10
Posts: 630

Re: [SOLVED] Arch headless server freezing after 24-72 hours uptime

You might consider testing the RAM.

At least 10 full passes, if not overnight.

#4 2022-01-03 20:53:23

lfitzgerald
Member
Registered: 2021-07-16
Posts: 162

Re: [SOLVED] Arch headless server freezing after 24-72 hours uptime

Assuming the above suggestions don't fix it, I would suggest logging the CPU and RAM usage as well to make sure that's not the cause. There are various tools that do it, but a very simple solution is to write a script that appends the output of mpstat, free and so on to a file. Then have a systemd timer run that every minute. Maybe there's a resource usage spike immediately before the crash.
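A rough sketch of such a script, with one substitution: instead of calling mpstat and free, it reads /proc/loadavg and /proc/meminfo directly so it has no extra dependencies. The log path is passed as an argument, and the function names are only illustrative. Each line is fsynced so the last sample has a better chance of surviving a hard reset.

```python
import os
import sys
import time
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Return /proc/meminfo fields as {name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def format_sample(timestamp: str, loadavg_text: str, meminfo_text: str) -> str:
    """Build one log line: timestamp, 1-minute load average, available memory."""
    load1 = loadavg_text.split()[0]
    mem = parse_meminfo(meminfo_text)
    return f"{timestamp} load1={load1} mem_available_kib={mem.get('MemAvailable', -1)}"


def append_sample(log_path: str) -> None:
    """Append one sample to the log and force it to disk."""
    line = format_sample(
        time.strftime("%Y-%m-%dT%H:%M:%S"),
        Path("/proc/loadavg").read_text(),
        Path("/proc/meminfo").read_text(),
    )
    with open(log_path, "a") as log:
        log.write(line + "\n")
        log.flush()
        os.fsync(log.fileno())  # make sure the line is on disk before a freeze


if __name__ == "__main__":
    if len(sys.argv) == 2:
        append_sample(sys.argv[1])
    else:
        print("usage: log_usage.py LOGFILE")
```

Run once a minute from a systemd timer, this leaves a trail showing whether load or memory pressure climbed just before the freeze.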

#5 2022-01-03 22:29:41

avi9526
Member
Registered: 2015-05-15
Posts: 116

Re: [SOLVED] Arch headless server freezing after 24-72 hours uptime

0) 1+ pass of memtest as suggested above, and cpuburn for 4 hours (pay attention to temperature)
1) Does this server normally have a GPU, or is it really headless? Does it have an AMD GPU? Try

/etc/modprobe.d/amdgpu.conf

blacklist amdgpu

2) Have you tried enabling the systemd watchdog (if the motherboard supports it)?
3) LTS kernel? Do different kernels also freeze?
4) Check/reinstall all packages (see wiki) in case of file system corruption
5) Enable the REISUB commands and check whether they work when the system freezes
/etc/sysctl.d/99-sysctl.conf

kernel.sysrq = 1

6) Out-of-repository/AUR packages?
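On point 5), kernel.sysrq is either 0 (disabled), 1 (everything enabled), or a bitmask of allowed functions. The sketch below decodes the current value and checks whether the whole REISUB sequence is permitted. The bit meanings follow the kernel's sysrq documentation as I understand it; the helper names are only illustrative.

```python
from pathlib import Path

# Bit values of the kernel.sysrq bitmask and what each one permits.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}

# R (unraw) = 4, E and I (term, kill) = 64, S (sync) = 16, U (remount) = 32, B (reboot) = 128
REISUB_MASK = 4 | 64 | 16 | 32 | 128


def describe_sysrq(value: int) -> list:
    """List the SysRq function groups a kernel.sysrq value permits."""
    if value == 0:
        return []
    if value == 1:
        return list(SYSRQ_BITS.values())
    return [desc for bit, desc in SYSRQ_BITS.items() if value & bit]


def reisub_enabled(value: int) -> bool:
    """True if every key in the REISUB sequence is permitted."""
    return value == 1 or (value & REISUB_MASK) == REISUB_MASK


if __name__ == "__main__":
    sysrq = Path("/proc/sys/kernel/sysrq")
    if sysrq.exists():
        current = int(sysrq.read_text().strip())
        print(current, reisub_enabled(current), describe_sysrq(current))
```

If REISUB still does nothing during a freeze with the value set to 1, the kernel itself is most likely hung rather than just userspace.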

Last edited by avi9526 (2022-01-03 22:42:27)

#6 2022-01-04 00:32:29

jonno2002
Member
Registered: 2016-11-21
Posts: 684

Re: [SOLVED] Arch headless server freezing after 24-72 hours uptime

I don't know which AMD CPU you have, but I had stability issues with an 1800X, which I fixed with this BIOS setting:

Advanced/AMD CBS/Zen Common Options
    Power Supply Idle Control -> Typical Current Idle

#7 2022-01-14 02:24:39

theneuralbit
Member
Registered: 2018-02-24
Posts: 5

Re: [SOLVED] Arch headless server freezing after 24-72 hours uptime

Thanks for the suggestions everyone! Sorry I took so long to get back to this. I was actually waiting for an email notification and was surprised to find all this great advice when I checked back a week later.

Zod wrote:

You might consider testing the RAM.

At least 10 full passes, if not overnight.

I installed memtest86+ and booted it up. The screen turns off, typically somewhere during the second pass, although the power LED stays on. Perhaps this is a crash indicating bad RAM? I was expecting it to just report an error if something was wrong. I'll see if I can swap in another stick and get different behavior.

Trilby wrote:

So on that, can you confirm that 1) you've updated your bootloader / boot manager config(s) and 2) these new configs are really being used

Good question. I did regenerate the grub config (grub-mkconfig). I also just did this again after installing memtest86+, and grub-mkconfig found it and added it to the grub menu. I just checked grub.cfg, and it references amd-ucode.img and not intel-ucode.img.
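For anyone who wants to repeat that grub.cfg check without reading the file by eye, here is a small sketch. It assumes the microcode image is listed on the `initrd` lines and named `<vendor>-ucode.img`, as the Arch packages name it; the function name is only illustrative.

```python
def ucode_images(grub_cfg_text: str) -> set:
    """Collect the microcode image names found on initrd lines of a grub.cfg."""
    images = set()
    for line in grub_cfg_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "initrd":
            for path in tokens[1:]:
                name = path.rsplit("/", 1)[-1]  # strip any leading directory
                if name.endswith("-ucode.img"):
                    images.add(name)
    return images
```

Feeding it the contents of grub.cfg (typically /boot/grub/grub.cfg, which may need root to read) should return only `{'amd-ucode.img'}` on this machine; a leftover `intel-ucode.img` would point to a stale config.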

#8 2022-05-27 14:15:54

theneuralbit
Member
Registered: 2018-02-24
Posts: 5

Re: [SOLVED] Arch headless server freezing after 24-72 hours uptime

jonno2002 wrote:

I don't know which AMD CPU you have, but I had stability issues with an 1800X, which I fixed with this BIOS setting:

Advanced/AMD CBS/Zen Common Options
    Power Supply Idle Control -> Typical Current Idle

This turned out to be the issue. Thank you!!

I'm fine with turning this off if that's what it takes to keep the machine running, but it would be nice to reduce power usage when idle. I wonder if there's some software change that would resolve this?
