I've been running Arch Linux on multiple systems for years without any real problems, but it has been acting up over the last few months (I might even say the past year). It started with higher and higher CPU temps on the first boot of the day (up to 90 degrees Celsius with me just booting into the system and going AFK; normal idle temps are < 40 C) and ended with the system shutting itself off an hour ago. I cannot find the source of the problem. I have tried running virus checks and diagnostic software and checking system logs, the journal, and coredumps, but they all come back clean or show nothing useful. Each kernel update seems to worsen the problem.
My main system where this has been happening the last few months:
- Motherboard: Gigabyte B650 Gaming X AX V2
- CPU: AMD 7800x3D
- GPU: Nvidia GTX 1080 Ti
- Memory: G.Skill DDR5-6000 16GBx2 (CL32-38-38-96 1.35v)
- Boot drive: Samsung 980 PRO NVME 1TB (262 GiB used [29%]; 653 GiB free [71%])
- Latest kernel: 6.12.10-arch1-1
- Packages: 942
- Desktop Environment: Xfce 4.20
- Window Manager: Xfce
- Mouse: Razer Viper 8k
- Keyboard: Wooting 60HE+
- Network Connection: Ethernet
- Bluetooth adapters: the 2 native bands from the motherboard
Arch Linux is my go-to operating system for everyday usage and for running local servers. Everyday usage includes web browsing, video playback, and software development and tooling. Main programs I use include:
- VS Code
- Discord
- Steam
- Firefox
- Tor
- FreeTube
- Spotify (adblock)
- psensor
- Thunar
- Git
- swift-bin
- pulseaudio
I made an account here to get help resolving this problem because the alternative is looking to use a different operating system and completely abandon Arch. Running Windows 10 on the same system on a SATA drive has had no problems remotely similar to what I've experienced the last few months on Arch, and I upgraded to Windows 10 back in 2015!
Last edited by RandomHashTags (2025-02-04 15:24:43)
When was the last time you vacuumed the dust out of the heat sink on your cpu?
There's not much here that can be used to help you. 90 degree CPU temps sound like a potential issue, and the HW will turn off if you reach a thermal threshold, but that generally sounds more like a HW issue, potentially your PSU/CPU being on its way out or so. Did you check/clean out the system of dust or other potential contributors that would make this more of a physical problem?
"has had on Windows": are you actually still using Windows regularly, or are you referring to "how it used to be" a few years ago?
I'd say test the LTS kernel to see whether it's kernel related, or at least check top and /proc/interrupts to see whether some process/HW interrupt thrashes your CPU/memory.
Also, for anyone to be able to help you, get us a journal of the failing boot, or if it's really a SNAFU, a journal of the current boot would help as well, e.g.
sudo journalctl -b | curl -F 'file=@-' 0x0.st # Current boot; add -1 to the -b for the previous boot if that was the crashing one
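For the /proc/interrupts check suggested above, a minimal sketch of how two snapshots could be compared to spot an interrupt source that fires unusually often. The helper name irq_deltas and the /tmp file names are made up for illustration, and the parsing assumes the usual layout (one header row, then a label followed by per-CPU counters).

```shell
# irq_deltas BEFORE AFTER
# Prints "<increase> <irq label>" for every row of two /proc/interrupts
# snapshots, largest increase first. (Hypothetical helper, not a standard tool.)
irq_deltas() {
    awk '
        FNR == 1 { next }                         # skip the CPU0 CPU1 ... header row
        {
            total = 0                             # sum the per-CPU counters of this row
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        NR == FNR { before[$1] = total; next }    # first file: remember the totals
        { print total - before[$1], $1 }          # second file: print the increase
    ' "$1" "$2" | sort -rn
}

# Usage (two snapshots taken ten seconds apart):
#   cat /proc/interrupts > /tmp/irq.before; sleep 10
#   cat /proc/interrupts > /tmp/irq.after
#   irq_deltas /tmp/irq.before /tmp/irq.after | head
```

A row whose counter grows by tens of thousands per second while the machine is idle would be worth a closer look; timer and rescheduling interrupts are expected to move constantly.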
The last time I cleaned the system's hardware was last month (December 2024), and there was barely any dust because my case has many dust-filtering layers. The system was assembled in February 2024 (with a new NVMe drive).
I am still actively running Windows 10 (I dual boot), mainly gaming with medium graphics (CS2, Marvel Rivals, Terraria, Minecraft), with no issues. I'll test what you suggested and get back to you.
I also use htop and btop to monitor processes with no luck in identifying a culprit.
Last edited by RandomHashTags (2025-01-21 08:59:44)
Heat shouldn't be a real issue: modern hardware throttles itself (laptops, for example, will always throttle when more than one core is under load), and your 7800X3D should fall under this category. Also, these temps seem to be normal: https://linustechtips.com/topic/1530931 … rformance/. My 5900X also idles at ~55C (I have a very big belly on the CPU fan curve tho). Ain't Intel.
Do you have PBO enabled?
For the poweroff/crash: which PSU is it, and how old? Does the system still work/boot?
A journal would be very useful, see #3.
RandomHashTags wrote: I am still actively running Windows 10 (I dual boot), mainly gaming with medium graphics (CS2, Marvel Rivals, Terraria, Minecraft), with no issues. I'll test what you suggested and get back to you.
Can you get the temps when running there? Task Manager doesn't show them, so you'll need your mobo's software stack or HWiNFO64.
Last edited by jl2 (2025-01-21 09:01:12)
The journal output can be found at: https://0x0.st/8H3n.txt , where the system turned off on Jan 21, 2025 at 00:26:18. I don't think it crashed, just forced a shutdown.
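To narrow a long journal like this down to lines that might explain a forced shutdown, a small filter can help. This is only a sketch: the function name thermal_lines is made up, the keyword list is an assumption and not an exhaustive set of kernel messages, and a hardware-triggered thermal power-off may leave no log line at all.

```shell
# thermal_lines: read log text on stdin and keep only lines that hint at
# thermal or hardware trouble. The keywords are guesses, adjust as needed.
thermal_lines() {
    grep -iE 'thermal|temperature|overheat|throttl|hardware error|machine check'
}

# Usage: kernel messages of the previous boot
#   journalctl -b -1 -k --no-pager | thermal_lines
```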
The PSU (MPG A850G PCIE5) was brand new and purchased at the same time as the other parts (February 2024). It has been working without any problems.
I'll record some BIOS settings and system temps in Windows, and update this post with them.
EDIT:
BIOS Version: F2
PBO: Disabled
Overclocking: Disabled
XMP/EXPO: Disabled
Re-Size BAR Support: Enabled
SR-IOV Support: Disabled
Fast Boot: Disabled
CSM Support: Enabled
Onboard LAN Controller: Enabled
PSS Support: Enabled
NX Mode: Enabled
(Super IO Configuration) Serial Port 1: Enabled
Legacy USB Support: Enabled
XHCI Hand-off: Enabled
USB Mass Storage Driver Support: Enabled
UEFI Network Stack: Disabled
A local CS2 deathmatch game against bots, on Windows 10, reached a peak CPU temperature of 66 degrees Celsius, and idles at 34.
Last edited by RandomHashTags (2025-01-21 10:24:19)
The Spotify stack crashes are weird. Something of note is that your UEFI version is from 2023: check whether there's an update and try applying it. Chances are this is down to some microcode or so which might be lacking on Linux. Do you apply any overclocks or similar? See also e.g. https://wiki.archlinux.org/title/Ryzen# … nd_suspend and, relevantly, the note that a newer UEFI could fix this without disabling idle states.
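The installed UEFI version and the loaded microcode revision can also be read from a running Linux system, which makes a before/after comparison around a firmware update easy. A minimal sketch, assuming the standard DMI sysfs directory and an x86 /proc/cpuinfo with a "microcode" line; firmware_info is a made-up helper name, and both paths can be overridden.

```shell
# firmware_info [DMI_DIR] [CPUINFO]
# Prints the UEFI/BIOS version and date from sysfs plus the microcode
# revision reported by the kernel.
firmware_info() {
    dmi=${1:-/sys/class/dmi/id}
    cpuinfo=${2:-/proc/cpuinfo}
    printf 'BIOS version: %s\n' "$(cat "$dmi/bios_version" 2>/dev/null)"
    printf 'BIOS date:    %s\n' "$(cat "$dmi/bios_date" 2>/dev/null)"
    # The first "microcode" line is enough; all cores normally report the same.
    printf 'Microcode:    %s\n' "$(awk -F': ' '/^microcode/ { print $2; exit }' "$cpuinfo" 2>/dev/null)"
}

# Usage:
#   firmware_info
```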
Spotify coredumps a lot. My guess is it has something to do with the adblock (https://aur.archlinux.org/packages/spotify-adblock) but who really knows. I wish I could disable coredumps or system file access for the app but I haven't gotten around to it (if it is even possible).
I do not apply any overclocks or other system modifications. I enabled CSM and Re-Size BAR, tweaked some fan curves and disabled some RGB. That's about it; everything else was left at its default setting.
I'll update the BIOS and see what happens in the coming days, as this problem only happens on the first boot of the day.
(PS: the CPU cooler for the system is from Cooler Master, the Master Liquid ML240L V2 ARGB)
Last edited by RandomHashTags (2025-01-21 11:14:45)
is there a reason you run modern OSs in CSM? last time I had to use csm was back with win7 because the famous Win7 loader only worked in legacy mode and sites like [redacted] providing free activation servers weren't a thing back then
fun fact: both vista and 7 (NT6.x) were uefi ready - but I don't know anyone who used it actively
with win10 and a modern linux I don't see a reason for csm anymore
Without CSM the BIOS was unable to see the SATA drives connected. Which meant I couldn't boot into Windows 10 or use my other SATA drives (but I could see them using Thunar in Arch).
Last edited by RandomHashTags (2025-01-21 21:44:18)
RandomHashTags wrote: I am still actively running Windows 10 (I dual boot), mainly gaming with medium graphics (CS2, Marvel Rivals, Terraria, Minecraft), with no issues. I'll test what you suggested and get back to you.
3rd link below (Windows fast startup). Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot Windows and Linux twice for voodoo reasons.
__stack_chk_fail doesn't look like it's just "spotify is clicked together by posers" - spotify is stack smashing, might be a bug, might indicate a CPU issue.
The CPU at 90°C (not weird-ass °F?) is certainly hot for no expected load: that has to come from somewhere =>
top
If nothing seems to load the CPU and it heats up quickly and then the system powers down and it's neither Windows nor dust: the cooler might have lost contact with the die?
Or does Windows run cool and stable?
Are the fans running? Does it help if you set them more aggressively to keep the system cool? ("tweaked some fan curves")
Also try to disable rebar.
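To see where the heat is actually reported while watching top, the kernel's hwmon readings can be dumped directly. A minimal sketch, assuming the sensors are exposed under /sys/class/hwmon (k10temp for the CPU on this platform is an assumption); hwmon_temps is a made-up helper name, and the root directory can be overridden.

```shell
# hwmon_temps [HWMON_ROOT]
# Prints "<chip>/<sensor>: <temp> C" for every temperature input.
# The sysfs values are in millidegrees Celsius.
hwmon_temps() {
    root=${1:-/sys/class/hwmon}
    for input in "$root"/hwmon*/temp*_input; do
        [ -r "$input" ] || continue
        dir=${input%/*}
        chip=$(cat "$dir/name" 2>/dev/null || echo unknown)
        # Use the driver's label (e.g. Tctl) when present, else tempN.
        label=$(cat "${input%_input}_label" 2>/dev/null || basename "${input%_input}")
        milli=$(cat "$input")
        printf '%s/%s: %d.%d C\n' "$chip" "$label" $((milli / 1000)) $((milli % 1000 / 100))
    done
}

# Usage: refresh every two seconds next to a terminal running top
#   while :; do clear; hwmon_temps; sleep 2; done
```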
Windows fast startup was disabled day 1. I just checked it right now and it is still disabled.
90 degrees Celsius is correct (not Fahrenheit). top, htop and btop don't show anything causing CPU load. Windows runs cool and stable even playing an intensive game with settings cranked (CS2, Marvel Rivals).
My fans (currently 6: 3 for intake and 3 for exhaust) are always on. They're almost silent at idle and ramp up exponentially until they reach 100%, which happens when any sensor reads >= 60°C.
I'll disable rebar if the other solutions mentioned don't do the trick.
Last edited by RandomHashTags (2025-01-21 23:12:12)
RandomHashTags wrote: Without CSM the BIOS was unable to see the SATA drives connected. Which meant I couldn't boot into Windows 10 or use my other SATA drives (but I could see them using Thunar in Arch).
without any intention to sound negative or even aggressive - but I HIGHLY doubt that, and to me it sounds way more like user error
for a bios/uefi to see connected SATA devices it does NOT matter whether it's set to regular uefi mode or has CSM/legacy mode enabled - a connected sata device will always show up and will always be accessible (the fact you were able to see them in thunar proves this point)
what you likely refer to is about booting - for which there actually is a difference - and not being able to boot an old windows install created on a system with CSM enabled when a new system is set to UEFI is an issue of the OS install - it hints that the old system was in CSM mode when you used it to create the current windows install - and that you just dragged the drive along - and unless you want to tinker around to make it uefi bootable (which is in fact possible) you should consider wiping it and doing a clean new install in proper UEFI mode anyway
I could help with converting this old windows install from CSM to UEFI - but the mods may not take kindly to windows support here on the arch forums
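On the Linux side, whether the running system was booted via UEFI or via CSM/legacy can be checked from sysfs, because the kernel only creates /sys/firmware/efi on a UEFI boot. A minimal sketch; boot_mode is a made-up helper name, and it says nothing about how the Windows install on the other drive boots.

```shell
# boot_mode [SYSFS_ROOT]: report how the running Linux system was booted.
boot_mode() {
    root=${1:-/sys}
    if [ -d "$root/firmware/efi" ]; then
        echo UEFI
    else
        echo 'BIOS/CSM'
    fi
}

# Usage:
#   boot_mode
```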
as for the initial question: although both ryzen cpus as well as rdna gpus are designed to run at much higher temps when idle (my 5600 sits at 37C and my 7700xt at 47C) a system that jumps to nearly boiling point of water right from the boot sounds like some very severe issue
as you use a liquid cooling:
1) have you connected it correctly to your motherboard? according to the manual the pump has to be connected to the CPU_OPT header while the regular CPU_FAN is for fans only!
2) do you have https://archlinux.org/packages/extra/any/liquidctl/ installed and enabled? could be your pump failed - or due to some alignment maybe an air bubble got trapped in it
3) sanity check: do you feel any warm/hot air from the radiator exhaust at all? remember: the sole purpose of a liquid cooling system is the very same as in a car: the liquid is only a transport medium - but the actual heat exchange happens in the radiator - so if the cpu heats up but the radiator stays cool there's an obvious problem with the actual liquid transfer - which can be a dead pump - hence check with liquidctl or in uefi
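If the cooler is not supported by liquidctl, the pump and fan headers may still show up as RPM readings under hwmon. This is a sketch under assumptions: it only works when the board's sensor chip has a loaded kernel driver, which fanN corresponds to CPU_OPT is board specific, and hwmon_fans is a made-up helper name.

```shell
# hwmon_fans [HWMON_ROOT]
# Prints "<chip>/<fan>: <rpm> RPM <state>" for every fan input and flags
# headers that report 0 RPM (possibly a stopped pump or an unused header).
hwmon_fans() {
    root=${1:-/sys/class/hwmon}
    for input in "$root"/hwmon*/fan*_input; do
        [ -r "$input" ] || continue
        chip=$(cat "${input%/*}/name" 2>/dev/null || echo unknown)
        fan=${input##*/}; fan=${fan%_input}
        rpm=$(cat "$input")
        if [ "$rpm" -eq 0 ]; then state='STOPPED?'; else state=ok; fi
        printf '%s/%s: %s RPM %s\n' "$chip" "$fan" "$rpm" "$state"
    done
}

# Usage:
#   hwmon_fans
```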
The Windows install was originally on 8 and was upgraded to 8.1, then 10. It has a lot of personal and meaningful stuff on it. I'll try to convert it to UEFI, but I'll have to do a backup and more research about it first.
I did correctly install the CPU cooler to the motherboard. It is using the CPU_OPT header. Without the cooler it would overheat and shut itself off in less than a minute when in the BIOS. With the cooler it idles at < 30 C in the BIOS.
I do not have liquidctl. I didn't see the cooler supported on their GitHub so I never installed it.
I remember hot air being pushed out the system every time I checked when the CPU is under load (through the back and up top; the radiator is installed at the top).
Last edited by RandomHashTags (2025-01-22 00:46:07)
I would do this:
1, Boot the system with archlinux-x86_64.iso
2, Run a simple CPU benchmark test e.g:
iso# pacman -S 7zip
iso# pacman -S lm_sensors
iso# for i in $(seq 50);do /bin/7z b; done &
iso# /bin/watch sensors
3, Reapply thermal paste to the CPU.
4, Rerun CPU benchmark.
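To compare the runs before and after repasting with one number instead of eyeballing watch, the peak temperature during the benchmark can be recorded. A minimal sketch; peak_temp_while is a made-up helper name, and the hwmon path in the usage line is hypothetical because the hwmonN index differs between systems.

```shell
# peak_temp_while TEMP_FILE COMMAND [ARGS...]
# Runs COMMAND in the background, polls TEMP_FILE (millidegrees Celsius)
# once per second until the command exits, then prints the highest reading.
peak_temp_while() {
    temp_file=$1; shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    peak=0
    while :; do
        now=$(cat "$temp_file")
        if [ "$now" -gt "$peak" ]; then peak=$now; fi
        kill -0 "$pid" 2>/dev/null || break    # stop once the command has exited
        sleep 1
    done
    wait "$pid" 2>/dev/null
    echo "peak: $((peak / 1000)) C"
}

# Usage (the hwmon2 path is an assumption, check which hwmonN is the CPU):
#   peak_temp_while /sys/class/hwmon/hwmon2/temp1_input 7z b
```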
Last edited by solskog (2025-01-22 01:09:40)
I do not believe it is a cooler/CPU/thermal paste problem. I have not seen any sign of the problem since the thermal shutdown or upgrading the BIOS. I saw some new journal logs that I had never seen before, about "ACPI: \_SB_.PCI0.GPXX.XXXX: New power resource". I'm not sure if the thermal shutdown fixed/broke something in the PSU or maybe it is something from the new BIOS, but I do remember having some power outages in my area while I was using the system the past year (infrequent; usually due to bad weather). I am using a surge protector. I'll know more in a few days.
RandomHashTags wrote: I am using a surge protector.
Good idea! A surge protector with a small battery backup would be even better, in case power is interrupted or fluctuates outside safe levels for a few seconds, which happens quite often on our circuit.
Last edited by solskog (2025-01-25 21:25:08)
After two weeks of normal usage, after only updating the BIOS (leaving all settings the same; now running 6.13.1-arch1-1), I have not seen any similar problem arise. It seems my Discord installation got corrupted today, but I think it's unrelated. Marking as solved.
Last edited by RandomHashTags (2025-02-05 20:24:05)
Discord corruption is a known interaction issue with glibc/certain electron versions, fixed in the canary channel: https://archlinux.org/news/glibc-241-co … tallation/