There are multiple other posts that I have seen already related to this.
But having looked through some of them, there seem to be no fixes so far.
I never saw any posts that looked similar to my hardware either, but I may have overlooked some posts.
My system randomly reboots. There are no errors in journalctl.
My system abruptly restarts. There is no freezing, no errors, nothing.
It doesn't seem to matter what I'm doing, but strangely enough, it doesn't seem to happen when my system is idling.
I wish I could say.
I am unfortunately unsure, as at first I thought the crashes were due to a beta BIOS update. But after updating my BIOS to the latest stable version on MSI's website, I know that (probably) wasn't the problem.
I want to say that it started since kernel 6.10, but once again, I'm unsure of this.
CPU: AMD Ryzen 9 7900X
RAM: G.SKILL Trident Z5 Neo 64GB
MOBO: MSI MPG X670E Carbon Gaming WiFi
SSD #1: Samsung 980 Pro 2TB
SSD #2: Samsung 980 Pro 2TB
HDD: Western Digital BLACK 8TB - 7200 RPM
GPU: ASRock 7900XTX Taichi 24GB OC
PSU: Seasonic Prime PX-1600, 1600W 80+ Platinum (Insanely overkill, I know. I have my reasons.)
Cooler: Corsair iCUE H150i ELITE CAPELLIX XT (360 AIO)
Case: Lian Li O11D.
I have a total of 7 system fans in this case (including CPU AIO).
4 fans are intakes, and 3 are exhaust through my CPU's AIO radiator.
My temps do not ever get to crazy high levels.
My CPU is not overclocked. PBO is set to auto in my BIOS.
I have tried turning off Memory-context restore, and Power-down Enabled.
My RAM is using EXPO profile 1, and is running at 6000MHz. Since day 1 of building this rig, I have not had any problems with my RAM, and I seriously doubt that it's the problem.
But I'm happy to run any tests deemed necessary. My MOBO has Memtest86 built into it.
I strongly doubt that power problems are my issue. This system has been extremely stable until recently.
I have tried setting "systemd.log_level=debug" in my grub config, but I still wasn't getting any valuable information.
At this point, I'm not sure how to troubleshoot this issue. I was considering running an LTS kernel, but I saw this, and it seems that it may be pointless trying.
Any help would be greatly appreciated!
Edit:
I'm including this in the top post so that hopefully anyone else with this same problem can try to fix their issue as well.
The solution for me was to downgrade to kernel 6.10.2. Don't forget to downgrade your headers as well.
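For anyone following along, here is a minimal sketch of how the downgrade command could be put together from the Arch Linux Archive. The helper name `ala_url` is made up for illustration, and the exact package release (`6.10.2.arch1-1`) is an assumption; check the archive listing for the release you actually want. The script only prints the command, it does not run it.

```shell
# Build an Arch Linux Archive URL for one package at a pinned version.
# ala_url is a hypothetical helper, not part of pacman.
# $1 = package name, $2 = full version (pkgver-pkgrel), $3 = arch (default x86_64)
ala_url() {
    first=$(printf '%.1s' "$1")   # the archive groups packages by first letter
    printf 'https://archive.archlinux.org/packages/%s/%s/%s-%s-%s.pkg.tar.zst\n' \
        "$first" "$1" "$1" "$2" "${3:-x86_64}"
}

# Assumed release; verify it exists in the archive before using it.
ver="6.10.2.arch1-1"

# Print (do not execute) a command that downgrades kernel and headers together.
echo "sudo pacman -U $(ala_url linux "$ver") $(ala_url linux-headers "$ver")"
```

To keep a later full system upgrade from pulling the newer kernel back in, the packages can be listed under IgnorePkg in /etc/pacman.conf (for example "IgnorePkg = linux linux-headers") until a fixed kernel is available.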
Last edited by Caelence (2024-08-21 20:31:52)
I am trying that right now. Currently compiling the kernel with the patch. Will post an update as soon as possible.
Is it known whether this patch will be applied to the kernel in the future?
Since it sidesteps the issue by limiting the offending call to architectures that actually support it, I'd be surprised if it doesn't make it into the kernel, and Alex D. has already closed the bug, so I guess it's on its way.
After downloading the kernel source, using "git checkout 6.10.3.arch1-2," and making sure that the patch is in the source folder, I get this error at the end of building the kernel:
make[4]: *** [scripts/Makefile.build:485: drivers/gpu/drm] Error 2
make[3]: *** [scripts/Makefile.build:485: drivers/gpu] Error 2
make[2]: *** [scripts/Makefile.build:485: drivers] Error 2
make[1]: *** [/home/username/build/linux/src/linux-6.10.3/Makefile:1934: .] Error 2
make: *** [Makefile:240: __sub-make] Error 2
Looking at the start of the build, I see this:
Setting version...
Applying patch linux-v6.10.3-arch1.patch...
patching file Makefile
patching file arch/Kconfig
patching file arch/x86/include/asm/apic.h
patching file arch/x86/kernel/apic/apic_flat_64.c
patching file drivers/firmware/sysfb.c
patching file include/linux/user_namespace.h
patching file init/Kconfig
patching file kernel/fork.c
patching file kernel/sysctl.c
patching file kernel/user_namespace.c
patching file sound/pci/hda/hda_controller.h
patching file sound/pci/hda/hda_intel.c
Setting config...
But no actual mention of the .patch file I included.
I'm not exactly sure what I am doing wrong here.
Any advice?
I've tried following this guide as best as I can. But I don't seem to have any success.
This is my first time building the Arch Linux kernel, so I hope you're able to bear with me. I am used to coding and compiling with C++, but compiling the Linux kernel is pretty new to me.
The error tail is way too short to say where the problem may be, but at least for a test to isolate the cause you don't have to compile anything and can rely on downgrading to 6.10.2 from the https://wiki.archlinux.org/title/Arch_Linux_Archive (remember to also downgrade OOT modules like nvidia, virtualbox etc.) or just try the precompiled kernel in that thread.
But no actual mention of the .patch file I included.
Included "how"? If you run makepkg, the kernel PKGBUILD scans the source directory for *.patch files and auto-applies them, but the file has to have that suffix and you have to run makepkg.
Unfortunately, that was pretty much the only 'errors' I was seeing in the output, and it certainly didn't provide any information that was useful.
I was testing this out some more, and decided to try compiling 6.10.3.arch1-1 with the patch rather than 6.10.3.arch1-2. This turned out to be successful.
So, I'm unsure what changed between arch1-1 and arch1-2, but I was unsuccessful in getting arch1-2 to compile with the patch.
I had put the .patch file in the /home/username/build/linux/ directory, as was stated over here.
So, I will change to the patched kernel now, and will report back if I have any more crashing.
There's been more output during that build, hasn't there?
You're not running red-letter-arch, black lines matter, too
Yes, I have a full log of it here. <- To clarify, this was when trying to build kernel 6.10.3.arch1-2 with the patch.
Maybe I'm just missing something, but I'm not seeing any errors throughout the log, besides what is at the end of it.
Feel free to take a quick look through it if you'd like. Thought I'd post it in case anyone was curious.
Apologies if I missed anything or seem like a complete idiot.
Last edited by Caelence (2024-08-18 21:41:46)
Nope, there's no compiler trace or anything.
You'd have to build verbose to unconditionally see what's actually going on, but I guess the cause is
patching file Makefile
and I would check whether the patch does anything around line 1934 of that Makefile…
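When a full build log has been saved (for example with "makepkg -s 2>&1 | tee build.log"), the tail of make's output rarely shows the real failure. A rough sketch for pulling out the first lines that look like compiler or linker diagnostics follows; the function name and the search patterns are my own guesses at what is worth looking for, not anything the kernel build defines.

```shell
# Print the first few lines of a saved build log that look like compiler or
# linker diagnostics, prefixed with their line numbers in the log.
# first_build_errors is a hypothetical helper; the patterns are assumptions.
# $1 = path to the log, $2 = how many matches to show (default 5)
first_build_errors() {
    grep -n -m "${2:-5}" -E 'error:|undefined reference|No such file or directory' "$1"
}
```

Note that make's own "Error 2" lines are deliberately not matched, since they only report that something further up failed. If nothing matches, the log probably does not contain the compiler output at all, and a verbose build (kbuild accepts V=1) would be the next step.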
I will definitely try a verbose build at some point to give a better log.
Unfortunately, it seems that my system is still crashing.
I'm not sure if maybe I have done something wrong, but "uname -r" and neofetch both showed that I was running the new kernel.
I guess the better question is: Was the kernel actually patched?
I honestly am not entirely sure. What I do know is that my 0001-drm-amdgpu-sdma5.2-limit-wptr-workaround-to-sdma-5.2.patch file was in ~/build/linux/.
So, it should have been included, right?
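One way to check this from a saved build log, assuming the build prints an "Applying patch <file>..." line for every patch it applies (as in the output quoted earlier in this thread), is sketched below. The function name is made up for illustration.

```shell
# Exit status 0 if the build log reports that the named patch was applied.
# patch_in_log is a hypothetical helper; it assumes the
# "Applying patch <file>..." wording seen in the build output.
# $1 = path to the saved build log, $2 = patch file name
patch_in_log() {
    grep -qF "Applying patch $2..." "$1"
}
```

For example, "patch_in_log build.log 0001-drm-amdgpu-sdma5.2-limit-wptr-workaround-to-sdma-5.2.patch && echo applied" would print "applied" only if that line is present. If it is missing, the patch file was most likely never picked up by the build, regardless of which directory it was in.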
Not really sure what to do from here. Got no logs of a crash either.
The only thing that looks suspicious to me is this line (which is much further up in my log file):
amdgpu 0000:19:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Yes, I am running Arch Linux on my iGPU. I do not use 7900XTX in Arch (except rarely).
Try to downgrade to 6.10.2 - if that stabilizes the situation, try the kernel loqs kindly provided. If that's also stable, it's most likely your local build.
If not, it's a different issue and we got to look elsewhere.
There's https://bbs.archlinux.org/viewtopic.php?id=298632 but that doesn't seem to cause random reboots (which doesn't mean a lot - if your CPU has a hiccup, pretty much anything can happen)
I have a similar setup, and have been experiencing the exact same issue...
CPU: Ryzen 9700X
RAM: Crucial DDR5 Pro 32GB x 2
MOB: ProArt B650
SSD: Samsung NVMe EVO 990 2TB
GPU: Radeon 7700XT
PSU: Tuf-Gaming-1000G
After disabling EXPO from BIOS settings, the issue seems to be resolved completely.
Have you already tried disabling EXPO?
I personally haven't, as I wasn't experiencing this issue in the past on Linux with EXPO enabled.
I have been running kernel 6.10.2 for the past couple of days, and I haven't had a single crash so far.
I've been meaning to build kernel 6.10.3 with the patch, but just haven't had time yet, unfortunately.
I will try to build it in the coming days, and will report on whether the patch works.
But so far, I may as well mark this resolved, as 6.10.2 does seem to be stable.
I guess I only have a few questions to follow up with: How would I know when to update my kernel? I'm just not sure when I should update, as I don't know when a bugfix would be implemented.
How could I watch for that?
I have tried the downgrade, and it has been running smoothly so far.
Will try to build the kernel again with better logs when I have some more time available. I'll report back here when I've done that, but I'll be marking as [SOLVED] for now.
EDIT:
I unfortunately just don't have the time to try any more kernel builds. If anyone else wants to do this, please feel free to post your progress here.
According to this, the bug will be patched in the stable kernel in the coming weeks, so I'm not too bothered anyways.
I'll just update my kernel in the coming weeks.
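To tell whether a given kernel is at or past the release that carries the fix, a small version comparison can help. This is only a sketch: the function names are made up, it relies on GNU sort's -V option, and the fixed release number is not known from this thread, so it has to be supplied by whoever runs it.

```shell
# Exit status 0 if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort): the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Exit status 0 if the running kernel is at least version $1.
# Strips the "-arch1-1" style suffix from `uname -r` before comparing.
# kernel_at_least is a hypothetical helper name.
kernel_at_least() {
    running=$(uname -r)
    version_ge "${running%%-*}" "$1"
}
```

Usage would look like "kernel_at_least X.Y.Z && echo 'fix should be included'", with X.Y.Z replaced by the stable release that actually contains the patch once that is known from the kernel changelog.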
Thanks for the help, everyone. I appreciate it.
Last edited by Caelence (2024-08-25 21:54:55)