Hi, can someone please help me with an nvidia-dkms installation issue?
I'm trying to install a new linux-zen kernel, and the driver is nvidia-dkms-555.58.02-1. When the module is built for the new kernel, I get: "WARNING: `dkms install --no-depmod nvidia/555.58.02 -k 6.9.7-zen1-1-1-zen' exited 10".
In the logs:
./include/linux/thread_info.h:128:1: internal compiler error: Segmentation fault
In file included from ./include/linux/linux/ktime.h:25,
from ./include/linux/timer.h:6,
from ./include/linux/workqueue.h:9,
from ./include/linux/srcu.h:21,
from ./include/linux/notifier.h:16,
from ./arch/x86/include/asm/uprobes.h:13,
from ./include/linux/uprobes.h:49,
from ./include/linux/mm_types.h:16,
from ./include/linux/mmzone.h:22,
from ./include/linux/linux/gfp.h:7,
from ./include/linux/umh.h:4,
from ./include/linux/kmod.h:9,
from ./include/linux/module.h:17,
from /var/lib/dkms/dkms/nvidia/555.58.02/build/nvidia/nv.c:24:
./include/linux/jiffies.h:583:1: internal compiler error: Illegal instruction
I tried to solve this problem with the `downgrade` utility, but every nvidia-dkms-xxx version fails with this error on every kernel version, not only zen; I tried both the regular and lts variants. I also tried changing the gcc version and the linux-xxx-headers packages.
Last edited by replikeit (2024-07-25 17:24:03)
Offline
It's not the DKMS, that's a gcc bug.
https://bbs.archlinux.org/viewtopic.php … 2#p2173152
Typically it's caused by temperature or a CPU bug. Try to manually run
dkms install nvidia/555.58.02 -k 6.9.7-zen1-1-1-zen
and if that fails again
dkms install -j $(($(nproc)/2)) nvidia/555.58.02 -k 6.9.7-zen1-1-1-zen
(this will only use half your cores)
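The retry with fewer jobs could also be scripted. This is only a minimal sketch, assuming bash and that halving the job count is repeated down to a single job; `half_jobs` and `retry_build` are hypothetical helper names, not part of dkms.

```shell
#!/usr/bin/env bash
# half_jobs N: print half of N (integer division), but never less than 1.
half_jobs() {
    local n=$1
    local half=$(( n / 2 ))
    (( half < 1 )) && half=1
    echo "$half"
}

# retry_build MODULE/VERSION KERNEL: run "dkms install", halving the
# number of parallel jobs after every failure until one job also fails.
# (Illustrative helper; needs root, like the dkms commands above.)
retry_build() {
    local mod=$1 kernel=$2 jobs
    jobs=$(nproc)
    while :; do
        echo "trying with -j $jobs" >&2
        dkms install -j "$jobs" "$mod" -k "$kernel" && return 0
        (( jobs == 1 )) && return 1
        jobs=$(half_jobs "$jobs")
    done
}
```

Usage would be something like: retry_build nvidia/555.58.02 6.9.7-zen1-1-1-zen. If the build only succeeds at low job counts, that points at load (heat, power, CPU stability) rather than at the driver sources.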
Offline
Tried even with one core:
sudo dkms install -j 1 nvidia/555.58.02 -k 6.10.1-zen1-1-zen
In file included from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-firmware.h:30,
from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv.h:43,
from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-linux.h:28,
from /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps.c:24:
/var/lib/dkms/nvidia/555.58.02/build/common/inc/nvmisc.h:1: internal compiler error: Segmentation fault
1 | /*
0x1fab306 internal_error(char const*, ...)
???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues> for instructions.
make[3]: *** [scripts/Makefile.build:244: /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps.o] Error 1
make[3]: *** Waiting for unfinished jobs....
In file included from ./arch/x86/include/asm/nospec-branch.h:12,
from ./arch/x86/include/asm/irqflags.h:9,
from ./include/linux/irqflags.h:18,
from ./include/linux/spinlock.h:59,
from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-lock.h:29,
from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-linux.h:32,
from /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps-imex.c:24:
./arch/x86/include/asm/msr-index.h:6: internal compiler error: Segmentation fault
6 |
0x1fab306 internal_error(char const*, ...)
???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues> for instructions.
make[3]: *** [scripts/Makefile.build:244: /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps-imex.o] Error 1
malloc(): invalid size (unsorted)
malloc(): invalid size (unsorted)
In file included from ./include/linux/atomic.h:82,
from ./include/linux/cpumask.h:14,
from ./arch/x86/include/asm/paravirt.h:21,
from ./arch/x86/include/asm/cpuid.h:62,
from ./arch/x86/include/asm/processor.h:19,
from ./arch/x86/include/asm/timex.h:5,
from ./include/linux/timex.h:67,
from ./include/linux/time32.h:13,
from ./include/linux/time.h:60,
from ./include/linux/stat.h:19,
from ./include/linux/module.h:13,
from /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-pci-table.c:25:
./include/linux/atomic/atomic-instrumented.h:4431:1: internal compiler error: Segmentation fault
4431 | atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
| ^~~~~~~~~~~~~~~~~~~~~~~
Same result.
Last edited by replikeit (2024-07-25 19:08:42)
Offline
Are you running out of memory (OOM)? Do you have physical swap (file or partition)?
Offline
OOM.
systemctl status systemd-oomd.service
○ systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer
Loaded: loaded (/usr/lib/systemd/system/systemd-oomd.service; disabled; pr>
Active: inactive (dead)
TriggeredBy: ○ systemd-oomd.socket
Docs: man:systemd-oomd.service(8)
man:org.freedesktop.oom1(5)
Swap
Device         Boot     Start        End    Sectors  Size Id Type
/dev/nvme0n1p1           2048    1955839    1953792  954M ef EFI (FAT-12/16/32)
/dev/nvme0n1p2        1955840  138674175  136718336 65.2G 82 Linux swap / Solari
/dev/nvme0n1p3      138674176 3907029167 3768354992  1.8T 83 Linux
Last edited by replikeit (2024-07-25 19:23:44)
Offline
systemd-oomd.service is socket-activated; "systemctl status systemd-oomd.socket" will tell whether it would ever fire.
But w/ 65G swap (maybe run "swapon" to ensure it's actually active) you're also not likely to run OOM compiling the nvidia driver (though the authoritative test would be to monitor RAM usage during the compilation).
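To monitor RAM during the compilation, something like the following could be left running in a second terminal. It is a sketch that assumes a Linux /proc/meminfo with MemAvailable and SwapFree fields; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# meminfo_mib FIELD: read /proc/meminfo-style text on stdin and print
# the value of FIELD converted from kB to MiB (rounded down).
meminfo_mib() {
    awk -v key="$1:" '$1 == key { print int($2 / 1024) }'
}

# watch_mem [SECONDS]: print available RAM and free swap at a fixed
# interval (default 2 s) until interrupted with Ctrl-C.
watch_mem() {
    local interval=${1:-2}
    while sleep "$interval"; do
        printf '%s  MemAvailable=%s MiB  SwapFree=%s MiB\n' \
            "$(date +%T)" \
            "$(meminfo_mib MemAvailable < /proc/meminfo)" \
            "$(meminfo_mib SwapFree < /proc/meminfo)"
    done
}
```

Start watch_mem, then run the dkms build; if MemAvailable stays far above zero when gcc crashes, memory exhaustion is ruled out.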
The segfaults seem to shift, so it's not some specific token that triggers this.
Is the CPU/system temperature OK, the microcode loaded and memtest86+ unsuspicious?
type cc
type gcc
Offline
My configuration btw:
OS: Arch Linux x86_64
Hostname: arch
Kernel Release: 6.8.8-zen1-1-zen (this is because I didn't reboot the PC after the dkms error)
Uptime: 3:39
WM: None
DE: GNOME
Packages: 1265
RAM: 8117 MB / 64074 MB
Processor Type: Intel(R) Core(TM) i9-14900KF
$EDITOR: None
Root: 500G / 1.8T (27%) (btrfs)
type cc
type gcc
cc is /usr/bin/cc
gcc is /usr/bin/gcc
>The cpu/system temperature is ok, microcode loaded and memtest86+ unsuspicious?
1. Temperature is OK; this is the sensors output while dkms is building:
Core 0: +86.0°C (high = +80.0°C, crit = +100.0°C)
Core 4: +54.0°C (high = +80.0°C, crit = +100.0°C)
Core 8: +62.0°C (high = +80.0°C, crit = +100.0°C)
Core 12: +67.0°C (high = +80.0°C, crit = +100.0°C)
Core 16: +56.0°C (high = +80.0°C, crit = +100.0°C)
Core 20: +91.0°C (high = +80.0°C, crit = +100.0°C)
Core 24: +54.0°C (high = +80.0°C, crit = +100.0°C)
Core 28: +55.0°C (high = +80.0°C, crit = +100.0°C)
Core 32: +60.0°C (high = +80.0°C, crit = +100.0°C)
Core 33: +60.0°C (high = +80.0°C, crit = +100.0°C)
Core 34: +60.0°C (high = +80.0°C, crit = +100.0°C)
Core 35: +60.0°C (high = +80.0°C, crit = +100.0°C)
Core 36: +61.0°C (high = +80.0°C, crit = +100.0°C)
Core 37: +61.0°C (high = +80.0°C, crit = +100.0°C)
Core 38: +61.0°C (high = +80.0°C, crit = +100.0°C)
Core 39: +61.0°C (high = +80.0°C, crit = +100.0°C)
Core 40: +55.0°C (high = +80.0°C, crit = +100.0°C)
Core 41: +55.0°C (high = +80.0°C, crit = +100.0°C)
Core 42: +55.0°C (high = +80.0°C, crit = +100.0°C)
Core 43: +55.0°C (high = +80.0°C, crit = +100.0°C)
Core 44: +53.0°C (high = +80.0°C, crit = +100.0°C)
Core 45: +53.0°C (high = +80.0°C, crit = +100.0°C)
Core 46: +53.0°C (high = +80.0°C, crit = +100.0°C)
Core 47: +53.0°C (high = +80.0°C, crit = +100.0°C)
2. microcode
sudo lsinitcpio --early /boot/initramfs-linux-zen.img
early_cpio
kernel/
kernel/x86/
kernel/x86/microcode/
kernel/x86/microcode/GenuineIntel.bin
3. I will try memtest, thanks.
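The sensors readings in item 1 can be compared against their own "high" thresholds mechanically. A minimal sketch, assuming the lm_sensors line format shown above; `hot_cores` is a hypothetical helper name. Fed the readings posted above, it would list Core 0 and Core 20.

```shell
#!/usr/bin/env bash
# hot_cores: read "sensors" output on stdin and print every core whose
# current temperature is above its own "high" threshold.
# Expected line shape (an assumption based on the output above):
#   Core 0: +86.0°C (high = +80.0°C, crit = +100.0°C)
hot_cores() {
    awk '/^[[:space:]]*Core [0-9]+:/ && $4 == "(high" {
        temp = $3 + 0        # "+86.0°C"  -> 86
        high = $6 + 0        # "+80.0°C," -> 80
        if (temp > high) {
            sub(/:$/, "", $2)
            printf "Core %s: %.1f > %.1f\n", $2, temp, high
        }
    }'
}
```

During a build it could be run as: sensors | hot_cores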
Offline
Offline
I've been using Arch Linux for about half a year, and every update went fine before. Could this be because of Intel?
Offline
The problem manifests under pressure - usage patterns might be a factor.
GCC was updated 3 days ago, so that might play a role, but for a systematic gcc bug, you'd expect reports from users all over the place and of course the package maintainers.
Also you've posted 4 segfaults in 4 different locations, i.e. right now it at least looks completely non-deterministic.
And you have a suspicious CPU.
Idk whether that's it, but would suggest to look there.
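Whether the crashes really move around can be checked by collecting the locations from a build log. A rough sketch, assuming gcc's usual "file:line:col: internal compiler error: ..." message format; `ice_locations` is a made-up helper name, and the make.log path in the usage line is the customary dkms location, not something taken from this thread.

```shell
#!/usr/bin/env bash
# ice_locations: read a build log on stdin and print each distinct
# location that triggered an internal compiler error, with a hit count,
# most frequent first.
ice_locations() {
    sed -n 's/^\(.*\): internal compiler error: .*$/\1/p' \
        | sort | uniq -c | sort -rn
}
```

For example: ice_locations < /var/lib/dkms/nvidia/555.58.02/build/make.log. Many locations with one hit each across repeated builds fits the non-deterministic picture; the same location every time would look more like a reproducible compiler bug.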
Offline
Memtest86+: 0 errors
Offline
Also tried with different RAM, still the same...
Offline
The compiler errors still happen at random locations?
Did you check and, if in doubt, adjust your BIOS settings according to the radgametools link?
Offline
Hey, I'm experiencing the same issue. I have a Ryzen 7 3700X. I'm not entirely sure this is a CPU issue.
Edit: I think this is a different issue with the same effect. I'm pretty sure it's a problem with my headers.
Last edited by EnderMaster08 (2024-07-26 14:05:38)
Offline
Does the build end with an "internal compiler error: Segmentation fault"?
That's not a problem with any headers, but a compiler bug.
If this happens on multiple, different systems, there might actually be a genuine bug in gcc.
Offline