Hi, can you please help me with an nvidia-dkms installation issue?
I'm trying to install a new linux-zen kernel, and the driver is nvidia-dkms-555.58.02-1. When the module is compiled for the new kernel, I get:
WARNING: `dkms install --no-depmod nvidia/555.58.02 -k 6.9.7-zen1-1-1-zen' exited 10
In the logs:
./include/linux/thread_info.h:128:1: internal compiler error: Segmentation fault
In file included from ./include/linux/ktime.h:25,
                 from ./include/linux/timer.h:6,
                 from ./include/linux/workqueue.h:9,
                 from ./include/linux/srcu.h:21,
                 from ./include/linux/notifier.h:16,
                 from ./arch/x86/include/asm/uprobes.h:13,
                 from ./include/linux/uprobes.h:49,
                 from ./include/linux/mm_types.h:16,
                 from ./include/linux/mmzone.h:22,
                 from ./include/linux/gfp.h:7,
                 from ./include/linux/umh.h:4,
                 from ./include/linux/kmod.h:9,
                 from ./include/linux/module.h:17,
                 from /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv.c:24:
./include/linux/jiffies.h:583:1: internal compiler error: Illegal instruction
I tried to solve this with the `downgrade` utility, but every nvidia-dkms-xxx driver fails with this error against every kernel version, not only zen; I tried both the regular and the LTS driver. I also tried changing the gcc version and the linux-xxx-headers.
Last edited by replikeit (2024-07-25 17:24:03)
Offline

It's not DKMS; that's a gcc bug.
https://bbs.archlinux.org/viewtopic.php … 2#p2173152
Typically it's temperature or a CPU bug. Try running it manually:
dkms install nvidia/555.58.02 -k 6.9.7-zen1-1-1-zen
and if that fails again:
dkms install -j $(($(nproc)/2)) nvidia/555.58.02 -k 6.9.7-zen1-1-1-zen
(this will only use half your cores)
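The reduced-parallelism retry above can be wrapped in a small helper. This is a sketch under assumptions: `half_jobs` is a hypothetical helper name, the module and kernel strings are simply the ones from this thread, and it clamps the job count to at least 1 because `$(($(nproc)/2))` evaluates to 0 on a single-CPU machine. It only prints the command so you can review it before running it as root.

```shell
#!/bin/sh
# half_jobs (hypothetical helper): half of the given CPU count, or of
# `nproc` when no argument is passed, but never less than 1.
half_jobs() {
    cpus=${1:-$(nproc)}
    n=$((cpus / 2))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Values taken from this thread; adjust them to your module and kernel.
MODULE=nvidia/555.58.02
KERNEL=6.9.7-zen1-1-1-zen

# Print the command instead of running it; run it yourself with sudo.
echo "dkms install -j $(half_jobs) $MODULE -k $KERNEL"
```

Passing an explicit count (e.g. `half_jobs 1`) makes it easy to step down to a single job, as tried in the next post.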
Tried it even with one core (-j 1):
sudo dkms install -j 1 nvidia/555.58.02 -k 6.10.1-zen1-1-zen
In file included from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-firmware.h:30,
                 from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv.h:43,
                 from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-linux.h:28,
                 from /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps.c:24:
/var/lib/dkms/nvidia/555.58.02/build/common/inc/nvmisc.h:1: internal compiler error: Segmentation fault
    1 | /*
0x1fab306 internal_error(char const*, ...)
	???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues> for instructions.
make[3]: *** [scripts/Makefile.build:244: /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps.o] Error 1
make[3]: *** Waiting for unfinished jobs....
In file included from ./arch/x86/include/asm/nospec-branch.h:12,
                 from ./arch/x86/include/asm/irqflags.h:9,
                 from ./include/linux/irqflags.h:18,
                 from ./include/linux/spinlock.h:59,
                 from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-lock.h:29,
                 from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-linux.h:32,
                 from /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps-imex.c:24:
./arch/x86/include/asm/msr-index.h:6: internal compiler error: Segmentation fault
    6 | 
0x1fab306 internal_error(char const*, ...)
	???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues> for instructions.
make[3]: *** [scripts/Makefile.build:244: /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps-imex.o] Error 1
malloc(): invalid size (unsorted)
malloc(): invalid size (unsorted)
In file included from ./include/linux/atomic.h:82,
                 from ./include/linux/cpumask.h:14,
                 from ./arch/x86/include/asm/paravirt.h:21,
                 from ./arch/x86/include/asm/cpuid.h:62,
                 from ./arch/x86/include/asm/processor.h:19,
                 from ./arch/x86/include/asm/timex.h:5,
                 from ./include/linux/timex.h:67,
                 from ./include/linux/time32.h:13,
                 from ./include/linux/time.h:60,
                 from ./include/linux/stat.h:19,
                 from ./include/linux/module.h:13,
                 from /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-pci-table.c:25:
./include/linux/atomic/atomic-instrumented.h:4431:1: internal compiler error: Segmentation fault
 4431 | atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
      | ^~~~~~~~~~~~~~~~~~~~~~~
Same result.
Last edited by replikeit (2024-07-25 19:08:42)

Are you running out of memory (OOM)? Do you have physical swap (a file or a partition)?

OOM:
systemctl status systemd-oomd.service
○ systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer
     Loaded: loaded (/usr/lib/systemd/system/systemd-oomd.service; disabled; pr>
     Active: inactive (dead)
TriggeredBy: ○ systemd-oomd.socket
       Docs: man:systemd-oomd.service(8)
             man:org.freedesktop.oom1(5)
Swap:
Device         Boot     Start        End    Sectors  Size Id Type
/dev/nvme0n1p1           2048    1955839    1953792  954M ef EFI (FAT-12/16/32)
/dev/nvme0n1p2        1955840  138674175  136718336 65.2G 82 Linux swap / Solari
/dev/nvme0n1p3      138674176 3907029167 3768354992  1.8T 83 Linux
Last edited by replikeit (2024-07-25 19:23:44)

systemd-oomd.service is socket-activated; "systemctl status systemd-oomd.socket" will tell you whether it would ever fire.
But with 65G of swap (maybe run "swapon" to make sure it's actually active), you're also unlikely to run out of memory compiling the nvidia driver (though the authoritative test would be to monitor RAM usage during the compilation).
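To follow the suggestion of monitoring RAM during the compilation, here is a minimal sketch that reads `/proc/meminfo`, where the kernel reports `MemAvailable` and `SwapFree` in kB. The function name `mem_summary` is hypothetical; `swapon --show` (util-linux) lists the swap areas that are actually active.

```shell
#!/bin/sh
# mem_summary (hypothetical helper): read /proc/meminfo-formatted text on
# stdin and print available RAM and free swap in MiB (input values are kB).
mem_summary() {
    awk '/^MemAvailable:/ { avail = $2 }
         /^SwapFree:/     { swap = $2 }
         END { printf "MemAvailable=%d MiB SwapFree=%d MiB\n", avail / 1024, swap / 1024 }'
}

# Check once that swap is actually active (prints nothing if no swap is on):
#   swapon --show
# Then sample once per second in a second terminal while dkms builds;
# stop with Ctrl-C:
#   while :; do mem_summary < /proc/meminfo; sleep 1; done
```

If `MemAvailable` stays well above zero for the whole build, memory exhaustion is an unlikely trigger for the segfaults.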
The segfaults seem to shift, so it's not some specific token that triggers this.
Is the CPU/system temperature OK, is the microcode loaded, and is memtest86+ unsuspicious?
type cc
type gcc

My configuration, btw:
OS: Arch Linux x86_64
Hostname: arch
Kernel Release: 6.8.8-zen1-1-zen (because I haven't rebooted the PC since the dkms error)
Uptime: 3:39
WM: None
DE: GNOME
Packages: 1265
RAM: 8117 MB / 64074 MB
Processor Type: Intel(R) Core(TM) i9-14900KF
$EDITOR: None
Root: 500G / 1.8T (27%) (btrfs)

type cc
type gcc
cc is /usr/bin/cc
gcc is /usr/bin/gcc

>The cpu/system temperature is ok, microcode loaded and memtest86+ unsuspicious?
1. Temperature is OK; sensors output while dkms is building:
Core 0:        +86.0°C  (high = +80.0°C, crit = +100.0°C)
Core 4:        +54.0°C  (high = +80.0°C, crit = +100.0°C)
Core 8:        +62.0°C  (high = +80.0°C, crit = +100.0°C)
Core 12:       +67.0°C  (high = +80.0°C, crit = +100.0°C)
Core 16:       +56.0°C  (high = +80.0°C, crit = +100.0°C)
Core 20:       +91.0°C  (high = +80.0°C, crit = +100.0°C)
Core 24:       +54.0°C  (high = +80.0°C, crit = +100.0°C)
Core 28:       +55.0°C  (high = +80.0°C, crit = +100.0°C)
Core 32:       +60.0°C  (high = +80.0°C, crit = +100.0°C)
Core 33:       +60.0°C  (high = +80.0°C, crit = +100.0°C)
Core 34:       +60.0°C  (high = +80.0°C, crit = +100.0°C)
Core 35:       +60.0°C  (high = +80.0°C, crit = +100.0°C)
Core 36:       +61.0°C  (high = +80.0°C, crit = +100.0°C)
Core 37:       +61.0°C  (high = +80.0°C, crit = +100.0°C)
Core 38:       +61.0°C  (high = +80.0°C, crit = +100.0°C)
Core 39:       +61.0°C  (high = +80.0°C, crit = +100.0°C)
Core 40:       +55.0°C  (high = +80.0°C, crit = +100.0°C)
Core 41:       +55.0°C  (high = +80.0°C, crit = +100.0°C)
Core 42:       +55.0°C  (high = +80.0°C, crit = +100.0°C)
Core 43:       +55.0°C  (high = +80.0°C, crit = +100.0°C)
Core 44:       +53.0°C  (high = +80.0°C, crit = +100.0°C)
Core 45:       +53.0°C  (high = +80.0°C, crit = +100.0°C)
Core 46:       +53.0°C  (high = +80.0°C, crit = +100.0°C)
Core 47:       +53.0°C  (high = +80.0°C, crit = +100.0°C)
2. Microcode:
sudo lsinitcpio --early /boot/initramfs-linux-zen.img
early_cpio
kernel/
kernel/x86/
kernel/x86/microcode/
kernel/x86/microcode/GenuineIntel.bin
3. Memtest: will try it, thanks.
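The `sensors` readings in item 1 can be checked mechanically rather than by eye. A sketch, assuming the `Core N: +T°C (high = +H°C, crit = ...)` line layout shown above; `hot_cores` is a hypothetical name, and it flags every core whose current reading is at or above the `high` threshold printed on the same line.

```shell
#!/bin/sh
# hot_cores (hypothetical helper): read `sensors` output on stdin and print
# each "Core N" reading that is at or above its own "high" threshold.
# Assumed line layout: "Core 0:  +86.0°C  (high = +80.0°C, crit = +100.0°C)"
hot_cores() {
    awk '/^Core [0-9]+:/ {
        n = $2;    sub(/:$/, "", n)            # "0:"       -> "0"
        cur = $3;  gsub(/[^0-9.]/, "", cur)    # "+86.0°C"  -> "86.0"
        high = $6; gsub(/[^0-9.]/, "", high)   # "+80.0°C," -> "80.0"
        if (cur + 0 >= high + 0)
            printf "Core %s %s (high %s)\n", n, cur, high
    }'
}

# Usage while the dkms build is running:
#   sensors | hot_cores
```

On the readings posted above it would flag Core 0 (86.0) and Core 20 (91.0), both above the reported high of 80.0.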

I've been using Arch Linux for about half a year, and every update was fine before. Could this be because of Intel?

The problem manifests under load, so usage patterns might be a factor.
GCC was updated 3 days ago, so that might play a role, but with a systematic gcc bug you'd expect reports from users all over the place, and of course from the package maintainers.
Also, you've posted 4 segfaults in 4 different locations, i.e. right now it at least looks completely non-deterministic.
And you have a suspicious CPU.
I don't know whether that's it, but I'd suggest looking there.
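Whether the crashes really move around can be checked by extracting the crash locations from two build logs and comparing them. A sketch, assuming you saved the output of two separate `dkms install` attempts to files (DKMS keeps the last build log as `make.log` in the module's build directory, here `/var/lib/dkms/nvidia/555.58.02/build/`); `ice_locations` is a hypothetical helper. Identical locations across runs would point toward a reproducible compiler bug, while shifting locations fit the hardware-instability theory.

```shell
#!/bin/sh
# ice_locations (hypothetical helper): read a build log on stdin and print the
# file:line[:column] of every gcc "internal compiler error", sorted and unique.
ice_locations() {
    grep -o '[^ ]*: internal compiler error' |
        sed 's/: internal compiler error$//' |
        sort -u
}

# Usage, with two logs saved from separate build attempts:
#   ice_locations < run1.log > run1.ice
#   ice_locations < run2.log > run2.ice
#   diff run1.ice run2.ice && echo "same locations" || echo "locations differ"
```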
Memtest86+: 0 errors.

Also tried with different RAM, still the same...

Do the compiler errors still happen at random locations?
Did you check your BIOS settings against the radgametools link and, if in doubt, adjust them?

Hey, I'm experiencing the same issue. I have a Ryzen 7 3700X. I'm not entirely sure this is a CPU issue.
Edit: I think this is a different issue with the same effect. I'm pretty sure it's a problem with my headers.
Last edited by EnderMaster08 (2024-07-26 14:05:38)

Does the build end with an "internal compiler error: Segmentation fault"?
That's not a problem with any headers, but a compiler bug.
If this happens on multiple different systems, there might actually be a genuine bug in gcc.
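The distinction drawn here (an internal compiler error means gcc itself crashed, whereas a header problem would normally show up as an ordinary `error:` diagnostic) can be applied to a saved build log with a rough classifier. A sketch only: `classify_build_log` and its three labels are hypothetical, and the patterns are plain substring matches, not a full parser of gcc output.

```shell
#!/bin/sh
# classify_build_log (hypothetical helper): read a build log on stdin and
# print one label:
#   ice            - gcc reported an internal compiler error (compiler crashed)
#   compile-error  - an ordinary "error:" diagnostic, e.g. a missing header
#   no-error-found - neither pattern matched
classify_build_log() {
    log=$(cat)
    # Check for the ICE first: its text also contains "error:".
    if printf '%s\n' "$log" | grep -q 'internal compiler error'; then
        echo ice
    elif printf '%s\n' "$log" | grep -q 'error:'; then
        echo compile-error
    else
        echo no-error-found
    fi
}

# Usage:
#   classify_build_log < /var/lib/dkms/nvidia/555.58.02/build/make.log
```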