Nvidia-dkms installing

replikeit · 2024-07-25 17:23:29

Hi, can you please help me with an nvidia-dkms installation issue?
I'm trying to install a new linux-zen kernel; the driver is nvidia-dkms-555.58.02-1. When the DKMS build runs during the kernel installation, I get: "WARNING: `dkms install --no-depmod nvidia/555.58.02 -k 6.9.7-zen1-1-1-zen' exited 10".

In the logs:

./include/linux/thread_info.h:128:1: internal compiler error: Segmentation fault
In file included from ./include/linux/ktime.h:25,
                 from ./include/linux/timer.h:6,
                 from ./include/linux/workqueue.h:9,
                 from ./include/linux/srcu.h:21,
                 from ./include/linux/notifier.h:16,
                 from ./arch/x86/include/asm/uprobes.h:13,
                 from ./include/linux/uprobes.h:49,
                 from ./include/linux/mm_types.h:16,
                 from ./include/linux/mmzone.h:22,
                 from ./include/linux/gfp.h:7,
                 from ./include/linux/umh.h:4,
                 from ./include/linux/kmod.h:9,
                 from ./include/linux/module.h:17,
                 from /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv.c:24:
./include/linux/jiffies.h:583:1: internal compiler error: Illegal instruction

I tried to solve this problem with the `downgrade` utility, but every nvidia-dkms-xxx driver fails with this error on every kernel version, not only zen; I tried both the regular and the lts variants. I also tried changing the gcc version and the linux-xxx-headers packages.

Last edited by replikeit (2024-07-25 17:24:03)

seth · 2024-07-25 18:17:28

It's not the DKMS, that's a gcc bug.
https://bbs.archlinux.org/viewtopic.php … 2#p2173152

Typically temperature or a CPU bug, try to manually run

dkms install nvidia/555.58.02 -k 6.9.7-zen1-1-1-zen

and if that fails again

dkms install -j  $(($(nproc)/2)) nvidia/555.58.02 -k 6.9.7-zen1-1-1-zen

(this will only use half your cores)
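
One caveat with that arithmetic: on a machine where nproc reports 1, the integer division evaluates to 0. A guarded variant could look like the sketch below. `half_jobs` is a made-up helper name, not something dkms provides, and the optional argument is only there so the function can be tried with arbitrary core counts.

```shell
# Hypothetical helper: half of the given CPU count, but never less than 1.
# With no argument it asks nproc (coreutils) for the count.
half_jobs() {
    n=${1:-$(nproc)}
    half=$((n / 2))
    [ "$half" -lt 1 ] && half=1
    echo "$half"
}

# Example use (as root), same module and kernel as in the command above:
#   dkms install -j "$(half_jobs)" nvidia/555.58.02 -k 6.9.7-zen1-1-1-zen
```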

replikeit · 2024-07-25 19:08:04

Tried even with one core

 sudo dkms install -j 1 nvidia/555.58.02 -k 6.10.1-zen1-1-zen

In file included from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-firmware.h:30,
                 from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv.h:43,
                 from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-linux.h:28,
                 from /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps.c:24:
/var/lib/dkms/nvidia/555.58.02/build/common/inc/nvmisc.h:1: internal compiler error: Segmentation fault
    1 | /*
0x1fab306 internal_error(char const*, ...)
	???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues> for instructions.
make[3]: *** [scripts/Makefile.build:244: /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps.o] Error 1
make[3]: *** Waiting for unfinished jobs....
In file included from ./arch/x86/include/asm/nospec-branch.h:12,
                 from ./arch/x86/include/asm/irqflags.h:9,
                 from ./include/linux/irqflags.h:18,
                 from ./include/linux/spinlock.h:59,
                 from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-lock.h:29,
                 from /var/lib/dkms/nvidia/555.58.02/build/common/inc/nv-linux.h:32,
                 from /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps-imex.c:24:
./arch/x86/include/asm/msr-index.h:6: internal compiler error: Segmentation fault
    6 | 
0x1fab306 internal_error(char const*, ...)
	???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues> for instructions.
make[3]: *** [scripts/Makefile.build:244: /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-caps-imex.o] Error 1
malloc(): invalid size (unsorted)
malloc(): invalid size (unsorted)
In file included from ./include/linux/atomic.h:82,
                 from ./include/linux/cpumask.h:14,
                 from ./arch/x86/include/asm/paravirt.h:21,
                 from ./arch/x86/include/asm/cpuid.h:62,
                 from ./arch/x86/include/asm/processor.h:19,
                 from ./arch/x86/include/asm/timex.h:5,
                 from ./include/linux/timex.h:67,
                 from ./include/linux/time32.h:13,
                 from ./include/linux/time.h:60,
                 from ./include/linux/stat.h:19,
                 from ./include/linux/module.h:13,
                 from /var/lib/dkms/nvidia/555.58.02/build/nvidia/nv-pci-table.c:25:
./include/linux/atomic/atomic-instrumented.h:4431:1: internal compiler error: Segmentation fault
 4431 | atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
      | ^~~~~~~~~~~~~~~~~~~~~~~

Same result.

Last edited by replikeit (2024-07-25 19:08:42)

seth · 2024-07-25 19:18:14

Are you running OOM? Do you have physical swap (file or partition)?

replikeit · 2024-07-25 19:23:00

OOM.

 systemctl status systemd-oomd.service

○ systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer
     Loaded: loaded (/usr/lib/systemd/system/systemd-oomd.service; disabled; pr>
     Active: inactive (dead)
TriggeredBy: ○ systemd-oomd.socket
       Docs: man:systemd-oomd.service(8)
             man:org.freedesktop.oom1(5)

Swap

Device         Boot     Start        End    Sectors  Size Id Type
/dev/nvme0n1p1           2048    1955839    1953792  954M ef EFI (FAT-12/16/32)
/dev/nvme0n1p2        1955840  138674175  136718336 65.2G 82 Linux swap / Solari
/dev/nvme0n1p3      138674176 3907029167 3768354992  1.8T 83 Linux

Last edited by replikeit (2024-07-25 19:23:44)

seth · 2024-07-25 19:52:02

systemd-oomd.service is socket activated; "systemctl status systemd-oomd.socket" will tell whether it would ever fire.
But w/ 65G swap (maybe run "swapon" to ensure it's actually active) you're also not likely running OOM compiling the nvidia driver (though the authoritative test would be to monitor RAM usage during the compilation).
The segfaults seem to shift, so it's not some specific token that triggers this.
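
A minimal sketch of that monitoring, assuming a Linux /proc/meminfo with the usual MemAvailable and SwapFree fields. `mem_snapshot` is a hypothetical helper name, and the file argument exists only so the parser can be tried on a saved sample.

```shell
# Hypothetical helper: print available RAM and free swap in MiB from a
# /proc/meminfo-style file (defaults to /proc/meminfo itself).
mem_snapshot() {
    awk '
        $1 == "MemAvailable:" { avail = int($2 / 1024) }
        $1 == "SwapFree:"     { swap  = int($2 / 1024) }
        END { printf "avail=%dMiB swapfree=%dMiB\n", avail, swap }
    ' "${1:-/proc/meminfo}"
}

# Example use, in a second terminal while the build is running:
#   while sleep 1; do mem_snapshot; done
```

If the available figure never gets close to zero while the compiler crashes, memory exhaustion is an unlikely explanation.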

The cpu/system temperature is ok, microcode loaded and memtest86+ unsuspicious?

type cc
type gcc

replikeit · 2024-07-25 20:09:13

My configuration btw:

               +                OS: Arch Linux x86_64
               #                Hostname: arch
              ###               Kernel Release: 6.8.8-zen1-1-zen (this is because I didn't reboot the PC after the dkms error)
             #####              Uptime: 3:39
             ######             WM: None
            ; #####;            DE: GNOME
           +##.#####            Packages: 1265
          +##########           RAM: 8117 MB / 64074 MB
         #############;         Processor Type: Intel(R) Core(TM) i9-14900KF
        ###############+        $EDITOR: None
       #######   #######        Root: 500G / 1.8T (27%) (btrfs)
     .######;     ;###;`".      
    .#######;     ;#####.       
    #########.   .########`     
   ######'           '######    
  ;####                 ####;   
  ##'                     '##   
 #'                         `#

type cc
type gcc

cc is /usr/bin/cc
gcc is /usr/bin/gcc

>The cpu/system temperature is ok, microcode loaded and memtest86+ unsuspicious?
1. Temperature is OK; sensors output while dkms was building:

Core 0:        +86.0°C  (high = +80.0°C, crit = +100.0°C)
Core 4:        +54.0°C  (high = +80.0°C, crit = +100.0°C)
Core 8:        +62.0°C  (high = +80.0°C, crit = +100.0°C)
Core 12:       +67.0°C  (high = +80.0°C, crit = +100.0°C)
Core 16:       +56.0°C  (high = +80.0°C, crit = +100.0°C)
Core 20:       +91.0°C  (high = +80.0°C, crit = +100.0°C)
Core 24:       +54.0°C  (high = +80.0°C, crit = +100.0°C)
Core 28:       +55.0°C  (high = +80.0°C, crit = +100.0°C)
Core 32:       +60.0°C  (high = +80.0°C, crit = +100.0°C)
Core 33:       +60.0°C  (high = +80.0°C, crit = +100.0°C)
Core 34:       +60.0°C  (high = +80.0°C, crit = +100.0°C)
Core 35:       +60.0°C  (high = +80.0°C, crit = +100.0°C)
Core 36:       +61.0°C  (high = +80.0°C, crit = +100.0°C)
Core 37:       +61.0°C  (high = +80.0°C, crit = +100.0°C)
Core 38:       +61.0°C  (high = +80.0°C, crit = +100.0°C)
Core 39:       +61.0°C  (high = +80.0°C, crit = +100.0°C)
Core 40:       +55.0°C  (high = +80.0°C, crit = +100.0°C)
Core 41:       +55.0°C  (high = +80.0°C, crit = +100.0°C)
Core 42:       +55.0°C  (high = +80.0°C, crit = +100.0°C)
Core 43:       +55.0°C  (high = +80.0°C, crit = +100.0°C)
Core 44:       +53.0°C  (high = +80.0°C, crit = +100.0°C)
Core 45:       +53.0°C  (high = +80.0°C, crit = +100.0°C)
Core 46:       +53.0°C  (high = +80.0°C, crit = +100.0°C)
Core 47:       +53.0°C  (high = +80.0°C, crit = +100.0°C)

2. microcode

sudo lsinitcpio --early /boot/initramfs-linux-zen.img

early_cpio
kernel/
kernel/x86/
kernel/x86/microcode/
kernel/x86/microcode/GenuineIntel.bin

3. I'll try memtest, thanks
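
For reading sensors output like the listing under point 1, a small sketch that prints only the cores whose reading is above the `high` value shown on the same line. `hot_cores` is a hypothetical name, and the field positions assume exactly the "Core N: +T°C (high = +H°C, crit = ...)" layout pasted above; other sensors layouts would need different fields.

```shell
# Hypothetical helper: from `sensors` output, print cores whose current
# reading exceeds the "high" threshold printed on the same line.
# Assumes the line layout: Core N: +T°C (high = +H°C, crit = +C°C)
hot_cores() {
    awk '
        /^Core [0-9]+:/ {
            temp = $3; high = $6
            gsub(/[^0-9.]/, "", temp)   # strip "+", the degree sign, "C"
            gsub(/[^0-9.]/, "", high)   # same, plus the trailing comma
            if (temp + 0 > high + 0)
                printf "%s %s %s > %s\n", $1, $2, temp, high
        }
    ' "$@"
}

# Example use while the build is running:
#   sensors | hot_cores
```

On the listing above this would report Core 0 (86.0) and Core 20 (91.0), both over the 80.0 `high` mark but under `crit`.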

seth · 2024-07-25 20:16:36

https://www.radgametools.com/oodleintel.htm

replikeit · 2024-07-25 20:29:23

I've been using Arch Linux for about half a year, and every update was fine before. Could this be because of Intel?

seth · 2024-07-25 20:37:23

The problem manifests under pressure - usage patterns might be a factor.
GCC was updated 3 days ago, so that might play a role, but for a systematic gcc bug, you'd expect reports from users all over the place and of course the package maintainers.
Also you've posted 4 segfaults in 4 different locations, ie. right now it at least looks completely non-deterministic.

And you've a suspicious CPU.

Idk whether that's it, but would suggest to look there.
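
To check whether the crash locations really move around, one could collect them across several build attempts. A rough sketch, assuming the build output of each attempt was saved to a file; `ice_locations` is a made-up helper name, and it relies only on the "internal compiler error" wording visible in the logs posted in this thread.

```shell
# Hypothetical helper: list each distinct "file:line" location at which gcc
# reported an internal compiler error, prefixed by how often it occurred.
# Reads the given log files, or stdin when no file is given.
ice_locations() {
    grep -h 'internal compiler error' "$@" |
        sed 's/: internal compiler error.*//' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Example use on saved logs from several attempts (file names are examples):
#   ice_locations build-attempt-1.log build-attempt-2.log
```

Many locations with a count of 1 each point towards a non-deterministic cause; one location repeating on every attempt would look more like a reproducible compiler bug.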

replikeit · 2024-07-25 22:18:31

Memtest86+: 0 errors

replikeit · 2024-07-25 22:51:55

Also tried with different RAM modules, still the same...

seth · 2024-07-26 05:55:51

The compiler errors still happen at random locations?
Did you check, and if in doubt adjust, your BIOS settings according to the radgametools link?

EnderMaster08 · 2024-07-26 13:36:05

Hey, I'm experiencing the same issue. I have a Ryzen 7 3700X. I'm not entirely sure this is a CPU issue.
Edit: I think this is a different issue with the same effect. I'm pretty sure it's a problem with my headers.

Last edited by EnderMaster08 (2024-07-26 14:05:38)

seth · 2024-07-26 17:45:07

Does the build end with an "internal compiler error: Segmentation fault"?
That's not a problem with any headers, but a compiler bug.
If this happens on multiple, different systems, there might actually be a genuine bug in gcc.
