Hi,
I'm using a Dell R730 with two NVIDIA K80s.
I followed these steps to install the custom kernel. After sudo reboot, the server gets stuck at "Loading initial ramdisk ..." and then restarts. I checked the image files associated with the custom kernel and they all seem to be there.
cd build/
pkgctl repo clone --protocol=https linux
cd linux
In PKGBUILD:
pkgbase=linux-custom
Add the line: install -Dt "$builddir/tools/bpf/resolve_btfids" tools/bpf/resolve_btfids/resolve_btfids
updpkgsums
cd build/linux
sudo pacman -S kmod
makepkg -s
sudo pacman -U /home/vorlket/build/linux/linux-custom-6.10.4.arch2-1-x86_64.pkg.tar.zst /home/vorlket/build/linux/linux-custom-headers-6.10.4.arch2-1-x86_64.pkg.tar.zst /home/vorlket/build/linux/linux-custom-docs-6.10.4.arch2-1-x86_64.pkg.tar.zst
sudo grub-mkconfig -o /boot/grub/grub.cfg
In /etc/modprobe.d/nvidia_drm.conf:
options nvidia_drm modeset=1
options nvidia_drm fbdev=1
In /etc/default/grub:
GRUB_CMDLINE_LINUX="nvidia_drm.modeset=1"
In /etc/mkinitcpio.conf:
MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)
sudo grub-mkconfig -o /boot/grub/grub.cfg
sudo mkinitcpio -p linux-custom
sudo pacman -S nvidia-settings
sudo reboot
The above steps worked fine on a Dell T620 and a Precision 7810.
I'd appreciate it if you could help me out.
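To double-check that the image files really are all there, here is a small sketch. It assumes Arch's default naming (/boot/vmlinuz-<pkgbase> and /boot/initramfs-<pkgbase>.img) and the GRUB config path used in the steps above; the function name check_custom_kernel is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the kernel image, the initramfs and a GRUB
# menu entry exist for a given pkgbase. Assumes Arch's default file names
# (/boot/vmlinuz-<pkgbase>, /boot/initramfs-<pkgbase>.img) and GRUB's
# config at <bootdir>/grub/grub.cfg.
check_custom_kernel() {
    local kernel="${1:-linux-custom}"
    local bootdir="${2:-/boot}"
    local status=0 f
    for f in "vmlinuz-$kernel" "initramfs-$kernel.img"; do
        if [ -f "$bootdir/$f" ]; then
            echo "ok:      $bootdir/$f"
        else
            echo "missing: $bootdir/$f"
            status=1
        fi
    done
    # grub-mkconfig names the kernel image in each entry's "linux" line.
    if grep -q "vmlinuz-$kernel" "$bootdir/grub/grub.cfg" 2>/dev/null; then
        echo "ok:      GRUB entry for $kernel"
    else
        echo "missing: GRUB entry for $kernel"
        status=1
    fi
    return "$status"
}
```

Usage: check_custom_kernel linux-custom. A non-zero exit status means something is missing. It only confirms the files exist, not that the initramfs actually boots.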
Last edited by vorlket (2025-07-20 05:02:26)
I'm using a Dell R730 with two NVIDIA K80s.
MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)
dkms status
pacman -Qs 'nvidia|kernel'
https://wiki.archlinux.org/title/Dynami … le_Support
https://archlinux.org/packages/extra/x8 … idia-dkms/
https://archlinux.org/packages/extra/x8 … open-dkms/
[vorlket@server ~]$ dkms status
nvidia/470.256.02, 6.15.6-arch1-1-custom, x86_64: installed
[vorlket@server ~]$ pacman -Qs 'nvidia|kernel'
local/dkms 3.2.1-1
Dynamic Kernel Modules System
local/egl-wayland 4:1.1.19-1
EGLStream-based Wayland external platform
local/iptables 1:1.8.11-2
Linux kernel packet control tool (using legacy interface)
local/kmod 34.2-1
Linux kernel module management tools and library
local/libdrm 2.4.125-1
Userspace interface to kernel DRM services
local/libnetfilter_conntrack 1.0.9-2
Library providing an API to the in-kernel connection tracking state table
local/libnfnetlink 1.0.2-2
Low-level library for netfilter related kernel/userspace communication
local/libsysprof-capture 48.0-5
Kernel based performance profiler - capture library
local/libvdpau 1.5-3
Nvidia VDPAU library
local/libxnvctrl 575.64-1
NVIDIA NV-CONTROL X extension
local/linux 6.15.6.arch1-1
The Linux kernel and modules
local/linux-api-headers 6.15-1
Kernel headers sanitized for use in userspace
local/linux-custom 6.15.6.arch1-1
The Linux kernel and modules
local/linux-custom-docs 6.15.6.arch1-1
Documentation for the Linux kernel
local/linux-custom-headers 6.15.6.arch1-1
Headers and scripts for building modules for the Linux kernel
local/linux-firmware-nvidia 20250708-1
Firmware files for Linux - Firmware for NVIDIA GPUs and SoCs
local/mtdev 1.1.7-1
A stand-alone library which transforms all variants of kernel MT events to the slotted type B protocol
local/nvidia-470xx-dkms 470.256.02-7.98
NVIDIA drivers - module sources
local/nvidia-470xx-utils 470.256.02-7.98
NVIDIA drivers utilities
local/nvidia-settings 575.64-1
Tool for configuring the NVIDIA graphics driver
local/opencl-nvidia-470xx 470.256.02-7.98
OpenCL implemention for NVIDIA
local/texlive-latex 2025.2-1 (texlive)
TeX Live - LaTeX fundamental packages
local/texlive-latexextra 2025.2-1 (texlive)
TeX Live - LaTeX additional packages
That looks unsuspicious.
Do you get any error messages w/ https://wiki.archlinux.org/title/Genera … l_messages ? And also try "nvidia_drm.modeset=1 nomodeset" (ignore the oxymoron: the first will block the simpledrm device, which I think loads even on nomodeset).
I read the link but I'm not sure what to do. Could you please give me instructions, e.g. press e at the GRUB menu, add this parameter, etc.? Thanks.
Depends on whether you actually use grub: https://wiki.archlinux.org/title/Kernel_parameters
I added the debug kernel parameter at the GRUB menu and got the following output:
...
[ ...] nvidia_drm: unknown parameter 'fbdev' ignored
[ ...] [drm] [nvidia-drm] [GPU ID 0x00000600] Loading driver
...
[ ...] [drm] [nvidia-drm] [GPU ID 0x00008400] Loading driver <-- last line, then it restarts
Do you need to see the whole output or is this enough? I don't know how to get it onto another machine.
Last edited by vorlket (2025-07-15 10:28:58)
A screenshot if this is enough.
Last edited by vorlket (2025-07-15 10:39:21)
You're on nvidia 470 and set
options nvidia_drm fbdev=1
but the 470xx drivers don't support that.
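Dropping that option can be sketched as below, assuming the file layout from the first post (/etc/modprobe.d/nvidia_drm.conf with one option per line). drop_fbdev_option is a hypothetical helper name; run it as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the "options nvidia_drm fbdev=..." line, which
# the 470xx driver rejects ("unknown parameter 'fbdev' ignored").
drop_fbdev_option() {
    local conf="${1:-/etc/modprobe.d/nvidia_drm.conf}"
    sed -i '/^options nvidia_drm fbdev=/d' "$conf"
}
```

Assuming the default modconf hook is in HOOKS, mkinitcpio copies /etc/modprobe.d into the initramfs, so rerunning mkinitcpio -p linux-custom afterwards keeps the early-loaded copy in sync.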
Also things start to stall once nvidia_drm loads.
Can you boot w/ "nvidia_drm.modeset=1 nomodeset" ?
Or rather: since this is a custom kernel, you likely won't have the hack that blocks the simpledrm device on nvidia_drm.modeset=1, so you'd have to try "CONFIG_DRM_SIMPLEDRM=n". Otherwise simpledrm loads and your nvidia driver cannot take over from it.
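A sketch of the CONFIG_DRM_SIMPLEDRM=n route for the PKGBUILD workflow from the first post. Assumptions: the kernel config in the cloned repo is a file named config, and prepare() in the PKGBUILD runs make olddefconfig so dependent options get resolved; check both in your clone. disable_simpledrm is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch CONFIG_DRM_SIMPLEDRM off in the kernel config
# of the cloned Arch "linux" repo before running makepkg.
# Assumption: the file is called "config"; pass another path if yours differs.
disable_simpledrm() {
    local cfg="${1:-config}"
    # Kconfig writes disabled bool options as "# CONFIG_FOO is not set".
    sed -i 's/^CONFIG_DRM_SIMPLEDRM=.*/# CONFIG_DRM_SIMPLEDRM is not set/' "$cfg"
    # Show the result so a missing or unexpected entry is obvious.
    grep -n 'DRM_SIMPLEDRM' "$cfg"
}
```

Usage, in build/linux: disable_simpledrm, then updpkgsums (the config file's checksum changes) and makepkg -s as before.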
So I have to recompile the kernel without simpledrm?
If that's what's getting in the way, yes.
Can you boot nomodeset?
You might be able to get away w/ "initcall_blacklist=simpledrm_platform_driver_init" in addition, but that applies fairly late.
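To try this persistently with GRUB, the parameter goes next to the existing one in /etc/default/grub. This is a fragment, not a full file, and simply combines the parameters already named in this thread:

```shell
# /etc/default/grub (fragment): keep the existing parameter and add the
# simpledrm initcall blacklist.
GRUB_CMDLINE_LINUX="nvidia_drm.modeset=1 initcall_blacklist=simpledrm_platform_driver_init"
```

Then regenerate with sudo grub-mkconfig -o /boot/grub/grub.cfg. For a one-off test instead, press e on the GRUB entry, append the parameter to the line starting with linux, and boot with Ctrl+x or F10.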
Both simpledrm and fbdev are for graphical display, so for compute devices like the K80 they are not needed, or rather not supported, right?
I wonder why they didn't get in the way on the Dell T620 with two NVIDIA K20c cards.
Last edited by vorlket (2025-07-15 15:54:32)
Wow, hold on, that thing doesn't register as a VGA device? Do you have another GPU in the system?
And yes, the entire drm stack is for graphical output - which is also where things seem to fail w/ your setup.
Yes, I have a VGA output apart from the two K80s.
And that's specifically what kind of device?
If it's not nvidia, remove the nvidia modules from the MODULES array.
If it is nvidia, you'll have to elaborate ("lspci -k" from any working system will do)
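To answer that from lspci -k output, a small filter can list every display-class device (VGA, 3D or Display controller) together with the driver bound to it. As far as I know, compute cards like the Tesla K80 usually show up as a "3D controller" rather than a "VGA compatible controller", so grepping for vga alone may miss them. display_drivers is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `lspci -k` output on stdin and print each
# VGA/3D/Display controller followed by its "Kernel driver in use".
display_drivers() {
    awk '
        # Device lines start with the PCI address; detail lines start with a tab.
        /^[0-9a-f]/ {
            dev = ""
            if ($0 ~ /(VGA compatible|3D|Display) controller/) dev = $0
            next
        }
        dev != "" && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use: */, "")
            print dev " -> " $0
            dev = ""
        }
    '
}
```

Usage: lspci -k | display_drivers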
I don't think it's nvidia, it's an intel integrated one although I have to check to be sure. I'll check tomorrow as this machine is loud and turning it on at 2am will wake the whole family up. I'll get back to you. Thanks.
[vorlket@server ~]$ lspci | grep -i vga
0e:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. G200eR2 (rev01)
Should I remove all the modules, i.e. change this line
MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)
to the following?
MODULES=()
I'd like to install CUDA later on.
Last edited by vorlket (2025-07-16 09:58:40)
I'd like to install CUDA later on.
That has nothing to do w/ the modules hanging around in the initramfs and loading very early on.
0e:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. G200eR2 (rev01)
Rather, add the "mgag200" module and, if in doubt, also try booting with nomodeset (into multi-user.target only). Matrox isn't very common anymore and I'm not sure you won't face issues because of that.
OK, I'll recompile the kernel and remove the nvidia-related configuration on Saturday and let you know.
You can start w/ only swapping nvidia out and mgag200 in; ideally that should™ be sufficient (mgag200 replacing the simpledrm device, and nvidia loading some time later when the GPU shows up during boot). The only caveat is that I really don't know how the mgag200 module actually behaves.
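Swapping the initramfs modules can be sketched as below, assuming the MODULES line in /etc/mkinitcpio.conf holds only the nvidia modules, as in the first post (the sed replaces the whole array). The helper name is hypothetical; run it as root, then rebuild with mkinitcpio -p linux-custom.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the MODULES array in mkinitcpio.conf with the
# Matrox KMS driver. It overwrites the whole array, which in the original
# setup held only the four nvidia modules.
use_mgag200_in_initramfs() {
    local conf="${1:-/etc/mkinitcpio.conf}"
    sed -i 's/^MODULES=(.*)$/MODULES=(mgag200)/' "$conf"
    # Print the resulting line so the change can be checked at a glance.
    grep '^MODULES=' "$conf"
}
```

The nvidia modules stay installed; they just no longer sit in the initramfs, so they should be loaded later in boot once the GPUs are detected.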
Now the custom kernel boots. But when I run nvidia-smi, the machine restarts after about a minute without any output.
Spontaneous reboot means either of
* underpowered
* overheated
* broken/overclocked CPU
* bad/overclocked RAM
Do you not get this w/ the regular kernel?
What's the point of the custom kernel in the first place?
With the regular kernel:
[vorlket@server ~]$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
The custom kernel is for installing the NVIDIA Tesla K80 driver and then CUDA.
[vorlket@server ~]$ lsmod | grep nvidia
nvidia_drm 81920 0
nvidia_uvm 2781184 0
nvidia_modeset 1515520 1 nvidia_drm
nvidia 40755200 2 nvidia_uvm,nvidia_modeset
video 81920 2 dell_wmi,nvidia_modeset
Last edited by vorlket (2025-07-17 11:36:05)
I haven't tried nomodeset yet. I'll try it and see if it resolves the issue.
https://aur.archlinux.org/packages/nvidia-340xx-dkms doesn't build w/ the LTS kernel?
Please post your complete system journal for the boot before running nvidia-smi
sudo journalctl -b | curl -F 'file=@-' 0x0.st