Background info
Up-to-date Arch Linux install running systemd and the GNOME desktop with GDM:
$ uname -a
Linux HOST 6.9.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 16 Jun 2024 19:06:37 +0000 x86_64 GNU/Linux
There is an NVIDIA GTX 1050 GPU in this six-year-old machine, for which neither a driver (NVIDIA v. 550.90.07 or Nouveau, NV137 - GP107M) nor, apparently, a kernel module (such as 'nvidia-uvm') is present. In other words, that piece of NVIDIA hardware is effectively disabled.
Issue
At every boot, there is a very long pause (30~40s) with a black screen and a blinking cursor before the GDM login prompt appears.
The wait ends with the message:
[TIME] Timed out waiting for device /dev/tpmrm0
I found the corresponding entry in the boot log, right after the kernel module loading loop exits. Before it, I noticed another entry I had not seen before, related to tpm_tis (the TPM Interface Specification driver for the Trusted Platform Module):
.... kernel: tpm_tis MSFT0101:00: probe with driver tpm_tis failed with error -1
There is also another security-related entry, which I think is unrelated:
.... kernel: x86/cpu: SGX disabled by BIOS.
which I also found in the dmesg output.
Thinking this might be related to initramfs hooks and modules, I looked at my /etc/mkinitcpio.conf. The modules and hooks are:
MODULES=(intel_agp i915 crc32 libcrc32c crc32c_generic crc32c-intel crc32-pclmul crypto-crc32 bluetooth)
....
HOOKS=(base udev resume autodetect modconf kms keyboard keymap consolefont block filesystems fsck)
Any pointers welcome.
Last edited by Cbhihe (2024-06-24 09:07:18)
I like strawberries, therefore I'm not a bot.
Thank you, @seth, for pointing me to @redqueen's 2024-06-18 post. I masked the device unit:
# systemctl mask dev-tpmrm0.device
Created symlink '/etc/systemd/system/dev-tpmrm0.device' → '/dev/null'
The "Timed out waiting for device /dev/tpmrm0" entry disappeared from the boot journal and the wait time was cut to roughly a quarter, but I still have 7 or 8 seconds of black screen with a blinking cursor. However,
.... kernel: tpm_tis MSFT0101:00: probe with driver tpm_tis failed with error -1
is still there. It might be unrelated, but I could not find any reference to tpm_tis in relation to the Linux kernel.
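As the "Created symlink" line above shows, masking a unit amounts to pointing its name under /etc/systemd/system at /dev/null. A minimal sketch to double-check that state, assuming a POSIX shell; the helper name is_masked is made up for illustration, and it only looks in the one directory given (a runtime mask under /run/systemd/system would be missed):

```shell
# is_masked UNIT [DIR]
# Hypothetical helper: succeeds if DIR/UNIT is a symlink to /dev/null,
# which is what "systemctl mask" created above.
# DIR defaults to /etc/systemd/system.
is_masked() {
    unit_link="${2:-/etc/systemd/system}/$1"
    [ -L "$unit_link" ] && [ "$(readlink "$unit_link")" = "/dev/null" ]
}

# Usage: is_masked dev-tpmrm0.device && echo "masked"
```

"systemctl is-enabled dev-tpmrm0.device" should report the same thing by printing "masked".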
You should have lost a 90s delay?
Check
systemd-analyze critical-chain
Wrt your nvidia GPU, you've a pascal chip but neither nouveau nor nvidia load - nouveau is likely blacklisted by nvidia-utils (there's a reference to libGLX_nvidia.so.0)?
You need https://aur.archlinux.org/packages/nvidia-340xx-dkms for the GPU
Edit: hold on, wrong physicist - your GPU is supported by the regular nvidia driver
pacman -Qs nvidia
Last edited by seth (2024-06-22 21:28:09)
90s delay? No, the black screen with a blinking cursor always lasted well under a minute to start with, more like 30s, but it stood out.
$ systemd-analyze time
Startup finished in 15.211s (firmware) + 21.085s (loader) + 3.721s (kernel) + 5.732s (userspace) = 45.750s
graphical.target reached after 5.503s in userspace.
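To see how the total splits between what happens before and after the kernel starts, the summary line can be picked apart. A rough sketch, assuming GNU grep, sed and awk, and assuming every phase is printed as plain seconds like above (longer phases printed as "1min 2.345s" are not handled); the function names are made up:

```shell
# phase_times: read a "Startup finished in ..." line on stdin and
# print one "<phase> <seconds>" pair per line.
phase_times() {
    grep -oE '[0-9]+(\.[0-9]+)?s \([a-z]+\)' |
        sed -E 's/^([0-9.]+)s \(([a-z]+)\)$/\2 \1/'
}

# pre_kernel_seconds: sum of the firmware and loader phases,
# i.e. the time spent before the kernel takes over.
pre_kernel_seconds() {
    phase_times |
        awk '$1 == "firmware" || $1 == "loader" { s += $2 } END { printf "%.3f\n", s }'
}

# Usage: systemd-analyze time | head -n 1 | pre_kernel_seconds
```

With the line posted above this gives 36.296, i.e. 15.211s + 21.085s of the 45.750s total are spent in firmware and boot loader, before the kernel and systemd start.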
$ systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
graphical.target @5.503s
└─multi-user.target @5.503s
  └─docker.service @3.960s +1.542s
    └─network-online.target @3.959s
      └─netctl-wait-online.service @3.577s +375ms
        └─network.target @3.573s
          └─netctl-auto@wifi0.service @3.069s +503ms
            └─basic.target @3.066s
              └─dbus-broker.service @3.040s +23ms
                └─dbus.socket @3.034s
                  └─sysinit.target @3.031s
                    └─systemd-resolved.service @2.267s +763ms
                      └─run-credentials-systemd\x2dresolved.service.mount @2.928s
If I understood the man page correctly, the latter gives no information on units whose activation timed out.
And to see the NVIDIA-related packages already installed:
$ pacman -Qs nvidia
local/bumblebee 3.2.1-21
local/egl-wayland 2:1.1.13-2
local/libvdpau 1.5-2
local/nvidia-utils 550.90.07-3
local/opencl-nvidia 550.90.07-3
You don't have any nvidia kernel modules (nvidia, nvidia-lts, nvidia-dkms) installed.
Other than that you're spending most time waiting for docker (1.5s) and the wireless network (as precondition for docker)
There are indeed no nvidia related modules listed in the MODULES array in mkinitcpio.conf per my original post.
Still this black screen and its frigging blinking cursor bug me to no end... and somehow I think it's related to something wrong (or that I don't understand well enough) in that machine's boot sequence.
So, since the kms hook is there (staring me in the face), and its role is basically to pull the relevant driver modules into the initramfs (thus obviating the need to list them explicitly in the MODULES array), I'll try removing the kms hook and adding various NVIDIA modules to the MODULES array instead, and see what happens with that disabled GPU and how the boot sequence fares.
Will report on that ASAP.
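For reference, a quick way to check whether a given hook is actually listed before and after such an edit. This is only a sketch: has_hook is a made-up helper, it assumes the HOOKS=(...) array sits on a single line as in the excerpt in the original post, and it reads just the one file (drop-in configuration files are ignored).

```shell
# has_hook HOOK [CONF]
# Hypothetical helper: succeeds if HOOK is listed in the last
# uncommented single-line HOOKS=(...) array of CONF
# (default: /etc/mkinitcpio.conf).
has_hook() {
    hook_conf=${2:-/etc/mkinitcpio.conf}
    hook_list=$(sed -nE 's/^[[:space:]]*HOOKS=\((.*)\).*$/\1/p' "$hook_conf" | tail -n 1)
    for hook_name in $hook_list; do
        if [ "$hook_name" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Usage: has_hook kms && echo "kms hook is enabled"
```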
After installing the nvidia package
# pacman -Syu nvidia
...
$ lspci -nnk | grep nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
$ pacman -Qs nvidia
local/bumblebee 3.2.1-21
local/egl-wayland 2:1.1.13-2
local/libvdpau 1.5-2
local/nvidia 550.90.07-4
local/nvidia-utils 550.90.07-3
local/opencl-nvidia 550.90.07-3
and modifying the MODULES and HOOKS arrays in /etc/mkinitcpio.conf like so:
MODULES=(intel_agp i915 crc32 libcrc32c crc32c_generic crc32c-intel crc32-pclmul crypto-crc32 nvidia nvidia_drm nvidia_uvm nvidia_modeset bluetooth)
....
HOOKS=(base udev resume autodetect modconf keyboard keymap consolefont block filesystems fsck)
I also modified the blacklisted modules to leave only nouveau blacklisted:
# cat /etc/modprobe.d/blacklist-modules.conf
blacklist nouveau
#blacklist nvidia
#blacklist nvidia_drm
#blacklist nvidia_uvm
#blacklist nvidia_modeset
Still in /etc/modprobe.d/, I created a new module configuration file for NVIDIA DRM kernel mode setting, following the wiki, just to allow for possible Wayland compositors:
# cat /etc/modprobe.d/nvidia_drm_modeset.conf
options nvidia_drm modeset=1
I rebuilt the initramfs with "sudo mkinitcpio -P", rebooted and ... nothing changed: same wait time, same black screen, same blinking cursor. More importantly:
$ grep -i nvidia < <(lsmod)
yielded zilch, meaning that no nvidia-related modules were loaded. o_O !¿#¡¿
EDIT:
The new boot journal is here and has the entry:
HOST systemd-modules-load[299]: Module 'nvidia_uvm' is deny-listed (by kmod)
and the complete output of the initramfs rebuild looks ordinary to me:
# mkinitcpio -P
==> Building image from preset: /etc/mkinitcpio.d/linux.preset: 'default'
==> Using configuration file: '/etc/mkinitcpio.conf'
-> -k /boot/vmlinuz-linux -c /etc/mkinitcpio.conf -g /boot/initramfs-linux.img
==> Starting build: '6.9.6-arch1-1'
-> Running build hook: [base]
-> Running build hook: [udev]
-> Running build hook: [resume]
-> Running build hook: [autodetect]
-> Running build hook: [modconf]
-> Running build hook: [keyboard]
-> Running build hook: [keymap]
-> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
-> Running build hook: [block]
-> Running build hook: [filesystems]
-> Running build hook: [fsck]
==> Generating module dependencies
==> Creating zstd-compressed initcpio image: '/boot/initramfs-linux.img'
-> Early uncompressed CPIO image generation successful
==> Initcpio image generation successful
==> Building image from preset: /etc/mkinitcpio.d/linux.preset: 'fallback'
==> Using configuration file: '/etc/mkinitcpio.conf'
-> -k /boot/vmlinuz-linux -c /etc/mkinitcpio.conf -g /boot/initramfs-linux-fallback.img -S autodetect
==> Starting build: '6.9.6-arch1-1'
-> Running build hook: [base]
-> Running build hook: [udev]
-> Running build hook: [resume]
-> Running build hook: [modconf]
-> Running build hook: [keyboard]
-> Running build hook: [keymap]
-> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
-> Running build hook: [block]
-> Running build hook: [filesystems]
-> Running build hook: [fsck]
==> Generating module dependencies
==> Creating zstd-compressed initcpio image: '/boot/initramfs-linux-fallback.img'
-> Early uncompressed CPIO image generation successful
==> Initcpio image generation successful
Last edited by Cbhihe (2024-06-23 19:24:45)
modinfo nvidia | head -n8
modprobe -c | grep -v alias | grep -E 'nvidia|nouveau'
Edit: though you're running on the intel chip anyway - have you tried to disable GDM debugging?
Last edited by seth (2024-06-23 19:52:48)
$ modinfo nvidia | head -n8
filename: /lib/modules/6.9.6-arch1-1/extramodules/nvidia.ko.xz
alias: char-major-195-*
version: 550.90.07
supported: external
license: NVIDIA
firmware: nvidia/550.90.07/gsp_tu10x.bin
firmware: nvidia/550.90.07/gsp_ga10x.bin
srcversion: 77579897AEBAB0555999FA0
$ modprobe -c | grep -v alias | grep -E 'nvidia|nouveau'
blacklist nouveau
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
blacklist nvidia_uvm
blacklist nouveau
blacklist nouveau
options nvidia_drm modeset=1
Somehow the four modules I just added to the MODULES array in mkinitcpio.conf, i.e. nvidia, nvidia_drm, nvidia_modeset, nvidia_uvm are still blacklisted. What gives ?
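The check above can be narrowed down to just the modules of interest. A small sketch, with a made-up function name, that reads "modprobe -c" style output on stdin and prints which of the given modules carry a blacklist entry; it assumes the names are spelled with underscores, as in the output above:

```shell
# blacklisted_of MODULE...
# Hypothetical helper: reads "modprobe -c" output on stdin and prints
# each MODULE that has a "blacklist MODULE" line in it.
blacklisted_of() {
    modprobe_dump=$(cat)
    for mod_name in "$@"; do
        if printf '%s\n' "$modprobe_dump" |
            grep -E "^blacklist[[:space:]]+${mod_name}[[:space:]]*\$" >/dev/null; then
            printf '%s\n' "$mod_name"
        fi
    done
}

# Usage: modprobe -c | blacklisted_of nvidia nvidia_drm nvidia_modeset nvidia_uvm i915
```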
I was not aware that GDM debugging was enabled on that box. Will look into that tomorrow morning. Thanks a lot @seth.
Did you reboot after building the initramfs?
Did you maybe forget to mount the boot partition?
Is there maybe a second config file that still blacklists those modules?
GDM debugging is now disabled in /etc/gdm/custom.conf
Yup! I reboot after each initramfs rebuild. Always.
Mounting the boot partition is automatic from /etc/fstab.
$ lsblk -f
NAME        FSTYPE FSVER LABEL UUID                                          FSAVAIL FSUSE% MOUNTPOINTS
nvme0n1
├─nvme0n1p1 vfat   FAT32       4E..-..6E                                      120.1M    25% /boot/efi
├─nvme0n1p3 ext4   1.0   /var  2dd5a6b7-7d.......................04f0e951a6a   50.1G    65% /var
├─nvme0n1p5 ext4   1.0   /boot 061f3fa7-77.......................f61ed3c6058  650.5M    26% /boot
├─nvme0n1p7 ext4   1.0   /home c96f3541-.......................-795e833b9ac9   58.4G    72% /home
├─nvme0n1p6 ext4   1.0   /root 0f2397cc-.......................5710b54238a2    11.3G    61% /
└─nvme0n1p2 swap   1     swap  c1e198e1.......................6-ff6dd6b2d415                [SWAP]
As for the second config file, I actually looked for one and will continue to do so, perhaps with a search of the complete /etc/ subtree, but the mere possibility of a second conf file strikes me as borderline weird... (Why on earth would anybody do that?)
Last edited by Cbhihe (2024-06-23 21:44:34)
Instead of the entire /etc, grep these paths: https://man.archlinux.org/man/core/kmod/modprobe.d.5.en
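A minimal sketch of such a search, assuming the configuration directories as listed in modprobe.d(5); find_blacklist is a made-up name, and the optional directory arguments exist only so it can be tried on a scratch directory:

```shell
# find_blacklist MODULE [DIR...]
# Hypothetical helper: prints file:line:text for every
# "blacklist MODULE" line found in the *.conf files of each DIR.
find_blacklist() {
    # Treat "-" and "_" in the module name as interchangeable.
    mod_pattern=$(printf '%s' "$1" | sed 's/[-_]/[-_]/g')
    shift
    if [ "$#" -eq 0 ]; then
        # Default search path (assumption: taken from modprobe.d(5)).
        set -- /etc/modprobe.d /run/modprobe.d \
            /usr/local/lib/modprobe.d /usr/lib/modprobe.d /lib/modprobe.d
    fi
    for conf_dir in "$@"; do
        [ -d "$conf_dir" ] || continue
        grep -HnE "^[[:space:]]*blacklist[[:space:]]+${mod_pattern}[[:space:]]*\$" \
            "$conf_dir"/*.conf 2>/dev/null || true
    done
    return 0
}

# Usage: find_blacklist nvidia_uvm
```

On Arch, /lib is a symlink to /usr/lib, so a hit in one of them shows up twice.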
Got it! The blacklisting occurred in /lib/modprobe.d/bumblebee.conf. Not so weird after all.
I will keep it as is for now to avoid getting into the intricacies of setting up either optimus-manager or nvidia-xrun at this point.
Solved.
Does the GPU consume an intolerable amount of battery w/ the regular-ass prime setup?
I just booted up once with the GTX 1050 GPU enabled and I thought the laptop's two fans were about to make the box lift off the table. The laptop's battery lost the top 20% of its charge in about 10~15min, which may be an indication that the battery is not what it used to be... or that the GPU hogs on it big time.
But that was not my prime concern (no pun intended), even though it is a factor in keeping a lid on the GPU.
Rather, I read quite a bit (wikis and all) on the amount of tweaking and daemon/service enabling it takes to make this a somewhat reliable production box in terms of resuming after hibernation, switching the GPU off and on, etc. It's just that I cannot invest the time right now, and I am not certain that the box's main user would like that or even needs it, since we have a GPU cluster here for heavy computational loads. One GTX 1050 more or less won't make a difference.
So the idea is to get back to it when that box is retired from prod, which I expect should happen before the end of the year.
Thanks again @seth. Help much appreciated. ;-)
Last edited by Cbhihe (2024-06-24 15:36:13)