#1 2025-06-30 19:50:29

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

[SOLVED] New issues with NVIDIA 575.64

A few days ago (the 28th) I updated my system as I normally do. There was the firmware package issue, which I fixed, and everything worked: the update went fine and mkinitcpio reported no errors. Then I rebooted. The system boots, but when it tries to load the NVIDIA module in the initramfs, it spits out this log:

archlinux kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
archlinux kernel: NVRM: No NVIDIA GPU found.
archlinux kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 235
archlinux systemd-modules-load[182]: Failed to insert module 'nvidia': No such device
archlinux kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
archlinux kernel: NVRM: No NVIDIA GPU found.
archlinux kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 235
archlinux systemd-modules-load[182]: Failed to insert module 'nvidia_modeset': No such device
archlinux kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
archlinux kernel: NVRM: No NVIDIA GPU found.
archlinux kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 235
archlinux systemd-modules-load[182]: Failed to insert module 'nvidia_uvm': No such device
archlinux kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
archlinux kernel: NVRM: No NVIDIA GPU found.
archlinux kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 235

And nvidia-persistenced fails to start with:

 nvidia-persistenced[626]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 143 has read and write permissions for those files.

However, my computer boots up fine. To be clear, it's a hybrid laptop (ASUS TUF A17). Long after init starts, but before the firewall starts, it finally prints this:

nvidia 0000:01:00.0: enabling device (0000 -> 0003)
nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  575.64  Release Build  (root@)

Then the nvidia module finally loads and everything is fine. For some reason it loads late, despite the modules being set in /etc/mkinitcpio.conf:

MODULES=(amdgpu nvidia nvidia_modeset nvidia_uvm nvidia_drm r8169 usbhid xhci_hcd)

I've used this module configuration for about half a year with no issues; my laptop is a hybrid system with an AMD iGPU. I've never had an issue like this before, and yes, I also have modeset enabled on the kernel command line, so that shouldn't be the problem. I ignored it for quite some time until my laptop did not wake up from suspend on the 29th, so I switched to the LTS kernel in the hope of fixing it. After that, the laptop wouldn't go to sleep at all: I watched my external keyboard flicker off and immediately come back on a few times, then opened up the laptop and checked the logs. There was a kernel warning in nv.c:

WARNING: CPU: 9 PID: 686617 at /build/nvidia-lts/src/nvidia/575.64/build/nvidia/nv.c:4648 nv_set_system_power_state+0x41e/0x480 [nvidia]

The stack trace is here: https://pastebin.com/41nXwSLY
And the suspend attempt gives these errors:

nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend [nvidia] returns -5
nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend returns -5
nvidia 0000:01:00.0: PM: failed to suspend async: error -5
systemd-sleep[686660]: Failed to put system to sleep. System resumed again: Input/output error

My nvidia packages all match the version 575.64.

All I did was check the news, package changes and possible issues before updating; nothing seemed out of the ordinary. Then I updated and resolved the issue with the new firmware packages. I have already tried reinstalling all nvidia packages, regenerating the initramfs, and a different kernel with a different kernel module (nvidia-open for mainline, nvidia-lts for the LTS kernel; both have always worked fine). The issue is the same on both, except that on LTS I _can't_ suspend. Is this an nvidia module regression? A race condition? The ACPI bus no longer initializing the dGPU during early boot? I'm at a loss for how to debug this. Was there any recent change for the 575.64 modules that requires manual intervention I'm not aware of?

Last edited by Sync Shard (2025-07-07 16:49:26)


"What is the cost of lies?"

#2 2025-06-30 21:30:51

seth
Member
Registered: 2012-09-03
Posts: 66,225

Re: [SOLVED] New issues with NVIDIA 575.64

Please post your complete system journal for the boot:

sudo journalctl -b | curl -F 'file=@-' 0x0.st

There are a couple of daemons/tools that can disable the GPU (e.g. optimus-manager, supergfxd which is related to asusd, …). You could also be facing ASPM issues, lack NVreg_PreserveVideoMemoryAllocations=1, or it's GSP related or …

#3 2025-06-30 21:59:00

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

Here is the log for the boot on the LTS kernel; it has the boot errors and, at the end, the suspend errors:
https://0x0.st/8URD.txt
And current boot if needed:
https://0x0.st/8U7-.txt

I do indeed have supergfxd, so I disabled, stopped and uninstalled it, regenerated the initramfs in case it had added any files there (it probably doesn't, but I wanted to be sure), and rebooted, but the issue _persists without_ it. I have the NVreg_PreserveVideoMemoryAllocations=1 parameter, and my GPU is a GTX 1650. That is Turing, the minimum architecture the open drivers require, so it does support the nvidia-open module.

I've noticed nvidia-powerd now fails, but after a quick search I realized it is only for Ampere GPUs, so I disabled it. I should have looked up what it was before enabling it, but it doesn't matter now.

My BIOS *does* have some issues, but I am on the latest version I can get for my A17 (v307; the website doesn't have anything newer for my model), and the issue only started when I upgraded from nvidia 570 to 575. I'm just hoping I've overlooked something, and that this isn't some big change in the NVIDIA module that my BIOS/ACPI doesn't like.


#4 2025-07-01 07:50:15

seth
Member
Registered: 2012-09-03
Posts: 66,225

Re: [SOLVED] New issues with NVIDIA 575.64

Jun 30 17:40:05 archlinux kernel: Linux version 6.15.4-arch2-1 (linux@archlinux) (gcc (GCC) 15.1.1 20250425, GNU ld (GNU Binutils) 2.44.0) #1 SMP PREEMPT_DYNAMIC Fri, 27 Jun 2025 16:35:07 +0000
…
Jun 30 17:40:10 archlinux kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
Jun 30 17:40:10 archlinux kernel: NVRM: No NVIDIA GPU found.
Jun 30 17:40:10 archlinux kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 235
Jun 30 17:40:10 archlinux systemd-modules-load[186]: Failed to insert module 'nvidia': No such device
…
Jun 30 17:40:11 ASUS-A17-Arch systemd[1]: Finished Remount Root and Kernel File Systems.
…
Jun 30 17:40:13 ASUS-A17-Arch systemd[1]: nvidia-powerd.service: Failed with result 'exit-code'.
Jun 30 17:40:13 ASUS-A17-Arch systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
Jun 30 17:40:13 ASUS-A17-Arch systemd[1]: Failed to start NVIDIA Persistence Daemon.
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: [10de:1f9d] type 00 class 0x030000 PCIe Legacy Endpoint
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x00ffffff]
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: BAR 1 [mem 0x00000000-0x0fffffff 64bit pref]
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: BAR 3 [mem 0x00000000-0x01ffffff 64bit pref]
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: BAR 5 [io  0x0000-0x007f]
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: ROM [mem 0x00000000-0x0007ffff pref]
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: 63.008 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x8 link at 0000:00:01.1 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: Adding to iommu group 13
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: vgaarb: bridge control possible
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
Jun 30 17:40:13 ASUS-A17-Arch kernel: pci 0000:01:00.1: [10de:10fa] type 00 class 0x040300 PCIe Endpoint
…
Jun 30 17:40:14 ASUS-A17-Arch asusd[766]: [INFO  asusd]        daemon v6.1.12

17:40:05: boot starts
17:40:10: in the initramfs the kernel module is loaded (explicitly?) but the GPU doesn't respond
17:40:11: you're leaving the initramfs
17:40:13: after some nvidia userspace failures, the GPU finally shows up on the bus - 8s into the boot
17:40:14: asusd starts only now
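The timeline above can be reproduced from a saved journal. The helper below is a hypothetical sketch (gpu_bus_delay is a made-up name, not an existing tool): it assumes the default journalctl timestamps of the form "Mon DD HH:MM:SS host ..." within a single day, takes the kernel's "Linux version" banner as the boot start, and reports how many seconds later the first PCI device with NVIDIA's vendor ID (10de) is enumerated.

```shell
# Hypothetical helper: read a journal dump on stdin and print how many seconds
# passed between the kernel's "Linux version" banner and the first PCI
# enumeration of an NVIDIA device (vendor ID 10de).
# Assumes "Mon DD HH:MM:SS host ..." timestamps that stay within one day.
# Exits 1 if either line is missing.
gpu_bus_delay() {
    awk '
        function secs(t,    a) { split(t, a, ":"); return a[1] * 3600 + a[2] * 60 + a[3] }
        /kernel: Linux version/ && start == "" { start = secs($3) }
        /kernel: pci [0-9a-f:.]+: \[10de:/ && start != "" {
            print secs($3) - start
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    '
}
```

Usage: journalctl -b | gpu_bus_delay. On the excerpt quoted above (banner at 17:40:05, enumeration at 17:40:13) it prints 8.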

acpi_osi=Ubuntu

Why?

Is there a parallel os? Ubuntu? Windows?

Jun 30 17:40:09 archlinux systemd-modules-load[186]: Inserted module 'amdgpu'
Jun 30 17:40:09 archlinux kernel: nvidia: loading out-of-tree module taints kernel.

Don't explicitly load these modules w/ systemd-modules-load, but the nvidia GPU showing up really late on the bus is probably an issue - can you configure anything about the graphics in the BIOS/UEFI?

#5 2025-07-01 09:31:59

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

seth wrote:
acpi_osi=Ubuntu

Why?

That was something I did a long, long time ago. I can remove it, but it doesn't do anything. It dates from when I was tinkering desperately to fix an issue I had overlooked, and I noticed that my BIOS for some reason responded to acpi_osi=Ubuntu. It doesn't do anything; I just haven't removed it yet, that's all.

seth wrote:

Is there a parallel os? Ubuntu? Windows?

No. I single-boot Arch with two kernels. No windows or other OS.


seth wrote:

Don't explicitly load these modules w/ systemd-modules-load but the nvidia GPU showing up really late on the bus is probably an issue - can you configure anything about the graphics in the BIOS/UEFI?

I don't know why systemd-modules-load is doing that; I only have the amdgpu module in my initramfs. The only thing I have in /etc/modules-load.d is i2c-dev. As for the BIOS/UEFI: no, there are no graphics options in there, and no settings related to GPUs or power saving. I did have fast boot on, so I turned it off, but nothing changed; the issue is still happening.

I'm going to test booting via EFISTUB, in case something is going on between my UEFI and systemd-boot. Extremely unlikely, but worth a try I guess. If I don't reply within 5 minutes, it did nothing.

EDIT: It did nothing. I also tried tsc=unstable and removed that useless kernel parameter. That didn't fix it either, but at least there's less junk on the command line. I looked in /etc/modules-load.d/ and /usr/lib/modules-load.d/ for whatever is loading amdgpu, but there is nothing.

ls /usr/lib/modules-load.d/
bluez.conf  ddcutil.conf  nvidia-utils.conf
ls /etc/modules-load.d/
gnutls.conf  i2c-dev.conf

Last edited by Sync Shard (2025-07-01 09:58:06)


#6 2025-07-01 12:11:24

seth
Member
Registered: 2012-09-03
Posts: 66,225

Re: [SOLVED] New issues with NVIDIA 575.64

Only boot into the multi-user.target (2nd link below) and then check

lsmod | grep nvidia

If it's loaded, test 

nvidia-smi

otherwise get a cup of coffee and re-check 

lsmod | grep nvidia

If it's still not there run (and likely fail)

nvidia-smi

and then see whether this results in the nvidia-module being loaded - otherwise just to see whether nvidia-smi works after the delay.
Also post your mkinitcpio.conf and

lsinitcpio /boot/initramfs-linux.img | grep nvidi

(chances are the kms hook might have added it?)

None of this explains why the nvidia GPU appears on the bus with an 8s delay… hmm

#7 2025-07-01 12:42:14

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

I checked with multi-user.target, even though I already boot into a TTY (ly DM), and yes,

# lsmod | grep nvidia 
nvidia_uvm           4005888  8
nvidia_drm            143360  8
nvidia_modeset       2174976  4 nvidia_drm
nvidia              12951552  68 nvidia_uvm,nvidia_modeset
drm_ttm_helper         16384  3 amdgpu,nvidia_drm
video                  81920  4 asus_wmi,amdgpu,asus_nb_wmi,nvidia_modeset

shows all the nvidia modules, and they are fully active the moment I get into the TTY. The NVIDIA modules always load, just very late in the boot and not before init (when they should be loading).

The initramfs also has all the modules I would think I need, though you could check it against your own if you have an NVIDIA GPU:

# lsinitcpio /boot/EFI/Linux/arch-linux.efi | grep nvidia

usr/lib/modules/6.15.4-arch2-1/extramodules/nvidia-drm.ko.zst
usr/lib/modules/6.15.4-arch2-1/extramodules/nvidia-modeset.ko.zst
usr/lib/modules/6.15.4-arch2-1/extramodules/nvidia-uvm.ko.zst
usr/lib/modules/6.15.4-arch2-1/extramodules/nvidia.ko.zst
usr/lib/modules/6.15.4-arch2-1/kernel/drivers/hid/hid-nvidia-shield.ko.zst
etc/modprobe.d/nvidia.conf
usr/lib/firmware/nvidia/
usr/lib/firmware/nvidia/575.64/
usr/lib/firmware/nvidia/575.64/gsp_ga10x.bin
usr/lib/firmware/nvidia/575.64/gsp_tu10x.bin
usr/lib/modprobe.d/nvidia-sleep.conf
usr/lib/modprobe.d/nvidia-utils.conf

The only hint I have is that the sub-versions of my nvidia packages are slightly off. But I've heard that only the major (###.) and minor (.##) versions matter, not the part after the dash. My system has been fully updated with -Syu; I've never done a partial update.

# pacman -Q | grep nvidia

lib32-nvidia-utils 575.64-2
lib32-opencl-nvidia 575.64-2
libva-nvidia-driver 0.0.14-1
linux-firmware-nvidia 20250627-1
nvidia-lts 1:575.64-3
nvidia-open 575.64-5
nvidia-prime 1.0-5
nvidia-settings 575.64-1
nvidia-utils 575.64-1
opencl-nvidia 575.64-1

and mismatched sub-versions like these have never caused problems for me before.


Every uncommented line of my mkinitcpio.conf (I still have the default commented-out sections, so I'll only post the relevant lines):

MODULES=(amdgpu nvidia nvidia_modeset nvidia_uvm nvidia_drm nvme r8169 ext4 usbhid xhci_hcd usb_storage)
BINARIES=()
FILES=()
HOOKS=(base systemd keyboard plymouth autodetect microcode modconf sd-vconsole block filesystems fsck)

then in /etc/mkinitcpio.conf.d/asus.conf:

MODULES+=(hid_asus asus_wmi asus_nb_wmi)

and if it matters at all, all of my modprobe.d files:

blacklist.conf:
blacklist nouveau
options nouveau modeset=0

nvidia.conf:
options nvidia_drm modeset=1 fbdev=1
options nvidia NVreg_UsePageAttributeTable=1

supergfxd.conf:
# Automatically generated by supergfxd
blacklist nouveau
alias nouveau off

I'll probably remove the supergfxd file, since I already blacklist nouveau in blacklist.conf.


#8 2025-07-01 12:48:53

seth
Member
Registered: 2012-09-03
Posts: 66,225

Re: [SOLVED] New issues with NVIDIA 575.64

The initramfs also does have all the modules that I would think I need

The plan would be to move the nvidia stuff out of the initramfs, where

MODULES=(amdgpu nvidia nvidia_modeset nvidia_uvm nvidia_drm nvme r8169 ext4 usbhid xhci_hcd usb_storage)

they're explicitly added
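One way to apply this is to drop every nvidia* entry from the MODULES array and rebuild the images. The awk filter below is a hypothetical sketch (drop_nvidia_modules is not an existing tool): it rewrites only a line starting with MODULES=( and passes everything else through, including MODULES+= lines in mkinitcpio.conf.d drop-ins.

```shell
# Hypothetical helper: print mkinitcpio.conf with nvidia* modules removed
# from the MODULES=(...) line; all other lines pass through unchanged.
drop_nvidia_modules() {
    awk '
        /^MODULES=\(/ {
            line = $0
            sub(/^MODULES=\(/, "", line)   # strip the opening "MODULES=("
            sub(/\).*$/, "", line)         # strip ")" and anything after it
            n = split(line, mods, /[ \t]+/)
            out = ""
            for (i = 1; i <= n; i++)
                if (mods[i] != "" && mods[i] !~ /^nvidia/)
                    out = out (out == "" ? "" : " ") mods[i]
            print "MODULES=(" out ")"
            next
        }
        { print }
    '
}
```

Usage: drop_nvidia_modules < /etc/mkinitcpio.conf > /tmp/mkinitcpio.conf.new, review the result, copy it into place, then regenerate all presets with mkinitcpio -P.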

the subversions of my nvidia drivers are slightly off

The thing behind the final dash is the build number; it's normal for it to vary a bit if one package got rebuilt (for whatever reason, maybe to pick up a patch) but the other didn't.

The core issue remains that the nvidia GPU shows up so late on the bus; we can only check whether this remains a problem once the nvidia module no longer gets loaded prematurely.
And then you might have to artificially delay the boot though I almost suspect that you're somehow™ causing the GPU to show up when starting the display server.

#9 2025-07-01 13:15:38

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

seth wrote:

And then you might have to artificially delay the boot though I almost suspect that you're somehow™ causing the GPU to show up when starting the display server.

I don't even boot into a display server, that's the weird part. I boot directly into a TTY or ly DM (a TUI DM), immediately check nvidia-smi, and it's fully started and running.

I can also see that during boot, about 1-2 seconds before it finishes booting (i.e. when Plymouth ends), my screen backlight flickers to full brightness right as the nvidia module kicks in. I can see (as I don't use the quiet splash parameters) that the nvidia module starts before any DM is starting.

The reason I even updated this month is that on driver 570.153.02-2 I had some weird suspend issues I'd never had before: every now and then my laptop would completely freeze when waking from suspend. Previous versions didn't do that. Spamming Ctrl + Alt + Delete doesn't work, REISUB does nothing, etc. It happened twice in a week, so I thought "maybe it's just a weird bug", updated to 575.64, and all of this started after I rebooted.

I suppose I can just keep using mainline, rarely use LTS, and hope that if this is indeed a regression it will be fixed eventually; I can bear with my PC occasionally not waking up. It just needs a hard reboot and it's fine again. The module always fully loads before booting is done, just very late.

It could also be an asusd or ROG Control Center regression that messes with the ACPI bus or something. I'd hope not, but it might be. If I'm an outlier and no one else has had this issue, then it's probably something weird and brand new that I missed.

I could also try DKMS again and see if compiling the modules makes it work, but the last time I tried that, the OOM killer killed the update itself over and over and gave me a massive scare three separate times. That was with only 2 threads set in /etc/makepkg.conf and no parallel builds in the DKMS config, so I have come to despise DKMS, hmm.


#10 2025-07-01 13:54:26

seth
Member
Registered: 2012-09-03
Posts: 66,225

Re: [SOLVED] New issues with NVIDIA 575.64

that the nvidia module starts before any DM is starting.

and *after* the GPU shows up on the bus?
The errors in your OP are exclusively because the nvidia modules get explicitly loaded by

archlinux systemd-modules-load[182]: Failed to insert module 'nvidia': No such device

before the GPU even shows up *at all*, i.e. before you could see it in lspci.
Once the GPU finally shows up, everything actually works as expected.

The GPU being late is very much a HW/firmware thing - there's nothing the OS could do about that.
What you can do at the OS level is
1. preventing the modules from being explicitly loaded (by not doing that and I assume their presence in the modules hook is what triggers that)
2. stalling any GUI until the GPU finally shows up (either manually, with enough™ sleep, or by polling the bus/nvidia-smi until it's there)
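Option 2, polling the bus, could be sketched as the helper below. wait_for_path is a hypothetical name, not an existing tool; 0000:01:00.0 is the dGPU's PCI address from the logs in this thread, and wiring the call into a display manager (for example through an ExecStartPre= line in a systemd drop-in) is an assumption, not something tested here.

```shell
# Hypothetical helper: poll until a path exists (e.g. the dGPU's sysfs node),
# giving up after TIMEOUT seconds (default 30).
# Returns 0 if the path appeared, 1 on timeout.
wait_for_path() {
    local path=$1 timeout=${2:-30}
    local deadline=$(( $(date +%s) + timeout ))
    until [ -e "$path" ]; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep 0.5
    done
    return 0
}
# Usage (PCI address taken from the logs above):
#   wait_for_path /sys/bus/pci/devices/0000:01:00.0 30 && nvidia-smi
```

Polling a sysfs path avoids depending on the nvidia module having loaded yet; a fixed sleep would be simpler but either wastes time or is too short.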

#11 2025-07-02 10:26:51

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

seth wrote:

and *after* the GPU shows up on the bus?

After the NVIDIA module finally loads, the boot continues and finishes; then Plymouth ends and the DM and everything else starts. The boot actively waits for NVIDIA to load due to systemd-modules-load, even though nothing is telling it to load NVIDIA; I've already shared the config files for systemd-modules-load.

I've realized that it isn't worse with the LTS kernel; it's present on both. If I play any Steam game that uses the NVIDIA GPU, then quit and try to suspend or hibernate, it won't let me: I get the same suspend error as in the LTS kernel log I provided, and the same for hibernation. It fully prevents me from suspending or hibernating _even with mainline._ At this point I'm going to bear with it, hoping the next NVIDIA update fixes this, and if it doesn't, I will roll back to 570.153, where I know these issues were not happening at all.

But what I know so far is that it's affecting both the proprietary (nvidia-lts) and open (nvidia-open) drivers, and only started when I upgraded to nvidia 575.64 and fixed the linux-firmware package shenanigans exactly as described in the news.

If I were to guess, this is more likely to be a linux-firmware-nvidia issue (if it's a package issue at all, which seems unlikely)? It feels deeper than the driver itself.

I'll provide my last boot log with the failed suspend and failed hibernate next time I get on my PC.

Edit: Another wild guess is that mkinitcpio is forgetting to include a module from linux-firmware-nvidia. If that were the case, though, I'd expect the GPU not to be recognized or start up at all... unless that happens somewhere post-init, which might explain it? I'm grasping at straws at this point. Feel free to laugh if this gets solved and I'm wrong.

EDIT 2:
Last boot log; the suspend attempt is at 05:42:29 and the hibernate attempt at 05:43:50.
http://0x0.st/80BU.txt

Last edited by Sync Shard (2025-07-02 10:55:29)


#12 2025-07-02 12:41:57

seth
Member
Registered: 2012-09-03
Posts: 66,225

Re: [SOLVED] New issues with NVIDIA 575.64

I've already shared the config files for systemd-modules-load.

The ones from your initramfs?

archlinux systemd-modules-load[182]: Failed to insert module 'nvidia': No such device

systemd-modules-load *did* (does?) try to load the module - that's a non-negotiable fact.

mkdir /tmp/myinitramfs
cd /tmp/myinitramfs
lsinitcpio -x /boot/initramfs-linux-lts.img # I guess the lts one is still relevant?
ls -lR | curl -F 'file=@-' 0x0.st
tail -n 1000 etc/modules-load.d/*

If I were to guess, this is more likely to be a linux-firmware-nvidia issue possibly

Explicitly loading the modules tooearly™ is a local configuration issue.
The device showing up on the bus only 8s into the boot is a hardware/firmware (as in "uefi", not the nvidia firmware) issue.
I suppose we cannot do anything about the latter (except if there's a mitigating BIOS update) so we'll have to work around that.

mkinitcpio is forgetting to include a module from linux-firmware-nvidia

specifically starting by NOT adding nvidia to the initramfs itfp.

#13 2025-07-02 12:59:01

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

seth wrote:

systemd-modules-load *did* (does?) try to load the module - that's a non-negotiable fact.

Yep. But nothing in my /etc/modules-load.d/ warrants it doing that, not even in /usr/lib/modules-load.d/.
Output of tail -n 1000 /etc/modules-load.d/*:

==> /etc/modules-load.d/gnutls.conf <==
#tls

==> /etc/modules-load.d/i2c-dev.conf <==
i2c-dev

It's basically the same for /usr/lib/modules-load.d: nothing in there tells it to load NVIDIA; I already checked. I haven't had a BIOS/UEFI update or any firmware change since I got my laptop, so the GPU suddenly not appearing on the bus until 8 seconds in, right after updating from NVIDIA 570.153 to 575.64, _REALLY_ feels like an update issue and not my BIOS, unless they changed how UEFI and ACPI stuff is handled. Possible, but I doubt they'd break something this badly just by changing that.

seth wrote:

specifically starting by NOT adding nvidia to the initramfs itfp.

And I already have all of the modules needed, is there a new module I need to add apart from "nvidia nvidia_modeset nvidia_uvm nvidia_drm"? I would assume not, and if not, I already have it all in my mkinitcpio as I already have shared.

Here is my extracted initramfs:
https://0x0.st/80uo.txt


#14 2025-07-02 13:04:15

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 13,990

Re: [SOLVED] New issues with NVIDIA 575.64

Sync Shard wrote:
seth wrote:

specifically starting by NOT adding nvidia to the initramfs itfp.

And I already have all of the modules needed, is there a new module I need to add apart from "nvidia nvidia_modeset nvidia_uvm nvidia_drm"? I would assume not, and if not, I already have it all in my mkinitcpio as I already have shared.

I think seth wants you to remove the nvidia module from that line.


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.

clean chroot building not flexible enough ?
Try clean chroot manager by graysky

#15 2025-07-02 13:07:10

seth
Member
Registered: 2012-09-03
Posts: 66,225

Re: [SOLVED] New issues with NVIDIA 575.64

"Desperately"

Also what's in ./etc/modules-load.d/MODULES.conf - the one in from the extracted initramfs! NOT!

Output of "tail -n 1000 /etc/modules-load.d/*:"

We don't care about the files on your root partition at this point.
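With the image extracted via lsinitcpio -x, this boils down to looking for nvidia entries in the initramfs's own modules-load.d. A hypothetical sketch (initramfs_autoload_nvidia is an illustrative name), assuming it is given the extraction directory, or run from inside it:

```shell
# Hypothetical helper: list nvidia* entries in the modules-load.d files of an
# extracted initramfs (default: current directory). Exit status follows grep:
# 0 = entries found, 1 = none found, 2 = no readable .conf files.
initramfs_autoload_nvidia() {
    local root=${1:-.}
    grep -Hn '^nvidia' "$root"/etc/modules-load.d/*.conf 2>/dev/null
}
```

Any hit means systemd-modules-load inside the initramfs will try to insert that module, whether or not the GPU is on the bus yet.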

#16 2025-07-02 13:25:29

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

Lone_Wolf wrote:

I think seth wants you to remove the nvidia module from that line

Ah, I see. I went ahead and did that and it booted fine, with fewer errors; however, it did still fail once with the no GPU found error, but it retried and worked fine. Shall I provide the journal?

seth wrote:

Also what's in ./etc/modules-load.d/MODULES.conf - the one in from the extracted initramfs!

Whoops, sorry, misunderstanding. The initramfs only has the modules from my root's /etc/mkinitcpio.conf:

amdgpu
r8169
usbhid
xhci_hcd

That's what's in the initramfs without NVIDIA (LTS without the nvidia modules in initramfs),
and with NVIDIA modules:

amdgpu
nvidia
nvidia_modeset
nvidia_uvm
nvidia_drm
r8169
usbhid
xhci_hcd
hid_asus
asus_wmi
asus_nb_wmi

So they're there. I suppose if they were not there, then it would not be loading the modules at all after the first failure.


#17 2025-07-02 13:49:24

seth
Member
Registered: 2012-09-03
Posts: 66,225

Re: [SOLVED] New issues with NVIDIA 575.64

I'd assume adding them to the MODULES array gets you the entry in MODULES.conf

it did still fail once with the no GPU found error, but it retried and worked fine. Shall I provide the journal?

Yes please.
But because the nvidia GPU shows up so late, you'll have to actively delay any GUI target that shall make use of it.
I've only ever seen this w/ eGPUs, where it's rather normal and expected - the GPU might take a second to actually initialize, but not even appearing on the bus for several seconds is wild.

#18 2025-07-02 14:04:17

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

seth wrote:

But because the nvidia GPU shows up so late, you'll have to actively delay any GUI target that shall make use of it.

Since the module is active by the time I've booted, I don't really need to wait or delay anything. It's all active the moment it's done booting. The errors are still there and pretty scary though, and I've never had an issue even close to this before 575.64 and the linux-firmware package changes. I also think the errors slow down booting, but that's a small issue.

The boot errors and the inability to suspend/hibernate are probably related. I've never had suspend just actively fail and kick my PC out of it. Before, it either didn't wake up (rarely) or didn't go to sleep because it was busy with something (e.g. a Timeshift backup).

https://0x0.st/80u9.txt
Journal with no nvidia in the initramfs. I noticed the log says it's loading the NVIDIA module, showing the kernel taint warning, and then quite a bit later it finally says "No NVIDIA GPU found". A bit odd, but it does eventually still load.


#19 2025-07-02 14:59:17

seth
Member
Registered: 2012-09-03
Posts: 66,225

Re: [SOLVED] New issues with NVIDIA 575.64

NVreg_PreserveVideoMemoryAllocations=1

That should be

nvidia.NVreg_PreserveVideoMemoryAllocations=1

but is likely also default anyway.
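To confirm which value the loaded driver actually uses, one option is to read the driver's parameter dump. The sketch below assumes the driver exposes /proc/driver/nvidia/params as "Name: value" lines (with the NVreg_ prefix dropped); nv_param is a hypothetical helper name, and the file path can be overridden.

```shell
# Hypothetical helper: print the value of one NVIDIA driver parameter.
# Assumes "Name: value" lines, as in /proc/driver/nvidia/params.
# Returns 1 if the parameter is not listed.
nv_param() {
    local key=$1 file=${2:-/proc/driver/nvidia/params}
    awk -F': *' -v k="$key" '
        $1 == k { print $2; found = 1; exit }
        END { exit !found }
    ' "$file"
}
```

Usage: nv_param PreserveVideoMemoryAllocations should print 1 once the option has taken effect, whichever way it was set.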

Highlights:

Jul 02 09:07:51 ASUS-A17-Arch systemd[1]: Finished Remount Root and Kernel File Systems.
…
Jul 02 09:07:51 ASUS-A17-Arch kernel: nvidia: loading out-of-tree module taints kernel.
Jul 02 09:07:52 ASUS-A17-Arch kernel: NVRM: No NVIDIA GPU found.
…
Jul 02 09:07:52 ASUS-A17-Arch (udev-worker)[457]: asus-nb-wmi: Process 'systemctl restart asusd.service' failed with exit code 1.
Jul 02 09:07:52 ASUS-A17-Arch kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 234
Jul 02 09:07:52 ASUS-A17-Arch systemd-modules-load[370]: Failed to insert module 'nvidia_uvm': No such device
…
Jul 02 09:07:53 ASUS-A17-Arch nvidia-persistenced[608]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 143 has read and write permissions for those files.
Jul 02 09:07:53 ASUS-A17-Arch nvidia-persistenced[603]: nvidia-persistenced failed to initialize. Check syslog for more details.
Jul 02 09:07:53 ASUS-A17-Arch nvidia-persistenced[608]: Shutdown (608)
Jul 02 09:07:53 ASUS-A17-Arch systemd[1]: nvidia-persistenced.service: Control process exited, code=exited, status=1/FAILURE
Jul 02 09:07:53 ASUS-A17-Arch kernel: pcieport 0000:00:01.1: pciehp: Slot(0): Card present
Jul 02 09:07:53 ASUS-A17-Arch kernel: pcieport 0000:00:01.1: pciehp: Slot(0): Link Up
Jul 02 09:07:53 ASUS-A17-Arch systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
Jul 02 09:07:53 ASUS-A17-Arch systemd[1]: Failed to start NVIDIA Persistence Daemon.
Jul 02 09:07:53 ASUS-A17-Arch kernel: pci 0000:01:00.0: [10de:1f9d] type 00 class 0x030000 PCIe Legacy Endpoint
Jul 02 09:07:53 ASUS-A17-Arch kernel: pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x00ffffff]
Jul 02 09:07:53 ASUS-A17-Arch kernel: pci 0000:01:00.0: BAR 1 [mem 0x00000000-0x0fffffff 64bit pref]
Jul 02 09:07:53 ASUS-A17-Arch kernel: pci 0000:01:00.0: BAR 3 [mem 0x00000000-0x01ffffff 64bit pref]
Jul 02 09:07:53 ASUS-A17-Arch kernel: pci 0000:01:00.0: BAR 5 [io  0x0000-0x007f]

I suppose what's happening is that a udev rule for asus-nb-wmi tries to start asusd.service and that tries to load the nvidia module.
Later on, nvidia-persistenced starts and fails, and interestingly *that* is also when the GPU shows up on the bus (as the very next event!)
=> try to remove asusd?
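Which events immediately precede the dGPU's appearance can be checked mechanically by printing the journal lines right before its first enumeration. A hypothetical sketch (events_before_gpu is an illustrative name), assuming the same "kernel: pci ...: [10de:" line format as in the excerpt above:

```shell
# Hypothetical helper: read a journal dump on stdin and print the N lines
# (default 5) preceding the first NVIDIA (vendor 10de) PCI enumeration,
# followed by that line itself. Exits 1 if no such line is found.
events_before_gpu() {
    local n=${1:-5}
    grep -m 1 -B "$n" -E 'kernel: pci [0-9a-f:.]+: \[10de:'
}
```

Usage: journalctl -b | events_before_gpu 10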

The system still spends 5s in the initramfs, but that's not nvidia related anymore for sure.

There seems no sleeping attempt, but apparently

Jul 02 09:07:47 archlinux systemd[1]: Starting Resume from hibernation...
Jul 02 09:07:47 archlinux kernel: usb 2-2: New USB device found, idVendor=0bc2, idProduct=2322, bcdDevice= 0.00
Jul 02 09:07:47 archlinux kernel: usb 2-2: New USB device strings: Mfr=2, Product=3, SerialNumber=1
Jul 02 09:07:47 archlinux kernel: usb 2-2: Product: Expansion
Jul 02 09:07:47 archlinux kernel: usb 2-2: Manufacturer: Seagate
Jul 02 09:07:47 archlinux kernel: usb 2-2: SerialNumber: NZ0EF48Z
Jul 02 09:07:47 archlinux kernel: usb-storage 4-1:1.0: USB Mass Storage device detected
Jul 02 09:07:47 archlinux kernel: scsi host0: usb-storage 4-1:1.0
Jul 02 09:07:47 archlinux kernel: usbcore: registered new interface driver usb-storage
Jul 02 09:07:47 archlinux kernel: input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input4
Jul 02 09:07:47 archlinux systemd[1]: systemd-hibernate-resume.service: Deactivated successfully.
Jul 02 09:07:47 archlinux systemd[1]: Finished Resume from hibernation.
Jul 02 09:07:47 archlinux kernel: PM: Image not found (code -22)
…
Jul 02 09:07:51 ASUS-A17-Arch systemd[1]: Clear Stale Hibernate Storage Info was skipped because of an unmet condition check (ConditionPathExists=/sys/firmware/efi/efivars/HibernateLocation-8cf2644b-4b0b-428f-9387-6d876050dc67).
Jul 02 09:07:52 ASUS-A17-Arch systemd[1]: Clear Stale Hibernate Storage Info was skipped because of an unmet condition check (ConditionPathExists=/sys/firmware/efi/efivars/HibernateLocation-8cf2644b-4b0b-428f-9387-6d876050dc67).
Jul 02 09:07:52 ASUS-A17-Arch systemd[1]: Clear Stale Hibernate Storage Info was skipped because of an unmet condition check (ConditionPathExists=/sys/firmware/efi/efivars/HibernateLocation-8cf2644b-4b0b-428f-9387-6d876050dc67).

Did you wake from hibernation here?

Last edited by seth (2025-07-02 14:59:38)


#20 2025-07-02 15:33:05

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

seth wrote:
NVreg_PreserveVideoMemoryAllocations=1

That should be

nvidia.NVreg_PreserveVideoMemoryAllocations=1

but is likely also default anyway.

I think you're right. The kernel message about NVreg_PreserveVideoMemoryAllocations=1 no longer appears when I prefix it with "nvidia.", so thanks.
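To confirm which value the loaded module actually picked up (rather than relying on a kernel message disappearing), a small shell sketch like the one below could read it back. It assumes /proc/driver/nvidia/params exists while the module is loaded and lists options as "Name: value" lines without the NVreg_ prefix; the function name nv_param is made up for illustration. The same option can also be set through modprobe.d as `options nvidia NVreg_PreserveVideoMemoryAllocations=1` instead of on the kernel command line.

```shell
#!/usr/bin/env bash
# Sketch: print the value the loaded nvidia module reports for one option.
# Assumption: /proc/driver/nvidia/params holds "Name: value" lines without
# the NVreg_ prefix; it only exists while the nvidia module is loaded.
nv_param() {
    local name=${1#NVreg_}                          # accept NVreg_Foo or Foo
    local params=${2:-/proc/driver/nvidia/params}   # 2nd arg: alternate file
    if [ ! -r "$params" ]; then
        echo "cannot read $params (nvidia module not loaded?)" >&2
        return 1
    fi
    # Print the value of the matching line; exit 1 if the option is absent.
    awk -F': *' -v key="$name" '$1 == key { print $2; found = 1 } END { exit !found }' "$params"
}

# Usage:
#   nv_param NVreg_PreserveVideoMemoryAllocations   # 1 means preservation is on
```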

seth wrote:

=> try to remove asusd?

I went ahead and removed it, since I had a feeling that was it once you mentioned it. Sadly, there is no change whatsoever. Looking at the journal again, I saw that every single time NVIDIA tries to load, systemd-modules-load loading amdgpu comes right behind it. So I tried my LTS kernel, removing amdgpu from the mkinitcpio config and regenerating only the LTS initramfs, since it couldn't hurt to try. No change either.

seth wrote:

There seems to be no sleep attempt, but apparently

That's just because I have the resume=<UUID> kernel parameter pointing at my swap partition, so it checks for a hibernation image on every boot. But no, I didn't hibernate or suspend.

In this earlier log, though, I did try to suspend and hibernate. Resending it in case the errors were missed:
https://0x0.st/80BU.txt
The errors are near the end of the log, where I tried to suspend and then hibernate.


"What is the cost of lies?"


#21 2025-07-02 18:24:32

seth
Member
Registered: 2012-09-03
Posts: 66,225

Re: [SOLVED] New issues with NVIDIA 575.64

Sleep fails because

Jul 02 05:42:30 ASUS-A17-Arch suspend[642807]: nvidia-suspend.service
Jul 02 05:42:30 ASUS-A17-Arch logger[642807]: <13>Jul  2 05:42:30 suspend: nvidia-suspend.service
Jul 02 05:42:30 ASUS-A17-Arch kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1353
Jul 02 05:42:30 ASUS-A17-Arch kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from status @ kernel_gsp.c:4615
Jul 02 05:42:30 ASUS-A17-Arch kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from kgspCreateRadix3(pGpu, pKernelGsp, &pKernelGsp->pSRRadix3Descriptor, NULL, NULL, gspfwSRMeta.sizeOfSuspendResumeData) @ kernel_gsp_tu102.c:1303
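When a log like this gets long, a rough filter such as the hypothetical nvrm_summary below can group the NVRM check/assertion failures by error code and source location. The patterns are inferred only from the lines quoted above, so treat it as a sketch rather than a parser for every NVRM message.

```shell
#!/usr/bin/env bash
# Sketch: condense NVRM check/assertion failures into "count error location"
# rows. Reads journal text on stdin; the regexes are based only on the lines
# quoted above and may miss other NVRM message formats.
nvrm_summary() {
    grep 'NVRM: nv.*Failed' |
        sed -n 's/.*\[\(NV_ERR_[A-Z_]*\)].* @ \([^ ]*\)$/\1 \2/p' |
        sort | uniq -c | sort -rn
}

# Usage (kernel messages from the previous boot):
#   journalctl -b -1 -k | nvrm_summary
```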

https://wiki.archlinux.org/title/NVIDIA … er_suspend => move nvidia.NVreg_TemporaryFilePath away from any tmpfs ?
If you're not running the session on the nvidia GPU and don't sleep the system while getting fragged on superturboturkeypuncher³, you can also just disable the VRAM preservation.
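Both choices can be expressed as module options. The sketch below only prints modprobe.d lines; the function name, the mode names and the target file name in the usage comment are hypothetical, and the directory passed for NVreg_TemporaryFilePath is assumed to be disk-backed with room for the whole VRAM.

```shell
#!/usr/bin/env bash
# Sketch: print modprobe.d option lines for the two choices discussed above.
# "preserve" keeps VRAM preservation on and points the save file at DIR
# (assumed disk-backed, with room for the whole VRAM); "off" disables it.
nv_sleep_options() {
    local mode=$1 dir=${2:-/var/tmp}
    case $mode in
        preserve)
            printf 'options nvidia NVreg_PreserveVideoMemoryAllocations=1\n'
            printf 'options nvidia NVreg_TemporaryFilePath=%s\n' "$dir"
            ;;
        off)
            printf 'options nvidia NVreg_PreserveVideoMemoryAllocations=0\n'
            ;;
        *)
            echo "usage: nv_sleep_options preserve [DIR] | off" >&2
            return 2
            ;;
    esac
}

# Usage (hypothetical file name; regenerate the initramfs with
# `mkinitcpio -P` if nvidia is loaded from it, then reboot):
#   nv_sleep_options preserve /var/tmp | sudo tee /etc/modprobe.d/nvidia-vram.conf
```

If the option is also set on the kernel command line (as nvidia.NVreg_PreserveVideoMemoryAllocations=1), it is probably simplest to keep it in only one of the two places.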


#22 2025-07-03 08:47:20

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

seth wrote:

move nvidia.NVreg_TemporaryFilePath away from any tmpfs ?

It's not on a tmpfs for me at the moment. The default is /var/tmp/, which, as far as I know, isn't a tmpfs. I also checked with "mount | grep tmpfs" and "mount | grep /var/tmp/", and /var/tmp showed up in neither, so it's not on a tmpfs.
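Grepping the mount table for /var/tmp only matches if /var/tmp is itself a mount point; a plain directory on the root filesystem never appears there either way. Asking for the filesystem the path actually lives on is more direct. A minimal sketch using GNU coreutils `stat -f` (the function names are illustrative; `findmnt --target /var/tmp` is another way to get the same answer):

```shell
#!/usr/bin/env bash
# Sketch: report which filesystem type a path lives on. Unlike grepping the
# mount table for the path itself, this also works when the path is a plain
# directory on a parent mount. GNU stat reports ext4 as "ext2/ext3".
fs_type_of() {
    stat -f -c %T -- "$1"
}

warn_if_tmpfs() {
    local path=$1 fstype
    fstype=$(fs_type_of "$path") || return 1
    if [ "$fstype" = tmpfs ]; then
        echo "$path is on tmpfs (RAM-backed)"
        return 1
    fi
    echo "$path is on $fstype"
}

# Usage:
#   warn_if_tmpfs /var/tmp
```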

seth wrote:

If you're not running the session on the nvidia GPU and don't sleep the system while getting fragged on superturboturkeypuncher³, you can also just disable the VRAM preservation.

I definitely run my WM on the NVIDIA GPU, and I can suspend the system completely fine as long as I have NOT played any Steam game, even with Steam open.

The moment I play a Steam game, though? Even if I close it and then suspend (which I already do)? Welp, it doesn't let me. Even if I close Steam itself, it errors every time with a kernel warning in "nv.c" and fails to suspend/hibernate. Next time I might try fully closing my WM and suspending from a TTY to see if anything changes, because at that point not a single thing is running on the NVIDIA GPU.




#23 2025-07-03 12:08:27

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

Small update: suspending does work if I fully exit to the TTY and make sure *nothing* is running on the GPU. If I try to suspend after playing a game without doing that, it fails. So at least I have a way to suspend/hibernate now, even if it's wacky and makes no sense: once I've played a game, I can't suspend until the next boot unless I make sure the GPU isn't rendering a single thing. Hmm.
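To check what still has the GPU open before suspending, `sudo fuser -v /dev/nvidia*` (from psmisc) is one option. The sketch below does roughly the same by walking /proc/<pid>/fd, which needs root to see other users' processes; the function name gpu_users and its optional proc-root argument (there only so it can be tested) are made up.

```shell
#!/usr/bin/env bash
# Sketch: list processes that hold an NVIDIA device node open, similar to
# `fuser -v /dev/nvidia*`, by reading the /proc/<pid>/fd symlinks.
# Optional argument: an alternative proc root (defaults to /proc).
gpu_users() {
    local proc=${1:-/proc} piddir fd target pid
    for piddir in "$proc"/[0-9]*; do
        [ -d "$piddir/fd" ] || continue          # skip if unreadable/missing
        pid=${piddir##*/}
        for fd in "$piddir"/fd/*; do
            target=$(readlink -- "$fd" 2>/dev/null) || continue
            case $target in
                /dev/nvidia*)
                    # One line per process: pid, command name, device node.
                    printf '%s %s %s\n' "$pid" "$(cat "$piddir/comm" 2>/dev/null)" "$target"
                    break
                    ;;
            esac
        done
    done
}

# Usage:
#   sudo bash -c "$(declare -f gpu_users); gpu_users"
```

Empty output means nothing holds the NVIDIA device nodes open, which seems to correspond to the state in which suspend works here.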




#24 2025-07-03 17:03:41

seth
Member
Registered: 2012-09-03
Posts: 66,225

Re: [SOLVED] New issues with NVIDIA 575.64

What does nvidia-smi look like before and after steam breaks the suspend (because of the VRAM preservation)?
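One way to make the before/after comparison concrete is to save nvidia-smi's memory and process sections to files and diff them. A sketch, assuming `nvidia-smi -q -d MEMORY,PIDS`; the gpu_snapshot name and the gpu-<label>.txt file names are arbitrary:

```shell
#!/usr/bin/env bash
# Sketch: save nvidia-smi's memory and process sections to gpu-<label>.txt in
# the current directory so two states can be diffed. The Timestamp line is
# dropped so it does not show up as a difference.
gpu_snapshot() {
    [ -n "$1" ] || { echo "usage: gpu_snapshot LABEL" >&2; return 2; }
    nvidia-smi -q -d MEMORY,PIDS | grep -v '^Timestamp' > "gpu-$1.txt"
}

# Usage:
#   gpu_snapshot before     # e.g. after boot, Steam open, no game played yet
#   gpu_snapshot after      # after playing and quitting a game
#   diff gpu-before.txt gpu-after.txt
```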


#25 2025-07-03 17:21:38

Sync Shard
Member
Registered: 2024-08-17
Posts: 26

Re: [SOLVED] New issues with NVIDIA 575.64

seth wrote:

What does nvidia-smi look like before and after steam breaks the suspend (because of the VRAM preservation)?

It looks completely normal, nothing unusual. To be clear, Steam never broke suspend before; I've suspended many times with Steam open. Only now does it break once a game has used the GPU at all, and it stays broken until the next boot or until I exit to a TTY _and then_ suspend.

But nvidia-smi shows nothing different before and after. The error still happens: the kernel warns about nv.c, a bunch of NVIDIA errors follow, and then suspend fails and the system is kicked back out of it.

Earlier I tried to downgrade to 570.153.02 (my previous version, which I know worked fine) so I could see whether linux-firmware-nvidia was the cause. However, I ran into a problem: mkinitcpio couldn't find the nvidia modules and gave these errors:

[2025-07-03T11:27:03-0400] [ALPM-SCRIPTLET] ==> ERROR: module not found: 'nvidia'
[2025-07-03T11:27:03-0400] [ALPM-SCRIPTLET] ==> ERROR: module not found: 'nvidia_modeset'
[2025-07-03T11:27:03-0400] [ALPM-SCRIPTLET] ==> ERROR: module not found: 'nvidia_uvm'
[2025-07-03T11:27:03-0400] [ALPM-SCRIPTLET] ==> ERROR: module not found: 'nvidia_drm'

even though I downgraded _all_ NVIDIA packages at once, as intended, and double-checked that they were all the exact same version. So I had to upgrade again. I didn't test booting, because frankly I don't think it would have booted. So that's another odd thing: I made doubly sure I was downgrading all of them to a version that worked... and it doesn't work now? The only package I didn't change was linux-firmware, because I would have had to remove all the firmware packages and reinstall the previous generic linux-firmware package, and I wasn't sure that would be required just to downgrade the NVIDIA modules. It wouldn't make much sense if it were required; it's firmware, not the NVIDIA module itself...

Apparently it is? My gut tells me this is a linux-firmware issue now, but I originally thought it was asusd, and that turned out to be wrong. EDIT2: I also checked the package myself. The firmware for my specific TU117 wasn't even updated in the repo this month, so I have no clue now.
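One thing that might be worth checking (an assumption, not something established in this thread): prebuilt, non-DKMS NVIDIA module packages are built against one specific kernel version, so an older module package installed alongside a newer kernel could place its modules under a different /usr/lib/modules/<version> directory than the one mkinitcpio builds for, which would produce "module not found" errors like the ones above. The hypothetical helper below lists which module trees actually contain an nvidia module, for comparison with `uname -r`.

```shell
#!/usr/bin/env bash
# Sketch: print the name of each kernel module tree (default root
# /usr/lib/modules) that contains an nvidia.ko* file. The pattern covers
# compressed modules such as nvidia.ko.zst or nvidia.ko.xz.
nv_module_kernels() {
    local root=${1:-/usr/lib/modules} dir
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        if find "$dir" -name 'nvidia.ko*' -print -quit | grep -q .; then
            basename -- "$dir"
        fi
    done
}

# Usage:
#   nv_module_kernels        # kernel versions that have an nvidia module
#   uname -r                 # the running kernel
#   pacman -Q linux; pacman -Qs nvidia
```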

EDIT: The failed suspend broke my audio: my PC speakers don't show up, and I had to restart my WM to use my headset.

Last edited by Sync Shard (2025-07-03 17:57:59)



