
#126 2024-09-25 12:22:08

obap74
Member
Registered: 2021-03-18
Posts: 92

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

Interesting. I'll try and report back!

Is the systemd-homed.service.d drop-in file override actually required if not using systemd-homed though?

Note that there are also:
- /usr/lib/systemd/system/systemd-hibernate.service.d/10-nvidia-no-freeze-session.conf
- /usr/lib/systemd/system/systemd-hybrid-sleep.service.d/10-nvidia-no-freeze-session.conf
- /usr/lib/systemd/system/systemd-suspend-then-hibernate.service.d/10-nvidia-no-freeze-session.conf
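To check which units on a given machine actually carry one of these NVIDIA drop-ins, `systemctl cat <unit>` prints a unit together with every drop-in that applies to it. As an offline alternative, here is a minimal bash sketch that just walks a unit directory. The function name `list_nvidia_dropins` is made up for illustration, and the default path assumes the vendor location shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped by nvidia-utils): list the units under a
# systemd unit directory whose "<unit>.d/" drop-in directory contains the
# NVIDIA no-freeze-session drop-in. Defaults to the vendor directory.
# The body runs in a subshell so the nullglob setting does not leak out.
list_nvidia_dropins() (
    unitdir="${1:-/usr/lib/systemd/system}"
    # Expand to nothing (not the literal pattern) when nothing matches
    shopt -s nullglob
    for f in "$unitdir"/*.d/10-nvidia-no-freeze-session.conf; do
        f="${f#"$unitdir"/}"        # e.g. systemd-suspend.service.d/10-nvidia-...
        printf '%s\n' "${f%.d/*}"   # drop ".d/<file>" -> systemd-suspend.service
    done
)
```

Run with no argument, it lists the units under /usr/lib/systemd/system that ship the drop-in; `systemctl cat` on each of them then shows the drop-in in context.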

Edit: not working for me unfortunately.  The machine enters suspend state but when resuming, the display remains black. I cannot switch to any TTY.

local/linux 6.10.10.arch1-1
local/linux-headers 6.10.10.arch1-1
local/linux-firmware 20240909.552ed9b8-1
local/nvidia 560.35.03-6
local/nvidia-utils 560.35.03-3
% systemctl list-unit-files | grep nvidia
nvidia-hibernate.service                                                  enabled         disabled
nvidia-persistenced.service                                               disabled        disabled
nvidia-powerd.service                                                     disabled        disabled
nvidia-resume.service                                                     enabled         disabled
nvidia-suspend.service                                                    enabled         disabled
% cat /etc/modprobe.d/nvidia.conf
options nvidia_drm modeset=1
options nvidia_drm fbdev=1

options nvidia \
    NVreg_PreserveVideoMemoryAllocations=1 \
    NVreg_TemporaryFilePath=/var/tmp
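A quick way to confirm that the three units discussed here are enabled is to filter the same `systemctl list-unit-files` output. The bash function below is a hypothetical helper (not part of nvidia-utils); it assumes the unit name is the first column and the current state the second, as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl list-unit-files` output on stdin and
# report any of the three NVIDIA sleep units that is missing or not
# "enabled". Returns 0 if all three are enabled, 1 otherwise.
check_nvidia_sleep_units() {
    awk '
        # Column 1: unit name, column 2: current state (column 3 is the preset)
        { state[$1] = $2 }
        END {
            n = split("nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service", want, " ")
            rc = 0
            for (i = 1; i <= n; i++) {
                u = want[i]
                # Test membership first: reading state[u] would create the key
                if (!(u in state)) { print u ": not found"; rc = 1 }
                else if (state[u] != "enabled") { print u ": " state[u]; rc = 1 }
            }
            exit rc
        }'
}
```

For example, `systemctl list-unit-files 'nvidia-*' | check_nvidia_sleep_units` should print nothing and return 0 when all three units are enabled.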

Tried with fbdev=0 just to make sure.
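Because these options only take effect when the module is (re)loaded, it may be worth confirming what the running driver actually uses. The driver exposes its current parameters in `/proc/driver/nvidia/params`; the sketch below assumes that file's `Name: value` line format and strips surrounding double quotes from string values. `nv_param` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the contents of /proc/driver/nvidia/params on
# stdin, print the value of one parameter. Assumes "Name: value" lines;
# surrounding double quotes (as on string values) are stripped.
# Returns 1 if the parameter is not present.
nv_param() {
    local name="$1"
    awk -v want="$name" '
        {
            # Split on the first ": " only, so values containing ":" survive
            i = index($0, ": ")
            if (i && substr($0, 1, i - 1) == want) {
                v = substr($0, i + 2)
                gsub(/^"|"$/, "", v)
                print v
                found = 1
                exit
            }
        }
        END { exit !found }'
}
```

With the modprobe options above applied, `nv_param PreserveVideoMemoryAllocations < /proc/driver/nvidia/params` should print `1`.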

Also enabled nvidia modules in initramfs:

% cat /etc/mkinitcpio.conf.d/custom.conf
MODULES=(ext4 nvme nvidia nvidia_uvm nvidia_drm nvidia_modeset)
[...]
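To confirm that the modules listed in `MODULES` actually ended up in the generated image, the image contents can be listed with `lsinitcpio` (shipped with mkinitcpio) and filtered. The helper below is a hypothetical sketch; it assumes one path per line in the listing, and since module files may use hyphens (e.g. `nvidia-drm.ko.zst`) while module names use underscores, it normalises both sides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an initramfs file listing on stdin and print
# each requested module that has no matching *.ko, *.ko.xz, *.ko.zst or
# *.ko.gz file. "-" and "_" are treated as equivalent in module names.
# Returns 0 if nothing is missing, 1 otherwise.
missing_initramfs_modules() {
    awk -v want="$*" '
        {
            # Take the basename of each listed path
            n = split($0, parts, "/")
            base = parts[n]
            if (base ~ /\.ko(\.(xz|zst|gz))?$/) {
                sub(/\.ko(\.(xz|zst|gz))?$/, "", base)
                gsub(/-/, "_", base)
                have[base] = 1
            }
        }
        END {
            k = split(want, w, " ")
            rc = 0
            for (i = 1; i <= k; i++) {
                m = w[i]
                gsub(/-/, "_", m)
                if (!(m in have)) { print m; rc = 1 }
            }
            exit rc
        }'
}
```

For example, `lsinitcpio /boot/initramfs-linux.img | missing_initramfs_modules nvidia nvidia_uvm nvidia_drm nvidia_modeset` prints any of those modules that did not make it into the image.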

Last edited by obap74 (2024-09-25 13:44:30)


#127 2024-09-25 15:20:59

bertieb
Member
Registered: 2023-11-29
Posts: 15

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

obap74 wrote:

Is the systemd-homed.service.d drop-in file override actually required if not using systemd-homed though?

Not sure; I was assuming they were needed based on this:

bWpdZW8n wrote:

You can remove these two drop-in files, but they will be re-added after updating the `nvidia-utils` packages.

I know this because a few days ago, I upgraded `nvidia-utils` from `560.35.03-2` to `560.35.03-3`.

But I don't appear to be using systemd-homed:

bertieb@zeus:~$ systemctl list-unit-files | grep homed

UNIT FILE                                    STATE           PRESET
systemd-homed-activate.service               disabled        enabled
systemd-homed-firstboot.service              disabled        disabled
systemd-homed.service                        disabled        enabled

Though I admit this is one of the areas (systemd in general) where I am less well-versed.

obap74 wrote:

Note that there are also:
- /usr/lib/systemd/system/systemd-hibernate.service.d/10-nvidia-no-freeze-session.conf
- /usr/lib/systemd/system/systemd-hybrid-sleep.service.d/10-nvidia-no-freeze-session.conf
- /usr/lib/systemd/system/systemd-suspend-then-hibernate.service.d/10-nvidia-no-freeze-session.conf

Edit: not working for me unfortunately.  The machine enters suspend state but when resuming, the display remains black. I cannot switch to any TTY.

I also have those files.

It's a shame it's still not working for you :-\ The only difference I can see is that I have the persistenced and powerd services enabled.


#128 2024-09-26 02:22:33

bWpdZW8n
Member
Registered: 2024-09-23
Posts: 4

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

Hi guys, I'm back. Trying to add more information.

For `nvidia-*.service`,
I enabled `nvidia-suspend.service`, `nvidia-hibernate.service`, `nvidia-resume.service`
and disabled `nvidia-persistenced.service`, `nvidia-powerd.service`.
The last two are up to you; I don't think they affect resuming.

For `systemd-homed.service`,
now that you mentioned it, I realized I didn't enable it either.
I have tested it. It's okay not to override the drop-in file if you're not using this service.

For these drop-in files added by nvidia in `systemd-hibernate.service.d`,
`systemd-hybrid-sleep.service.d`, `systemd-suspend-then-hibernate.service.d`,
I also have those files, but I hadn't noticed them and didn't override them.
It's okay to ignore them if you're not using those services.

Hi @bertieb, for the "console display freezes" issue you mentioned in #125,
maybe you could try loading the nvidia kernel modules early to see if that helps,
as described in this wiki section: https://wiki.archlinux.org/title/NVIDIA#Early_loading

Hi @obap74, I'm sorry this approach didn't work for you.

Last edited by bWpdZW8n (2024-09-26 02:29:48)


#129 2024-09-26 09:28:54

bertieb
Member
Registered: 2023-11-29
Posts: 15

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

bWpdZW8n wrote:

Hi guys, I'm back. Trying to add more information.

(...snip...)

Hi @bertieb, for the "console display freezes" issue you mentioned in #125,
maybe you can try to early load the nvidia kernel modules to see if it works.
As stated in this wiki section: https://wiki.archlinux.org/title/NVIDIA#Early_loading

Thanks for the info! I can give that a shot, though I ran into another issue on resume this morning:

The computer woke up and the displays were on, but showing a black screen. There was no response to keyboard or mouse, but I could ssh in. I couldn't work out the issue from my phone over ssh, so I tried rebooting; the computer then stopped responding and rejected further ssh connections, presumably because of the pending shutdown.

Checking the journal for that boot, there are a bunch of repeated stack traces (uncut journal output from just before and just after suspend: https://0x0.st/XY0i.log):

journal wrote:

Sep 25 23:10:42 zeus systemd[1]: Reached target Sleep.
Sep 25 23:10:42 zeus systemd[1]: Starting NVIDIA system suspend actions...
Sep 25 23:10:42 zeus suspend[889118]: nvidia-suspend.service
Sep 25 23:10:42 zeus logger[889118]: <13>Sep 25 23:10:42 suspend: nvidia-suspend.service
Sep 25 23:10:42 zeus root[889235]: ACPI group/action undefined: jack/lineout / LINEOUT
Sep 25 23:10:42 zeus root[889237]: ACPI group/action undefined: jack/videoout / VIDEOOUT
Sep 25 23:10:42 zeus root[889239]: ACPI group/action undefined: jack/lineout / LINEOUT
Sep 25 23:10:42 zeus root[889241]: ACPI group/action undefined: jack/videoout / VIDEOOUT
Sep 25 23:10:42 zeus kernel: ------------[ cut here ]------------
Sep 25 23:10:42 zeus kernel: WARNING: CPU: 13 PID: 889191 at include/linux/rwsem.h:80 follow_pte+0x1de/0x200
Sep 25 23:10:42 zeus kernel: Modules linked in: tls dm_snapshot dm_bufio snd_seq_dummy snd_hrtimer snd_seq snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mc hid_logitech_hidpp nct6775 nct6775_core mousedev joydev >
Sep 25 23:10:42 zeus kernel:  ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_mod nvme nvme_core crc32c_intel xhci_pci xhci_pci_renesas nvme_auth
Sep 25 23:10:42 zeus kernel: CPU: 13 PID: 889191 Comm: nvidia-sleep.sh Tainted: P           OE      6.10.10-arch1-1 #1 e28ee6293423e91d57555c4cc06eb839714254b7
Sep 25 23:10:42 zeus kernel: Hardware name: ASUS System Product Name/PRIME B550-PLUS, BIOS 2006 03/19/2021
Sep 25 23:10:42 zeus kernel: RIP: 0010:follow_pte+0x1de/0x200
Sep 25 23:10:42 zeus kernel: Code: fe d9 00 48 81 e2 00 00 00 c0 48 09 c2 48 f7 d2 48 85 fa 75 20 e8 b2 f5 ff ff 48 8b 35 6b dd 5c 01 48 81 e6 00 00 00 c0 eb 8d <0f> 0b 48 3b 1f 0f 83 50 fe ff ff bd ea ff ff ff eb b6 49 8b 3c 24
Sep 25 23:10:42 zeus kernel: RSP: 0018:ffffaa6606cb3b40 EFLAGS: 00010246
Sep 25 23:10:42 zeus kernel: RAX: 0000000000000000 RBX: 00007a8096880000 RCX: ffffaa6606cb3b80
Sep 25 23:10:42 zeus kernel: RDX: ffffaa6606cb3b78 RSI: 00007a8096880000 RDI: ffff9e56eef35140
Sep 25 23:10:42 zeus kernel: RBP: ffffaa6606cb3bc0 R08: ffffaa6606cb3d18 R09: 0000000000000000
Sep 25 23:10:42 zeus kernel: R10: 000000000040000b R11: ffffffffc42a6f00 R12: ffffaa6606cb3b80
Sep 25 23:10:42 zeus kernel: R13: ffffaa6606cb3b78 R14: ffff9e5517125800 R15: 0000000000000000
Sep 25 23:10:42 zeus kernel: FS:  00007c6253a16b80(0000) GS:ffff9e598f080000(0000) knlGS:0000000000000000
Sep 25 23:10:42 zeus kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 23:10:42 zeus kernel: CR2: 000058fb01b1f048 CR3: 00000006a5e34000 CR4: 0000000000f50ef0
Sep 25 23:10:42 zeus kernel: PKRU: 55555554
Sep 25 23:10:42 zeus kernel: Call Trace:
Sep 25 23:10:42 zeus kernel:  <TASK>
Sep 25 23:10:42 zeus kernel:  ? follow_pte+0x1de/0x200
Sep 25 23:10:42 zeus kernel:  ? __warn.cold+0x8e/0xe8
Sep 25 23:10:42 zeus kernel:  ? follow_pte+0x1de/0x200
Sep 25 23:10:42 zeus kernel:  ? report_bug+0xff/0x140
Sep 25 23:10:42 zeus kernel:  ? handle_bug+0x3c/0x80
Sep 25 23:10:42 zeus kernel:  ? exc_invalid_op+0x17/0x70
Sep 25 23:10:42 zeus kernel:  ? asm_exc_invalid_op+0x1a/0x20
Sep 25 23:10:42 zeus kernel:  ? follow_pte+0x1de/0x200
Sep 25 23:10:42 zeus kernel:  follow_phys+0x49/0x110
Sep 25 23:10:42 zeus kernel:  untrack_pfn+0x55/0x120
Sep 25 23:10:42 zeus kernel:  unmap_single_vma+0xa6/0xe0
Sep 25 23:10:42 zeus kernel:  zap_page_range_single+0x122/0x1d0
Sep 25 23:10:42 zeus kernel:  unmap_mapping_range+0x116/0x140
Sep 25 23:10:42 zeus kernel:  nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia 33bfc7aedb858e5dd8bc7c25b18773a9fcaf76cb]
Sep 25 23:10:42 zeus kernel:  nv_set_system_power_state+0x1cd/0x470 [nvidia 33bfc7aedb858e5dd8bc7c25b18773a9fcaf76cb]
Sep 25 23:10:42 zeus kernel:  nv_procfs_write_suspend+0xef/0x170 [nvidia 33bfc7aedb858e5dd8bc7c25b18773a9fcaf76cb]
Sep 25 23:10:42 zeus kernel:  proc_reg_write+0x5d/0xa0
Sep 25 23:10:42 zeus kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Sep 25 23:10:42 zeus kernel:  vfs_write+0xf8/0x460
Sep 25 23:10:42 zeus kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Sep 25 23:10:42 zeus kernel:  ? __count_memcg_events+0x58/0xf0
Sep 25 23:10:42 zeus kernel:  ksys_write+0x6d/0xf0
Sep 25 23:10:42 zeus kernel:  do_syscall_64+0x82/0x190
Sep 25 23:10:42 zeus kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Sep 25 23:10:42 zeus kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Sep 25 23:10:42 zeus kernel: RIP: 0033:0x7c6253b937a4
Sep 25 23:10:42 zeus kernel: Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 28 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
Sep 25 23:10:42 zeus kernel: RSP: 002b:00007fff768c8308 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Sep 25 23:10:42 zeus kernel: RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007c6253b937a4
Sep 25 23:10:42 zeus kernel: RDX: 0000000000000008 RSI: 000058fb01b1ec40 RDI: 0000000000000001
Sep 25 23:10:42 zeus kernel: RBP: 00007fff768c8330 R08: 0000000000000410 R09: 0000000000000001
Sep 25 23:10:42 zeus kernel: R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000008
Sep 25 23:10:42 zeus kernel: R13: 000058fb01b1ec40 R14: 00007c6253c6f5c0 R15: 00007c6253c6cea0
Sep 25 23:10:42 zeus kernel:  </TASK>
Sep 25 23:10:42 zeus kernel: ---[ end trace 0000000000000000 ]---

These repeat multiple times.

After resume, there's a page fault / oops followed by other oopses:

journal wrote:

Sep 26 09:46:12 zeus kernel: BUG: unable to handle page fault for address: ffffaa6604fcec64
Sep 26 09:46:12 zeus kernel: #PF: supervisor read access in kernel mode
Sep 26 09:46:12 zeus kernel: #PF: error_code(0x0000) - not-present page
Sep 26 09:46:12 zeus kernel: PGD 100000067 P4D 100000067 PUD 100233067 PMD 20c7c3067 PTE 0
Sep 26 09:46:12 zeus kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Sep 26 09:46:12 zeus kernel: CPU: 0 PID: 586 Comm: irq/92-nvidia Tainted: P        W  OE      6.10.10-arch1-1 #1 e28ee6293423e91d57555c4cc06eb839714254b7
Sep 26 09:46:12 zeus kernel: Hardware name: ASUS System Product Name/PRIME B550-PLUS, BIOS 2006 03/19/2021
Sep 26 09:46:12 zeus kernel: RIP: 0010:_nv012662rm+0xbd/0x130 [nvidia]
Sep 26 09:46:12 zeus kernel: Code: 8b 45 20 41 bf 01 00 00 00 41 89 54 24 20 41 89 44 24 24 4c 89 e6 4c 89 ef e8 bf 2f 6b 00 49 89 c4 48 85 c0 74 5f 49 8b 0c 24 <8b> 41 04 0f ae e8 41 39 44 24 20 74 dc 8b 41 08 0f b7 d8 25 00 00
Sep 26 09:46:12 zeus kernel: RSP: 0018:ffffaa6606effcc0 EFLAGS: 00010286
Sep 26 09:46:12 zeus kernel: RAX: ffff9e53c6b874d8 RBX: 0000000000000001 RCX: ffffaa6604fcec60
Sep 26 09:46:12 zeus kernel: RDX: fffffffffffffff0 RSI: ffff9e529d8ba008 RDI: ffff9e529d8ba900
Sep 26 09:46:12 zeus kernel: RBP: ffff9e52979a2bf0 R08: 0000000000000000 R09: 0000000000000020
Sep 26 09:46:12 zeus kernel: R10: ffff9e52979a2c34 R11: ffffffffc0817220 R12: ffff9e53c6b874d8
Sep 26 09:46:12 zeus kernel: R13: ffff9e529d8ba900 R14: ffff9e529d8ba008 R15: 0000000000000000
Sep 26 09:46:12 zeus kernel: FS:  0000000000000000(0000) GS:ffff9e598ea00000(0000) knlGS:0000000000000000
Sep 26 09:46:12 zeus kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 26 09:46:12 zeus kernel: CR2: ffffaa6604fcec64 CR3: 00000003fa820000 CR4: 0000000000f50ef0
Sep 26 09:46:12 zeus kernel: PKRU: 55555554
Sep 26 09:46:12 zeus kernel: Call Trace:
Sep 26 09:46:12 zeus kernel:  <TASK>
Sep 26 09:46:12 zeus kernel:  ? __die_body.cold+0x19/0x27
(...snip...)
Sep 26 09:46:12 zeus kernel: note: irq/92-nvidia[586] exited with irqs disabled
Sep 26 09:46:12 zeus kernel: BUG: kernel NULL pointer dereference, address: 000000000000032c
Sep 26 09:46:12 zeus kernel: #PF: supervisor read access in kernel mode
Sep 26 09:46:12 zeus kernel: #PF: error_code(0x0000) - not-present page
Sep 26 09:46:12 zeus kernel: PGD 0 P4D 0
(...snip...)
Sep 26 09:46:12 zeus kernel: note: irq/92-nvidia[586] exited with irqs disabled
Sep 26 09:46:12 zeus kernel: Fixing recursive fault but reboot is needed!
Sep 26 09:46:12 zeus kernel: BUG: scheduling while atomic: irq/92-nvidia/586/0x00000000
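When comparing boots, it can help to count how often each signature appears rather than scrolling through full traces. A minimal sketch, assuming the kernel log is piped in (for example from `journalctl -k -b -1` for the previous boot); `summarise_nv_traces` is a hypothetical name, and it only looks for the two patterns quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the follow_pte WARNINGs (seen during suspend)
# and "BUG:" lines (seen after resume) in a kernel log read on stdin.
summarise_nv_traces() {
    awk '
        /WARNING: .* follow_pte/ { warn++ }
        / BUG: /                 { bug++ }
        END { printf "follow_pte warnings: %d\nBUG lines: %d\n", warn, bug }'
}
```

Usage: `journalctl -k -b -1 | summarise_nv_traces`.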

nvidia-sleep.sh is a fairly short shell script in /usr/bin/ that does a simple save/restore of X and uses the /proc/driver/nvidia/suspend interface. It also turns up in other reports of similar-looking problems:

- Nvidia driver 550.76-1 fails suspending; results in black screen unresponsiveness
- Multiple kernel oopses before suspending caused by nvidia-sleep.sh, Linux 6.10 regression? WARNING: CPU: PID: at include/linux/rwsem.h:80 follow_pte
- "NVIDIA Suspend fix"
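For reference, the core of what the script does can be sketched in a few lines: it writes the requested action (`suspend`, `hibernate` or `resume`) to `/proc/driver/nvidia/suspend`. That write is where the suspend-time traces above originate (`proc_reg_write`, then `nv_procfs_write_suspend`, then `nv_set_system_power_state`). This is a simplified sketch, not the packaged script: the X save/restore part is left out, and `NV_SUSPEND_FILE` is a made-up override so the logic can be tried against an ordinary file.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical sketch of the core of nvidia-sleep.sh: write the
# requested power action to the driver's procfs interface. The real script
# also saves/restores X state, which is omitted here.
# NV_SUSPEND_FILE is an invented override for testing against a plain file.
nv_sleep_sketch() {
    local action="$1"
    local target="${NV_SUSPEND_FILE:-/proc/driver/nvidia/suspend}"
    case "$action" in
        suspend|hibernate|resume)
            # Nothing to do when the driver interface is absent
            [ -e "$target" ] || return 0
            printf '%s\n' "$action" > "$target"
            ;;
        *)
            echo "usage: nv_sleep_sketch suspend|hibernate|resume" >&2
            return 1
            ;;
    esac
}
```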

The second link seems to describe the same issue; the response there is:

amrits wrote:

Acknowledged your latest test result.
Engineering team is already working upon it.

That was on 23 August. Meanwhile, the OP of that thread has reverted to kernel 6.9.12.

Last edited by bertieb (2024-09-26 09:29:26)


#130 2024-09-26 23:30:20

bWpdZW8n
Member
Registered: 2024-09-23
Posts: 4

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

Hi @bertieb,

I keep getting the first error during suspend (though not every time), and I can still resume successfully.

I've received the second error before, but not since I've been able to resume successfully.
Edit: I think the second error is related to Xorg: I got it when running GNOME on Xorg, but I don't get it when running GNOME on Wayland.

I don't know how to solve them.

Here is an nvidia document called "Configuring Power Management Support".
It talks about "preserving all video memory allocations", "systemd configuration", etc.,
including `nvidia-sleep.sh` and `/proc/driver/nvidia/suspend`.
https://download.nvidia.com/XFree86/Lin … ement.html

Maybe you could try disabling `nvidia-persistenced.service` and `nvidia-powerd.service` and see if that helps.

Last edited by bWpdZW8n (2024-10-21 00:32:00)


#131 2024-09-30 22:49:32

bWpdZW8n
Member
Registered: 2024-09-23
Posts: 4

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

I hit a resume failure today.

I pressed the power button to resume, but the resume failed: some errors were printed on the screen,
I couldn't switch to a TTY using `ctrl+alt+f*`, and I had to long-press the power button to force a shutdown.

I searched and found similar failure reports on the nvidia developer forums:
https://forums.developer.nvidia.com/t/n … -10/276166

The output is slightly different,
in my case I only received:
`[drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00002600] Failed to map NvKmsKapiMemory 0x00000000...`,
but it was printed several times, each with a different `0x00000000...` suffix.

However, I can't find these error messages using `journalctl`.
(But I noticed that I've received these messages before.)
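If the messages were written to the persistent journal before the forced shutdown, `journalctl -k -b -1` shows the kernel log of the previous boot, which can then be filtered for them. The hypothetical helper below counts how many distinct `NvKmsKapiMemory` handles failed to map, based only on the message format quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the distinct handles in
# "Failed to map NvKmsKapiMemory 0x..." messages read on stdin.
# Prints 0 when there are no such messages.
count_kapi_map_failures() {
    grep -o 'Failed to map NvKmsKapiMemory 0x[0-9a-f]*' \
        | sort -u \
        | awk 'END { print NR }'
}
```

Usage: `journalctl -k -b -1 | count_kapi_map_failures`.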

I took the advice from that link and enabled and started `nvidia-persistenced.service`.
Edit: not working; I hit the same problem two days later. Maybe it's related to wayland, vlc and nvidia.

Then I suspended and resumed, and the resume succeeded.

Through journalctl, I found that this time I got an error message during the resume process:
`archlinux kernel: [drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00002600] Failed to map NvKmsKapiMemory 0x000000005348fcd7`.
(I tried again and received it again.)

Additionally, I still get the `Comm: nvidia-sleep.sh Tainted: P` error during the suspend.

Last edited by bWpdZW8n (2024-10-03 02:19:15)


#132 2024-10-01 06:25:24

seth
Member
Registered: 2012-09-03
Posts: 59,897

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

Additionally, I still get the `Comm: nvidia-sleep.sh Tainted: P` error during the suspend.

Something related to the process generated a kernel warning, oops or module crash.
"Tainted: P" itself is normal and irrelevant: it only means a proprietary module (nvidia) is loaded, see https://docs.kernel.org/admin-guide/tai … rnels.html

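For reference, the letters in a `Tainted:` line correspond to bits of the kernel's taint mask, which can also be read from `/proc/sys/kernel/tainted`. Below is a small bash sketch that decodes that mask, following the bit table in the kernel's "Tainted kernels" documentation for bits 0 to 17 (newer kernels may define further bits, which this ignores). `decode_taint` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Decode a kernel taint bitmask into the single-letter flags used in
# "Tainted:" lines. Letter i of the string corresponds to bit i (0-17).
decode_taint() {
    local mask="$1"
    local letters="PFSRMBUDAWCIOELKXT"
    local out="" i
    for ((i = 0; i < ${#letters}; i++)); do
        if (( (mask >> i) & 1 )); then
            out+="${letters:i:1} "
        fi
    done
    printf '%s\n' "${out% }"   # drop the trailing space
}
```

On a system where only the nvidia modules taint the kernel, `decode_taint "$(cat /proc/sys/kernel/tainted)"` would typically print `P O E`; once a warning such as the `follow_pte` one fires, `W` is added, matching the `Tainted: P W OE` line in the post-resume oops quoted earlier in the thread.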

#133 2024-10-02 13:41:38

obap74
Member
Registered: 2021-03-18
Posts: 92

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

I've been able to resume from suspend just fine twelve times in a row (no reboot in between) over the last six days. Then, today after a reboot: six times in a row.

Setup:

- Gentoo (I know, not Arch, but hopefully this can help solve the issue on Arch)
- sys-kernel/gentoo-sources-6.6.52
- x11-drivers/nvidia-drivers-550.107.02-r1
- nvidia-{resume,suspend,hibernate}.service units enabled

- /etc/modprobe.d/nvidia.conf content:

# NVIDIA drivers options
# See /usr/share/doc/nvidia-drivers-*/README.txt* for more information.

# nvidia-drivers and nouveau cannot be used at same time.
# Comment out the following line if you wish to allow nouveau.
blacklist nouveau

# Kernel Mode Setting (notably needed for fbdev and wayland).
# Enabling may possibly cause issues with SLI and Reverse PRIME.
options nvidia-drm modeset=1

# Enable experimental framebuffer console support (requires modeset=1 above).
# Replaces efifb, simpledrm, or similar once loaded (emphasis on being
# experimental, "may" cause issues X mode switching, sleep, or more).
options nvidia-drm fbdev=1

# Suspend options. Note that Allocations=1 requires suspend hooks currently
# only used when either systemd or elogind is used to suspend. If using
# neither or have issues, try Allocations=0 (revert if it does not help
# as =0 is not recommended).
options nvidia \
    NVreg_PreserveVideoMemoryAllocations=1 \
    NVreg_TemporaryFilePath=/var/tmp

# !!! Security Warning !!!
# Do not change the DeviceFile options unless you know what you are doing.
# Only add trusted users to the 'video' group, these users may be able to
# crash, compromise, or irreparably damage the machine.
options nvidia \
    NVreg_DeviceFileGID=27 \
    NVreg_DeviceFileMode=432 \
    NVreg_DeviceFileUID=0 \
    NVreg_ModifyDeviceFiles=1

# Should be no need to touch anything below.
alias char-major-195 nvidia
alias /dev/nvidiactl char-major-195
remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia

- /etc/dracut.conf.d/options.conf content:

add_drivers+=" nvidia nvidia-drm nvidia_modeset "

I also ditched physlock in favor of betterlockscreen since the former doesn't seem to play well with PreserveVideoMemoryAllocations=1.

Quite encouraging:
- twelve and then six cycles, with a reboot in between, is quite a lot of suspend/resume; it seems reliable
- there were both short and long (overnight) periods between the individual suspends over the first six days; from experience, resuming issues were most likely to occur after several hours
- this is with PreserveVideoMemoryAllocations enabled (the original issue)

Regarding Arch now, I did other tests trying to replicate the working Gentoo config.

- 6.6.52-1-lts + nvidia-lts 1:560.35.03-7
- 6.11.1-arch1-1 + nvidia 560.35.03-9
- nvidia-{resume,suspend,hibernate}.service units also enabled
- also switched from physlock to betterlockscreen
- /etc/modprobe.d/nvidia.conf content

options nvidia_drm modeset=1
options nvidia_drm fbdev=1
options nvidia \
    NVreg_PreserveVideoMemoryAllocations=1 \
    NVreg_TemporaryFilePath=/var/tmp

/etc/mkinitcpio.conf.d/custom.conf modules:

MODULES=(ext4 nvme nvidia nvidia_drm nvidia_modeset nvidia_uvm)

Unfortunately, it's not a 1:1 setup; I'm aware that many things can differ between the two distros, especially kernel settings. I still have to try 6.6.52 + NVIDIA 560.35.03 on Gentoo (will report back when done).

Unfortunately, it's still not working on Arch for me. I'm getting the same kernel errors as @bertieb after resuming from suspend. The only difference is that I can see portions of the errors on screen (no black screen). I cannot switch to any TTY at this point and am forced to reboot.

- with 6.6.52-1-lts

Oct 02 11:32:39 archlinux-desktop kernel: PM: suspend exit
Oct 02 11:32:39 archlinux-desktop systemd-sleep[7752]: System returned from sleep operation 'suspend'.
Oct 02 11:32:39 archlinux-desktop kernel: BUG: unable to handle page fault for address: ffffc90007fcec04
Oct 02 11:32:39 archlinux-desktop kernel: #PF: supervisor read access in kernel mode
Oct 02 11:32:39 archlinux-desktop kernel: #PF: error_code(0x0000) - not-present page
Oct 02 11:32:39 archlinux-desktop kernel: PGD 100000067 P4D 100000067 PUD 10020d067 PMD 0
Oct 02 11:32:39 archlinux-desktop kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Oct 02 11:32:39 archlinux-desktop kernel: CPU: 0 PID: 216 Comm: irq/50-nvidia Tainted: P           OE      6.6.52-1-lts #1 bbac716beed94ca2c106085e63a4316e001912e7
Oct 02 11:32:39 archlinux-desktop kernel: Hardware name: MSI MS-7885/X99S SLI PLUS (MS-7885), BIOS 1.F2 06/13/2019
Oct 02 11:32:39 archlinux-desktop kernel: RIP: 0010:_nv012662rm+0xbd/0x130 [nvidia]
Oct 02 11:32:39 archlinux-desktop kernel: Code: 8b 45 20 41 bf 01 00 00 00 41 89 54 24 20 41 89 44 24 24 4c 89 e6 4c 89 ef e8 bf 2f 6b 00 49 89 c4 48 85 c0 74 5f 49 8b 0c 24 <8b> 41 04 0f ae e8 41 39 44 24 20 74 dc 8b 41 08 0f b7 d8 25 00 00
Oct 02 11:32:39 archlinux-desktop kernel: RSP: 0000:ffffc9000047bd08 EFLAGS: 00010282
Oct 02 11:32:39 archlinux-desktop kernel: RAX: ffff88810b88ce98 RBX: 0000000000000001 RCX: ffffc90007fcec00
Oct 02 11:32:39 archlinux-desktop kernel: RDX: fffffffffffffff0 RSI: ffff88811b204008 RDI: ffff88811b204900
Oct 02 11:32:39 archlinux-desktop kernel: RBP: ffff88811b2cabf0 R08: 0000000000000000 R09: 0000000000000020
Oct 02 11:32:39 archlinux-desktop kernel: R10: ffff88811b2cac34 R11: ffffffffc07546a0 R12: ffff88810b88ce98
Oct 02 11:32:39 archlinux-desktop kernel: R13: ffff88811b204900 R14: ffff88811b204008 R15: 0000000000000000
Oct 02 11:32:39 archlinux-desktop kernel: FS:  0000000000000000(0000) GS:ffff88889fa00000(0000) knlGS:0000000000000000
Oct 02 11:32:39 archlinux-desktop kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 02 11:32:39 archlinux-desktop kernel: CR2: ffffc90007fcec04 CR3: 0000000107c4e002 CR4: 00000000001706f0
Oct 02 11:32:39 archlinux-desktop kernel: Call Trace:
Oct 02 11:32:39 archlinux-desktop kernel:  <TASK>
Oct 02 11:32:39 archlinux-desktop kernel:  ? __die+0x23/0x70
Oct 02 11:32:39 archlinux-desktop kernel:  ? page_fault_oops+0x174/0x530
Oct 02 11:32:39 archlinux-desktop kernel:  ? _nv012662rm+0xbd/0x130 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  ? search_module_extables+0x19/0x60
Oct 02 11:32:39 archlinux-desktop kernel:  ? search_bpf_extables+0x5f/0x80
Oct 02 11:32:39 archlinux-desktop kernel:  ? exc_page_fault+0x175/0x180
Oct 02 11:32:39 archlinux-desktop kernel:  ? asm_exc_page_fault+0x26/0x30
Oct 02 11:32:39 archlinux-desktop kernel:  ? _nv012663rm+0x1f0/0x1f0 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  ? _nv012662rm+0xbd/0x130 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  ? _nv012662rm+0x42/0x130 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  _nv036737rm+0x19e/0x2f0 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  _nv031308rm+0x24/0xc0 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  _nv022578rm+0x295/0x3c4 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  _nv033274rm+0x63/0xc0 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  _nv012814rm+0x276/0x3d0 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  _nv033284rm+0x167/0x1d0 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  _nv000746rm+0x113/0x148 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  ? __pfx_irq_thread_fn+0x10/0x10
Oct 02 11:32:39 archlinux-desktop kernel:  rm_isr_bh+0x20/0x5c [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  nvidia_isr_kthread_bh+0x1f/0x50 [nvidia 9db920cb97dd0ea90cc741c121d19cfd01f99b79]
Oct 02 11:32:39 archlinux-desktop kernel:  irq_thread_fn+0x23/0x60
Oct 02 11:32:39 archlinux-desktop kernel:  irq_thread+0xfc/0x1c0
Oct 02 11:32:39 archlinux-desktop kernel:  ? __pfx_irq_thread_dtor+0x10/0x10
Oct 02 11:32:39 archlinux-desktop kernel:  ? __pfx_irq_thread+0x10/0x10
Oct 02 11:32:39 archlinux-desktop kernel:  kthread+0xe8/0x120
Oct 02 11:32:39 archlinux-desktop kernel:  ? __pfx_kthread+0x10/0x10
Oct 02 11:32:39 archlinux-desktop kernel:  ret_from_fork+0x34/0x50
Oct 02 11:32:39 archlinux-desktop kernel:  ? __pfx_kthread+0x10/0x10
Oct 02 11:32:39 archlinux-desktop kernel:  ret_from_fork_asm+0x1b/0x30
Oct 02 11:32:39 archlinux-desktop kernel:  </TASK>
Oct 02 11:32:39 archlinux-desktop kernel: Modules linked in: libcrc32c bridge stp llc rfkill intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi vfat irqbypass fat snd_hda_intel rapl snd_usb_audio snd_intel_dspcfg intel_cstate snd_intel_sdw_acpi snd_usbmidi_lib iTCO_wdt snd_hda_codec snd_ump intel_pmc_bxt snd_rawmidi iTCO_vendor_support snd_hda_core snd_seq_device mc snd_hwdep snd_pcm intel_uncore mxm_wmi snd_timer pcspkr i2c_i801
Oct 02 11:32:39 archlinux-desktop kernel:  i2c_smbus mei_me e1000e snd lpc_ich mei soundcore joydev mousedev mac_hid sg loop fuse nfnetlink bpf_preload ip_tables x_tables usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd xhci_pci xhci_pci_renesas nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) video wmi nvidia(POE) nvme nvme_core nvme_common ext4 crc32c_generic crc32c_intel crc16 mbcache jbd2
Oct 02 11:32:39 archlinux-desktop kernel: CR2: ffffc90007fcec04
Oct 02 11:32:39 archlinux-desktop kernel: ---[ end trace 0000000000000000 ]---

- with 6.11.1-arch1-1

Oct 02 11:22:19 archlinux-desktop kernel: ------------[ cut here ]------------
Oct 02 11:22:19 archlinux-desktop kernel: WARNING: CPU: 0 PID: 14316 at include/linux/rwsem.h:80 follow_pte+0x1de/0x200
Oct 02 11:22:19 archlinux-desktop kernel: Modules linked in: libcrc32c bridge stp llc rfkill vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal snd_hda_codec_realtek intel_powerclamp coretemp snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_usb_audio kvm_intel snd_hda_intel snd_usbmidi_lib snd_intel_dspcfg snd_intel_sdw_acpi snd_ump kvm snd_rawmidi snd_hda_codec snd_seq_device snd_hda_core iTCO_wdt mc intel_pmc_bxt snd_hwdep iTCO_vendor_support i2c_i801 snd_pcm rapl i2c_smbus intel_cstate snd_timer mxm_wmi e1000e
Oct 02 11:22:19 archlinux-desktop kernel:  intel_uncore pcspkr i2c_mux mei_me snd ptp lpc_ich pps_core mei soundcore joydev mousedev mac_hid sg loop nfnetlink ip_tables x_tables dm_crypt hid_generic usbhid cbc encrypted_keys trusted asn1_encoder tee dm_mod crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd cryptd xhci_pci xhci_pci_renesas nvidia_uvm(POE) nvidia_drm(POE) drm_ttm_helper ttm nvidia_modeset(POE) video wmi nvidia(POE) nvme nvme_core nvme_auth ext4 crc32c_generic crc32c_intel crc16 mbcache jbd2
Oct 02 11:22:19 archlinux-desktop kernel: CPU: 0 UID: 0 PID: 14316 Comm: nvidia-sleep.sh Tainted: P           OE      6.11.1-arch1-1 #1 5d189185854f2d35e5fd15c112844f59d15980c4
Oct 02 11:22:19 archlinux-desktop kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Oct 02 11:22:19 archlinux-desktop kernel: Hardware name: MSI MS-7885/X99S SLI PLUS (MS-7885), BIOS 1.F2 06/13/2019
Oct 02 11:22:19 archlinux-desktop kernel: RIP: 0010:follow_pte+0x1de/0x200
Oct 02 11:22:19 archlinux-desktop kernel: Code: cc cc cc 48 81 e2 00 00 00 c0 48 09 c2 48 f7 d2 48 85 fa 75 20 e8 42 f1 ff ff 48 8b 35 bb ae 5c 01 48 81 e6 00 00 00 c0 eb 8d <0f> 0b 48 3b 1f 0f 83 50 fe ff ff bd ea ff ff ff eb b6 49 8b 3c 24
Oct 02 11:22:19 archlinux-desktop kernel: RSP: 0018:ffffaa21c0fc7a20 EFLAGS: 00010246
Oct 02 11:22:19 archlinux-desktop kernel: RAX: 0000000000000000 RBX: 00007f5c7e0f1000 RCX: ffffaa21c0fc7a60
Oct 02 11:22:19 archlinux-desktop kernel: RDX: ffffaa21c0fc7a58 RSI: 00007f5c7e0f1000 RDI: ffff962ceeef0508
Oct 02 11:22:19 archlinux-desktop kernel: RBP: ffffaa21c0fc7aa0 R08: ffffaa21c0fc7bf8 R09: 0000000000000000
Oct 02 11:22:19 archlinux-desktop kernel: R10: 0000000000000001 R11: 0000000000000003 R12: ffffaa21c0fc7a60
Oct 02 11:22:19 archlinux-desktop kernel: R13: ffffaa21c0fc7a58 R14: ffff962c06c70b00 R15: 0000000000000000
Oct 02 11:22:19 archlinux-desktop kernel: FS:  0000760ded5b4b80(0000) GS:ffff96339f800000(0000) knlGS:0000000000000000
Oct 02 11:22:19 archlinux-desktop kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 02 11:22:19 archlinux-desktop kernel: CR2: 00005fcc92899208 CR3: 00000001eb3ca002 CR4: 00000000001706f0
Oct 02 11:22:19 archlinux-desktop kernel: Call Trace:
Oct 02 11:22:19 archlinux-desktop kernel:  <TASK>
Oct 02 11:22:19 archlinux-desktop kernel:  ? follow_pte+0x1de/0x200
Oct 02 11:22:19 archlinux-desktop kernel:  ? __warn.cold+0x8e/0xe8
Oct 02 11:22:19 archlinux-desktop kernel:  ? follow_pte+0x1de/0x200
Oct 02 11:22:19 archlinux-desktop kernel:  ? report_bug+0xff/0x140
Oct 02 11:22:19 archlinux-desktop kernel:  ? handle_bug+0x3c/0x80
Oct 02 11:22:19 archlinux-desktop kernel:  ? exc_invalid_op+0x17/0x70
Oct 02 11:22:19 archlinux-desktop kernel:  ? asm_exc_invalid_op+0x1a/0x20
Oct 02 11:22:19 archlinux-desktop kernel:  ? follow_pte+0x1de/0x200
Oct 02 11:22:19 archlinux-desktop kernel:  follow_phys+0x49/0x110
Oct 02 11:22:19 archlinux-desktop kernel:  untrack_pfn+0x55/0x120
Oct 02 11:22:19 archlinux-desktop kernel:  unmap_single_vma+0xa6/0xe0
Oct 02 11:22:19 archlinux-desktop kernel:  zap_page_range_single+0x122/0x1d0
Oct 02 11:22:19 archlinux-desktop kernel:  unmap_mapping_range+0x116/0x140
Oct 02 11:22:19 archlinux-desktop kernel:  nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia 1399d3f830003da5be868e801cc730564cdf009c]
Oct 02 11:22:19 archlinux-desktop kernel:  nv_set_system_power_state+0x1cd/0x470 [nvidia 1399d3f830003da5be868e801cc730564cdf009c]
Oct 02 11:22:19 archlinux-desktop kernel:  nv_procfs_write_suspend+0xef/0x170 [nvidia 1399d3f830003da5be868e801cc730564cdf009c]
Oct 02 11:22:19 archlinux-desktop kernel:  proc_reg_write+0x5d/0xa0
Oct 02 11:22:19 archlinux-desktop kernel:  vfs_write+0xf8/0x460
Oct 02 11:22:19 archlinux-desktop kernel:  ? __mod_memcg_lruvec_state+0xa0/0x150
Oct 02 11:22:19 archlinux-desktop kernel:  ? __lruvec_stat_mod_folio+0x83/0xd0
Oct 02 11:22:19 archlinux-desktop kernel:  ? set_ptes.isra.0+0x41/0x90
Oct 02 11:22:19 archlinux-desktop kernel:  ksys_write+0x6d/0xf0
Oct 02 11:22:19 archlinux-desktop kernel:  do_syscall_64+0x82/0x190
Oct 02 11:22:19 archlinux-desktop kernel:  ? __count_memcg_events+0x58/0xf0
Oct 02 11:22:19 archlinux-desktop kernel:  ? count_memcg_events.constprop.0+0x1a/0x30
Oct 02 11:22:19 archlinux-desktop kernel:  ? handle_mm_fault+0x1bb/0x2c0
Oct 02 11:22:19 archlinux-desktop kernel:  ? do_user_addr_fault+0x36c/0x620
Oct 02 11:22:19 archlinux-desktop kernel:  ? exc_page_fault+0x81/0x190
Oct 02 11:22:19 archlinux-desktop kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Oct 02 11:22:19 archlinux-desktop kernel: RIP: 0033:0x760ded7317a4
Oct 02 11:22:19 archlinux-desktop kernel: Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 28 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
Oct 02 11:22:19 archlinux-desktop kernel: RSP: 002b:00007ffd8dc5c7f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Oct 02 11:22:19 archlinux-desktop kernel: RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 0000760ded7317a4
Oct 02 11:22:19 archlinux-desktop kernel: RDX: 0000000000000008 RSI: 00005fcc92898e00 RDI: 0000000000000001
Oct 02 11:22:19 archlinux-desktop kernel: RBP: 00007ffd8dc5c820 R08: 0000000000000410 R09: 0000000000000001
Oct 02 11:22:19 archlinux-desktop kernel: R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000008
Oct 02 11:22:19 archlinux-desktop kernel: R13: 00005fcc92898e00 R14: 0000760ded80d5c0 R15: 0000760ded80aea0
Oct 02 11:22:19 archlinux-desktop kernel:  </TASK>
Oct 02 11:22:19 archlinux-desktop kernel: ---[ end trace 0000000000000000 ]---
Oct 02 11:22:19 archlinux-desktop kernel: ------------[ cut here ]------------

Could these few differences in /etc/modprobe.d/nvidia.conf have an impact? On Gentoo, x11-drivers/nvidia-drivers comes with a default config file; I haven't tried adding these lines to the Arch one:

# !!! Security Warning !!!
# Do not change the DeviceFile options unless you know what you are doing.
# Only add trusted users to the 'video' group, these users may be able to
# crash, compromise, or irreparably damage the machine.
options nvidia \
    NVreg_DeviceFileGID=27 \
    NVreg_DeviceFileMode=432 \
    NVreg_DeviceFileUID=0 \
    NVreg_ModifyDeviceFiles=1

# Should be no need to touch anything below.
alias char-major-195 nvidia
alias /dev/nvidiactl char-major-195
remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia

#134 2024-10-02 14:12:53

seth
Member
Registered: 2012-09-03
Posts: 59,897

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

Looks like gentoo is adding a static GID 27 (for video), but "NVreg_DeviceFileMode=432" looks whacko and the example at https://wiki.gentoo.org/wiki/NVIDIA/nvi … re_enabled has it at 660 (which makes way more sense)

- x11-drivers/nvidia-drivers-550.107.02-r1

but you're running the 560xx drivers on arch, aren't you?
https://aur.archlinux.org/packages/nvidia-535xx-dkms is pretty much the last driver before nvidia went into a malstrøm of problems…

#135 2024-10-02 15:38:11

obap74
Member
Registered: 2021-03-18
Posts: 92

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

seth wrote:

Looks like gentoo is adding a static GID 27 (for video)

Yes:

% rg "video" /etc/group
20:video:x:27:obap

seth wrote:

but "NVreg_DeviceFileMode=432" looks whacko and the example at https://wiki.gentoo.org/wiki/NVIDIA/nvi … re_enabled has it at 660 (which makes way more sense)

Indeed. The #Kernel_module_parameters section says the default is "Undefined", yet the /etc/modprobe.d/nvidia.conf installed with x11-drivers/nvidia-drivers sets it to 432.
What is "DeviceFile" exactly, anyway? I guess Arch isn't setting this at all, since the NVIDIA packages don't install an /etc/modprobe.d/nvidia.conf file? Would different permissions have any impact on suspend/resume?
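On the 432 vs. 660 question, one arithmetic detail may help: decimal 432 is exactly octal 0660 (rw-rw----). Assuming the driver reads NVreg_DeviceFileMode like an ordinary integer kernel module parameter, where a leading 0 means octal and a bare number means decimal, the Gentoo default and a mode of 0660 would be the same value, while a bare 660 would be a different number. The two helper functions are hypothetical names for this sketch and only do plain POSIX shell arithmetic; nothing here talks to the driver:

```shell
#!/bin/sh
# Hypothetical helper: show a decimal mode value (as written in Gentoo's
# nvidia.conf) in octal notation.
dec_to_oct() {
    printf '%o\n' "$1"
}

# Hypothetical helper: show an octal mode (e.g. 660 for rw-rw----) in decimal.
# The leading 0 makes printf treat the argument as an octal constant.
oct_to_dec() {
    printf '%d\n' "0$1"
}

dec_to_oct 432   # prints 660
oct_to_dec 660   # prints 432
```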

seth wrote:

- x11-drivers/nvidia-drivers-550.107.02-r1

but you're running the 560xx drivers on arch, aren't you?
https://aur.archlinux.org/packages/nvidia-535xx-dkms is pretty much the last driver before nvidia went into a malstrøm of problems…

Yes, as I said, I still have to try 6.6.52 + 560 on my Gentoo install. For now I've updated to (~)550.120 (currently in testing on Gentoo): suspend/resume still works fine, and I successfully resumed 6 times in a row. I'll try 560.35.03 soon; that will be a more realistic comparison with Arch...

Last edited by obap74 (2024-10-02 15:42:09)

#136 2024-10-03 09:11:23

obap74
Member
Registered: 2021-03-18
Posts: 92

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

I've just tried 6.6.52 + 560.35.03 on my Gentoo install. Resuming failed on the first attempt.
It behaves exactly like Arch with 6.6.52-1-lts and nvidia-lts 1:560.35.03-7: when resuming, the kernel trace errors are displayed and the machine is completely unresponsive.

% journalctl -b -1
[...]
Oct 03 08:42:27 gentoo-desktop kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  560.35.03  Fri Aug 16 21:39:15 UTC 2024
[...]
Oct 03 08:50:07 gentoo-desktop systemd-sleep[11005]: System returned from sleep operation 'suspend'.
Oct 03 08:50:07 gentoo-desktop kernel: BUG: unable to handle page fault for address: ffffc90006fcec04
Oct 03 08:50:07 gentoo-desktop kernel: #PF: supervisor read access in kernel mode
Oct 03 08:50:07 gentoo-desktop kernel: #PF: error_code(0x0000) - not-present page
Oct 03 08:50:07 gentoo-desktop kernel: PGD 100000067 P4D 100000067 PUD 10021c067 PMD 0
Oct 03 08:50:07 gentoo-desktop kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Oct 03 08:50:07 gentoo-desktop kernel: CPU: 0 PID: 518 Comm: irq/59-nvidia Tainted: P           O    T  6.6.52-gentoo-custom #3
Oct 03 08:50:07 gentoo-desktop kernel: Hardware name: MSI MS-7885/X99S SLI PLUS (MS-7885), BIOS 1.F2 06/13/2019
Oct 03 08:50:07 gentoo-desktop kernel: RIP: 0010:_nv035574rm+0x4cd/0x540 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel: Code: 8b 45 20 41 bf 01 00 00 00 41 89 54 24 20 41 89 44 24 24 4c 89 e6 4c 89 ef e8 bf 2f 6b 00 49 89 c4 48 85 c0 74 5f 49 8b 0c 24 <8b> 41 04 0f ae e8 41 39 44 24 20 74 dc 8b 41 08 0f b7 d8 25 00 00
Oct 03 08:50:07 gentoo-desktop kernel: RSP: 0018:ffffc900004d7d10 EFLAGS: 00010282
Oct 03 08:50:07 gentoo-desktop kernel: RAX: ffff88810c1e2a58 RBX: 0000000000000001 RCX: ffffc90006fcec00
Oct 03 08:50:07 gentoo-desktop kernel: RDX: fffffffffffffff0 RSI: ffff88811a67b008 RDI: ffff88811a67b900
Oct 03 08:50:07 gentoo-desktop kernel: RBP: ffff888103ef5bf0 R08: 0000000000000000 R09: 0000000000000020
Oct 03 08:50:07 gentoo-desktop kernel: R10: ffff888103ef5c34 R11: ffffffffc06c1f20 R12: ffff88810c1e2a58
Oct 03 08:50:07 gentoo-desktop kernel: R13: ffff88811a67b900 R14: ffff88811a67b008 R15: 0000000000000000
Oct 03 08:50:07 gentoo-desktop kernel: FS:  0000000000000000(0000) GS:ffff88889fa00000(0000) knlGS:0000000000000000
Oct 03 08:50:07 gentoo-desktop kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 03 08:50:07 gentoo-desktop kernel: CR2: ffffc90006fcec04 CR3: 00000003bb82a003 CR4: 00000000001706f0
Oct 03 08:50:07 gentoo-desktop kernel: Call Trace:
Oct 03 08:50:07 gentoo-desktop kernel:  <TASK>
Oct 03 08:50:07 gentoo-desktop kernel:  ? __die+0x1f/0x70
Oct 03 08:50:07 gentoo-desktop kernel:  ? page_fault_oops+0x141/0x440
Oct 03 08:50:07 gentoo-desktop kernel:  ? search_module_extables+0x33/0x60
Oct 03 08:50:07 gentoo-desktop kernel:  ? search_bpf_extables+0x5b/0x80
Oct 03 08:50:07 gentoo-desktop kernel:  ? fixup_exception+0x22/0x310
Oct 03 08:50:07 gentoo-desktop kernel:  ? exc_page_fault+0x147/0x150
Oct 03 08:50:07 gentoo-desktop kernel:  ? asm_exc_page_fault+0x22/0x30
Oct 03 08:50:07 gentoo-desktop kernel:  ? _nv035574rm+0x410/0x540 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel:  ? _nv035574rm+0x4cd/0x540 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel:  ? _nv036737rm+0x19e/0x330 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel:  ? _nv031308rm+0x24/0xc0 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel:  ? _nv022578rm+0x295/0x430 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel:  ? _nv033274rm+0x63/0x490 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel:  ? _nv033274rm+0x336/0x490 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel:  ? _nv033284rm+0x167/0x1d0 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel:  ? _nv023454rm+0x1d3/0xaa0 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel:  ? __pfx_irq_thread_fn+0x10/0x10
Oct 03 08:50:07 gentoo-desktop kernel:  ? rm_isr_bh+0x20/0x70 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel:  ? nvidia_isr_kthread_bh+0x1b/0x870 [nvidia]
Oct 03 08:50:07 gentoo-desktop kernel:  ? irq_thread_fn+0x1f/0x60
Oct 03 08:50:07 gentoo-desktop kernel:  ? irq_thread+0xd8/0x190
Oct 03 08:50:07 gentoo-desktop kernel:  ? __pfx_irq_thread_dtor+0x10/0x10
Oct 03 08:50:07 gentoo-desktop kernel:  ? __pfx_irq_thread+0x10/0x10
Oct 03 08:50:07 gentoo-desktop kernel:  ? kthread+0xe4/0x110
Oct 03 08:50:07 gentoo-desktop kernel:  ? __pfx_kthread+0x10/0x10
Oct 03 08:50:07 gentoo-desktop kernel:  ? ret_from_fork+0x30/0x50
Oct 03 08:50:07 gentoo-desktop kernel:  ? __pfx_kthread+0x10/0x10
Oct 03 08:50:07 gentoo-desktop kernel:  ? ret_from_fork_asm+0x1b/0x30
Oct 03 08:50:07 gentoo-desktop kernel:  </TASK>
Oct 03 08:50:07 gentoo-desktop kernel: Modules linked in: nvidia_uvm(PO) snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_intel snd_intel_dspcfg kvm_intel snd_hda_codec rapl snd_hwdep snd_hda_core intel_cstate vfat fat intel_uncore i2c_i801 snd_pcm i2c_smbus snd_timer snd e1000e lpc_ich soundcore fuse loop nfnetlink dm_crypt trusted nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) crct10dif_pclmul
Oct 03 08:50:07 gentoo-desktop kernel:  ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 drm_kms_helper nvme drm nvme_core
Oct 03 08:50:07 gentoo-desktop kernel: CR2: ffffc90006fcec04
Oct 03 08:50:07 gentoo-desktop kernel: ---[ end trace 0000000000000000 ]---

I'll stick with 550 for now.

#137 2024-11-16 07:22:50

nu_kru
Member
Registered: 2009-12-14
Posts: 8

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

After a long time pinned to nvidia 535 and kernel 6.5, I tried nvidia 565 with kernel 6.11, and with KDE and Wayland it works (on Xorg I get a black screen after resume, although I'm not sure whether that's an issue with SDDM).

I have a GTX 970.

After a week, everything is fine except Firefox: I had to set widget.wayland.vsync.enabled to false, otherwise YouTube videos get 1-second stutters (no stuttering since disabling that option). Firefox under Xwayland crashes every time I close a window.

[julio@ssdazathoth ~]$ systemctl list-unit-files | grep nvidia
nvidia-hibernate.service                     enabled         disabled
nvidia-persistenced.service                  enabled         disabled
nvidia-powerd.service                        disabled        disabled
nvidia-resume.service                        enabled         disabled
nvidia-suspend.service                       enabled         disabled

[julio@ssdazathoth ~]$ cat /etc/modprobe.d/nvidia.conf
options nvidia_drm modeset=1 fbdev=1
options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/var/tmp

[julio@ssdazathoth ~]$ cat /usr/lib/systemd/system/systemd-suspend.service.d/10-nvidia-no-freeze-session.conf 
[Service]
Environment="SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=false"

[julio@ssdazathoth ~]$ cat /usr/lib/systemd/system/systemd-homed.service.d/10-nvidia-no-freeze-session.conf
[Service]
Environment="SYSTEMD_HOME_LOCK_FREEZE_SESSION=false"
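The posts in this thread compare `systemctl list-unit-files | grep nvidia` output by eye. As a minimal sketch, the same check can be scripted. check_nvidia_sleep_units is a hypothetical helper name; it only parses text, assuming the column layout shown above (unit name first, current state second), and it does not look at the drop-in files:

```shell
#!/bin/sh
# Hypothetical helper: read `systemctl list-unit-files` output on stdin and
# report the state of the three NVIDIA sleep units. Exits non-zero if any of
# them is not "enabled" (or missing from the input).
check_nvidia_sleep_units() {
    awk '
        # $1 = unit file name, $2 = current state
        $1 ~ /^nvidia-(suspend|resume|hibernate)\.service$/ { state[$1] = $2 }
        END {
            rc = 0
            n = split("nvidia-suspend.service nvidia-resume.service nvidia-hibernate.service", want, " ")
            for (i = 1; i <= n; i++) {
                s = (want[i] in state) ? state[want[i]] : "missing"
                print want[i] ": " s
                if (s != "enabled") rc = 1
            }
            exit rc
        }'
}

# Usage on a live system:
#   systemctl list-unit-files 'nvidia-*' | check_nvidia_sleep_units
```

A non-zero exit status means at least one of the three units is disabled or absent.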

#138 2024-12-06 22:18:57

Tharbad
Member
Registered: 2016-02-27
Posts: 283

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

Can anyone confirm that this bug is fixed, so I can stop using nvidia 550?

#139 2024-12-07 00:15:37

contessaptv
Member
Registered: 2024-11-17
Posts: 1

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

Still having the same issue with linux 6.12 and nvidia 565 on a laptop with an RTX 4070. I'm able to suspend, and on resume the keyboard backlight lights up and the fans start spinning, but the screen stays black and I can't switch to a TTY. Not really sure what else I can do at this point other than reverting to an older nvidia driver; I've been trying solutions for weeks. For now I've disabled sleep on lid close; hopefully a fix will turn up soon.

#140 2024-12-08 00:21:04

Tharbad
Member
Registered: 2016-02-27
Posts: 283

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

I'm on 550 with kernel 6.11.9. It seems that 6.12 requires something newer than 550.

#141 2024-12-12 00:04:09

zephyr42
Member
Registered: 2024-12-07
Posts: 1

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

Reinstalled a few weeks ago and I still can't get it to wake up from suspend. nvidia 565.77-5, linux 6.12.4.arch1-1; in the logs the system goes to suspend and then nothing else. It never wakes. Enabling or disabling PreserveVideoMemoryAllocation doesn't seem to make a difference, but I'm a bit of a noob, so I might have messed something else up. 4070 Ti, dual-boot on different drives, encrypted drive, no Secure Boot, installed with archinstall.
No TTY; the keyboard lights turn on, but apart from that nothing, and I have to reset the PC.

systemctl list-unit-files | grep nvidia
nvidia-hibernate.service                     enabled         disabled
nvidia-persistenced.service                  disabled        disabled
nvidia-powerd.service                        disabled        disabled
nvidia-resume.service                        enabled         disabled
nvidia-suspend.service                       enabled         disabled

Without PreserveVideoMemoryAllocation
https://0x0.st/Xhh-.txt

With it set via GRUB:

head -n 10  /etc/default/grub
# GRUB boot loader configuration

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Arch"
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet nvidia.NVreg_PreserveVideoMemoryAllocations=1"
GRUB_CMDLINE_LINUX="cryptdevice=UUID=fc7eeee8-4414-4a27-b8cf-de0cf10c13de:cryptlvm root=/dev/ArchinstallVg/root rootfstype=btrfs"

https://0x0.st/XhhR.txt

Both just end with:

Dec 11 23:31:27 archlinux systemd[1]: nvidia-suspend.service: Deactivated successfully.
Dec 11 23:31:27 archlinux systemd[1]: Finished NVIDIA system suspend actions.
Dec 11 23:31:27 archlinux systemd[1]: nvidia-suspend.service: Consumed 910ms CPU time, 587.5M memory peak.
Dec 11 23:31:27 archlinux systemd[1]: Starting System Suspend...
Dec 11 23:31:27 archlinux systemd-sleep[2319]: User sessions remain unfrozen on explicit request ($SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=0).
Dec 11 23:31:27 archlinux systemd-sleep[2319]: This is not recommended, and might result in unexpected behavior, particularly
Dec 11 23:31:27 archlinux systemd-sleep[2319]: in suspend-then-hibernate operations or setups with encrypted home directories.
Dec 11 23:31:27 archlinux systemd-sleep[2319]: Performing sleep operation 'suspend'
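Since the option is set in different places across this thread (modprobe.d in most posts, the kernel command line in the GRUB line above), it can help to confirm what the running kernel actually received. A minimal sketch, with preserve_vram_from_cmdline as a hypothetical helper name: it only inspects the kernel command line string, so a value set via /etc/modprobe.d will show up as "unset" here. On a typical Arch GRUB setup, edits to /etc/default/grub only take effect after regenerating the config with grub-mkconfig -o /boot/grub/grub.cfg, and the driver typically also lists its effective settings in /proc/driver/nvidia/params.

```shell
#!/bin/sh
# Hypothetical helper: print the value of
# nvidia.NVreg_PreserveVideoMemoryAllocations found in a kernel command line
# string, or "unset" if it is not present. If the option appears more than
# once, the last occurrence is printed.
preserve_vram_from_cmdline() {
    val=unset
    for word in $1; do          # unquoted on purpose: split on whitespace
        case $word in
            nvidia.NVreg_PreserveVideoMemoryAllocations=*)
                val=${word#*=}  # keep only what follows the first '='
                ;;
        esac
    done
    printf '%s\n' "$val"
}

# Usage on the running system:
#   preserve_vram_from_cmdline "$(cat /proc/cmdline)"
```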

Last edited by zephyr42 (2024-12-12 00:08:16)

#142 2024-12-20 07:38:00

Tharbad
Member
Registered: 2016-02-27
Posts: 283

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

According to the NVIDIA 565 feedback thread, this bug is fixed by kernel 6.12.
It should be fixed in kernel 6.12.1 according to this

According to zephyr42, it came back in 6.12.4. Can anyone else confirm?
I'm going to check 6.12.1.

#143 2024-12-20 07:40:34

EnzephaloN
Member
Registered: 2024-02-29
Posts: 24

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

My ThinkPad P1 Gen 4 has had broken resume from hibernation since kernel 6.11.

I tried many things; nothing worked. The nvidia-* services are enabled. I use the nvidia-open-dkms drivers.
Yesterday I removed all nvidia entries from MODULES in /etc/mkinitcpio.conf. I kept "nvidia_drm.modeset=1 nvidia_drm.fbdev=1" in /etc/default/grub and the PreserveVideoMemoryAllocation settings in /etc/modprobe.d/nvidia.conf.

After a reboot I tried hibernate -> resume, and it works!

#144 2024-12-20 08:53:46

seth
Member
Registered: 2012-09-03
Posts: 59,897

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

The 6.12.{0,1,2} build had the simpledrm hack disabled - people might simply have been running on the simpledrm device by accident.
Also https://forums.developer.nvidia.com/t/5 … 310777/354 seems to be about https://bbs.archlinux.org/viewtopic.php?id=301357&p=3 and not related to system sleeps?

#145 2024-12-20 09:30:15

Tharbad
Member
Registered: 2016-02-27
Posts: 283

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

6.12.1 with the latest 565 (565.77-2) is working, with one caveat: after resuming from suspend and logging in, it takes up to 2 minutes for the system to become responsive.
There's nothing relevant in the logs except the Discord app reporting that its GPU thread crashed and restarted.

Tried removing options nvidia NVreg_PreserveVideoMemoryAllocations=1 but things just got worse: I can login but I can't use the GUI.

#146 2024-12-20 09:36:33

seth
Member
Registered: 2012-09-03
Posts: 59,897

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

Do you have a journal for such run (w/ NVreg_PreserveVideoMemoryAllocations=1 in place)
"takes up to 2 min after login" would be 4 dbus or 1 dbus and one systemd timeout… or the kernel timeout (120s)

#147 2024-12-20 10:53:09

Tharbad
Member
Registered: 2016-02-27
Posts: 283

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

seth wrote:

Do you have a journal for such run (w/ NVreg_PreserveVideoMemoryAllocations=1 in place)
"takes up to 2 min after login" would be 4 dbus or 1 dbus and one systemd timeout… or the kernel timeout (120s)

Searched for "timeout", and it seems some other apps had timeouts of 30s to 45s. Didn't find any dbus timeout.
I don't know what "4 dbus or 1 dbus and one systemd timeout… or the kernel timeout" means.

Last edited by Tharbad (2024-12-20 11:12:14)

#148 2024-12-20 14:38:09

seth
Member
Registered: 2012-09-03
Posts: 59,897

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

If you send a dbus message and get no response, it takes 25s until the server says "ok, not coming anymore".
Systemd services have a 90s default timeout until the failing job gets killed.
The kernel has a 120s timeout until it triggers a slowpath oops.

The rest is math for "up to 2 minutes".

Grepping for "timeout" will not necessarily cut it. You'll actually have to look at the critical period or post it for review.
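The arithmetic behind "up to 2 minutes", using only the three default values quoted above, as a small shell sketch (the variable and function names are made up for illustration):

```shell
#!/bin/sh
# Default timeouts as quoted in the post above, in seconds.
DBUS=25      # unanswered dbus message
SYSTEMD=90   # default systemd service timeout
KERNEL=120   # kernel timeout

# Sum any number of timeouts passed as arguments.
total() {
    sum=0
    for t in "$@"; do
        sum=$((sum + t))
    done
    echo "$sum"
}

echo "4 dbus timeouts:    $(total "$DBUS" "$DBUS" "$DBUS" "$DBUS")s"   # 100s
echo "1 dbus + 1 systemd: $(total "$DBUS" "$SYSTEMD")s"               # 115s
echo "kernel timeout:     $(total "$KERNEL")s"                        # 120s
```

Each combination lands between 100 and 120 seconds, i.e. at most 2 minutes, which is why a stall of roughly that length is worth reading in the journal as a whole rather than searching for a single "timeout" line.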

#149 2024-12-21 09:36:47

Tharbad
Member
Registered: 2016-02-27
Posts: 283

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

Log from this morning: https://pastebin.com/PtrEh3Mb

Had to kill plasma as it was stuck and didn't return at all.

#150 2024-12-21 18:54:15

seth
Member
Registered: 2012-09-03
Posts: 59,897

Re: NVIDIA - cannot resume from suspend with PreserveVideoMemoryAllocation

Despite

Dec 20 21:33:42 <PC-Name> systemd[1]: Starting NVIDIA system suspend actions...
Dec 20 21:33:42 <PC-Name> suspend[171757]: nvidia-suspend.service
Dec 20 21:33:42 <PC-Name> logger[171757]: <13>Dec 20 21:33:42 suspend: nvidia-suspend.service
Dec 20 21:33:44 <PC-Name> systemd[1]: nvidia-suspend.service: Deactivated successfully.
Dec 20 21:33:44 <PC-Name> systemd[1]: Finished NVIDIA system suspend actions.
Dec 20 21:33:44 <PC-Name> systemd[1]: nvidia-suspend.service: Consumed 1.922s CPU time, 2.3G memory peak.
Dec 21 11:16:03 <PC-Name> systemd[1]: Starting NVIDIA system resume actions...
Dec 21 11:16:03 <PC-Name> suspend[172666]: nvidia-resume.service
Dec 21 11:16:03 <PC-Name> logger[172666]: <13>Dec 21 11:16:03 suspend: nvidia-resume.service
Dec 21 11:16:03 <PC-Name> systemd[1]: nvidia-resume.service: Deactivated successfully.
Dec 21 11:16:03 <PC-Name> systemd[1]: Finished NVIDIA system resume actions.

there's

Dec 21 11:16:21 <PC-Name> kscreenlocker_greet[171713]: QRhiGles2: Context is lost.
Dec 21 11:16:21 <PC-Name> kscreenlocker_greet[171713]: Graphics device lost, cleaning up scenegraph and releasing RHI
Dec 21 11:16:21 <PC-Name> plasmashell[4396]: QRhiGles2: Context is lost.
Dec 21 11:16:58 <PC-Name> chromium[10144]: [10144:10144:1221/111658.313034:ERROR:shared_context_state.cc(1266)] SharedContextState context lost via ARB/EXT_robustness. Reset status = GL_GUILTY_CONTEXT_RESET_KHR
Dec 21 11:17:01 <PC-Name> kwin_x11[4372]: kwin_core: XCB error: 3 (BadWindow), sequence: 49772, resource id: 161483471, major code: 129 (SHAPE), minor code: 6 (Input)
Dec 21 11:19:16 <PC-Name> ferdium[8183]: [8183:1221/111916.576860:ERROR:shared_context_state.cc(1266)] SharedContextState context lost via ARB/EXT_robustness. Reset status = GL_GUILTY_CONTEXT_RESET_KHR
Dec 21 11:19:16 <PC-Name> ferdium[8183]: [8183:1221/111916.584176:ERROR:gpu_service_impl.cc(1161)] Exiting GPU process because some drivers can't recover from errors. GPU process will restart shortly.

Had to kill plasma as it was stuck and didn't return at all.

The process responded to a standard SIGTERM though (kill default value)?

https://wiki.archlinux.org/title/NVIDIA … er_suspend
Do you set a special NVreg_TemporaryFilePath ?
Do you have enough (free) RAM to use a tmpfs?
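On the free-space question: NVIDIA's power management documentation asks for an NVreg_TemporaryFilePath location with enough free space to hold the video memory allocations being saved, and on a tmpfs that space comes out of RAM. A rough pre-check, assuming the saved allocations can be as large as the card's total VRAM (an upper bound, not a measurement), is to compare the two numbers. enough_space_for_vram is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical pre-check: does the NVreg_TemporaryFilePath location have at
# least as much free space as the GPU has video memory? Both values in MiB.
# Assumption: total VRAM is used as a worst-case size for the saved data.
enough_space_for_vram() {
    free_mib=$1
    vram_mib=$2
    if [ "$free_mib" -ge "$vram_mib" ]; then
        echo "ok: ${free_mib} MiB free >= ${vram_mib} MiB VRAM"
        return 0
    fi
    echo "too small: ${free_mib} MiB free < ${vram_mib} MiB VRAM"
    return 1
}

# Usage on a live system (/var/tmp is the path used in the configs in this thread):
#   free=$(df --output=avail -BM /var/tmp | tail -n 1 | tr -dc '0-9')
#   vram=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
#   enough_space_for_vram "$free" "$vram"
```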

What happens if you disable the suspend/resume services and add "nvidia.NVreg_PreserveVideoMemoryAllocations=0" to the kernel parameters?

In case of

Tried removing options nvidia NVreg_PreserveVideoMemoryAllocations=1 but things just got worse: I can login but I can't use the GUI.

please elaborate on why you can't use the GUI and, if in doubt, post the Xorg log reflecting such an attempt.
