
#1 2025-05-07 04:16:46

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

Re: Resume issue - nvme related ?

I'm also running into this issue on the most up-to-date packages:

pacman -Q linux nvidia-open
linux 6.14.4.arch1-2
nvidia-open 570.144-3

I have an RTX 2070 Super.

My system is completely unresponsive when I wake it from sleep, and I cannot switch TTYs. Very oddly, this issue only started occurring when I upgraded my CPU from a Ryzen 3 3100 to a Ryzen 5 5500. I upgraded my system with `pacman -Syu` around the same time, but the kernel/nvidia versions weren't significantly changed. In any case, I tried reverting versions with no effect. Since then, I've also tried the LTS kernel and nvidia driver, and my problem is the same. Previously I was using the `nvidia` driver (not `nvidia-open`), but I'm having the exact same issue with both.

I suspect some configuration changed on my system that isn't related to package versions, but I can't for the life of me figure out what it could be. I also use picom, which broke during my version hopping; I could only fix it by changing its backend (from `glx` to `egl`). My desktop environment also seems less responsive now (e.g. my kitty terminal seems to lag more when I type). Maybe this is just me going insane :P. At least games still run at a pretty good fps.

I can go into `s2idle` just fine, but `deep` sleep does not work.  I want to get `deep` sleep working since `s2idle` does not turn off my case fans.
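For context on `s2idle` vs `deep`: the kernel lists the suspend variants it supports in `/sys/power/mem_sleep` and brackets the one in use, e.g. `s2idle [deep]`. A minimal sketch of a helper that prints the active variant; `current_mem_sleep` is a hypothetical name, and `deep` only shows up in the list if the firmware offers S3:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the suspend variant the kernel currently uses.
# /sys/power/mem_sleep lists the supported variants and wraps the active one
# in brackets, e.g. "s2idle [deep]".
current_mem_sleep() {
    local file="${1:-/sys/power/mem_sleep}"
    local line
    line=$(<"$file") || return 1
    # Print only the word between the brackets.
    sed -n 's/.*\[\([^]]*\)].*/\1/p' <<<"$line"
}
```

To try `deep` for the current boot only: `echo deep | sudo tee /sys/power/mem_sleep`. To make it the default, the `mem_sleep_default=deep` kernel parameter can be added to the bootloader config.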

`systemctl hibernate` seems to work as expected (although it does log me out of my X session).

After a hard shutdown following a failed resume, I get almost no logs in journalctl: http://0x0.st/8Jjh.txt. (Btw, in that log I noticed an `nvidia_drm.modeset invalid option "l"` error. Apparently it was set to `l` instead of `1` in my GRUB config. I fixed this and the error went away, but my problem persists.)
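Since an `l` typed for a `1` is easy to miss, here is a minimal sketch that checks the `modeset` value on a kernel command line. The kernel treats `-` and `_` in module parameter names as equivalent, so both spellings are matched; `check_modeset` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the nvidia_drm.modeset value on a kernel command
# line. With no argument it reads the running kernel's /proc/cmdline.
# Returns 0 only if the value is exactly "1".
check_modeset() {
    local cmdline="${1:-$(</proc/cmdline)}"
    local -a args
    local arg value=""
    read -ra args <<<"$cmdline"
    for arg in "${args[@]}"; do
        case "$arg" in
            # The kernel accepts both "-" and "_" in module parameter names.
            nvidia_drm.modeset=* | nvidia-drm.modeset=*) value="${arg#*=}" ;;
        esac
    done
    if [[ "$value" == "1" ]]; then
        echo "modeset=1 (ok)"
        return 0
    fi
    echo "modeset='${value:-<unset>}' (expected 1)"
    return 1
}
```

On the running system, `cat /sys/module/nvidia_drm/parameters/modeset` should print `Y` when modesetting is enabled.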

I have also tried just waiting but my system does not recover after 3+ minutes.

> cat /etc/mkinitcpio.conf
# vim:set ft=sh
# MODULES
# The following modules are loaded before any boot hooks are
# run.  Advanced users may wish to specify all system modules
# in this array.  For instance:
#     MODULES=(usbhid xhci_hcd)
MODULES=()

# BINARIES
# This setting includes any additional binaries a given user may
# wish into the CPIO image.  This is run last, so it may be used to
# override the actual binaries included by a given hook
# BINARIES are dependency parsed, so you may safely ignore libraries
BINARIES=()

# FILES
# This setting is similar to BINARIES above, however, files are added
# as-is and are not parsed in any way.  This is useful for config files.
FILES=()

# HOOKS
# This is the most important setting in this file.  The HOOKS control the
# modules and scripts added to the image, and what happens at boot time.
# Order is important, and it is recommended that you do not change the
# order in which HOOKS are added.  Run 'mkinitcpio -H <hook name>' for
# help on a given hook.
# 'base' is _required_ unless you know precisely what you are doing.
# 'udev' is _required_ in order to automatically load modules
# 'filesystems' is _required_ unless you specify your fs modules in MODULES
# Examples:
##   This setup specifies all modules in the MODULES setting above.
##   No RAID, lvm2, or encrypted root is needed.
#    HOOKS=(base)
#
##   This setup will autodetect all modules for your system and should
##   work as a sane default
#    HOOKS=(base udev autodetect modconf block filesystems fsck)
#
##   This setup will generate a 'full' image which supports most systems.
##   No autodetection is done.
#    HOOKS=(base udev modconf block filesystems fsck)
#
##   This setup assembles a mdadm array with an encrypted root file system.
##   Note: See 'mkinitcpio -H mdadm_udev' for more information on RAID devices.
#    HOOKS=(base udev modconf keyboard keymap consolefont block mdadm_udev encrypt filesystems fsck)
#
##   This setup loads an lvm2 volume group.
#    HOOKS=(base udev modconf block lvm2 filesystems fsck)
#
##   This will create a systemd based initramfs which loads an encrypted root filesystem.
#    HOOKS=(base systemd autodetect modconf kms keyboard sd-vconsole sd-encrypt block filesystems fsck)
#
##   NOTE: If you have /usr on a separate partition, you MUST include the
#    usr and fsck hooks.
HOOKS=(base udev autodetect microcode modconf keyboard keymap consolefont block filesystems fsck)

# COMPRESSION
# Use this to compress the initramfs image. By default, zstd compression
# is used for Linux ≥ 5.9 and gzip compression is used for Linux < 5.9.
# Use 'cat' to create an uncompressed image.
#COMPRESSION="zstd"
#COMPRESSION="gzip"
#COMPRESSION="bzip2"
#COMPRESSION="lzma"
#COMPRESSION="xz"
#COMPRESSION="lzop"
#COMPRESSION="lz4"

# COMPRESSION_OPTIONS
# Additional options for the compressor
#COMPRESSION_OPTIONS=()

# MODULES_DECOMPRESS
# Decompress loadable kernel modules and their firmware during initramfs
# creation. Switch (yes/no).
# Enable to allow further decreasing image size when using high compression
# (e.g. xz -9e or zstd --long --ultra -22) at the expense of increased RAM usage
# at early boot.
# Note that any compressed files will be placed in the uncompressed early CPIO
# to avoid double compression.
#MODULES_DECOMPRESS="no"

I've also tried with `NVreg_PreserveVideoMemoryAllocations=0` with no effect.

> cat /etc/modprobe.d/nvidia.conf 
options nvidia_drm modeset=1
options nvidia_drm fbdev=1
options nvidia \
    NVreg_PreserveVideoMemoryAllocations=1 \
    NVreg_TemporaryFilePath=/var/tmp
blacklist nouveau
blacklist nvidiafb
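One way to confirm the `options nvidia` lines above actually reached the loaded driver: the module reports its parameters in `/proc/driver/nvidia/params`, assumed here to be one `Name: value` entry per line. A minimal sketch; `nv_param` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one NVIDIA driver parameter as
# reported by the loaded module (lines like "PreserveVideoMemoryAllocations: 1").
# Exits non-zero if the parameter is not listed.
nv_param() {
    local name="$1"
    local file="${2:-/proc/driver/nvidia/params}"
    awk -F': ' -v key="$name" \
        '$1 == key { print $2; found = 1 } END { exit !found }' "$file"
}
```

With the config above, `nv_param PreserveVideoMemoryAllocations` should print `1` and `nv_param TemporaryFilePath` should print `"/var/tmp"`.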

Last edited by Lone_Wolf (2025-06-04 12:02:04)


#2 2025-05-07 07:42:13

seth
Member
Registered: 2012-09-03
Posts: 64,153

Resume issue - nvme related ?

`systemctl hibernate` seems to work as expected (although it does log me out of my x session).

That means it doesn't: you don't have the resume hook in your initramfs.
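For reference: with a busybox-based initramfs (`base`/`udev` hooks, as in the config posted above), resuming from hibernation needs the `resume` hook placed after `udev` (commonly right after `filesystems`), while a `systemd`-based initramfs handles resume without it. A minimal sketch that checks an mkinitcpio config; `has_resume_support` is a hypothetical name, and it assumes all hooks are set on one uncommented `HOOKS=(...)` line in that single file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the HOOKS line of an mkinitcpio config
# contains either the "resume" hook or the "systemd" hook (a systemd-based
# initramfs handles resume without a separate hook).
# Assumes HOOKS=(...) is written on a single, uncommented line.
has_resume_support() {
    local conf="${1:-/etc/mkinitcpio.conf}"
    local hooks
    # Ignore commented examples; use the last active HOOKS= line.
    hooks=$(grep -E '^[[:space:]]*HOOKS=' "$conf" | tail -n 1)
    grep -qwE 'resume|systemd' <<<"$hooks"
}
```

For the config in the first post this returns non-zero. A fix along the lines of the Arch wiki example would be `HOOKS=(base udev autodetect microcode modconf keyboard keymap consolefont block filesystems resume fsck)` followed by `sudo mkinitcpio -P`; depending on the setup, a `resume=` kernel parameter pointing at the swap device may also be needed.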

this issue only started occurring when I upgraded my cpu from a Ryzen 3 3100 to a Ryzen 5 5500

May 06 20:37:20 frostyarch kernel: DMI: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P5.70 10/20/2022

nvidia-drm.modeset=l

"1", not "l" - get a better monospace font.

Can you sleep/wake fine when removing nvidia and running on nouveau?


#3 2025-05-07 11:46:37

Tharbad
Member
Registered: 2016-02-27
Posts: 296

Re: Resume issue - nvme related ?

Tried removing nvidia from modules; still not working.
Going back to 535 (550 has problems).

Edit: 535 also has problems (nvidia-utils has a different version).
550 problems: requires a new maintainer; needs a manual PKGBUILD edit for now.

Installed 525 instead.
Package list for your favorite AUR wrapper:

 lib32-nvidia-525xx-utils lib32-opencl-nvidia-525xx libxnvctrl-525xx nvidia-525xx-dkms nvidia-525xx-utils opencl-nvidia-525xx 

Last edited by Tharbad (2025-05-09 01:09:16)


#4 2025-05-07 23:48:46

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

Re: Resume issue - nvme related ?

Haha, point noted about the monospace font! That issue is resolved now. I thought I could edit my post to say so before anyone noticed, haha.

I tried running on nouveau, which I installed via:

sudo pacman -R lib32-nvidia-utils lib32-opencl-nvidia nvidia-open nvidia-settings nvidia-utils opencl-nvidia steam python-py3nvml gwe
sudo pacman -S mesa lib32-mesa
sudo mv /etc/modprobe.d/nvidia.conf .
sudo mv /etc/X11/xorg.conf .

And verified the installation via

> inxi -F
System:
  Host: frostyarch Kernel: 6.14.4-arch1-2 arch: x86_64 bits: 64
  Desktop: Qtile v: 0.31.1.dev0+g8666bfc8.d20250312 Distro: Arch Linux
Machine:
  Type: Desktop Mobo: ASRock model: B450M Pro4 serial: <superuser required>
    UEFI: American Megatrends v: P5.70 date: 10/20/2022
CPU:
  Info: 6-core model: AMD Ryzen 5 5500 bits: 64 type: MT MCP cache: L2: 3 MiB
  Speed (MHz): avg: 2918 min/max: 400/4268 cores: 1: 2918 2: 2918 3: 2918
    4: 2918 5: 2918 6: 2918 7: 2918 8: 2918 9: 2918 10: 2918 11: 2918 12: 2918
Graphics:
  Device-1: NVIDIA TU104 [GeForce RTX 2070 SUPER] driver: nouveau v: kernel
  Display: x11 server: X.Org v: 21.1.16 with: Xwayland v: 24.1.6 driver: X:
    loaded: modesetting unloaded: vesa dri: nouveau gpu: nouveau resolution:
    1: 2560x1440~60Hz 2: N/A
  API: EGL v: 1.5 drivers: nouveau,swrast
    platforms: gbm,x11,surfaceless,device
  API: OpenGL v: 4.5 compat-v: 4.3 vendor: mesa v: 25.0.5-arch1.1
    renderer: NV164
  Info: Tools: api: eglinfo,glxinfo x11: xdriinfo, xdpyinfo, xprop, xrandr

Many things seemed wonky with this install (for instance, only one of my monitors worked), but I was able to suspend and resume properly! At least kinda. Shortly after I resumed, my desktop environment became unresponsive (I could still move my mouse, but not click on anything), and although I could switch to TTYs, they only showed a blinking underscore. All my graphical programs also crashed. Oddly, I could still change workspaces and click on tray icons in this state. In any case, this was very different from the black screens I get on nvidia. Here is the journalctl from that boot: https://0x0.st/8JyJ.txt. I tried again with the same result; see https://0x0.st/8Jy3.txt.

As an aside, for all the people in this thread who have downgraded to an old nvidia driver version, what method did you use? I've been using https://gitlab.archlinux.org/archlinux/ … mmits/main as a guide to which nvidia drivers are compatible with which Linux kernels, but when using the `downgrade` script (https://aur.archlinux.org/packages/downgrade) I end up in a dependency mess where I need to match versions across a variety of packages. For example:

> sudo downgrade nvidia nvidia-utils lib32-nvidia-utils linux
:: Retrieving packages...
 lib32-nvidia-utils-550.90.07-1-x86_64                                        39.4 MiB  10.7 MiB/s 00:04 [##############################################################] 100%
 linux-6.9.3.arch1-1-x86_64                                                  133.9 MiB  18.2 MiB/s 00:07 [##############################################################] 100%
 nvidia-550.90.07-1-x86_64                                                    40.8 MiB  16.3 MiB/s 00:02 [##############################################################] 100%
 nvidia-utils-550.90.07-1-x86_64                                             220.9 MiB  17.5 MiB/s 00:13 [##############################################################] 100%
loading packages...
warning: downgrading package lib32-nvidia-utils (570.144-1 => 550.90.07-1)
warning: downgrading package linux (6.14.4.arch1-2 => 6.9.3.arch1-1)
warning: downgrading package nvidia-utils (570.144-3 => 550.90.07-1)
resolving dependencies...
looking for conflicting packages...
:: nvidia-550.90.07-1 and nvidia-open-570.144-3 are in conflict. Remove nvidia-open? [y/N] y

Packages (5) nvidia-open-570.144-3 [removal]  lib32-nvidia-utils-550.90.07-1  linux-6.9.3.arch1-1  nvidia-550.90.07-1  nvidia-utils-550.90.07-1

Total Installed Size:   971.20 MiB
Net Upgrade Size:      -211.23 MiB

:: Proceed with installation? [Y/n] 
(4/4) checking keys in keyring                                                                           [##############################################################] 100%
(4/4) checking package integrity                                                                         [##############################################################] 100%
(4/4) loading package files                                                                              [##############################################################] 100%
(4/4) checking for file conflicts                                                                        [##############################################################] 100%
error: failed to commit transaction (conflicting files)
nvidia-utils: /usr/lib/libnvidia-egl-gbm.so exists in filesystem (owned by egl-gbm)
nvidia-utils: /usr/lib/libnvidia-egl-gbm.so.1 exists in filesystem (owned by egl-gbm)
nvidia-utils: /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json exists in filesystem (owned by egl-gbm)
Errors occurred, no packages were upgraded.

I'm getting a sense there is a better way to do this.  Of course, ideally I wouldn't downgrade at all, given that everything was working fine before I upgraded my cpu (and maybe did something else inadvertently at the same time to cause this)...

Last edited by a-curious-crow (2025-05-08 00:25:02)


#5 2025-05-08 06:13:17

seth
Member
Registered: 2012-09-03
Posts: 64,153

Re: Resume issue - nvme related ?

downgraded to a old nvidia driver version, what method did you use

https://aur.archlinux.org/packages?O=0&K=535xx


#6 2025-05-08 10:23:15

bertieb
Member
Registered: 2023-11-29
Posts: 19

Re: Resume issue - nvme related ?

a-curious-crow wrote:

(...) Very oddly, this issue only started occurring when I upgraded my cpu from a Ryzen 3 3100 to a Ryzen 5 5500 (...)

As another anec-data point, I am also on AM4, running a 5800X on a B550 chipset. I'm not convinced that's a factor, but I'll check when I changed over from an Intel CPU/board and compare dates just in case.

I still get some rare "clusters" of suspend issues. As has just happened, I can go for weeks without them and then get a couple within the space of a day or two. This spanned a system update (i.e. it happened both before and after it), but since kernel and nvidia-related packages don't get upgraded, I'm not sure that's critical.

$ pacman -Q nvidia; uname -r; nvidia-debugdump --list
nvidia-535xx-dkms 535.183.01-2
6.1.91-1-lts61
Found 1 NVIDIA devices
        Device ID:              0
        Device name:            NVIDIA GeForce GTX 970   (*PrimaryCard)
        GPU internal ID:        GPU-bf73e6ad-0567-643c-5a9d-369b88e45323

Still responsive to ssh (as mentioned before).

Last boot: https://0x0.st/8JxW.txt

journal wrote:
May 08 10:22:54 zeus kernel: ------------[ cut here ]------------
May 08 10:22:54 zeus kernel: WARNING: CPU: 14 PID: 272752 at /var/lib/dkms/nvidia/535.183.01/build/nvidia/nv.c:3947 
nv_restore_user_channels+0x4e/0x1d0 [nvidia]
May 08 10:22:54 zeus kernel: Modules linked in: 
tls snd_seq_dummy snd_hrtimer snd_seq dm_snapshot dm_bufio nft_masq nft_ct nft_reject_ipv4 nf_reject_ipv4 nft_reject act_csum cls_u32 sch_htb 
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c bridge stp llc nct6775 nct6775_core hwmon_vid 
hid_logitech_hidpp joydev mousedev snd_usb_audio  snd_usbmidi_lib snd_rawmidi snd_seq_device hid_logitech_dj mc r8169 realtek 
mdio_devres libphy intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_hda_codec_realtek ccp snd_hda_codec_generic 
snd_hda_codec_hdmi kvm snd_hda_intel snd_intel_dspcfg irqbypass snd_intel_sdw_acpi crct10dif_pclmul crc32_pclmul snd_hda_codec 
polyval_clmulni polyval_generic eeepc_wmi gf128mul asus_wmi snd_hda_core ghash_clmulni_intel ledtrig_audio sha512_ssse3 sparse_keymap 
snd_hwdep sha256_ssse3 platform_profile snd_pcm sha1_ssse3 i8042 aesni_intel snd_timer serio nvidia_drm(POE) crypto_simd cryptd 
usbhid rfkill wmi_bmof nvidia_modeset(POE) snd sp5100_tco rapl
May 08 10:22:54 zeus kernel:  video soundcore k10temp pcspkr i2c_piix4 wmi gpio_amdpt acpi_cpufreq gpio_generic mac_hid
vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE)  nvidia_uvm(POE) nvidia(POE) sg crypto_user loop fuse nfnetlink bpf_preload ip_tables x_tables
ext4 crc32c_generic crc16 mbcache jbd2 dm_mod nvme crc32c_intel nvme_core xhci_pci xhci_pci_renesas nvme_common
May 08 10:22:54 zeus kernel: CPU: 14 PID: 272752 Comm: nvidia-sleep.sh Tainted: P           OE      6.1.91-1-lts61 #1 
d05288a9a86238b04a93de064045849480ab030f
May 08 10:22:54 zeus kernel: Hardware name: ASUS System Product Name/PRIME B550-PLUS, BIOS 2006 03/19/2021
May 08 10:22:54 zeus kernel: RIP: 0010:nv_restore_user_channels+0x4e/0x1d0 [nvidia]
May 08 10:22:54 zeus kernel: Code: 24 c0 05 00 00 4c 89 ef e8 df 2a ba db f6 43 10 01 74 73 48 89 de 31 ff e8 ff 1d a9 
00 41 89 c6 85 c0 0f 84 3a 01 00 00 
31 ed <0f> 0b 49 81 c4 e8 06 00 00 4c 89 e7 e8 b1 2a ba db be 01 00 00 00
May 08 10:22:54 zeus kernel: RSP: 0018:ffffaa8604adf9d8 EFLAGS: 00010206
May 08 10:22:54 zeus kernel: RAX: 0000000000000003 RBX: ffff989286407800 RCX: ffffaa8604adf958
May 08 10:22:54 zeus kernel: RDX: ffffaa86019cfe60 RSI: 0000000000000246 RDI: ffffaa8604adf908
May 08 10:22:54 zeus kernel: RBP: ffff9895a0953000 R08: 0000000000000000 R09: ffff9895a0955f60
May 08 10:22:54 zeus kernel: R10: 000000700030f231 R11: 0000000000000000 R12: ffff989286407800
May 08 10:22:54 zeus kernel: R13: ffff989286407dc0 R14: 0000000000000003 R15: 0000000000000000
May 08 10:22:54 zeus kernel: FS:  00007fb275416b80(0000) GS:ffff98a16ed80000(0000) knlGS:0000000000000000
May 08 10:22:54 zeus kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 08 10:22:54 zeus kernel: CR2: 0000561449f23218 CR3: 00000008a2cea000 CR4: 0000000000750ee0
May 08 10:22:54 zeus kernel: PKRU: 55555554
May 08 10:22:54 zeus kernel: Call Trace:
May 08 10:22:54 zeus kernel:  <TASK>
May 08 10:22:54 zeus kernel:  ? nv_restore_user_channels+0x4e/0x1d0 [nvidia b2cf649bae6446ec4b5dfc55ba36538945a1757f]
May 08 10:22:54 zeus kernel:  ? __warn+0x7d/0xd0
May 08 10:22:54 zeus kernel:  ? nv_restore_user_channels+0x4e/0x1d0 [nvidia b2cf649bae6446ec4b5dfc55ba36538945a1757f]
May 08 10:22:54 zeus kernel:  ? report_bug+0x108/0x150
May 08 10:22:54 zeus kernel:  ? handle_bug+0x3c/0x80
May 08 10:22:54 zeus kernel:  ? exc_invalid_op+0x17/0x70
May 08 10:22:54 zeus kernel:  ? asm_exc_invalid_op+0x1a/0x20
May 08 10:22:54 zeus kernel:  ? nv_restore_user_channels+0x4e/0x1d0 [nvidia b2cf649bae6446ec4b5dfc55ba36538945a1757f]
May 08 10:22:54 zeus kernel:  ? nv_restore_user_channels+0x132/0x1d0 [nvidia b2cf649bae6446ec4b5dfc55ba36538945a1757f]
May 08 10:22:54 zeus kernel:  nv_set_system_power_state+0xe9/0x470 [nvidia b2cf649bae6446ec4b5dfc55ba36538945a1757f]
May 08 10:22:54 zeus kernel:  nv_procfs_write_suspend+0xef/0x170 [nvidia b2cf649bae6446ec4b5dfc55ba36538945a1757f]
May 08 10:22:54 zeus kernel:  proc_reg_write+0x5a/0xa0
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  vfs_write+0xe9/0x3e0
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  ? notify_change+0x265/0x570
May 08 10:22:54 zeus kernel:  ? __vfs_getxattr+0x2e/0x80
May 08 10:22:54 zeus kernel:  ksys_write+0x6d/0xf0
May 08 10:22:54 zeus kernel:  do_syscall_64+0x5a/0x80
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  ? get_page_from_freelist+0x14ef/0x1660
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  ? __mod_memcg_lruvec_state+0x45/0x90
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  ? __mod_lruvec_page_state+0x99/0x140
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  ? page_add_new_anon_rmap+0x74/0x130
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  ? __handle_mm_fault+0xe38/0xf80
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  ? handle_mm_fault+0xdd/0x2d0
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  ? do_user_addr_fault+0x225/0x560
May 08 10:22:54 zeus kernel:  ? srso_alias_return_thunk+0x5/0x7f
May 08 10:22:54 zeus kernel:  ? exc_page_fault+0x7c/0x180
May 08 10:22:54 zeus kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
May 08 10:22:54 zeus kernel: RIP: 0033:0x7fb275519006
May 08 10:22:54 zeus kernel: Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00
 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08
May 08 10:22:54 zeus kernel: RSP: 002b:00007ffc66244830 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
May 08 10:22:54 zeus kernel: RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007fb275519006
May 08 10:22:54 zeus kernel: RDX: 0000000000000007 RSI: 0000561449f22e10 RDI: 0000000000000001
May 08 10:22:54 zeus kernel: RBP: 00007ffc66244850 R08: 0000000000000000 R09: 0000000000000000
May 08 10:22:54 zeus kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000007
May 08 10:22:54 zeus kernel: R13: 0000561449f22e10 R14: 00007fb27566e5c0 R15: 0000000000000000
May 08 10:22:54 zeus kernel:  </TASK>
May 08 10:22:54 zeus kernel: ---[ end trace 0000000000000000 ]---
...
May 08 10:22:54 zeus kernel: nvidia-modeset: ERROR: GPU:0: Failed to bind display engine notify context DMA: 0x1a 
(Ran out of a critical resource, other than memory [NV_ERR_INSUFFICIENT_RESOURCES])
May 08 10:22:54 zeus kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
May 08 10:22:54 zeus kernel: nvidia-modeset: ERROR: GPU:0: Failed to bind display engine notify context DMA: 0x1a 
(Ran out of a critical resource, other than memory [NV_ERR_INSUFFICIENT_RESOURCES])
May 08 10:22:54 zeus kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer


#7 2025-05-09 01:17:20

Tharbad
Member
Registered: 2016-02-27
Posts: 296

Re: Resume issue - nvme related ?

bertieb wrote:
a-curious-crow wrote:

(...) Very oddly, this issue only started occurring when I upgraded my cpu from a Ryzen 3 3100 to a Ryzen 5 5500 (...)

As another anec-data point I am also on AM4- running a 5800X on a B550 chipset. I'm not convinced that's a factor but I'll see when I changed over from an Intel CPU/board and compare dates just in case.

I still get some rare "clusters" of suspend issues. As has just happened, I can go for weeks without them and then get a couple within the space of a day or two. This was across (ie both before/after) a system update but since kernel and nvidia-related packages don't get upgraded I'm not sure that's critical.

$ pacman -Q nvidia; uname -r; nvidia-debugdump --list
nvidia-535xx-dkms 535.183.01-2
6.1.91-1-lts61
Found 1 NVIDIA devices
        Device ID:              0
        Device name:            NVIDIA GeForce GTX 970   (*PrimaryCard)
        GPU internal ID:        GPU-bf73e6ad-0567-643c-5a9d-369b88e45323

Still responsive to ssh (as mentioned before).

Interesting. I also have an AMD Ryzen 9 5950X (microcode up to date) on an X570 chipset.
Kernel is the latest 6.14.5-zen1-1-zen. I had weird problems when I stopped updating the kernel but kept updating core packages.
GPU is an NVIDIA GeForce RTX 4070 Ti SUPER.


#8 2025-05-09 01:18:29

Tharbad
Member
Registered: 2016-02-27
Posts: 296

Re: Resume issue - nvme related ?

On a different note:
Has anyone tried the open driver?
For my card, the Arch wiki recommends the closed one,
but maybe it can help?


#9 2025-05-09 03:58:30

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

Re: Resume issue - nvme related ?

Thank you so much, seth, for the AUR link! That is so much better than what I was trying. Unfortunately, like 7000k, I ran into some install issues building the 535 packages with the latest 6.14.5-arch1-1 kernel. I went to try the 550 packages, but it looks like they are also having issues. I don't want to mess with patching these packages at the moment, so I'll eagerly await 550 being fixed, then try that driver version.

I tried both open and normal nvidia, and had the same issue with both.  But it's worth a shot I'd say!  Maybe our problems are subtly different.


#10 2025-05-09 07:20:47

seth
Member
Registered: 2012-09-03
Posts: 64,153

Re: Resume issue - nvme related ?

What install issues?
There are none reported in the AUR comments, and the package got a version update in mid-April.


#11 2025-05-09 23:34:27

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

Re: Resume issue - nvme related ?

When I installed the 535 packages, my computer wouldn't boot. When I looked at journalctl, I saw:

systemd-modules-load[344]: Failed to find module 'nvidia-uvm'

Then after recovering my system and trying to reinstall 535, I saw this in /var/log/pacman.log:

[2025-05-08T20:47:51-0700] [ALPM] reinstalled lib32-opencl-nvidia-535xx (535.247.01-1)
[2025-05-08T20:47:51-0700] [ALPM] installed lib32-nvidia-535xx-utils (535.247.01-1)
[2025-05-08T20:47:51-0700] [ALPM] installed nvidia-535xx-dkms (535.247.01-1)
[2025-05-08T20:47:51-0700] [ALPM] reinstalled opencl-nvidia-535xx (535.247.01-1)
[2025-05-08T20:47:51-0700] [ALPM] transaction completed
[2025-05-08T20:47:51-0700] [ALPM] running '20-systemd-sysusers.hook'...
[2025-05-08T20:47:51-0700] [ALPM] running '30-systemd-daemon-reload-system.hook'...
[2025-05-08T20:47:51-0700] [ALPM] running '30-systemd-restart-marked.hook'...
[2025-05-08T20:47:51-0700] [ALPM] running '30-systemd-udev-reload.hook'...
[2025-05-08T20:47:52-0700] [ALPM] running '30-systemd-update.hook'...
[2025-05-08T20:47:52-0700] [ALPM] running '60-depmod.hook'...
[2025-05-08T20:47:54-0700] [ALPM] running '70-dkms-install.hook'...
[2025-05-08T20:47:54-0700] [ALPM-SCRIPTLET] ==> dkms install --no-depmod nvidia/535.247.01 -k 6.14.5-arch1-1
[2025-05-08T20:48:09-0700] [ALPM-SCRIPTLET] 
[2025-05-08T20:48:09-0700] [ALPM-SCRIPTLET] Error! Bad return status for module build on kernel: 6.14.5-arch1-1 (x86_64)
[2025-05-08T20:48:09-0700] [ALPM-SCRIPTLET] Consult /var/lib/dkms/nvidia/535.247.01/build/make.log for more information.
[2025-05-08T20:48:09-0700] [ALPM-SCRIPTLET] ==> WARNING: `dkms install --no-depmod nvidia/535.247.01 -k 6.14.5-arch1-1' exited 10
[2025-05-08T20:48:09-0700] [ALPM] running '90-mkinitcpio-install.hook'...

I assume this is what 7000k was referring to in his comment on this thread about needing to patch 535 to make it work.

As for 550, the latest instructions (as of a few days ago) on https://aur.archlinux.org/pkgbase/nvidia-550xx-dkms say you need to do some manual patching steps to make it work. I'm hoping someone incorporates this into the package itself soon.


#12 2025-05-11 09:06:31

Tharbad
Member
Registered: 2016-02-27
Posts: 296

Re: Resume issue - nvme related ?

a-curious-crow wrote:

When installing the 535 packages, my computer wouldn't boot.  When i looked at journalctl, I saw:

systemd-modules-load[344]: Failed to find module 'nvidia-uvm'

Then after recovering my system and trying to reinstall 535, I saw this in /var/log/pacman.log:

[2025-05-08T20:47:51-0700] [ALPM] reinstalled lib32-opencl-nvidia-535xx (535.247.01-1)
[2025-05-08T20:47:51-0700] [ALPM] installed lib32-nvidia-535xx-utils (535.247.01-1)
[2025-05-08T20:47:51-0700] [ALPM] installed nvidia-535xx-dkms (535.247.01-1)
[2025-05-08T20:47:51-0700] [ALPM] reinstalled opencl-nvidia-535xx (535.247.01-1)
[2025-05-08T20:47:51-0700] [ALPM] transaction completed
[2025-05-08T20:47:51-0700] [ALPM] running '20-systemd-sysusers.hook'...
[2025-05-08T20:47:51-0700] [ALPM] running '30-systemd-daemon-reload-system.hook'...
[2025-05-08T20:47:51-0700] [ALPM] running '30-systemd-restart-marked.hook'...
[2025-05-08T20:47:51-0700] [ALPM] running '30-systemd-udev-reload.hook'...
[2025-05-08T20:47:52-0700] [ALPM] running '30-systemd-update.hook'...
[2025-05-08T20:47:52-0700] [ALPM] running '60-depmod.hook'...
[2025-05-08T20:47:54-0700] [ALPM] running '70-dkms-install.hook'...
[2025-05-08T20:47:54-0700] [ALPM-SCRIPTLET] ==> dkms install --no-depmod nvidia/535.247.01 -k 6.14.5-arch1-1
[2025-05-08T20:48:09-0700] [ALPM-SCRIPTLET] 
[2025-05-08T20:48:09-0700] [ALPM-SCRIPTLET] Error! Bad return status for module build on kernel: 6.14.5-arch1-1 (x86_64)
[2025-05-08T20:48:09-0700] [ALPM-SCRIPTLET] Consult /var/lib/dkms/nvidia/535.247.01/build/make.log for more information.
[2025-05-08T20:48:09-0700] [ALPM-SCRIPTLET] ==> WARNING: `dkms install --no-depmod nvidia/535.247.01 -k 6.14.5-arch1-1' exited 10
[2025-05-08T20:48:09-0700] [ALPM] running '90-mkinitcpio-install.hook'...

I assume this is what 7000k was referring to in his comment on this thread about needing to patch 535 to make it work.

As far as 550, the latest instructions as of a few days ago on https://aur.archlinux.org/pkgbase/nvidia-550xx-dkms say you need to do some manual patch steps to make it work.  I'm hoping someone incorporates this into the package itself soon.

It seems you missed nvidia-utils...


#13 2025-05-11 16:41:21

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

Re: Resume issue - nvme related ?

I didn't post my entire log, just a snippet.  `nvidia-535xx-utils` was also installed.


#14 2025-05-11 18:59:10

seth
Member
Registered: 2012-09-03
Posts: 64,153

Re: Resume issue - nvme related ?

The curious bits will be in

[2025-05-08T20:48:09-0700] [ALPM-SCRIPTLET] Consult /var/lib/dkms/nvidia/535.247.01/build/make.log for more information.
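When a DKMS build fails like this, the compiler errors end up in the `make.log` named by the hook. A minimal sketch that pulls out the first few error lines; `dkms_errors` is a hypothetical helper, and the `/var/lib/dkms/<module>/<version>/build/make.log` layout is taken from the message above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the first compiler errors from a DKMS make.log.
# Usage: dkms_errors <module> <version> [dkms_root]
dkms_errors() {
    local module="$1" version="$2" root="${3:-/var/lib/dkms}"
    local log="$root/$module/$version/build/make.log"
    if [[ ! -s "$log" ]]; then
        echo "no (or empty) log at $log" >&2
        return 1
    fi
    # -n: prefix line numbers; -m 20: stop after 20 matching lines
    grep -n -m 20 -E 'error:|Error [0-9]+' "$log"
}
```

For the failure above that would be `dkms_errors nvidia 535.247.01`. `dkms status` lists which module versions are built/installed for which kernels, and a rebuild can be retried with `sudo dkms install nvidia/535.247.01 -k 6.14.5-arch1-1`.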


#15 2025-05-13 05:32:58

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

Re: Resume issue - nvme related ?

Ah, that log is empty right now, and I'd rather not rebuild and debug it if 550 may work soon.


#16 2025-05-13 09:19:15

bertieb
Member
Registered: 2023-11-29
Posts: 19

Re: Resume issue - nvme related ?

a-curious-crow wrote:

550 may work soon.

Is there something being worked on in 550 that will fix the suspend-related bug(s)? I haven't been following things in the NVIDIA forums lately.

I'd like to believe we'll get a resolution to an issue from 2023...


#17 2025-05-14 02:01:30

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

Re: Resume issue - nvme related ?

Nothing is being worked on for that driver version in terms of functionality.  I was referring to the fact that the aur package itself is currently broken, and hopefully will be fixed soon.


#18 2025-05-22 00:27:07

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

Re: Resume issue - nvme related ?

OK, so today I finally tried the 550 drivers. Upon first install, suspending did not work at all: my computer would try to sleep (the screens would go black for a couple of seconds) but wake right away. I found some useful errors in my `sudo journalctl` output that led me to https://bbs.archlinux.org/viewtopic.php?id=288181. After starting the nvidia systemd services from that post, my computer could sleep! Unfortunately, it could not wake again; the exact same problem I had before still occurs with these drivers.
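For reference, the NVIDIA sleep units in question ship with the driver utilities: `nvidia-suspend.service`, `nvidia-resume.service` and `nvidia-hibernate.service`, and NVIDIA's power-management docs pair them with `NVreg_PreserveVideoMemoryAllocations=1` (set in the modprobe config at the top of the thread). A minimal sketch that reports whether they are enabled; `check_nvidia_sleep_units` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whether each NVIDIA sleep unit is enabled.
# Returns non-zero if any of them is not enabled (or systemctl is missing).
check_nvidia_sleep_units() {
    local unit state rc=0
    for unit in nvidia-suspend.service nvidia-resume.service nvidia-hibernate.service; do
        # is-enabled prints the state and exits non-zero unless enabled
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || rc=1
        printf '%-26s %s\n' "$unit" "${state:-unknown}"
    done
    return "$rc"
}
```

If any of them show `disabled`, they can be enabled with `sudo systemctl enable nvidia-suspend.service nvidia-resume.service nvidia-hibernate.service`.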

My `sudo journalctl` from the last sleep attempt where I had to do a hard shut down and boot again:

May 21 17:11:54 frostyarch systemd-logind[650]: Suspend key pressed short.
May 21 17:11:54 frostyarch systemd-logind[650]: Suspending...
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.4940] manager: sleep: sleep requested (sleeping: no  enabled: yes)
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5060] manager: NetworkManager state is now ASLEEP
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5061] device (enp5s0): state change: activated -> deactivating (reason 'sleeping', >
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5075] device (enp5s0): state change: deactivating -> disconnected (reason 'sleeping>
May 21 17:11:54 frostyarch systemd-networkd[609]: Foreign process 'NetworkManager[647]' changed sysctl '/proc/sys/net/ipv6/conf/enp5s0/disable_ipv6' fr>
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5198] dhcp4 (enp5s0): canceled DHCP transaction
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5198] dhcp4 (enp5s0): activation: beginning transaction (timeout in 45 seconds)
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5198] dhcp4 (enp5s0): state changed no lease
May 21 17:11:54 frostyarch systemd-timesyncd[407]: No network connectivity, watching for changes.
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5549] device (enp5s0): state change: disconnected -> unmanaged (reason 'unmanaged-s>
May 21 17:11:54 frostyarch kernel: r8169 0000:05:00.0 enp5s0: Link is Down
May 21 17:11:54 frostyarch systemd-networkd[609]: enp5s0: Link DOWN
May 21 17:11:54 frostyarch systemd-networkd[609]: enp5s0: Lost carrier
May 21 17:11:54 frostyarch systemd-networkd[609]: enp5s0: DHCP lease lost
May 21 17:11:54 frostyarch systemd[1]: Reached target Sleep.
May 21 17:11:54 frostyarch systemd[1]: Starting NVIDIA system suspend actions...
May 21 17:11:54 frostyarch suspend[1095]: nvidia-suspend.service
May 21 17:11:54 frostyarch logger[1095]: <13>May 21 17:11:54 suspend: nvidia-suspend.service
May 21 17:11:55 frostyarch systemd[1]: nvidia-suspend.service: Deactivated successfully.
May 21 17:11:55 frostyarch systemd[1]: Finished NVIDIA system suspend actions.
May 21 17:11:55 frostyarch systemd[1]: nvidia-suspend.service: Consumed 412ms CPU time, 156.2M memory peak.
May 21 17:11:55 frostyarch systemd[1]: Starting System Suspend...
May 21 17:11:55 frostyarch systemd-sleep[1105]: User sessions remain unfrozen on explicit request ($SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=0).
May 21 17:11:55 frostyarch systemd-sleep[1105]: This is not recommended, and might result in unexpected behavior, particularly
May 21 17:11:50 frostyarch systemd-timesyncd[407]: Contacted time server 104.168.62.30:123 (2.arch.pool.ntp.org).
May 21 17:11:50 frostyarch systemd-timesyncd[407]: Initial clock synchronization to Wed 2025-05-21 17:11:50.873031 PDT.
May 21 17:11:50 frostyarch wireplumber[1039]: default: Failed to get percentage from UPower: org.freedesktop.DBus.Error.NameHasNoOwner
May 21 17:11:50 frostyarch wireplumber[1039]: spa.bluez5: BlueZ system service is not available
May 21 17:11:50 frostyarch wireplumber[1039]: wp-device: SPA handle 'api.libcamera.enum.manager' could not be loaded; is it installed?
May 21 17:11:50 frostyarch wireplumber[1039]: s-monitors-libcamera: PipeWire's libcamera SPA plugin is missing or broken. Some camera types may not be >
May 21 17:11:52 frostyarch NetworkManager[647]: <info>  [1747872712.5593] dhcp4 (enp5s0): state changed new lease, address=192.168.0.23, acd pending
May 21 17:11:52 frostyarch NetworkManager[647]: <info>  [1747872712.5595] dhcp4 (enp5s0): state changed new lease, address=192.168.0.23
May 21 17:11:52 frostyarch NetworkManager[647]: <info>  [1747872712.5601] policy: set 'Wired connection 2' (enp5s0) as default for IPv4 routing and DNS
May 21 17:11:52 frostyarch NetworkManager[647]: <warn>  [1747872712.5613] dns-sd-resolved[e59fcec9bcbdd810]: send-updates SetLinkDomains@2 failed: GDBu>
May 21 17:11:52 frostyarch NetworkManager[647]: <info>  [1747872712.5614] device (enp5s0): state change: ip-config -> ip-check (reason 'none', managed->
May 21 17:11:52 frostyarch systemd[1]: Starting Network Manager Script Dispatcher Service...
May 21 17:11:52 frostyarch systemd[1]: Started Network Manager Script Dispatcher Service.
May 21 17:11:52 frostyarch NetworkManager[647]: <info>  [1747872712.5917] device (enp5s0): state change: ip-check -> secondaries (reason 'none', manage>
May 21 17:11:52 frostyarch NetworkManager[647]: <info>  [1747872712.5918] device (enp5s0): state change: secondaries -> activated (reason 'none', manag>
May 21 17:11:52 frostyarch NetworkManager[647]: <info>  [1747872712.5920] manager: NetworkManager state is now CONNECTED_SITE
May 21 17:11:52 frostyarch NetworkManager[647]: <info>  [1747872712.5921] device (enp5s0): Activation: successful, device activated.
May 21 17:11:52 frostyarch NetworkManager[647]: <info>  [1747872712.5923] manager: NetworkManager state is now CONNECTED_GLOBAL
May 21 17:11:52 frostyarch NetworkManager[647]: <info>  [1747872712.5924] manager: startup complete
May 21 17:11:52 frostyarch systemd[815]: Starting Dunst notification daemon...
May 21 17:11:52 frostyarch systemd[815]: Started Dunst notification daemon.
May 21 17:11:52 frostyarch dunst[1084]: WARNING: Icon 'nm-device-wired' not found in icon_path
May 21 17:11:54 frostyarch systemd-logind[650]: Suspend key pressed short.
May 21 17:11:54 frostyarch systemd-logind[650]: Suspending...
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.4940] manager: sleep: sleep requested (sleeping: no  enabled: yes)
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5060] manager: NetworkManager state is now ASLEEP
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5061] device (enp5s0): state change: activated -> deactivating (reason 'sleeping', >
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5075] device (enp5s0): state change: deactivating -> disconnected (reason 'sleeping>
May 21 17:11:54 frostyarch systemd-networkd[609]: Foreign process 'NetworkManager[647]' changed sysctl '/proc/sys/net/ipv6/conf/enp5s0/disable_ipv6' fr>
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5198] dhcp4 (enp5s0): canceled DHCP transaction
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5198] dhcp4 (enp5s0): activation: beginning transaction (timeout in 45 seconds)
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5198] dhcp4 (enp5s0): state changed no lease
May 21 17:11:54 frostyarch systemd-timesyncd[407]: No network connectivity, watching for changes.
May 21 17:11:54 frostyarch NetworkManager[647]: <info>  [1747872714.5549] device (enp5s0): state change: disconnected -> unmanaged (reason 'unmanaged-s>
May 21 17:11:54 frostyarch kernel: r8169 0000:05:00.0 enp5s0: Link is Down
May 21 17:11:54 frostyarch systemd-networkd[609]: enp5s0: Link DOWN
May 21 17:11:54 frostyarch systemd-networkd[609]: enp5s0: Lost carrier
May 21 17:11:54 frostyarch systemd-networkd[609]: enp5s0: DHCP lease lost
May 21 17:11:54 frostyarch systemd[1]: Reached target Sleep.
May 21 17:11:54 frostyarch systemd[1]: Starting NVIDIA system suspend actions...
May 21 17:11:54 frostyarch suspend[1095]: nvidia-suspend.service
May 21 17:11:54 frostyarch logger[1095]: <13>May 21 17:11:54 suspend: nvidia-suspend.service
May 21 17:11:55 frostyarch systemd[1]: nvidia-suspend.service: Deactivated successfully.
May 21 17:11:55 frostyarch systemd[1]: Finished NVIDIA system suspend actions.
May 21 17:11:55 frostyarch systemd[1]: nvidia-suspend.service: Consumed 412ms CPU time, 156.2M memory peak.
May 21 17:11:55 frostyarch systemd[1]: Starting System Suspend...
May 21 17:11:55 frostyarch systemd-sleep[1105]: User sessions remain unfrozen on explicit request ($SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=0).
May 21 17:11:55 frostyarch systemd-sleep[1105]: This is not recommended, and might result in unexpected behavior, particularly
May 21 17:11:55 frostyarch systemd-sleep[1105]: in suspend-then-hibernate operations or setups with encrypted home directories.
May 21 17:11:55 frostyarch systemd-sleep[1105]: Performing sleep operation 'suspend'...
May 21 17:11:55 frostyarch kernel: PM: suspend entry (deep)
-- Boot 14a3cc0129954d9b8cb613231c332646 --
May 21 17:13:23 frostyarch kernel: Linux version 6.14.6-arch1-1 (linux@archlinux) (gcc (GCC) 15.1.1 20250425, GNU ld (GNU Binutils) 2.44.0) #1 SMP PREE>
May 21 17:13:23 frostyarch kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=3e3bed23-02b9-4d03-83e2-98fd9afebf37 rw loglevel=3 quiet spla>
May 21 17:13:23 frostyarch kernel: BIOS-provided physical RAM map:

I do find it somewhat interesting that the sleep seems to be attempted twice in these logs.  Not sure what to make of that, if it means anything at all.
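One way to tell a genuine second attempt from a copy/paste repeat is to count the kernel's `PM: suspend entry` lines, since each attempt that actually reaches the kernel logs exactly one. The other check is to look for lines repeated byte for byte, including PIDs and NetworkManager's sub-second timestamps. Below is a minimal sketch of both checks. It assumes the excerpt is saved as plain text in journalctl's default output format, and the function name is made up for illustration.

```python
import re
from collections import Counter

# A kernel-level suspend attempt logs one "PM: suspend entry (<mode>)" line.
SUSPEND_ENTRY = re.compile(r"kernel: PM: suspend entry \((\w+)\)")


def summarize_sleep_log(text: str) -> dict:
    """Summarize a plain-text journalctl excerpt (default "short" output assumed)."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    # Modes of every suspend attempt that actually reached the kernel.
    modes = [m.group(1) for ln in lines if (m := SUSPEND_ENTRY.search(ln))]
    # Lines repeated verbatim (same timestamp, PID and message). A genuine
    # second attempt would normally come from a new systemd-sleep PID.
    repeated = [ln for ln, count in Counter(lines).items() if count > 1]
    return {"suspend_entries": modes, "repeated_lines": len(repeated)}


sample = """\
May 21 17:11:54 frostyarch systemd-logind[650]: Suspending...
May 21 17:11:54 frostyarch systemd-logind[650]: Suspending...
May 21 17:11:55 frostyarch kernel: PM: suspend entry (deep)
"""
print(summarize_sleep_log(sample))
# {'suspend_entries': ['deep'], 'repeated_lines': 1}
```

In the excerpt above, only one `PM: suspend entry (deep)` line appears. The block starting at "Suspend key pressed short" shows up twice with identical PIDs and timestamps, which points more to a pasting artifact than to a second sleep attempt.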

Anyway, I'm going to just revert to the latest nvidia drivers and pray that someone on this thread (or somewhere else I happen to come across) comes up with other things I should try.

#19 2025-05-22 00:36:11

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

I have seen some posts on the internet (like this one: https://www.reddit.com/r/pop_os/comment … ndresume/) suggesting that disabling the nvidia systemd services actually resolved their suspend issue. This makes me wonder whether it would be productive to debug why my system cannot suspend when those services are disabled. This is what journalctl says when I try it:

May 21 17:30:51 frostyarch systemd[1]: Reached target Sleep.
May 21 17:30:51 frostyarch systemd[1]: Starting System Suspend...
May 21 17:30:51 frostyarch systemd-sleep[1925]: User sessions remain unfrozen on explicit request ($SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=0).
May 21 17:30:51 frostyarch systemd-sleep[1925]: This is not recommended, and might result in unexpected behavior, particularly
May 21 17:30:51 frostyarch systemd-sleep[1925]: in suspend-then-hibernate operations or setups with encrypted home directories.
May 21 17:30:51 frostyarch systemd-sleep[1925]: Performing sleep operation 'suspend'...
May 21 17:30:51 frostyarch kernel: PM: suspend entry (deep)
May 21 17:30:51 frostyarch kernel: Filesystems sync: 0.002 seconds
May 21 17:30:58 frostyarch kernel: Freezing user space processes
May 21 17:30:58 frostyarch kernel: Freezing user space processes completed (elapsed 0.001 seconds)
May 21 17:30:58 frostyarch kernel: OOM killer disabled.
May 21 17:30:58 frostyarch kernel: Freezing remaining freezable tasks
May 21 17:30:58 frostyarch kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
May 21 17:30:58 frostyarch kernel: printk: Suspending console(s) (use no_console_suspend to debug)
May 21 17:30:58 frostyarch kernel: serial 00:05: disabled
May 21 17:30:58 frostyarch kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
May 21 17:30:58 frostyarch kernel: sd 5:0:0:0: [sdc] Synchronizing SCSI cache
May 21 17:30:58 frostyarch kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache
May 21 17:31:01 frostyarch kernel: ata2.00: Entering standby power mode
May 21 17:31:01 frostyarch kernel: ata1.00: Entering standby power mode
May 21 17:31:01 frostyarch kernel: ata6.00: Entering standby power mode
May 21 17:31:01 frostyarch kernel: NVRM: GPU 0000:06:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted wi>
May 21 17:31:01 frostyarch kernel: nvidia 0000:06:00.0: PM: pci_pm_suspend(): nv_pmops_suspend [nvidia] returns -5
May 21 17:31:01 frostyarch kernel: nvidia 0000:06:00.0: PM: dpm_run_callback(): pci_pm_suspend returns -5
May 21 17:31:01 frostyarch kernel: nvidia 0000:06:00.0: PM: failed to suspend async: error -5
May 21 17:31:01 frostyarch kernel: PM: Some devices failed to suspend, or early wake event detected
May 21 17:31:01 frostyarch kernel: serial 00:05: activated
May 21 17:31:01 frostyarch kernel: ata5: SATA link down (SStatus 0 SControl 300)
May 21 17:31:01 frostyarch kernel: ata9: SATA link down (SStatus 0 SControl 300)
May 21 17:31:01 frostyarch kernel: ata10: SATA link down (SStatus 0 SControl 300)
May 21 17:31:01 frostyarch kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 21 17:31:01 frostyarch kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 21 17:31:01 frostyarch kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 21 17:31:01 frostyarch kernel: sd 5:0:0:0: [sdc] Starting disk
May 21 17:31:01 frostyarch kernel: ata6.00: configured for UDMA/133
May 21 17:31:01 frostyarch kernel: sd 1:0:0:0: [sdb] Starting disk
May 21 17:31:01 frostyarch kernel: ata2.00: configured for UDMA/133
May 21 17:31:01 frostyarch kernel: ata2.00: Entering active power mode
May 21 17:31:01 frostyarch kernel: sd 0:0:0:0: [sda] Starting disk
May 21 17:31:01 frostyarch kernel: ata1.00: configured for UDMA/133
May 21 17:31:01 frostyarch kernel: ata1.00: Entering active power mode
May 21 17:31:01 frostyarch kernel: usb 1-4.1.3: reset high-speed USB device number 7 using xhci_hcd
May 21 17:31:01 frostyarch kernel: nvme nvme0: 12/0/0 default/read/poll queues
May 21 17:31:01 frostyarch kernel: OOM killer enabled.
May 21 17:31:01 frostyarch kernel: Restarting tasks ... done.
May 21 17:31:01 frostyarch kernel: random: crng reseeded on system resumption
May 21 17:31:01 frostyarch kernel: PM: suspend exit
May 21 17:31:01 frostyarch kernel: PM: suspend entry (s2idle)
May 21 17:31:01 frostyarch kernel: Filesystems sync: 0.000 seconds
May 21 17:31:01 frostyarch kernel: Freezing user space processes
May 21 17:31:01 frostyarch kernel: Freezing user space processes completed (elapsed 0.001 seconds)
May 21 17:31:01 frostyarch kernel: OOM killer disabled.
May 21 17:31:01 frostyarch kernel: Freezing remaining freezable tasks
May 21 17:31:01 frostyarch kernel: Freezing remaining freezable tasks completed (elapsed 0.167 seconds)
May 21 17:31:01 frostyarch kernel: printk: Suspending console(s) (use no_console_suspend to debug)
May 21 17:31:01 frostyarch kernel: serial 00:05: disabled
May 21 17:31:01 frostyarch kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
May 21 17:31:01 frostyarch kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache
May 21 17:31:01 frostyarch kernel: ata1.00: Entering standby power mode
May 21 17:31:01 frostyarch kernel: ata2.00: Entering standby power mode
May 21 17:31:01 frostyarch kernel: sd 5:0:0:0: [sdc] Synchronizing SCSI cache
May 21 17:31:01 frostyarch kernel: ata6.00: Entering standby power mode
May 21 17:31:01 frostyarch kernel: NVRM: GPU 0000:06:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted wi>
May 21 17:31:01 frostyarch kernel: nvidia 0000:06:00.0: PM: pci_pm_suspend(): nv_pmops_suspend [nvidia] returns -5
May 21 17:31:01 frostyarch kernel: nvidia 0000:06:00.0: PM: dpm_run_callback(): pci_pm_suspend returns -5
May 21 17:31:01 frostyarch kernel: nvidia 0000:06:00.0: PM: failed to suspend async: error -5
May 21 17:31:01 frostyarch kernel: PM: Some devices failed to suspend, or early wake event detected
May 21 17:31:01 frostyarch kernel: serial 00:05: activated
May 21 17:31:01 frostyarch kernel: OOM killer enabled.
May 21 17:31:01 frostyarch kernel: Restarting tasks ... done.
May 21 17:31:01 frostyarch kernel: random: crng reseeded on system resumption
May 21 17:31:01 frostyarch kernel: PM: suspend exit
May 21 17:31:01 frostyarch rtkit-daemon[1038]: The canary thread is apparently starving. Taking action.
May 21 17:31:01 frostyarch systemd-sleep[1925]: Failed to put system to sleep. System resumed again: Input/output error
May 21 17:31:01 frostyarch rtkit-daemon[1038]: Demoting known real-time threads.
May 21 17:31:01 frostyarch rtkit-daemon[1038]: Successfully demoted thread 1059 of process 1045.
May 21 17:31:01 frostyarch rtkit-daemon[1038]: Successfully demoted thread 1045 of process 1045.
May 21 17:31:01 frostyarch rtkit-daemon[1038]: Successfully demoted thread 1047 of process 1044.
May 21 17:31:01 frostyarch rtkit-daemon[1038]: Successfully demoted thread 1044 of process 1044.
May 21 17:31:01 frostyarch rtkit-daemon[1038]: Demoted 4 threads.
May 21 17:31:01 frostyarch systemd[1]: systemd-suspend.service: Main process exited, code=exited, status=1/FAILURE
May 21 17:31:01 frostyarch systemd[1]: systemd-suspend.service: Failed with result 'exit-code'.
May 21 17:31:01 frostyarch systemd[1]: Failed to start System Suspend.
May 21 17:31:01 frostyarch systemd[1]: Dependency failed for Suspend.
May 21 17:31:01 frostyarch systemd[1]: suspend.target: Job suspend.target/start failed with result 'dependency'.
May 21 17:31:01 frostyarch systemd-logind[654]: Operation 'suspend' finished.
May 21 17:31:01 frostyarch systemd-resolved[533]: Closing all remaining TCP connections.
May 21 17:31:01 frostyarch systemd-resolved[533]: Resetting learnt feature levels on all servers.
May 21 17:31:01 frostyarch systemd[1]: Stopped target Sleep.
May 21 17:31:01 frostyarch NetworkManager[651]: <info>  [1747873861.1810] manager: sleep: wake requested (sleeping: yes  enabled: yes)
May 21 17:31:01 frostyarch systemd-networkd[532]: enp2s0f0u4u2: Reconfiguring with /etc/systemd/network/20-wired.network.
May 21 17:31:01 frostyarch systemd[1]: Starting NVIDIA system resume actions...
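The kernel lines above point at one device. In both the `deep` attempt and the `s2idle` attempt that follows it, the `nvidia` driver's suspend callback returns -5. That value is `-EIO`, which matches systemd-sleep's "Input/output error". Below is a small sketch that extracts those lines and translates the code with Python's standard `errno` and `os.strerror`. It assumes the log is saved as plain text. The regex is written for the exact message wording of this kernel (6.14) and may need adjusting for other versions.

```python
import errno
import os
import re

# e.g. "kernel: nvidia 0000:06:00.0: PM: failed to suspend async: error -5"
FAILED = re.compile(
    r"kernel: (\S+) (\S+): PM: failed to suspend(?: async)?: error (-\d+)"
)


def failed_device_suspends(text: str) -> list:
    """Return (driver, device, errno name, message) per failed device suspend."""
    results = []
    for line in text.splitlines():
        m = FAILED.search(line)
        if m:
            code = -int(m.group(3))  # the kernel reports negative errno values
            name = errno.errorcode.get(code, str(code))
            results.append((m.group(1), m.group(2), name, os.strerror(code)))
    return results


log = """\
May 21 17:31:01 frostyarch kernel: PM: suspend entry (deep)
May 21 17:31:01 frostyarch kernel: nvidia 0000:06:00.0: PM: failed to suspend async: error -5
May 21 17:31:01 frostyarch kernel: PM: suspend entry (s2idle)
May 21 17:31:01 frostyarch kernel: nvidia 0000:06:00.0: PM: failed to suspend async: error -5
"""
for entry in failed_device_suspends(log):
    # On glibc systems such as Arch, each line prints:
    # ('nvidia', '0000:06:00.0', 'EIO', 'Input/output error')
    print(entry)
```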

This made me wonder what would happen if I disabled PreserveVideoMemoryAllocations AND the nvidia systemd services.

I got seemingly the same logs after changing /etc/modprobe.d/nvidia.conf to:

options nvidia_drm modeset=1
options nvidia_drm fbdev=1
# options nvidia \
#     NVreg_PreserveVideoMemoryAllocations=1 \
#     NVreg_TemporaryFilePath=/var/tmp
blacklist nouveau
blacklist nvidiafb

and rebooting (I have nothing else in /etc/modprobe.d/):

May 21 17:41:27 frostyarch systemd[1]: Reached target Sleep.
May 21 17:41:27 frostyarch systemd[1]: Starting System Suspend...
May 21 17:41:27 frostyarch systemd-sleep[1553]: User sessions remain unfrozen on explicit request ($SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=0).
May 21 17:41:27 frostyarch systemd-sleep[1553]: This is not recommended, and might result in unexpected behavior, particularly
May 21 17:41:27 frostyarch systemd-sleep[1553]: in suspend-then-hibernate operations or setups with encrypted home directories.
May 21 17:41:27 frostyarch systemd-sleep[1553]: Performing sleep operation 'suspend'...
May 21 17:41:27 frostyarch kernel: PM: suspend entry (deep)
May 21 17:41:27 frostyarch kernel: Filesystems sync: 0.008 seconds
May 21 17:41:30 frostyarch kernel: Freezing user space processes
May 21 17:41:30 frostyarch kernel: Freezing user space processes completed (elapsed 0.001 seconds)
May 21 17:41:30 frostyarch kernel: OOM killer disabled.
May 21 17:41:30 frostyarch kernel: Freezing remaining freezable tasks
May 21 17:41:30 frostyarch kernel: Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
May 21 17:41:30 frostyarch kernel: printk: Suspending console(s) (use no_console_suspend to debug)
May 21 17:41:30 frostyarch kernel: serial 00:05: disabled
May 21 17:41:30 frostyarch kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
May 21 17:41:30 frostyarch kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache
May 21 17:41:30 frostyarch kernel: sd 5:0:0:0: [sdc] Synchronizing SCSI cache
May 21 17:41:30 frostyarch kernel: ata1.00: Entering standby power mode
May 21 17:41:30 frostyarch kernel: ata2.00: Entering standby power mode
May 21 17:41:30 frostyarch kernel: ata6.00: Entering standby power mode
May 21 17:41:30 frostyarch kernel: NVRM: GPU 0000:06:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procf>
May 21 17:41:30 frostyarch kernel: nvidia 0000:06:00.0: PM: pci_pm_suspend(): nv_pmops_suspend [nvidia] returns -5
May 21 17:41:30 frostyarch kernel: nvidia 0000:06:00.0: PM: dpm_run_callback(): pci_pm_suspend returns -5
May 21 17:41:30 frostyarch kernel: nvidia 0000:06:00.0: PM: failed to suspend async: error -5
May 21 17:41:30 frostyarch kernel: PM: Some devices failed to suspend, or early wake event detected
May 21 17:41:30 frostyarch kernel: serial 00:05: activated
May 21 17:41:30 frostyarch kernel: ata5: SATA link down (SStatus 0 SControl 300)
May 21 17:41:30 frostyarch kernel: ata9: SATA link down (SStatus 0 SControl 300)
May 21 17:41:30 frostyarch kernel: ata10: SATA link down (SStatus 0 SControl 300)
May 21 17:41:30 frostyarch kernel: OOM killer enabled.
May 21 17:41:30 frostyarch kernel: Restarting tasks ... done.
May 21 17:41:30 frostyarch kernel: random: crng reseeded on system resumption
May 21 17:41:30 frostyarch kernel: PM: suspend exit
May 21 17:41:30 frostyarch kernel: PM: suspend entry (s2idle)
May 21 17:41:30 frostyarch kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 21 17:41:30 frostyarch kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 21 17:41:30 frostyarch kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 21 17:41:30 frostyarch kernel: sd 5:0:0:0: [sdc] Starting disk
May 21 17:41:30 frostyarch kernel: ata6.00: configured for UDMA/133
May 21 17:41:30 frostyarch kernel: sd 1:0:0:0: [sdb] Starting disk
May 21 17:41:30 frostyarch kernel: ata2.00: configured for UDMA/133
May 21 17:41:30 frostyarch kernel: ata2.00: Entering active power mode
May 21 17:41:30 frostyarch kernel: sd 0:0:0:0: [sda] Starting disk
May 21 17:41:30 frostyarch kernel: ata1.00: configured for UDMA/133
May 21 17:41:30 frostyarch kernel: ata1.00: Entering active power mode
May 21 17:41:34 frostyarch kernel: nvme nvme0: 12/0/0 default/read/poll queues
May 21 17:41:34 frostyarch kernel: Filesystems sync: 3.592 seconds
May 21 17:41:37 frostyarch kernel: Freezing user space processes
May 21 17:41:37 frostyarch kernel: Freezing user space processes completed (elapsed 0.000 seconds)
May 21 17:41:37 frostyarch kernel: OOM killer disabled.
May 21 17:41:37 frostyarch kernel: Freezing remaining freezable tasks
May 21 17:41:37 frostyarch kernel: Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
May 21 17:41:37 frostyarch kernel: printk: Suspending console(s) (use no_console_suspend to debug)
May 21 17:41:37 frostyarch kernel: serial 00:05: disabled
May 21 17:41:37 frostyarch kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache
May 21 17:41:37 frostyarch kernel: sd 5:0:0:0: [sdc] Synchronizing SCSI cache
May 21 17:41:37 frostyarch kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
May 21 17:41:37 frostyarch kernel: ata1.00: Entering standby power mode
May 21 17:41:37 frostyarch kernel: ata2.00: Entering standby power mode
May 21 17:41:37 frostyarch kernel: ata6.00: Entering standby power mode
May 21 17:41:37 frostyarch kernel: NVRM: GPU 0000:06:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procf>
May 21 17:41:37 frostyarch kernel: nvidia 0000:06:00.0: PM: pci_pm_suspend(): nv_pmops_suspend [nvidia] returns -5
May 21 17:41:37 frostyarch kernel: nvidia 0000:06:00.0: PM: dpm_run_callback(): pci_pm_suspend returns -5
May 21 17:41:37 frostyarch kernel: nvidia 0000:06:00.0: PM: failed to suspend async: error -5
May 21 17:41:37 frostyarch kernel: PM: Some devices failed to suspend, or early wake event detected
May 21 17:41:37 frostyarch kernel: serial 00:05: activated
May 21 17:41:37 frostyarch kernel: ata9: SATA link down (SStatus 0 SControl 300)
May 21 17:41:37 frostyarch kernel: ata5: SATA link down (SStatus 0 SControl 300)
May 21 17:41:37 frostyarch kernel: ata10: SATA link down (SStatus 0 SControl 300)
May 21 17:41:37 frostyarch kernel: OOM killer enabled.
May 21 17:41:37 frostyarch kernel: Restarting tasks ... done.
May 21 17:41:37 frostyarch kernel: random: crng reseeded on system resumption
May 21 17:41:37 frostyarch kernel: PM: suspend exit
May 21 17:41:37 frostyarch systemd-sleep[1553]: Failed to put system to sleep. System resumed again: Input/output error
May 21 17:41:37 frostyarch kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 21 17:41:37 frostyarch kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 21 17:41:37 frostyarch kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 21 17:41:37 frostyarch kernel: sd 1:0:0:0: [sdb] Starting disk
May 21 17:41:37 frostyarch kernel: sd 5:0:0:0: [sdc] Starting disk
May 21 17:41:37 frostyarch kernel: ata6.00: configured for UDMA/133
May 21 17:41:37 frostyarch kernel: ata2.00: configured for UDMA/133
May 21 17:41:37 frostyarch kernel: ata2.00: Entering active power mode
May 21 17:41:37 frostyarch kernel: sd 0:0:0:0: [sda] Starting disk
May 21 17:41:37 frostyarch kernel: ata1.00: configured for UDMA/133
May 21 17:41:37 frostyarch kernel: ata1.00: Entering active power mode
May 21 17:41:39 frostyarch kernel: nvme nvme0: 12/0/0 default/read/poll queues
May 21 17:41:39 frostyarch systemd[1]: systemd-suspend.service: Main process exited, code=exited, status=1/FAILURE
May 21 17:41:39 frostyarch systemd[1]: systemd-suspend.service: Failed with result 'exit-code'.
May 21 17:41:39 frostyarch systemd[1]: Failed to start System Suspend.
May 21 17:41:39 frostyarch systemd[1]: Dependency failed for Suspend.
May 21 17:41:39 frostyarch systemd[1]: suspend.target: Job suspend.target/start failed with result 'dependency'.
May 21 17:41:39 frostyarch systemd-logind[659]: Operation 'suspend' finished.
May 21 17:41:39 frostyarch systemd-resolved[531]: Closing all remaining TCP connections.
May 21 17:41:39 frostyarch systemd-resolved[531]: Resetting learnt feature levels on all servers.
May 21 17:41:39 frostyarch systemd-networkd[530]: enp2s0f0u4u2: Reconfiguring with /etc/systemd/network/20-wired.network.
May 21 17:41:39 frostyarch NetworkManager[656]: <info>  [1747874499.6121] manager: sleep: wake requested (sleeping: yes  enabled: yes)
May 21 17:41:39 frostyarch systemd[1]: Stopped target Sleep.
May 21 17:41:39 frostyarch systemd[1]: Starting NVIDIA system resume actions...

Maybe my change didn't take for some reason?
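The NVRM line still reports the parameter as set after the edit. It can help to check the value the running module actually uses rather than the config file. Below is a hedged sketch that assumes the loaded driver exposes `/proc/driver/nvidia/params` with one `Name: value` line per parameter, as current NVIDIA drivers do. The helper names are illustrative.

```python
from pathlib import Path

# Exposed by the loaded nvidia kernel module; one "Name: value" line per parameter.
PARAMS = Path("/proc/driver/nvidia/params")


def parse_nvidia_params(text: str) -> dict:
    """Parse 'Name: value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            params[name.strip()] = value.strip()
    return params


def preserve_vram_setting(text: str):
    """Return the effective PreserveVideoMemoryAllocations value, or None."""
    return parse_nvidia_params(text).get("PreserveVideoMemoryAllocations")


try:
    print("PreserveVideoMemoryAllocations =", preserve_vram_setting(PARAMS.read_text()))
except OSError:
    print(f"{PARAMS} not readable; is the nvidia module loaded?")
```

If the value is still 1 after commenting the option out, run `modprobe -c | grep -i nvreg`. It shows the merged configuration from every modprobe.d directory, including /usr/lib/modprobe.d and not just /etc/modprobe.d. Also, if the nvidia modules are loaded from the initramfs, the image probably needs regenerating (e.g. `mkinitcpio -P`) before a modprobe.d change is picked up.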

Last edited by a-curious-crow (2025-05-22 00:51:57)

#20 2025-05-22 01:06:55

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

Yep, my change didn't take.  With

options nvidia_drm modeset=1
options nvidia_drm fbdev=1
options nvidia \
    NVreg_PreserveVideoMemoryAllocations=0 \
    NVreg_TemporaryFilePath=/var/tmp
blacklist nouveau
blacklist nvidiafb

and the nvidia systemd services disabled:

> systemctl list-unit-files | grep nvidia
nvidia-hibernate.service                                                  disabled        disabled
nvidia-persistenced.service                                               disabled        disabled
nvidia-powerd.service                                                     disabled        disabled
nvidia-resume.service                                                     enabled         disabled
nvidia-suspend-then-hibernate.service                                     disabled        disabled
nvidia-suspend.service                                                    disabled        disabled

my computer will sleep but not wake.  This is true even if I disable nvidia-resume.service.
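When these units are toggled repeatedly, it is easy to lose track of which combination was actually tested. In the listing above, only nvidia-resume.service is still enabled. That fits the "Starting NVIDIA system resume actions..." line at the end of the earlier logs, even though no suspend actions ran. Below is a small sketch that lists the enabled NVIDIA units. It assumes the plain-text output of `systemctl list-unit-files` shown above, with unit, state and preset columns. `systemctl is-enabled <unit>` gives the same answer one unit at a time.

```python
def unit_states(listing: str) -> dict:
    """Map unit name -> state from `systemctl list-unit-files` text output."""
    states = {}
    for line in listing.splitlines():
        fields = line.split()
        # Expected columns: UNIT FILE, STATE, PRESET
        if len(fields) >= 2 and fields[0].endswith((".service", ".target", ".timer")):
            states[fields[0]] = fields[1]
    return states


def enabled_nvidia_units(listing: str) -> list:
    """Sorted names of nvidia* units whose state is 'enabled'."""
    return sorted(
        unit for unit, state in unit_states(listing).items()
        if unit.startswith("nvidia") and state == "enabled"
    )


listing = """\
nvidia-hibernate.service                  disabled        disabled
nvidia-persistenced.service               disabled        disabled
nvidia-powerd.service                     disabled        disabled
nvidia-resume.service                     enabled         disabled
nvidia-suspend-then-hibernate.service     disabled        disabled
nvidia-suspend.service                    disabled        disabled
"""
print(enabled_nvidia_units(listing))  # ['nvidia-resume.service']
```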

I think I've already tested this setup; I feel like I'm going in circles.

Last edited by a-curious-crow (2025-05-22 01:13:02)

#21 2025-05-22 01:41:25

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

With the above config (PreserveVideoMemoryAllocations=0 and all nvidia systemd services disabled except nvidia-resume.service), I can get into a strange state where the machine resumes in a weird way. All TTYs show just a blinking '_', and my window manager is completely unresponsive (but I can move my mouse and, of all things, click on tray icons). The machine is effectively unusable in this state, and after rebooting out of it I see nothing from that period in sudo journalctl.

FWIW I thought https://github.com/yshui/picom/issues/1398 might be related, so I disabled picom for all these tests.

Last edited by a-curious-crow (2025-05-22 01:48:22)

#22 2025-05-23 07:51:44

seth
Member
Registered: 2012-09-03
Posts: 64,153

window manager is completely unresponsive (but I can move my mouse and click on tray icons

Then in what way is it "unresponsive", and which WM is it? Does it suffice to kill/restart it or the X11 server?

#23 2025-05-23 18:05:40

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

I cannot open any programs (including a terminal).  And my TTY terminals are unresponsive (blinking '_').

#24 2025-05-23 19:47:25

seth
Member
Registered: 2012-09-03
Posts: 64,153

But running programs work fine-ish?
Keep "dmesg -w" running; you're probably losing the root device. For an NVMe drive, see https://wiki.archlinux.org/title/Solid_ … leshooting and give it the entire package:
"nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off iommu=soft"

#25 2025-06-03 02:20:13

a-curious-crow
Member
Registered: 2024-09-15
Posts: 24

Running programs do not all work fine.  For instance, all my windows will close right after waking from sleep.  I tried adding those kernel parameters as well as amd_iommu=off and nvme.noacpi=1 (to my /etc/default/grub, then running grub-mkconfig -o /boot/grub/grub.cfg), and they don't seem to have any effect.  I am running off an nvme drive fwiw (maybe you already knew that).
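If the parameters seem to have no effect, a first check is whether they reached the running kernel at all. The active command line is in /proc/cmdline, and the `Command line:` entry at the top of each boot's journal shows the same thing. Below is a minimal sketch that compares it against the parameters mentioned above. The expected values are taken from the suggestions in this thread, and the helper names are illustrative.

```python
from pathlib import Path

# Parameters discussed in this thread, with the values suggested/tried.
WANTED = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
    "pcie_port_pm": "off",
    "iommu": "soft",
    "amd_iommu": "off",
    "nvme.noacpi": "1",
}


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {key: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: dict = WANTED) -> dict:
    """Return the wanted parameters that are absent or set to another value."""
    active = parse_cmdline(cmdline)
    return {k: v for k, v in wanted.items() if active.get(k) != v}


try:
    print(missing_params(Path("/proc/cmdline").read_text()))
except OSError:
    print("/proc/cmdline not available (not running on Linux?)")
```

An empty dict means all six parameters are active. For the NVMe power-state limit specifically, /sys/module/nvme_core/parameters/default_ps_max_latency_us should read 0 once the setting has taken effect.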

The only thing I can interact with is my QTile window manager bar. Namely, I can click on the different workspaces to switch to them, and I can click on some icons in the tray. I can right-click on them to get a menu that I can interact with, but if I ever open a window via these icons, the window and the icon immediately crash (perhaps because they are trying to access the disk?).

Partly because of this issue, I got hold of an AMD GPU (6700 XT) and installed it, removing all nvidia drivers and settings. With the new card I always get the unresponsive wake behavior. This makes me think that (1) the black screen I had before when waking was caused by the nvidia suspend/resume systemd services and PreserveVideoMemoryAllocations=0, and (2) the current wake-from-sleep issue is not GPU related. FWIW, sleep/wake works fine under Windows on the same machine.

I am using "deep" sleep as per https://wiki.archlinux.org/title/Power_ … end_method. I just tried switching to s2idle, and now I just get a black screen when I try to resume.
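After switching between `deep` and `s2idle`, it is worth confirming which variant is actually selected. The switch can be made with the `mem_sleep_default=` kernel parameter, or at runtime by writing to /sys/power/mem_sleep as root. The kernel lists the supported variants in /sys/power/mem_sleep and puts brackets around the active one, e.g. `s2idle [deep]`. Each attempt also shows up in the journal as `PM: suspend entry (deep)` or `(s2idle)`, as in the logs above. Below is a small sketch that reads the file. The function name is illustrative.

```python
from pathlib import Path


def mem_sleep_modes(text: str):
    """Parse /sys/power/mem_sleep content, e.g. "s2idle [deep]".

    Returns (available modes, selected mode); the kernel brackets the selected one.
    """
    modes, selected = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        modes.append(token)
    return modes, selected


try:
    print(mem_sleep_modes(Path("/sys/power/mem_sleep").read_text()))
except OSError:
    print("/sys/power/mem_sleep not available")
```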

I wish I could access logs for after my machine wakes, but `sudo journalctl` is totally empty.  Are there other places I could look?  Likely not if the issue is that my drive is failing to work after wake haha...

This NVMe explanation makes more and more sense. The problem started when I upgraded my CPU and motherboard BIOS, it seems to be a somewhat known issue, AND no logs are written to this drive after resume. Do you know anything else I could try to fix it? I would really like to keep using this drive if possible.

Last edited by a-curious-crow (2025-06-03 03:33:19)
