The title is quite condensed. Here's some background:
- I have an nvidia optimus laptop with an Intel 11th Gen iGPU and an RTX3050 dGPU
- I am able to fully power down to D3cold, provided that all nvidia* kernel modules are unloaded and both nvidia-powerd and nvidia-persistenced are stopped.
- I'm using a wayland compositor (hyprland)
- I have nvidia_drm.modeset=0 set in kernel cmdline
- I am able to hotload the dGPU and use it for CUDA (works perfectly) and for graphical output, as long as I go through Xwayland
- I am NOT able to get graphical output natively on wayland from the dGPU. I either get coredumps (a gdb backtrace shows an enumeration failure) or errors saying there is no suitable presentable surface available.
vulkaninfo doesn't show the VK_KHR_wayland_surface extension as available when booted with nvidia_drm.modeset=0. However, VK_KHR_xcb_surface and VK_KHR_xlib_surface are available, and I can render vulkan/opengl apps on the dGPU if I'm willing to use Xwayland.
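With nvidia_drm.modeset=0, the quick check I used looks like this (list_surface_exts is just a throwaway helper name, and it degrades gracefully if vulkaninfo isn't installed):

```shell
# Print the VK_KHR_*_surface instance extensions the Vulkan loader reports,
# or a note if vulkaninfo (from vulkan-tools) is not available.
list_surface_exts() {
    if ! command -v vulkaninfo >/dev/null 2>&1; then
        echo "vulkaninfo not installed (vulkan-tools package)"
        return 0
    fi
    exts="$(vulkaninfo 2>/dev/null | grep -o 'VK_KHR_[a-z]*_surface' | sort -u)"
    echo "${exts:-no surface extensions reported (headless session?)}"
}
list_surface_exts
```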
If I do set nvidia_drm.modeset=1 then I'd lose my hot-unloading capabilities.
Besides just sticking with Xwayland, is there any way to both power down to D3cold and have VK_KHR_wayland_surface for wayland-native vulkan surfaces on the dGPU? My knowledge of DRI is limited, so please accept my apologies if I'm asking for a logically impossible mixture.
Alternative title: Is D3cold removal ever possible with nvidia_drm.modeset=1?
Last edited by navidmafi (2025-12-04 17:19:30)
Offline
Nope. You can't do wayland without modesetting. What I'm more inclined to think is that the problem is that hyprland will access the card whenever it's available and thus block unloading. You could try whether configuring hyprland away from that helps, but to make it grab the card again you'd have to restart it, since afaik this only works via environment variables, and I'm not sure how well hyprland can reload those at runtime.
If you want to attempt that, set https://wiki.hypr.land/Configuring/Envi … -variables AQ_DRM_DEVICES= to your integrated card by default, and try to add the dedicated one explicitly when you want to use it directly.
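E.g., assuming /dev/dri/card0 is the integrated card on your machine (the cardN numbering is not guaranteed stable across boots, so a /dev/dri/by-path/ link is more robust), something along the lines of:

```ini
# hyprland.conf -- pin aquamarine to the iGPU only;
# afaik the variable takes a colon-separated list, so the dGPU could be
# appended as /dev/dri/card0:/dev/dri/card1 when you want it used directly
env = AQ_DRM_DEVICES,/dev/dri/card0
```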
Offline
Thank you for your decisive reply.
I have removed nvidia_drm.modeset=0 from my kernel cmdline and will try to make my hotswap workflow compatible with that. I have also limited AQ to card0 (the iGPU) only.
I can wayland on nvidia now as expected. When booted with nvidia loaded, hyprland does not seem to access any nvidia* or card1 device, yet wayland vulkan surfaces work fine and I have a nice rotating cube rendered by nvidia.
# lsof /dev/dri/card1 /dev/nvidia*
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
vkcube 6838 navid mem CHR 195,255 1248 /dev/nvidiactl
vkcube 6838 navid mem CHR 195,0 1250 /dev/nvidia0
vkcube 6838 navid 4u CHR 195,255 0t0 1248 /dev/nvidiactl
vkcube 6838 navid 5u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 6u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 7u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 8u CHR 195,255 0t0 1248 /dev/nvidiactl
vkcube 6838 navid 9u CHR 195,255 0t0 1248 /dev/nvidiactl
vkcube 6838 navid 10u CHR 195,254 0t0 1252 /dev/nvidia-modeset
vkcube 6838 navid 11u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 12u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 14u CHR 195,255 0t0 1248 /dev/nvidiactl
vkcube 6838 navid 16u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 18u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 22u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 23u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 24u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 25u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 26u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 27u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 28u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 29u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 30u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 31u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 32u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 35u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 36u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 37u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 40u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 42u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 45u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 47u CHR 195,0 0t0 1250 /dev/nvidia0

Removing nvidia from pci and rescanning works and brings the dGPU online again.
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
sleep 10
echo 1 > /sys/bus/pci/rescan

With a huge caveat, though: hyprland now has fds open on the card, even though it could render just fine without such handles before?
# lsof /dev/nvidia* /dev/dri/card1
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
systemd 1 root 203u CHR 226,1 0t0 1329 /dev/dri/card1
systemd-l 926 root 57u CHR 226,1 0t0 1329 /dev/dri/card1
Hyprland 5711 navid 173u CHR 226,1 0t0 1329 /dev/dri/card1
Hyprland 5711 navid 178u CHR 226,1 0t0 1329 /dev/dri/card1
Hyprland 5711 navid 181u CHR 226,1 0t0 1329 /dev/dri/card1

And that means I won't be able to remove nvidia once again without restarting hyprland.
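For the record, a small guarded helper I use to see exactly what would have to be closed (or restarted) before another remove; holders_of is just a name I made up:

```shell
# Print the PIDs currently holding the given device node open; these are
# the processes that must release it before the card can be removed from
# the bus again.
holders_of() {
    node="$1"
    if [ ! -e "$node" ]; then
        echo "$node does not exist"
        return 0
    fi
    # lsof -t prints bare PIDs, one per line
    pids="$(lsof -t "$node" 2>/dev/null)"
    echo "${pids:-nothing holds $node open}"
}
holders_of /dev/dri/card1
```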
Offline
Well, I could just ignore it. I know this is theoretically dangerous and could leave the devices/kernel in a bad state, but it does the trick: I can hotswap nvidia (down to D3) and have wayland-native surfaces. I'm happy for now.
My platform also provides a dedicated interface for dGPU switching, but it is a bit unreliable at times; I have commented it out but included it for reference. Based on purely empirical evidence (CPU performance on battery improves noticeably after removal), this should suffice.
Is there anything I should worry about when doing it this way?
#!/bin/bash
# needs su
set -euo pipefail
# ${1:-} avoids an unbound-variable abort under `set -u` when no argument is given
case "${1:-}" in
0)
# Ideally you would also include /dev/dri/card1 in the following
out="$(lsof /dev/nvidia* 2>/dev/null || true)"
if [ -n "$out" ]; then
echo "device still in use"
echo "$out"
exit 1
fi
echo 0000:01:00.0 > /sys/bus/pci/drivers/nvidia/unbind
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
echo 0 > /sys/bus/pci/slots/1/power
# The following seems to remove uncleanly
# echo 1 | sudo tee /sys/devices/platform/asus-nb-wmi/dgpu_disable
;;
1)
# echo 0 > /sys/devices/platform/asus-nb-wmi/dgpu_disable
echo 1 > /sys/bus/pci/slots/1/power
echo 1 > /sys/bus/pci/rescan
;;
*)
echo "usage: $0 0|1" >&2
exit 1
;;
esac

Offline
Sounds alright-ish to me, but I don't really have much hands-on experience here. Technically, a "stale" FD being open and the nvidia modules being loaded should not inherently prevent D3cold. ("Old" Xorg PRIME worked the same way: Xorg would access the card and keep an FD open regardless of what you were doing, but whether the card would suspend or spin up depended on actual load and VRAM usage; little/no load and less than 20MB of active VRAM usage would "mark" the card as suspendable.) Speaking of which, have you seen and tried to apply https://wiki.archlinux.org/title/PRIME#NVIDIA ?
Last edited by V1del (2025-11-28 19:21:31)
Offline
Oh that bit on the old PRIME behavior is a very useful piece of information. My ultimate goal is just a bit more battery life when on the go and some marginal thermal improvements when on AC. I would prefer the OS to manage the card itself.
I have added those udev rules, although they are commented out for now while I test that script. How do I make sure the card is in D3cold without some kind of observer effect waking it up? Do I just
cat /sys/class/drm/card1/device/power_state

Last edited by navidmafi (2025-12-01 19:44:50)
Offline
As far as I know cat-ing that should work without inducing a wakeup, yes
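E.g. a quick guarded read like this; the card1 path is from your setup, and the sysfs node disappears entirely while the card is off the bus:

```shell
# Read the dGPU's runtime power state (D0/D3cold/...) without touching
# anything that would wake the device; degrade gracefully if the node
# is gone (card removed from the bus) or numbered differently.
read_pstate() {
    node=/sys/class/drm/card1/device/power_state
    if [ -r "$node" ]; then
        cat "$node"
    else
        echo "no such node: $node (card removed or renumbered?)"
    fi
}
read_pstate
```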
Offline
I'm worried about the topic diverging from the title, but I can't get the card to reach D3cold automatically.
~ cat /proc/driver/nvidia/gpus/0000:01:00.0/power
Runtime D3 status: Enabled (fine-grained)
Video Memory: Active
GPU Hardware Support:
Video Memory Self Refresh: Supported
Video Memory Off: Supported
S0ix Power Management:
Platform Support: Supported
Status: Disabled
Notebook Dynamic Boost: Supported
~ sudo lsof /dev/nvidia* /dev/dri/card1
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
I have added the udev rules and have restarted multiple times since.
~ cat /etc/udev/rules.d/80-nvidia-pm.rules
# Enable runtime PM for NVIDIA VGA/3D controller devices on driver bind
ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"
# Disable runtime PM for NVIDIA VGA/3D controller devices on driver unbind
ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="on"
ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="on"
# Enable runtime PM for NVIDIA VGA/3D controller devices on adding device
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"
I have also disabled GSP as instructed in the wiki, and have restarted since:
~ cat /etc/modprobe.d/nvidia-pm.conf
options nvidia "NVreg_DynamicPowerManagement=0x03" <----- My card is Ampere (3050 Mobile) hence the 0x03 instead of 0x02
options nvidia "NVreg_PreserveVideoMemoryAllocations=1"
~ cat /etc/modprobe.d/nvidia-gsp.conf
options nvidia "NVreg_EnableGpuFirmware=0"
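As a sanity check that the options actually reached the module after a reboot, I read them back from sysfs. (That the parameters surface under their NVreg_ names in /sys/module/nvidia/parameters is my assumption; the nodes only exist while nvidia is loaded.)

```shell
# Read back the live values of the nvidia module parameters that were set
# in /etc/modprobe.d; only present while the nvidia module is loaded.
nvidia_params() {
    p=/sys/module/nvidia/parameters
    if [ ! -d "$p" ]; then
        echo "nvidia module not loaded"
        return 0
    fi
    for name in NVreg_DynamicPowerManagement NVreg_EnableGpuFirmware; do
        printf '%s = %s\n' "$name" "$(cat "$p/$name" 2>/dev/null || echo '?')"
    done
}
nvidia_params
```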
I have nvidia-powerd and nvidia-persistenced both stopped, but having them running doesn't change the ultimate result (they do show up in lsof, though).
But ultimately:
~ cat /sys/class/drm/card1/device/power_state
D0
~ cat /sys/class/drm/card1/device/power/runtime_status
active
No increment is shown here:
~ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_suspended_time
492
What could I be doing wrong here?
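For anyone debugging the same thing, these are the other sysfs knobs worth dumping in one go (rpm_report is just a throwaway helper; the PCI address is from my machine). As far as I understand the kernel's runtime-PM model, power/control must be "auto" and the usage/active-child counters must reach 0 before the device can runtime-suspend:

```shell
# Dump the runtime-PM state of the dGPU: control should read "auto", and
# runtime_usage / runtime_active_kids should be 0 for suspend to happen.
rpm_report() {
    dev=/sys/bus/pci/devices/0000:01:00.0
    if [ ! -d "$dev" ]; then
        echo "$dev not present on the bus"
        return 0
    fi
    for f in power/control power/runtime_status \
             power/runtime_usage power/runtime_active_kids; do
        printf '%-26s %s\n' "$f:" "$(cat "$dev/$f" 2>/dev/null || echo '?')"
    done
}
rpm_report
```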
Last edited by navidmafi (2025-12-02 11:16:53)
Offline