The title is quite condensed. Here's some background:
- I have an nvidia optimus laptop with an Intel 11th Gen iGPU and an RTX3050 dGPU
- I am able to fully power down to D3cold, provided that all nvidia* kernel modules are unloaded and both nvidia-powerd and nvidia-persistenced are stopped.
- I'm using a wayland compositor (hyprland)
- I have nvidia_drm.modeset=0 set in kernel cmdline
- I am able to hotload the dGPU and use it for CUDA (works perfectly) and for graphical output, as long as I go through Xwayland
- I am NOT able to get graphical output natively on wayland from the dGPU. I either get coredumps (a gdb backtrace shows an enumeration failure) or errors saying there is no suitable presentable surface available.
vulkaninfo doesn't show the VK_KHR_wayland_surface extension as available when booted with nvidia_drm.modeset=0. However, VK_KHR_xcb_surface and VK_KHR_xlib_surface are available, and I can render vulkan/opengl apps on the dGPU if I'm willing to use Xwayland.
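With nvidia_drm.modeset=0, the quick check I used looks like this (list_surface_exts is just a throwaway helper name, and it degrades gracefully if vulkaninfo isn't installed):

```shell
# Print the VK_KHR_*_surface instance extensions the Vulkan loader reports,
# or a note if vulkaninfo (from vulkan-tools) is not available.
list_surface_exts() {
    if ! command -v vulkaninfo >/dev/null 2>&1; then
        echo "vulkaninfo not installed (vulkan-tools package)"
        return 0
    fi
    exts="$(vulkaninfo 2>/dev/null | grep -o 'VK_KHR_[a-z]*_surface' | sort -u)"
    echo "${exts:-no surface extensions reported (headless session?)}"
}
list_surface_exts
```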
If I do set nvidia_drm.modeset=1 then I'd lose my hot-unloading capabilities.
Besides just sticking with Xwayland, is there any way to both power down to D3cold and have VK_KHR_wayland_surface for wayland-native vulkan surfaces on the dGPU? My knowledge of DRI is limited, so please accept my apologies if I'm asking for a logically impossible mixture.
Alternative title: Is D3cold removal ever possible with nvidia_drm.modeset=1?
Last edited by navidmafi (2025-12-04 17:19:30)
Offline
Nope. You can't do wayland without modesetting. What I'm more inclined to think is that the problem is that hyprland will access the card whenever it's available and thus block unloading. You could try whether configuring hyprland away from that helps, but to make it grab the card again you'd have to restart it, since afaik this only works via environment variables, and I'm not sure how well hyprland can reload those at runtime.
If you want to attempt that, set https://wiki.hypr.land/Configuring/Envi … -variables AQ_DRM_DEVICES= to your integrated card by default, and try to add the dedicated one explicitly when you want to use it directly.
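E.g., assuming /dev/dri/card0 is the integrated card on your machine (the cardN numbering is not guaranteed stable across boots, so a /dev/dri/by-path/ link is more robust), something along the lines of:

```ini
# hyprland.conf -- pin aquamarine to the iGPU only;
# afaik the variable takes a colon-separated list, so the dGPU could be
# appended as /dev/dri/card0:/dev/dri/card1 when you want it used directly
env = AQ_DRM_DEVICES,/dev/dri/card0
```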
Offline
Thank you for your decisive reply.
I have removed nvidia_drm.modeset=0 from my kernel cmdline and will try to make my hotswap workflow compatible with that. I have also limited AQ to card0 (the iGPU) only.
I can wayland on nvidia now as expected. When booted with nvidia loaded, hyprland does not seem to access any nvidia* or card1 device, yet wayland vulkan surfaces work fine and I have a nice rotating cube rendered by nvidia.
# lsof /dev/dri/card1 /dev/nvidia*
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
vkcube 6838 navid mem CHR 195,255 1248 /dev/nvidiactl
vkcube 6838 navid mem CHR 195,0 1250 /dev/nvidia0
vkcube 6838 navid 4u CHR 195,255 0t0 1248 /dev/nvidiactl
vkcube 6838 navid 5u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 6u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 7u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 8u CHR 195,255 0t0 1248 /dev/nvidiactl
vkcube 6838 navid 9u CHR 195,255 0t0 1248 /dev/nvidiactl
vkcube 6838 navid 10u CHR 195,254 0t0 1252 /dev/nvidia-modeset
vkcube 6838 navid 11u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 12u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 14u CHR 195,255 0t0 1248 /dev/nvidiactl
vkcube 6838 navid 16u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 18u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 22u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 23u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 24u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 25u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 26u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 27u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 28u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 29u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 30u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 31u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 32u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 35u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 36u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 37u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 40u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 42u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 45u CHR 195,0 0t0 1250 /dev/nvidia0
vkcube 6838 navid 47u CHR 195,0 0t0 1250 /dev/nvidia0

Removing nvidia from pci and rescanning works and brings the dGPU online again.
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
sleep 10
echo 1 > /sys/bus/pci/rescan

With a huge caveat, though: hyprland now has fds open on the card, even though it could render just fine without such handles before?
# lsof /dev/nvidia* /dev/dri/card1
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
systemd 1 root 203u CHR 226,1 0t0 1329 /dev/dri/card1
systemd-l 926 root 57u CHR 226,1 0t0 1329 /dev/dri/card1
Hyprland 5711 navid 173u CHR 226,1 0t0 1329 /dev/dri/card1
Hyprland 5711 navid 178u CHR 226,1 0t0 1329 /dev/dri/card1
Hyprland 5711 navid 181u CHR 226,1 0t0 1329 /dev/dri/card1

And that means I won't be able to remove nvidia once again without restarting hyprland.
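For the record, a small guarded helper I use to see exactly what would have to be closed (or restarted) before another remove; holders_of is just a name I made up:

```shell
# Print the PIDs currently holding the given device node open; these are
# the processes that must release it before the card can be removed from
# the bus again.
holders_of() {
    node="$1"
    if [ ! -e "$node" ]; then
        echo "$node does not exist"
        return 0
    fi
    # lsof -t prints bare PIDs, one per line
    pids="$(lsof -t "$node" 2>/dev/null)"
    echo "${pids:-nothing holds $node open}"
}
holders_of /dev/dri/card1
```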
Offline
Well, I could just ignore it. I know this is theoretically dangerous and could leave the devices/kernel in a bad state, but it does the trick: I can hotswap nvidia (down to D3) and have wayland-native surfaces. I'm happy for now.
My platform also provides a dedicated interface for dGPU switching, but it is a bit unreliable at times; I have commented it out but included it for reference. Based on purely empirical evidence (CPU performance on battery improves noticeably after removal), this should suffice.
Is there anything I should worry about when doing it this way?
#!/bin/bash
# needs su
set -euo pipefail
# ${1:-} avoids an unbound-variable abort under `set -u` when no argument is given
case "${1:-}" in
0)
# Ideally you would also include /dev/dri/card1 in the following
out="$(lsof /dev/nvidia* 2>/dev/null || true)"
if [ -n "$out" ]; then
echo "device still in use"
echo "$out"
exit 1
fi
echo 0000:01:00.0 > /sys/bus/pci/drivers/nvidia/unbind
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
echo 0 > /sys/bus/pci/slots/1/power
# The following seems to remove uncleanly
# echo 1 | sudo tee /sys/devices/platform/asus-nb-wmi/dgpu_disable
;;
1)
# echo 0 > /sys/devices/platform/asus-nb-wmi/dgpu_disable
echo 1 > /sys/bus/pci/slots/1/power
echo 1 > /sys/bus/pci/rescan
;;
*)
echo "usage: $0 0|1" >&2
exit 1
;;
esac

Offline
Sounds alright-ish to me, but I don't really have much hands-on experience here. Technically, a "stale" FD being open and the nvidia modules being loaded should not inherently prevent D3cold. ("Old" Xorg PRIME worked the same way: Xorg would access the card and keep an FD open regardless of what you were doing, but whether the card would suspend or spin up depended on actual load and VRAM usage; little/no load and less than 20MB of active VRAM usage would "mark" the card as suspendable.) Speaking of which, have you seen and tried to apply https://wiki.archlinux.org/title/PRIME#NVIDIA ?
Last edited by V1del (2025-11-28 19:21:31)
Offline
Oh that bit on the old PRIME behavior is a very useful piece of information. My ultimate goal is just a bit more battery life when on the go and some marginal thermal improvements when on AC. I would prefer the OS to manage the card itself.
I have added those udev rules, although they are commented out for now while I test that script. How do I make sure the card is in D3cold without some kind of observer effect waking it up? Do I just
cat /sys/class/drm/card1/device/power_state

Last edited by navidmafi (2025-12-01 19:44:50)
Offline
As far as I know cat-ing that should work without inducing a wakeup, yes
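E.g. a quick guarded read like this; the card1 path is from your setup, and the sysfs node disappears entirely while the card is off the bus:

```shell
# Read the dGPU's runtime power state (D0/D3cold/...) without touching
# anything that would wake the device; degrade gracefully if the node
# is gone (card removed from the bus) or numbered differently.
read_pstate() {
    node=/sys/class/drm/card1/device/power_state
    if [ -r "$node" ]; then
        cat "$node"
    else
        echo "no such node: $node (card removed or renumbered?)"
    fi
}
read_pstate
```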
Offline
I'm worried about the topic diverging from the title, but I can't get the card to reach D3cold automatically.
~ cat /proc/driver/nvidia/gpus/0000:01:00.0/power
Runtime D3 status: Enabled (fine-grained)
Video Memory: Active
GPU Hardware Support:
Video Memory Self Refresh: Supported
Video Memory Off: Supported
S0ix Power Management:
Platform Support: Supported
Status: Disabled
Notebook Dynamic Boost: Supported
~ sudo lsof /dev/nvidia* /dev/dri/card1
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
I have added the udev rules and have restarted multiple times since.
~ cat /etc/udev/rules.d/80-nvidia-pm.rules
# Enable runtime PM for NVIDIA VGA/3D controller devices on driver bind
ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"
# Disable runtime PM for NVIDIA VGA/3D controller devices on driver unbind
ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="on"
ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="on"
# Enable runtime PM for NVIDIA VGA/3D controller devices on adding device
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"
I have also disabled GSP as instructed in the wiki, and have restarted since:
~ cat /etc/modprobe.d/nvidia-pm.conf
options nvidia "NVreg_DynamicPowerManagement=0x03" <----- My card is Ampere (3050 Mobile) hence the 0x03 instead of 0x02
options nvidia "NVreg_PreserveVideoMemoryAllocations=1"
~ cat /etc/modprobe.d/nvidia-gsp.conf
options nvidia "NVreg_EnableGpuFirmware=0"
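As a sanity check that the options actually reached the module after a reboot, I read them back from sysfs. (That the parameters surface under their NVreg_ names in /sys/module/nvidia/parameters is my assumption; the nodes only exist while nvidia is loaded.)

```shell
# Read back the live values of the nvidia module parameters that were set
# in /etc/modprobe.d; only present while the nvidia module is loaded.
nvidia_params() {
    p=/sys/module/nvidia/parameters
    if [ ! -d "$p" ]; then
        echo "nvidia module not loaded"
        return 0
    fi
    for name in NVreg_DynamicPowerManagement NVreg_EnableGpuFirmware; do
        printf '%s = %s\n' "$name" "$(cat "$p/$name" 2>/dev/null || echo '?')"
    done
}
nvidia_params
```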
I have nvidia-powerd and nvidia-persistenced both stopped, but having them running doesn't change the ultimate result (they do show up in lsof, though).
But ultimately:
~ cat /sys/class/drm/card1/device/power_state
D0
~ cat /sys/class/drm/card1/device/power/runtime_status
active
No increment is shown here:
~ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_suspended_time
492
What could I be doing wrong here?
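For anyone debugging the same thing, these are the other sysfs knobs worth dumping in one go (rpm_report is just a throwaway helper; the PCI address is from my machine). As far as I understand the kernel's runtime-PM model, power/control must be "auto" and the usage/active-child counters must reach 0 before the device can runtime-suspend:

```shell
# Dump the runtime-PM state of the dGPU: control should read "auto", and
# runtime_usage / runtime_active_kids should be 0 for suspend to happen.
rpm_report() {
    dev=/sys/bus/pci/devices/0000:01:00.0
    if [ ! -d "$dev" ]; then
        echo "$dev not present on the bus"
        return 0
    fi
    for f in power/control power/runtime_status \
             power/runtime_usage power/runtime_active_kids; do
        printf '%-26s %s\n' "$f:" "$(cat "$dev/$f" 2>/dev/null || echo '?')"
    done
}
rpm_report
```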
Last edited by navidmafi (2025-12-02 11:16:53)
Offline