#1 2024-11-26 08:41:51

thepigeongenerator
Member
From: Netherlands
Registered: 2024-11-26
Posts: 3

[SOLVED] Need to kill Xorg on (almost) every boot.

Hello, I have been experiencing a problem with Xorg.
First, the problem: when I boot my computer, it (often) displays a blank screen with a blinking cursor, which is usually indicative of a graphics error.
Killing Xorg solves the problem in the short term, but the issue resurfaces on each boot.
Xorg does not produce any meaningful logs. Filtering for just warnings and errors using the regex \((ww|ee)\) only produces 2 warnings from NVIDIA(0):

	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[  1516.391] (WW) NVIDIA(0): Unable to get display device for DPI computation.
[  1516.478] (WW) NVIDIA(0): ACPI: failed to determine the system's current power source

These warnings appear whether the display shows something or not.
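
For reference, a filter along those lines could be sketched as follows. The function name xorg_warnings is made up for illustration, and the default path /var/log/Xorg.0.log is an assumption about where the log lives; the markers are written in upper case here because that is how they appear in the log.

```shell
# Print only the warning (WW) and error (EE) lines of an Xorg log.
# Assumption: the log is at /var/log/Xorg.0.log unless a path is given.
xorg_warnings() {
    log="${1:-/var/log/Xorg.0.log}"
    # -E enables the alternation; the escaped parentheses match literally.
    grep -E '\((WW|EE)\)' "$log"
}

# Usage:
#   xorg_warnings                           # current session
#   xorg_warnings /var/log/Xorg.0.log.old   # previous session
```

The legend line at the top of the log also contains both markers, which is why it shows up in the filtered output.
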
Checking journalctl for my display manager also yields no errors. I believe I also checked its log file, which wasn't too meaningful. Either way, it shouldn't be the display manager, as I've tried using another one.
Additionally, I tried adding the nVidia modules to the MODULES array in /etc/mkinitcpio.conf and rebuilt the initramfs, but the issue remained. I also tried using the nvidia and nvidia-dkms packages instead of the nvidia-open package. This made no difference.
For now I have written some bash code which executes killall Xorg once on boot.
This isn't the most elegant solution, though.

Some additional information:
I am running the linux kernel. My laptop uses hybrid graphics, with one of the GPUs being nVidia. I configured Xorg to use my dedicated graphics only, as I prefer that over manually setting it for each application using something like optimus, because I know I will forget to do so. This works fine: once I do boot, my nVidia graphics are used, which I know through nvidia-smi.

I am really at a loss here, I have no idea what could be causing the problem and I feel like I have tried everything that is within my knowledge.
I think I have excluded the possibility that this is an issue with nVidia drivers loading, but I don't know.

I am typing this on my phone, but let me know if there are some log files you'd like me to share here, and I'll try my best to get back to you as soon as I can.

Last edited by thepigeongenerator (2024-11-26 17:07:23)

#2 2024-11-26 09:04:01

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 23,289

Re: [SOLVED] Need to kill Xorg on (almost) every boot.

Grepped Xorg logs are not really useful. Post a full Xorg log and a full

sudo journalctl -b | curl -F 'file=@-' 0x0.st

One immediate suggestion I have, especially since you are using implied offloading, is to disable fbdev on the nvidia driver by adding the following to the kernel parameters (it needs to be on the kernel command line; modprobe.conf won't do, because of a secondary benefit):

nvidia-drm.modeset=1 nvidia-drm.fbdev=0
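
How the parameters get onto the kernel command line depends on the boot loader (for example GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub followed by grub-mkconfig, or the options line of a systemd-boot entry). After a reboot, a check along these lines can confirm they were picked up. This is only a sketch: the helper name has_kernel_param is hypothetical, and it does nothing more than a literal whole-word match against /proc/cmdline.

```shell
# Succeeds if the given parameter appears as a whole word in a command line.
# The second argument defaults to the running kernel's command line.
has_kernel_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Usage:
#   has_kernel_param nvidia-drm.modeset=1 && echo "modeset is set"
#   has_kernel_param nvidia-drm.fbdev=0   && echo "fbdev is disabled"
```

Because the match is literal, a spelling with an underscore (nvidia_drm.modeset=1) would not be detected by this helper.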

#3 2024-11-26 10:51:48

thepigeongenerator
Member
From: Netherlands
Registered: 2024-11-26
Posts: 3

Re: [SOLVED] Need to kill Xorg on (almost) every boot.

Here you go:
journal:
http://0x0.st/X57N.txt

xorg log:
http://0x0.st/X57c.txt

I hope this helps!! ^w^

note:
I did just boot the system with the automatic kill script in place. The xorg log is thus the /var/log/Xorg.0.log.old file. The journal will contain both, considering it's just from this boot.

#4 2024-11-26 13:01:15

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 23,289

Re: [SOLVED] Need to kill Xorg on (almost) every boot.

Hmm... simpledrm is alive and kicking, which is definitely not what you want. Did you follow the wiki and remove the kms hook from mkinitcpio? While that is normally a good reflex, on a multi-GPU system that actually needs the integrated card as well to be able to do anything, you should probably either keep the hook or, more precisely, add i915 to the MODULES= array in mkinitcpio.conf before all the nvidia modules, so that the i915 driver is loaded early as well.

There are also some crashes of the intel driver that it eventually recovers from, but I suspect they are contributing to the issue.

So, as an initial suggestion: set the kernel parameters I mentioned above, add i915 to the MODULES= array, regenerate the initramfs (and disable your kill script), and see whether you still have the issue.
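
As a rough sketch of that change: the line in /etc/mkinitcpio.conf would end up looking something like MODULES=(i915 nvidia nvidia_modeset nvidia_uvm nvidia_drm), with i915 first, and the initramfs is then rebuilt with mkinitcpio -P. The nvidia module names are the usual set and are an assumption about this particular setup. The helper below, i915_before_nvidia, is a hypothetical name; it only checks the ordering in a given config file.

```shell
# Succeeds if the MODULES=(...) line of a mkinitcpio.conf lists i915
# before the first nvidia module. Defaults to /etc/mkinitcpio.conf.
i915_before_nvidia() {
    conf="${1:-/etc/mkinitcpio.conf}"
    # Take the last uncommented MODULES=(...) line and keep what is
    # inside the parentheses.
    modules=$(sed -n 's/^MODULES=(\(.*\)).*/\1/p' "$conf" | tail -n 1)
    for m in $modules; do
        case "$m" in
            i915)    return 0 ;;   # i915 seen first: order is fine
            nvidia*) return 1 ;;   # an nvidia module comes before i915
        esac
    done
    return 1                       # i915 is not listed at all
}

# After editing the file, rebuild all presets (needs root):
#   i915_before_nvidia && sudo mkinitcpio -P
```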

Also, somewhat OT: if you're sure you're happy with and want this kind of setup, feel free to ignore this, but I think it's important to mention.

These laptops are designed to run the majority of things on the integrated card, and you will generally have a net loss in performance if you use the dedicated GPU where the integrated one is more than sufficient for the task. You can't avoid the iGPU anyway, since that's what your laptop screen is attached to, so at best you just render everything on nvidia and make the iGPU a dumb device plastering the screen with what got rendered. This has drawbacks, especially when e.g. doing video decoding, since the main benefit of being able to decode and present in one go is lost if the rendering device is not also the displaying device. As for the fear of having to remember to explicitly prime-run every program, that's really only true for native games, or a one-time adjustment in a desktop file for blender or so. Everything that uses DXVK/VKD3D, like proton and wine, has code built in and will automatically land on the dedicated card without you having to explicitly remember to prime-run it, so the major reason one might not want this will implicitly be taken care of.
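
For context, explicit offloading of a single program comes down to a few environment variables, which is what the prime-run wrapper from the nvidia-prime package sets. The function below is a minimal sketch of the same idea; the name run_on_nvidia is made up here, and the variables are the ones NVIDIA documents for PRIME render offload.

```shell
# Run a single command with NVIDIA PRIME render offload enabled,
# leaving everything else on the integrated GPU.
run_on_nvidia() {
    env __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        __VK_LAYER_NV_optimus=NVIDIA_only \
        "$@"
}

# Usage (glxinfo is only an example program):
#   run_on_nvidia glxinfo | grep "OpenGL renderer"
# In a .desktop file the one-time change would be along the lines of:
#   Exec=prime-run blender
```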

Last edited by V1del (2024-11-26 13:10:28)

#5 2024-11-26 17:05:47

thepigeongenerator
Member
From: Netherlands
Registered: 2024-11-26
Posts: 3

Re: [SOLVED] Need to kill Xorg on (almost) every boot.

You're totally right! Adding i915 seems to have solved the problem.
I also remember noticing some flickering when booting, even before the kill Xorg script, which probably had to do with the intel driver struggling to load.
It wasn't enough to feel like something was off, but it was enough to take note of (apparently). That is gone now.
I have now power-cycled the device a couple of times to verify that the issue has indeed been solved. The fact that it was happening somewhat irregularly suggests that it was indeed a race condition.
What exactly in my logs pointed you to this? I'd like to understand for myself what led you to this conclusion.

Lastly, about the dedicated graphics vs integrated graphics:
I didn't know that proton would automatically select the correct GPU! I do know about editing .desktop files... And if I am completely honest, I can't be bothered to do that; it'd introduce some thinking work that I don't want to do when I am trying to work on projects. But yeah, I am aware that it's worse for the computer, mostly in terms of power draw, but also lifespan. I might look into it more in the future, probably not though.

#6 2024-11-27 00:22:27

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 23,289

Re: [SOLVED] Need to kill Xorg on (almost) every boot.

nvidia loads here

Nov 26 11:39:18 arch-laptop-q kernel: nvidia: loading out-of-tree module taints kernel.
Nov 26 11:39:18 arch-laptop-q kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Nov 26 11:39:18 arch-laptop-q kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 241
Nov 26 11:39:18 arch-laptop-q kernel: 
Nov 26 11:39:18 arch-laptop-q kernel: nvidia 0000:01:00.0: enabling device (0006 -> 0007)
Nov 26 11:39:18 arch-laptop-q kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  565.57.01  Release Build  (archlinux-builder@)  
Nov 26 11:39:18 arch-laptop-q kernel: wmi_bus wmi_bus-PNP0C14:02: [Firmware Info]: B7F3CA0A-ACDC-42D2-9217-77C6C628FBD2 has zero instances
Nov 26 11:39:18 arch-laptop-q kernel: wmi_bus wmi_bus-PNP0C14:02: [Firmware Bug]: WQAE data block query control method not found
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-1: new full-speed USB device number 2 using xhci_hcd
Nov 26 11:39:18 arch-laptop-q kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  565.57.01  Release Build  (archlinux-builder@)  
Nov 26 11:39:18 arch-laptop-q kernel: nvidia-uvm: Loaded the UVM driver, major device number 239.
Nov 26 11:39:18 arch-laptop-q kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-1: New USB device found, idVendor=046d, idProduct=c547, bcdDevice= 4.02
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-1: Product: USB Receiver
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-1: Manufacturer: Logitech
Nov 26 11:39:18 arch-laptop-q kernel: usbcore: registered new interface driver usbhid
Nov 26 11:39:18 arch-laptop-q kernel: usbhid: USB HID core driver
Nov 26 11:39:18 arch-laptop-q kernel: input: Logitech USB Receiver as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.0/0003:046D:C547.0001/input/input3
Nov 26 11:39:18 arch-laptop-q kernel: hid-generic 0003:046D:C547.0001: input,hidraw0: USB HID v1.11 Mouse [Logitech USB Receiver] on usb-0000:00:14.0-1/input0
Nov 26 11:39:18 arch-laptop-q kernel: input: Logitech USB Receiver Keyboard as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.1/0003:046D:C547.0002/input/input4
Nov 26 11:39:18 arch-laptop-q kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-8: new full-speed USB device number 3 using xhci_hcd
Nov 26 11:39:18 arch-laptop-q kernel: hid-generic 0003:046D:C547.0002: input,hidraw1: USB HID v1.11 Keyboard [Logitech USB Receiver] on usb-0000:00:14.0-1/input1
Nov 26 11:39:18 arch-laptop-q kernel: hid-generic 0003:046D:C547.0003: hiddev96,hidraw2: USB HID v1.11 Device [Logitech USB Receiver] on usb-0000:00:14.0-1/input2
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-8: New USB device found, idVendor=048d, idProduct=c966, bcdDevice=14.00
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-8: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-8: Product: ITE Device(8176)
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-8: Manufacturer: ITE Tech. Inc.
Nov 26 11:39:18 arch-laptop-q kernel: input: ITE Tech. Inc. ITE Device(8176) Keyboard as /devices/pci0000:00/0000:00:14.0/usb1/1-8/1-8:1.0/0003:048D:C966.0004/input/input5
Nov 26 11:39:18 arch-laptop-q kernel: input: ITE Tech. Inc. ITE Device(8176) Wireless Radio Control as /devices/pci0000:00/0000:00:14.0/usb1/1-8/1-8:1.0/0003:048D:C966.0004/input/input6
Nov 26 11:39:18 arch-laptop-q kernel: hid-generic 0003:048D:C966.0004: input,hiddev97,hidraw3: USB HID v1.10 Keyboard [ITE Tech. Inc. ITE Device(8176)] on usb-0000:00:14.0-8/input0
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-10: new full-speed USB device number 4 using xhci_hcd
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-10: New USB device found, idVendor=8087, idProduct=0026, bcdDevice= 0.02
Nov 26 11:39:18 arch-laptop-q kernel: usb 1-10: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Nov 26 11:39:18 arch-laptop-q kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 1
Nov 26 11:39:18 arch-laptop-q kernel: nvidia 0000:01:00.0: [drm] No compatible format found
Nov 26 11:39:18 arch-laptop-q kernel: nvidia 0000:01:00.0: [drm] Cannot find any crtc or sizes

and intel starts to show up a whopping 2 seconds later, which is basically an eternity in modern computer land

Nov 26 11:39:20 arch-laptop-q kernel: Console: switching to colour dummy device 80x25
Nov 26 11:39:20 arch-laptop-q kernel: i915 0000:00:02.0: vgaarb: deactivate vga console
Nov 26 11:39:20 arch-laptop-q kernel: i915 0000:00:02.0: [drm] Using Transparent Hugepages
Nov 26 11:39:20 arch-laptop-q kernel: i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=io+mem

So that's pretty clearly visible, and since the nvidia GPU can't display anything before the intel GPU is also ready, you likely land in the situation where xorg first starts before the intel card has fully come up. With early loading you ensure they both start earlier, so they're hopefully ready by the time a GUI wants to actually make use of them. These race conditions can happen even without such a constellation, which is why it's always recommended to load the GPU drivers early (and why the KMS hook got enabled by default, which will work properly with all the in-kernel drivers).
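
To repeat that comparison on another boot, the first kernel message from each driver can be pulled out of the journal. This is a minimal sketch that assumes the default journalctl output format shown above (time of day in the third field); the function name first_driver_lines is hypothetical.

```shell
# Read journal text on stdin and print the time of the first kernel
# message from the nvidia and i915 drivers, in order of appearance.
first_driver_lines() {
    awk '
        /kernel: nvidia/ && !seen_nvidia { print "nvidia", $3; seen_nvidia = 1 }
        /kernel: i915/   && !seen_i915   { print "i915", $3;   seen_i915 = 1 }
    '
}

# Usage (kernel messages of the current boot only):
#   sudo journalctl -b -k | first_driver_lines
```

If both drivers are loaded early from the initramfs, the two times should end up much closer together than in the excerpt above.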
