#1 2018-09-12 15:11:31

NLisa
Member
Registered: 2016-08-23
Posts: 10

Xorg configuration for 2 (SLI) NVIDIA GTX 1080 Ti - Possible Bug Report?

Good day,

I wanted to first gauge whether anyone else is experiencing these issues before making a formal bug report and the corresponding wiki updates.

As stated in the title, my configuration consists of 2 x NVIDIA GTX 1080 Ti cards with an SLI bridge. The main card has an HDMI connection to a single monitor. This has been tested against a freshly deployed Arch system with the NVIDIA drivers, GNOME, GDM, and a (non-Wayland) Xorg configuration.

Following the NVIDIA, GNOME, and GDM ArchWiki pages, I disabled Wayland:

/etc/gdm/custom.conf

[daemon]
WaylandEnable=false

My system would intermittently hang on boot, stopping on the TTY at around the "Graphical Interface Reached" and "Started GDM" messages. I would either need to reboot into the live USB or, sometimes, I would be able to switch to another TTY.

After spending several hours of my employer's time, I removed the SLI bridge and, with inspiration from Ubuntu Launchpad Bug #1752053 and Ubuntu Launchpad Bug #1756226, commented out the PrimaryGPU option. This at least allowed me to get into a graphical environment, albeit without SLI enabled.

/usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf

...

Section "OutputClass"
...
    # Option "PrimaryGPU" "yes"
...
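
To check whether any other snippet still sets the option, something like the following could be used. This is only a minimal sketch: find_primary_gpu is a hypothetical helper name, and it assumes the usual snippet directories /usr/share/X11/xorg.conf.d and /etc/X11/xorg.conf.d.

```shell
# List every uncommented PrimaryGPU option found under the given
# directories. Lines starting with '#' are not matched.
# Example call (paths are the usual Xorg snippet directories):
#   find_primary_gpu /usr/share/X11/xorg.conf.d /etc/X11/xorg.conf.d
find_primary_gpu() {
    grep -rnE '^[[:space:]]*Option[[:space:]]+"PrimaryGPU"' "$@"
}
```

No output means no active PrimaryGPU option was found in those directories.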

Should this be reported as a bug? Is there a race condition between the Intel graphics drivers, multiple GPUs, and the NVIDIA drivers? Is this configuration option needed anywhere other than in Optimus / Bumblebee configurations?

Next, on reconnecting the SLI bridge, Xorg would crash during startup, although the freezing issues were fortunately fixed by the change above. The steps to reproduce this particular crash follow the ArchWiki NVIDIA Tips and Tricks - Enabling SLI:

  1. Determine PCIBusId of primary GPU

    lspci -k | grep -i vga
    
    17:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
    65:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
  2. Create / Modify xorg.conf

    nvidia-xconfig --busid=PCI:17:0:0 --sli=AA
  3. Resulting in the following modifications to the Device and Screen sections

    /etc/X11/xorg.conf                                                                                              
    
    # nvidia-xconfig: X configuration file generated by nvidia-xconfig
    # nvidia-xconfig:  version 396.54  (buildmeister@swio-display-x64-rhel04-14)  Wed Aug 15 00:22:27 PDT 2018
    
    Section "ServerLayout"
        Identifier     "Layout0"
        Screen      0  "Screen0"
        InputDevice    "Keyboard0" "CoreKeyboard"
        InputDevice    "Mouse0" "CorePointer"
    EndSection
    
    Section "Files"
    EndSection
    
    Section "InputDevice"
        # generated from default
        Identifier     "Mouse0"
        Driver         "mouse"
        Option         "Protocol" "auto"
        Option         "Device" "/dev/psaux"
        Option         "Emulate3Buttons" "no"
        Option         "ZAxisMapping" "4 5"
    EndSection
    
    Section "InputDevice"
        # generated from default
        Identifier     "Keyboard0"
        Driver         "kbd"
    EndSection
    
    Section "Monitor"
        Identifier     "Monitor0"
        VendorName     "Unknown"
        ModelName      "Unknown"
        HorizSync       28.0 - 33.0
        VertRefresh     43.0 - 72.0
        Option         "DPMS"
    EndSection
    
    Section "Device"
        Identifier     "Device0"
        Driver         "nvidia"
        VendorName     "NVIDIA Corporation"
        BusID          "PCI:17:0:0"
    EndSection
    
    Section "Screen"
        Identifier     "Screen0"
        Device         "Device0"
        Monitor        "Monitor0"
        DefaultDepth    24
        Option         "SLI" "AA"
        SubSection     "Display"
            Depth       24
        EndSubSection
    EndSection

This results in a broken Xorg configuration that crashes because no devices are detected and no screens are found:

cat /var/log/Xorg.0.log.old 

[     5.825] (--) Log file renamed from "/var/log/Xorg.pid-712.log" to "/var/log/Xorg.0.log"
[     5.825] (WW) Failed to open protocol names file lib/xorg/protocol.txt
[     5.825] 
X.Org X Server 1.20.1
X Protocol Version 11, Revision 0
[     5.825] Build Operating System: Linux Arch Linux
[     5.825] Current Operating System: Linux host 4.18.6-arch1-1-ARCH #1 SMP PREEMPT Wed Sep 5 11:54:09 UTC 2018 x86_64
[     5.825] Kernel command line: root=PARTUUID=XXXX-XXXX-XXXX rw initrd=\intel-ucode.img initrd=\initramfs-linux.img resume=/dev/sda2
[     5.825] Build Date: 09 August 2018  06:37:34PM
[     5.825]  
[     5.825] Current version of pixman: 0.34.0
[     5.825] 	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
[     5.825] Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[     5.825] (==) Log file: "/var/log/Xorg.0.log", Time: Wed Sep 12 13:59:57 2018
[     5.825] (==) Using config file: "/etc/X11/xorg.conf"
[     5.825] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[     5.825] (==) ServerLayout "Layout0"
[     5.825] (**) |-->Screen "Screen0" (0)
[     5.825] (**) |   |-->Monitor "Monitor0"
[     5.825] (**) |   |-->Device "Device0"
[     5.825] (**) |-->Input Device "Keyboard0"
[     5.825] (**) |-->Input Device "Mouse0"
[     5.825] (==) Automatically adding devices
[     5.825] (==) Automatically enabling devices
[     5.825] (==) Automatically adding GPU devices
[     5.825] (==) Automatically binding GPU devices
[     5.825] (==) Max clients allowed: 256, resource mask: 0x1fffff
[     5.825] (WW) `fonts.dir' not found (or not valid) in "/usr/share/fonts/misc".
[     5.825] 	Entry deleted from font path.
[     5.825] 	(Run 'mkfontdir' on "/usr/share/fonts/misc").
[     5.825] (WW) The directory "/usr/share/fonts/OTF" does not exist.
[     5.825] 	Entry deleted from font path.
[     5.825] (WW) The directory "/usr/share/fonts/Type1" does not exist.
[     5.825] 	Entry deleted from font path.
[     5.825] (==) FontPath set to:
	/usr/share/fonts/TTF,
	/usr/share/fonts/100dpi,
	/usr/share/fonts/75dpi
[     5.825] (==) ModulePath set to "/usr/lib/xorg/modules"
[     5.825] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[     5.825] (WW) Disabling Keyboard0
[     5.825] (WW) Disabling Mouse0
[     5.826] (II) Module ABI versions:
[     5.826] 	X.Org ANSI C Emulation: 0.4
[     5.826] 	X.Org Video Driver: 24.0
[     5.826] 	X.Org XInput driver : 24.1
[     5.826] 	X.Org Server Extension : 10.0
[     5.826] (++) using VT number 1

[     5.827] (II) systemd-logind: took control of session /org/freedesktop/login1/session/c5
[     5.828] (II) xfree86: Adding drm device (/dev/dri/card0)
[     5.828] (II) systemd-logind: got fd for /dev/dri/card0 226:0 fd 11 paused 0
[     5.828] (II) xfree86: Adding drm device (/dev/dri/card1)
[     5.829] (II) systemd-logind: got fd for /dev/dri/card1 226:1 fd 12 paused 0
[     5.832] (**) OutputClass "nvidia" ModulePath extended to "/usr/lib/nvidia/xorg,/usr/lib/xorg/modules,/usr/lib/xorg/modules"
[     5.832] (**) OutputClass "nvidia" ModulePath extended to "/usr/lib/nvidia/xorg,/usr/lib/xorg/modules,/usr/lib/nvidia/xorg,/usr/lib/xorg/modules,/usr/lib/xorg/modules"
[     5.834] (--) PCI: (23@0:0:0) 10de:1b06:1043:85e5 rev 161, Mem @ 0xb4000000/16777216, 0xa0000000/268435456, 0xb0000000/33554432, I/O @ 0x00007000/128, BIOS @ 0x????????/524288
[     5.835] (--) PCI:*(101@0:0:0) 10de:1b06:1043:85e5 rev 161, Mem @ 0xd7000000/16777216, 0xc0000000/268435456, 0xd0000000/33554432, I/O @ 0x0000b000/128, BIOS @ 0x????????/131072
[     5.835] (WW) Open ACPI failed (/var/run/acpid.socket) (No such file or directory)
[     5.835] (II) LoadModule: "glx"
[     5.835] (II) Loading /usr/lib/nvidia/xorg/libglx.so
[     5.838] (II) Module glx: vendor="NVIDIA Corporation"
[     5.838] 	compiled for 4.0.2, module version = 1.0.0
[     5.838] 	Module class: X.Org Server Extension
[     5.838] (II) NVIDIA GLX Module  396.54  Tue Aug 14 22:37:05 PDT 2018
[     5.838] (II) LoadModule: "nvidia"
[     5.838] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[     5.839] (II) Module nvidia: vendor="NVIDIA Corporation"
[     5.839] 	compiled for 4.0.2, module version = 1.0.0
[     5.839] 	Module class: X.Org Video Driver
[     5.839] (II) NVIDIA dlloader X Driver  396.54  Tue Aug 14 22:15:03 PDT 2018
[     5.839] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[     5.839] (II) systemd-logind: releasing fd for 226:0
[     5.839] (II) systemd-logind: releasing fd for 226:1
[     5.840] (EE) No devices detected.
[     5.840] (EE) 
Fatal server error:
[     5.840] (EE) no screens found(EE) 
[     5.840] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[     5.840] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[     5.840] (EE) 
[     5.843] (EE) Server terminated with error (1). Closing log file.
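
To pull the relevant lines out of a log this long, a small filter helps. This is a sketch that assumes the log format shown above; xorg_log_summary is a hypothetical helper name.

```shell
# Print only the error lines and the PCI probe lines from an Xorg log.
# Example call:
#   xorg_log_summary /var/log/Xorg.0.log.old
xorg_log_summary() {
    grep -E '\(EE\)|\(--\) PCI:' "$1"
}
```

On the log above this keeps the two "(--) PCI:" probe lines, which show bus numbers 23 and 101, together with the (EE) lines.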

Interestingly, the log above lists the GPUs under different PCI bus numbers (23 and 101) from the ones lspci reported (17 and 65), and nvidia-settings shows the same values:

nvidia-settings -q all | grep -i pcibus

  Attribute 'PCIBus' (host:0[gpu:0]): 101.
    'PCIBus' is an integer attribute.
    'PCIBus' is a read-only attribute.
    'PCIBus' can use the following target types: GPU, SDI Input Device.
  Attribute 'PCIBus' (host:0[gpu:1]): 23.
    'PCIBus' is an integer attribute.
    'PCIBus' is a read-only attribute.
    'PCIBus' can use the following target types: GPU, SDI Input Device.
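
One explanation worth checking: lspci prints the bus, device and function numbers in hexadecimal, whereas the BusID entry in xorg.conf is given in decimal. 0x17 is 23 and 0x65 is 101, which lines up with both the Xorg log and the nvidia-settings output. A minimal shell sketch of the conversion follows; lspci_to_xorg_busid is a hypothetical helper name, not part of any package, and it assumes an address in the plain "bus:device.function" form without a domain prefix.

```shell
# Convert an lspci address (hexadecimal "bus:device.function") into the
# decimal "PCI:bus:device:function" form used for BusID in xorg.conf.
lspci_to_xorg_busid() {
    bus=${1%%:*}      # text before the first ':'   e.g. "17"
    rest=${1#*:}      # text after the first ':'    e.g. "00.0"
    dev=${rest%%.*}   # text before the '.'         e.g. "00"
    fn=${rest#*.}     # text after the '.'          e.g. "0"
    # printf %d accepts C-style hex constants, so prefix each with 0x.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

lspci_to_xorg_busid 17:00.0   # prints PCI:23:0:0
lspci_to_xorg_busid 65:00.0   # prints PCI:101:0:0
```

If this reading is right, passing --busid=PCI:17:0:0 to nvidia-xconfig points Xorg at decimal bus 17, where no GPU is present, which would be consistent with the "No devices detected" error.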

With the bus ID in xorg.conf changed to match, I can successfully start a desktop environment session in SLI mode:

/etc/X11/xorg.conf

Section "Device"
...
    BusID     "PCI:101:0:0"
...

From within an X session or any desktop environment:

nvidia-settings -q all | grep -i slimode

  Attribute 'SLIMode' (thegeforce:0.0): AA 
    'SLIMode' is a string attribute.
    'SLIMode' is a read-only attribute.
    'SLIMode' can use the following target types: X Screen, GPU.
  Attribute 'SLIMode' (host:0[gpu:0]): AA 
    'SLIMode' is a string attribute.
    'SLIMode' is a read-only attribute.
    'SLIMode' can use the following target types: X Screen, GPU.
  Attribute 'SLIMode' (host:0[gpu:1]): AA 
    'SLIMode' is a string attribute.
    'SLIMode' is a read-only attribute.
    'SLIMode' can use the following target types: X Screen, GPU.

Why are there conflicting PCIBusIDs? Does one set belong exclusively to kernel-allocated IDs for the PCI connections established with the SLI bridge?

And does this anecdotal evidence warrant an update to the wiki? I'd be happy to make the changes.

Regards,
NLisa
