
#1 2018-07-07 21:40:06

nnt
Member
Registered: 2018-06-27
Posts: 19

[Solved] Bumblebee broken since update

So, everything was working fine earlier today, but then I upgraded, and now I can't use my video card anymore.

I'm using the nvidia-390xx drivers (since I have an NVIDIA 820M) and the latest kernel in the repos.

Output of "primusrun glxgears":

 primus: fatal: Bumblebee daemon reported: error: Could not load GPU driver 

Output of "optirun -vvv pwd":

[  281.504619] [DEBUG]Reading file: /etc/bumblebee/bumblebee.conf
[  281.504974] [DEBUG]optirun version 3.2.1 starting...
[  281.504987] [DEBUG]Active configuration:
[  281.504991] [DEBUG] bumblebeed config file: /etc/bumblebee/bumblebee.conf
[  281.504995] [DEBUG] X display: :8
[  281.504999] [DEBUG] LD_LIBRARY_PATH: /usr/lib/nvidia:/usr/lib32/nvidia:/usr/lib:/usr/lib32
[  281.505003] [DEBUG] Socket path: /var/run/bumblebee.socket
[  281.505010] [DEBUG] Accel/display bridge: auto
[  281.505016] [DEBUG] VGL Compression: proxy
[  281.505024] [DEBUG] VGLrun extra options: 
[  281.505028] [DEBUG] Primus LD Path: /usr/lib/primus:/usr/lib32/primus
[  281.505083] [DEBUG]Using auto-detected bridge virtualgl
[  281.597495] [INFO]Response: No - error: Could not load GPU driver

[  281.597506] [ERROR]Cannot access secondary GPU - error: Could not load GPU driver

[  281.597510] [DEBUG]Socket closed.
[  281.597519] [ERROR]Aborting because fallback start is disabled.
[  281.597523] [DEBUG]Killing all remaining processes.

"systemctl status bumblebeed":

Jul 07 23:33:50 hyperborea bumblebeed[821]: [  281.597451] [ERROR][XORG] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
Jul 07 23:33:50 hyperborea bumblebeed[821]: [  281.597456] [ERROR][XORG] (EE) NVIDIA(0): Failing initialization of X screen 0
Jul 07 23:33:50 hyperborea bumblebeed[821]: [  281.597460] [ERROR][XORG] (EE) Screen(s) found, but none have a usable configuration.
Jul 07 23:33:50 hyperborea bumblebeed[821]: [  281.597462] [ERROR][XORG] (EE)
Jul 07 23:33:50 hyperborea bumblebeed[821]: [  281.597465] [ERROR][XORG] (EE) no screens found(EE)
Jul 07 23:33:50 hyperborea bumblebeed[821]: [  281.597468] [ERROR][XORG] (EE)
Jul 07 23:33:50 hyperborea bumblebeed[821]: [  281.597472] [ERROR][XORG] (EE) Please also check the log file at "/var/log/Xorg.8.log" for additional information.
Jul 07 23:33:50 hyperborea bumblebeed[821]: [  281.597474] [ERROR][XORG] (EE)
Jul 07 23:33:50 hyperborea bumblebeed[821]: [  281.597477] [ERROR][XORG] (EE) Server terminated with error (1). Closing log file.
Jul 07 23:33:50 hyperborea bumblebeed[821]: [  281.597480] [ERROR]X did not start properly

I checked the Xorg.8.log file, but there's nothing interesting in it besides what's shown above. (Note: my Xorg config file only contains Intel backlight settings, and I haven't changed it in over 8 months.)
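A minimal sketch for skimming an Xorg log like this one: it prints only the (EE) lines that actually carry a message and drops the bare "(EE)" separator lines. It assumes Bumblebee's secondary X server on display :8 (hence /var/log/Xorg.8.log, matching the "X display: :8" in the config dump above); the function name xorg_errors is hypothetical.

```shell
# Print only the (EE) lines of an Xorg log that carry a message.
# Default path assumes Bumblebee's X server on display :8.
xorg_errors() {
    log=${1:-/var/log/Xorg.8.log}
    # Keep every (EE) line, then drop the bare separators: lines consisting of
    # a prefix without parentheses (the timestamp) followed only by "(EE)".
    grep '(EE)' "$log" | grep -v '^[^(]*(EE)[[:space:]]*$'
}
```

Run it as "xorg_errors", or pass another log path, e.g. "xorg_errors /var/log/Xorg.0.log".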

And finally, dmesg:

[   80.496226] bbswitch: enabling discrete graphics
[   86.688915] ipmi message handler version 39.2
[   86.809756] ipmi device interface
[  189.427953] nvidia: module license 'NVIDIA' taints kernel.
[  189.427956] Disabling lock debugging due to kernel taint
[  189.441205] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[  189.441486] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  390.67  Fri Jun  1 04:04:27 PDT 2018 (using threaded interrupts)
[  190.377614] NVRM: RmInitAdapter failed! (0x26:0xffff:1123)
[  190.377644] NVRM: rm_init_adapter failed for device bearing minor number 0
[  281.595631] NVRM: RmInitAdapter failed! (0x26:0xffff:1123)
[  281.595656] NVRM: rm_init_adapter failed for device bearing minor number 0

I tried the tips in the wiki, but nothing helped, including the PCI rescan trick (echo 1 > /sys/bus/pci/rescan). I really don't know what to do. I also tried downgrading the kernel and the NVIDIA drivers, but that didn't work either.



Edit:

After a couple more tries, it now says:

[ 1123.745566] [ERROR][XORG] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
[ 1123.745572] [ERROR][XORG] (EE) NVIDIA(0): Failed to allocate push buffer
[ 1123.745577] [ERROR][XORG] (EE) 
[ 1123.745581] [DEBUG][XORG] Fatal server error:
[ 1123.745589] [ERROR][XORG] (EE) AddScreen/ScreenInit failed for driver 0
[ 1123.745594] [ERROR][XORG] (EE) 
[ 1123.745599] [ERROR][XORG] (EE) 

I also noticed that if I switch to tty8 (Ctrl+Alt+F8), I see some error codes regarding the graphics card's PCI device... I have no idea what's going on.

Also, I'm not running powertop, in case anyone was going to reference this thread.

Last edited by nnt (2018-07-12 19:16:13)


#2 2018-07-07 23:07:15

seth
Member
Registered: 2012-09-03
Posts: 51,156

Re: [Solved] Bumblebee broken since update

https://wiki.archlinux.org/index.php/NV … iled.21.29

As I keep saying (ceterum censeo): ensure "uname -a" and "pacman -Qs linux" are in sync.
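A small sketch of that check, assuming the 2018-era Arch naming where "uname -r" prints e.g. 4.17.4-1-ARCH and "pacman -Q linux" prints "linux 4.17.4-1" (later kernels use a different suffix scheme, so the comparison would need adjusting). The function name kernel_in_sync is hypothetical; both strings can be passed explicitly to try it without a live system.

```shell
# Succeed if the running kernel matches the installed "linux" package.
# Assumes 2018-era naming: uname -r -> 4.17.4-1-ARCH, pacman -Q linux -> "linux 4.17.4-1".
kernel_in_sync() {
    running=$1
    installed=$2
    [ -n "$running" ] || running=$(uname -r)
    [ -n "$installed" ] || installed=$(pacman -Q linux | cut -d ' ' -f 2)
    # Strip the "-ARCH" suffix from the running version before comparing.
    [ "${running%-ARCH}" = "$installed" ]
}
```

On the machine itself, "kernel_in_sync && echo in sync || echo mismatch" compares the two; a mismatch usually means the kernel package was upgraded without a reboot.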


#3 2018-07-07 23:18:01

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update

seth wrote:

https://wiki.archlinux.org/index.php/NV … iled.21.29

As I keep saying (ceterum censeo): ensure "uname -a" and "pacman -Qs linux" are in sync.

"uname -a" and "pacman -Qs linux" both show that I'm running Linux 4.17.4-1.

Unfortunately, setting rcutree.rcu_idle_gp_delay=1 in /etc/default/grub and then running grub-mkconfig, as described in the topic linked from the wiki, didn't fix the problem. Neither did adding the BusID line to the Bumblebee config.
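For reference, a sketch of the /etc/default/grub edit described above, assuming the stock GRUB_CMDLINE_LINUX_DEFAULT="..." line and GNU sed (for -i). The helper add_grub_param is hypothetical; it appends the parameter inside the quotes only if it is not already present.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in the given file,
# unless it is already there. Requires GNU sed (-i).
add_grub_param() {
    file=$1
    param=$2
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=\".*$param" "$file"; then
        return 0
    fi
    # Insert the parameter just before the line's closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"/\1 $param\"/" "$file"
}
```

As root: "add_grub_param /etc/default/grub rcutree.rcu_idle_gp_delay=1", then "grub-mkconfig -o /boot/grub/grub.cfg" and a reboot. Afterwards, "cat /proc/cmdline" should show the parameter.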

Last edited by nnt (2018-07-07 23:21:59)


#4 2018-07-07 23:29:22

seth
Member
Registered: 2012-09-03
Posts: 51,156

Re: [Solved] Bumblebee broken since update

Can you just modprobe the nvidia module (getting bumblebee out of the equation)?
Do you have "nvidia-drm.modeset=1" in the kernel parameters or any other custom modprobe configs for the nvidia module?
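One way to answer the kernel-parameter question without reading the boot config by eye: a sketch that checks a kernel command line (by default the running one, from /proc/cmdline) for nvidia-drm.modeset=1. The kernel treats "-" and "_" in parameter names alike, so both spellings are accepted. The function name has_modeset is hypothetical.

```shell
# Succeed if nvidia-drm.modeset=1 is on the given kernel command line
# (default: the running kernel's, from /proc/cmdline).
has_modeset() {
    cmdline=${1:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        # "-" and "_" are interchangeable in kernel parameter names.
        case $word in
            nvidia-drm.modeset=1 | nvidia_drm.modeset=1) return 0 ;;
        esac
    done
    return 1
}
```

For the modprobe side, "modprobe -c | grep nvidia" lists any nvidia-related options and blacklist lines from modprobe's merged configuration.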

Since the HW isn't responding:
Try the behavior w/ some live distro; maybe also whether nouveau can talk to the device.
For a mobile chip: try the behavior w/ external power supply.


#5 2018-07-07 23:36:02

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update

seth wrote:

Can you just modprobe the nvidia module (getting bumblebee out of the equation)?

Admittedly, I'm not sure how to do that. But this is a laptop, so having the graphics card always on is not an option due to overheating and battery life.
Doing "modprobe nvidia" returns nothing, and nothing seems to change.

Do you have "nvidia-drm.modeset=1" in the kernel parameters or any other custom modprobe configs for the nvidia module?

No.


I'm going to try switching to Nouveau tomorrow; I'll report back then.


#6 2018-07-07 23:43:36

seth
Member
Registered: 2012-09-03
Posts: 51,156

Re: [Solved] Bumblebee broken since update

"modprobe nvidia" will load the module, the question is whether you get errors in dmesg from this.
Also check "cat /proc/acpi/bbswitch" (assuming you have bbswitch installed for power saving).
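Both checks can be sketched as small filters: the first reduces kernel messages (e.g. "dmesg | nvrm_failures") to NVRM lines that mention a failure, the second reads the state column from /proc/acpi/bbswitch, whose line has the form "0000:01:00.0 ON". Both function names are hypothetical, and matching on "fail" is a rough heuristic, not an NVIDIA-defined message format.

```shell
# Filter kernel messages (stdin) down to NVRM lines that report a failure.
nvrm_failures() {
    grep 'NVRM:' | grep -i 'fail'
}

# Print the power state column ("ON"/"OFF") from /proc/acpi/bbswitch,
# whose single line looks like "0000:01:00.0 ON".
bbswitch_state() {
    awk '{ print $2 }' "${1:-/proc/acpi/bbswitch}"
}
```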

Referencing the other thread: TLP or laptop-mode-tools probably have a similar potential to cause related trouble.


#7 2018-07-07 23:49:47

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update

I don't have laptop-mode-tools.

Only thing in dmesg is the following, repeated multiple times:

[   90.616084] NVRM: RmInitAdapter failed! (0x26:0xffff:1123)
[   90.616154] NVRM: rm_init_adapter failed for device bearing minor number 0

The cat command for bbswitch returns

0000:01:00.0 ON

I recognize 01:00.0 as the address of the NVIDIA card, but I have no idea how to interpret this.
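The address bbswitch prints is in lspci's domain:bus:device.function form, with hexadecimal fields. Xorg's BusID (e.g. the BusID line in /etc/bumblebee/xorg.conf.nvidia) expects "PCI:bus:device:function" in decimal, which only differs visibly once a field reaches 0a. A sketch of the conversion; pci_to_busid is a hypothetical name.

```shell
# Convert an lspci/bbswitch PCI address (hex, e.g. 0000:01:00.0 or 01:00.0)
# into an Xorg BusID string (decimal, e.g. PCI:1:0:0).
pci_to_busid() {
    case $1 in
        *:*:*) addr=${1#*:} ;;   # strip the domain part
        *) addr=$1 ;;
    esac
    bus=${addr%%:*}          # "01"
    rest=${addr#*:}          # "00.0"
    dev=${rest%.*}           # "00"
    fn=${rest#*.}            # "0"
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

For this card, "pci_to_busid 0000:01:00.0" prints PCI:1:0:0.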


#8 2018-07-08 00:28:46

loqs
Member
Registered: 2014-03-06
Posts: 17,367

Re: [Solved] Bumblebee broken since update

Does the following have any effect?

# echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
# echo 1 > /sys/bus/pci/rescan


#9 2018-07-08 06:40:06

seth
Member
Registered: 2012-09-03
Posts: 51,156

Re: [Solved] Bumblebee broken since update

It means that the card is believed to be powered on.

@loqs, OP suggests he did try, to no avail.

@nnt, if it's neither optirun (you need to run "modprobe nvidia" while the module is not loaded: check w/ "lsmod | grep nvidia" and unload it w/ "modprobe -r nvidia") nor the kernel (the ACPI subsystem could cause this; try installing the LTS kernel and the nvidia-390xx-lts package, you can keep the normal kernel):
* you should check the general HW sanity
* also check your BIOS settings to see whether some BIOS update cut the GPU off
  - (do you have a parallel Windows installation?)

It's about time to check the HW sanity, i.e. whether it works on a completely different system…
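The lsmod check mentioned above, sketched as a function: it succeeds only on an exact module-name match, so nvidia_modeset alone does not count as nvidia. The name module_loaded and the optional file argument (for testing against saved lsmod output) are hypothetical.

```shell
# Succeed if a module with exactly this name appears in lsmod output.
# Optional second argument: a file with saved lsmod output (handy for testing).
module_loaded() {
    if [ -n "$2" ]; then cat "$2"; else lsmod; fi |
        awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit(found ? 0 : 1) }'
}
```

As root: "module_loaded nvidia && modprobe -r nvidia", then "modprobe nvidia" and look at the end of dmesg. If nvidia_modeset or nvidia_drm are loaded, they use nvidia and have to be removed first.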


#10 2018-07-08 07:32:45

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update

I installed the LTS packages, and now... I get "primus: fatal: Bumblebee daemon reported: error: [XORG] (EE) Failed to load module "nouveau" (module does not exist, 0)".

Adding

blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
blacklist nv
blacklist uvcvideo

to /etc/modprobe.d/blacklist.conf did not do anything. I don't even have nouveau installed.


#11 2018-07-08 07:37:18

seth
Member
Registered: 2012-09-03
Posts: 51,156

Re: [Solved] Bumblebee broken since update

seth wrote:

and the nvidia-390xx-lts package


#12 2018-07-08 08:12:47

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update

Yes, that's installed too. I wrote "packages" on purpose.


#13 2018-07-08 09:17:16

seth
Member
Registered: 2012-09-03
Posts: 51,156

Re: [Solved] Bumblebee broken since update

Please post a complete dmesg and the Xorg.8.log
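A sketch for gathering both files into one directory before pasting them. It assumes Bumblebee's X server logs to /var/log/Xorg.8.log (display :8, as in the config dump in the first post) and writes a note instead if dmesg is not readable as a normal user; collect_logs is a hypothetical name.

```shell
# Copy dmesg and the Bumblebee Xorg log into one directory.
# Usage: collect_logs OUTDIR [XORG_LOG]
collect_logs() {
    out=$1
    xlog=${2:-/var/log/Xorg.8.log}
    mkdir -p "$out" || return 1
    # dmesg may be restricted for normal users (kernel.dmesg_restrict).
    if ! dmesg > "$out/dmesg.txt" 2>/dev/null; then
        echo "dmesg not readable, rerun as root" > "$out/dmesg.txt"
    fi
    cp "$xlog" "$out/Xorg.8.log"
}
```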


#14 2018-07-08 09:25:09

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update

OK, some updates. While poking around, purely by accident, I found that /lib/modprobe.d/bumblebee.conf (which is separate from the files in /etc/modprobe.d) contains a series of blacklist lines, including ones for nouveau and nvidia. I'm positive I have never touched that file, so it must have been changed by one of the latest package updates.
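Since the blacklist turned up in /lib/modprobe.d rather than /etc/modprobe.d, here is a sketch that searches all the usual modprobe.d locations for a "blacklist <module>" line. The directory list is an assumption based on modprobe.d(5); on Arch, /lib is a symlink to /usr/lib, so some hits may show up twice. find_blacklist is a hypothetical name. "modprobe -c | grep blacklist" gives a similar view from modprobe's merged configuration.

```shell
# Print file:line:text for every "blacklist MODULE" line in modprobe.d dirs.
# Usage: find_blacklist MODULE [DIR...]
find_blacklist() {
    module=$1
    shift
    # Default search path (assumed; see modprobe.d(5)).
    [ $# -gt 0 ] || set -- /etc/modprobe.d /run/modprobe.d /usr/local/lib/modprobe.d /usr/lib/modprobe.d /lib/modprobe.d
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # Exact module name only, so "nvidia" does not match "nvidia_drm".
        grep -Hn "^[[:space:]]*blacklist[[:space:]][[:space:]]*$module[[:space:]]*$" "$dir"/*.conf 2>/dev/null
    done
    return 0
}
```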


I removed the nvidia blacklist, rebooted, and now I get a different error:

dmesg:

[   71.101027] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  390.67  Fri Jun  1 04:04:27 PDT 2018 (using threaded interrupts)
[   85.444052] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  390.67  Fri Jun  1 03:15:43 PDT 2018
[   85.445087] nvidia-modeset: Allocated GPU:0 (GPU-2bb85763-d52e-791e-0d30-8cb58df6cbb0) @ PCI:0000:01:00.0
[   85.445266] nvidia-modeset: Freed GPU:0 (GPU-2bb85763-d52e-791e-0d30-8cb58df6cbb0) @ PCI:0000:01:00.0
[   85.914920] nvidia-modeset: Allocated GPU:0 (GPU-2bb85763-d52e-791e-0d30-8cb58df6cbb0) @ PCI:0000:01:00.0
[   85.915181] nvidia-modeset: Freed GPU:0 (GPU-2bb85763-d52e-791e-0d30-8cb58df6cbb0) @ PCI:0000:01:00.0
[   86.311769] glxgears[1673]: segfault at 74 ip 00007f6825369081 sp 00007f682162db20 error 4 in i965_dri.so[7f68251a1000+8a1000]
[   86.681803] nvidia-modeset: Unloading

A damn segfault.

Xorg logs report this:

[  436.826905] [DEBUG][XORG] (II) NVIDIA(0): ACPI: failed to connect to the ACPI event daemon; the daemon
[  436.826909] [DEBUG][XORG] (II) NVIDIA(0):     may not be running or the "AcpidSocketPath" X
[  436.826912] [DEBUG][XORG] (II) NVIDIA(0):     configuration option may not be set correctly.  When the
[  436.826915] [DEBUG][XORG] (II) NVIDIA(0):     ACPI event daemon is available, the NVIDIA X driver will
[  436.826917] [DEBUG][XORG] (II) NVIDIA(0):     try to use it to receive ACPI event notifications.  For
[  436.826922] [DEBUG][XORG] (II) NVIDIA(0):     details, please see the "ConnectToAcpid" and
[  436.826925] [DEBUG][XORG] (II) NVIDIA(0):     "AcpidSocketPath" X configuration options in Appendix B: X
[  436.826928] [DEBUG][XORG] (II) NVIDIA(0):     Config Options in the README.
[  436.826931] [DEBUG][XORG] (II) NVIDIA(0): Setting mode "NULL"



Edit:

Now optirun works, but primusrun segfaults. Well, at least now I can kind of access my graphics card.

Last edited by nnt (2018-07-08 09:28:01)


#15 2018-07-08 09:56:54

loqs
Member
Registered: 2014-03-06
Posts: 17,367

Re: [Solved] Bumblebee broken since update

primusrun is a known issue https://bugs.archlinux.org/task/58933


#16 2018-07-08 09:59:16

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update

Ah, I was starting to go crazy. Good to know it's not an issue on my end. I guess there's nothing left but to wait for a fix then.


#17 2018-07-08 18:00:39

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update

...and now it's broken again, after a reboot. I'm starting to suspect it might be the hardware failing...

dmesg shows this when I launch optirun:

[  100.030555] NVRM: GPU at PCI:0000:01:00: GPU-2bb85763-d52e-791e-0d30-8cb58df6cbb0
[  100.030559] NVRM: Xid (PCI:0000:01:00): 44, Ch 00000001, engmask 00000101, intr 10000000
[  104.708067] NVRM: GPU at PCI:0000:01:00: GPU-2bb85763-d52e-791e-0d30-8cb58df6cbb0
[  104.708070] NVRM: Xid (PCI:0000:01:00): 44, Ch 00000001, engmask 00000101, intr 10000000
[  150.524965] NVRM: GPU at PCI:0000:01:00: GPU-2bb85763-d52e-791e-0d30-8cb58df6cbb0
[  150.524968] NVRM: Xid (PCI:0000:01:00): 44, Ch 00000001, engmask 00000101, intr 10000000

and bumblebeed just returns

Fatal server error:
[   150.525] (EE) NVIDIA: A GPU exception occurred during X server initialization(EE) 
[   150.525] (EE) 



After a couple of tries with modprobe -r and reloading the module, I'm now back to square one with:

[  699.315115] [ERROR][XORG] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
[  699.315121] [ERROR][XORG] (EE) NVIDIA(0): Failed to allocate push buffer
[  699.315125] [ERROR][XORG] (EE)
[  699.315129] [ERROR][XORG] (EE) AddScreen/ScreenInit failed for driver 0
[  699.315133] [ERROR][XORG] (EE)

Last edited by nnt (2018-07-08 18:05:53)


#18 2018-07-08 19:52:44

seth
Member
Registered: 2012-09-03
Posts: 51,156

Re: [Solved] Bumblebee broken since update

XID 44 is a driver error "Graphics Engine fault during context switch", see https://docs.nvidia.com/deploy/xid-errors/index.html
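When dmesg is long, a sketch that counts which Xid codes appear, so each can be looked up in the table linked above. It assumes the "NVRM: Xid (PCI:...): <code>, ..." line format seen in the logs in this thread; xid_summary is a hypothetical name.

```shell
# Count NVRM Xid codes in kernel messages read from stdin.
xid_summary() {
    # Extract the numeric code that follows "NVRM: Xid (PCI:...): ".
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' |
        sort -n | uniq -c |
        awk '{ print "Xid " $2 ": " $1 " occurrence(s)" }'
}
```

Fed the three Xid lines posted above ("dmesg | xid_summary" on that boot), it prints "Xid 44: 3 occurrence(s)".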

seth wrote:

Please post a complete dmesg and the Xorg.8.log


#19 2018-07-08 20:02:10

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update


#20 2018-07-08 20:58:11

seth
Member
Registered: 2012-09-03
Posts: 51,156

Re: [Solved] Bumblebee broken since update

The nvidia kernel module seems to load fine before bbswitch disables the card; only afterwards does it cause trouble.
Try w/o bumblebee/bbswitch, i.e. just run the nvidia GPU all the time and see whether other errors pop up. Likewise, if possible, check a completely different software stack. (Since the behavior seems random, try a couple of boots and also stress the GPU; good excuse to frag some bots ;-)

I think™ the issue occurs when re-powering the device. Is this on battery or on external power supply (or both)?
Also check whether there are any BIOS/UEFI updates.
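To record the power source next to each attempt, a sketch that reads the standard sysfs "type" and "online" attributes under /sys/class/power_supply and succeeds if a supply of type Mains is online. Supply names (AC, ADP1, ...) differ between laptops; on_mains_power is a hypothetical name.

```shell
# Succeed if any power supply of type "Mains" reports online=1.
# Usage: on_mains_power [SYSFS_POWER_SUPPLY_DIR]
on_mains_power() {
    base=${1:-/sys/class/power_supply}
    for supply in "$base"/*; do
        [ -r "$supply/type" ] || continue
        if [ "$(cat "$supply/type")" = Mains ] && [ "$(cat "$supply/online" 2>/dev/null)" = 1 ]; then
            return 0
        fi
    done
    return 1
}
```

For example, "on_mains_power && echo AC || echo battery" before each optirun attempt.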


#21 2018-07-08 21:03:16

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update

I haven't updated the BIOS in a couple of months, so that can't be it. I did check earlier, and there doesn't seem to be anything in there I can change to tackle the problem.

I've tested this both on external power and on the laptop's battery. I did try running without Bumblebee before creating this topic, but X just displayed a black screen and I had to switch to a TTY to revert to bbswitch/Bumblebee. I guess I'll try again later.

Also, I don't have Windows on this laptop, so unfortunately I'm not sure how to test it any other way.


#22 2018-07-09 17:58:47

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update

Alright, disabling Bumblebee is no dice: Xorg simply hangs. I had to boot into single-user mode to re-enable it, since the hang even prevented me from accessing the TTYs.

The most bizarre part of all this, however, is that at first some applications can launch through primusrun (no segfault); it's shortly afterwards that everything stops working. So yes, I think you're right about the re-powering issue. However, I really don't know which is at fault: bbswitch, Bumblebee, the NVIDIA drivers, the kernel or Xorg (which gave me a couple of "timeout" errors last night while I was experimenting).

For example, if I run "primusrun steam" right at the start, I can run any Steam game through primusrun afterwards, as long as Steam stays open. However, once I stop the process I can't reopen it because of the DMA error I mentioned in one of my earlier logs. Any other process started with primusrun in the meantime also fails. It's as if it only accepts the first process, and everything afterwards fails, even while the first process is still running.


#23 2018-07-09 18:16:08

seth
Member
Registered: 2012-09-03
Posts: 51,156

Re: [Solved] Bumblebee broken since update

Xorg will simply hang.

Why would it? The only way I can see this happen is if the server tries to run on the nvidia GPU and that does not respond - in which case the re-powering would be insignificant and a HW failure more likely…
Do you have a log from that "hanging" X11 server?


#24 2018-07-09 19:24:40

nnt
Member
Registered: 2018-06-27
Posts: 19

Re: [Solved] Bumblebee broken since update

Well, I just discovered something bizarre. Turns out some programs do run with optirun/primusrun.

Steam through Wine, GIMP, feh and geeqie all run perfectly, even after other programs have triggered the Xid errors in dmesg. The only reason I didn't notice this sooner is that I rarely use those programs, and I hadn't bothered testing more programs after all the ones I tried had crashed.

On the other hand, native Steam, mpv, Krita (why Krita and not GIMP?) and a couple of other compiled binaries fail to launch with the DMA error mentioned earlier.

I don't even know what's going on anymore. I feel like I'm being trolled by my own computer.

I don't see how this could be a HW problem. It feels like it could be a problem with some library, but I really don't know enough about any of those programs to be able to tell.


#25 2018-07-09 19:27:00

seth
Member
Registered: 2012-09-03
Posts: 51,156

Re: [Solved] Bumblebee broken since update

What happens for "optirun glxinfo"?
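A sketch for reading the answer: it extracts the "OpenGL renderer string" line from glxinfo output, so "optirun glxinfo | gl_renderer" and plain "glxinfo | gl_renderer" can be compared to see which GPU did the rendering. It assumes glxinfo is installed (on Arch at the time it came from mesa-demos); gl_renderer is a hypothetical name.

```shell
# Print the "OpenGL renderer string" value from glxinfo output on stdin.
gl_renderer() {
    sed -n 's/^OpenGL renderer string: //p'
}
```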

