Hi, I am running an NVIDIA 730M on a T540p.
Unfortunately, if I try to run something using primusrun, I get a segfault and nothing else. With optirun, I get no output at all unless the -debug or -vvv option is used.
If I run "primusrun minecraft", the Minecraft launcher shows up and I can log in and click Play, but then the game crashes with this (Java) error message:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fc93db4e9e0, pid=12367, tid=0x00007fc93df40700
#
# JRE version: OpenJDK Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
# Java VM: OpenJDK 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libpthread.so.0+0x99e0] pthread_mutex_lock+0x0
While the launcher is open, if I run
cat /proc/acpi/bbswitch
I get
0000:01:00.0 OFF
(which means the GPU never turns on at all using primusrun; normally it would say ON).
Running
primusrun glxinfo | grep OpenGL
gives no output at all.
Running
dmesg
gives this output:
[ 9873.515510] bbswitch: enabling discrete graphics
[ 9873.743214] thinkpad_acpi: EC reports that Thermal Table has changed
[ 9873.844062] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[ 9873.844373] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
[ 9873.844533] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 390.25 Wed Jan 24 20:02:43 PST 2018 (using threaded interrupts)
[ 9874.173844] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 390.25 Wed Jan 24 19:29:37 PST 2018
[ 9874.175420] nvidia-modeset: Allocated GPU:0 (GPU-1b40de8d-1360-ab89-3aa3-e1a01ce26d9e) @ PCI:0000:01:00.0
[ 9874.175788] nvidia-modeset: Freed GPU:0 (GPU-1b40de8d-1360-ab89-3aa3-e1a01ce26d9e) @ PCI:0000:01:00.0
[ 9874.211727] glxinfo[9917]: segfault at 10 ip 00007f7aae65e9e0 sp 00007ffea6a7c278 error 4 in libpthread-2.26.so[7f7aae655000+19000]
[ 9874.433455] nvidia-modeset: Unloading
[ 9874.438022] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[ 9874.446389] bbswitch: disabling discrete graphics
[ 9874.458848] pci 0000:01:00.0: Refused to change power state, currently in D0
[ 9874.459679] thinkpad_acpi: EC reports that Thermal Table has changed
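The dmesg excerpt above shows bbswitch enabling the card at about 9873.5 s and disabling it again at about 9874.4 s, right around the glxinfo segfault. A single cat of /proc/acpi/bbswitch taken afterwards may therefore miss the short window in which the card was ON. Below is a minimal polling sketch, assuming bash and GNU coreutils (`date +%N`). The function name `watch_bbswitch` and its arguments are hypothetical.

```shell
# Hypothetical helper (assumes bash and GNU coreutils): sample the bbswitch
# state file repeatedly and print a timestamped line whenever the state changes.
#   $1  state file (default /proc/acpi/bbswitch, which prints e.g. "0000:01:00.0 OFF")
#   $2  number of samples (default 100)
#   $3  seconds between samples (default 0.1)
watch_bbswitch() {
  local file=${1:-/proc/acpi/bbswitch}
  local samples=${2:-100}
  local interval=${3:-0.1}
  local prev="" state i=0
  while [ "$i" -lt "$samples" ]; do
    # The second field of the line is the power state (ON or OFF).
    state=$(awk '{ print $2 }' "$file")
    if [ "$state" != "$prev" ]; then
      printf '%s %s\n' "$(date +%T.%N)" "$state"
      prev=$state
    fi
    i=$((i + 1))
    sleep "$interval"
  done
}
```

To use it, run `watch_bbswitch` in one terminal and then start `primusrun glxgears` in another. With the defaults (100 samples at 0.1 s), it watches for roughly ten seconds.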
Running
optirun -debug -vvv glxgears
gives
[jape@T540p ~]$ optirun -debug -vvv glxgears
[10120.207703] [DEBUG]Reading file: /etc/bumblebee/bumblebee.conf
[10120.208021] [INFO]Configured driver: nvidia
[10120.208228] [DEBUG]optirun version 3.2.1 starting...
[10120.208247] [DEBUG]Active configuration:
[10120.208262] [DEBUG] bumblebeed config file: /etc/bumblebee/bumblebee.conf
[10120.208281] [DEBUG] X display: ebug
[10120.208296] [DEBUG] LD_LIBRARY_PATH: /usr/lib/nvidia:/usr/lib32/nvidia
[10120.208311] [DEBUG] Socket path: /var/run/bumblebee.socket
[10120.208330] [DEBUG] Accel/display bridge: auto
[10120.208346] [DEBUG] VGL Compression: proxy
[10120.208355] [DEBUG] VGLrun extra options:
[10120.208366] [DEBUG] Primus LD Path: /usr/lib/primus:/usr/lib32/primus
[10120.208427] [DEBUG]Using auto-detected bridge virtualgl
[10120.898716] [INFO]Response: Yes. X is active.
[10120.898732] [INFO]Running application using virtualgl.
[10120.898798] [DEBUG]Process vglrun started, PID 10154.
[10121.081729] [DEBUG]SIGCHILD received, but wait failed with No child processes
[10121.081747] [DEBUG]Socket closed.
[10121.081764] [DEBUG]Killing all remaining processes.
What would you try next?
Many thanks.
ThinkPad P16s AMD / KDE
Is your microcode properly applied?
If that wasn't the issue, can you post a dmesg and investigate the coredumps?
Hi V1del, and many thanks for assisting me.
I will look into your suggestion as soon as I get home later today, and I'll let you know the results. Thanks!
Since the crash seems to happen right after VGL is started, can you also try using the primus bridge? (Make sure it is installed.)
optirun -debug -vvv -b primus glxgears
Just trying to narrow down the problem.
Hi to both of you, and many thanks for assisting me.
@Stunts when I run your command I get the following:
[jape@T540p ~]$ optirun -debug -vvv -b primus glxgears
[16672.617189] [DEBUG]Reading file: /etc/bumblebee/bumblebee.conf
[16672.617296] [INFO]Configured driver: nvidia
[16672.617373] [DEBUG]optirun version 3.2.1 starting...
[16672.617379] [DEBUG]Active configuration:
[16672.617385] [DEBUG] bumblebeed config file: /etc/bumblebee/bumblebee.conf
[16672.617390] [DEBUG] X display: ebug
[16672.617395] [DEBUG] LD_LIBRARY_PATH: /usr/lib/nvidia:/usr/lib32/nvidia
[16672.617427] [DEBUG] Socket path: /var/run/bumblebee.socket
[16672.617432] [DEBUG] Accel/display bridge: primus
[16672.617438] [DEBUG] VGL Compression: proxy
[16672.617442] [DEBUG] VGLrun extra options:
[16672.617448] [DEBUG] Primus LD Path: /usr/lib/primus:/usr/lib32/primus
[16673.332292] [INFO]Response: Yes. X is active.
[16673.332311] [INFO]Running application using primus.
[16673.332392] [DEBUG]Process glxgears started, PID 24051.
[16673.495699] [DEBUG]SIGCHILD received, but wait failed with No child processes
[16673.495717] [DEBUG]Socket closed.
[16673.495736] [DEBUG]Killing all remaining processes.
Changing the bridge does not seem to affect the issue.
@V1del
I have read the wiki page about the microcode, but I would like to ask: if I run the update, is there a risk of bricking my device? (I kinda need it for work lmao.)
I know that primusrun used to run without any problems, so what could change if I update the microcode?
Other than that, I've also been looking into the coredumps and running coredumpctl. One of the PIDs I had yesterday gives the following results:
[jape@T540p ~]$ coredumpctl gdb 10154
PID: 10154 (glxgears)
UID: 1000 (jape)
GID: 100 (users)
Signal: 11 (SEGV)
Timestamp: Sun 2018-02-18 22:46:00 EST (23h ago)
Command Line: glxgears
Executable: /usr/bin/glxgears
Control Group: /user.slice/user-1000.slice/session-c1.scope
Unit: session-c1.scope
Slice: user-1000.slice
Session: c1
Owner UID: 1000 (jape)
Boot ID: 71b089c7d8b3457a91a373c0485de529
Machine ID: aece2c68cd5f4b479d4aeb1831f4c5c9
Hostname: T540p
Storage: /var/lib/systemd/coredump/core.glxgears.1000.71b089c7d8b3457a91a373c0485de529.10154.1519011960000000.lz4
Message: Process 10154 (glxgears) of user 1000 dumped core.
Stack trace of thread 10154:
#0 0x00007fb0a7a3d9e0 __pthread_mutex_lock (libpthread.so.0)
#1 0x00007fb0a78093de __glXGLLoadGLXFunction (libGLX.so.0)
#2 0x00007fb0a8f68c99 n/a (libGL.so.1)
#3 0x00007fb0a967847a call_init.part.0 (ld-linux-x86-64.so.2)
#4 0x00007fb0a9678586 _dl_init (ld-linux-x86-64.so.2)
#5 0x00007fb0a9669f6a _dl_start_user (ld-linux-x86-64.so.2)
If I run it without a PID, I get the following instead:
[jape@T540p ~]$ coredumpctl gdb
PID: 24051 (glxgears)
UID: 1000 (jape)
GID: 100 (users)
Signal: 11 (SEGV)
Timestamp: Mon 2018-02-19 22:42:36 EST (8min ago)
Command Line: glxgears
Executable: /usr/bin/glxgears
Control Group: /user.slice/user-1000.slice/session-c1.scope
Unit: session-c1.scope
Slice: user-1000.slice
Session: c1
Owner UID: 1000 (jape)
Boot ID: 71b089c7d8b3457a91a373c0485de529
Machine ID: aece2c68cd5f4b479d4aeb1831f4c5c9
Hostname: T540p
Storage: /var/lib/systemd/coredump/core.glxgears.1000.71b089c7d8b3457a91a373c0485de529.24051.1519098156000000.lz4
Message: Process 24051 (glxgears) of user 1000 dumped core.
Stack trace of thread 24051:
#0 0x00007fecf03ad9e0 __pthread_mutex_lock (libpthread.so.0)
#1 0x00007feceece03de __glXGLLoadGLXFunction (libGLX.so.0)
#2 0x00007feceef4cc99 n/a (libGL.so.1)
#3 0x00007fecf125747a call_init.part.0 (ld-linux-x86-64.so.2)
#4 0x00007fecf1257586 _dl_init (ld-linux-x86-64.so.2)
#5 0x00007fecf125ba5e dl_open_worker (ld-linux-x86-64.so.2)
#6 0x00007fecf06f3b64 _dl_catch_error (libc.so.6)
#7 0x00007fecf125b27a _dl_open (ld-linux-x86-64.so.2)
#8 0x00007fecef7d1e86 n/a (libdl.so.2)
#9 0x00007fecf06f3b64 _dl_catch_error (libc.so.6)
#10 0x00007fecef7d2587 n/a (libdl.so.2)
#11 0x00007fecef7d1f22 dlopen (libdl.so.2)
#12 0x00007fecf102c82c n/a (libGL.so.1)
#13 0x00007fecf102c8cc n/a (libGL.so.1)
#14 0x00007fecf102ee57 n/a (libGL.so.1)
#15 0x00007fecf1017200 n/a (libGL.so.1)
#16 0x00007fecf125747a call_init.part.0 (ld-linux-x86-64.so.2)
#17 0x00007fecf1257586 _dl_init (ld-linux-x86-64.so.2)
#18 0x00007fecf1248f6a _dl_start_user (ld-linux-x86-64.so.2)
This one is longer.
I have very limited knowledge of coredumps, and I would really appreciate it if you could help me interpret these results.
I see a lot of "n/a" entries for libGL in the coredump, but libgl is installed and updated to the latest version (maybe that's the problem?).
Again, many, many thanks for assisting me.
JPBD.
No, the n/a's aren't the problem. They simply tell you that the Arch packages lack debug symbols (which they all do; you'd have to recompile the affected packages with debug symbols back in). That would only be worth investigating if there really was some specific issue.
However, that looks like the general info output. If you don't have gdb installed, you'd have to install it and run the bt command to get a more specific trace.
The microcode shouldn't be able to brick your machine. It is applied in a volatile way, so if it doesn't work for some reason, you can simply remove the package/the relevant binary again from a live disk (or, even simpler, adjust the bootloader config so it doesn't load the ucode.img). When was the last time primusrun actually worked? The issue I'm thinking of can surface depending on compiler options/nvidia driver version. If you google for the TSX/TSC bug (and look at the beginning of your dmesg, since the kernel should tell you if it has detected an incompatible microcode), you should find some more references.
If your libgl installation is correct in general (while we are at it, what do you get for
pacman -Qo /usr/lib/libGL*.so
?), I think it's likely that this is the issue.
Good morning V1del, many thanks for the follow-up.
I will try upgrading the microcode and running your command at home this evening.
The last time I can recall primusrun with nvidia working was back in mid-October last year (I was playing CS:GO on the laptop until I got all the parts to build my new rig, hehe). If I recall correctly, I have not used the laptop's GPU since. I realized it was broken when I tried launching Minecraft last week during a break in a job training session.
I update my machine regularly, so I don't know exactly when it broke. All I have is this large time frame between October 20th 2017 and now.
I'll keep you posted on the tests!
Again, thanks for assisting me, I really appreciate the effort.
JPBD.
@Quardah
And since you are at it, please also post the output of
pacman -Qs nvidia
to make sure all the package versions match.
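Comparing versions by eye works, but a small filter can do the same check. Below is a minimal sketch that assumes the `pacman -Q` output format ("name version-pkgrel"). The function name and the list of package names are hypothetical choices. The filter ignores the pkgrel suffix on purpose, because packages such as nvidia get a new pkgrel when they are rebuilt for a new kernel, while the driver version stays the same.

```shell
# Hypothetical helper: read "name version-pkgrel" lines (as printed by
# `pacman -Q`) on stdin and check that the listed NVIDIA driver packages share
# one upstream version. The pkgrel (text after the last "-") is ignored.
# Exits 0 with a summary if they match; exits 1 and lists each version if not.
nvidia_versions_match() {
  awk '
    $1 ~ /^(nvidia|nvidia-utils|lib32-nvidia-utils|nvidia-settings|libxnvctrl)$/ {
      ver = $2
      sub(/-[^-]*$/, "", ver)          # e.g. 390.25-2 -> 390.25
      pkgs[ver] = pkgs[ver] " " $1
    }
    END {
      n = 0
      for (v in pkgs) n++
      if (n == 0) { print "no NVIDIA packages found"; exit 1 }
      if (n == 1) { for (v in pkgs) print "all at " v ":" pkgs[v]; exit 0 }
      for (v in pkgs) print v ":" pkgs[v]
      exit 1
    }'
}
```

Usage on an Arch system: `pacman -Q nvidia nvidia-utils lib32-nvidia-utils nvidia-settings libxnvctrl | nvidia_versions_match`.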
Output for both commands:
[jape@T540p ~]$ pacman -Qo /usr/lib/libGL*.so
/usr/lib/libGLdispatch.so is owned by libglvnd 1.0.0-1
/usr/lib/libGLESv1_CM_nvidia.so is owned by nvidia-utils 390.25-2
/usr/lib/libGLESv1_CM.so is owned by libglvnd 1.0.0-1
/usr/lib/libGLESv2_nvidia.so is owned by nvidia-utils 390.25-2
/usr/lib/libGLESv2.so is owned by libglvnd 1.0.0-1
/usr/lib/libGLEW.so is owned by glew 2.1.0-1
/usr/lib/libGL.so is owned by libglvnd 1.0.0-1
/usr/lib/libGLU.so is owned by glu 9.0.0-4
/usr/lib/libGLX_mesa.so is owned by mesa 17.3.3-2
/usr/lib/libGLX_nvidia.so is owned by nvidia-utils 390.25-2
/usr/lib/libGLX.so is owned by libglvnd 1.0.0-1
[jape@T540p ~]$ pacman -Qs nvidia
local/bumblebee 3.2.1-16
NVIDIA Optimus support for Linux through VirtualGL
local/lib32-nvidia-utils 390.25-1
NVIDIA drivers utilities (32-bit)
local/libvdpau 1.1.1+3+ga21bf7a-1
Nvidia VDPAU library
local/libxnvctrl 390.25-1
NVIDIA NV-CONTROL X extension
local/nvidia 390.25-11
NVIDIA drivers for linux
local/nvidia-settings 390.25-1
Tool for configuring the NVIDIA graphics driver
local/nvidia-utils 390.25-2
NVIDIA drivers utilities
[jape@T540p ~]$
I am going to update the microcode right now, and I'll post right afterwards.
Hi, this was much easier than expected.
Updating the microcode was successful, and I worried for nothing lmao.
But it has not changed anything so far:
[jape@T540p ~]$ dmesg | grep microcode
[ 0.000000] microcode: microcode updated early to revision 0x23, date = 2017-11-20
[ 0.000000] Intel Spectre v2 broken microcode detected; disabling Speculation Control
[ 0.877412] microcode: sig=0x306c3, pf=0x10, revision=0x23
[ 0.877608] microcode: Microcode Update Driver: v2.2.
[jape@T540p ~]$ primusrun glxgears
Segmentation fault (core dumped)
[jape@T540p ~]$
When I run "gdb primusrun" I get the following error:
"/usr/bin/primusrun": not in executable format: File format not recognized
To be honest, I am having a hard time wrapping my head around this right now. I worked the last 12 hours straight lol, but I'll be back tomorrow with full energy to try something new with you guys.
Thanks again, I really appreciate the effort. We'll find the solution, I'm not worried.
JPBD
Do you know exactly which of these packages were updated in your latest upgrade(s) (the ones that broke this)? If so, you could downgrade some of them to check whether that changes anything (I would start with libglvnd).
Out of curiosity, does booting into linux-lts have the same issue?
Just guessing, mostly based on the stack trace you've sent. I've never used the Optimus stuff.
Nick
Hi Nickyamane, and thanks for jumping in.
Sadly, as I mentioned before, it is very hard to pinpoint the exact moment it stopped working. I stopped using primusrun back in mid-October 2017, but I have kept updating the laptop since then (at least once a week). So I know for a fact that this issue appeared sometime between around October 20th (maybe even sooner) and today.
I can't give an exact date for sure :[
Otherwise, what do you mean by booting into linux-lts? Do you mean a live CD of another distribution? I could certainly try that, but I know the laptop requires specific boot parameters in GRUB to be able to use bumblebee (that was a very hard setup back in the day). I can check later at home what was required, because I know I saved all the work, but I do not have it at hand right now. Still, if that's what you meant, I will try it tonight when I get home, which will help us narrow down the problem.
Thanks!
JPBD
You can run gdb on the coredump you get. The output you posted looked like the output of
coredumpctl info
as opposed to
coredumpctl gdb
with a bt executed. By booting into linux-lts he means installing the linux-lts and nvidia-lts packages to check whether there might be a bug in Linux 4.15. However, since you mention specific boot parameters, those might well be a cause. Can you post the entire output of
dmesg
primusrun ldd /usr/bin/glxinfo # not entirely sure if that works, but it might help
pacman -Qm
Last edited by V1del (2018-02-21 18:04:51)
Here is the output of the three commands: https://pastebin.com/uV25MeSW
It will be available for a week. If you need it after that, PM me.
Thanks!
JPBD
Hmm, I don't see anything immediately off here. The one thing I'd try is whether this and/or other functionality still works the same without the ACPI override kernel parameters. Can you repost a gdb/coredump trace from after you applied the microcode update? I suspect we should see a different stack trace now that the microcode issues are out of the way. If you want to run gdb directly, you would still have to run primusrun first, e.g.
primusrun gdb glxgears
run #Let it crash
bt
Last edited by V1del (2018-02-22 16:34:10)
Hi V1del, sorry for the late response. I will check this later this evening. Yesterday I had to work overtime, so I just didn't get the chance to fiddle around with the laptop.
I'll keep you posted. Thanks!
JPBD
[jape@T540p ~]$ primusrun gdb glxgears
GNU gdb (GDB) 8.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from glxgears...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/glxgears
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6f3d9e0 in pthread_mutex_lock () from /usr/lib/libpthread.so.0
(gdb) bt
#0 0x00007ffff6f3d9e0 in pthread_mutex_lock () from /usr/lib/libpthread.so.0
#1 0x00007ffff58703de in __glXGLLoadGLXFunction () from /usr/lib/libGLX.so.0
#2 0x00007ffff5adcc99 in ?? () from /usr/lib/nvidia/libGL.so.1
#3 0x00007ffff7de747a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#4 0x00007ffff7de7586 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#5 0x00007ffff7deba5e in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#6 0x00007ffff7283b64 in _dl_catch_error () from /usr/lib/libc.so.6
#7 0x00007ffff7deb27a in _dl_open () from /lib64/ld-linux-x86-64.so.2
#8 0x00007ffff6361e86 in ?? () from /usr/lib/libdl.so.2
#9 0x00007ffff7283b64 in _dl_catch_error () from /usr/lib/libc.so.6
#10 0x00007ffff6362587 in ?? () from /usr/lib/libdl.so.2
#11 0x00007ffff6361f22 in dlopen () from /usr/lib/libdl.so.2
#12 0x00007ffff7bbc82c in ?? () from /usr/lib/primus/libGL.so.1
#13 0x00007ffff7bbc8cc in ?? () from /usr/lib/primus/libGL.so.1
#14 0x00007ffff7bbee57 in ?? () from /usr/lib/primus/libGL.so.1
#15 0x00007ffff7ba7200 in ?? () from /usr/lib/primus/libGL.so.1
#16 0x00007ffff7de747a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#17 0x00007ffff7de7586 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#18 0x00007ffff7dd8f6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#19 0x0000000000000001 in ?? ()
#20 0x00007fffffffec35 in ?? ()
#21 0x0000000000000000 in ?? ()
(gdb)
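When reading a trace like this, it helps to see only which shared object each frame lives in. Below is a minimal sketch that assumes the gdb `bt` line format shown above ("#N 0x... in func () from /path"). The function name `bt_libraries` is hypothetical. Frames without a "from" part, such as #19 to #21, are skipped, and consecutive frames from the same library are collapsed into one line.

```shell
# Hypothetical helper: read a gdb "bt" listing on stdin and print
# "FRAME LIBRARY" for each frame that names a shared object, collapsing
# runs of consecutive frames that come from the same library.
bt_libraries() {
  sed -n 's/^#\([0-9]*\) .* from \(.*\)$/\1 \2/p' |
    awk '$2 != prev { print; prev = $2 }'
}
```

Read from the bottom up, the trace above shows the following chain. The dynamic loader initialises /usr/lib/primus/libGL.so.1 (frames #12 to #15). That library calls dlopen() (#11), which loads and initialises /usr/lib/nvidia/libGL.so.1 (#2). Its initialiser calls __glXGLLoadGLXFunction in /usr/lib/libGLX.so.0 (#1), and the crash happens inside pthread_mutex_lock (#0).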
I will also try removing the boot params in GRUB to see if that helps.
Thanks.
OK, I tried without the special boot parameters. The card is then always on (bumblebee/bbswitch cannot turn it off), the laptop runs very hot, and I still get segfaults if I try to run something with primus.
If you are interested in which boot params I am using, here is where I first documented them: https://github.com/Bumblebee-Project/bb … -265587864
Fourth-to-last comment:
@leon9923
Yes.
Either add it to /etc/default/grub like this (and run "# grub-mkconfig -o /boot/grub/grub.cfg" afterwards):
GRUB_CMDLINE_LINUX_DEFAULT="quiet 'acpi_osi=!Windows\x202013' acpi_osi=Linux nogpumanager intel_iommu=on"
Or, while booting, press "e" in GRUB to add them to the kernel command line (won't persist after a reboot; best for testing).
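When comparing boots with and without these parameters, it can help to confirm what the running kernel actually received. Below is a minimal sketch that assumes GNU grep. The function name `check_cmdline` is hypothetical, and it only checks simple whole-word parameters such as nogpumanager or intel_iommu=on. The quoted acpi_osi value with its escaped space is left out on purpose, because it may appear differently in /proc/cmdline depending on how GRUB passes the quoting.

```shell
# Hypothetical helper: check that each given kernel parameter occurs as a
# whole word in a kernel command-line file (normally /proc/cmdline).
# Prints "missing: PARAM" for every absent parameter; returns 1 if any are missing.
check_cmdline() {
  local file=$1 p missing=0
  shift
  for p in "$@"; do
    # -q: quiet, -w: whole-word match, -F: treat PARAM as a fixed string
    if ! grep -qwF -- "$p" "$file"; then
      echo "missing: $p"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_cmdline /proc/cmdline acpi_osi=Linux nogpumanager intel_iommu=on`. If it prints nothing, the kernel booted with all three.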
Hi. This morning I updated everything on the laptop and got a new nvidia driver version.
I still get a segfault when using primusrun.
Please don't bump your thread and append new information to previous posts if there hasn't been an answer in between
Unfortunately I'm somewhat stumped at this point. Other than that, maybe some of your packages got corrupted for whatever reason. What do you get for
pacman -Qkk primus bumblebee xorg-server nvidia-utils
pacman -Qs xorg
It isn't surprising that the new nvidia version didn't fix anything. It is only a rebuild for the new kernel, so it will have no bearing on your issue if the problem is in the code/user-space library setup.
Please don't bump your thread and append new information to previous posts if there hasn't been an answer in between
You are right, I will not do this anymore.
I am also stuck; I do not know what to try next.
EDIT: I spoke to a friend of mine today, and he advised me to try nouveau instead. That is most likely what I will try this weekend, and I will report back.
Sadly nouveau does not offer the raw performance of nvidia, but it'll work nonetheless. Gotta take off the shackles of nvidia anyway someday soon, because look at what kind of shit can happen otherwise, with them giving little to no fucks about the longevity of their products. Bad support, terrible proprietary software solution. 0/10, will buy AMD next time.
Last edited by Quardah (2018-02-28 01:04:53)