#1 2020-10-12 21:33:04

Corubba
Member
From: Germany
Registered: 2010-11-14
Posts: 85

Xorg segfaults very early when compiled with optimisation level 2

Hello everybody,

I have a weird problem and am not sure whether this is a bug in Xorg, gcc or glibc, a compilation/packaging issue, or even a hardware/firmware/microcode bug, which is why I am posting here first before opening an upstream/Arch bug. It is a bit of text, but I would like to present as much info and context as I can, so that maybe one of you lovely people has an idea or can give me some guidance for further investigation.

I recently got my hands on a "Lenovo ThinkPad X1 Tablet 2nd Gen" (CPU: Intel i5-7Y57, GPU: Intel HD615) and promptly proceeded to install Arch on it. The installation itself went smoothly, and I went on to install lightdm as a login manager and Xfce as desktop environment, but the lightdm systemd service failed to start. A call to `journalctl` revealed the reason to be Xorg coredumping:

Oct 11 14:19:16 columbia systemd[1]: Starting Light Display Manager...
Oct 11 14:19:16 columbia systemd[1]: Started Light Display Manager.
Oct 11 14:19:16 columbia systemd[1]: Started Process Core Dump (PID 781/UID 0). 
Oct 11 14:19:17 columbia systemd[1]: lightdm.service: Main process exited, code=exited, status=1/FAILURE
Oct 11 14:19:17 columbia systemd[1]: lightdm.service: Failed with result 'exit-code'.
Oct 11 14:19:17 columbia systemd-coredump[782]: Process 780 (Xorg) of user 0 dumped core.

                                                Stack trace of thread 780:
                                                #0  0x00007f9c795c1a85 __strlen_avx2 (libc.so.6 + 0x162a85)
                                                #1  0x00007f9c78b87706 get_cie_encoding (libgcc_s.so.1 + 0x11706)
                                                #2  0x00007f9c78b88663 get_fde_encoding (libgcc_s.so.1 + 0x12663)
                                                #3  0x00007f9c79598f95 dl_iterate_phdr (libc.so.6 + 0x139f95)
                                                #4  0x00007f9c78b892c6 _Unwind_Find_FDE (libgcc_s.so.1 + 0x132c6)
                                                #5  0x00007f9c78b85319 uw_frame_state_for (libgcc_s.so.1 + 0xf319)
                                                #6  0x00007f9c78b8733b _Unwind_Backtrace (libgcc_s.so.1 + 0x1133b)
                                                #7  0x00007f9c7956c116 __backtrace (libc.so.6 + 0x10d116)
                                                #8  0x000055d9acbaebd3 xorg_backtrace (Xorg + 0x146bd3)
                                                #9  0x000055d9acbb9a15 n/a (Xorg + 0x151a15)
Oct 11 14:19:17 columbia systemd[1]: systemd-coredump@5-781-0.service: Succeeded.
Oct 11 14:19:17 columbia systemd[1]: lightdm.service: Scheduled restart job, restart counter is at 1.
Oct 11 14:19:17 columbia systemd[1]: Stopped Light Display Manager.

I was surprised that there was no Xorg log file in `/var/log`, not even an empty one. At this point I backpedalled a bit and tried to get Xorg alone working before moving on to lightdm again. But even very simple calls to Xorg like `Xorg :0 -configure` would coredump (`Xorg -version` works, though) with exactly the same stacktrace and no log file. The console output was of no use at all either:

(EE)
(EE) Backtrace:
Segmentation fault (core dumped)

I tried to install/reinstall/remove various Xorg-related packages and drivers, to no avail. What really throws me off is that in 1 of ~20 reboots Xorg worked, but that was absolutely "random" and I could not reproduce it. At this point I wiped the disk and started again, and was faced with the same coredump. Using the ABS, I locally rebuilt the glibc and xorg-server packages with debug symbols to see what's going wrong.

GNU gdb (GDB) 9.2 

For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file /usr/lib/Xorg
Reading symbols from /usr/lib/Xorg...
(gdb) run :0 -configure
Starting program: /usr/lib/Xorg :0 -configure
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65    VPCMPEQ (%rdi), %ymm0, %ymm1
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
#1  0x00007ffff71c3706 in get_cie_encoding (cie=0x55553c311d03) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde.c:300
#2  0x00007ffff71c4663 in get_fde_encoding (f=0x55555577da88) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde.h:157
#3  _Unwind_IteratePhdrCallback (info=info@entry=0x7fffffffd310, size=size@entry=64, ptr=ptr@entry=0x7fffffffd3a0) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde-dip.c:418
#4  0x00007ffff7bd4f95 in __GI___dl_iterate_phdr (callback=callback@entry=0x7ffff71c41c0 <_Unwind_IteratePhdrCallback>, data=data@entry=0x7fffffffd3a0) at dl-iteratephdr.c:75
#5  0x00007ffff71c52c6 in _Unwind_Find_FDE (pc=0x5555556a1be2 <OsInit+754>, bases=bases@entry=0x7fffffffd518) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde-dip.c:469
#6  0x00007ffff71c1319 in uw_frame_state_for (context=0x7fffffffd470, fs=0x7fffffffd560) at /build/gcc/src/gcc/libgcc/unwind-dw2.c:1263
#7  0x00007ffff71c333b in _Unwind_Backtrace (trace=0x7ffff7ba7f90 <backtrace_helper>, trace_argument=0x7fffffffd720) at /build/gcc/src/gcc/libgcc/unwind.inc:302
#8  0x00007ffff7ba8116 in __GI___backtrace (array=array@entry=0x7fffffffc4e8, size=size@entry=1) at backtrace.c:116
#9  0x00005555556a1be3 in OsInit () at ../xorg-server-1.20.9/os/osinit.c:217
#10 0x0000000000000001 in ?? ()
#11 0x00000000000081ed in ?? ()
#12 0x0000000300000000 in ?? ()
#13 0x0000000800000004 in ?? ()
#14 0x0000000600000007 in ?? ()
#15 0x000000180000001f in ?? ()
#16 0x0000000000000019 in ?? ()
#17 0x000000005f84957f in ?? ()
#18 0x00005555556a59f0 in ?? () at ../xorg-server-1.20.9/dri3/dri3_request.c:118
#19 0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65    VPCMPEQ (%rdi), %ymm0, %ymm1
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
#1  0x00007ffff71c3706 in get_cie_encoding (cie=0x5555a2e267c9) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde.c:300
#2  0x00007ffff71c4663 in get_fde_encoding (f=0x55555577c67c) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde.h:157
#3  _Unwind_IteratePhdrCallback (info=info@entry=0x7fffffffc4b0, size=size@entry=64, ptr=ptr@entry=0x7fffffffc540) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde-dip.c:418
#4  0x00007ffff7bd4f95 in __GI___dl_iterate_phdr (callback=callback@entry=0x7ffff71c41c0 <_Unwind_IteratePhdrCallback>, data=data@entry=0x7fffffffc540) at dl-iteratephdr.c:75
#5  0x00007ffff71c52c6 in _Unwind_Find_FDE (pc=0x55555569abd2 <xorg_backtrace+82>, bases=bases@entry=0x7fffffffc6b8) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde-dip.c:469
#6  0x00007ffff71c1319 in uw_frame_state_for (context=0x7fffffffc610, fs=0x7fffffffc700) at /build/gcc/src/gcc/libgcc/unwind-dw2.c:1263
#7  0x00007ffff71c333b in _Unwind_Backtrace (trace=0x7ffff7ba7f90 <backtrace_helper>, trace_argument=0x7fffffffc8c0) at /build/gcc/src/gcc/libgcc/unwind.inc:302
#8  0x00007ffff7ba8116 in __GI___backtrace (array=array@entry=0x7fffffffc8f0, size=size@entry=64) at backtrace.c:116
#9  0x000055555569abd3 in xorg_backtrace () at ../xorg-server-1.20.9/os/backtrace.c:126
#10 0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit

So there are two segfaults, the second being the one from the lightdm service coredump. The second segfault occurs when Xorg's own signal handler tries to handle the first segfault and unwinds the stack in order to print a stacktrace. The real issue is the first one, which occurs when initializing the backtracer (see osinit.c:217).

My first suspicion was AVX2. I enabled early microcode updates, but I already had the newest microcode via the BIOS (confirmed via iucode-tool; microcode 0xd6 in `/proc/cpuinfo`). Next I recompiled glibc without AVX2 support, but Xorg still segfaulted at the same position, this time in `__strlen_sse2`. So that's not it.
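
For context on the `__strlen_avx2` vs `__strlen_sse2` frames: glibc picks one of several strlen implementations at load time (through an IFUNC resolver) based on the CPU's features, so disabling AVX2 only changes which variant the crashing call ends up in, as observed above. The sketch below merely asks the CPU whether it advertises AVX2 using GCC's `__builtin_cpu_supports()`; `cpu_has_avx2()` is a made-up name, and this is not glibc's own selection logic, which also weighs other CPU features and tunables.

```c
#include <assert.h>

/* Returns 1 if the CPU advertises AVX2, 0 otherwise (and on non-x86 builds). */
int cpu_has_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Only strictly required before main() (constructors, IFUNC resolvers),
     * but harmless to call here. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}
```

On the i5-7Y57 from the first post this would be expected to report 1, which is consistent with glibc choosing `__strlen_avx2` by default.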

Next up was DRI3, since that was the topmost source location in the stacktrace. I played around with various configurations and compile flags to disable DRI3 and use DRI2, but it was always a segfault with the same stacktrace. Dead end too.

The offending call in Xorg is surrounded by an `#ifdef`, and since the backtrace is not mission-critical, I unset that macro by removing the corresponding line from the meson config, so the offending part simply wouldn't be compiled in. And indeed, Xorg runs without any problems without backtrace support.

The source code comment above that backtrace initialization didn't make sense to me, so I searched for why it is needed at all. The answer is in the `backtrace()` manpage (4th bullet under NOTES): on the first use of `backtrace()`, libgcc is loaded dynamically, which usually triggers a malloc() and therefore should not happen inside a signal handler. The first call has to be made outside the signal handler so that libgcc is already loaded. From the source of `backtrace()` I found that it short-circuits when called with 0 as its second argument, so I changed the initializer call to that, and Xorg runs fine. In hindsight that's obvious, because in that case it's just a normal C function: the init() function isn't called and libgcc isn't loaded.
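
A minimal sketch of the two ideas in the last two paragraphs, assuming glibc: the initialization call is only compiled when backtrace support is detected, and its `size` argument decides whether glibc performs its one-time libgcc loading. `__GLIBC__` is only a stand-in here for the build-system macro Xorg actually checks, and `init_backtracer()` is a hypothetical name.

```c
#include <assert.h>   /* also pulls in <features.h>, which defines __GLIBC__ on glibc */

/* __GLIBC__ stands in for the macro Xorg's meson configuration sets when it
 * detects backtrace(); the real macro name is not assumed here. */
#if defined(__GLIBC__)
#include <execinfo.h>
#endif

/* Hypothetical initializer modelled on the call in OsInit().
 * Returns what backtrace() returns, or -1 if support is compiled out. */
int init_backtracer(int size)
{
#if defined(__GLIBC__)
    void *array[1];

    assert(size <= 1); /* array only has room for one frame */
    /* size 0: glibc returns 0 before its one-time init, so libgcc is NOT
     *         loaded (the short-circuit found in the glibc source above).
     * size 1: the first call loads libgcc, so a later call from a signal
     *         handler no longer has to. */
    return backtrace(array, size);
#else
    (void)size;
    return -1;
#endif
}
```

Note that the size-0 variant is exactly the workaround described above: it avoids the crash, but it also skips the pre-loading that the call exists for.
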

Using gdb, I set breakpoints at the two known code lines in Xorg (osinit.c:217 and dri3_request.c:118) to "take a look around", but it only ever stopped in osinit.c and never in dri3_request.c, even though the latter is further up the stack and should be called first. Since the offending feature revolves around stacktraces and fails while unwinding, I started to question the correctness of that stacktrace; maybe we are dealing with stack corruption. After a bit of fiddling, that suspicion was confirmed by setting a breakpoint on the `OsInit` function and stepping through.

GNU gdb (GDB) 9.2

For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file /usr/lib/Xorg
Reading symbols from /usr/lib/Xorg...
(gdb) break OsInit
Breakpoint 1 at 0x14d8f0: file ../xorg-server-1.20.9/os/osinit.c, line 165.
(gdb) run :0 -configure
Starting program: /usr/lib/Xorg :0 -configure
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Breakpoint 1, OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
165    {
(gdb) bt
#0  OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
#1  0x000055555558d2ff in dix_main (envp=<optimized out>, argv=<optimized out>, argc=<optimized out>) at ../xorg-server-1.20.9/dix/main.c:154
#2  main (argc=61, argv=0x100000000, envp=<optimized out>) at ../xorg-server-1.20.9/dix/stubmain.c:34
(gdb) n
OsInit () at ../xorg-server-1.20.9/os/osinit.c:172
172     if (!been_here) {
(gdb) bt
#0  OsInit () at ../xorg-server-1.20.9/os/osinit.c:172
#1  0x0000000000000001 in ?? ()
#2  0x00000000000081ed in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb) n
177         int siglist[] = { SIGSEGV, SIGQUIT, SIGILL, SIGFPE, SIGBUS,
(gdb) bt
#0  OsInit () at ../xorg-server-1.20.9/os/osinit.c:177
#1  0x0000000000000001 in ?? ()
#2  0x00000000000081ed in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65    VPCMPEQ (%rdi), %ymm0, %ymm1
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65    VPCMPEQ (%rdi), %ymm0, %ymm1
(gdb) c
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit

gdb also stops if a breakpoint is set at dix/main.c:154, which confirms it. At the very start of the function the stack is correct, and when reaching the first statement (in this case the `if`) it's corrupted. The optimizer came to mind. By default makepkg compiles with `-O2`, so I rebuilt xorg-server with `-O1`, and it works. The same with `-O0`. I haven't tried `-O3`, but I assume it segfaults too.

I also tried live CDs of several other distributions (namely Fedora 32, Debian Buster (Xfce) and Xubuntu 20.04), and they all start graphically fine. I haven't checked which versions of glibc, gcc and Xorg they use, though, and one would also have to factor in compile flags and applied patches. For funsies I even tried Manjaro (something I never thought I would say or do), and it worked too.

And that is basically where I am at right now. I use the rebuilt `-O1` package as a workaround, but have no idea what exactly the problem is or how to get it fixed so I can stop rebuilding.

Last edited by Corubba (2020-10-12 21:35:28)

#2 2020-10-13 13:56:08

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 8,497

Re: Xorg segfaults very early when compiled with optimisation level 2

Have you tried
- boot to multi-user.target
- login to console as user
- copy /etc/X11/xinit/xinitrc to $HOME/.xinitrc
- execute startx

If there's no Xorg log under /var/log, look in ~/.local/share/xorg/ (e.g. Xorg.0.log)


#3 2020-10-13 14:44:43

seth
Member
Registered: 2012-09-03
Posts: 16,682

Re: Xorg segfaults very early when compiled with optimisation level 2

https://www.google.com/search?q=%22Xorg … 2+segfault

That's "normal", what's the lightdm backtrace?

#4 2020-10-13 20:06:33

Corubba
Member
From: Germany
Registered: 2010-11-14
Posts: 85

Re: Xorg segfaults very early when compiled with optimisation level 2

Lone_Wolf wrote:

Have you tried
- boot to multi-user.target
- login to console as user
- copy /etc/X11/xinit/xinitrc to $HOME/.xinitrc
- execute startx

If there's no Xorg log under /var/log, look in ~/.local/share/xorg/ (e.g. Xorg.0.log)

Yes, I tried that a few times earlier and again today, with the same result: segfault/coredump with the same stacktrace as above, very short console output and no log file (neither in /var/log nor in ~/.local).

seth wrote:

https://www.google.com/search?q=%22Xorg … 2+segfault

That's "normal", what's the lightdm backtrace?

Not really sure what you are looking for, but here you go.

GNU gdb (GDB) 9.2 

For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file /usr/bin/lightdm
Reading symbols from /usr/bin/lightdm...
(gdb) break process.c:223
Breakpoint 1 at 0x13772: file process.c, line 223.
(gdb) run 
Starting program: /usr/bin/lightdm
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff773e640 (LWP 803)]
[New Thread 0x7ffff6f3d640 (LWP 804)]
[New Thread 0x7ffff6728640 (LWP 805)]
[New Thread 0x7ffff5f27640 (LWP 806)]

Thread 1 "lightdm" hit Breakpoint 1, process_start (process=0x5555555f6c60, block=block@entry=0) at process.c:223
223     pid_t pid = fork (); 
(gdb) bt
#0  process_start (process=0x5555555f6c60, block=block@entry=0) at process.c:223
#1  0x000055555557809e in x_server_local_start (display_server=0x5555555dd4a0) at x-server-local.c:545
#2  0x000055555556d831 in start_display_server (display_server=0x5555555dd4a0, seat=0x5555555d7460) at seat.c:1393
#3  seat_real_start (seat=0x5555555d7460) at seat.c:1744
#4  0x000055555556a0db in seat_start (seat=0x5555555d7460) at seat.c:220
#5  0x000055555555e1c9 in display_manager_add_seat (manager=0x5555555a7470, seat=0x5555555d7460) at display-manager.c:113
#6  0x0000555555564786 in add_login1_seat (login1_seat=0x7fffec006a60) at lightdm.c:427
#7  update_login1_seat (login1_seat=0x7fffec006a60) at lightdm.c:464
#8  0x000055555555c963 in main (argc=<optimized out>, argv=<optimized out>) at lightdm.c:881
(gdb) set follow-fork-mode child
(gdb) break process.c:252
Breakpoint 2 at 0x5555555679cf: file process.c, line 252.
(gdb) n
[Thread 0x7ffff5f27640 (LWP 806) exited]
[New inferior 2 (process 808)]
[Inferior 1 (process 799) detached]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[Switching to Thread 0x7ffff773f780 (LWP 808)]
process_start (process=0x5555555f6c60, block=block@entry=0) at process.c:224
224     if (pid == 0)
(gdb) c
Continuing.

Thread 2.1 "lightdm" hit Breakpoint 2, process_start (process=<optimized out>, block=block@entry=0) at process.c:252
252         execvp (argv[0], argv);
(gdb) bt
#0  process_start (process=<optimized out>, block=block@entry=0) at process.c:252
#1  0x000055555557809e in x_server_local_start (display_server=0x5555555dd4a0) at x-server-local.c:545
#2  0x000055555556d831 in start_display_server (display_server=0x5555555dd4a0, seat=0x5555555d7460) at seat.c:1393
#3  seat_real_start (seat=0x5555555d7460) at seat.c:1744
#4  0x000055555556a0db in seat_start (seat=0x5555555d7460) at seat.c:220
#5  0x000055555555e1c9 in display_manager_add_seat (manager=0x5555555a7470, seat=0x5555555d7460) at display-manager.c:113
#6  0x0000555555564786 in add_login1_seat (login1_seat=0x7fffec006a60) at lightdm.c:427
#7  update_login1_seat (login1_seat=0x7fffec006a60) at lightdm.c:464
#8  0x000055555555c963 in main (argc=<optimized out>, argv=<optimized out>) at lightdm.c:881
(gdb) set follow-exec-mode new
(gdb) n
process 808 is executing new program: /usr/bin/bash
[New inferior 3]
[New process 808]
process 808 is executing new program: /usr/lib/Xorg.wrap
[New inferior 4]
[New process 808]
process 808 is executing new program: /usr/lib/Xorg
[New inferior 5]
[New process 808]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Thread 5.1 "Xorg" received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65    VPCMPEQ (%rdi), %ymm0, %ymm1
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
#1  0x00007f42a47c8706 in get_cie_encoding (cie=0x55c94a4eb0fb) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde.c:300
#2  0x00007f42a47c9663 in get_fde_encoding (f=0x55c9bf65ca88) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde.h:157
#3  _Unwind_IteratePhdrCallback (info=info@entry=0x7ffd1c6b9510, size=size@entry=64, ptr=ptr@entry=0x7ffd1c6b95a0) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde-dip.c:418
#4  0x00007f42a51d9f95 in __GI___dl_iterate_phdr (callback=callback@entry=0x7f42a47c91c0 <_Unwind_IteratePhdrCallback>, data=data@entry=0x7ffd1c6b95a0) at dl-iteratephdr.c:75
#5  0x00007f42a47ca2c6 in _Unwind_Find_FDE (pc=0x55c9bf580be2 <OsInit+754>, bases=bases@entry=0x7ffd1c6b9718) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde-dip.c:469
#6  0x00007f42a47c6319 in uw_frame_state_for (context=0x7ffd1c6b9670, fs=0x7ffd1c6b9760) at /build/gcc/src/gcc/libgcc/unwind-dw2.c:1263
#7  0x00007f42a47c833b in _Unwind_Backtrace (trace=0x7f42a51acf90 <backtrace_helper>, trace_argument=0x7ffd1c6b9920) at /build/gcc/src/gcc/libgcc/unwind.inc:302
#8  0x00007f42a51ad116 in __GI___backtrace (array=array@entry=0x7ffd1c6b86e8, size=size@entry=1) at backtrace.c:116
#9  0x000055c9bf580be3 in OsInit () at ../xorg-server-1.20.9/os/osinit.c:217
#10 0x0000000000000001 in ?? ()
#11 0x00000000000081ed in ?? ()
#12 0x0000000300000000 in ?? ()
#13 0x0000000800000004 in ?? ()
#14 0x0000000600000007 in ?? ()
#15 0x000000180000001f in ?? ()
#16 0x0000000000000019 in ?? ()
#17 0x000000005f85ea49 in ?? ()
#18 0x000055c9bf5849f0 in ?? () at ../xorg-server-1.20.9/dri3/dri3_request.c:118
#19 0x0000000000000000 in ?? ()

The process.c I set breakpoints in is lightdm's. lightdm first forks, and the child process then execs (via bash and Xorg.wrap, as the gdb output shows) into Xorg, which segfaults at the same position as with -configure. The process tree at the time of the last backtrace looks like this:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         793  0.5  1.4 192292 115500 tty1    Sl+  19:48   0:00   gdb
root         799  0.0  0.0 305044  6760 tty1     Sl   19:49   0:00     /usr/bin/lightdm
root         808  0.0  0.0  17280  6728 tty1     t    19:49   0:00       /usr/lib/Xorg :0 -seat seat0 -auth /run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
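
The fork-then-exec pattern lightdm uses in process_start() boils down to something like the following simplified sketch. It is not lightdm's code: `spawn_and_wait()` is a made-up helper, and unlike lightdm (called with `block=0` above, so it keeps running), it waits for the child so that the exit status can be inspected.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, exec argv[0] (looked up in PATH) in the child,
 * wait for it and return its exit status. Returns -1 if fork/wait fails or
 * if the child did not exit normally (e.g. it was killed by SIGSEGV). */
int spawn_and_wait(char *const argv[])
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace the process image, like lightdm's execvp() call. */
        execvp(argv[0], argv);
        _exit(127); /* only reached if execvp() failed */
    }
    /* Parent: wait for the child (lightdm with block=0 does not). */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running `{ "sh", "-c", "exit 3", NULL }` through it returns 3, while a child that dies from a segfault, as Xorg does here, makes it return -1.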

#5 2020-10-16 17:49:27

Corubba
Member
From: Germany
Registered: 2010-11-14
Posts: 85

Re: Xorg segfaults very early when compiled with optimisation level 2

Since it is somehow optimizer-related, I pinpointed the relevant option by trial and error. Up to (and including) `-O1`, gcc uses `-freorder-blocks-algorithm=simple`, while `-O2` and upwards use `-freorder-blocks-algorithm=stc` (see also the gcc manpage). When building Xorg with a "downgraded" -O2 by using `-O2 -freorder-blocks-algorithm=simple`, it works. Interestingly enough, `-O1 -freorder-blocks-algorithm=stc` works too, so that may not be the whole story.

Nevertheless, I stepped through both at instruction granularity and looked at what happens to the registers between the point where the stack is still intact (the start of the OsInit function) and the first line of code.

This is "vanilla" -O2 with stc, which does not work:

GNU gdb (GDB) 9.2

For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file /usr/lib/Xorg
Reading symbols from /usr/lib/Xorg...
(gdb) break OsInit 
Breakpoint 1 at 0x14d8f0: file ../xorg-server-1.20.9/os/osinit.c, line 165.
(gdb) run :0 -configure
Starting program: /usr/lib/Xorg :0 -configure
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Breakpoint 1, OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
165 {
(gdb) bt
#0  OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
#1  0x000055555558d2ff in dix_main (envp=<optimized out>, argv=<optimized out>, argc=<optimized out>) at ../xorg-server-1.20.9/dix/main.c:154
#2  main (argc=61, argv=0x100000000, envp=<optimized out>) at ../xorg-server-1.20.9/dix/stubmain.c:34
(gdb) info registers
rax            0x0                 0
rbx            0x3                 3
rcx            0x7ffff7b68e4b      140737349324363
rdx            0x0                 0
rsi            0x55555573b702      93824994227970
rdi            0x0                 0
rbp            0x55555571f0d0      0x55555571f0d0 <__libc_csu_init>
rsp            0x7fffffffe9a8      0x7fffffffe9a8
r8             0x1999999999999999  1844674407370955161
r9             0x1                 1
r10            0x555555567918      93824992311576
r11            0x7ffff7b272f0      140737349055216
r12            0x5555557aa700      93824994682624
r13            0x0                 0
r14            0x0                 0
r15            0x7fffffffeb28      140737488349992
rip            0x5555556a18f0      0x5555556a18f0 <OsInit>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) display/i $pc
1: x/i $pc
=> 0x5555556a18f0 <OsInit>: push   %r15
(gdb) si
0x00005555556a18f2 in OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
165 {
1: x/i $pc
=> 0x5555556a18f2 <OsInit+2>: push   %r14
(gdb) si
0x00005555556a18f4 in OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
165 {
1: x/i $pc
=> 0x5555556a18f4 <OsInit+4>: push   %r13
(gdb) si
0x00005555556a18f6 in OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
165 {
1: x/i $pc
=> 0x5555556a18f6 <OsInit+6>: push   %r12
(gdb) si
0x00005555556a18f8 in OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
165 {
1: x/i $pc
=> 0x5555556a18f8 <OsInit+8>: push   %rbp
(gdb) si
0x00005555556a18f9 in OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
165 {
1: x/i $pc
=> 0x5555556a18f9 <OsInit+9>: push   %rbx
(gdb) si
0x00005555556a18fa in OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
165 {
1: x/i $pc
=> 0x5555556a18fa <OsInit+10>:  sub    $0x1238,%rsp
(gdb) si
0x00005555556a1901 in OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
165 {
1: x/i $pc
=> 0x5555556a1901 <OsInit+17>:  mov    0x116aa9(%rip),%edi        # 0x5555557b83b0 <been_here.1>
(gdb) si
0x00005555556a1907  165 {
1: x/i $pc
=> 0x5555556a1907 <OsInit+23>:  mov    %fs:0x28,%rax
(gdb) si
0x00005555556a1910  165 {
1: x/i $pc
=> 0x5555556a1910 <OsInit+32>:  mov    %rax,0x1228(%rsp)
(gdb) si
0x00005555556a1918  165 {
1: x/i $pc
=> 0x5555556a1918 <OsInit+40>:  xor    %eax,%eax
(gdb) si
172     if (!been_here) {
1: x/i $pc
=> 0x5555556a191a <OsInit+42>:  test   %edi,%edi
(gdb) bt
#0  OsInit () at ../xorg-server-1.20.9/os/osinit.c:172
#1  0x0000000000000001 in ?? ()
#2  0x00000000000081ed in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb) info registers
rax            0x0                 0
rbx            0x3                 3
rcx            0x7ffff7b68e4b      140737349324363
rdx            0x0                 0
rsi            0x55555573b702      93824994227970
rdi            0x0                 0
rbp            0x55555571f0d0      0x55555571f0d0 <__libc_csu_init>
rsp            0x7fffffffd740      0x7fffffffd740
r8             0x1999999999999999  1844674407370955161
r9             0x1                 1
r10            0x555555567918      93824992311576
r11            0x7ffff7b272f0      140737349055216
r12            0x5555557aa700      93824994682624
r13            0x0                 0
r14            0x0                 0
r15            0x7fffffffeb28      140737488349992
rip            0x5555556a191a      0x5555556a191a <OsInit+42>
eflags         0x246               [ PF ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:96
96    VPCMPEQ (%rdi), %ymm0, %ymm1
1: x/i $pc
=> 0x7ffff7bfdab7 <__strlen_avx2+71>: vpcmpeqb (%rdi),%ymm0,%ymm1
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65    VPCMPEQ (%rdi), %ymm0, %ymm1
1: x/i $pc
=> 0x7ffff7bfda85 <__strlen_avx2+21>: vpcmpeqb (%rdi),%ymm0,%ymm1
(gdb) c
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit

And this is "downgraded" -O2 with simple which does work:

GNU gdb (GDB) 9.2

For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file /usr/lib/Xorg
Reading symbols from /usr/lib/Xorg...
(gdb) break OsInit
Breakpoint 1 at 0x149620: file ../xorg-server-1.20.9/os/osinit.c, line 165.
(gdb) run :0 -configure
Starting program: /usr/lib/Xorg :0 -configure
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Breakpoint 1, OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
165 {
(gdb) bt
#0  OsInit () at ../xorg-server-1.20.9/os/osinit.c:165
#1  0x000055555558e3d7 in dix_main (envp=<optimized out>, argv=<optimized out>, argc=<optimized out>) at ../xorg-server-1.20.9/dix/main.c:154
#2  main (argc=3, argv=0x7fffffffeb28, envp=<optimized out>) at ../xorg-server-1.20.9/dix/stubmain.c:34
(gdb) info registers
rax            0x0                 0
rbx            0x3                 3
rcx            0x7ffff7b68e4b      140737349324363
rdx            0x0                 0
rsi            0x555555734702      93824994199298
rdi            0x0                 0
rbp            0x555555718c10      0x555555718c10 <__libc_csu_init>
rsp            0x7fffffffe9a8      0x7fffffffe9a8
r8             0x1999999999999999  1844674407370955161
r9             0x1                 1
r10            0x555555567918      93824992311576
r11            0x7ffff7b272f0      140737349055216
r12            0x5555557a0700      93824994641664
r13            0x0                 0
r14            0x0                 0
r15            0x7fffffffeb28      140737488349992
rip            0x55555569d620      0x55555569d620 <OsInit>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) display/i $pc
1: x/i $pc
=> 0x55555569d620 <OsInit>: push   %r15
(gdb) si
0x000055555569d622  165 {
1: x/i $pc
=> 0x55555569d622 <OsInit+2>: push   %r14
(gdb) si
0x000055555569d624  165 {
1: x/i $pc
=> 0x55555569d624 <OsInit+4>: push   %r13
(gdb) si
0x000055555569d626  165 {
1: x/i $pc
=> 0x55555569d626 <OsInit+6>: push   %r12
(gdb) si
0x000055555569d628  165 {
1: x/i $pc
=> 0x55555569d628 <OsInit+8>: push   %rbp
(gdb) si
0x000055555569d629  165 {
1: x/i $pc
=> 0x55555569d629 <OsInit+9>: push   %rbx
(gdb) si
0x000055555569d62a  165 {
1: x/i $pc
=> 0x55555569d62a <OsInit+10>:  sub    $0x1238,%rsp
(gdb) si
0x000055555569d631  165 {
1: x/i $pc
=> 0x55555569d631 <OsInit+17>:  mov    0x110d79(%rip),%edi        # 0x5555557ae3b0 <been_here.1>
(gdb) si
0x000055555569d637  165 {
1: x/i $pc
=> 0x55555569d637 <OsInit+23>:  mov    %fs:0x28,%rax
(gdb) si
0x000055555569d640  165 {
1: x/i $pc
=> 0x55555569d640 <OsInit+32>:  mov    %rax,0x1228(%rsp)
(gdb) si
0x000055555569d648  165 {
1: x/i $pc
=> 0x55555569d648 <OsInit+40>:  xor    %eax,%eax
(gdb) si
172     if (!been_here) {
1: x/i $pc
=> 0x55555569d64a <OsInit+42>:  test   %edi,%edi
(gdb) bt
#0  OsInit () at ../xorg-server-1.20.9/os/osinit.c:172
#1  0x000055555558e3d7 in dix_main (envp=<optimized out>, argv=<optimized out>, argc=<optimized out>) at ../xorg-server-1.20.9/dix/main.c:154
#2  main (argc=3, argv=0x7fffffffeb28, envp=<optimized out>) at ../xorg-server-1.20.9/dix/stubmain.c:34
(gdb) info registers
rax            0x0                 0
rbx            0x3                 3
rcx            0x7ffff7b68e4b      140737349324363
rdx            0x0                 0
rsi            0x555555734702      93824994199298
rdi            0x0                 0
rbp            0x555555718c10      0x555555718c10 <__libc_csu_init>
rsp            0x7fffffffd740      0x7fffffffd740
r8             0x1999999999999999  1844674407370955161
r9             0x1                 1
r10            0x555555567918      93824992311576
r11            0x7ffff7b272f0      140737349055216
r12            0x5555557a0700      93824994641664
r13            0x0                 0
r14            0x0                 0
r15            0x7fffffffeb28      140737488349992
rip            0x55555569d64a      0x55555569d64a <OsInit+42>
eflags         0x246               [ PF ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) c
Continuing.
[New Thread 0x7fffee98f640 (LWP 699)]
[Thread 0x7ffff71cf540 (LWP 695) exited]
[Inferior 1 (process 695) exited normally]
(gdb) kill

The instructions are the same; the only difference is the displacement in the one `mov` instruction (`mov 0x110d79(%rip),%edi` vs `mov 0x116aa9(%rip),%edi`). The registers also look identical, apart from the expected differences in memory addresses.
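
A quick arithmetic check of that observation, using the addresses from the two transcripts. For RIP-relative addressing on x86-64, the target is the address of the next instruction plus the displacement, so both loads read `been_here.1` and the constants differ only because the two binaries are laid out differently. Likewise, six 8-byte pushes plus `sub $0x1238,%rsp` move rsp by the same amount in both builds. The helper names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 RIP-relative operand: target = address of the NEXT instruction
 * + sign-extended 32-bit displacement. */
uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp)
{
    return next_insn_addr + (uint64_t)(int64_t)disp;
}

/* rsp after OsInit's prologue as seen in both transcripts: six 8-byte
 * pushes (r15, r14, r13, r12, rbp, rbx), then sub $0x1238,%rsp. */
uint64_t rsp_after_prologue(uint64_t rsp_at_entry)
{
    return rsp_at_entry - 6 * 8 - 0x1238;
}
```

`rip_relative_target(0x5555556a1907, 0x116aa9)` gives 0x5555557b83b0 and `rip_relative_target(0x55555569d637, 0x110d79)` gives 0x5555557ae3b0, the two `been_here.1` addresses gdb annotates, and `rsp_after_prologue(0x7fffffffe9a8)` gives 0x7fffffffd740, matching the rsp value at line 172 in both register dumps.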

Not really sure where to go from here.

Last edited by Corubba (2020-10-16 17:50:05)

#6 2020-10-17 08:48:46

seth
Member
Registered: 2012-09-03
Posts: 16,682

Re: Xorg segfaults very early when compiled with optimisation level 2

backtrace manpage wrote:

NOTES
       These functions make some assumptions about how a function's return address is stored on the stack.  Note the following:
       *  Omission of the frame pointers (as implied by any of gcc(1)'s nonzero optimization levels) may cause these assumptions to be violated.

       *  Tail-call optimization causes one stack frame to replace another.

You could try building with "-O2 -fno-omit-frame-pointer", and maybe also restrict max-tail-merge-*, https://gcc.gnu.org/onlinedocs/gcc/Opti … tions.html
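
To see the tail-call bullet in action, here is a small sketch (the function names are hypothetical). `frames_via_tail_call()` returns the result of its last call unchanged, which GCC may compile into a plain jump at `-O2` (sibling-call optimization), so its frame is gone by the time `backtrace()` runs; `frames_via_normal_call()` still has work to do after the call and keeps its frame. The exact counts depend on compiler version and flags, so only the relation between them is meaningful.

```c
#include <assert.h>
#include <execinfo.h>

#define MAX_FRAMES 64

static volatile int sink;

/* Number of frames backtrace() reports from inside this function. */
__attribute__((noinline)) static int count_frames(void)
{
    void *frames[MAX_FRAMES];
    return backtrace(frames, MAX_FRAMES);
}

/* Tail-call candidate: at -O2 GCC may emit a jmp to count_frames(),
 * so this function's frame is missing from the backtrace. */
__attribute__((noinline)) int frames_via_tail_call(void)
{
    return count_frames();
}

/* The volatile store after the call rules out a tail call, so this
 * frame stays on the stack while count_frames() runs. */
__attribute__((noinline)) int frames_via_normal_call(void)
{
    int n = count_frames();
    sink = n;
    return n;
}
```

Built with `-O0`, both counts are typically equal; with `-O2`, the tail-call version usually reports one frame fewer.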

#7 2020-10-17 12:13:02

Corubba
Member
From: Germany
Registered: 2010-11-14
Posts: 85

Re: Xorg segfaults very early when compiled with optimisation level 2

Thanks! I even referenced these same notes earlier. No idea how I could overlook that; the words "optimization levels" in there should have alerted me.

Optimizations                                                               | Works?
-------------------------------------------------------------------------------------
-O2                                                                         | N
-O1                                                                         | Y
-O2 -freorder-blocks-algorithm=simple                                       | Y
-O1 -freorder-blocks-algorithm=stc                                          | Y
-O2 -fno-omit-frame-pointer                                                 | Y
-O1 -fomit-frame-pointer                                                    | Y 
-O2 -fno-tree-tail-merge                                                    | N
-O1 -ftree-tail-merge                                                       | Y
-O1 -freorder-blocks-algorithm=stc -fomit-frame-pointer -ftree-tail-merge   | Y

I did not play around with the max-tail-merge-* parameters, since enabling/disabling tree-tail-merge entirely didn't make a difference in my case. It is interesting that -O1 with all the "reverted -O2-repairing" flags still works; but ultimately I want to "repair" -O2, not "break" -O1.

For me, these notes essentially boil down to "if you want the backtrace function to work reliably and give meaningful results, compile with `-fno-omit-frame-pointer -fno-tree-tail-merge`", which means I am looking at a packaging issue. In Xorg, the backtrace feature is enabled by the mere presence of the `backtrace()` function, as I already found out while trying to disable it (see my first post). It stands to reason that backtrace support is desired in the Arch package, so it would have to be fixed by using the aforementioned flags. Since all the other distributions I tested work, I will take a look at their package sources to see whether they use anything similar in this regard. I am also curious whether using these flags entails a noticeable degradation in performance.

#8 2020-10-17 16:09:24

seth
Member
Registered: 2012-09-03
Posts: 16,682

Re: Xorg segfaults very early when compiled with optimisation level 2

Corubba wrote:

I am also curious whether using these flags entails a noticeable degradation in performance.

"Depends" - https://stackoverflow.com/questions/130 … ormance-an

Apparently this wasn't always part of -O2 (https://bbs.archlinux.org/viewtopic.php?id=117812), and since the code is fairly old and so far (maybe until recent compilers/architectures) hasn't been a problem (it would only matter when the server crashes anyway), nobody cared.
I assume we want to keep the backtrace feature, because nobody wants to run Xorg in gdb…

#9 2020-10-19 22:16:50

Corubba
Member
From: Germany
Registered: 2010-11-14
Posts: 85

Re: Xorg segfaults very early when compiled with optimisation level 2

seth wrote:

[...] nobody wants to run Xorg in gdb…

I second that.

I created bug FS#68340 to hopefully get this fixed.
