
#1 2019-09-29 01:04:23

dmidge
Member
Registered: 2015-04-04
Posts: 69

Server out of reach

Hi folks!

I have a question to start my debugging process.
I have a computer that I use as a server. I often connect to it through ssh to start some operation. The server runs Arch Linux, and from time to time it stops responding. A

nmap -sS

from another computer on the local network shows that it is no longer connected to the network, but the computer is still running (the CPU fan is on).
When I then connect a screen over HDMI, nothing shows up on the display (knowing it is a KDE/plasma desktop which is normally running). I can't bring anything up with the sysreq keys. So what am I missing?


Cheers!

Offline

#2 2019-09-29 01:18:21

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,332
Website

Re: Server out of reach

dmidge wrote:

So what am I missing?

Any diagnostic information.  Given that you've said this has happened multiple times, you've clearly gotten back into the server after one of these events, so get the logs and journal and share it here.
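
After a hard reset, the end of the previous boot's journal is usually the first place to look. A minimal sketch, assuming the journal is persistent (kept under /var/log/journal) so that earlier boots are retained; the function name last_boot_tail is only illustrative:

```shell
#!/bin/sh
# Print the last N journal lines of the boot before the current one.
# Assumption: persistent journal storage; with a volatile journal
# there is no previous boot to show.
last_boot_tail() {
    lines="${1:-200}"
    journalctl -b -1 -n "$lines" --no-pager
}

# Typical use (run on the server after rebooting):
#   journalctl --list-boots     # which boots the journal knows about
#   last_boot_tail 300          # what was logged right before the hang
```

The last timestamp in that output also gives a rough idea of when the machine stopped logging.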

dmidge wrote:

knowing it is a KDE/plasma desktop which is normally running

Why are you running a DE on a headless server?


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman

Offline

#3 2019-10-05 21:02:14

dmidge
Member
Registered: 2015-04-04
Posts: 69

Re: Server out of reach

Hi @Trilby,


Thanks for your reply.

You are right, sorry. I get nothing from dmesg, since I have to hard-reboot the computer every time.
From journalctl, I don't find anything useful. The thing is, since I don't know exactly when the system stops responding, as it usually takes days before I notice, I don't know what to look for exactly.
Looking for anything weird since the last boot, I see for instance a core dump from konsole:

systemd-coredump[7334]: Process 6524 (konsole) of user 1000 dumped core.
                                                       
                                                       Stack trace of thread 6524:
                                                       #0  0x00007f50bf314755 raise (libc.so.6)
                                                       #1  0x00007f50bee30bef _ZN6KCrash19defaultCrashHandlerEi (libKF5Crash.so.5)
                                                       #2  0x00007f50bf3147e0 __restore_rt (libc.so.6)
                                                       #3  0x00007f50bdf7ab51 _ZN22QGuiApplicationPrivate21processActivatedEventEPN29QWindowSystemInterfacePrivate20ActivatedWindowEventE (l>
                                                       #4  0x00007f50bdf7ff80 _ZN22QGuiApplicationPrivate24processWindowSystemEventEPN29QWindowSystemInterfacePrivate17WindowSystemEventE (l>
                                                       #5  0x00007f50bdf5837c _ZN22QWindowSystemInterface22sendWindowSystemEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt5Gui.so.5)
                                                       #6  0x00007f50b7d92fec n/a (libQt5XcbQpa.so.5)
                                                       #7  0x00007f50bbccfcf4 g_main_context_dispatch (libglib-2.0.so.0)
                                                       #8  0x00007f50bbcd1b11 n/a (libglib-2.0.so.0)
                                                       #9  0x00007f50bbcd1b51 g_main_context_iteration (libglib-2.0.so.0)
                                                       #10 0x00007f50bdc3a9a3 _ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt5Core.so.5)
                                                       #11 0x00007f50bdbe15ec _ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE (libQt5Core.so.5)
                                                       #12 0x00007f50bdbe9326 _ZN16QCoreApplication4execEv (libQt5Core.so.5)
                                                       #13 0x00007f50bf4c5ec8 kdemain (libkdeinit5_konsole.so)
                                                       #14 0x00007f50bf300ee3 __libc_start_main (libc.so.6)
                                                       #15 0x00005594aa84605e _start (konsole)
                                                       
                                                       Stack trace of thread 6529:
                                                       #0  0x00007f50bf3cb667 __poll (libc.so.6)
                                                       #1  0x00007f50bbcd1a80 n/a (libglib-2.0.so.0)
                                                       #2  0x00007f50bbcd1b51 g_main_context_iteration (libglib-2.0.so.0)
                                                       #3  0x00007f50bdc3a9bc _ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt5Core.so.5)
                                                       #4  0x00007f50bdbe15ec _ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE (libQt5Core.so.5)
                                                       #5  0x00007f50bda142f5 _ZN7QThread4execEv (libQt5Core.so.5)
                                                       #6  0x00007f50bd151b37 n/a (libQt5DBus.so.5)
                                                       #7  0x00007f50bda15520 n/a (libQt5Core.so.5)
                                                       #8  0x00007f50bc83157f start_thread (libpthread.so.0)
                                                       #9  0x00007f50bf3d60e3 __clone (libc.so.6)
                                                       
                                                       Stack trace of thread 6530:
                                                       #0  0x00007f50bc837415 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                                                       #1  0x00007f50aef48c44 n/a (swrast_dri.so)
                                                       #2  0x00007f50aef48a98 n/a (swrast_dri.so)
                                                       #3  0x00007f50bc83157f start_thread (libpthread.so.0)
                                                       #4  0x00007f50bf3d60e3 __clone (libc.so.6)
                                                       
                                                       Stack trace of thread 6533:
                                                       #0  0x00007f50bc837415 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                                                       #1  0x00007f50aef48c44 n/a (swrast_dri.so)
                                                       #2  0x00007f50aef48a98 n/a (swrast_dri.so)
                                                       #3  0x00007f50bc83157f start_thread (libpthread.so.0)
                                                       #4  0x00007f50bf3d60e3 __clone (libc.so.6)
                                                       
                                                       Stack trace of thread 6532:
                                                       #0  0x00007f50bc837415 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                                                       #1  0x00007f50aef48c44 n/a (swrast_dri.so)
                                                       #2  0x00007f50aef48a98 n/a (swrast_dri.so)
                                                       #3  0x00007f50bc83157f start_thread (libpthread.so.0)
                                                       #4  0x00007f50bf3d60e3 __clone (libc.so.6)
                                                       
                                                       Stack trace of thread 6531:
                                                       #0  0x00007f50bc837415 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                                                       #1  0x00007f50aef48c44 n/a (swrast_dri.so)
                                                       #2  0x00007f50aef48a98 n/a (swrast_dri.so)
                                                       #3  0x00007f50bc83157f start_thread (libpthread.so.0)
                                                       #4  0x00007f50bf3d60e3 __clone (libc.so.6)
                                                       

.

There is also some GLX rendering issue. I don't think any of this is related to the specific problem, though it may be worth reporting for later fixes.
However, the thing is: I don't know what to look for...

The reason I have a desktop is that I actually connect to it remotely to do some work, like writing a small script. It is not a server that I use to run a website, or at least not a professional one, but more a computer where I offload some computation sometimes. Thus, it is just more convenient to have a desktop there. In the same way, I find it more convenient to have a DE on my everyday machine, where I send mail, etc. That is also why it runs Arch Linux rather than a more stable server distro such as Debian.

Offline

#4 2019-10-05 22:49:36

seth
Member
Registered: 2012-09-03
Posts: 61,171

Re: Server out of reach

swrast_dri.so is suspicious - Xorg log?

can't bring anything up with the sysreq keys

Is it enabled?

cat /proc/sys/kernel/sysrq
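
The value in that file is either 0 (disabled), 1 (all functions enabled) or a bitmask of allowed function groups. A small sketch for decoding it; the group names below are informal labels, not kernel identifiers:

```shell
#!/bin/sh
# Describe a kernel.sysrq value.
# 0 = disabled, 1 = everything enabled, anything else = bitmask.
describe_sysrq() {
    value="$1"
    case "$value" in
        0) echo "disabled"; return ;;
        1) echo "all functions enabled"; return ;;
    esac
    out=""
    # bit:label pairs, labels are informal
    for pair in "2:loglevel" "4:keyboard" "8:debug-dumps" "16:sync" \
                "32:remount-ro" "64:signal" "128:reboot" "256:nice-rt"; do
        bit="${pair%%:*}"
        name="${pair#*:}"
        if [ $(( value & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

# Typical use:
#   describe_sysrq "$(cat /proc/sys/kernel/sysrq)"
```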

I offload some computation sometimes. Thus, it is just more convenient to have a desktop there

Huh??

I'd try to disable the compositor, the screensaver/locker and all of powerdevil. Also monitor the RAM load (in case there are leaks that build up over a week or so).
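
One way to watch for a slow leak is to log MemAvailable from /proc/meminfo at intervals and compare the numbers after the next hang. A rough sketch; the log path and the 10-minute interval are arbitrary placeholders:

```shell
#!/bin/sh
# Extract MemAvailable (in kB) from meminfo-formatted text on stdin.
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }'
}

# Append one timestamped sample to a log file (placeholder path).
log_mem_sample() {
    logfile="${1:-$HOME/mem-usage.log}"
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(mem_available_kb < /proc/meminfo)" >> "$logfile"
}

# Typical use: sample every 10 minutes until interrupted.
#   while true; do log_mem_sample; sleep 600; done
```

A steadily shrinking number over several days would point towards a leak. The last sample or two may be missing after a hard reset if they were not yet written to disk.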

it usually takes days before I notice

You could loop ping it from a client and have a dialog or so when it stops responding.

while ping -c1 <server>; do sleep 30m; done; notify-send "Server died"
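
A slightly longer variant of the same loop also records when the host stopped answering, which narrows down where to look in the journal. This is only a sketch: the function name, the default interval and the ping options are assumptions to adapt, and server.lan stands in for the real hostname.

```shell
#!/bin/sh
# Poll a host until the check fails, then report when that happened.
# $1 = host, $2 = seconds between checks, $3 = check command (placeholders).
watch_host() {
    host="$1"
    interval="${2:-60}"
    check="${3:-ping -c1 -W5}"
    # $check is intentionally unquoted so it splits into command + options
    while $check "$host" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "$host stopped responding at $(date '+%Y-%m-%d %H:%M:%S')"
}

# Typical use on the client:
#   notify-send "$(watch_host server.lan 300)"
```

Like the one-liner, this stops at the first failed ping, so a single dropped packet would also trigger it.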

Offline

#5 2019-10-06 00:36:58

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,332
Website

Re: Server out of reach

Do you run some sort of VNC?  If you are just shelling in, there need be no desktop on the server - you can have one on the client.


Offline

#6 2019-10-07 00:15:33

dmidge
Member
Registered: 2015-04-04
Posts: 69

Re: Server out of reach

Hi @Seth and @Trilby,

Thank you both for your interest in the problem.

@Seth:
The sysreq keys should be enabled:

% cat /proc/sys/kernel/sysrq
1

I like the continuous ping idea. It may still take me a while to notice that it went off, so I may want to send myself a mail, which is a bit more complicated. I may try that at some point, when I have time.

What is this shared library supposed to do? What is odd about swrast_dri.so?
I don't know what to look for in the Xorg log. But I'll paste a part of it which may be interesting (this is the end of the file):

[     7.104] (II) config/udev: Adding input device Power Button (/dev/input/event1)
[     7.104] (**) Power Button: Applying InputClass "evdev keyboard catchall"
[     7.104] (**) Power Button: Applying InputClass "libinput keyboard catchall"
[     7.104] (**) Power Button: Applying InputClass "system-keyboard"
[     7.104] (II) LoadModule: "libinput"
[     7.104] (II) Loading /usr/lib/xorg/modules/input/libinput_drv.so
[     7.108] (II) Module libinput: vendor="X.Org Foundation"
[     7.108]    compiled for 1.20.3, module version = 0.28.2
[     7.108]    Module class: X.Org XInput Driver
[     7.108]    ABI class: X.Org XInput driver, version 24.1
[     7.108] (II) Using input driver 'libinput' for 'Power Button'
[     7.108] (**) Power Button: always reports core events
[     7.108] (**) Option "Device" "/dev/input/event1"
[     7.108] (**) Option "_source" "server/udev"
[     7.113] (II) event1  - Power Button: is tagged by udev as: Keyboard
[     7.113] (II) event1  - Power Button: device is a keyboard
[     7.114] (II) event1  - Power Button: device removed
[     7.160] (**) Option "config_info" "udev:/sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1"
[     7.160] (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 6)
[     7.160] (**) Option "xkb_model" "pc105"
[     7.160] (**) Option "xkb_layout" "ca"
[     7.160] (**) Option "xkb_options" "terminate:ctrl_alt_bksp"
[     7.178] (II) event1  - Power Button: is tagged by udev as: Keyboard
[     7.178] (II) event1  - Power Button: device is a keyboard
[     7.178] (II) config/udev: Adding input device Power Button (/dev/input/event0)
[     7.178] (**) Power Button: Applying InputClass "evdev keyboard catchall"
[     7.178] (**) Power Button: Applying InputClass "libinput keyboard catchall"
[     7.178] (**) Power Button: Applying InputClass "system-keyboard"
[     7.178] (II) Using input driver 'libinput' for 'Power Button'
[     7.178] (**) Power Button: always reports core events
[     7.178] (**) Option "Device" "/dev/input/event0"
[     7.178] (**) Option "_source" "server/udev"
[     7.179] (II) event0  - Power Button: is tagged by udev as: Keyboard
[     7.179] (II) event0  - Power Button: device is a keyboard
[     7.179] (II) event0  - Power Button: device removed
[     7.204] (**) Option "config_info" "udev:/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0/event0"
[     7.204] (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 7)
[     7.204] (**) Option "xkb_model" "pc105"
[     7.204] (**) Option "xkb_layout" "ca"
[     7.204] (**) Option "xkb_options" "terminate:ctrl_alt_bksp"
[     7.205] (II) event0  - Power Button: is tagged by udev as: Keyboard
[     7.205] (II) event0  - Power Button: device is a keyboard
[     7.206] (II) config/udev: Adding input device HDA NVidia HDMI/DP,pcm=3 (/dev/input/event11)
[     7.206] (II) No input driver specified, ignoring this device.
[     7.206] (II) This device may have been added with another device file.
[     7.206] (II) config/udev: Adding input device HDA NVidia HDMI/DP,pcm=7 (/dev/input/event12)
[     7.206] (II) No input driver specified, ignoring this device.
[     7.206] (II) This device may have been added with another device file.
[     7.207] (II) config/udev: Adding input device HDA NVidia HDMI/DP,pcm=8 (/dev/input/event13)
[     7.207] (II) No input driver specified, ignoring this device.
[     7.207] (II) This device may have been added with another device file.
[     7.207] (II) config/udev: Adding input device HD-Audio Generic Front Headphone (/dev/input/event10)
[     7.207] (II) No input driver specified, ignoring this device.
[     7.207] (II) This device may have been added with another device file.
[     7.208] (II) config/udev: Adding input device HD-Audio Generic Front Mic (/dev/input/event4)
[     7.208] (II) No input driver specified, ignoring this device.
[     7.208] (II) This device may have been added with another device file.
[     7.208] (II) config/udev: Adding input device HD-Audio Generic Rear Mic (/dev/input/event5)
[     7.208] (II) No input driver specified, ignoring this device.
[     7.208] (II) This device may have been added with another device file.
[     7.209] (II) config/udev: Adding input device HD-Audio Generic Line (/dev/input/event6)
[     7.209] (II) No input driver specified, ignoring this device.
[     7.209] (II) This device may have been added with another device file.
[     7.209] (II) config/udev: Adding input device HD-Audio Generic Line Out Front (/dev/input/event7)
[     7.209] (II) No input driver specified, ignoring this device.
[     7.209] (II) This device may have been added with another device file.
[     7.210] (II) config/udev: Adding input device HD-Audio Generic Line Out Surround (/dev/input/event8)
[     7.210] (II) No input driver specified, ignoring this device.
[     7.210] (II) This device may have been added with another device file.
[     7.210] (II) config/udev: Adding input device HD-Audio Generic Line Out CLFE (/dev/input/event9)
[     7.210] (II) No input driver specified, ignoring this device.
[     7.210] (II) This device may have been added with another device file.
[     7.211] (II) config/udev: Adding input device Eee PC WMI hotkeys (/dev/input/event3)
[     7.211] (**) Eee PC WMI hotkeys: Applying InputClass "evdev keyboard catchall"
[     7.211] (**) Eee PC WMI hotkeys: Applying InputClass "libinput keyboard catchall"
[     7.211] (**) Eee PC WMI hotkeys: Applying InputClass "system-keyboard"
[     7.211] (II) Using input driver 'libinput' for 'Eee PC WMI hotkeys'
[     7.211] (**) Eee PC WMI hotkeys: always reports core events
[     7.211] (**) Option "Device" "/dev/input/event3"
[     7.211] (**) Option "_source" "server/udev"
[     7.211] (II) event3  - Eee PC WMI hotkeys: is tagged by udev as: Keyboard
[     7.211] (II) event3  - Eee PC WMI hotkeys: device is a keyboard
[     7.211] (II) event3  - Eee PC WMI hotkeys: device removed
[     7.270] (**) Option "config_info" "udev:/sys/devices/platform/eeepc-wmi/input/input3/event3"
[     7.270] (II) XINPUT: Adding extended input device "Eee PC WMI hotkeys" (type: KEYBOARD, id 8)
[     7.270] (**) Option "xkb_model" "pc105"
[     7.270] (**) Option "xkb_layout" "ca"
[     7.270] (**) Option "xkb_options" "terminate:ctrl_alt_bksp"
[     7.271] (II) event3  - Eee PC WMI hotkeys: is tagged by udev as: Keyboard
[     7.271] (II) event3  - Eee PC WMI hotkeys: device is a keyboard
[     7.271] (II) config/udev: Adding input device PC Speaker (/dev/input/event2)
[     7.271] (II) No input driver specified, ignoring this device.
[     7.271] (II) This device may have been added with another device file.
[     7.281] (EE) Failed to open authorization file "/var/run/sddm/{441f4146-64c3-48ce-9a69-cd038534780e}": No such file or directory

@Trilby:
Indeed. I could try to use ssh -X, for instance. But some software doesn't work well with that, PyCharm for instance. I think that is because they want to give an incentive to buy the professional version.
So I run some sort of VNC.

Offline

#7 2019-10-07 06:00:10

seth
Member
Registered: 2012-09-03
Posts: 61,171

Re: Server out of reach

No, the interesting part is the entire file.
swrast_dri means that you're using software GL and most likely the vesa driver, or that something else is terribly off about the server config. That's why I wanted to see the Xorg log.

Offline
