Hi,
My computer crashes after a while whenever I leave it unused but still turned on, possibly running computations. I am trying to understand why, but I have limited information. So here is what I know:
- I can't switch to a TTY with the keyboard shortcut
- the screen is black, as if there were no output signal and the monitor didn't detect anything
- the SysRq keys, which are enabled, are unresponsive
- journalctl doesn't contain much except spam like this at the end:
déc. 09 10:17:14 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:14 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:14 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:24 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:24 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:24 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:24 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:34 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:34 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:34 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:34 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:44 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:44 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
déc. 09 10:17:44 this-pc kscreenlocker_greet[154588]: Could not create AF_NETLINK socket (Opération non permise)
(The end is French for "Operation not permitted".)
I don't have much other debug information.
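When one message floods the journal like this, it can hide whatever else was logged before the hang. A minimal shell sketch, assuming journal lines in the default short format shown above (month, day, time, host, then the message); the function name summarize_messages is made up for illustration:

```shell
# Hypothetical helper: count identical journal messages, ignoring the
# timestamp and host name (the first four whitespace-separated fields).
summarize_messages() {
    awk '{
        $1 = $2 = $3 = $4 = ""   # blank out month, day, time and host
        sub(/^ +/, "")           # trim the separators left behind
        count[$0]++
    }
    END {
        for (msg in count) printf "%d %s\n", count[msg], msg
    }' | sort -rn
}

# Example (assumes a persistent journal, so the previous boot is kept):
#   journalctl -b -1 --no-pager | summarize_messages
```

The most frequent message comes first, so rarer entries (which may be the interesting ones) end up at the bottom of the list.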
The lspci output is the following:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex
        Subsystem: ASUSTeK Computer Inc. Family 17h (Models 00h-0fh) Root Complex
        Flags: fast devsel
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit
        Subsystem: ASUSTeK Computer Inc. Family 17h (Models 00h-0fh) I/O Memory Management Unit
        Flags: bus master, fast devsel, latency 0, IRQ 25
        Capabilities: <access denied>
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
        Flags: fast devsel
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 26
        Bus: primary=00, secondary=01, subordinate=07, sec-latency=0
        I/O behind bridge: 0000e000-0000efff [size=4K]
        Memory behind bridge: f7500000-f76fffff [size=2M]
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
        DeviceName:  Onboard IGD
        Flags: fast devsel
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
        Flags: fast devsel
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 27
        Bus: primary=00, secondary=08, subordinate=08, sec-latency=0
        I/O behind bridge: 0000d000-0000dfff [size=4K]
        Memory behind bridge: f6000000-f70fffff [size=17M]
        Prefetchable memory behind bridge: 00000000e0000000-00000000f1ffffff [size=288M]
        Capabilities: <access denied>
        Kernel driver in use: pcieport
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
        Flags: fast devsel
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
        Flags: fast devsel
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 28
        Bus: primary=00, secondary=09, subordinate=09, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: f7200000-f74fffff [size=3M]
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
        Flags: fast devsel
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 30
        Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: f7700000-f77fffff [size=1M]
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)
        Subsystem: ASUSTeK Computer Inc. FCH SMBus Controller
        Flags: 66MHz, medium devsel
        Kernel driver in use: piix4_smbus
        Kernel modules: i2c_piix4, sp5100_tco
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
        Subsystem: ASUSTeK Computer Inc. FCH LPC Bridge
        Flags: bus master, 66MHz, medium devsel, latency 0
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
        Flags: fast devsel
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
        Flags: fast devsel
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
        Flags: fast devsel
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
        Flags: fast devsel
        Kernel driver in use: k10temp
        Kernel modules: k10temp
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
        Flags: fast devsel
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
        Flags: fast devsel
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
        Flags: fast devsel
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
        Flags: fast devsel
01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller (rev 01) (prog-if 30 [XHCI])
        Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 XHCI Controller
        Flags: bus master, fast devsel, latency 0, IRQ 58
        Memory at f76a0000 (64-bit, non-prefetchable) [size=32K]
        Capabilities: <access denied>
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci
01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller (rev 01) (prog-if 01 [AHCI 1.0])
        Subsystem: ASMedia Technology Inc. 400 Series Chipset SATA Controller
        Flags: bus master, fast devsel, latency 0, IRQ 40
        Memory at f7680000 (32-bit, non-prefetchable) [size=128K]
        Expansion ROM at f7600000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: ahci
        Kernel modules: ahci
01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 32
        Bus: primary=01, secondary=02, subordinate=07, sec-latency=0
        I/O behind bridge: 0000e000-0000efff [size=4K]
        Memory behind bridge: f7500000-f75fffff [size=1M]
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 33
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: 0000e000-0000efff [size=4K]
        Memory behind bridge: f7500000-f75fffff [size=1M]
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
02:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 35
        Bus: primary=02, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
02:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 36
        Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
02:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 37
        Bus: primary=02, secondary=06, subordinate=06, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
02:07.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 39
        Bus: primary=02, secondary=07, subordinate=07, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
        Subsystem: ASUSTeK Computer Inc. I211 Gigabit Network Connection
        Flags: bus master, fast devsel, latency 0, IRQ 29
        Memory at f7500000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at e000 [size=32]
        Memory at f7520000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: igb
        Kernel modules: igb
08:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: eVga.com. Corp. GP107 [GeForce GTX 1050 Ti]
        Flags: bus master, fast devsel, latency 0, IRQ 73
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at d000 [size=128]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
08:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
        Subsystem: eVga.com. Corp. GP107GL High Definition Audio Controller
        Flags: bus master, fast devsel, latency 0, IRQ 66
        Memory at f7080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function
        Subsystem: ASUSTeK Computer Inc. Zeppelin/Raven/Raven2 PCIe Dummy Function
        Flags: bus master, fast devsel, latency 0
        Capabilities: <access denied>
09:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor
        Subsystem: ASUSTeK Computer Inc. Family 17h (Models 00h-0fh) Platform Security Processor
        Flags: bus master, fast devsel, latency 0, IRQ 70
        Memory at f7300000 (32-bit, non-prefetchable) [size=1M]
        Memory at f7400000 (32-bit, non-prefetchable) [size=8K]
        Capabilities: <access denied>
        Kernel driver in use: ccp
        Kernel modules: ccp
09:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller (prog-if 30 [XHCI])
        Subsystem: ASUSTeK Computer Inc. Family 17h (Models 00h-0fh) USB 3.0 Host Controller
        Flags: bus master, fast devsel, latency 0, IRQ 60
        Memory at f7200000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: <access denied>
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci
0a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function
        Subsystem: ASUSTeK Computer Inc. Zeppelin/Renoir PCIe Dummy Function
        Flags: bus master, fast devsel, latency 0
        Capabilities: <access denied>
0a:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51) (prog-if 01 [AHCI 1.0])
        Subsystem: ASUSTeK Computer Inc. FCH SATA Controller [AHCI mode]
        Flags: bus master, fast devsel, latency 0, IRQ 42
        Memory at f7708000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: ahci
        Kernel modules: ahci
0a:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller
        Subsystem: ASUSTeK Computer Inc. Family 17h (Models 00h-0fh) HD Audio Controller
        Flags: bus master, fast devsel, latency 0, IRQ 68
        Memory at f7700000 (32-bit, non-prefetchable) [size=32K]
        Capabilities: <access denied>
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
Do you have any idea how to dig further? Or any solution?
Cheers!
Offline

Setup and journal tail would suggest that this is caused by the KDE screensaver, so try to disable that and see whether the problem still arises.
It might crash the nvidia driver, leaking into the kernel - so try the LTS kernel and/or the 390xx driver series.
Online
Hi Seth,
Hm, that is indeed an interesting possibility! So nothing protects the kernel from the proprietary drivers? Because I usually trust nouveau more than Nvidia to make a good product...
That would be consistent with the symptoms, as it usually takes a couple of days. And the CPU fan is still running when that happens. I have disabled the screensaver just in case (actually, it is just the lock screen kicking in). I didn't know it could cause problems serious enough for a crash. How would one provide a dump/stack trace or something to help the community once that is confirmed?
Cheers!
Offline

Userspace communicates w/ the kernel all the time. The nvidia driver (for X11) comes in two parts, a "userspace" driver for X11 and a kernel module. They talk to each other in a black box and, the black box aside, this is what all graphics drivers do.
Individual kernel modules can crash (which is certainly bad for the stuff controlling the VGA device) and occasionally the crash cannot be isolated (at this point you get a kernel panic). This is not related to the legal status of the module (i.e. a proprietary module can cause all of this just as much as an open source or even in-tree one; nouveau has a record of halting the kernel on PCI queries, https://wiki.archlinux.org/index.php/No … r_messages )
Debugging a halted kernel is a bit nasty, see https://wiki.archlinux.org/index.php/Kdump
Online
Well, I think I have a lot to learn about how the kernel interacts with the drivers...
Anyway, I tried removing the lock screen, and the messages disappeared from the logs. But the halt remains.... :'-(
There is nothing relevant in the logs. The last messages I have are related to kdeconnect discovering my cellphone, but that's it.
Where could I get anything relevant to help me debug that, aside from guesswork by replacing drivers or the like?
About Kdump, I see info about a crashed kernel (I guess when one gets a kernel panic), but what about a halted kernel, where the only option left is the ACPI power button to forcibly turn off the computer?
I even tried to ssh into the computer, but it is not reachable anymore...
Thanks!
Offline

Kdump should be suited to deal with that exact situation (regardless of the terminology)
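One caveat, stated as a general assumption rather than a fact about this machine: kdump is normally triggered by a kernel panic, so a silent hang is only captured if the kernel is set to panic on lockups. The sketch below prints the current value of a few sysctl knobs that control this (softlockup_panic, hardlockup_panic, hung_task_panic, panic_on_oops). Not every kernel build exposes all of them, and the function name is hypothetical:

```shell
# Hypothetical helper: show whether the kernel is set to panic (and thus
# give kdump a chance to run) on lockups, hung tasks and oopses.
# The optional argument overrides the sysctl directory, mainly for testing.
report_panic_knobs() {
    dir=${1:-/proc/sys/kernel}
    for knob in softlockup_panic hardlockup_panic hung_task_panic panic_on_oops; do
        if [ -r "$dir/$knob" ]; then
            printf '%s=%s\n' "$knob" "$(cat "$dir/$knob")"
        else
            printf '%s=unavailable\n' "$knob"
        fi
    done
}

# Example: report_panic_knobs
```

A knob that reads 0 could be switched on for a test run with something like sysctl kernel.softlockup_panic=1 (as root), which would need to be checked against the kernel actually in use.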
Random things you could try (maybe quicker) are to
- boot w/ the "iommu=soft" kernel parameter
- replace
  * the kernel (w/ the LTS one)
  * the GPU driver (w/ the 390xx one)
  * the GPU (if you have another one) and
- in case there are multiple RAM modules, remove all but one and switch them out (and/or run memtest86 overnight)
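Since each of these can take days to confirm, it helps to verify after the reboot that a change such as the kernel parameter actually took effect. A minimal sketch, assuming the running kernel's command line is readable from /proc/cmdline; has_kernel_param is a hypothetical name:

```shell
# Hypothetical helper: succeed if the given parameter appears as a whole
# word on the kernel command line. The optional second argument points at
# a different file, mainly for testing.
has_kernel_param() {
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example:
#   has_kernel_param iommu=soft && echo "iommu=soft is active"
```

The match is on the whole word, so iommu=soft does not match a longer parameter such as amd_iommu=soft.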
Online
Good to know. But a question about kdump:
As far as I understand it, it catches a kernel exception and then does a dump to disk before displaying the kernel panic messages. But if the kernel hangs (I guess it could be an infinite loop or something), there is not necessarily an exception? So does it do a regular dump up until it gets stuck in the infinite loop? I need to read up more on kdump....
Since the problem occurs every couple of days or so, it will take time to test the options.
Memtest86+ would be a good thing to test, indeed. So you think it could be a RAM issue? It would make sense that the computer crashes when it hits a corrupted region of the RAM.
I would be surprised if the LTS kernel solved the problem, since I have had this problem for at least two months.
And I sadly don't have a spare GPU to test this avenue...
Offline
I've switched to the LTS kernel and I'm still getting this error. There are thousands of such lines in the journal:
lut 10 14:20:57 mayday kscreenlocker_greet[47148]: Could not create AF_NETLINK socket (Operation not permitted)
lut 10 14:20:57 mayday kscreenlocker_greet[47148]: Could not create AF_NETLINK socket (Operation not permitted)
lut 10 14:20:57 mayday kscreenlocker_greet[47148]: Could not create AF_NETLINK socket (Operation not permitted)
lut 10 14:20:57 mayday kscreenlocker_greet[47148]: Could not create AF_NETLINK socket (Operation not permitted)
➜  ~ uname -a
Linux mayday 4.19.101-1-lts #1 SMP Sat, 01 Feb 2020 16:35:36 +0000 x86_64 GNU/Linux
I'm using the integrated GPU:
➜  ~ lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
00:17.0 SATA controller: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] (rev 31)
00:1b.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 (rev f1)
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 (rev f1)
00:1c.2 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #3 (rev f1)
00:1c.4 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 (rev f1)
00:1c.6 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #7 (rev f1)
00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Z170 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
02:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
03:00.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:01.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:02.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:03.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:04.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:05.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:06.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:07.0 PCI bridge: ASMedia Technology Inc. Device 1187
06:00.0 Network controller: Broadcom Inc. and subsidiaries BCM4360 802.11ac Wireless Network Adapter (rev 03)
07:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
0a:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
0c:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
0d:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
Last edited by jakub (2020-02-10 17:30:04)
Offline