I have an NVIDIA card with the nvidia drivers on a desktop PC; the Intel IGP is disabled. With a 5.x kernel, every time I reboot or shut down, the PC hangs forever as soon as X is killed (and it often fails to unmount volumes too). I run a minimal openbox setup; I tried xinit instead of a DM and it does the same. I also tried sddm with KDE, same result.
The relevant error message seems to be:
May 17 20:53:40 aragorn systemd[1]: lightdm.service: Killing process 512 (Xorg) with signal SIGKILL.
May 17 20:54:23 aragorn kernel: INFO: task Xorg:512 blocked for more than 122 seconds.
May 17 20:54:23 aragorn kernel: Tainted: P OE 5.1.2-arch1-1-ARCH #1
May 17 20:54:23 aragorn kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
I found a few similar issues here on the forum, but none of the suggestions worked for me (changing the DM, logging out to a tty before rebooting, or a clean reinstall of the nvidia drivers).
With the LTS packages it usually works, but I still get this issue at random.
I tried with and without KMS.
To make the installer work I had to set nouveau.modeset=0 if I disable CSM in the motherboard BIOS (I don't know if that means anything). If I keep BIOS legacy compatibility, nouveau modesetting works, but I still get the error on reboot with the nvidia drivers.
Any suggestions? Thanks!
This is the journalctl output from the moment the reboot process begins:
May 17 20:53:40 aragorn systemd[1]: lightdm.service: Killing process 512 (Xorg) with signal SIGKILL.
May 17 20:54:23 aragorn kernel: INFO: task Xorg:512 blocked for more than 122 seconds.
May 17 20:54:23 aragorn kernel: Tainted: P OE 5.1.2-arch1-1-ARCH #1
May 17 20:54:23 aragorn kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 20:54:23 aragorn kernel: Call Trace:
May 17 20:54:23 aragorn kernel: ? __schedule+0x30b/0x8b0
May 17 20:54:23 aragorn kernel: schedule+0x32/0x80
May 17 20:54:23 aragorn kernel: schedule_timeout+0x311/0x4a0
May 17 20:54:23 aragorn kernel: ? acpi_os_release_object+0xa/0x10
May 17 20:54:23 aragorn kernel: ? preempt_count_add+0x79/0xb0
May 17 20:54:23 aragorn kernel: ? preempt_count_add+0x79/0xb0
May 17 20:54:23 aragorn kernel: ? acpi_ut_trace_ptr+0x26/0x68
May 17 20:54:23 aragorn kernel: wait_for_common+0x15f/0x190
May 17 20:54:23 aragorn kernel: ? wake_up_q+0x70/0x70
May 17 20:54:23 aragorn kernel: flush_workqueue+0x128/0x3f0
May 17 20:54:23 aragorn kernel: ? nv_uninstall_notifier+0x60/0x60 [nvidia]
May 17 20:54:23 aragorn kernel: acpi_remove_notify_handler+0x1fb/0x2e6
May 17 20:54:23 aragorn kernel: nv_acpi_remove_one_arg+0xfb/0x140 [nvidia]
May 17 20:54:23 aragorn kernel: acpi_device_remove+0x5d/0xb0
May 17 20:54:23 aragorn kernel: device_release_driver_internal+0xe4/0x1d0
May 17 20:54:23 aragorn kernel: driver_detach+0x40/0x78
May 17 20:54:23 aragorn kernel: bus_remove_driver+0x74/0xc6
May 17 20:54:23 aragorn kernel: nv_acpi_uninit+0xa4/0xf0 [nvidia]
May 17 20:54:23 aragorn kernel: nvidia_close+0x281/0x2d0 [nvidia]
May 17 20:54:23 aragorn kernel: nvidia_frontend_close+0x2a/0x40 [nvidia]
May 17 20:54:23 aragorn kernel: __fput+0xa5/0x1d0
May 17 20:54:23 aragorn kernel: task_work_run+0x8f/0xb0
May 17 20:54:23 aragorn kernel: exit_to_usermode_loop+0xd3/0xe0
May 17 20:54:23 aragorn kernel: do_syscall_64+0x157/0x180
May 17 20:54:23 aragorn kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 17 20:54:23 aragorn kernel: RIP: 0033:0x7f089994d348
May 17 20:54:23 aragorn kernel: Code: Bad RIP value.
May 17 20:54:23 aragorn kernel: RSP: 002b:00007ffe242385a8 EFLAGS: 00003246 ORIG_RAX: 0000000000000003
May 17 20:54:23 aragorn kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f089994d348
May 17 20:54:23 aragorn kernel: RDX: 00007ffe24238500 RSI: 0000000000000000 RDI: 000000000000000d
May 17 20:54:23 aragorn kernel: RBP: 00007f0897694ce4 R08: 00007ffe242385b0 R09: 00007ffe242385bc
May 17 20:54:23 aragorn kernel: R10: fffffffffffff9b8 R11: 0000000000003246 R12: 0000557e00e536b0
May 17 20:54:23 aragorn kernel: R13: 00007f0897694ce8 R14: 00000000c1d00008 R15: 00000000c1d00008
May 17 20:55:11 aragorn systemd[1]: lightdm.service: Processes still around after final SIGKILL. Entering failed mode.
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=lightdm comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
May 17 20:55:11 aragorn systemd[1]: lightdm.service: Failed with result 'timeout'.
May 17 20:55:11 aragorn kernel: audit: type=1131 audit(1558119311.108:54): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=lightdm comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-user-sessions comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn kernel: audit: type=1131 audit(1558119311.114:55): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-user-sessions comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-update-done comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=ldconfig comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journal-catalog-update comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn kernel: audit: type=1131 audit(1558119311.138:56): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-update-done comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn kernel: audit: type=1131 audit(1558119311.138:57): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=ldconfig comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn kernel: audit: type=1131 audit(1558119311.138:58): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journal-catalog-update comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn systemd[1]: mnt-gandalf-matteo.mount: Mount process exited, code=exited, status=1/FAILURE
May 17 20:55:11 aragorn systemd[1]: Failed unmounting /mnt/gandalf/matteo.
May 17 20:55:11 aragorn audit[17033]: SYSTEM_SHUTDOWN pid=17033 uid=0 auid=4294967295 ses=4294967295 msg=' comm="systemd-update-utmp" exe="/usr/lib/systemd/systemd-update-utmp" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn kernel: audit: type=1128 audit(1558119311.141:59): pid=17033 uid=0 auid=4294967295 ses=4294967295 msg=' comm="systemd-update-utmp" exe="/usr/lib/systemd/systemd-update-utmp" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn systemd[1]: mnt-gandalf-matteo.mount: Failed with result 'exit-code'.
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-timesyncd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn kernel: audit: type=1131 audit(1558119311.201:60): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-timesyncd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-random-seed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-networkd-wait-online comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn kernel: audit: type=1131 audit(1558119311.204:61): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-random-seed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn kernel: audit: type=1131 audit(1558119311.204:62): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-networkd-wait-online comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-update-utmp comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn kernel: audit: type=1131 audit(1558119311.208:63): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-update-utmp comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn systemd[1]: tmp.mount: Mount process exited, code=exited, status=32/n/a
May 17 20:55:11 aragorn systemd[1]: Failed unmounting Temporary Directory (/tmp).
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck@dev-disk-by\x2duuid-8515\x2d61C7 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-networkd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-sysctl comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck@dev-disk-by\x2duuid-a48354a4\x2db90b\x2d409b\x2d8285\x2dda50071f8c7b comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-sysusers comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-remount-fs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=lvm2-monitor comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=lvm2-lvmetad comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-reboot comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-reboot comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 20:55:11 aragorn systemd[1]: Shutting down.
May 17 20:55:11 aragorn kernel: printk: systemd-shutdow: 51 output lines suppressed due to ratelimiting
May 17 20:55:11 aragorn systemd-journald[332]: Journal stopped
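To pull the essentials out of a trace like the one above, here is a rough Python sketch. It assumes the journal text has been saved to a string (for example from `journalctl -b -1 -k` for the previous boot). The function name and the returned dict layout are made up for illustration. It finds each "blocked for more than N seconds" report and collects the frames that belong to a loadable module, skipping the `?` frames that the kernel marks as unreliable.

```python
import re

# Kernel hung-task report, e.g.
# "INFO: task Xorg:512 blocked for more than 122 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Stack frame that belongs to a loadable module, e.g.
# "nv_acpi_uninit+0xa4/0xf0 [nvidia]". A leading "? " marks an unreliable frame.
FRAME_RE = re.compile(
    r"(?P<uncertain>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<module>\w+)\]"
)


def summarize_hung_tasks(journal_text):
    """Return one dict per hung-task report, with the module frames that follow it."""
    reports = []
    for line in journal_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            reports.append({
                "task": hung["comm"],
                "pid": int(hung["pid"]),
                "seconds": int(hung["secs"]),
                "module_frames": [],
            })
            continue
        frame = FRAME_RE.search(line)
        # Only keep reliable frames, and only once a report has been seen.
        if frame and reports and not frame["uncertain"]:
            reports[-1]["module_frames"].append((frame["module"], frame["sym"]))
    return reports
```

On the trace in this post, the reliable module frames all come from `[nvidia]` (`nv_acpi_remove_one_arg`, `nv_acpi_uninit`, `nvidia_close`, `nvidia_frontend_close`). That fits the picture of Xorg being stuck while the driver tears down its ACPI notify handler as the device file is closed.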
Last edited by Matt3o (2019-07-29 07:13:24)
Offline
xorg log?
You're using the nvidia blob, nouveau settings are irrelevant.
Online
xorg log https://paste.debian.net/1082016/
PS: I'm not trying to configure the nouveau driver, I just find it weird that modesetting doesn't work with it.
Last edited by Matt3o (2019-05-18 13:04:21)
Offline
[ 11.879] (II) NVIDIA(0): ACPI: failed to connect to the ACPI event daemon; the daemon
[ 11.879] (II) NVIDIA(0): may not be running or the "AcpidSocketPath" X
[ 11.879] (II) NVIDIA(0): configuration option may not be set correctly. When the
[ 11.879] (II) NVIDIA(0): ACPI event daemon is available, the NVIDIA X driver will
[ 11.879] (II) NVIDIA(0): try to use it to receive ACPI event notifications. For
[ 11.879] (II) NVIDIA(0): details, please see the "ConnectToAcpid" and
[ 11.879] (II) NVIDIA(0): "AcpidSocketPath" X configuration options in Appendix B: X
[ 11.879] (II) NVIDIA(0): Config Options in the README.
=> /usr/share/doc/nvidia/README
Online
I tried disabling ConnectToAcpid, with no difference (except that I no longer get the warning about the ACPI daemon not being found). Do you think I should try enabling acpid?
Offline
You can try whether that mitigates the problem.
Otherwise I'd try downgrading the nvidia driver (assuming the GPU was already supported in 418.xx), there's https://bbs.archlinux.org/viewtopic.php?id=246482
Online
I had the same issue with 418.xx; I was actually hoping 430 would solve my problem... It's weird because, apart from the issue on reboot/shutdown, the PC seems to be working properly... but all those failed unmounts worry me.
Any other ideas?
Offline
Does unmounting also fail in the startx case?
Speaking of which: can you end the session and terminate the server w/o powering down (and then power down from the console)? In that case you can maybe mitigate the problem w/ an automated but explicit bi-staged shutdown.
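A minimal sketch of such a two-stage shutdown, in Python. It assumes the display manager is `lightdm.service` (as in the journal above) and that the script runs as root. The function name, the `run` parameter and the 30-second timeout are illustrative choices, not an established recipe. The idea is simply to stop X first and only power off if that step returns cleanly.

```python
import subprocess


def two_stage_poweroff(display_manager="lightdm.service", stop_timeout=30,
                       run=subprocess.run):
    """Stop the display manager first; power off only if that succeeded.

    `run` is injectable so the sequence can be exercised without touching
    the real system. Returns True if the poweroff command was issued.
    """
    try:
        # Stage 1: end the session and terminate the X server.
        stopped = run(["systemctl", "stop", display_manager], timeout=stop_timeout)
    except subprocess.TimeoutExpired:
        # The stop job did not finish in time; leave the machine up for inspection.
        return False
    if stopped.returncode != 0:
        return False
    # Stage 2: X is gone, so the remaining shutdown should not have to kill it.
    run(["systemctl", "poweroff"], timeout=stop_timeout)
    return True
```

If Xorg is already stuck in the kernel, as in the trace in the first post, the first stage will just time out. In that case the script does not help, but it does confirm that the hang is independent of the shutdown sequence.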
Online
On the next boot are any filesystems detected as dirty?
Offline
> Does unmounting also fail in the startx case?
> Speaking of which: can you end the session and terminate the server w/o powering down (and then power down from the console)? In that case you can maybe mitigate the problem w/ an automated but explicit bi-staged shutdown.
Yes, it's exactly the same with startx. As soon as I try to terminate the server it locks up, no matter which method I use. It's very frustrating; I can't really find the source of the problem.
> On the next boot are any filesystems detected as dirty?
Yes, on the next boot it recovers the journal:
May 10 20:39:11 aragorn systemd-fsck[465]: /dev/nvme0n1p4: recovering journal
May 10 20:39:11 aragorn systemd-fsck[464]: fsck.fat 4.1 (2017-01-24)
May 10 20:39:11 aragorn systemd-fsck[464]: 0x41: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
May 10 20:39:11 aragorn systemd-fsck[464]: Automatically removing dirty bit.
May 10 20:39:11 aragorn systemd-fsck[464]: Performing changes.
May 10 20:39:11 aragorn systemd-fsck[464]: /dev/nvme0n1p1: 351 files, 14811/140520 clusters
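For checking this after each boot without reading the log by hand, a small Python sketch follows. It assumes the boot journal is available as text, and it assumes that every `systemd-fsck[PID]` instance checks exactly one device, so lines can be grouped by PID. The function name and the two marker strings are taken from the lines above, not from any fsck specification.

```python
import re

# "systemd-fsck[465]: /dev/nvme0n1p4: recovering journal"
FSCK_RE = re.compile(r"systemd-fsck\[(?P<pid>\d+)\]: (?P<msg>.*)$")
# A message that starts with the device being checked.
DEVICE_RE = re.compile(r"^(?P<dev>/dev/\S+): ")

# Messages seen in this thread when a filesystem was not cleanly unmounted.
UNCLEAN_MARKERS = ("recovering journal", "Dirty bit is set")


def unclean_filesystems(journal_text):
    """Return the sorted devices whose fsck output shows an unclean unmount."""
    by_pid = {}
    for line in journal_text.splitlines():
        match = FSCK_RE.search(line)
        if not match:
            continue
        entry = by_pid.setdefault(match["pid"], {"device": None, "unclean": False})
        msg = match["msg"]
        device = DEVICE_RE.match(msg)
        if device:
            entry["device"] = device["dev"]
        if any(marker in msg for marker in UNCLEAN_MARKERS):
            entry["unclean"] = True
    return sorted(e["device"] for e in by_pid.values() if e["unclean"] and e["device"])
```

Fed the six lines above, this reports both `/dev/nvme0n1p1` (the FAT partition with the dirty bit) and `/dev/nvme0n1p4` (the journal recovery).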
Offline
Online
you gave me hope... but still no luck...
Offline
Still fighting with this issue. It's the first time in almost 10 years that I've hit such a wall with a new Arch install (on a new PC).
I tried every possible combination of UEFI/BIOS settings, specifically BIOS compatibility mode (which seems to affect modesetting somehow) and the various ACPI compatibility layers.
Tried with and without KMS. Tried disabling ACPI altogether (via kernel parameters). Tried with lightdm and with xinitrc.
I tried disconnecting the mouse, keyboard and everything else to see if it could be a USB issue.
I feel it's the combination of my motherboard + CPU + GPU, but I don't really know what else to tinker with.
Any suggestion would be greatly appreciated.
PS: I have another PC with a very similar configuration, but with a 7th gen CPU and a GTX 1060, and everything works flawlessly, so I tend to rule out some very stupid mistake on my part... but anything is possible.
Offline
Did you try to switch the GPUs among the systems to see whether the error moves w/ it?
Online
> Did you try to switch the GPUs among the systems to see whether the error moves w/ it?
That would be a very interesting test. It's easier said than done (they are in very, very small PC cases), but I'll try. Thanks again for your help.
Offline
Sorry for the bump, but while trying to sort out the issue I also discovered that the Xorg process is constantly eating CPU cycles. Nothing dramatic, but it sits at around 10%, bringing the load average to a steady 1.0 even during long idle periods. The other PC I have drops to 0.05 when idle for a long time (monitor off). I don't know if that could be related to the same issue...
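To put a number on that rather than eyeballing top, here is a rough Python sketch that works from two samples of `/proc/<pid>/stat`. The helper names are invented for illustration. The field positions (utime and stime as fields 14 and 15) follow the proc(5) man page.

```python
import os


def cpu_ticks(stat_line):
    """Return utime + stime, in clock ticks, from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so split after the last ')' before splitting on whitespace.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so fields 14 and 15 are at indexes 11 and 12.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_seconds, ticks_per_second=None):
    """Average CPU usage (percent of one core) between two samples."""
    if ticks_per_second is None:
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_second * interval_seconds)
```

Usage would be: read `/proc/<Xorg pid>/stat`, sleep for a known interval, read it again, and pass both tick counts to `cpu_percent`. `os.getloadavg()` gives the 1, 5 and 15 minute load averages for comparison with the other machine.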
Offline
Matt3o, are you using lvm + dm-crypt on a Samsung SSD?
If so, you may have encountered the kernel data corruption bug that was fixed in kernel 5.1.5.
Offline
Thanks for the heads-up. I have a Samsung SSD, but I don't use lvm or dm-crypt.
Last edited by Matt3o (2019-05-28 15:15:01)
Offline
Sorry for the necro-post, but I wanted to give closure to this issue in case others have the same problem.
A few days ago Gigabyte finally updated the BIOS for my board, so if you have a Z390 I AORUS PRO WIFI, upgrade to at least BIOS version F6 and everything will work again. It has been a painful couple of months... thanks Gigabyte!
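For anyone comparing notes, here is a small Python sketch for recording which board and BIOS version a machine is running. It assumes a Linux system that exposes DMI data under `/sys/class/dmi/id`. The function name and the choice of fields are just for illustration.

```python
from pathlib import Path


def read_firmware_info(dmi_dir="/sys/class/dmi/id"):
    """Return board and BIOS identification strings from sysfs DMI files.

    Missing or unreadable entries are reported as None instead of raising.
    """
    info = {}
    for name in ("board_vendor", "board_name", "bios_version", "bios_date"):
        try:
            info[name] = (Path(dmi_dir) / name).read_text().strip()
        except OSError:
            info[name] = None
    return info
```

Calling `read_firmware_info()` before and after a firmware update, and noting whether the shutdown hang still occurs, makes reports like the one above easier to compare across boards.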
Offline
This problem was driving me nuts for a few days. I updated the BIOS on a Designare Z390 from F6 to F9b and the problem is gone. Magic. Does anyone have any guess how that can be related? I didn't find anything in the BIOS changelog that could be even remotely related.
This topic is the only place I found that mentions a BIOS update (I suppose it's not a very popular fix). I'm really curious about how to troubleshoot problems like this.
Offline