#1 2024-04-01 17:46:09

fredizzimo
Member
Registered: 2023-05-18
Posts: 7

All terminals including TTYs stop working with kernel 6.8.2-arch2-1

I started to notice frequent hangs when running the unit tests of Neovim. The tests just stop progressing; it might take a couple of complete runs, but it happens frequently. At the same time all terminals, both old and new, stop working, while the rest of the system appears to be fine. Even switching to another TTY, for example with "ctrl-alt-F3", only shows a blank screen.

This is what the journal shows after a reboot, once I'm able to check it:

apr 01 20:15:04 ovrearch kernel: INFO: task nvim:35119 blocked for more than 122 seconds.
apr 01 20:15:04 ovrearch kernel:       Tainted: P           OE      6.8.2-arch2-1 #1
apr 01 20:15:04 ovrearch kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
apr 01 20:15:04 ovrearch kernel: task:nvim            state:D stack:0     pid:35119 tgid:35119 ppid:22760  flags:0x00000002
apr 01 20:15:04 ovrearch kernel: Call Trace:
apr 01 20:15:04 ovrearch kernel:  <TASK>
apr 01 20:15:04 ovrearch kernel:  __schedule+0x3e6/0x1520
apr 01 20:15:04 ovrearch kernel:  ? __memcg_slab_post_alloc_hook+0x167/0x200
apr 01 20:15:04 ovrearch kernel:  schedule+0x32/0xd0
apr 01 20:15:04 ovrearch kernel:  schedule_timeout+0x151/0x160
apr 01 20:15:04 ovrearch kernel:  ldsem_down_write+0x136/0x25d
apr 01 20:15:04 ovrearch kernel:  tty_ldisc_lock+0x4f/0x70
apr 01 20:15:04 ovrearch kernel:  tty_ldisc_hangup+0xd9/0x230
apr 01 20:15:04 ovrearch kernel:  __tty_hangup.part.0+0x1f3/0x370
apr 01 20:15:04 ovrearch kernel:  tty_release+0xf1/0x610
apr 01 20:15:04 ovrearch kernel:  __fput+0x92/0x2c0
apr 01 20:15:04 ovrearch kernel:  __x64_sys_close+0x3d/0x80
apr 01 20:15:04 ovrearch kernel:  do_syscall_64+0x89/0x170
apr 01 20:15:04 ovrearch kernel:  ? syscall_exit_to_user_mode+0x80/0x230
apr 01 20:15:04 ovrearch kernel:  ? do_syscall_64+0x96/0x170
apr 01 20:15:04 ovrearch kernel:  ? exc_page_fault+0x7f/0x180
apr 01 20:15:04 ovrearch kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
apr 01 20:15:04 ovrearch kernel: RIP: 0033:0x7200bb469bb4
apr 01 20:15:04 ovrearch kernel: RSP: 002b:00007ffc62f86a68 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
apr 01 20:15:04 ovrearch kernel: RAX: ffffffffffffffda RBX: 000059b50d7d0e30 RCX: 00007200bb469bb4
apr 01 20:15:04 ovrearch kernel: RDX: 000059b50d7d0ea8 RSI: 000000000000c409 RDI: 0000000000000010
apr 01 20:15:04 ovrearch kernel: RBP: 0000000000000004 R08: 000059b50d7d0ea8 R09: 000059b50d7d1048
apr 01 20:15:04 ovrearch kernel: R10: 00007200bb3877d0 R11: 0000000000000202 R12: 00007ffc62f86a70
apr 01 20:15:04 ovrearch kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
apr 01 20:15:04 ovrearch kernel:  </TASK>

apr 01 20:15:04 ovrearch kernel: INFO: task tty-test:35121 blocked for more than 122 seconds.
apr 01 20:15:04 ovrearch kernel:       Tainted: P           OE      6.8.2-arch2-1 #1
apr 01 20:15:04 ovrearch kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
apr 01 20:15:04 ovrearch kernel: task:tty-test        state:D stack:0     pid:35121 tgid:35121 ppid:35119  flags:0x00020006
apr 01 20:15:04 ovrearch kernel: Call Trace:
apr 01 20:15:04 ovrearch kernel:  <TASK>
apr 01 20:15:04 ovrearch kernel:  __schedule+0x3e6/0x1520
apr 01 20:15:04 ovrearch kernel:  ? unmap_page_range+0x108c/0x1110
apr 01 20:15:04 ovrearch kernel:  schedule+0x32/0xd0
apr 01 20:15:04 ovrearch kernel:  schedule_timeout+0x151/0x160
apr 01 20:15:04 ovrearch kernel:  wait_for_completion+0x86/0x170
apr 01 20:15:04 ovrearch kernel:  __flush_work.isra.0+0x173/0x280
apr 01 20:15:04 ovrearch kernel:  ? __pfx_wq_barrier_func+0x10/0x10
apr 01 20:15:04 ovrearch kernel:  n_tty_poll+0x134/0x1e0
apr 01 20:15:04 ovrearch kernel:  tty_poll+0x5a/0xc0
apr 01 20:15:04 ovrearch kernel:  ep_item_poll.isra.0+0x30/0x50
apr 01 20:15:04 ovrearch kernel:  do_epoll_wait+0x34f/0x830
apr 01 20:15:04 ovrearch kernel:  do_compat_epoll_pwait.part.0+0xb/0x70
apr 01 20:15:04 ovrearch kernel:  __x64_sys_epoll_pwait+0x95/0x140
apr 01 20:15:04 ovrearch kernel:  do_syscall_64+0x89/0x170
apr 01 20:15:04 ovrearch kernel:  ? __x64_sys_close+0x3d/0x80
apr 01 20:15:04 ovrearch kernel:  ? kmem_cache_free+0x3b7/0x3e0
apr 01 20:15:04 ovrearch kernel:  ? syscall_exit_to_user_mode+0x80/0x230
apr 01 20:15:04 ovrearch kernel:  ? do_syscall_64+0x96/0x170
apr 01 20:15:04 ovrearch kernel:  ? do_syscall_64+0x96/0x170
apr 01 20:15:04 ovrearch kernel:  ? do_syscall_64+0x96/0x170
apr 01 20:15:04 ovrearch kernel:  ? do_syscall_64+0x96/0x170
apr 01 20:15:04 ovrearch kernel:  ? do_syscall_64+0x96/0x170
apr 01 20:15:04 ovrearch kernel:  ? exc_page_fault+0x7f/0x180
apr 01 20:15:04 ovrearch kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
apr 01 20:15:04 ovrearch kernel: RIP: 0033:0x7edb7e73bc8d
apr 01 20:15:04 ovrearch kernel: RSP: 002b:00007ffe61daaa68 EFLAGS: 00000202 ORIG_RAX: 0000000000000119
apr 01 20:15:04 ovrearch kernel: RAX: ffffffffffffffda RBX: 00005e765f9ff2a0 RCX: 00007edb7e73bc8d
apr 01 20:15:04 ovrearch kernel: RDX: 0000000000000400 RSI: 00007ffe61dab7c0 RDI: 0000000000000003
apr 01 20:15:04 ovrearch kernel: RBP: 00005e765e343a18 R08: 0000000000000000 R09: 0000000000000008
apr 01 20:15:04 ovrearch kernel: R10: 00000000ffffffff R11: 0000000000000202 R12: 00005e765e3439c0
apr 01 20:15:04 ovrearch kernel: R13: 0000000000000003 R14: 00000000ffffffff R15: 00005e765e3439c0
apr 01 20:15:04 ovrearch kernel:  </TASK>
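
For anyone who wants to pull reports like the two above out of a longer journal, here is a minimal Python sketch. It only assumes the "INFO: task NAME:PID blocked for more than N seconds" wording seen in this log; the function name `find_hung_tasks` is made up for illustration.

```python
import re

# Matches the hung-task report line as it appears in the log above, e.g.
# "... kernel: INFO: task nvim:35119 blocked for more than 122 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (command, pid, seconds) for every hung-task report in lines."""
    found = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            found.append((match["comm"], int(match["pid"]), int(match["secs"])))
    return found


# Two lines copied from the journal excerpt above, plus one unrelated line.
sample = [
    "apr 01 20:15:04 ovrearch kernel: INFO: task nvim:35119 blocked for more than 122 seconds.",
    "apr 01 20:15:04 ovrearch kernel: Call Trace:",
    "apr 01 20:15:04 ovrearch kernel: INFO: task tty-test:35121 blocked for more than 122 seconds.",
]
for comm, pid, secs in find_hung_tasks(sample):
    print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

The same function could be fed the output of `journalctl -k -b -1` (kernel messages from the previous boot) instead of the sample list.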

Operating System: Arch Linux
KDE Plasma Version: 6.0.3
KDE Frameworks Version: 6.0.0
Qt Version: 6.6.3
Kernel Version: 6.8.2-arch2-1 (64-bit)
Graphics Platform: Wayland
Processors: 12 × Intel® Xeon® CPU E5-1650 0 @ 3.20GHz
Memory: 15,5 GiB of RAM
Graphics Processor: NVIDIA GeForce GTX 970/PCIe/SSE2
Manufacturer: Hewlett-Packard
Product Name: HP Z420 Workstation

And the version of Neovim I was testing was the latest b25753381c6049132c5c8d02eb62df99f8a958fd

Sorry for the vague information. Since no terminal is working, I don't know how to debug this any further. I have also not had time to test whether it occurs with other kernel versions. Unfortunately, I have to leave and won't have access to this computer for a few weeks, so if it's something hardware specific it might be hard to track down.

I also don't remember which kernel version or which Neovim version I was using the last time this worked on this machine.

Last edited by fredizzimo (2024-04-01 17:57:35)

#2 2024-04-01 18:14:19

fredizzimo
Member
Registered: 2023-05-18
Posts: 7

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

It seems like just suspending Neovim 0.9.5 with a file open, using "<C-z>" is enough to trigger this:

https://github.com/neovim/neovim/issues/28149

#3 2024-04-01 20:55:38

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

Have you confirmed the ctrl+z situation?
It typically stops and backgrounds a process, no idea how it might manage to stop ALL ttys (and pts?), though.
I guess it manages to stop all processes down to PID1 (systemd/init), which should™ require UID0 privileges (are you doing this as root? Otherwise some suid process might be involved).

#4 2024-04-01 21:11:57

fredizzimo
Member
Registered: 2023-05-18
Posts: 7

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

Yes, I was able to repeat the "ctrl-z" issue as well, but I rebooted a bit too fast, so I did not get a journal log entry. But the symptoms were exactly the same.

And no, there's no root account involved. Based on the call stacks it looks like it could be a deadlock in the kernel (ldsem_down_write), but that might be a wrong lead; I know nothing about the kernel code.

I will check if I can repeat it here on my other system as well. If I can, I will probably set up a virtual machine and see if I can narrow it down further.
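
Both tasks in the journal are reported with state:D (uninterruptible sleep). As a rough sketch, something like the Python below could list every process currently in that state by reading /proc/<pid>/stat. The function names are made up for illustration, and it assumes some session is still able to run it, which may not be the case once the hang has happened.

```python
from pathlib import Path


def parse_stat(stat_text):
    """Split the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so use the first '(' and the last ')' as its boundaries.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx])
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            pid, comm, state = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # process went away or the file could not be parsed
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling `uninterruptible_tasks()` on a healthy system will usually return an empty or very short list. Note that this only looks at the top-level entries in /proc, not at the individual threads of each process.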

#5 2024-04-01 21:23:05

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

This has been a problem in the past (CVE-2015-4170) - to see whether this is a kernel or userspace bug you could test the behavior on the LTS kernel.

#6 2024-04-01 22:02:57

ichernev
Member
Registered: 2022-09-23
Posts: 3

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

I ran with the older LTS kernel (6.6.23-1-lts), and I can report that nvim on a TTY doesn't cause the issue, but nvim inside river (wlroots WM) does cause the same issue.

As reported in the nvim issue, not only do all existing terminals (and TTYs) become unusable, new ones do as well. logind (or whatever starts up new TTYs on Ctrl+Alt+Fn) doesn't even show the login prompt; if a TTY was focused beforehand, its login prompt stays visible but is unusable.

EDIT: dmesg --follow doesn't show anything. Any suggestions on logs to tail?

Last edited by ichernev (2024-04-01 22:06:21)

#7 2024-04-01 22:06:09

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

Are running PTS (aka "ssh") affected?

#8 2024-04-01 22:09:38

ichernev
Member
Registered: 2022-09-23
Posts: 3

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

>  Are running PTS (aka "ssh") affected?

Yes. A session I logged into before triggering works until the trigger and is stuck afterwards. I also can't ssh in again (no response from ssh; with ssh -v the last line is `debug1: pledge: fork`).

#9 2024-04-02 06:21:34

bostjan
Member
Registered: 2024-04-02
Posts: 3

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

I was experiencing the same behavior until recently. After pressing Ctrl-Z within neovim, all terminals stopped working. I figured out that if I manually send a signal (SIGCONT or SIGKILL) to the neovim process using KDE's System Monitor app, everything gets back to normal again.

P.S. After a system update I can't reproduce the issue any more. That doesn't mean the issue is gone; it could just be a race condition that is not triggered every time.
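
To illustrate the signal mechanics only (not the bug itself), here is a small Python sketch that stops a harmless child process and resumes it with SIGCONT. The sleeping child is a stand-in for a suspended nvim, and the sketch uses SIGSTOP for simplicity, whereas Ctrl-Z sends SIGTSTP; both are my assumptions for the example, not what System Monitor does internally.

```python
import os
import signal
import subprocess
import sys


def stop_and_resume():
    """Stop a child process, then resume it with SIGCONT.

    Returns (was_stopped, was_continued).
    """
    # A harmless child that just sleeps; it stands in for a suspended nvim.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # SIGSTOP cannot be caught or ignored; Ctrl-Z sends SIGTSTP instead.
        os.kill(child.pid, signal.SIGSTOP)
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)

        # SIGCONT lets the stopped process run again.
        os.kill(child.pid, signal.SIGCONT)
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        was_continued = os.WIFCONTINUED(status)
        return was_stopped, was_continued
    finally:
        # Clean up the helper process regardless of what happened above.
        child.kill()
        child.wait()
```

From another process the equivalent of the resume step would be `os.kill(pid, signal.SIGCONT)` with the PID of the stopped nvim.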

Last edited by bostjan (2024-04-02 06:22:05)

#10 2024-04-02 07:20:19

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

What packages got updated?
Pretty much the only way this makes sense is if the stopped nvim process is holding a global IO lock in the kernel (so basically the CVE returned, probably not exactly the same way)

#11 2024-04-02 07:48:04

fredizzimo
Member
Registered: 2023-05-18
Posts: 7

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

Sorry, I have not gotten around to doing more testing yet.

But in my case, when I initially encountered it through the unit tests, the nvim process could not even be killed with SIGKILL; it was totally stuck. Trying to reboot the system also got stuck, so I had to do a hardware reset.

#12 2024-04-02 10:01:55

pajlada
Member
Registered: 2024-04-02
Posts: 2

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

This seems to be an issue with io_uring, which is being patched. See more information here: https://github.com/axboe/liburing/issues/1113

#13 2024-04-05 17:57:22

loqs
Member
Registered: 2014-03-06
Posts: 17,440

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

Is this resolved by the workqueue reverts that prompted the 6.8.4 release?

Last edited by loqs (2024-04-05 18:14:13)

#14 2024-04-07 10:30:42

pajlada
Member
Registered: 2024-04-02
Posts: 2

Re: All terminals including TTYs stop working with kernel 6.8.2-arch2-1

It seems resolved for me with Linux 6.8.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 05 Apr 2024 00:14:23 +0000 x86_64 GNU/Linux
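
If you want to check a machine against that version, a rough Python sketch could look like the one below. Treating 6.8.4 as the threshold is an assumption based only on my report above, the function names are made up, and the comparison says nothing useful about other series such as the 6.6 LTS kernels.

```python
import platform
import re

# Assumption: 6.8.4 is used as the threshold only because it is the version
# reported as working in this thread.
REPORTED_FIXED = (6, 8, 4)


def kernel_version(release):
    """Extract (major, minor, patch) from a string such as '6.8.2-arch2-1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. '6.9-rc1') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def at_least_reported_fix(release=None):
    """Compare a kernel release (default: the running one) with REPORTED_FIXED."""
    if release is None:
        release = platform.release()
    return kernel_version(release) >= REPORTED_FIXED
```

For example, `at_least_reported_fix("6.8.2-arch2-1")` compares (6, 8, 2) against (6, 8, 4), and calling it without arguments uses the release string of the running kernel.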
