My NFS share has become unusable to the point where accessing it from another machine renders both the server and client unable to open a new terminal. To trigger this, all I have to do is create a zero-byte file on the share.
From the client:
mount 10.9.8.101:/scratch /scratch
cd /scratch
touch foo
At this point, the terminal on the client is frozen. If I open a new xterm on the server, I get a blank window with no prompt, and the terminals I already had open are frozen as well.
I see this on the server in journalctl -f:
Nov 20 09:16:39 quadruple kernel: INFO: task nfsd:2802 blocked for more than 122 seconds.
Nov 20 09:16:39 quadruple kernel: Not tainted 6.11.9-arch1-1 #1
Nov 20 09:16:39 quadruple kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 09:16:39 quadruple kernel: task:nfsd state:D stack:0 pid:2802 tgid:2802 ppid:2 flags:0x00004000
Nov 20 09:16:39 quadruple kernel: Call Trace:
Nov 20 09:16:39 quadruple kernel: <TASK>
Nov 20 09:16:39 quadruple kernel: __schedule+0x408/0x1440
Nov 20 09:16:39 quadruple kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 20 09:16:39 quadruple kernel: schedule+0x27/0xf0
Nov 20 09:16:39 quadruple kernel: schedule_preempt_disabled+0x15/0x30
Nov 20 09:16:39 quadruple kernel: rwsem_down_read_slowpath+0x26f/0x4e0
Nov 20 09:16:39 quadruple kernel: down_read+0x48/0xa0
Nov 20 09:16:39 quadruple kernel: shmem_getattr+0x7b/0xe0
Nov 20 09:16:39 quadruple kernel: fh_fill_pre_attrs+0x116/0x180 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel: nfsd4_open+0x9a0/0xc10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel: nfsd4_proc_compound+0x39f/0x700 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel: nfsd_dispatch+0xd2/0x220 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel: svc_process_common+0x4d5/0x6a0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:16:39 quadruple kernel: ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel: svc_process+0x131/0x180 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:16:39 quadruple kernel: svc_recv+0x7f4/0x9b0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:16:39 quadruple kernel: ? __pfx_nfsd+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel: nfsd+0x87/0xd0 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel: kthread+0xd2/0x100
Nov 20 09:16:39 quadruple kernel: ? __pfx_kthread+0x10/0x10
Nov 20 09:16:39 quadruple kernel: ret_from_fork+0x34/0x50
Nov 20 09:16:39 quadruple kernel: ? __pfx_kthread+0x10/0x10
Nov 20 09:16:39 quadruple kernel: ret_from_fork_asm+0x1a/0x30
Nov 20 09:16:39 quadruple kernel: </TASK>
Nov 20 09:18:42 quadruple kernel: INFO: task nfsd:2802 blocked for more than 245 seconds.
Nov 20 09:18:42 quadruple kernel: Not tainted 6.11.9-arch1-1 #1
Nov 20 09:18:42 quadruple kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 09:18:42 quadruple kernel: task:nfsd state:D stack:0 pid:2802 tgid:2802 ppid:2 flags:0x00004000
Nov 20 09:18:42 quadruple kernel: Call Trace:
Nov 20 09:18:42 quadruple kernel: <TASK>
Nov 20 09:18:42 quadruple kernel: __schedule+0x408/0x1440
Nov 20 09:18:42 quadruple kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 20 09:18:42 quadruple kernel: schedule+0x27/0xf0
Nov 20 09:18:42 quadruple kernel: schedule_preempt_disabled+0x15/0x30
Nov 20 09:18:42 quadruple kernel: rwsem_down_read_slowpath+0x26f/0x4e0
Nov 20 09:18:42 quadruple kernel: down_read+0x48/0xa0
Nov 20 09:18:42 quadruple kernel: shmem_getattr+0x7b/0xe0
Nov 20 09:18:42 quadruple kernel: fh_fill_pre_attrs+0x116/0x180 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel: nfsd4_open+0x9a0/0xc10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel: nfsd4_proc_compound+0x39f/0x700 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel: nfsd_dispatch+0xd2/0x220 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel: svc_process_common+0x4d5/0x6a0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:18:42 quadruple kernel: ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel: svc_process+0x131/0x180 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:18:42 quadruple kernel: svc_recv+0x7f4/0x9b0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:18:42 quadruple kernel: ? __pfx_nfsd+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel: nfsd+0x87/0xd0 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel: kthread+0xd2/0x100
Nov 20 09:18:42 quadruple kernel: ? __pfx_kthread+0x10/0x10
Nov 20 09:18:42 quadruple kernel: ret_from_fork+0x34/0x50
Nov 20 09:18:42 quadruple kernel: ? __pfx_kthread+0x10/0x10
Nov 20 09:18:42 quadruple kernel: ret_from_fork_asm+0x1a/0x30
Nov 20 09:18:42 quadruple kernel: </TASK>
Nov 20 09:20:44 quadruple kernel: INFO: task nfsd:2802 blocked for more than 368 seconds.
Nov 20 09:20:44 quadruple kernel: Not tainted 6.11.9-arch1-1 #1
Nov 20 09:20:44 quadruple kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 09:20:44 quadruple kernel: task:nfsd state:D stack:0 pid:2802 tgid:2802 ppid:2 flags:0x00004000
Nov 20 09:20:44 quadruple kernel: Call Trace:
Nov 20 09:20:44 quadruple kernel: <TASK>
Nov 20 09:20:44 quadruple kernel: __schedule+0x408/0x1440
Nov 20 09:20:44 quadruple kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 20 09:20:44 quadruple kernel: schedule+0x27/0xf0
Nov 20 09:20:44 quadruple kernel: schedule_preempt_disabled+0x15/0x30
Nov 20 09:20:44 quadruple kernel: rwsem_down_read_slowpath+0x26f/0x4e0
Nov 20 09:20:44 quadruple kernel: down_read+0x48/0xa0
Nov 20 09:20:44 quadruple kernel: shmem_getattr+0x7b/0xe0
Nov 20 09:20:44 quadruple kernel: fh_fill_pre_attrs+0x116/0x180 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel: nfsd4_open+0x9a0/0xc10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel: nfsd4_proc_compound+0x39f/0x700 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel: nfsd_dispatch+0xd2/0x220 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel: svc_process_common+0x4d5/0x6a0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:20:44 quadruple kernel: ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel: svc_process+0x131/0x180 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:20:44 quadruple kernel: svc_recv+0x7f4/0x9b0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:20:44 quadruple kernel: ? __pfx_nfsd+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel: nfsd+0x87/0xd0 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel: kthread+0xd2/0x100
Nov 20 09:20:44 quadruple kernel: ? __pfx_kthread+0x10/0x10
Nov 20 09:20:44 quadruple kernel: ret_from_fork+0x34/0x50
Nov 20 09:20:44 quadruple kernel: ? __pfx_kthread+0x10/0x10
Nov 20 09:20:44 quadruple kernel: ret_from_fork_asm+0x1a/0x30
Nov 20 09:20:44 quadruple kernel: </TASK>
I am not sure what to do to debug this.
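One rough way to condense repeated hung-task warnings like the ones above is to pull out only the task name, PID and blocked time. This is a minimal sketch that assumes the journal lines have the same shape as in this post; `hung_tasks` is a made-up helper name, not an existing tool.

```shell
# hung_tasks: hypothetical helper, reads kernel log lines on stdin and
# prints "task:pid seconds" for every hung-task warning it finds.
hung_tasks() {
    sed -n 's/.*INFO: task \([^ ]*\) blocked for more than \([0-9]*\) seconds.*/\1 \2/p'
}

# Possible usage on the server (kernel messages from the current boot):
#   journalctl -k -b | hung_tasks
```

If every line names the same task, as it does here with nfsd:2802, a single thread is stuck for the whole period rather than several tasks hanging one after another.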
For reference, my /etc/exports:
/srv/nfs 10.9.8.0/24(ro,no_subtree_check,async,no_wdelay,fsid=0)
/srv/nfs/scratch 10.9.8.0/24(rw,no_subtree_check,async,no_wdelay,no_root_squash)
Last edited by graysky (2024-11-20 16:24:16)
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
https://lore.kernel.org/linux-cve-annou … @gregkh/T/ ?
Are you good w/ 6.11.6 but run into this w/ 6.11.7?
Aaaannd… I then found https://lore.kernel.org/all/b40e7156-75 … gle.com/T/
Your google-fu is superior to mine, seth. Yes! If I downgrade to 6.11.6, everything works as expected. Updating to 6.11.7 triggers the bug. Looking into the links you posted now.
I see that the revert made it into 6.12.0 AND if I boot into that kernel, everything works as expected. Many thanks, seth!
% git checkout v6.12
% git log -- mm/shmem.c
commit d1aa0c04294e29883d65eac6c2f72fe95cc7c049
Author: Andrew Morton <akpm@linux-foundation.org>
Date: Fri Nov 15 16:57:24 2024 -0800

    mm: revert "mm: shmem: fix data-race in shmem_getattr()"

    Revert d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()") as
    suggested by Chuck [1]. It is causing deadlocks when accessing tmpfs over
    NFS.

    As Hugh commented, "added just to silence a syzbot sanitizer splat: added
    where there has never been any practical problem".

    Link: https://lkml.kernel.org/r/ZzdxKF39VEmXSSyN@tissot.1015granger.net [1]
    Fixes: d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()")
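For anyone landing here later, the kernel versions reported in this thread can be summarized as a small lookup. This is only a sketch: `thread_kernel_status` is a made-up name, and it deliberately answers "unknown" for any version nobody in this thread actually tested.

```shell
# thread_kernel_status: hypothetical helper that classifies a kernel release
# string using ONLY the versions reported in this thread.
thread_kernel_status() {
    # Strip a distro suffix so "6.11.9-arch1-1" becomes "6.11.9".
    ver=${1%%-*}
    case "$ver" in
        6.11.6)        echo "reported-good" ;;  # downgrade worked
        6.11.7|6.11.9) echo "reported-bad" ;;   # bug triggered / trace above
        6.12|6.12.0)   echo "reported-good" ;;  # contains the revert
        *)             echo "unknown" ;;        # not tested in this thread
    esac
}

thread_kernel_status "$(uname -r)"
```

Whether a given 6.11.x stable release after 6.11.9 carries the revert is not covered here; checking the history of mm/shmem.c for that tag, as above, is the reliable way to find out.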