
#1 2024-11-20 14:11:39

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,645
Website

Writing to NFS share causes freezing forcing a reboot [SOLVED]

My NFS share has become unusable to the point where accessing it from another machine renders both the server and client unable to open a new terminal.  To trigger this, all I have to do is create a zero-byte file on the share.

From the client:

mount 10.9.8.101:/scratch /scratch
cd /scratch
touch foo

At this point, the terminal on the client is frozen.  If I open a new xterm on the server, a blank window appears but it is frozen with no prompt.  The terminals I already have open on the server are frozen as well.

I see this on the server in journalctl -f:

Nov 20 09:16:39 quadruple kernel: INFO: task nfsd:2802 blocked for more than 122 seconds.
Nov 20 09:16:39 quadruple kernel:       Not tainted 6.11.9-arch1-1 #1
Nov 20 09:16:39 quadruple kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 09:16:39 quadruple kernel: task:nfsd            state:D stack:0     pid:2802  tgid:2802  ppid:2      flags:0x00004000
Nov 20 09:16:39 quadruple kernel: Call Trace:
Nov 20 09:16:39 quadruple kernel:  <TASK>
Nov 20 09:16:39 quadruple kernel:  __schedule+0x408/0x1440
Nov 20 09:16:39 quadruple kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 20 09:16:39 quadruple kernel:  schedule+0x27/0xf0
Nov 20 09:16:39 quadruple kernel:  schedule_preempt_disabled+0x15/0x30
Nov 20 09:16:39 quadruple kernel:  rwsem_down_read_slowpath+0x26f/0x4e0
Nov 20 09:16:39 quadruple kernel:  down_read+0x48/0xa0
Nov 20 09:16:39 quadruple kernel:  shmem_getattr+0x7b/0xe0
Nov 20 09:16:39 quadruple kernel:  fh_fill_pre_attrs+0x116/0x180 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel:  nfsd4_open+0x9a0/0xc10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel:  nfsd4_proc_compound+0x39f/0x700 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel:  nfsd_dispatch+0xd2/0x220 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel:  svc_process_common+0x4d5/0x6a0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:16:39 quadruple kernel:  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel:  svc_process+0x131/0x180 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:16:39 quadruple kernel:  svc_recv+0x7f4/0x9b0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:16:39 quadruple kernel:  ? __pfx_nfsd+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel:  nfsd+0x87/0xd0 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:16:39 quadruple kernel:  kthread+0xd2/0x100
Nov 20 09:16:39 quadruple kernel:  ? __pfx_kthread+0x10/0x10
Nov 20 09:16:39 quadruple kernel:  ret_from_fork+0x34/0x50
Nov 20 09:16:39 quadruple kernel:  ? __pfx_kthread+0x10/0x10
Nov 20 09:16:39 quadruple kernel:  ret_from_fork_asm+0x1a/0x30
Nov 20 09:16:39 quadruple kernel:  </TASK>
Nov 20 09:18:42 quadruple kernel: INFO: task nfsd:2802 blocked for more than 245 seconds.
Nov 20 09:18:42 quadruple kernel:       Not tainted 6.11.9-arch1-1 #1
Nov 20 09:18:42 quadruple kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 09:18:42 quadruple kernel: task:nfsd            state:D stack:0     pid:2802  tgid:2802  ppid:2      flags:0x00004000
Nov 20 09:18:42 quadruple kernel: Call Trace:
Nov 20 09:18:42 quadruple kernel:  <TASK>
Nov 20 09:18:42 quadruple kernel:  __schedule+0x408/0x1440
Nov 20 09:18:42 quadruple kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 20 09:18:42 quadruple kernel:  schedule+0x27/0xf0
Nov 20 09:18:42 quadruple kernel:  schedule_preempt_disabled+0x15/0x30
Nov 20 09:18:42 quadruple kernel:  rwsem_down_read_slowpath+0x26f/0x4e0
Nov 20 09:18:42 quadruple kernel:  down_read+0x48/0xa0
Nov 20 09:18:42 quadruple kernel:  shmem_getattr+0x7b/0xe0
Nov 20 09:18:42 quadruple kernel:  fh_fill_pre_attrs+0x116/0x180 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel:  nfsd4_open+0x9a0/0xc10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel:  nfsd4_proc_compound+0x39f/0x700 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel:  nfsd_dispatch+0xd2/0x220 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel:  svc_process_common+0x4d5/0x6a0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:18:42 quadruple kernel:  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel:  svc_process+0x131/0x180 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:18:42 quadruple kernel:  svc_recv+0x7f4/0x9b0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:18:42 quadruple kernel:  ? __pfx_nfsd+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel:  nfsd+0x87/0xd0 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:18:42 quadruple kernel:  kthread+0xd2/0x100
Nov 20 09:18:42 quadruple kernel:  ? __pfx_kthread+0x10/0x10
Nov 20 09:18:42 quadruple kernel:  ret_from_fork+0x34/0x50
Nov 20 09:18:42 quadruple kernel:  ? __pfx_kthread+0x10/0x10
Nov 20 09:18:42 quadruple kernel:  ret_from_fork_asm+0x1a/0x30
Nov 20 09:18:42 quadruple kernel:  </TASK>
Nov 20 09:20:44 quadruple kernel: INFO: task nfsd:2802 blocked for more than 368 seconds.
Nov 20 09:20:44 quadruple kernel:       Not tainted 6.11.9-arch1-1 #1
Nov 20 09:20:44 quadruple kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 20 09:20:44 quadruple kernel: task:nfsd            state:D stack:0     pid:2802  tgid:2802  ppid:2      flags:0x00004000
Nov 20 09:20:44 quadruple kernel: Call Trace:
Nov 20 09:20:44 quadruple kernel:  <TASK>
Nov 20 09:20:44 quadruple kernel:  __schedule+0x408/0x1440
Nov 20 09:20:44 quadruple kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 20 09:20:44 quadruple kernel:  schedule+0x27/0xf0
Nov 20 09:20:44 quadruple kernel:  schedule_preempt_disabled+0x15/0x30
Nov 20 09:20:44 quadruple kernel:  rwsem_down_read_slowpath+0x26f/0x4e0
Nov 20 09:20:44 quadruple kernel:  down_read+0x48/0xa0
Nov 20 09:20:44 quadruple kernel:  shmem_getattr+0x7b/0xe0
Nov 20 09:20:44 quadruple kernel:  fh_fill_pre_attrs+0x116/0x180 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel:  nfsd4_open+0x9a0/0xc10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel:  nfsd4_proc_compound+0x39f/0x700 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel:  nfsd_dispatch+0xd2/0x220 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel:  svc_process_common+0x4d5/0x6a0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:20:44 quadruple kernel:  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel:  svc_process+0x131/0x180 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:20:44 quadruple kernel:  svc_recv+0x7f4/0x9b0 [sunrpc 1400000003000000474e5500deb52ff59d229437]
Nov 20 09:20:44 quadruple kernel:  ? __pfx_nfsd+0x10/0x10 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel:  nfsd+0x87/0xd0 [nfsd 1400000003000000474e550000334328922aa994]
Nov 20 09:20:44 quadruple kernel:  kthread+0xd2/0x100
Nov 20 09:20:44 quadruple kernel:  ? __pfx_kthread+0x10/0x10
Nov 20 09:20:44 quadruple kernel:  ret_from_fork+0x34/0x50
Nov 20 09:20:44 quadruple kernel:  ? __pfx_kthread+0x10/0x10
Nov 20 09:20:44 quadruple kernel:  ret_from_fork_asm+0x1a/0x30
Nov 20 09:20:44 quadruple kernel:  </TASK>

I am not sure what to do to debug this.
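
One possible first step is to reduce the hung-task reports to the names of the blocked tasks. This is a minimal sketch, assuming the journal line format shown above; hung_tasks is a hypothetical helper name, not an existing tool:

```shell
# Read kernel log lines on stdin and print each blocked task once,
# in the "name:pid" form the hung-task detector uses.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9]* seconds\..*/\1/p' | sort -u
}
```

On the server it could be fed with: journalctl -k -b | hung_tasks. For the three reports above it prints a single line, nfsd:2802.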

For reference, my /etc/exports:

/srv/nfs          10.9.8.0/24(ro,no_subtree_check,async,no_wdelay,fsid=0)
/srv/nfs/scratch  10.9.8.0/24(rw,no_subtree_check,async,no_wdelay,no_root_squash)
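
Since the trace is blocked in shmem_getattr, which belongs to the shmem/tmpfs code (mm/shmem.c), it may be worth checking which filesystem backs the exported directory. This is a sketch assuming a stat that supports -f -c %T (GNU coreutils does); fs_type and on_tmpfs are hypothetical helper names:

```shell
# Print the filesystem type name of the filesystem holding a path.
fs_type() {
    stat -f -c %T -- "$1"
}

# Succeed only when the path lives on tmpfs.
on_tmpfs() {
    [ "$(fs_type "$1")" = "tmpfs" ]
}
```

For example, on the server: on_tmpfs /srv/nfs/scratch && echo "export is backed by tmpfs"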

Last edited by graysky (2024-11-20 16:24:16)


CPU-optimized Linux-ck packages @ Repo-ck  • AUR packages  • Zsh and other configs


#2 2024-11-20 15:05:44

seth
Member
Registered: 2012-09-03
Posts: 59,042

Re: Writing to NFS share causes freezing forcing a reboot [SOLVED]

https://lore.kernel.org/linux-cve-annou … @gregkh/T/ ?
Are you good w/ 6.11.6 but run into this w/ 6.11.7 ?

Aaaannd… I then found https://lore.kernel.org/all/b40e7156-75 … gle.com/T/ :)


#3 2024-11-20 16:13:12

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,645
Website

Re: Writing to NFS share causes freezing forcing a reboot [SOLVED]

Your google-fu is superior to mine, seth.  Yes!  If I downgrade to 6.11.6, everything works as expected.  Updating to 6.11.7 triggers the bug.  Looking into the links you posted now.
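
For anyone checking their own machine, the versions reported so far can be turned into a quick test. This is a rough sketch: the range is only what this thread reports (6.11.6 good; 6.11.7 and the 6.11.9-arch1-1 from the trace bad), releases outside it are simply not covered, the helper names are made up for illustration, and sort -V is assumed to be GNU sort:

```shell
# Strip the distro suffix: "6.11.9-arch1-1" -> "6.11.9".
base_version() {
    printf '%s\n' "${1%%-*}"
}

# Succeed when version $1 >= version $2 (version-aware ordering via sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed for kernels between the two releases reported as affected here:
# 6.11.7 (first bad) through 6.11.9 (the kernel in the trace).
in_reported_range() {
    v=$(base_version "$1")
    version_ge "$v" 6.11.7 && version_ge 6.11.9 "$v"
}
```

Usage: in_reported_range "$(uname -r)" && echo "kernel is in the range reported as affected"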



#4 2024-11-20 16:24:00

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,645
Website

Re: Writing to NFS share causes freezing forcing a reboot [SOLVED]

I see that the revert made it into 6.12.0 AND if I boot into that kernel, everything works as expected.  Many thanks, seth!

% git checkout v6.12
% git log -- mm/shmem.c

commit d1aa0c04294e29883d65eac6c2f72fe95cc7c049
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Fri Nov 15 16:57:24 2024 -0800

    mm: revert "mm: shmem: fix data-race in shmem_getattr()"

    Revert d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()") as
    suggested by Chuck [1].  It is causing deadlocks when accessing tmpfs over
    NFS.

    As Hugh commented, "added just to silence a syzbot sanitizer splat: added
    where there has never been any practical problem".

    Link: https://lkml.kernel.org/r/ZzdxKF39VEmXSSyN@tissot.1015granger.net [1]
    Fixes: d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()")
