You are not logged in.

#1 2023-09-01 15:57:49

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,600
Website

Recently developed NFSv4 instabilities ... debug process discussion

Upon updating my system from approx 25-Aug, I am noticing NFSv4 server instabilities.  Basically I see core dump followed by rpc-srv/tcp: nfsd stopping.  Unfortunately the bug does not instantly manifest.  It seems to be triggered by clients access the nfsv4 shares but can take some time to trigger.

You see it below on the testing kernel but I get the same when on 6.4.12-arch1-1.

From journalctl:

Sep 01 11:47:45 myth2 kernel: ------------[ cut here ]------------
Sep 01 11:47:45 myth2 kernel: WARNING: CPU: 1 PID: 725 at net/core/skbuff.c:6828 skb_splice_from_iter+0x11f/0x310
Sep 01 11:47:45 myth2 kernel: Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core dm_crypt cbc encrypted_keys trusted asn1_encoder tee intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclam>
Sep 01 11:47:45 myth2 kernel:  grace nct6775 nct6775_core fuse sunrpc hwmon_vid dm_mod loop vfat fat ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 crc32c_intel xhci_pci xhci_pci_renesas amdgpu i2>
Sep 01 11:47:45 myth2 kernel: CPU: 1 PID: 725 Comm: nfsd Not tainted 6.5.0-arch1-1 #1 889c1df33e983653d40745066be9d1a7f06d4bc4
Sep 01 11:47:45 myth2 kernel: Hardware name: MSI MS-7851/H97I AC (MS-7851), BIOS V5.9 02/19/2016
Sep 01 11:47:45 myth2 kernel: RIP: 0010:skb_splice_from_iter+0x11f/0x310
Sep 01 11:47:45 myth2 kernel: Code: 8b 75 00 48 29 c1 4c 39 f9 49 8b 56 08 49 0f 47 cf 49 89 cc f6 c2 01 0f 85 9e 01 00 00 66 90 4c 89 f2 48 8b 12 80 e6 02 74 7c <0f> 0b 49 c7 c7 fb ff ff ff 48 8b 44 24 10 4>
Sep 01 11:47:45 myth2 kernel: RSP: 0018:ffffa4ec41237c20 EFLAGS: 00010246
Sep 01 11:47:45 myth2 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000669
Sep 01 11:47:45 myth2 kernel: RDX: 0000000000000000 RSI: ffffcd5b0590ac40 RDI: ffff899740b97200
Sep 01 11:47:45 myth2 kernel: RBP: ffffa4ec41237c58 R08: 0000000000000011 R09: 0000000000001000
Sep 01 11:47:45 myth2 kernel: R10: 0000000000000669 R11: 0000000000000000 R12: 0000000000000669
Sep 01 11:47:45 myth2 kernel: R13: ffff899740b97200 R14: ffffcd5b0590ac80 R15: 0000000000000669
Sep 01 11:47:45 myth2 kernel: FS:  0000000000000000(0000) GS:ffff899855c80000(0000) knlGS:0000000000000000
Sep 01 11:47:45 myth2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 01 11:47:45 myth2 kernel: CR2: 00007f7df0976000 CR3: 000000010fa20006 CR4: 00000000001706e0
Sep 01 11:47:45 myth2 kernel: Call Trace:
Sep 01 11:47:45 myth2 kernel:  <TASK>
Sep 01 11:47:45 myth2 kernel:  ? skb_splice_from_iter+0x11f/0x310
Sep 01 11:47:45 myth2 kernel:  ? __warn+0x81/0x130
Sep 01 11:47:45 myth2 kernel:  ? skb_splice_from_iter+0x11f/0x310
Sep 01 11:47:45 myth2 kernel:  ? report_bug+0x171/0x1a0
Sep 01 11:47:45 myth2 kernel:  ? handle_bug+0x3c/0x80
Sep 01 11:47:45 myth2 kernel:  ? exc_invalid_op+0x17/0x70
Sep 01 11:47:45 myth2 kernel:  ? asm_exc_invalid_op+0x1a/0x20
Sep 01 11:47:45 myth2 kernel:  ? skb_splice_from_iter+0x11f/0x310
Sep 01 11:47:45 myth2 kernel:  tcp_sendmsg_locked+0x372/0xd20
Sep 01 11:47:45 myth2 kernel:  ? tcp_sendmsg+0x35/0x40
Sep 01 11:47:45 myth2 kernel:  ? __pfx_nfsd+0x10/0x10 [nfsd f8684f5a30b12f229e505128f799628b1bdfdfb4]
Sep 01 11:47:45 myth2 kernel:  tcp_sendmsg+0x2b/0x40
Sep 01 11:47:45 myth2 kernel:  sock_sendmsg+0x8f/0xa0
Sep 01 11:47:45 myth2 kernel:  svc_tcp_sendmsg+0x128/0x180 [sunrpc 196b546034ca8190975505fbb59b1c9e94d1ed26]
Sep 01 11:47:45 myth2 kernel:  svc_tcp_sendto+0xa9/0x1f0 [sunrpc 196b546034ca8190975505fbb59b1c9e94d1ed26]
Sep 01 11:47:45 myth2 kernel:  svc_send+0x51/0x150 [sunrpc 196b546034ca8190975505fbb59b1c9e94d1ed26]
Sep 01 11:47:45 myth2 kernel:  nfsd+0xd9/0x1b0 [nfsd f8684f5a30b12f229e505128f799628b1bdfdfb4]
Sep 01 11:47:45 myth2 kernel:  kthread+0xe8/0x120
Sep 01 11:47:45 myth2 kernel:  ? __pfx_kthread+0x10/0x10
Sep 01 11:47:45 myth2 kernel:  ret_from_fork+0x34/0x50
Sep 01 11:47:45 myth2 kernel:  ? __pfx_kthread+0x10/0x10
Sep 01 11:47:45 myth2 kernel:  ret_from_fork_asm+0x1b/0x30
Sep 01 11:47:45 myth2 kernel:  </TASK>
Sep 01 11:47:45 myth2 kernel: ---[ end trace 0000000000000000 ]---
Sep 01 11:47:45 myth2 kernel: rpc-srv/tcp: nfsd: sent 129495 when sending 131132 bytes - shutting down socket
Sep 01 11:47:53 myth2 nfsv4.exportd[700]: v4.0 client detached: 0x6e8214b864f2072e from "10.9.8.112:54826"

The packages affected:

warning: archlinux-keyring: downgrading from version 20230821-2 to version 20230821-1
warning: cpupower: downgrading from version 6.3-5 to version 6.3-4
warning: db5.3: downgrading from version 5.3.28-4 to version 5.3.28-2
warning: gd: downgrading from version 2.3.3-7 to version 2.3.3-6
warning: gperftools: downgrading from version 2.12-1 to version 2.10-3
warning: highway: downgrading from version 1.0.7-1 to version 1.0.6-1
warning: libavif: downgrading from version 1.0.1-1 to version 0.11.1-1
warning: libva: downgrading from version 2.19.0-2 to version 2.19.0-1
warning: libva-mesa-driver: downgrading from version 1:23.1.6-4 to version 1:23.1.6-1
warning: libvdpau: downgrading from version 1.5-2 to version 1.5-1
warning: libwacom: downgrading from version 2.8.0-1 to version 2.7.0-1
warning: llvm-libs: downgrading from version 16.0.6-1 to version 15.0.7-3
warning: mesa: downgrading from version 1:23.1.6-4 to version 1:23.1.6-1
warning: mesa-vdpau: downgrading from version 1:23.1.6-4 to version 1:23.1.6-1
warning: openexr: downgrading from version 3.1.11-1 to version 3.1.10-1
warning: perl-uri: downgrading from version 5.20-1 to version 5.19-2
warning: python: downgrading from version 3.11.5-1 to version 3.11.3-2
warning: spdlog: downgrading from version 1.12.0-1 to version 1.11.0-3
warning: sqlite: downgrading from version 3.43.0-1 to version 3.42.0-1
warning: svt-av1: downgrading from version 1.7.0-1 to version 1.6.0-1
warning: vim: downgrading from version 9.0.1736-2 to version 9.0.1736-1
warning: vim-runtime: downgrading from version 9.0.1736-2 to version 9.0.1736-1
warning: vulkan-headers: downgrading from version 1:1.3.262-1 to version 1:1.3.257-1

I am not sure what I can do to debug this but one idea is that a single package might be driving the problem.  If so, I could step the system day-by-day using ALA to try to pin point a package to blame, but since the problem can take a while to trigger, I'd like to seek feedback here for a more direct way.

Last edited by graysky (2023-09-01 15:58:21)


CPU-optimized Linux-ck packages @ Repo-ck  • AUR packagesZsh and other configs

Offline

#2 2023-09-02 14:50:53

loqs
Member
Registered: 2014-03-06
Posts: 17,472

Re: Recently developed NFSv4 instabilities ... debug process discussion

graysky wrote:

You see it below on the testing kernel but I get the same when on 6.4.12-arch1-1.

Sep 01 11:47:45 myth2 kernel: WARNING: CPU: 1 PID: 725 at net/core/skbuff.c:6828 skb_splice_from_iter+0x11f/0x310

skb_splice_from_iter was only added in 6.5 https://github.com/torvalds/linux/commi … e02d1a850a and it was not backported to 6.4.12.  Can you please post the warning for 6.4.12.

Offline

#3 2023-09-02 16:05:34

seth
Member
Registered: 2012-09-03
Posts: 51,826

Re: Recently developed NFSv4 instabilities ... debug process discussion

Sep 01 11:47:45 myth2 kernel: rpc-srv/tcp: nfsd: sent 129495 when sending 131132 bytes - shutting down socket
Sep 01 11:47:53 myth2 nfsv4.exportd[700]: v4.0 client detached: 0x6e8214b864f2072e from "10.9.8.112:54826"

Any chance the cause is actually a change to the infrastructure?
(Aka "your network is rotting and the limited throughput stacks up and exceeds the tcp buffer")
https://serverfault.com/questions/88049 … ng-1048708

Offline

#4 2023-09-02 17:53:44

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,600
Website

Re: Recently developed NFSv4 instabilities ... debug process discussion

@loqs - I went back through and searched journalctl and you are correct.  I only see this for boot lines into 6.5.0-arch1-1.  I must have neglected to reboot upon downgrade to 6.4.12.  Thanks for the links.  I just switched to linux-lts and am trying to trigger this but haven't yet.  Will log some more uptime before declaring victory.

@seth - I do not think that applies but will consider it.  Thanks for the reply.


CPU-optimized Linux-ck packages @ Repo-ck  • AUR packagesZsh and other configs

Offline

Board footer

Powered by FluxBB