I have a VM running on proxmox.
The VM runs Arch with the kernel linux 6.17.7.arch1-1
It has been leaking kernel memory.
I installed smem to see what is happening.
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 88.21% 36.25% 51.96%
userspace memory 9.49% 0.15% 9.34%
free memory 2.30% 2.30% 0.00%

total used free shared buff/cache available
Mem: 31Gi 3.6Gi 500Mi 4.0Mi 11Gi 27Gi
Swap: 4.0Gi 179Mi 3.8Gi

It's the Noncache memory usage I am worried about, as it appears to be flagged as reclaimable (it shows up in "available"), yet nothing can trigger it to be reclaimed. The node just starts swapping as soon as it runs out of free RAM.
drop_caches has no effect. Allocating enough RAM that this supposedly available memory should be freed also has no effect.
I have checked slabtop and the memory is not there. This is around 16+ GB of RAM. I can stop all workloads on the VM and the memory is still not freed.
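For reference, the tests I mean were along these lines (stress-ng and the 24G figure are just an illustration of the allocation-pressure test, not the exact commands I ran):
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop page cache, dentries and inodes
stress-ng --vm 1 --vm-bytes 24G --vm-keep --timeout 60s    # allocate well past "free" to force reclaim
Neither of those touches the ~16 GB of noncache kernel memory.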
I have run VMs with this workload in the past and not had any issues. This is a recent change.
The only interesting thing about the VM is it mounts a CephFS filesystem.
Just looking for some ideas on how to track this down a bit better for a bug report. I'm going to test out the LTS kernel and see if the issue persists. Once the current data move is complete I'm also going to try unmounting the CephFS and see if the cephfs driver is the cause. I don't think that's the case, but it's worth a shot.
As I said, any further ideas on testing or diagnosing which kernel module has gone nuts would be appreciated, as I do intend to lodge a ticket with the kernel devs once I can narrow it down a bit better and hopefully create a reproducer that doesn't require running my workload.
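One idea I want to try for pinning down the allocator (rough sketch, assuming the Arch kernel is built with CONFIG_PAGE_OWNER; check /proc/config.gz first):
zgrep CONFIG_PAGE_OWNER /proc/config.gz        # needs to say =y
# boot with page_owner=on on the kernel command line, let the leak build up, then dump it:
sudo cat /sys/kernel/debug/page_owner > page_owner.txt
# tools/mm/page_owner_sort from the kernel source aggregates the dump by allocation stack trace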
Thanks all.
Offline
OK, confirmed that unmounting Ceph has no effect on the weird memory usage:
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 78.42% 0.28% 78.14%
userspace memory 3.74% 0.10% 3.64%
free memory 17.84% 17.84% 0.00%
total used free shared buff/cache available
Mem: 31Gi 648Mi 7.7Gi 4.0Mi 104Mi 30Gi
Swap: 4.0Gi 28Mi 4.0Gi

This is the same box 12ish hours later.
It's leaking.
I'm going to try the LTS kernel and see if that resolves the issue.
EDIT:
The worse the memory situation gets, the slower the network runs? Very strange.
Last edited by insanemal (2025-11-08 10:30:09)
Offline
The worse the memory situation gets the slower the network runs?
rx/tx buffer?
What if you take down all network connections, does it free the memory?
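You could also watch the virtio NIC's ring parameters and driver counters while it degrades, something like (the interface name here is just a placeholder):
ethtool -g ens18     # current rx/tx ring sizes
ethtool -S ens18     # driver/queue counters; look for anything growing without bound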
Online
I didn't try that. But I did roll back to LTS and it's fixed.
EDIT:
That's not entirely true. I did rmmod the virtio_net module as well as all the iptables modules and ceph modules.
Also, I've just checked another VM running a less intensive workload:
smem -wp
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 73.39% 48.69% 24.69%
userspace memory 23.29% 13.43% 9.86%
free memory 3.33% 3.33% 0.00%

And it does seem that the leak speed is directly related to how hard it hits the CephFS filesystem (or how hard it hits the network, not sure). So it's either the network or something odd in the VFS layer.
EDIT 2:
Just realized: it's probably trying to allocate rx/tx buffers and having to reclaim RAM or swap out to keep working, which would explain its slow decline in performance.
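If that's what is happening it should show up as direct reclaim activity, e.g. something like:
watch -n5 "grep -E 'allocstall|pgscan_direct|pgsteal_direct' /proc/vmstat"    # counters climbing = allocations stalling on reclaim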
Last edited by insanemal (2025-11-08 22:17:40)
Offline
You could try to use the fuse3 cephfs implementation… https://aur.archlinux.org/packages?O=0&K=cephfs
Online
I don't think it's ceph. Unloading all the ceph modules didn't return the memory.
I'm going to try for an NFS reproducer.
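Something along these lines, just hammering an NFS mount and watching kernel memory (server, export and sizes are placeholders, not a confirmed trigger):
mount -t nfs server:/export /mnt/test
for i in $(seq 1 100); do dd if=/dev/zero of=/mnt/test/junk.$i bs=1M count=1024; done
grep -E 'MemFree|MemAvailable|KReclaimable|Slab' /proc/meminfo    # does untracked kernel memory keep growing?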
Plus, regardless, it's a regression in the newer kernel somewhere.
Last edited by insanemal (2025-11-09 14:00:46)
Offline
It's not ceph, and it doesn't appear to be network filesystems. It looks like it could be the virtio SCSI driver. Also, 6.17.6 doesn't have the issue, so hopefully I'll have a nice quick git bisect.
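The plan is the standard bisect against the stable tree, roughly (build, boot the VM on each step, run the workload, mark it good or bad):
git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
cd linux
git bisect start
git bisect bad v6.17.7
git bisect good v6.17.6
# after testing each kernel it builds:
git bisect good    # or: git bisect bad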
Offline
Not seeing anything related to virtio in the 6.17.7 ChangeLog, and Arch did not add any new cherry-picks for that release.
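For what it's worth, one way to double-check from the stable tree (these are just the virtio-related paths I would look at first):
git log --oneline v6.17.6..v6.17.7 -- drivers/virtio drivers/net/virtio_net.c drivers/scsi/virtio_scsi.c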
Offline
I have the same issue running Arch on my laptop. I recently updated the system; before that everything was OK, but now memory leaks:
smem -w -p
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 60.85% 8.66% 52.19%
userspace memory 35.95% 5.43% 30.53%
free memory 3.19% 3.19% 0.00%

free -h
total used free shared buff/cache available
Mem: 27Gi 24Gi 923Mi 2,8Gi 3,8Gi 2,5Gi
Swap: 0B 0B 0B

This isn't very reliable; I can have a day when everything is OK, but the next day the memory has leaked.
I tried to gather some info, but I have no idea what is useful and what is not.
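One rough way to quantify it (the field list is only an approximation; Shmem and Mapped are already counted inside Cached):
awk '/^MemTotal:/ {total=$2}
     /^(MemFree|Buffers|Cached|AnonPages|Slab|KernelStack|PageTables|Percpu):/ {acct+=$2}
     END {printf "unaccounted: %.1f GiB\n", (total-acct)/1048576}' /proc/meminfo
On the meminfo dump below that works out to roughly 13.8 GiB that neither userspace, page cache nor slab explains.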
uname -r
6.17.8-arch1-1

cat /proc/meminfo
MemTotal: 28530440 kB
MemFree: 680952 kB
MemAvailable: 2671288 kB
Buffers: 12 kB
Cached: 4185708 kB
SwapCached: 0 kB
Active: 5586532 kB
Inactive: 7229264 kB
Active(anon): 4673736 kB
Inactive(anon): 5857168 kB
Active(file): 912796 kB
Inactive(file): 1372096 kB
Unevictable: 388 kB
Mlocked: 388 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Zswap: 0 kB
Zswapped: 0 kB
Dirty: 4540 kB
Writeback: 0 kB
AnonPages: 8615140 kB
Mapped: 1612328 kB
Shmem: 2945572 kB
KReclaimable: 95340 kB
Slab: 420796 kB
SReclaimable: 95340 kB
SUnreclaim: 325456 kB
KernelStack: 30544 kB
PageTables: 94800 kB
SecPageTables: 3812 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 14265220 kB
Committed_AS: 27379320 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 95408 kB
VmallocChunk: 0 kB
Percpu: 17024 kB
HardwareCorrupted: 0 kB
AnonHugePages: 1062912 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 159744 kB
FilePmdMapped: 159744 kB
CmaTotal: 0 kB
CmaFree: 0 kB
Unaccepted: 0 kB
Balloon: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 17188092 kB
DirectMap2M: 11972608 kB
DirectMap1G: 1048576 kB

sudo slabtop -sc
Active / Total Objects (% used) : 870615 / 1084021 (80,3%)
Active / Total Slabs (% used) : 24933 / 24933 (100,0%)
Active / Total Caches (% used) : 152 / 219 (69,4%)
Active / Total Size (% used) : 215854 / 273336 (79,0%)
Minimum / Average / Maximum Object : 0,01K / 0,25K / 8,31K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
74620 31473 42% 0,57K 2665 28 42640 radix_tree_node
23963 17594 73% 1,01K 773 31 24736 btrfs_inode
2019 2019 100% 8,31K 673 3 21536 task_struct
19584 19065 97% 1,00K 612 32 19584 kmalloc-1k
86730 21057 24% 0,19K 2065 42 16520 dentry
81900 78670 96% 0,19K 1950 42 15600 vm_area_struct
23200 19385 83% 0,50K 725 32 11600 kmalloc-512
65010 65010 100% 0,13K 2167 30 8668 kernfs_node_cache
21424 14336 66% 0,30K 824 26 6592 btrfs_delayed_node
5536 5490 99% 1,00K 173 32 5536 iommu_iova_magazine
652 644 98% 8,00K 163 4 5216 kmalloc-8k
42822 24191 56% 0,10K 1098 39 4392 Acpi-ParseExt
1072 1056 98% 4,00K 134 8 4288 kmalloc-4k
4469 4142 92% 0,77K 109 41 3488 shmem_inode_cache
1744 1744 100% 2,00K 109 16 3488 kmalloc-2k
17514 15471 88% 0,19K 417 42 3336 filp
13120 12283 93% 0,25K 410 32 3280 kmalloc-256
5000 3855 77% 0,62K 200 25 3200 inode_cache
4462 2296 51% 0,70K 97 46 3104 proc_inode_cache
41776 41419 99% 0,07K 746 56 2984 vmap_area
744 725 97% 4,00K 93 8 2976 biovec-max
45376 43000 94% 0,06K 709 64 2836 anon_vma_chain
11136 9093 81% 0,25K 348 32 2784 maple_node
25467 24733 97% 0,10K 653 39 2612 anon_vma
1184 1171 98% 2,00K 74 16 2368 kmalloc-cg-2k
67968 30505 44% 0,03K 531 128 2124 lsm_inode_cache
2457 2168 88% 0,81K 63 39 2016 sock_inode_cache
2928 2928 100% 0,64K 122 24 1952 debugfs_inode_cache
15424 14791 95% 0,12K 482 32 1928 kmalloc-128
9954 9478 95% 0,19K 237 42 1896 file_lock_cache
472 465 98% 4,00K 59 8 1888 kmalloc-cg-4k
29568 28644 96% 0,06K 462 64 1848 kmalloc-64
1596 1519 95% 1,12K 57 28 1824 signal_cache
840 776 92% 2,06K 56 15 1792 sighand_cache
14144 13630 96% 0,12K 442 32 1768 eventpoll_epi
17304 17068 98% 0,09K 412 42 1648 kmalloc-96
25728 25728 100% 0,06K 402 64 1608 dmaengine-unmap-2
1978 1790 90% 0,69K 43 46 1376 skbuff_small_head
1248 1229 98% 1,00K 39 32 1248 kmalloc-cg-1k
5964 5869 98% 0,19K 142 42 1136 kmalloc-192
128 128 100% 8,00K 32 4 1024 kmalloc-cg-8k
2016 1952 96% 0,50K 63 32 1008 pool_workqueue
31872 29456 92% 0,03K 249 128 996 kmalloc-32
12444 12444 100% 0,08K 244 51 976 sigqueue
Offline
@masafi have you considered bisecting between 6.17.6 and 6.17.7 as insanemal intended to do?
Offline