Hello.
After some days of uptime, I've found an unusually high level of memory usage. After closing almost every program and deactivating swap, this is what I've got:
# free -k
             total       used       free     shared    buffers     cached
Mem:       4017412    1886404    2131008          0         12      48108
-/+ buffers/cache:    1838284    2179128
Swap:            0          0          0
# slabtop
Active / Total Objects (% used) : 127074 / 150991 (84,2%)
Active / Total Slabs (% used) : 4188 / 4188 (100,0%)
Active / Total Caches (% used) : 73 / 97 (75,3%)
Active / Total Size (% used) : 27323,81K / 34807,09K (78,5%)
Minimum / Average / Maximum Object : 0,01K / 0,23K / 15,69K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
17460 17460 100% 0,11K 485 36 1940K sysfs_dir_cache
13839 8814 63% 0,19K 659 21 2636K dentry
9024 6863 76% 0,06K 141 64 564K kmalloc-64
8330 7771 93% 0,05K 98 85 392K shared_policy_node
8092 7085 87% 0,57K 289 28 4624K inode_cache
7552 6801 90% 0,03K 59 128 236K kmalloc-32
6656 6656 100% 0,02K 26 256 104K kmalloc-16
5856 3067 52% 0,12K 183 32 732K kmalloc-128
5696 5398 94% 0,06K 89 64 356K anon_vma
5174 2188 42% 0,15K 199 26 796K btrfs_extent_map
4608 4608 100% 0,01K 9 512 36K kmalloc-8
4386 4386 100% 0,04K 43 102 172K btrfs_delayed_extent_op
4160 3843 92% 0,06K 65 64 260K btrfs_free_space
3822 2807 73% 0,09K 91 42 364K kmalloc-96
3390 1437 42% 1,04K 113 30 3616K btrfs_inode
3312 3312 100% 0,09K 72 46 288K btrfs_delayed_tree_ref
# cat /proc/meminfo
MemTotal: 4017412 kB
MemFree: 2130676 kB
Buffers: 12 kB
Cached: 48140 kB
SwapCached: 0 kB
Active: 23764 kB
Inactive: 31920 kB
Active(anon): 1620 kB
Inactive(anon): 6604 kB
Active(file): 22144 kB
Inactive(file): 25316 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 36 kB
Writeback: 0 kB
AnonPages: 7484 kB
Mapped: 1760 kB
Shmem: 692 kB
Slab: 35256 kB
SReclaimable: 17768 kB
SUnreclaim: 17488 kB
KernelStack: 688 kB
PageTables: 544 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 2008704 kB
Committed_AS: 24076 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 564516 kB
VmallocChunk: 34359159395 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 11904 kB
DirectMap2M: 4147200 kB
# ps aux -A
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 113616 1164 ? Ss nov11 0:03 /sbin/init
root 2 0.0 0.0 0 0 ? S nov11 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S nov11 1:58 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< nov11 0:00 [kworker/0:0H]
root 7 0.0 0.0 0 0 ? S nov11 0:00 [migration/0]
root 8 0.0 0.0 0 0 ? S nov11 0:16 [rcu_preempt]
root 9 0.0 0.0 0 0 ? S nov11 0:00 [rcu_bh]
root 10 0.0 0.0 0 0 ? S nov11 0:00 [rcu_sched]
root 11 0.0 0.0 0 0 ? S nov11 0:00 [watchdog/0]
root 12 0.0 0.0 0 0 ? S nov11 0:00 [watchdog/1]
root 13 0.0 0.0 0 0 ? S nov11 0:00 [migration/1]
root 14 0.0 0.0 0 0 ? S nov11 1:53 [ksoftirqd/1]
root 16 0.0 0.0 0 0 ? S< nov11 0:00 [kworker/1:0H]
root 17 0.0 0.0 0 0 ? S nov11 0:00 [watchdog/2]
root 18 0.0 0.0 0 0 ? S nov11 0:00 [migration/2]
root 19 0.0 0.0 0 0 ? S nov11 1:44 [ksoftirqd/2]
root 21 0.0 0.0 0 0 ? S< nov11 0:00 [kworker/2:0H]
root 22 0.0 0.0 0 0 ? S nov11 0:00 [watchdog/3]
root 23 0.0 0.0 0 0 ? S nov11 0:00 [migration/3]
root 24 0.0 0.0 0 0 ? S nov11 1:42 [ksoftirqd/3]
root 26 0.0 0.0 0 0 ? S< nov11 0:00 [kworker/3:0H]
root 27 0.0 0.0 0 0 ? S< nov11 0:00 [khelper]
root 28 0.0 0.0 0 0 ? S nov11 0:00 [kdevtmpfs]
root 29 0.0 0.0 0 0 ? S< nov11 0:00 [netns]
root 30 0.0 0.0 0 0 ? S< nov11 0:00 [writeback]
root 31 0.0 0.0 0 0 ? S< nov11 0:00 [bioset]
root 32 0.0 0.0 0 0 ? S< nov11 0:00 [kblockd]
root 37 0.0 0.0 0 0 ? S nov11 0:00 [khungtaskd]
root 38 0.0 0.0 0 0 ? S nov11 1:47 [kswapd0]
root 39 0.0 0.0 0 0 ? SN nov11 0:00 [ksmd]
root 40 0.0 0.0 0 0 ? SN nov11 0:01 [khugepaged]
root 41 0.0 0.0 0 0 ? S nov11 0:00 [fsnotify_mark]
root 42 0.0 0.0 0 0 ? S< nov11 0:00 [crypto]
root 46 0.0 0.0 0 0 ? S< nov11 0:00 [kthrotld]
root 47 0.0 0.0 0 0 ? S< nov11 0:00 [deferwq]
root 71 0.0 0.0 0 0 ? S nov11 0:00 [khubd]
root 81 0.0 0.0 0 0 ? S< nov11 0:00 [ata_sff]
root 82 0.0 0.0 0 0 ? S nov11 0:00 [scsi_eh_0]
root 83 0.0 0.0 0 0 ? S nov11 0:00 [scsi_eh_1]
root 86 0.0 0.0 0 0 ? S nov11 0:00 [scsi_eh_2]
root 87 0.0 0.0 0 0 ? S nov11 0:00 [scsi_eh_3]
root 95 0.0 0.0 0 0 ? S< nov11 0:02 [kworker/1:1H]
root 96 0.0 0.0 0 0 ? S< nov11 0:02 [kworker/0:1H]
root 105 0.0 0.0 0 0 ? S< nov11 0:00 [bioset]
root 113 0.0 0.0 0 0 ? S nov11 0:00 [btrfs-genwork-1]
root 114 0.0 0.0 0 0 ? S nov11 0:04 [btrfs-submit-1]
root 116 0.0 0.0 0 0 ? S nov11 0:00 [btrfs-fixup-1]
root 119 0.0 0.0 0 0 ? S nov11 0:00 [btrfs-rmw-1]
root 120 0.0 0.0 0 0 ? S nov11 0:00 [btrfs-endio-rai]
root 121 0.0 0.0 0 0 ? S nov11 0:00 [btrfs-endio-met]
root 123 0.0 0.0 0 0 ? S nov11 0:00 [btrfs-freespace]
root 125 0.0 0.0 0 0 ? S nov11 0:00 [btrfs-cache-1]
root 126 0.0 0.0 0 0 ? S nov11 0:00 [btrfs-readahead]
root 127 0.0 0.0 0 0 ? S nov11 0:00 [btrfs-flush_del]
root 128 0.0 0.0 0 0 ? S nov11 0:00 [btrfs-qgroup-re]
root 132 0.0 0.0 0 0 ? S< nov11 0:02 [kworker/3:1H]
root 133 0.0 0.0 0 0 ? S< nov11 0:02 [kworker/2:1H]
root 134 0.0 0.0 0 0 ? S nov11 0:59 [btrfs-cleaner]
root 135 0.0 0.0 0 0 ? S nov11 1:46 [btrfs-transacti]
root 251 0.0 0.0 0 0 ? S< nov11 0:00 [kpsmoused]
root 253 0.0 0.0 0 0 ? S< nov11 0:00 [hd-audio0]
root 257 0.0 0.0 0 0 ? S< nov11 0:00 [kvm-irqfd-clean]
root 273 0.0 0.0 86328 776 ? Ss nov11 0:00 login -- root
root 1100 0.0 0.0 0 0 ? S nov11 0:00 [ecryptfs-kthrea]
root 1910 0.0 0.0 0 0 ? S 09:13 0:00 [kworker/2:1]
root 3220 0.0 0.0 0 0 ? S 09:14 0:00 [btrfs-endio-2]
root 3321 0.0 0.0 0 0 ? S 09:16 0:00 [kworker/u8:0]
root 3750 0.0 0.0 0 0 ? S 09:23 0:00 [btrfs-endio-met]
root 4051 0.0 0.0 0 0 ? S 09:27 0:00 [btrfs-endio-met]
root 4053 0.0 0.0 12876 1356 tty1 R+ 09:27 0:00 ps aux -A
root 10403 0.1 0.0 0 0 ? S nov12 2:36 [kworker/0:2]
root 10983 0.0 0.0 0 0 ? S nov12 0:00 [btrfs-delayed-m]
root 12547 0.1 0.0 0 0 ? S nov12 2:00 [kworker/3:2]
root 25179 0.0 0.0 0 0 ? S 08:11 0:00 [btrfs-worker-2]
root 25247 0.1 0.0 0 0 ? S 08:48 0:03 [kworker/1:2]
root 25257 0.0 0.0 0 0 ? S 08:59 0:00 [kworker/1:0]
root 25262 0.0 0.0 0 0 ? S 09:03 0:00 [btrfs-endio-wri]
root 25418 0.0 0.0 0 0 ? S 09:05 0:00 [btrfs-delalloc-]
root 25420 0.0 0.0 0 0 ? S 09:05 0:00 [kworker/u8:2]
root 25449 0.0 0.0 0 0 ? S 09:07 0:00 [kworker/2:0]
root 25539 0.0 0.0 0 0 ? S 09:09 0:00 [kworker/1:1]
root 25619 0.2 0.1 48344 6800 tty1 Ss 09:10 0:02 -zsh
root 27130 0.0 0.0 0 0 ? S 09:10 0:00 [kworker/0:1]
root 27334 0.0 0.0 0 0 ? S 09:11 0:00 [kworker/3:1]
As you can see, the only things running are zsh (almost 7 MB RSS!), login, init and, of course, the kernel, which amounts to less than 45 MB (init + zsh + slab, without caches and buffers), while free says I'm using 1.8 GB. And yes, I do run out of memory if I write 2 GB of zeroes into a tmpfs like /dev/shm and open 200 MB worth of programs, so it is not really reclaimable memory. I think the problem is in the kernel, but I could be mistaken. I'm running linux version 3.11.6-1 and systemd version 208-2 (not sure, but as it is the only thing with a two-day uptime, it could be part of the problem). Also, I'm using btrfs for the system partition and ecryptfs for a dir inside my home (it is a computer I use at work, and I'm not going to leave some things unencrypted for someone to get at night while I'm home! Nevertheless, it is unmounted during this test).
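The gap between what free reports and what is visibly in use can be put into numbers. The following is only a rough sketch in Python: it subtracts the fields of /proc/meminfo that name a consumer from MemTotal. Which fields count as "accounted" (the ACCOUNTED tuple) is an assumption made for this example, the names parse_meminfo and unaccounted_kb are made up here, and SAMPLE holds the relevant lines copied from the dump above.

```python
# Sketch: estimate how much RAM /proc/meminfo does not attribute to anything.
# The choice of "accounted" fields below is an assumption for this example.

SAMPLE = """\
MemTotal: 4017412 kB
MemFree: 2130676 kB
Buffers: 12 kB
Cached: 48140 kB
SwapCached: 0 kB
AnonPages: 7484 kB
Slab: 35256 kB
KernelStack: 688 kB
PageTables: 544 kB
"""

# Cached already includes Shmem, so Shmem is not listed separately.
ACCOUNTED = ("MemFree", "Buffers", "Cached", "SwapCached",
             "AnonPages", "Slab", "KernelStack", "PageTables")


def parse_meminfo(text):
    """Return {field: first numeric value} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def unaccounted_kb(fields):
    """MemTotal minus every field that names a consumer (or free memory)."""
    return fields["MemTotal"] - sum(fields.get(k, 0) for k in ACCOUNTED)


if __name__ == "__main__":
    missing = unaccounted_kb(parse_meminfo(SAMPLE))
    print(f"unaccounted: {missing} kB (~{missing / 1024:.0f} MiB)")
```

On a live system the same functions can be fed with open("/proc/meminfo").read(). With the numbers above, the result is in line with the roughly 1.8 GB that free shows as used. VmallocUsed is left out on purpose, since it can include mappings that are not backed by RAM.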
I'm going to try with the long-term stable kernel version 3.10.18-1 to see if it is reproducible with that version.
Last edited by luismiguelgcg (2013-11-13 09:12:23)
Offline
The same results are obtained after two days of linux 3.10.18-1 usage. Let's see what happens with 3.12.0-1.
Offline
And the same happens with 3.12.0-1 and 3.12.1-1, x86_64. I've tried memtest86+, but it says my RAM modules are error-free. Does anybody have a clue about what could be happening here?
Offline
You might try dropping all caches, but that will probably not work. This interests me, though I cannot really help you any further.
echo 3 > /proc/sys/vm/drop_caches
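Why this probably won't help can be estimated before running it. A small Python sketch, assuming that only page cache not belonging to tmpfs/shmem plus reclaimable slab can be dropped (that assumption and the helper name droppable_kb are mine); the values are copied from the /proc/meminfo dump in the first post:

```python
# Sketch: upper bound on what "echo 3 > /proc/sys/vm/drop_caches" could free.
# Assumption: only non-shmem page cache and reclaimable slab can be dropped.

def droppable_kb(buffers, cached, shmem, sreclaimable):
    """Rough upper bound in kB; Cached includes Shmem, which cannot be dropped."""
    return buffers + (cached - shmem) + sreclaimable


# Values (kB) copied from the /proc/meminfo dump in the first post.
first_post = dict(buffers=12, cached=48140, shmem=692, sreclaimable=17768)

if __name__ == "__main__":
    print(f"at most {droppable_kb(**first_post)} kB could be dropped")
```

That bound is a few tens of MB, nowhere near the 1.8 GB reported as used, so dropping caches cannot account for the difference.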
Offline
I have this bug too: dropping caches doesn't do anything, and the memory is reported as non-cached usage. It just doesn't show up under any userspace process in top.
Switched kernels as well, no dice. The system after a cold boot to a plasma desktop reports 1.2G memory usage out of 16G, but my last run ran OOM in 3 hours with 16G of ram and a 16G swap.
The commonality between our experiences appears to be having btrfs disks, though that wouldn't explain why the issue propagates between kernels.
Just between when I got on the forum to post about this, and now (around 15 mins later) my memory consumption is up to 2.7G with the exact same reported usage in top.
I looked around today, found out stock Arch kernels don't support kmemleak. I'm looking for alternative ways to profile what is leaking.
Also found out both veromix and dbus are leaking too depending on circumstance while trying to isolate this bug. Dbus at 1.37G and the python engine running veromix at 1.1, mainly caused by sink propagation making a lot of trash in veromix. Dunno what is causing the dbus leak at all, dbus-monitor isn't producing anything tangible. This is great.
Last edited by zanny (2013-11-26 14:44:30)
Offline
We have another thing in common, as we both are using plasma, but as far as I know plasma can't hop out of its userspace garden and start munching memory as a kernel kid. So yes, this should be something related to btrfs.
I'm going to compile linux-3.12.1-1 with kmemleak support. I hope I'll find what's happening here.
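For reference, a kernel built with CONFIG_DEBUG_KMEMLEAK exposes its findings in /sys/kernel/debug/kmemleak (debugfs has to be mounted). A rough Python sketch for totalling the reported bytes per process name could look like this. The sample report is made up to mimic the layout kmemleak prints; its addresses, pids and sizes are illustrative only and are not data from this thread. The function name bytes_by_comm is also just a name chosen here.

```python
import re
from collections import Counter

# Made-up sample in the layout kmemleak uses for its report.
# Addresses, pids and sizes are illustrative only.
SAMPLE_REPORT = '''\
unreferenced object 0xffff880000000100 (size 48):
  comm "swapper/0", pid 1, jiffies 4294892300 (age 100.000s)
  backtrace:
    [<ffffffff00000000>] example_alloc+0x10/0x20
unreferenced object 0xffff880000000200 (size 128):
  comm "zsh", pid 1234, jiffies 4294900000 (age 90.000s)
unreferenced object 0xffff880000000300 (size 64):
  comm "zsh", pid 1234, jiffies 4294900100 (age 89.900s)
'''

OBJECT_RE = re.compile(r'^unreferenced object 0x[0-9a-f]+ \(size (\d+)\):')
COMM_RE = re.compile(r'^\s+comm "(.*)", pid \d+')


def bytes_by_comm(report):
    """Sum the size of every unreferenced object per reporting process name."""
    totals = Counter()
    pending = None  # size of the object whose "comm" line has not been seen yet
    for line in report.splitlines():
        m = OBJECT_RE.match(line)
        if m:
            pending = int(m.group(1))
            continue
        m = COMM_RE.match(line)
        if m and pending is not None:
            totals[m.group(1)] += pending
            pending = None
    return totals


if __name__ == "__main__":
    for comm, size in bytes_by_comm(SAMPLE_REPORT).most_common():
        print(f"{comm}: {size} bytes")
```

On a real system the input would be the contents of /sys/kernel/debug/kmemleak read as root. Entries can be false positives, so the totals are only a starting point.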
I'm not using veromix, but dbus is not leaking in my case. Could it be veromix doing something weird with pulseaudio, making it clog its dbus pipe?
Offline
The veromix / dbus leaks I mention are tangential to the kernel leak. Same thing with plasma, I tried booting with kdm disabled to just a raw tty and I still found the memory consumption going through the roof.
Veromix / dbus are just naturally leaking independently of the kernel leak, I just noticed it. I'm running with them under valgrind right now to see if whatever it is shows up again.
I'll hold off trying to build a manual kernel to try to debug this until I hear back if you get any results.
Some more unique circumstances I have that might cause a leak few others are experiencing:
I use two monitors on an hd4600
My chipset board has an abnormal number of usb hubs and ports (a total of 14 USB3 and 5 USB2)
Besides those, though, it's a normal work computer. I use Qt Creator and LibreOffice a lot, but again those are all userspace and wouldn't be calling any sparsely traversed kernel code paths that could generate big leaks. I'd imagine the leaking happens irrespective of the software I run, though I could try disabling some of the myriad of daemons I have running.
Update: Since my system is effectively unusable with it running out of memory every hour or so, I've been toggling on / off daemons and features.
I disabled all the following services:
sshd
smbd
nmbd
metalog
mcelog
cups
proftpd
murmur
avahi
kdm
With them all off, I mucked around on a tty for a while without a desktop and had a sustained memory footprint of 200M with polkit being the largest consumer at 20M. I don't think anything was leaking, but I didn't try summing all userspace memory usage to get an idea.
With kdm enabled but my desktop reset (trashed the plasmarc files in config) I'd get around 800M of memory footprint at boot, it ramps up a bit with akonadi running but caps out around 1200M. Mucked around in firefox (600M) for a while but never broke 2G usage.
However, I started reenabling plasmoids and my autostart apps, and right now I'm booting *only* a moderately rebuilt plasma desktop and skype, and I'm getting leaks again. I'm just rebooting quickly again after clearing all my kde caches.
Update 2: Disabled akonadi and nepomuk, so far so good. I'm also using kmix over veromix, due to those earlier bugs I mentioned. I'll try reenabling features tomorrow, but since all this leakage is outside userspace allocations, it seems to be either in the kernel fs drivers, revealed while nepomuk scans, or in alsa when pulseaudio is acting funky with it.
Try turning off Akonadi by setting StartServer=false in ~/.config/akonadi/akonadiserverrc and shutting off nepomuk in the system settings desktop search and see if it stops leaking.
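If you'd rather script that change than edit the file by hand, here is a minimal Python sketch. It assumes akonadiserverrc is a plain INI file that the standard configparser module can read, and it flips StartServer in whichever section carries that key, so no section name has to be guessed. The function name disable_start_server is made up for this example. Note that configparser drops comments when it rewrites the file, so keep a backup.

```python
import configparser


def disable_start_server(path):
    """Set StartServer=false in every section of the INI file that has the key.

    Returns True if the file was rewritten, False if the key was not found.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep key names exactly as written (case-sensitive)
    cfg.read(path)

    changed = False
    for section in cfg.sections():
        if cfg.has_option(section, "StartServer"):
            cfg.set(section, "StartServer", "false")
            changed = True

    if changed:
        with open(path, "w") as fh:
            # Write "key=value" without spaces, matching the style quoted above.
            cfg.write(fh, space_around_delimiters=False)
    return changed
```

It would be called as disable_start_server(os.path.expanduser("~/.config/akonadi/akonadiserverrc")), after importing os and while Akonadi is stopped.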
Last edited by zanny (2013-11-27 04:13:37)
Offline
btrfs runs a lot of background services which also show up in the process list; I wonder if it's possible to get memory usage statistics from those? I also wonder if this is reproducible in a VM with btrfs and a few thousand files on it.
Offline
Ok, after three or four days running, I've got some results from kmemleak. Sadly, they are not interesting. 59 MB of unreferenced objects, all of them running in userspace, plus 48 bytes unreferenced by swapper/0, everything showing after killing all but systemd and a tty session with zsh. It seems that my kmemleak entries are false positives. After all this, I have the following /proc/meminfo contents:
MemTotal: 4017312 kB
MemFree: 2656652 kB
Buffers: 24 kB
Cached: 32572 kB
SwapCached: 360 kB
Active: 12328 kB
Inactive: 26612 kB
Active(anon): 5904 kB
Inactive(anon): 13620 kB
Active(file): 6424 kB
Inactive(file): 12992 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 3941376 kB
SwapFree: 3938300 kB
Dirty: 60 kB
Writeback: 0 kB
AnonPages: 6048 kB
Mapped: 4180 kB
Shmem: 13180 kB
Slab: 328380 kB
SReclaimable: 89772 kB
SUnreclaim: 238608 kB
KernelStack: 768 kB
PageTables: 276 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 5950032 kB
Committed_AS: 42172 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 567640 kB
VmallocChunk: 34359146568 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 13952 kB
DirectMap2M: 4145152 kB
The kernel slab is quite large due to kmemleak:
Active / Total Objects (% used) : 463523 / 1073631 (43,2%)
Active / Total Slabs (% used) : 39610 / 39610 (100,0%)
Active / Total Caches (% used) : 76 / 100 (76,0%)
Active / Total Size (% used) : 132585,62K / 318069,88K (41,7%)
Minimum / Average / Maximum Object : 0,01K / 0,30K / 15,75K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
708162 204346 28% 0,30K 27237 26 217896K kmemleak_object
36063 23976 66% 1,04K 1401 30 44832K btrfs_inode
90678 72450 79% 0,19K 4318 21 17272K dentry
20636 2746 13% 0,55K 737 28 11792K radix_tree_node
10087 9965 98% 0,57K 361 28 5776K inode_cache
18044 7703 42% 0,30K 694 26 5552K btrfs_delayed_node
32500 18405 56% 0,15K 1250 26 5000K btrfs_extent_map
536 476 88% 4,00K 67 8 2144K kmalloc-4096
17892 17892 100% 0,11K 497 36 1988K sysfs_dir_cache
2175 1234 56% 0,63K 87 25 1392K proc_inode_cache
22080 9420 42% 0,06K 345 64 1380K kmalloc-64
10368 6241 60% 0,12K 324 32 1296K kmalloc-128
1680 1269 75% 0,66K 70 24 1120K shmem_inode_cache
462 354 76% 2,17K 33 14 1056K task_struct
16640 14799 88% 0,06K 260 64 1040K btrfs_free_space
448 391 87% 2,00K 28 16 896K kmalloc-2048
This time, around 1 GB of memory is doing something weird.
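The "around 1 GB" figure can be reproduced from the dumps above. A small Python sketch, again assuming that these particular meminfo fields are the ones that name a consumer (that selection is an assumption of this example, not something the kernel defines):

```python
# Values (kB) copied from the second /proc/meminfo dump above.
snapshot = {
    "MemTotal": 4017312, "MemFree": 2656652, "Buffers": 24, "Cached": 32572,
    "SwapCached": 360, "AnonPages": 6048, "Slab": 328380,
    "KernelStack": 768, "PageTables": 276,
}
# "CACHE SIZE" of kmemleak_object in the slabtop listing above.
KMEMLEAK_OBJECT_KB = 217896

# Everything except MemTotal is treated as accounted for (assumption).
accounted_kb = sum(v for k, v in snapshot.items() if k != "MemTotal")
unaccounted_kb = snapshot["MemTotal"] - accounted_kb
slab_without_kmemleak_kb = snapshot["Slab"] - KMEMLEAK_OBJECT_KB

print(f"unaccounted: {unaccounted_kb} kB (~{unaccounted_kb / 1024:.0f} MiB)")
print(f"slab without kmemleak_object: {slab_without_kmemleak_kb} kB")
```

So even with the large slab (mostly kmemleak's own bookkeeping) counted as accounted for, close to 1 GB remains that none of these fields explain.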
Offline
Did you try disabling the semantic desktop daemons?
Offline
Hmm, quite odd. You should probably ask in the kernel mailing list. Someone will probably point you in the right direction (or at least help you find the culprit).
Offline
Did you try disabling the semantic desktop daemons?
Nepomuk was running until I closed almost everything (dbus, sshd, cupsd, ntpd, my X session, even systemd-journald). After that, with only a couple of programs running in userspace, I've found that.
You should probably ask in the kernel mailing list.
Yup, it's time to ask there. My problem is less evident than zanny's: he runs out of memory much faster than I do. Maybe it is something related to using btrfs with plasma. Or maybe it is something hardware-related, if very few people are seeing it.
Offline
More news about this. I was also using eCryptfs for some directories and files (~/.local, ~/.config, ~/Maildir, ...). I've ceased to use eCryptfs, and the whole problem has vanished. Maybe it is a problem related to btrfs + eCryptfs + some part of my userland programs...
Last edited by luismiguelgcg (2013-12-18 10:36:02)
Offline
Could it be this?
https://wiki.archlinux.org/index.php/FA … _my_RAM.3F
Offline
Could it be this?
https://wiki.archlinux.org/index.php/FA … _my_RAM.3F
No, it wasn't that. I know that many newcomers are confused by GNU/Linux and its memory management, but I am quite a veteran with these systems!
I was talking about a lot of physical memory being used by something that was neither cache, buffers, kernel slab, nor resident memory of userland programs. That was the problem: it was not listed as used by a program, by the kernel, as cache, as hugepages... But it was used by something, and it was generating OOMs! I thought that it could be a kernel memory leak, but I couldn't find anything interesting with kmemleak.
Last edited by luismiguelgcg (2013-12-18 13:06:19)
Offline