
#1 2013-11-13 09:01:27

luismiguelgcg
Member
Registered: 2013-02-07
Posts: 11

Possible memory hole

Hello.

After some days of uptime, I've found an unusually high level of memory usage. After closing almost every program and deactivating swap, this is what I get:

# free -k

             total       used       free     shared    buffers     cached
Mem:       4017412    1886404    2131008          0         12      48108
-/+ buffers/cache:    1838284    2179128
Swap:            0          0          0


# slabtop

 Active / Total Objects (% used)    : 127074 / 150991 (84,2%)
 Active / Total Slabs (% used)      : 4188 / 4188 (100,0%)
 Active / Total Caches (% used)     : 73 / 97 (75,3%)
 Active / Total Size (% used)       : 27323,81K / 34807,09K (78,5%)
 Minimum / Average / Maximum Object : 0,01K / 0,23K / 15,69K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
 17460  17460 100%    0,11K    485       36      1940K sysfs_dir_cache        
 13839   8814  63%    0,19K    659       21      2636K dentry                 
  9024   6863  76%    0,06K    141       64       564K kmalloc-64             
  8330   7771  93%    0,05K     98       85       392K shared_policy_node     
  8092   7085  87%    0,57K    289       28      4624K inode_cache            
  7552   6801  90%    0,03K     59      128       236K kmalloc-32             
  6656   6656 100%    0,02K     26      256       104K kmalloc-16             
  5856   3067  52%    0,12K    183       32       732K kmalloc-128            
  5696   5398  94%    0,06K     89       64       356K anon_vma               
  5174   2188  42%    0,15K    199       26       796K btrfs_extent_map       
  4608   4608 100%    0,01K      9      512        36K kmalloc-8              
  4386   4386 100%    0,04K     43      102       172K btrfs_delayed_extent_op
  4160   3843  92%    0,06K     65       64       260K btrfs_free_space       
  3822   2807  73%    0,09K     91       42       364K kmalloc-96             
  3390   1437  42%    1,04K    113       30      3616K btrfs_inode            
  3312   3312 100%    0,09K     72       46       288K btrfs_delayed_tree_ref 


# cat /proc/meminfo

MemTotal:        4017412 kB
MemFree:         2130676 kB
Buffers:              12 kB
Cached:            48140 kB
SwapCached:            0 kB
Active:            23764 kB
Inactive:          31920 kB
Active(anon):       1620 kB
Inactive(anon):     6604 kB
Active(file):      22144 kB
Inactive(file):    25316 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                36 kB
Writeback:             0 kB
AnonPages:          7484 kB
Mapped:             1760 kB
Shmem:               692 kB
Slab:              35256 kB
SReclaimable:      17768 kB
SUnreclaim:        17488 kB
KernelStack:         688 kB
PageTables:          544 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2008704 kB
Committed_AS:      24076 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      564516 kB
VmallocChunk:   34359159395 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       11904 kB
DirectMap2M:     4147200 kB


# ps aux -A

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0 113616  1164 ?        Ss   nov11   0:03 /sbin/init
root         2  0.0  0.0      0     0 ?        S    nov11   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    nov11   1:58 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   nov11   0:00 [kworker/0:0H]
root         7  0.0  0.0      0     0 ?        S    nov11   0:00 [migration/0]
root         8  0.0  0.0      0     0 ?        S    nov11   0:16 [rcu_preempt]
root         9  0.0  0.0      0     0 ?        S    nov11   0:00 [rcu_bh]
root        10  0.0  0.0      0     0 ?        S    nov11   0:00 [rcu_sched]
root        11  0.0  0.0      0     0 ?        S    nov11   0:00 [watchdog/0]
root        12  0.0  0.0      0     0 ?        S    nov11   0:00 [watchdog/1]
root        13  0.0  0.0      0     0 ?        S    nov11   0:00 [migration/1]
root        14  0.0  0.0      0     0 ?        S    nov11   1:53 [ksoftirqd/1]
root        16  0.0  0.0      0     0 ?        S<   nov11   0:00 [kworker/1:0H]
root        17  0.0  0.0      0     0 ?        S    nov11   0:00 [watchdog/2]
root        18  0.0  0.0      0     0 ?        S    nov11   0:00 [migration/2]
root        19  0.0  0.0      0     0 ?        S    nov11   1:44 [ksoftirqd/2]
root        21  0.0  0.0      0     0 ?        S<   nov11   0:00 [kworker/2:0H]
root        22  0.0  0.0      0     0 ?        S    nov11   0:00 [watchdog/3]
root        23  0.0  0.0      0     0 ?        S    nov11   0:00 [migration/3]
root        24  0.0  0.0      0     0 ?        S    nov11   1:42 [ksoftirqd/3]
root        26  0.0  0.0      0     0 ?        S<   nov11   0:00 [kworker/3:0H]
root        27  0.0  0.0      0     0 ?        S<   nov11   0:00 [khelper]
root        28  0.0  0.0      0     0 ?        S    nov11   0:00 [kdevtmpfs]
root        29  0.0  0.0      0     0 ?        S<   nov11   0:00 [netns]
root        30  0.0  0.0      0     0 ?        S<   nov11   0:00 [writeback]
root        31  0.0  0.0      0     0 ?        S<   nov11   0:00 [bioset]
root        32  0.0  0.0      0     0 ?        S<   nov11   0:00 [kblockd]
root        37  0.0  0.0      0     0 ?        S    nov11   0:00 [khungtaskd]
root        38  0.0  0.0      0     0 ?        S    nov11   1:47 [kswapd0]
root        39  0.0  0.0      0     0 ?        SN   nov11   0:00 [ksmd]
root        40  0.0  0.0      0     0 ?        SN   nov11   0:01 [khugepaged]
root        41  0.0  0.0      0     0 ?        S    nov11   0:00 [fsnotify_mark]
root        42  0.0  0.0      0     0 ?        S<   nov11   0:00 [crypto]
root        46  0.0  0.0      0     0 ?        S<   nov11   0:00 [kthrotld]
root        47  0.0  0.0      0     0 ?        S<   nov11   0:00 [deferwq]
root        71  0.0  0.0      0     0 ?        S    nov11   0:00 [khubd]
root        81  0.0  0.0      0     0 ?        S<   nov11   0:00 [ata_sff]
root        82  0.0  0.0      0     0 ?        S    nov11   0:00 [scsi_eh_0]
root        83  0.0  0.0      0     0 ?        S    nov11   0:00 [scsi_eh_1]
root        86  0.0  0.0      0     0 ?        S    nov11   0:00 [scsi_eh_2]
root        87  0.0  0.0      0     0 ?        S    nov11   0:00 [scsi_eh_3]
root        95  0.0  0.0      0     0 ?        S<   nov11   0:02 [kworker/1:1H]
root        96  0.0  0.0      0     0 ?        S<   nov11   0:02 [kworker/0:1H]
root       105  0.0  0.0      0     0 ?        S<   nov11   0:00 [bioset]
root       113  0.0  0.0      0     0 ?        S    nov11   0:00 [btrfs-genwork-1]
root       114  0.0  0.0      0     0 ?        S    nov11   0:04 [btrfs-submit-1]
root       116  0.0  0.0      0     0 ?        S    nov11   0:00 [btrfs-fixup-1]
root       119  0.0  0.0      0     0 ?        S    nov11   0:00 [btrfs-rmw-1]
root       120  0.0  0.0      0     0 ?        S    nov11   0:00 [btrfs-endio-rai]
root       121  0.0  0.0      0     0 ?        S    nov11   0:00 [btrfs-endio-met]
root       123  0.0  0.0      0     0 ?        S    nov11   0:00 [btrfs-freespace]
root       125  0.0  0.0      0     0 ?        S    nov11   0:00 [btrfs-cache-1]
root       126  0.0  0.0      0     0 ?        S    nov11   0:00 [btrfs-readahead]
root       127  0.0  0.0      0     0 ?        S    nov11   0:00 [btrfs-flush_del]
root       128  0.0  0.0      0     0 ?        S    nov11   0:00 [btrfs-qgroup-re]
root       132  0.0  0.0      0     0 ?        S<   nov11   0:02 [kworker/3:1H]
root       133  0.0  0.0      0     0 ?        S<   nov11   0:02 [kworker/2:1H]
root       134  0.0  0.0      0     0 ?        S    nov11   0:59 [btrfs-cleaner]
root       135  0.0  0.0      0     0 ?        S    nov11   1:46 [btrfs-transacti]
root       251  0.0  0.0      0     0 ?        S<   nov11   0:00 [kpsmoused]
root       253  0.0  0.0      0     0 ?        S<   nov11   0:00 [hd-audio0]
root       257  0.0  0.0      0     0 ?        S<   nov11   0:00 [kvm-irqfd-clean]
root       273  0.0  0.0  86328   776 ?        Ss   nov11   0:00 login -- root     
root      1100  0.0  0.0      0     0 ?        S    nov11   0:00 [ecryptfs-kthrea]
root      1910  0.0  0.0      0     0 ?        S    09:13   0:00 [kworker/2:1]
root      3220  0.0  0.0      0     0 ?        S    09:14   0:00 [btrfs-endio-2]
root      3321  0.0  0.0      0     0 ?        S    09:16   0:00 [kworker/u8:0]
root      3750  0.0  0.0      0     0 ?        S    09:23   0:00 [btrfs-endio-met]
root      4051  0.0  0.0      0     0 ?        S    09:27   0:00 [btrfs-endio-met]
root      4053  0.0  0.0  12876  1356 tty1     R+   09:27   0:00 ps aux -A
root     10403  0.1  0.0      0     0 ?        S    nov12   2:36 [kworker/0:2]
root     10983  0.0  0.0      0     0 ?        S    nov12   0:00 [btrfs-delayed-m]
root     12547  0.1  0.0      0     0 ?        S    nov12   2:00 [kworker/3:2]
root     25179  0.0  0.0      0     0 ?        S    08:11   0:00 [btrfs-worker-2]
root     25247  0.1  0.0      0     0 ?        S    08:48   0:03 [kworker/1:2]
root     25257  0.0  0.0      0     0 ?        S    08:59   0:00 [kworker/1:0]
root     25262  0.0  0.0      0     0 ?        S    09:03   0:00 [btrfs-endio-wri]
root     25418  0.0  0.0      0     0 ?        S    09:05   0:00 [btrfs-delalloc-]
root     25420  0.0  0.0      0     0 ?        S    09:05   0:00 [kworker/u8:2]
root     25449  0.0  0.0      0     0 ?        S    09:07   0:00 [kworker/2:0]
root     25539  0.0  0.0      0     0 ?        S    09:09   0:00 [kworker/1:1]
root     25619  0.2  0.1  48344  6800 tty1     Ss   09:10   0:02 -zsh
root     27130  0.0  0.0      0     0 ?        S    09:10   0:00 [kworker/0:1]
root     27334  0.0  0.0      0     0 ?        S    09:11   0:00 [kworker/3:1]

As you can see, the only things running are zsh (almost 7 MB RSS!), login, init and, of course, the kernel, which amounts to less than 45 MB (init + zsh + slab, without caches and buffers), while free says I'm using 1.8 GB. And yes, I can hit real out-of-memory problems if I write 2 GB of zeroes into a tmpfs like /dev/shm and open 200 MB worth of programs, so this is not reclaimable memory. I think the problem is in the kernel, but I may be mistaken. I'm running linux 3.11.6-1 and systemd 208-2 (not sure, but as it is the only thing with a two-day uptime, it could be part of the problem). Also, I'm using btrfs for the system partition and eCryptfs for a directory inside my home (this is a computer I use at work; I'm not going to leave things unencrypted for someone to grab at night while I'm home! Nevertheless, it is unmounted during this test).
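
For anyone who wants to reproduce this cross-check, here is a rough sketch (all figures in kB) that sums process RSS plus kernel slab and compares it against "used minus caches" from /proc/meminfo. Summing RSS double-counts shared pages, so treat it as a sanity check, not exact accounting.

```shell
#!/bin/sh
# Rough accounting sanity check; all figures in kB.
# Summing RSS double-counts shared pages, so this is only approximate.
rss=$(ps -eo rss= | awk '{s += $1} END {print s}')
total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
freemem=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
buffers=$(awk '/^Buffers:/ {print $2}' /proc/meminfo)
cached=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
slab=$(awk '/^Slab:/ {print $2}' /proc/meminfo)
used=$((total - freemem - buffers - cached))
echo "used minus caches:    ${used} kB"
echo "accounted (RSS+slab): $((rss + slab)) kB"
echo "unaccounted:          $((used - rss - slab)) kB"
```

On a healthy system the unaccounted remainder should be small; here it is most of the 1.8 GB.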

I'm going to try with the long-term stable kernel version 3.10.18-1 to see if it is reproducible with that version.

Last edited by luismiguelgcg (2013-11-13 09:12:23)


#2 2013-11-15 12:16:41

luismiguelgcg
Member
Registered: 2013-02-07
Posts: 11

Re: Possible memory hole

I get the same results after two days running linux 3.10.18-1. Let's see what happens with 3.12.0-1.


#3 2013-11-25 15:10:46

luismiguelgcg
Member
Registered: 2013-02-07
Posts: 11

Re: Possible memory hole

And the same happens with 3.12.0-1 and 3.12.1-1, x86_64. I've tried memtest86+, but it says my RAM modules are error-free. Does anybody have a clue about what could be happening here?


#4 2013-11-25 18:01:03

Spider.007
Member
Registered: 2004-06-20
Posts: 1,175

Re: Possible memory hole

You might try dropping all caches, though that probably won't help. This interests me, but I can't really help you any further:

echo 3 > /proc/sys/vm/drop_caches
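
A slightly fuller version of that idea, syncing dirty pages first and showing MemFree before and after (the write needs root; the guard below just keeps it from erroring out as a normal user):

```shell
#!/bin/sh
# Drop page cache, dentries and inodes (echo 3) after flushing dirty pages.
# If the mystery memory really were reclaimable cache, MemFree should jump.
before=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
sync
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches
fi
after=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
echo "MemFree before: ${before} kB, after: ${after} kB"
```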


#5 2013-11-26 03:48:23

zanny
Member
From: Baltimore
Registered: 2012-10-05
Posts: 84
Website

Re: Possible memory hole

I have this bug as well: dropping caches doesn't do anything, and the memory is being reported as non-cached usage. It just doesn't show up under any userspace process in top.

I switched kernels as well, no dice. After a cold boot to a Plasma desktop the system reports 1.2G of memory usage out of 16G, but my last run hit OOM in 3 hours with 16G of RAM and a 16G swap.

The commonality between our setups appears to be btrfs disks, though that wouldn't explain why the issue persists across kernel versions.

Just between when I got on the forum to post about this and now (around 15 minutes later), my memory consumption is up to 2.7G, with the exact same reported usage in top.

I looked around today, found out stock Arch kernels don't support kmemleak. I'm looking for alternative ways to profile what is leaking.

While trying to isolate this bug, I also found out that both veromix and dbus are leaking, depending on circumstances: dbus at 1.37G and the Python engine running veromix at 1.1G, mainly caused by sink propagation generating a lot of garbage in veromix. No idea what is causing the dbus leak at all; dbus-monitor isn't producing anything tangible. This is great.

Last edited by zanny (2013-11-26 14:44:30)


#6 2013-11-26 16:00:07

luismiguelgcg
Member
Registered: 2013-02-07
Posts: 11

Re: Possible memory hole

We have another thing in common, as we are both using Plasma, but as far as I know Plasma can't hop out of its userspace garden and start munching memory like a kernel thread. So yes, this should be something related to btrfs.

I'm going to compile linux-3.12.1-1 with kmemleak support. I hope I'll find what's happening here.
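
For reference, once a kernel is built with CONFIG_DEBUG_KMEMLEAK=y, the interface lives in debugfs; a minimal sketch of driving it (the paths are the standard upstream ones, and it all needs root):

```shell
#!/bin/sh
# Trigger a kmemleak scan and dump suspected leaks. Requires a kernel built
# with CONFIG_DEBUG_KMEMLEAK=y and debugfs mounted; notes if unavailable.
KML=/sys/kernel/debug/kmemleak
if [ -w "$KML" ]; then
    echo scan > "$KML"     # force an immediate scan
    cat "$KML"             # one stack trace per unreferenced object
else
    echo "kmemleak not available on this kernel"
fi
```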

I'm not using veromix, but dbus is not leaking in my case. Could it be veromix doing something weird with pulseaudio, making it clog its dbus pipe?


#7 2013-11-26 23:13:54

zanny
Member
From: Baltimore
Registered: 2012-10-05
Posts: 84
Website

Re: Possible memory hole

The veromix / dbus leaks I mention are tangential to the kernel leak. Same thing with Plasma: I tried booting with kdm disabled, to just a raw tty, and still found memory consumption going through the roof.

Veromix / dbus are leaking independently of the kernel leak; I just happened to notice it. I'm running them under valgrind right now to see if whatever it is shows up again.
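
The valgrind wrapping I mean is the generic kind sketched below; `ls` is only a stand-in target, since the real veromix/dbus command lines depend on your setup.

```shell
#!/bin/sh
# Wrap a suspect process in valgrind to catch userspace leaks.
# 'ls' is a harmless stand-in; substitute the real daemon invocation.
if command -v valgrind >/dev/null 2>&1; then
    valgrind --leak-check=full --log-file=/tmp/vg.log ls >/dev/null 2>&1
    tail -n 5 /tmp/vg.log      # leak summary lands at the end of the log
else
    echo "valgrind not installed"
fi
```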

I'll hold off trying to build a manual kernel to try to debug this until I hear back if you get any results.

Some more unique circumstances I have that might cause a leak few others are experiencing:

  • I use two monitors on an hd4600

  • My chipset board has an abnormal number of usb hubs and ports (a total of 14 USB3 and 5 USB2)

Besides those, though, it's a normal work computer. I use Qt Creator and LibreOffice a lot, but again those are all userspace and shouldn't be calling any sparsely traversed kernel code paths that could generate big leaks. I'd imagine the leaking happens irrespective of the software I run, though I could try disabling some of the myriad daemons I have running.

Update: Since my system is effectively unusable with it running out of memory every hour or so, I've been toggling on / off daemons and features.

I disabled all the following services:

sshd
smbd
nmbd
metalog
mcelog
cups
proftpd
murmur
avahi
kdm

With them all off, I mucked around on a tty for a while without a desktop and had a sustained memory footprint of 200M, with polkit being the largest consumer at 20M. I don't think anything was leaking, but I didn't try summing all userspace memory usage to check.

With kdm enabled but my desktop reset (I trashed the plasmarc files in config), I'd get around 800M of memory footprint at boot; it ramps up a bit with akonadi running but caps out around 1200M. I mucked around in Firefox (600M) for a while but never broke 2G of usage.

However, I started re-enabling plasmoids and my autostart apps, and right now I'm booting *only* a moderately rebuilt Plasma desktop plus Skype, and I'm getting leaks again. I'm rebooting quickly again after clearing all my KDE caches.

Update 2: Disabled akonadi and nepomuk, so far so good. I'm also using kmix instead of veromix, due to those earlier bugs I mentioned. I'll try re-enabling features tomorrow, but since all this leakage is outside userspace allocations, it seems to be either in the kernel fs drivers, revealed while Nepomuk scans, or in ALSA when PulseAudio is acting funky with it.

Try turning off Akonadi by setting StartServer=false in ~/.config/akonadi/akonadiserverrc, and shutting off Nepomuk in the System Settings Desktop Search module, to see if it stops leaking.
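
The toggle I mean, sketched as shell; the [%General] section name is an assumption from my setup, so check your own akonadiserverrc before blindly appending:

```shell
#!/bin/sh
# Append StartServer=false to akonadiserverrc if not already present.
# The [%General] section name is assumed; verify against your existing
# file to avoid creating a duplicate section.
cfg="${XDG_CONFIG_HOME:-$HOME/.config}/akonadi/akonadiserverrc"
mkdir -p "$(dirname "$cfg")"
grep -q '^StartServer=false' "$cfg" 2>/dev/null || \
    printf '[%%General]\nStartServer=false\n' >> "$cfg"
grep StartServer "$cfg"
```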

Last edited by zanny (2013-11-27 04:13:37)


#8 2013-11-28 09:11:16

Spider.007
Member
Registered: 2004-06-20
Posts: 1,175

Re: Possible memory hole

btrfs runs a lot of background threads, which also show up in the process list; I wonder if it's possible to get memory usage statistics from those. I also wonder whether this is reproducible in a VM with btrfs and a few thousand files on it.
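
A sketch of that reproduction idea, using a loopback btrfs volume instead of a full VM (needs root and btrfs-progs; the size and file count are arbitrary):

```shell
#!/bin/sh
# Create a small loopback btrfs volume, fill it with a few thousand files,
# and snapshot the btrfs slab caches to watch btrfs_inode / dentry growth.
if [ "$(id -u)" -eq 0 ] && command -v mkfs.btrfs >/dev/null 2>&1; then
    img=/tmp/btrfs-test.img
    mnt=/tmp/btrfs-test
    truncate -s 512M "$img"
    mkfs.btrfs -q "$img"
    mkdir -p "$mnt"
    mount -o loop "$img" "$mnt"
    i=0
    while [ "$i" -lt 5000 ]; do
        echo test > "$mnt/file$i"
        i=$((i + 1))
    done
    grep btrfs /proc/slabinfo | head
    umount "$mnt"
    rm -f "$img"
else
    echo "skipping: needs root and btrfs-progs"
fi
```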


#9 2013-12-03 13:00:59

luismiguelgcg
Member
Registered: 2013-02-07
Posts: 11

Re: Possible memory hole

Ok, after three or four days of running, I've got some results from kmemleak. Sadly, they are not interesting: 59 MB of unreferenced objects, all of them allocated on behalf of userspace, plus 48 bytes unreferenced by swapper/0, everything showing up after killing all but systemd and a tty session with zsh. It seems my kmemleak entries are false positives. After all this, I have the following /proc/meminfo contents:

MemTotal:        4017312 kB
MemFree:         2656652 kB
Buffers:              24 kB
Cached:            32572 kB
SwapCached:          360 kB
Active:            12328 kB
Inactive:          26612 kB
Active(anon):       5904 kB
Inactive(anon):    13620 kB
Active(file):       6424 kB
Inactive(file):    12992 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       3941376 kB
SwapFree:        3938300 kB
Dirty:                60 kB
Writeback:             0 kB
AnonPages:          6048 kB
Mapped:             4180 kB
Shmem:             13180 kB
Slab:             328380 kB
SReclaimable:      89772 kB
SUnreclaim:       238608 kB
KernelStack:         768 kB
PageTables:          276 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     5950032 kB
Committed_AS:      42172 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      567640 kB
VmallocChunk:   34359146568 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       13952 kB
DirectMap2M:     4145152 kB

The kernel slab is quite large due to kmemleak:

 Active / Total Objects (% used)    : 463523 / 1073631 (43,2%)
 Active / Total Slabs (% used)      : 39610 / 39610 (100,0%)
 Active / Total Caches (% used)     : 76 / 100 (76,0%)
 Active / Total Size (% used)       : 132585,62K / 318069,88K (41,7%)
 Minimum / Average / Maximum Object : 0,01K / 0,30K / 15,75K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
708162 204346  28%    0,30K  27237       26    217896K kmemleak_object        
 36063  23976  66%    1,04K   1401       30     44832K btrfs_inode            
 90678  72450  79%    0,19K   4318       21     17272K dentry                 
 20636   2746  13%    0,55K    737       28     11792K radix_tree_node        
 10087   9965  98%    0,57K    361       28      5776K inode_cache            
 18044   7703  42%    0,30K    694       26      5552K btrfs_delayed_node     
 32500  18405  56%    0,15K   1250       26      5000K btrfs_extent_map       
   536    476  88%    4,00K     67        8      2144K kmalloc-4096           
 17892  17892 100%    0,11K    497       36      1988K sysfs_dir_cache        
  2175   1234  56%    0,63K     87       25      1392K proc_inode_cache       
 22080   9420  42%    0,06K    345       64      1380K kmalloc-64             
 10368   6241  60%    0,12K    324       32      1296K kmalloc-128            
  1680   1269  75%    0,66K     70       24      1120K shmem_inode_cache      
   462    354  76%    2,17K     33       14      1056K task_struct            
 16640  14799  88%    0,06K    260       64      1040K btrfs_free_space       
   448    391  87%    2,00K     28       16       896K kmalloc-2048        

This time, around 1 GB of memory is doing something weird.
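
Incidentally, for logging this over time instead of watching the interactive view, slabtop can be run one-shot and sorted by cache size (reading /proc/slabinfo generally needs root):

```shell
#!/bin/sh
# One-shot slab snapshot, sorted by cache size, suitable for periodic logging.
if command -v slabtop >/dev/null 2>&1 && [ -r /proc/slabinfo ]; then
    slabtop -o -s c | head -n 20
else
    echo "/proc/slabinfo not readable or slabtop missing (try as root)"
fi
```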


#10 2013-12-03 20:59:10

zanny
Member
From: Baltimore
Registered: 2012-10-05
Posts: 84
Website

Re: Possible memory hole

Did you try disabling the semantic desktop daemons?


#11 2013-12-04 03:05:38

Pse
Member
Registered: 2008-03-15
Posts: 413

Re: Possible memory hole

Hmm, quite odd. You should probably ask in the kernel mailing list. Someone will probably point you in the right direction (or at least help you find the culprit).


#12 2013-12-04 08:19:35

luismiguelgcg
Member
Registered: 2013-02-07
Posts: 11

Re: Possible memory hole

zanny wrote:

Did you try disabling the semantic desktop daemons?

Nepomuk was running until I closed almost everything (dbus, sshd, cupsd, ntpd, my X session, even systemd-journald). After that, with only a couple of programs running in userspace, I still found those numbers.

Pse wrote:

You should probably ask in the kernel mailing list.

Yup, it's time to ask there. My problem is less evident than zanny's: he runs out of memory much faster than I do. Maybe it is something related to using btrfs with Plasma. Or maybe it is hardware-related, if very few people are seeing it.


#13 2013-12-18 10:26:42

luismiguelgcg
Member
Registered: 2013-02-07
Posts: 11

Re: Possible memory hole

More news about this. I was also using eCryptfs for some directories and files (~/.local, ~/.config, ~/Maildir, ...). I've stopped using eCryptfs, and the whole problem has vanished. Maybe it is a problem related to btrfs + eCryptfs + some part of my userland programs...

Last edited by luismiguelgcg (2013-12-18 10:36:02)


#14 2013-12-18 12:47:46

jakobcreutzfeldt
Member
Registered: 2011-05-12
Posts: 1,041

Re: Possible memory hole


#15 2013-12-18 13:02:45

luismiguelgcg
Member
Registered: 2013-02-07
Posts: 11

Re: Possible memory hole

jakobcreutzfeldt wrote:

No, it wasn't that. I know that many newcomers are confused by GNU/Linux and its memory management, but I am quite a veteran with these systems!
I was talking about a lot of physical memory being used by something that was neither cache, buffers, kernel slab, nor resident memory of userland programs. That was the problem: it was not listed as used by a program, by the kernel, as cache, as hugepages... but it was used by something, and it was generating OOMs! I thought it could be a kernel memory leak, but I couldn't find anything interesting with kmemleak.
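
To make that concrete, this is the kind of arithmetic I was doing: add up every major consumer /proc/meminfo names and see what remains unexplained. On a healthy box the remainder is small; here it was hundreds of MB. (The categories overlap slightly, e.g. Shmem is counted inside Cached, so the result is approximate.)

```shell
#!/bin/sh
# Sum the consumers /proc/meminfo accounts for and compute the remainder
# no category explains. All figures in kB; approximate due to overlaps.
get() { awk -v k="$1:" '$1 == k {print $2}' /proc/meminfo; }
total=$(get MemTotal)
explained=$(( $(get MemFree) + $(get Buffers) + $(get Cached) \
            + $(get AnonPages) + $(get Slab) + $(get PageTables) \
            + $(get KernelStack) ))
echo "MemTotal:    ${total} kB"
echo "explained:   ${explained} kB"
echo "unexplained: $((total - explained)) kB"
```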

Last edited by luismiguelgcg (2013-12-18 13:06:19)

