[BUG FILED] Available memory gradually decreases

Salkay · 2021-06-10 08:27:52

According to free, my "available" memory gradually disappears over the course of a few days, eventually triggering the OOM killer. There is very little memory "used". My understanding is that "buffers" and "cache" should be made available when necessary, but this doesn't seem to occur. Even after quitting all applications, the "available" memory is quite low.

$ free -wm
               total        used        free      shared     buffers       cache   available
Mem:           31989        6148        1596       16693        1213       23031        8606
Swap:              0           0           0

Here, free + buffers + cache = 25840, much more than the available memory of 8606.
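The same arithmetic can be done straight from free's output. This is only a minimal sketch, assuming the wide layout of "free -w" shown above (free, buffers, cache and available in fields 4, 6, 7 and 8); cache_shortfall is a name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads `free -wm` output on stdin and prints
# free + buffers + cache - available, in whatever unit free was asked for.
# Assumes the wide (-w) layout:
#   Mem: total used free shared buffers cache available
cache_shortfall() {
    awk '/^Mem:/ { print $4 + $6 + $7 - $8 }'
}
```

Run as "free -wm | cache_shortfall". For the numbers above it prints 17234, i.e. roughly 17G of cache that is not counted as available.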

I attempted to fill the cache by writing/reading a large file, to see if somehow it was stuck in general.

$ free -wm
               total        used        free      shared     buffers       cache   available
Mem:           31989        6346         204       16938         546       24891        8249
Swap:              0           0           0

"cache" increased by almost 2G, but there was not much difference in "available" (as expected).

I then attempted to free up the caches.

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ free -wm
               total        used        free      shared     buffers       cache   available
Mem:           31989        6111        7853       16867           2       18021        8250
Swap:              0           0           0
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
$ free -wm
               total        used        free      shared     buffers       cache   available
Mem:           31989        6134        7864       16946          20       17970        8157
Swap:              0           0           0

As expected, this also had no effect on "available". I expected "cache" to be almost zero, but it was still ~18G. However, now "free" was pretty similar to "available", so I wonder if there is some large amount of memory in "cache" that is inaccessible, and therefore not contributing to "available", nor able to be dropped.

Last edited by Salkay (2021-08-03 22:48:58)

Ropid · 2021-06-10 09:01:47

Maybe it's files inside a tmpfs filesystem? Check what's going on there with "df". This shows only the tmpfs entries:

df -h -t tmpfs

I just tried creating a file here in /tmp with "fallocate -l 10G testfile" and those 10G show up in the "cache" column in free's output.

Last edited by Ropid (2021-06-10 09:04:38)

Salkay · 2021-06-10 09:20:49

Thanks @Ropid. Good idea. It looks like there is ~0.75G there, which goes some way to explaining the shortfall, but it looks like there's still another ~16.5 G missing.

$ df -h -t tmpfs
Filesystem      Size  Used Avail Use% Mounted on
run              16G  1.7M   16G   1% /run
tmpfs            16G   45M   16G   1% /dev/shm
tmpfs            16G  699M   15G   5% /tmp
tmpfs           3.2G  176K  3.2G   1% /run/user/1000
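
To get a single total instead of adding up the rounded "Used" values by eye, something like the following might help. It is a rough sketch; tmpfs_used_kb is a hypothetical name, and it assumes one filesystem per output line (which is what "df -P" is meant to guarantee).

```shell
#!/bin/sh
# Hypothetical helper: reads `df -Pk -t tmpfs` output on stdin and prints
# the sum of the "Used" column (field 3) in kB, skipping the header line.
tmpfs_used_kb() {
    awk 'NR > 1 { used += $3 } END { print used + 0 }'
}
```

Run as "df -Pk -t tmpfs | tmpfs_used_kb" and compare the result with Shmem in /proc/meminfo, which is also reported in kB.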

sabroad · 2021-06-10 10:58:40

Salkay wrote:

looks like there's still another ~16.5 G missing.

Looks about the same as "shared" memory:

$ free -wm
               total        used        free      shared     buffers       cache   available
Mem:           31989        6148        1596       16693        1213       23031        8606
Swap:              0           0           0

What does

# cat /proc/meminfo

have to say about this?

sabroad · 2021-06-10 11:03:18

Salkay wrote:

looks like there's still another ~16.5 G missing.

$ df -h -t tmpfs
Filesystem      Size  Used Avail Use% Mounted on
run              16G  1.7M   16G   1% /run
tmpfs            16G   45M   16G   1% /dev/shm
tmpfs            16G  699M   15G   5% /tmp
tmpfs           3.2G  176K  3.2G   1% /run/user/1000

Also, there are other filesystems backed by tmpfs consuming shmem such as udev.

seth · 2021-06-10 12:53:47

man free wrote:

shared Memory used (mostly) by tmpfs (Shmem in /proc/meminfo)

In addition, if there are files mapped by processes that have been changed (or removed) on disk, they cannot be dropped.
I'm not sure, but I think file caches that are simply active can't be dropped either.

Salkay · 2021-06-11 02:34:37

sabroad wrote:

Looks about the same as "shared" memory:
What does
# cat /proc/meminfo
have to say about this?
Also, there are other filesystems backed by tmpfs consuming shmem such as udev.

Thanks sabroad. The sizes have changed a bit overnight, so here are the new numbers. I'm not exactly sure what to look at here, but by my calculation, the shortfall is approximately free + buffers + cache - available, i.e. 440+3480+21457-6127 = 19250. Shmem indeed looks close at 18869 MB (18868708 kB). What does this mean, and how can I fix it?

$ free -wm
               total        used        free      shared     buffers       cache   available
Mem:           31989        6610         440       18406        3480       21457        6127
Swap:              0           0           0
$ cat /proc/meminfo
MemTotal:       32757108 kB
MemFree:          402584 kB
MemAvailable:    6168884 kB
Buffers:         3589912 kB
Cached:         20130860 kB
SwapCached:            0 kB
Active:          4171764 kB
Inactive:        7950616 kB
Active(anon):      14196 kB
Inactive(anon):  7257676 kB
Active(file):    4157568 kB
Inactive(file):   692940 kB
Unevictable:    17943260 kB
Mlocked:            2008 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:             46852 kB
Writeback:            56 kB
AnonPages:       6344956 kB
Mapped:           528088 kB
Shmem:          18868708 kB
KReclaimable:    1779544 kB
Slab:            2009948 kB
SReclaimable:    1779544 kB
SUnreclaim:       230404 kB
KernelStack:       29232 kB
PageTables:        85416 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    16378552 kB
Committed_AS:   38599792 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       50136 kB
VmallocChunk:          0 kB
Percpu:             8544 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     1650280 kB
DirectMap2M:    31780864 kB
DirectMap1G:           0 kB

As per the link, I also looked at /dev, but df didn't report its size, nor did du.

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
dev              16G     0   16G   0% /dev
run              16G  1.7M   16G   1% /run
/dev/nvme0n1p2   49G   38G  8.2G  83% /
tmpfs            16G   70M   16G   1% /dev/shm
tmpfs            16G  699M   15G   5% /tmp
/dev/nvme0n1p3  185G  142G   34G  81% /home
/dev/nvme0n1p1  356M  202M  155M  57% /boot
/dev/sdb1       1.8T  1.8T   60G  97% /externalHDD
/dev/sda6       1.7T  1.2T  443G  73% /HDD
tmpfs           3.2G  204K  3.2G   1% /run/user/1000
encfs           185G  142G   34G  81% /home/salkay/.decrypt
$ sudo du -hs /dev
0	/dev

seth wrote:

man free wrote:
shared Memory used (mostly) by tmpfs (Shmem in /proc/meminfo)
In addition, if there are files mapped by processes that have been changed (or removed) on disk, they cannot be dropped.
I'm not sure, but I think file caches that are simply active can't be dropped either.

Thanks seth. Is there any way I can test this?

Ropid · 2021-06-11 05:13:03

For that "shmem" entry from /proc/meminfo, I tried finding out how to research what's going on there. I found a tool "ipcs" and a location "/dev/shm".

You can do "ipcs -m --human" to get an overview of those shmem objects and their size. Then you can do "ipcs -p" to get the process IDs that are involved with those.

About those deleted files being kept open by programs, you can find those programs with "lsof +L1" and "lsof -dDEL". The two lsof commands find different stuff.
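
To rank the "lsof +L1" results by size rather than scanning them by eye, a filter along these lines could be used. This is only a sketch under assumptions: it relies on lsof's default column layout (SIZE/OFF in field 7, NAME starting at field 10), so rows where a column is blank will be misread or skipped, and largest_deleted is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsof +L1` output on stdin and prints
# "SIZE COMMAND PID NAME" for the largest entries, biggest first.
# Only rows whose 7th field is a plain number are considered, so offsets
# such as 0t0 and rows with shifted columns are ignored.
# Only the first word of NAME is printed.
largest_deleted() {
    awk 'NR > 1 && $7 ~ /^[0-9]+$/ { print $7, $1, $2, $10 }' |
        sort -rn | head -n "${1:-10}"
}
```

Run as "lsof +L1 | largest_deleted 5" to see the five biggest entries; sizes are in bytes.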

Last edited by Ropid (2021-06-11 05:19:35)

Salkay · 2021-06-11 05:44:47

Thanks Ropid. Unfortunately I was forced to restart because the oom killer was going crazy. Previously the shortfall in memory was over 19G, but even after a restart it's still reasonably large at 20824+3+4206-21825 = 3208 M. I also checked again, and /proc/meminfo reports Shmem at ~3G (3074312 kB), so it's still consistent at least.

Ropid wrote:

You can do "ipcs -m --human" to get an overview of those shmem objects and their size. Then you can do "ipcs -p" to get the process IDs that are involved with those.

Unfortunately this didn't seem to reveal any processes with a large memory footprint.

$ ipcs -m --human
------ Shared Memory Segments --------
key        shmid      owner      perms      size       nattch     status      
0x00000000 32771      salkay     600          128K     2          dest         
0x00000000 32772      salkay     600          128K     2          dest         
0x00000000 32773      salkay     600          1.2M     2          dest         
0x00000000 32774      salkay     600          1.2M     2          dest         
0x00000000 65547      salkay     600           48K     2          dest         
0x00000000 65548      salkay     600           48K     2          dest         
0x00000000 32787      salkay     600          512K     2          dest         
0x00000000 32788      salkay     600            4M     2          dest         
0x51210046 21         salkay     600            1K     1                       
0x00000000 65560      salkay     600           24K     2          dest         
0x00000000 65561      salkay     600           24K     2          dest         
0x00000000 65564      salkay     600          840K     2          dest         
0x00000000 29         salkay     600          384K     2          dest         
0x00000000 65566      salkay     600          840K     2          dest         
0x00000000 32         salkay     600          384K     2          dest         
0x00000000 33         salkay     600          512K     2          dest         
0x00000000 36         salkay     600          512K     2          dest         
0x00000000 32805      salkay     600          512K     2          dest

The summary didn't seem helpful either.

$ ipcs -u
------ Messages Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes

------ Shared Memory Status --------
segments allocated 18
pages allocated 2839
pages resident  1402
pages swapped   0
Swap performance: 0 attempts	 0 successes

------ Semaphore Status --------
used arrays = 1
allocated semaphores = 1

Ropid wrote:

About those deleted files being kept open by programs, you can find those programs with "lsof +L1"

Am I just looking for the rows with a large SIZE/OFF? The largest only had 67 M.

$ lsof +L1
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NLINK    NODE NAME
...
pulseaudi   874  sal    6u   REG    0,1 67108864     0    3100 /memfd:pulseaudio (deleted)
...

Ropid wrote:

and "lsof -dDEL".

Here the column SIZE/OFF was empty for each row, so I wasn't sure what to look for. I tried adding the size flag -s but the column remained empty.

seth · 2021-06-11 06:02:22

Active(anon):      14196 kB
Inactive(anon):  7257676 kB

There's about 7GB in anon pages (allocated by programs), mostly inactive

Active(file):    4157568 kB
Inactive(file):   692940 kB

~4.5 GB in file caches (mostly active)

Unevictable:    17943260 kB
…
Shmem:          18868708 kB

These here are the bad numbers

Mlocked:            2008 kB

Suggests it's NOT some userspace process mlock'ing them and then just forgetting to unlock them.
The more likely cause is a "leaking" kernel module (it's technically not a leak, just dumb)

=> Which kernel do you use and does this happen w/ the lts kernel as well?

Salkay · 2021-06-11 06:06:31

Thanks @seth. Very interesting.

I use the vanilla linux kernel, but I will restart and try the lts.

sabroad · 2021-06-11 16:06:24

Salkay wrote:

Unevictable:    17943260 kB

Does a KDE session restart (logout + login) free the Unevictable memory?

Salkay · 2021-06-12 00:51:53

Thanks @sabroad. That looks interesting too! These problems are actually on my work computer, so I'll report back in a few days after the weekend (long weekend here in Australia). BTW I haven't experienced similar levels of OOM killing on my home computer, but in both there is a similar shortfall in free + buffers + cache - available, with both systems currently ~2GB. Is that an "expected" amount? What is reasonable?

EDIT: Regarding the "expected" shortfall, I had a look on my (non-leaking) home computer. "shared" was ~2GB and df -BM | grep tmpfs showed /dev/shm with 1GB. ls -l did not, possibly suggesting deleted but open files. lsof /dev/shm revealed these deleted files were open in Signal and Steam. Quitting both programs freed up this directory, and the shortfall was now "only" 1GB.

Last edited by Salkay (2021-06-12 01:21:11)

seth · 2021-06-12 06:19:35

The numbers you want to look at here are mlock and unevictable - if they're similar (unevictable maybe slightly bigger) you're "good"
shmem/shared is typically your combined tmpfs usage (so /tmp and /run/user/* next to /dev/shm)
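
As a rough way to apply that rule of thumb, the two values can be compared automatically. This is a sketch only: unevictable_check is a hypothetical name, and the 64 MiB default slack is an arbitrary assumption for "slightly bigger", not a limit defined anywhere by the kernel.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/meminfo-style text on stdin and prints
# the gap Unevictable - Mlocked in kB, followed by a verdict.
# The slack (default 65536 kB) is an assumed threshold; pass another
# value in kB as the first argument to change it.
unevictable_check() {
    awk -v slack_kb="${1:-65536}" '
        $1 == "Unevictable:" { unevictable = $2 }
        $1 == "Mlocked:"     { mlocked = $2 }
        END {
            gap = unevictable - mlocked
            printf "%d kB %s\n", gap, (gap <= slack_kb ? "ok" : "suspicious")
        }'
}
```

Run as "unevictable_check < /proc/meminfo". With the values posted earlier in the thread (Unevictable 17943260 kB, Mlocked 2008 kB) it reports a gap of 17941252 kB as suspicious.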

Wild Penguin · 2021-06-13 20:28:36

Hi Salkay,

The behavior you've described sounds very similar (in its timespan) to what I encountered in a thread I started; however, the problems just went away after a kernel upgrade in my case. I had the leak on both the -zen and the regular kernel. You might want to take a look at the kmemleak documentation and try recompiling the kernel with it enabled. Wish I had known the tips in this thread (about lsof and ipcs!).

Just a few questions: what GPU are you using? Do you have any potential heavily GPU-using (such as folding@home) or other heavy (server-like) software running in the background besides the GUI?

In my case, my working theory is that the leak resided in amdgpu, possibly exacerbated by running F@H. However that is just a working theory and the problems went away before I could allocate time to investigate this. But in any case my leak is probably unrelated, this is just my 2 cents in general tips!

EDIT: As another workaround in addition to using the -lts branch, you might like to take a look at earlyOOM or nohang. I prefer the latter, having tried them both. Nohang has saved my butt a few times while working with the leaky kernel (after >20GiB had been consumed by "something", before processes started to be killed). The in-kernel OOM killer is very "stupid", especially from a desktop-oriented user's point of view.

Last edited by Wild Penguin (2021-06-14 14:52:59)

Salkay · 2021-06-15 01:26:04

Thanks again all.

seth wrote:

does this happen w/ the lts kernel as well?

Yes, it also occurs with the lts kernel.

sabroad wrote:

Does a KDE session restart (logout + login) free the Unevictable memory?

Yes! It does. I tracked Unevictable memory over a few days, and it went from 1,899,436 kB to 4,657,444 kB. I then quit all my programs and it went down, but it was still 3,178,788 kB (does that mean some was evicted??). I logged out then back into Plasma, and it was only 568,620 kB.
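
For tracking these values over several days, a one-line sampler like the following could be run periodically. It is a minimal sketch; meminfo_sample and the log path in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the Unevictable, Mlocked and Shmem values
# (in kB) from a meminfo-style file on a single line.
# Reads /proc/meminfo unless another file is given as the first argument.
meminfo_sample() {
    awk '
        $1 == "Unevictable:" || $1 == "Mlocked:" || $1 == "Shmem:" {
            sub(":", "", $1)              # drop the trailing colon from the key
            printf "%s%s=%s", sep, $1, $2
            sep = " "
        }
        END { print "" }' "${1:-/proc/meminfo}"
}
```

Called from cron or a loop, for example as: printf '%s %s\n' "$(date +%FT%T)" "$(meminfo_sample)" >> "$HOME/meminfo.log", it gives one timestamped line per sample that is easy to compare before and after a session restart.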

seth wrote:

The numbers you want to look at here are mlock and unevictable - if they're similar (unevictable maybe slightly bigger) you're "good"

I forgot to check Mlocked when the system was really bad, but even early on, Unevictable and Mlocked were quite different.

$ grep -E '^(Unevictable|Mlocked|Shmem):' /proc/meminfo
Unevictable:     1637828 kB
Mlocked:            1824 kB
Shmem:           1793220 kB

Wild Penguin wrote:

You might want to take a look at kmemleak documentation and try recompiling the kernel with it enabled.

Thanks; I'll check it out.

Wild Penguin wrote:

Just a few questions: what GPU are you using? Do you have any potential heavily GPU-using (such as folding@home) or other heavy (server-like) software running in the background besides the GUI?

Just using Intel GPU. Nothing heavy going on. I actually do have an Nvidia card, but I don't use it because it only has one physical output.

Wild Penguin wrote:

As another workaround in addition to using the -lts branch, you might like to take a look at

Thank you. I already actually use earlyoom. When memory is low it does help, but it randomly crashes browser tabs, which gives me time to save my work and restart, but not much else.

seth · 2021-06-15 05:23:25

Do you run on the xf86-video-intel or the modesetting driver and does the other exhibit this behavior?

Wild Penguin · 2021-06-15 08:00:57

Salkay wrote:

Yes! It does. I tracked Unevictable memory over a few days, and it went from 1,899,436 kB to 4,657,444 kB. I then quit all my programs and it went down, but it was still 3,178,788 kB (does that mean some was evicted??). I logged out then back into Plasma, and it was only 568,620 kB.

This means that the leak is in user space, not kernel space. In my case, there was no way to free the leaked memory (even after stopping the whole X.org GUI stack, F@H and tvheadend, up to 20 GiB of RAM was still in use), so my tips about kmemleak will probably not help you.

Also, looking at the numbers in your first post, I realised the "Used" column does not actually go up (oops, sorry, I didn't look at them properly before). In my case it did; for some reason I just never posted the output of "free" in that thread (mostly because I thought it doesn't give useful information in leak situations, but this thread proves otherwise).

But I will not derail this anymore - I believe the leak I've experienced was of a different kind.

Wild Penguin · 2021-06-16 08:09:08

Salkay wrote:

Wild Penguin wrote:
As another workaround in addition to using the -lts branch, you might like to take a look at
Thank you. I already actually use earlyoom. When memory is low it does help, but it randomly crashes browser tabs, which gives me time to save my work and restart, but not much else.

The main and major benefit of nohang is that it has configurable warnings. This is why I chose to use it over earlyoom.

Indeed, with earlyoom (or any OOM killer) the problem is that the user is not notified if he/she is absent-mindedly working on something. Chances are, by the time the user notices the work of the OOM killer, it has already destroyed some important data by killing the application in question (or something it depends on, such as the whole X.org server). The configuration options of earlyoom make this less likely, but do not eliminate it. With alerts, the user can take manual action before any processes get killed!

Its configuration is way more complicated, though, but the documentation makes it quite clear and the default configuration works for many desktop users. IIRC I didn't change anything - only looked at the documentation to determine what it is actually doing. You don't have to :-) .

Last edited by Wild Penguin (2021-06-16 08:11:03)

Salkay · 2021-06-19 09:50:07

seth wrote:

Do you run on the xf86-video-intel or the modesetting driver and does the other exhibit this behavior?

Sorry for the delay, I've not been using this system for a while. I wanted to test with more usage before reporting back. I did use modesetting, but I've tried xf86-video-intel now, and I see the same problem. Not quite as bad (with less usage?), but still:

$ grep -E '^(Unevictable|Mlocked|Shmem):' /proc/meminfo
Unevictable:     2545848 kB
Mlocked:            1948 kB
Shmem:           2810720 kB

And indeed, as per the linked bug report, restarting kwin resulted in Unevictable dropping to 1314316 kB.

I'm going to try some older kernels and see how they go. I can see LTS 4.19 and 5.4 in the AUR, so I'll try them.

Wild Penguin wrote:

The main and major benefit of nohang is that it has configurable warnings.

Thanks for all that advice. Regarding warnings, I actually already have a script that runs every 5 minutes to warn me of this!

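#!/usr/bin/env sh

# Warn when available memory drops below this percentage of total.
threshold=20

# In free's default (non-wide) output, field 2 is "total" and field 7 is "available".
avail_percent=$(free | awk '/^Mem:/ {printf "%.0f", ($7 / $2 * 100)}')
if [ "$avail_percent" -lt "$threshold" ]; then
  notify-send -i emblem-warning 'Memory warning' "${avail_percent}% memory available"
  printf '%s\n%s\n' 'Memory warning' "${avail_percent}% memory available"
fi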

Salkay · 2021-06-23 02:00:33

I've tested some older kernels. 5.4.127 exhibits the same bug, but 4.19.195 is totally fine! I've been running it for a couple of days now, and it looks solid!

$ grep -E '^(Unevictable|Mlocked|Shmem):' /proc/meminfo
Unevictable:        2032 kB
Mlocked:            2032 kB
Shmem:           3558272 kB

And

$ free -wm
               total        used        free      shared     buffers       cache   available
Mem:           32047        7329         294        3482        6573       17848       20785
Swap:              0           0           0

i.e. free+buffers+cache-available = 3930

Arch Linux
