After doing a lot of compiling I frequently find myself in the following scenario: exactly half of my RAM is being used up by apparently nothing.
I know this question has been asked to death so many times that entire websites exist to dismiss it. But after going through all the usual explanations, none of them seem to apply here.
So here's an example of my situation: out of 15 GiB RAM (no swap), 7 GiB is "Used", according to free -m:
               total        used        free      shared  buff/cache   available
Mem:           15334        7000        7464          79         868        7978
Swap:              0           0           0
This 7 GiB is genuinely non-reclaimable. If I run a compile job that needs more than the remaining ~7 GiB of RAM, it gets OOM-killed (or the system locks up). So the memory is definitely in use.
The memory is not being used by any caches. If the above didn't convince you of this already, running echo 3 > /proc/sys/vm/drop_caches has no effect. In fact, the table above is after running that command.
It's not being used by any process, according to top and friends. Indeed, if I log out and in again, it's still there - the table basically doesn't change.
My /tmp directory is clean, and no other tmpfs is taking up a significant amount of space either, as can be checked with df -h.
It doesn't appear to be used by the kernel either, as all the caches in slabtop report usage orders of magnitude smaller than 7 GiB.
However, there is one way to reclaim this memory without rebooting. If I relogin, and then run echo 3 > /proc/sys/vm/drop_caches, then the 7 GiB of Used memory drops completely to zero. But nothing else works:
Drop caches --> no change.
Relogin --> no change.
Drop caches + Relogin --> no change.
Relogin + Drop caches --> DROPS THE MEMORY.
What kind of memory usage could possibly lead to this behaviour? I can't reconcile this with anything:
Because the memory couldn't be reclaimed under memory pressure, and a relogin was required before it could be dropped, this suggests that a user process was responsible for holding on to the memory.
But no process in System Monitor, top etc was using anything close to the required amount of memory!
After a relogin, the memory should have been released by the user process. So why did it remain as "Used", when surely it should have become cached, and therefore gone into "buff/cache" instead?
What tools can I use to investigate this problem when it comes up again, and what possible explanations are there?
Offline
What does df -h give? On my system, roughly half of my RAM is available to tmpfs, if I understand the output correctly (a bit less than 4G out of a bit under 8G total).
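If you only want the tmpfs numbers, df can be restricted by filesystem type:

df -h -t tmpfs -t devtmpfs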
Offline
cat /proc/meminfo
And see whether "echo 1 > /proc/sys/vm/drop_caches" (page cache) or "echo 2 > /proc/sys/vm/drop_caches" (slab) frees it.
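For a before/after comparison, something along these lines should do (as root; sync first so dirty pages don't mask the result):

sync
free -h                               # baseline
echo 1 > /proc/sys/vm/drop_caches     # page cache only
free -h
echo 2 > /proc/sys/vm/drop_caches     # reclaimable slab (dentries, inodes)
free -h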
Offline
I'll run those commands as soon as it happens again.
In the meantime, I wonder if the cause is that some process has created and opened a large file in /tmp, then deleted it while holding the file descriptor open? That sounds like it could explain the observations - but I need to test.
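Something like this should reproduce it if that's the mechanism (a sketch; the 2 GiB scratch file and the fd number are just placeholders):

dd if=/dev/zero of=/tmp/bigfile bs=1M count=2048   # fill 2 GiB of tmpfs
exec 9</tmp/bigfile                                # keep an fd open in this shell
rm /tmp/bigfile                                    # file is gone from /tmp, its pages are not
free -h                                            # where do the 2 GiB show up?
exec 9<&-                                          # close the fd
free -h                                            # ...and are they released now?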
Offline
FWIW, a large file placed in tmpfs will reduce the "free" space reported by free by the file size, but it only slightly increases the "used" column. I don't know why this is the case, but it seems to be - which, given your symptoms, argues against the open file descriptor hypothesis (which otherwise sounds quite plausible).
"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman
Offline
Unlinked but still-open files will show up in /proc/$PID/fd:
cd /proc
# list any processes holding open file descriptors that point into /tmp
for PID in [0-9]*; do if sudo ls -l "$PID"/fd 2>/dev/null | grep '/tmp'; then echo "$PID"; fi; done
Chromium does that for sure.
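lsof can do the same thing more directly - +L1 restricts the listing to open files whose link count is zero, i.e. deleted files kept alive only by an open descriptor:

sudo lsof -nP +L1
sudo lsof -nP +L1 | grep /tmp    # just the ones that lived in /tmp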
Offline
It happened again, though I didn't have time to wait for it to get really bad, so this time it was only a persistent 3.3 GiB of "Used" memory in free -h. That dropped to 1.7 GiB after a relogin + echo 2 > /proc/sys/vm/drop_caches, and no further after echo 1 > /proc/sys/vm/drop_caches. So it's the slab cache after all, which I thought I'd ruled out.
I grabbed the output of some commands just before the first drop_caches:
free -h
               total        used        free      shared  buff/cache   available
Mem:            14Gi       3.3Gi       9.8Gi        24Mi       1.8Gi        11Gi
Swap:             0B          0B          0B
df -h
Filesystem             Size  Used Avail Use% Mounted on
devtmpfs               4.0M     0  4.0M   0% /dev
tmpfs                  7.5G     0  7.5G   0% /dev/shm
tmpfs                  3.0G  9.6M  3.0G   1% /run
/dev/mapper/cryptmain  239G   87G  151G  37% /
tmpfs                  7.5G     0  7.5G   0% /tmp
/dev/mapper/cryptmain  239G   87G  151G  37% /swap
/dev/mapper/cryptmain  239G   87G  151G  37% /home
/dev/nvme0n1p1         256M  132M  125M  52% /boot/efi
tmpfs                  1.5G   84K  1.5G   1% /run/user/1000
cat /proc/meminfo
MemTotal: 15702696 kB
MemFree: 10308948 kB
MemAvailable: 11845908 kB
Buffers: 16 kB
Cached: 1738356 kB
SwapCached: 0 kB
Active: 2032404 kB
Inactive: 527916 kB
Active(anon): 750280 kB
Inactive(anon): 96884 kB
Active(file): 1282124 kB
Inactive(file): 431032 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 27184 kB
Writeback: 0 kB
AnonPages: 806120 kB
Mapped: 511420 kB
Shmem: 25212 kB
KReclaimable: 147484 kB
Slab: 312460 kB
SReclaimable: 147484 kB
SUnreclaim: 164976 kB
KernelStack: 9056 kB
PageTables: 16580 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 7851348 kB
Committed_AS: 3276608 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 46148 kB
VmallocChunk: 0 kB
Percpu: 20672 kB
HardwareCorrupted: 0 kB
AnonHugePages: 198656 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 11625960 kB
DirectMap2M: 4464640 kB
DirectMap1G: 1048576 kB
Afterwards, free -h reports
               total        used        free      shared  buff/cache   available
Mem:            14Gi       1.7Gi        11Gi        58Mi       2.0Gi        12Gi
Swap:             0B          0B          0B
So what could cause 1.6 GiB of slab objects to hang around in the kernel, but be killed by a logout?
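Next time I'll try to catch which slab caches actually hold the memory before logging out - something like this should show it (a sketch):

sudo slabtop -o -s c | head -n 20     # one-shot listing, sorted by cache size
# or rough per-cache totals (objects x object size) straight from /proc/slabinfo:
sudo awk 'NR > 2 { printf "%10.1f MiB  %s\n", $3 * $4 / 1048576, $1 }' /proc/slabinfo | sort -rn | head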
Offline
/dev/mapper/cryptmain  239G   87G  151G  37% /
/dev/mapper/cryptmain  239G   87G  151G  37% /swap
/dev/mapper/cryptmain  239G   87G  151G  37% /home
You close any of those when you log out?
Offline
FWIW, a large file placed in tmpfs will reduce the "free" space reported by free by the file size, but it only slightly increases the "used" column. I don't know why this is the case, but it seems to be - which, given your symptoms, argues against the open file descriptor hypothesis (which otherwise sounds quite plausible).
Thanks. (This makes me curious about what uses memory on my system.)
Offline
/dev/mapper/cryptmain  239G   87G  151G  37% /
/dev/mapper/cryptmain  239G   87G  151G  37% /swap
/dev/mapper/cryptmain  239G   87G  151G  37% /home
You close any of those when you log out?
No, but I could try closing /home.
Offline
The idea was rather that closing the encrypted volume, not the logout itself, was the relevant step - but that's apparently not the case.
Alternatively (but that's gonna be a chore) you could kill session processes one by one, drop the caches after each one, and see which process makes the difference.
Offline
Closing the encrypted volume might actually be impossible, since it holds the system root.
Alternatively (but that's gonna be a chore) you could kill session processes one by one, drop the caches after each one, and see which process makes the difference.
Yeah, I started doing that, but quickly got bored after killing all the obvious suspects and seeing nothing happen. I can always fall back on it as a last resort, though I was hoping for a way to interrogate this information out of the kernel.
Last edited by PBS (2022-11-09 08:17:17)
Offline
lsmod | sort -k 2 -rn
?
Offline