Recently I noticed an abnormal amount of RAM being used on my Arch machine: the box has 64GB of RAM, and 61GB of it is consumed. This has happened multiple times, always after a couple of days of uptime, with no RAM-intensive applications running. While trying to figure out where my RAM was going, I ran into the "Linux ate my RAM" page. The buff/cache on this machine is significantly less than what's being used, though, so I don't think that's the reason for the high memory usage. Is this a kernel memory leak? Would I need to compile a new kernel to debug this?
This is my kernel version:
4.16.7-1-ARCH #1 SMP PREEMPT Wed May 2 21:12:36 UTC 2018 x86_64 GNU/Linux
Top below shows that I'm using 61GB of 64GB. I've sorted by memory usage, and the total memory consumption of the applications I'm running gets nowhere near 95%. The buff/cache is 1GB, and the available memory is 2GB...
Output of Top:
top - 07:15:28 up 2 days, 12:00, 0 users, load average: 0.20, 0.61, 0.76
Tasks: 284 total, 1 running, 280 sleeping, 2 stopped, 1 zombie
%Cpu(s): 0.2 us, 0.1 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 64333.2 total, 1344.4 free, 61932.7 used, 1056.0 buff/cache
MiB Swap: 32676.0 total, 32676.0 free, 0.0 used. 2279.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
894 thor 20 0 1418376 853896 18648 S 0.0 1.3 79:20.52 xfdesktop
1095 thor 20 0 2418612 727480 89704 S 0.0 1.1 2:00.04 Web Content
666 root 20 0 1651552 715948 101448 S 2.0 1.1 55:43.37 Xorg
1029 thor 20 0 2697284 658716 172180 S 0.0 1.0 9:24.63 firefox
6328 thor 20 0 3320972 380496 131388 S 0.0 0.6 6:11.75 python
6404 thor 20 0 2009496 348972 141784 S 0.0 0.5 1:18.34 Web Content
2940 thor 20 0 1977672 307556 88520 S 0.0 0.5 1:32.44 Web Content
1377 thor 20 0 1994868 303436 89176 S 0.0 0.5 1:19.97 Web Content
909 thor 20 0 1185424 218460 42284 S 0.0 0.3 0:50.85 polkit-gnome-au
902 thor 20 0 1204800 217672 41316 S 0.0 0.3 0:51.64 xfce4-power-man
912 thor 20 0 1203876 217420 41996 S 0.0 0.3 0:53.84 xfce4-power-man
5153 thor 20 0 730696 175844 29264 S 2.0 0.3 0:58.82 xfce4-terminal
375 root 20 0 191876 72232 71488 S 0.0 0.1 0:06.05 systemd-journal
24633 thor 30 10 844336 57604 46944 T 0.0 0.1 0:04.34 starwars
5352 thor 30 10 844336 57564 46912 T 0.0 0.1 0:02.02 starwars
890 thor 20 0 368488 51232 23600 S 0.0 0.1 6:00.87 xfce4-panel
848 thor 20 0 225512 42684 18456 S 0.0 0.1 3:02.87 xfwm4
892 thor 20 0 330472 37464 13712 S 0.0 0.1 1:30.74 xfsettingsd
905 thor 20 0 212612 36016 16020 S 0.0 0.1 0:31.23 panel-2-actions
772 thor 20 0 364604 34920 14052 S 0.0 0.1 1:36.50 xfce4-session
900 thor 20 0 209088 33304 13748 S 0.0 0.1 0:31.31 panel-6-systray
883 thor 20 0 213704 32776 13324 S 0.0 0.0 0:30.55 Thunar
6340 thor 20 0 99788 30100 7188 S 0.0 0.0 0:00.64 python2.7
775 polkitd 20 0 2915456 20076 14836 S 0.0 0.0 0:00.42 polkitd
6341 thor 20 0 58988 17884 5860 S 0.0 0.0 0:00.32 python3
5351 thor 20 0 146892 12192 5572 S 0.0 0.0 0:02.80 python2.7
1 root 20 0 234472 8736 6724 S 0.0 0.0 0:01.37 systemd
954 root 20 0 310044 8572 7120 S 0.0 0.0 0:00.12 upowerd
cat /proc/meminfo
Output of /proc/meminfo
MemTotal: 65877232 kB
MemFree: 1344968 kB
MemAvailable: 2584672 kB
Buffers: 116524 kB
Cached: 845748 kB
SwapCached: 0 kB
Active: 3904156 kB
Inactive: 1518224 kB
Active(anon): 3574300 kB
Inactive(anon): 62036 kB
Active(file): 329856 kB
Inactive(file): 1456188 kB
Unevictable: 32 kB
Mlocked: 32 kB
SwapTotal: 33460220 kB
SwapFree: 33460220 kB
Dirty: 68 kB
Writeback: 0 kB
AnonPages: 4460204 kB
Mapped: 388212 kB
Shmem: 64660 kB
Slab: 174612 kB
SReclaimable: 95832 kB
SUnreclaim: 78780 kB
KernelStack: 11168 kB
PageTables: 28660 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 66398836 kB
Committed_AS: 7747508 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 622932 kB
DirectMap2M: 12914688 kB
DirectMap1G: 55574528 kB
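Summing the major consumers in /proc/meminfo makes the gap explicit. This is a rough sketch (field names taken from the output above): it adds up anonymous pages, page cache, slab, kernel stacks, and page tables, and compares that against MemTotal minus MemFree:

```shell
# Rough accounting of where memory went, from /proc/meminfo.
# A large "unaccounted" number (used minus the usual consumers)
# points at a kernel allocation that no counter tracks, e.g. a
# leaking driver.
awk '
  { v[$1] = $2 }                       # store "Field:" -> value in kB
  END {
    used = v["MemTotal:"] - v["MemFree:"]
    accounted = v["AnonPages:"] + v["Buffers:"] + v["Cached:"] \
              + v["Slab:"] + v["KernelStack:"] + v["PageTables:"]
    printf "used:        %.1f GiB\n", used / 1048576
    printf "accounted:   %.1f GiB\n", accounted / 1048576
    printf "unaccounted: %.1f GiB\n", (used - accounted) / 1048576
  }' /proc/meminfo
```

For the numbers above this works out to roughly 61.5 GiB used but only about 5.4 GiB accounted for, leaving some 56 GiB that none of the usual counters explain.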
Yup, the kernel has no explanation for that huge gap.
How fast does this happen (does it take days or weeks to build up or do you get there within minutes/hours after the boot)?
Also, please post your "lsmod" output so we can check for suspicious modules.
It usually takes a couple of days. In the post above you can see that top reports the uptime as 2 days 12hrs. Yesterday it was consuming significantly less, so I'd say between 1 and 2 days.
lsmod output:
Module Size Used by
fuse 118784 3
8021q 36864 0
mrp 20480 1 8021q
nct6775 65536 0
hwmon_vid 16384 1 nct6775
amdkfd 176128 1
amd_iommu_v2 20480 1 amdkfd
b43 454656 0
snd_hda_codec_realtek 106496 1
snd_hda_codec_hdmi 57344 1
amdgpu 3158016 18
snd_hda_codec_generic 86016 1 snd_hda_codec_realtek
mac80211 905216 1 b43
intel_rapl 24576 0
x86_pkg_temp_thermal 16384 0
intel_powerclamp 16384 0
coretemp 16384 0
chash 16384 1 amdgpu
kvm_intel 176128 0
gpu_sched 28672 1 amdgpu
cfg80211 741376 2 b43,mac80211
ttm 122880 1 amdgpu
kvm 708608 1 kvm_intel
drm_kms_helper 200704 1 amdgpu
irqbypass 16384 1 kvm
ssb 86016 1 b43
crct10dif_pclmul 16384 0
crc32_pclmul 16384 0
drm 466944 15 amdgpu,gpu_sched,ttm,drm_kms_helper
snd_hda_intel 45056 0
ghash_clmulni_intel 16384 0
pcbc 16384 0
mmc_core 172032 2 b43,ssb
snd_hda_codec 151552 4 snd_hda_intel,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
mousedev 24576 0
btusb 53248 0
nls_iso8859_1 16384 1
hid_generic 16384 0
btrtl 16384 1 btusb
nls_cp437 20480 1
pcmcia 69632 1 ssb
btbcm 16384 1 btusb
aesni_intel 188416 0
snd_hda_core 94208 5 snd_hda_intel,snd_hda_codec,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
agpgart 49152 2 ttm,drm
pcmcia_core 28672 1 pcmcia
syscopyarea 16384 1 drm_kms_helper
eeepc_wmi 16384 0
rng_core 16384 1 b43
aes_x86_64 20480 1 aesni_intel
snd_hwdep 16384 1 snd_hda_codec
btintel 24576 1 btusb
usbhid 57344 0
asus_wmi 32768 1 eeepc_wmi
sysfillrect 16384 1 drm_kms_helper
vfat 24576 1
iTCO_wdt 16384 0
crypto_simd 16384 1 aesni_intel
glue_helper 16384 1 aesni_intel
sysimgblt 16384 1 drm_kms_helper
bluetooth 638976 5 btrtl,btintel,btbcm,btusb
fat 81920 1 vfat
iTCO_vendor_support 16384 1 iTCO_wdt
wmi_bmof 16384 0
e1000e 282624 0
sparse_keymap 16384 1 asus_wmi
intel_wmi_thunderbolt 16384 0
mxm_wmi 16384 0
igb 241664 0
fb_sys_fops 16384 1 drm_kms_helper
snd_pcm 135168 4 snd_hda_intel,snd_hda_codec,snd_hda_core,snd_hda_codec_hdmi
bcma 61440 1 b43
hid 139264 2 hid_generic,usbhid
cryptd 28672 3 crypto_simd,ghash_clmulni_intel,aesni_intel
raid1 45056 1
snd_timer 36864 1 snd_pcm
i2c_algo_bit 16384 2 igb,amdgpu
ecdh_generic 24576 1 bluetooth
dca 16384 1 igb
intel_cstate 16384 0
rfkill 28672 4 asus_wmi,bluetooth,cfg80211
mei_me 45056 0
snd 98304 8 snd_hda_intel,snd_hwdep,snd_hda_codec,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek,snd_pcm
intel_uncore 131072 0
input_leds 16384 0
ptp 20480 2 igb,e1000e
md_mod 167936 2 raid1
intel_rapl_perf 16384 0
soundcore 16384 1 snd
led_class 16384 3 asus_wmi,b43,input_leds
pcspkr 16384 0
mei 102400 1 mei_me
i2c_i801 32768 0
shpchp 40960 0
pps_core 20480 1 ptp
lpc_ich 28672 0
rtc_cmos 24576 1
wmi 28672 4 asus_wmi,wmi_bmof,intel_wmi_thunderbolt,mxm_wmi
evdev 20480 8
mac_hid 16384 0
vboxnetflt 32768 0
vboxnetadp 28672 0
vboxpci 28672 0
vboxdrv 487424 3 vboxnetadp,vboxnetflt,vboxpci
ip_tables 28672 0
x_tables 45056 1 ip_tables
ext4 716800 2
crc32c_generic 16384 0
crc16 16384 2 bluetooth,ext4
mbcache 16384 1 ext4
jbd2 122880 1 ext4
fscrypto 32768 1 ext4
sr_mod 28672 0
cdrom 69632 1 sr_mod
sd_mod 61440 8
serio_raw 16384 0
ahci 40960 5
atkbd 32768 0
xhci_pci 16384 0
libps2 16384 1 atkbd
libahci 40960 1 ahci
ehci_pci 16384 0
xhci_hcd 258048 1 xhci_pci
ehci_hcd 94208 1 ehci_pci
crc32c_intel 24576 2
libata 278528 2 ahci,libahci
usbcore 286720 6 usbhid,ehci_hcd,xhci_pci,btusb,xhci_hcd,ehci_pci
scsi_mod 258048 3 sd_mod,libata,sr_mod
usb_common 16384 1 usbcore
i8042 32768 0
serio 28672 4 serio_raw,atkbd,i8042
I'm seeing the same on Arch ARM (aarch64) for a month or so now; that machine is running the 4.16.x series of kernels. I do not see it manifesting on my Arch boxes though, just the ODROID-C2. I haven't been able to track down the cause of it.
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
virtualbox?
I do have virtualbox installed, but I don't have it open or running anything.
If you're not using it anyway, I'd suggest trying to unload (and not auto-load) the vbox kernel modules.
And of course it would be interesting to hear whether graysky has vboxdrv loaded as well. (Actually, any overlapping kernel modules - there should not be that many with an ODROID.)
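Unloading and blacklisting the vbox modules can be done along these lines (a sketch, assuming the module names shown in the lsmod output above; if a modules-load.d entry exists for them, it would need to be removed or masked as well, since blacklisting only stops automatic alias-based loading):

```shell
# Unload the VirtualBox modules; listing the dependents together with
# vboxdrv lets modprobe resolve the unload order.
sudo modprobe -r vboxnetadp vboxnetflt vboxpci vboxdrv

# Prevent them from being auto-loaded on the next boot.
printf '%s\n' 'blacklist vboxnetadp' 'blacklist vboxnetflt' \
              'blacklist vboxpci'   'blacklist vboxdrv' \
  | sudo tee /etc/modprobe.d/vbox-blacklist.conf
```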
...And of course it would be interesting to hear whether graysky has vboxdrv loaded as well.
I do not. The Arch ARM box I have just runs pihole and openvpn... very minimal. Typically, it's only using 150-175 MB of RAM but ever since aarch64 went to 4.16.x, it has been "leaking" memory very similar to what the OP reported.
I do have a couple of virtual machines, and I do want to use them (one of the reasons I have so much RAM), but since I'm not doing anything with them right now I've disabled the VirtualBox kernel modules from loading and will see if that has any effect. Given that graysky doesn't have the VirtualBox kernel modules running, though, this may not be the issue.
@OP - You ever get to the bottom of the leak? I did not!
I have not gotten to the bottom of the leak. It has been ~7 days since I rebooted. My system has not consumed all of my RAM in that time, but it is still consuming more than it should: 18GB of 64GB are currently in use. 2GB of that 18GB is buff/cache, which leaves 16GB/64GB = 25% used by something else. Top shows that no processes are using anything near 25%. Perhaps unloading the VirtualBox modules helped a little; this time the memory is disappearing much more slowly.
top - 16:16:43 up 7 days, 1:00, 0 users, load average: 0.26, 0.56, 0.71
Tasks: 274 total, 2 running, 272 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.0 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 64333.2 total, 44032.7 free, 18360.9 used, 1939.7 buff/cache
MiB Swap: 32676.0 total, 32676.0 free, 0.0 used. 45873.8 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15280 user 20 0 2504856 465484 147504 S 0.0 0.7 5:12.79 firefox
15347 user 20 0 2172924 353396 87924 S 0.0 0.5 0:13.44 Web Conte+
15644 user 20 0 2126068 339892 90812 S 0.7 0.5 45:05.65 Web Conte+
675 root 20 0 1146768 208084 72068 R 2.3 0.3 19:18.32 Xorg
15412 user 20 0 1805916 142972 112452 S 0.0 0.2 0:03.24 Web Conte+
15764 user 20 0 1775328 113948 81916 S 0.0 0.2 0:01.33 Web Conte+
15240 user 20 0 616128 56616 18156 S 0.0 0.1 0:47.16 xfdesktop
15584 user 20 0 597272 41912 29736 S 1.3 0.1 0:02.74 xfce4-ter+
374 root 20 0 120408 28524 27572 S 0.0 0.0 0:09.18 systemd-j+
15235 user 20 0 297720 25784 19700 S 0.0 0.0 0:05.74 xfce4-pan+
15232 user 20 0 203364 23640 17572 S 0.0 0.0 0:05.01 xfwm4
15257 user 20 0 334292 21456 15652 S 0.0 0.0 0:00.51 polkit-gn+
I can't offer much help since I can't even track down what process is consuming the memory on my machine, but I will say that for me the bug occurs on aarch64, which runs the 4.16.x kernel (confirmed on the ODROID-C2 and a Raspberry Pi 3 running aarch64). When I switch to armv7h on the Raspberry Pi 3, which runs the 4.14.x kernels, I do not see the leak. Can you try booting the linux-lts kernel package on your x86_64 box? It too runs the 4.14.x series.
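For reference, switching to the LTS kernel on Arch is just a package install plus a boot-menu update. A sketch; the bootloader step assumes GRUB, so adjust for whatever you boot with:

```shell
# Install the LTS (4.14.x) kernel alongside the default one.
sudo pacman -S linux-lts

# Regenerate the boot menu so the new kernel entry shows up (GRUB shown).
sudo grub-mkconfig -o /boot/grub/grub.cfg

# After rebooting into the LTS entry, verify which kernel is running.
uname -r
```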
Sure. I've installed the linux-lts kernel package, booted the kernel, and will post an update in a couple of days once I've allowed for enough time for the RAM usage to go back up.
I did a few tests using an older x86_64 box (Intel E5200), a few aarch64 boxes (ODROID-C2 and Raspberry Pi 3), and a few boxes running armv7h (RPi2 and RPi3).
tl;dr summary: I see memory "leaks" on the x86_64 and aarch64 boxes, but not on the armv7h boxes.
Details:
For x86_64, the test was to boot under 4.16.11 and 4.14.41 and log memory usage (`free --mega | grep Mem | awk '{ print $3 }'`) once per hour via a cronjob. Used memory went up under each kernel with the box just sitting idle (mysqld, kodi, vncserver/lxqt, and openvpn all running but idle, with no user interaction except for openvpn, which my family uses):
Under 4.16.11, used memory over a 24 h period increased in a linear fashion at a rate of approx 6 MB/hour.
Under 4.14.41, used memory over a 24 h period increased in a linear fashion at a rate of approx 5 MB/hour.
For the aarch64 ODROID-C2 under kernel 4.16.11, sitting idle with just systemd networking, sshd, and ufw running, memory increased over 24 h at a very low rate of 0.3 MB/hour.
Running pihole and openvpn in lxcs in addition to the base services, memory increased linearly over a 24 h period at a rate of approx 10.5 MB/hour.
Note that if I let the ODROID-C2 run without a reboot, the used memory will eventually grow until it consumes all of the free RAM.
I also tried running Arch ARM aarch64 on a Raspberry Pi 3, measured under kernel 4.16.10.
Running pihole and openvpn in lxcs in addition to the base services, memory increased linearly over a 36 h period at a rate of approx 9.9 MB/hour.
If I let it run without a reboot, it too will grow until there is no free RAM.
In contrast, running armv7h on a Raspberry Pi 2 or 3 (similar results), does not show the memory leak:
I haven't been running with a log file (doing that now), but one example is an RPi3 box running 4.14.39. All it does is run pihole, nginx, and php in an lxc. It currently has 13 days of uptime, and `free --mega` shows 103 MB used. When I booted this machine the used memory was 82 MB, so over 13 days (312 hours) the rate of memory growth was roughly 0.07 MB/hour... over 100 times less than either of the aarch64 boxes.
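The hourly logging described above can be reproduced with a small script run from cron. A sketch; the script and log paths are arbitrary:

```shell
#!/bin/sh
# memlog.sh -- append one timestamped used-memory sample (in MB).
# Intended to run hourly from cron:
#   0 * * * * /usr/local/bin/memlog.sh /var/log/memused.log
log="${1:-/var/log/memused.log}"
printf '%s %s\n' "$(date '+%F %T')" \
  "$(free --mega | grep Mem | awk '{ print $3 }')" >> "$log"
```

Plotting or diffing successive samples then gives the MB/hour growth rates quoted above.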
Last edited by graysky (2018-05-24 16:03:57)
I think the culprit could be systemd-journald. Right now I am seeing the used memory (free --mega) track linearly with the resident set size (ps --sort -rss -eo pid,pmem,rss,vsz,comm) of systemd-journald. I need to collect more data and will post back.
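A quick way to test that correlation is to log both numbers side by side and watch whether they move together. A sketch, assuming the daemon's process name as seen by ps is systemd-journald:

```shell
# Sample "used" memory and systemd-journald's RSS together once a minute;
# if the two columns grow in lockstep, journald is the prime suspect.
while sleep 60; do
  used=$(free --mega | awk '/^Mem:/ { print $3 }')
  jrss=$(ps -C systemd-journald -o rss= | awk '{ s += $1 } END { print int(s / 1024) }')
  printf '%s used=%sMB journald_rss=%sMB\n' "$(date '+%F %T')" "$used" "$jrss"
done
```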
Last edited by graysky (2018-05-29 20:48:42)
It's been about 7 days again for me and I'm not seeing a memory leak with the 4.14 kernel, "4.14.41-1-lts". I started running virtualbox VMs again as can be seen in the output of top. The total memory consumption is about 7GB, and it all seems to be explained by running processes and buff/cache usage.
top - 07:25:56 up 7 days, 6:28, 0 users, load average: 0.16, 0.13, 0.04
Tasks: 256 total, 1 running, 255 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.1 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 64334.1 total, 53802.2 free, 7275.5 used, 3256.4 buff/cache
MiB Swap: 32676.0 total, 32676.0 free, 0.0 used. 56592.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16109 thor 20 0 7042960 931628 860584 S 1.0 1.4 173:17.67 VirtualBox
1016 thor 20 0 2580068 530552 182868 S 0.0 0.8 38:34.30 firefox
14960 thor 20 0 4254468 401640 345400 S 1.3 0.6 162:26.29 VirtualBox
1205 thor 20 0 2002060 304268 122648 S 0.0 0.5 7:53.79 Web Conte+
1083 thor 20 0 2032824 270008 91736 S 0.3 0.4 49:36.45 Web Conte+
15024 thor 20 0 2090264 230364 130616 S 0.0 0.3 4:30.90 Web Conte+
1270 thor 20 0 1986348 212660 97684 S 0.0 0.3 104:06.72 Web Conte+
628 root 20 0 1053892 135104 87452 S 2.0 0.2 76:11.17 Xorg
14910 thor 20 0 2170444 121568 90236 S 0.0 0.2 25:58.00 VirtualBox
781 thor 20 0 1019712 55552 40916 S 0.0 0.1 0:00.25 polkit-gn+
775 thor 20 0 1039064 55056 42108 S 0.0 0.1 0:01.65 xfce4-pow+
784 thor 20 0 1038936 54196 41296 S 0.0 0.1 0:00.24 xfce4-pow+
967 thor 20 0 597168 44044 30444 S 1.0 0.1 0:12.18 xfce4-ter+
14938 thor 20 0 968376 38528 17504 S 0.7 0.1 59:30.19 VBoxSVC
345 root 20 0 128404 32452 31704 S 0.0 0.0 0:00.90 systemd-j+
Are you sure that the memory leak happened with the 4.14 kernel as well on an x86_64?
Yes, I'm 100% sure it happened with both kernels on x86_64.