Can someone explain what exactly is being read/written from/to disk in this situation?
I have 2 GB of RAM and no swap partitions. Occasionally I'll forget how inefficient gwenview is at displaying very large images and accidentally double-click one. The entire system freezes; even alt+sysrq keystrokes are ineffective (and yes I do have them enabled).
For about 5 minutes, the system is locked up and the hard drive light is flickering. That scares me a bit, because with no swap, what could it possibly be doing for 5 straight minutes? I used to think it was syncing before doing the OOM-killing, but there's no way a sync could take that long. Judging from the sound of the hard drive, it's hda (the drive / and all the other system partitions are on).
A few times after recovering from this I've run extensive data verification and never found any evidence of corruption, but I'd like to know for sure that the kernel isn't randomly deciding to use some filesystem as swap space.
In the meantime, I'm playing with disabling overcommit -- setting vm.overcommit_memory = 2 in /etc/sysctl.conf. That enforces a hard memory commit limit of swap size + overcommit_ratio * RAM size (so I've read) -- and I've also read that the default overcommit_ratio is only 50%. What the bloody hell? It's almost like someone thinks swap is more important than RAM -- hell-llo, I have 2 GB of RAM so that I can get *away* from swap!
Anyway, I've set the ratio to 97% and so far things seem happy -- if I deliberately run out of memory, the process that did it always gets killed instantly and the system doesn't freeze up on OOM anymore.
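To put numbers on that commit limit, here is a minimal Python sketch of the arithmetic described above (swap size plus overcommit_ratio percent of RAM). The function name commit_limit_kb is made up for illustration, and the sketch assumes no huge pages are reserved, which would otherwise be subtracted from the RAM figure.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio):
    """Commit limit under vm.overcommit_memory = 2.

    Assumes no huge pages are reserved; overcommit_ratio is a percentage.
    """
    return swap_kb + ram_kb * overcommit_ratio // 100


ram_kb = 2 * 1024 * 1024  # 2 GB of RAM, no swap, as in the post

print(commit_limit_kb(ram_kb, 0, 50))  # default ratio -> 1048576 kB (1 GB)
print(commit_limit_kb(ram_kb, 0, 97))  # ratio from the post -> 2034237 kB
```

With no swap and the default ratio of 50, only half of the 2 GB can ever be committed, which is why raising the ratio matters on a swapless box.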
Another thing -- in all my out of memory situations so far, VMWare has been running. I suppose it's possible that VMWare is the one doing the swappage; I'll have to investigate that further.
OK, I have ruled out VMWare as the culprit, but more googling has yielded a possible answer:
If the amount of "cached" memory is very low, it could mean
that your shared libraries are being pushed out of memory,
instead of the kernel swapping out some page that belongs to
only one process.
That might be happening to me, and then a whole bunch of processes all need to go to disk every time they want a bit of code out of one of the shared libraries. Seem likely?
I can say that I had the exact same issue when I was working with no swap...
Every time, after loading several huge pages with lots of content + Flash, I would end up with the HD busy and the system unresponsive...
After adding swap this never happened again, but also, when loading these huge pages, swap isn't being used (the first thing I checked). But it fixes it, so whatever.
edit: html pages in firefox I mean
Last edited by daf666 (2008-05-08 08:12:05)
I think I've finally figured this out. It's a kernel bug -- I'm guessing that under normal circumstances, the "cached" column in the free command "doesn't count" towards how much memory the system thinks it's using. After all, it's just cached copies of stuff that should be elsewhere, and if you run out of memory, you can safely dump that, right? Unfortunately, /dev/shm is counted under cached rather than used memory (as I discovered in an earlier post).
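A rough way to check this on a live system is to subtract the shared-memory figure from "cached". The Python sketch below assumes the kernel reports a Shmem line in /proc/meminfo, which newer kernels do and an older one may not (the code returns None in that case). The helper names and the sample numbers are made up for illustration, and the result is only an upper bound on what can really be dropped.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[name.strip()] = int(fields[0])
    return info


def droppable_cache_kb(info):
    """Cached minus Shmem, or None if the kernel does not report Shmem."""
    if "Shmem" not in info:
        return None
    return info["Cached"] - info["Shmem"]


# Made-up sample: about 500 MB sitting in /dev/shm, counted as "cached"
sample = """MemTotal:  2097152 kB
Cached:     614400 kB
Shmem:      512000 kB
"""

print(droppable_cache_kb(parse_meminfo(sample)))  # 102400
```

To run it against the real machine, pass open("/proc/meminfo").read() to parse_meminfo instead of the sample text.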
So if I've got 500 MB of stuff in /dev/shm* (which is where I mount my /tmp), there's now 500 MB of stuff in the "cached" column that really does count -- the system reaches all RAM full, decides it needs to dump cache, and suddenly finds that the 500 MB it thought it could use isn't usable. For some reason it takes about 5 minutes of hard drive thrashing (probably because it's already chucked all of the system libraries, etc. out of cache and needs to re-read them from disk every time) before something finally figures out that it really is out of memory, that the 500 MB isn't letting go, and invokes the OOM-killer.
*: VMWare does this; it creates a 512MB file (the amount of RAM in my virtual machine) then hides it by keeping the file open and deleting it, so the inode's still there, but you can't see it and it makes the df command really perplexing... but that's another story.
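The open-then-delete trick is easy to reproduce. This Python sketch is an illustration of the general POSIX behaviour, not of what VMWare actually runs: the helper make_hidden_file is a made-up name, and the example uses a throwaway temp directory and a 1 MB file rather than /dev/shm and 512 MB.

```python
import os
import tempfile


def make_hidden_file(directory, size):
    """Create a file of `size` bytes, unlink it while open, return (handle, old path)."""
    fd, path = tempfile.mkstemp(dir=directory)
    f = os.fdopen(fd, "w+b")
    f.write(b"\0" * size)
    f.flush()
    os.unlink(path)  # the name is gone, but the data lives until f is closed
    return f, path


with tempfile.TemporaryDirectory() as d:
    f, path = make_hidden_file(d, 1024 * 1024)
    print(os.path.exists(path))          # False: ls can no longer see it
    print(os.fstat(f.fileno()).st_size)  # 1048576: the space is still in use
    f.close()                            # only now is the space released
```

Until the handle is closed the space stays allocated, which is why df reports usage that no directory listing can account for.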
I haven't had a chance to try this with a newer kernel (maybe they've fixed it now?); I'm still running 2.6.23-ARCH here. (pacman -Syu upgrades are a major production for me because I have lots of RAID arrays and things, and an nvidia graphics card, and I use gnucash which sometimes needs manual recompiling, and so on...)