Hey there,
I recently noticed a very strange behavior regarding folder sizes. I have to handle huge amounts of files from time to time. These files are stored in directories in my home directory. At some point I delete a large number of those files, but after deleting them the folder size stays the same, even though there are no files left in the folder.
So I decided to set up a little experiment.
$ mkdir test{1,2}
$ du -hs test{1,2}
4,0K test1
4,0K test2
$ for i in {1..5000}; do touch test1/$i; done
$ du -hs test{1,2}
76K test1
4,0K test2
$ rm test1/*
$ ls -a test{1,2}
test1:
. ..
test2:
. ..
$ du -hs test{1,2}
76K test1
4,0K test2
dirk ~ $
First I created two empty directories (verified using du), then I ran a loop with 5000 iterations to create 5000 empty files in the first directory. Now the directory holds information with an overall size of 76K, which is correct, since the directory contains 5000 files. Then I cleared the whole directory using rm; now it is empty again (verified by ls, which only shows . and .. in both directories). But running du again shows that test1 (which held the 5000 files before they were deleted) still has a size of 76K, although it is just as empty as test2.
Both directories (test1 and test2) contain the same number of files (none, except . and ..), so why is the size of test1 76K while test2 is only 4K?
I know why test1 was 76K while it contained the 5000 files. But why is it still 76K after deleting them, and why does the size not get adjusted after the files are gone?
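For reference, the sizes can also be read directly with stat instead of du (a quick Python check; exact byte counts depend on the filesystem, but for directories st_size usually matches what du reports):
import os
# st_size of a directory is the size of the directory file itself,
# which is what du is measuring for the otherwise empty test1
for d in ('test1', 'test2'):
    print('%s: %s bytes' % (d, os.stat(d).st_size))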
I’m looking forward to your explanations.
Thanks in advance!
Kind regards,
Dirk
Offline
And before someone asks: this was on ext4.
Offline
I don't think directories free the space used by their content indexes when they shrink. Normally any directory that once has been very large is likely to be large again, so it just keeps the space.
Offline
This happened to me a while ago. I had forgotten to install gamin to keep my folders updated. I was using Nautilus, though.
Offline
> I don't think directories free the space used by their content indexes when they shrink. Normally any directory that once has been very large is likely to be large again, so it just keeps the space.
If I recall correctly, this is caused by the way the directory contents are managed.
Remember, as with everything in Unix, directories are files. Basically a directory holds a list of inode numbers pointing to the files it contains. If you remove a file, its entry is simply set to 0 but not removed from the list. If you put another file into the directory, conceptually its inode number is appended to this list without altering the previous contents.
Frequently changed directories thus tend to grow enormously.
On the other hand, if you remove all files from a directory, it retains a list of zeroes. If you want a really empty directory, you have to rmdir it entirely and create a new one.
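This is easy to reproduce without du as well. A minimal Python sketch of the effect (the numbers in the comments are what I would expect on ext4 and will vary; st_size of a directory is the size of its entry list):
#!/usr/bin/env python
# Minimal sketch of the effect described above: the directory file grows
# with its entry list and is not shrunk again when the entries are freed.
import os
import tempfile

d = tempfile.mkdtemp()
print('empty:     %s' % os.stat(d).st_size)   # one block, e.g. 4096

for i in range(5000):
    open(os.path.join(d, str(i)), 'w').close()
print('full:      %s' % os.stat(d).st_size)   # grown to hold 5000 entries

for name in os.listdir(d):
    os.unlink(os.path.join(d, name))
print('after rm:  %s' % os.stat(d).st_size)   # still the grown size

os.rmdir(d)       # possible, the directory is empty again
os.mkdir(d)
print('recreated: %s' % os.stat(d).st_size)   # back to a single block
os.rmdir(d)       # clean up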
To know or not to know ...
... the questions remain forever.
Offline
So I waste my disk space (5000 was only for testing purposes) on thousands and thousands of useless inode pointers in a directory? Is there a way to change this behavior?
Edit: Well, maybe there is, but I don’t know of one, so I created a little Python script to get rid of those useless inode pointers:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# vim: ts=4:sw=4
# CC-by-sa, Dirk Sohler, spam@0x7be.de
""" It seems like Linux keeps inode pointers for already deleted files in
directory definitions. This script does not fix the problem itself, but
repairs its results by recreating each directory from a fresh copy
(keeping file metadata).
"""
import os
from optparse import OptionParser
import tempfile
import shutil


def opts():
    """Parses the options given by the user"""
    parser = OptionParser(
        usage='%prog [options] {directory name(s)}',
        version='%prog 0.1')
    return parser.parse_args()


def getsize(dirs):
    """Sums up the sizes of the directory files themselves"""
    size = 0
    for d in dirs:
        size += os.stat(d).st_size
    return size


def mvdirs(dirs):
    """Copies every directory to a temporary place and back, so each one
    is recreated with a compact entry list"""
    count = 0
    for d in dirs:
        tmpdir = tempfile.mkdtemp()
        shutil.copytree(d, tmpdir + '/d')  # copytree preserves metadata
        shutil.rmtree(d)
        shutil.move(tmpdir + '/d', d)
        shutil.rmtree(tmpdir)
        count += 1
    return count


def main():
    options, dirs = opts()
    oldsize = getsize(dirs)
    count = mvdirs(dirs)
    newsize = getsize(dirs)
    saved = oldsize - newsize
    mb = round(saved / 1024.0 / 1024, 2)
    print('You just saved %s bytes (%s MB) by processing %s directories'
          % (saved, mb, count))


if __name__ == '__main__':
    main()
Call with directory name(s):
$ cd "my huge directories"
$ antiwaste.py *
You just saved 6434816 bytes (6.14 MB) by processing 31 directories
$
This script was done quick’n’dirty. Use at your own risk.
Last edited by Dirk Sohler (2011-01-31 04:17:58)
Offline