#1 2011-01-30 15:17:38

Dirk Sohler
Member
From: Hamburg, Germany
Registered: 2009-10-03
Posts: 109

The mystery of “ghost files” … (probably a bug?)

Hey there,

I recently noticed some very strange behavior regarding folder sizes. From time to time I have to handle huge numbers of files. These files are stored in directories in my home directory. Later I delete a large number of those files. But after deleting them the folder size stays the same, even if there are no files left in that folder.

So I decided to set up a little experiment.

$ mkdir test{1,2}

$ du -hs test{1,2}
4,0K    test1
4,0K    test2

$ for i in {1..5000}; do touch test1/$i; done

$ du -hs test{1,2}
76K    test1
4,0K    test2

$ rm test1/*


$ ls -a test{1,2}
test1:
.  ..

test2:
.  ..


$ du -hs test{1,2}
76K    test1
4,0K    test2

dirk ~ $ 

First I created two empty directories (verified with du), then I ran a loop with 5000 iterations to create 5000 empty files in test1. At that point the directory has an overall size of 76K, which is plausible because it contains 5000 entries. Then I cleared the whole directory with rm. Now the directory is empty (verified with ls -a, which shows only . and .. in both directories). But running du again shows that test1 (which contained the 5000 files before I deleted them) still has a size of 76K, although it is as empty as test2.
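For anyone who wants to repeat this without the shell, here is a minimal Python sketch of the same experiment. It is only an illustration: the function name directory_size_experiment is made up, it reads st_size from os.stat rather than the allocated blocks that du reports, and the numbers depend on the filesystem that holds the temporary directory (a tmpfs-backed /tmp will not behave like ext4).

```python
import os
import tempfile


def directory_size_experiment(file_count=5000):
    """Return the directory's own st_size when empty, filled and emptied."""
    sizes = {}
    with tempfile.TemporaryDirectory() as base:
        test_dir = os.path.join(base, "test1")
        os.mkdir(test_dir)
        sizes["empty"] = os.stat(test_dir).st_size

        # Same idea as: for i in {1..5000}; do touch test1/$i; done
        for i in range(1, file_count + 1):
            open(os.path.join(test_dir, str(i)), "w").close()
        sizes["filled"] = os.stat(test_dir).st_size

        # Same idea as: rm test1/*
        for name in os.listdir(test_dir):
            os.remove(os.path.join(test_dir, name))
        sizes["emptied"] = os.stat(test_dir).st_size
        sizes["entries_left"] = len(os.listdir(test_dir))
    return sizes


if __name__ == "__main__":
    print(directory_size_experiment())
```

If "emptied" is still as large as "filled" while "entries_left" is 0, the filesystem shows the behavior described above.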

Both directories (test1 and test2) contain the same number of files (none except . and ..), so why is the size of test1 76K and the size of test2 only 4K?

I know why test1 was 76K while it contained the 5000 files. But why is it still 76K after deleting the files, and why is the size not adjusted afterwards?

I’m looking forward to your explanations.

Thanks in advance!

Kind regards,
Dirk

#2 2011-01-30 15:34:02

keenerd
Package Maintainer (PM)
Registered: 2007-02-22
Posts: 647

Re: The mystery of “ghost files” … (probably a bug?)

And since someone will surely ask: this was on ext4.

#3 2011-01-30 16:14:55

ataraxia
Member
From: Pittsburgh
Registered: 2007-05-06
Posts: 1,553

Re: The mystery of “ghost files” … (probably a bug?)

I don't think directories free the space used by their content indexes when they shrink. Normally any directory that once has been very large is likely to be large again, so it just keeps the space.

#4 2011-01-30 16:33:32

defears
Member
Registered: 2010-07-26
Posts: 218

Re: The mystery of “ghost files” … (probably a bug?)

Had this happen to me a while ago. Forgot to install gamin to update my folders. I used nautilus though.

#5 2011-01-30 17:22:02

bernarcher
Forum Fellow
From: Germany
Registered: 2009-02-17
Posts: 2,281

Re: The mystery of “ghost files” … (probably a bug?)

ataraxia wrote:

I don't think directories free the space used by their content indexes when they shrink. Normally any directory that once has been very large is likely to be large again, so it just keeps the space.

If I recall correctly, this is caused by the way directory contents are managed.

Remember that, like everything in Unix, directories are files. Basically a directory holds a list of inode indices pointing to the files it contains. If you remove a file from it, its inode pointer is simply set to 0 but not removed from the list. If you put another file into the directory, conceptually its inode index is appended to this list without altering the previous contents.

Frequently changed directories thus tend to grow enormously.

On the other hand, if you remove all files from a directory, it will retain a list of zeroes. If you want a really empty directory you have to rmdir it and create a new one.
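The rmdir-and-recreate step could look roughly like this in Python. This is a sketch under assumptions: recreate_empty_dir is a hypothetical helper, it only restores the permission bits, and it does not preserve ownership, timestamps, ACLs or extended attributes.

```python
import os
import stat
import tempfile


def recreate_empty_dir(path):
    """Replace an empty directory with a fresh one that has the same mode."""
    if os.listdir(path):
        raise ValueError("%s is not empty" % path)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.rmdir(path)
    os.mkdir(path)
    os.chmod(path, mode)  # mkdir applies the umask, so restore the mode
    return os.stat(path).st_size


if __name__ == "__main__":
    # Demo on a throwaway directory; the path is created just for this run.
    base = tempfile.mkdtemp()
    demo = os.path.join(base, "test1")
    os.mkdir(demo)
    print(recreate_empty_dir(demo))
    os.rmdir(demo)
    os.rmdir(base)
```

The function refuses to touch a directory that still has entries, so it cannot lose files by accident.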


To know or not to know ...
... the questions remain forever.

#6 2011-01-31 02:33:43

Dirk Sohler
Member
From: Hamburg, Germany
Registered: 2009-10-03
Posts: 109

Re: The mystery of “ghost files” … (probably a bug?)

So I waste disk space (5000 was only for testing purposes) because of thousands and thousands of useless inode pointers in a directory? Is there a way to change this behavior?

Edit: Well, maybe there is and I just don’t know it, but I wrote a little Python script to get rid of those useless inode pointers:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# vim: ts=4:sw=4
# CC-by-sa, Dirk Sohler, spam@0x7be.de


""" It seems like Linux keeps inode pointers for already deleted files in
directory definitions. This script does not fix the cause, but it fixes the
result by copying all files into a new directory (keeping file metadata)
and replacing the original directory with the copy.
"""


import os
from optparse import OptionParser
import tempfile
import shutil


def opts():
    """Parses the options given by the user"""
    parser = OptionParser(
        usage='%prog [options] {directory name(s)}',
        version='%prog 0.1')
    return parser.parse_args()


def getsize(dirs):
    """Returns the summed size of the directory files themselves"""
    s = 0
    for d in dirs:
        s += os.stat(d).st_size
    return s


def mvdirs(dirs):
    """Rebuilds each directory and returns how many were processed"""
    count = 0
    for d in dirs:
        dirname = os.path.normpath(d)  # strips a trailing slash
        tmpdir = tempfile.mkdtemp()
        tmpcopy = os.path.join(tmpdir, 'd')
        # copytree uses copy2 by default, so modes and timestamps are kept;
        # symlinks=True copies links as links instead of following them
        shutil.copytree(dirname, tmpcopy, symlinks=True)
        shutil.rmtree(dirname)
        shutil.move(tmpcopy, dirname)
        shutil.rmtree(tmpdir)
        count += 1
    return count


def main():
    options, dirs = opts()
    # do not call this variable "os", that would hide the os module here
    old_size = getsize(dirs)
    count = mvdirs(dirs)
    new_size = getsize(dirs)
    saved = old_size - new_size
    mb = round(saved / 1024.0 / 1024, 2)
    print('You just saved %s bytes (%s MB) by processing %s directories'
            % (saved, mb, count))


if __name__ == '__main__':
    main()

Call with directory name(s):

$ cd "my huge directories"
$ antiwaste.py *
You just saved 6434816 bytes (6.14 MB) by processing 31 directories
$ 

This script was done quick’n’dirty. Use at your own risk :)
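Before rebuilding anything it may help to see which directories are affected at all. The following is a rough sketch, not part of the script above: find_oversized_dirs and both of its thresholds are assumptions picked for illustration (4096 bytes as one typical block, 256 bytes per entry as an arbitrary "suspiciously sparse" limit).

```python
import os


def find_oversized_dirs(root, block_size=4096, bytes_per_entry=256):
    """Return (path, st_size, entry_count) for directories that look bloated.

    A directory is reported when its own size exceeds one block (assumed
    value) and it uses more bytes per entry than the assumed limit.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        entries = len(dirnames) + len(filenames)
        size = os.stat(dirpath).st_size
        # max(entries, 1) avoids a division by zero for empty directories
        if size > block_size and size / float(max(entries, 1)) > bytes_per_entry:
            hits.append((dirpath, size, entries))
    return hits


if __name__ == "__main__":
    for path, size, entries in find_oversized_dirs(os.getcwd()):
        print("%s: %s bytes for %s entries" % (path, size, entries))
```

It only reads metadata and changes nothing, so it is safe to run first and then decide which directories are worth rebuilding.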

Last edited by Dirk Sohler (2011-01-31 04:17:58)
