UPDATE 2014: rmlint was rewritten. New thread is here. Please use the new one.
I was asked to do a PKGBUILD so here it is.
rmlint is a command-line tool to clean your filesystem of various sorts of lint (duplicates, empty files/dirs, ...).
It is written in pure C and tends to be much faster than fdupes, which seems to be the standard on (e.g.) Ubuntu.
Additionally it is able to find empty files/dirs, non-stripped binaries, files with the same basename (name clusters), old temp data, strange filenames and bad links.
Because it dumps a log and a script, it is easy to adapt it to your needs.
Development happens at: https://github.com/sahib/rmlint
See also the README for a detailed list of features.
Any feedback, criticism or patches are welcome!
Last edited by SahibBommelig (2014-12-19 14:44:25)
Amazing little program! Thanks, really appreciate it.
Check your merge requests
Last edited by SanskritFritz (2014-03-05 10:04:35)
zʇıɹɟʇıɹʞsuɐs AUR || Cycling in Budapest with a helmet camera || Revised log levels proposal: "FYI" "WTF" and "OMG" (John Barnette)
@Sahib:
What's the best way to "uninstall" the rmlint I made from the .tar, so that I can install the AUR one (so pacman is keeping track of everything)? (first time building from source manually, i.e. without a PKGBUILD and makepkg)
Thanks.
Nice, it's time to install and find duplicate files. Thanks for this amazing application Sahib
Ask, and it shall be given you.
Seek, and ye shall find.
Knock, and it shall be opened unto you.
What's the best way to "uninstall" the rmlint I made from the .tar, so that I can install the AUR one (so pacman is keeping track of everything)? (first time building from source manually, i.e. without a PKGBUILD and makepkg)
The easiest way is
sudo make uninstall
Also, if you installed the program to the same location as the PKGBUILD does, then
pacman -Uf
overwrites all files.
The easiest way is
sudo make uninstall
Also, if you installed the program to the same location as the PKGBUILD does, then
pacman -Uf
overwrites all files.
sudo make uninstall didn't work (make: *** No rule to make target `uninstall'. Stop.), but -Uf worked fine.
Thanks.
What's the best way to "uninstall" the rmlint I made from the .tar, so that I can install the AUR one (so pacman is keeping track of everything)? (first time building from source manually, i.e. without a PKGBUILD and makepkg)
The simplest approach would probably be:
'sudo rm $DESTDIR/bin/rmlint $DESTDIR/share/man/man1/rmlint.1.gz' where DESTDIR is most likely /usr (..or /usr/local)
rmlint only installs two files...
The Makefile didn't have an 'uninstall' target, but I just added one.
But pacman -Uf should most likely do fine, too.
Shameless bump.
On the request of bencahill:
- Added the possibility to mark a directory as the source (i.e. where the originals come from) when more than one dir is given.
Just prepend the path with '//' to 'prefer' it.
Example:
$ rmlint 'testdir/recursed_a' '//testdir/recursed_b'
ls testdir/recursed_b/one
rm testdir/recursed_a/one
..while normally...
$ rmlint 'testdir/recursed_a' 'testdir/recursed_b'
ls testdir/recursed_a/one
rm testdir/recursed_b/one
- Also sped up the --paranoid option (simply by using mmap())
Hi Sahib,
Nice to see the new '//' feature!
I was wondering whether the package should be named rmlint-git. The VCS PKGBUILD Guidelines state:
Properly suffix pkgname with -cvs, -svn, -hg, -darcs, -bzr or -git. If the package tracks a moving development trunk it should be given a suffix. If the package fetches a release from a VCS tag then it should not be given a suffix. Use this rule of thumb: if the output of the package depends on the time at which it was compiled, append a suffix; otherwise do not.
Thank you for rmlint!
Attila
On the request of bencahill:
- Added the possibility to mark a directory as the source (i.e. where the originals come from) when more than one dir is given.
Just prepend the path with '//' to 'prefer' it.
I apologize for not answering... I really enjoy the new feature, and have used it at least a dozen times already. Yes, I have that many duplicates.
Thanks again for the wonderful software.
!give SahibBommelig cookie
Hey Attila, nice to meet again
Thanks for the hint, I must have overlooked those guidelines.
I made a new package 'rmlint-git', which is basically the very same as before; also,
I consider rmlint to be mostly finished software, so the '-git' suffix doesn't say much either.
Please vote if you like it.
@bencahill: I'll enjoy my cookie, thanks. Hope I don't already have a copy of it.
rmlint is a great piece of software, thanks a lot !
If I run it directly in my home folder it returns the error:
FATAL: nftw():: Value too large for defined data type
No files in cache to search through => No duplicates.
Hi,
This was already reported by someone else and should (hopefully) already be fixed.
It should only happen on 32-bit systems, and only with files larger than 2 GB.
Thanks for the report, though.
edit: typo.
Last edited by SahibBommelig (2011-04-08 11:23:07)
Hi Sahib,
It's working like a charm now!
Next time I'll update first before asking questions.
Otherwise, big kudos for rmlint, it rocks!
PS: glyr seems very interesting; do you plan to open a topic on this forum for questions about it?
Little update.
Some bugfixes are in; for example, the duplicate counter was not always exact.
Also some crashes have been fixed (thanks to Micheal and rider).
-c/-C now behaves a bit more smoothly:
With -c you can specify a command that is executed on each found duplicate;
in the given command, '<orig>' and '<dupl>' are replaced with the path to the original / duplicate.
Example to simulate the behaviour of '-m link' without removing anything:
$ rmlint testdir -v5
echo '/tmp/testcase2/a' # original
rm -f '/tmp/testcase2/a.copy' # duplicate
echo '/tmp/testcase2/b.copy' # original
rm -f '/tmp/testcase2/b' # duplicate
$ rmlint testdir -v5 -c "rm '<dupl>' && ln -s '<orig>' '<dupl>'"
echo '/tmp/testcase2/a' # original
rm '/tmp/testcase2/a.copy' && ln -s '/tmp/testcase2/a' '/tmp/testcase2/a.copy'
echo '/tmp/testcase2/b.copy' # original
rm '/tmp/testcase2/b' && ln -s '/tmp/testcase2/b.copy' '/tmp/testcase2/b'
Also, -c replaces rmlint's default command in the script.
There is also -C, which is the same for originals; <dupl> just won't expand there.
At the moment both packages (rmlint and rmlint-git) will result in the same binary.
Edit: rmlint should also work again on 32bit. (Just fixed - Sorry)
P.S: Oh I have more than 10 votes? :-)
Last edited by SahibBommelig (2011-04-18 18:44:23)
Hi,
I'm getting the following error when I run rmlint. It happens only in some directories, and always after rmlint starts to list the duplicates.
FATAL: Rmlint crashed due to a Segmentation fault! :(
No other detail is given in the output, just the Seg fault after listing some duplicates.
I have a 32 bit system.
I will be happy to follow your guidelines to locate the origin of the bug.
Cheers.
Arch is to Linux as Jeet Kune Do is to martial arts.
Hello samhain,
Are you using the git version or the older tar.gz?
Could you please do the following:
git clone git://github.com/sahib/rmlint.git
cd rmlint
DEBUG=true make
valgrind ./rmlint <additional args>
...and post the (full) output of the last command?
Thanks for your help.
I guess you missed the configure command in your instructions
I am using the rmlint-git package from AUR.
I followed your instructions and I noticed something strange.
If I compile with
DEBUG=true make
I don't get a Seg fault and rmlint runs fine:
(I guess that just the last lines of the output are enough)
=> In total 4684 files, whereof 716 are duplicate(s)
=> 6 other suspicious items found [0 B]
=> Totally 263.50 MB [276297238 Bytes] can be removed.
=> Nothing removed yet!
A log has been written to rmlint.log.
A ready to use shellscript to rmlint.sh.
==12415==
==12415== HEAP SUMMARY:
==12415== in use at exit: 0 bytes in 0 blocks
==12415== total heap usage: 23,942 allocs, 23,942 frees, 25,869,232 bytes allocated
==12415==
==12415== All heap blocks were freed -- no leaks are possible
==12415==
==12415== For counts of detected and suppressed errors, rerun with: -v
==12415== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 17 from 8)
BUT, if I compile rmlint with just
make
I get the following output:
valgrind ./rmlint /home/samhain/rpg/
==10978== Memcheck, a memory error detector
==10978== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==10978== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==10978== Command: ./rmlint /home/samhain/rpg/
==10978==
# Empty file(s):
rm /home/samhain/rpg/heroquest/yeoldeInn/UKVersion/kellar/list
rm /home/samhain/rpg/infinity/ora/ora-3.14/lanzaORA.log
# Empty dir(s):
rmdir /home/samhain/rpg/workshop/40k/game01
rmdir /home/samhain/rpg/workshop/40k/Rosters
rmdir /home/samhain/rpg/workshop/40k/campaignSystem/game01
rmdir /home/samhain/rpg/workshop/40k/campaignSystem/Rosters
# Duplicate(s):==10978== Thread 3:
==10978== Invalid read of size 1
==10978== at 0x804B703: ??? (in /home/samhain/dwork/rmlint/rmlint/rmlint)
==10978== by 0x804C27B: ??? (in /home/samhain/dwork/rmlint/rmlint/rmlint)
==10978== by 0x804D24C: ??? (in /home/samhain/dwork/rmlint/rmlint/rmlint)
==10978== by 0x416DDED: clone (in /lib/libc-2.14.so)
==10978== Address 0x6dd5000 is not stack'd, malloc'd or (recently) free'd
==10978==
FATAL: Rmlint crashed due to a Segmentation fault! :(
FATAL: Please file a bug report (See rmlint -h)
==10978==
==10978== HEAP SUMMARY:
==10978== in use at exit: 290,798 bytes in 3,063 blocks
==10978== total heap usage: 10,490 allocs, 7,427 frees, 19,848,237 bytes allocated
==10978==
==10978== LEAK SUMMARY:
==10978== definitely lost: 41,360 bytes in 260 blocks
==10978== indirectly lost: 247,607 bytes in 2,781 blocks
==10978== possibly lost: 272 bytes in 2 blocks
==10978== still reachable: 1,559 bytes in 20 blocks
==10978== suppressed: 0 bytes in 0 blocks
==10978== Rerun with --leak-check=full to see details of leaked memory
==10978==
==10978== For counts of detected and suppressed errors, rerun with: -v
==10978== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 17 from 8)
Is this enough or do you need more info?
This is a very nice tool. I just used it to decrease the space used by duplicates (which I actually need) by taking the script your tool generated and changing the echo and rm of the duplicates so that it creates a hardlink instead.
Are you open to feature requests?
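That script edit can be automated. The rmlint.sh excerpt below is fabricated to mirror the "echo original / rm duplicate" pairs shown earlier in this thread; the real generated script's layout may differ between versions, so treat this awk sketch as an assumption, not the canonical format:

```shell
# Fabricated excerpt of a generated rmlint.sh (format assumed from the
# -v5 output posted earlier in the thread):
cat > rmlint.sh <<'EOF'
echo '/tmp/t/a' # original
rm -f '/tmp/t/a.copy' # duplicate
echo '/tmp/t/b' # original
rm -f '/tmp/t/b.copy' # duplicate
EOF

# Pair each original with the duplicate that follows it and emit a
# hardlink command instead of the rm (\047 is a single quote in awk):
awk -F"'" '
/# original$/  { orig = $2; next }
/# duplicate$/ { printf "ln -f \047%s\047 \047%s\047\n", orig, $2 }
' rmlint.sh > hardlink.sh

cat hardlink.sh
# ln -f '/tmp/t/a' '/tmp/t/a.copy'
# ln -f '/tmp/t/b' '/tmp/t/b.copy'
```

Review hardlink.sh before running it; a hardlink only saves space when both paths live on the same filesystem.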
Hello samhain,
I guess you missed the configure command in your instructions
Yupp - Pardon.
The valgrind error you are getting is very strange...
But I'd guess it's the very high optimization level (until now this never caused any trouble, though).
Could you try changing the OPTI= line in the Makefile to something nicer like:
OPTI=-march=native -Os -s -finline-functions
Recompile (via a plain 'make') and see if the problem persists. If so, I will change the level upstream.
Awebb:
Thanks.
You may post feature requests, but it might take very long until they're ready for use (I have very little time and mostly work on glyr/gmpc in my free time).
My guess would be that your feature request is 'Can it do the hardlinks automagically?' - Yupp. (See the -c / -C options.)
Some fun day I will probably rewrite this with GLib and some better IO..
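What a call like `rmlint -c "rm '<dupl>' && ln '<orig>' '<dupl>'"` ends up running per duplicate pair can be simulated by hand, without rmlint (the file names here are made up for the demonstration):

```shell
# Two identical files standing in for an original/duplicate pair:
printf 'same content\n' > orig.txt
cp orig.txt dup.txt

# This is the shape of command rmlint would run after expanding
# '<orig>' and '<dupl>' (a hardlink via ln, not a symlink via ln -s):
orig=orig.txt
dupl=dup.txt
rm "$dupl" && ln "$orig" "$dupl"

# Both names now refer to the same inode, so the content is stored once.
[ orig.txt -ef dup.txt ] && echo "hardlinked"
# hardlinked
```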
Could you try changing the OPTI= line in the Makefile to something nicer like:
OPTI=-march=native -Os -s -finline-functions
Recompile (via a plain 'make') and see if the problem persists. If so, I will change the level upstream.
Done, this is the error I get:
==31568== Memcheck, a memory error detector
==31568== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==31568== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==31568== Command: ./rmlint /home/samhain/rpg/
==31568==
# Empty file(s):
rm /home/samhain/rpg/heroquest/yeoldeInn/UKVersion/kellar/list
rm /home/samhain/rpg/infinity/ora/ora-3.14/lanzaORA.log
# Empty dir(s):
rmdir /home/samhain/rpg/workshop/40k/game01
rmdir /home/samhain/rpg/workshop/40k/Rosters
rmdir /home/samhain/rpg/workshop/40k/campaignSystem/game01
rmdir /home/samhain/rpg/workshop/40k/campaignSystem/Rosters
# Duplicate(s):==31568== Thread 3:
==31568== Invalid read of size 1
==31568== at 0x804B3C4: ??? (in /home/samhain/dwork/rmlint/rmlint/rmlint)
==31568== by 0x804B723: ??? (in /home/samhain/dwork/rmlint/rmlint/rmlint)
==31568== by 0x804BF84: ??? (in /home/samhain/dwork/rmlint/rmlint/rmlint)
==31568== by 0x405DCA6: start_thread (in /lib/libpthread-2.14.so)
==31568== by 0x416DDED: clone (in /lib/libc-2.14.so)
==31568== Address 0x6dd5000 is not stack'd, malloc'd or (recently) free'd
==31568==
FATAL: Rmlint crashed due to a Segmentation fault! :(
FATAL: Please file a bug report (See rmlint -h)
==31568==
==31568== HEAP SUMMARY:
==31568== in use at exit: 290,798 bytes in 3,063 blocks
==31568== total heap usage: 10,490 allocs, 7,427 frees, 19,848,237 bytes allocated
==31568==
==31568== LEAK SUMMARY:
==31568== definitely lost: 41,252 bytes in 259 blocks
==31568== indirectly lost: 247,219 bytes in 2,776 blocks
==31568== possibly lost: 768 bytes in 8 blocks
==31568== still reachable: 1,559 bytes in 20 blocks
==31568== suppressed: 0 bytes in 0 blocks
==31568== Rerun with --leak-check=full to see details of leaked memory
==31568==
==31568== For counts of detected and suppressed errors, rerun with: -v
==31568== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 17 from 8)
I just wanna thank you for this amazing app. Please keep up the good work.
All your base are belong to us
Thanks for rmlint. It would be useful to be able to specify the directory where duplicates must be found, so that nothing is deleted anywhere else. I tried “rmlint ///mnt/disk1/ ///mnt/disk2/ .” but it didn't work; sometimes duplicates were found on disk1 or disk2 and the originals were kept in “.”.
I also wanted to exclude “.svn”, “.hg” and other directories, but it wasn't obvious how to do that. I'd also like it to not delete empty files, but I seem to remember seeing such a feature in the git log…
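Since rmlint dumps a script before anything is deleted (as noted at the top of the thread), one workaround for the “.svn”/“.hg” case is to filter the generated script rather than rmlint's input. The rmlint.sh content below is fabricated for illustration:

```shell
# Fabricated sample of a generated removal script:
cat > rmlint.sh <<'EOF'
rm -f '/data/proj/.svn/entries' # duplicate
rm -f '/data/music/song.mp3' # duplicate
rm -f '/data/proj/.hg/store/notes' # duplicate
EOF

# Drop every line that touches a VCS metadata directory, then run
# the filtered script instead of the original:
grep -Ev '/\.(svn|hg|git)/' rmlint.sh > rmlint.filtered.sh

cat rmlint.filtered.sh
# rm -f '/data/music/song.mp3' # duplicate
```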
This looks very useful. I ran it quickly in default mode on my data drive and was surprised to find that nearly 1,700 files are duplicated (many of them several times), though I'm not quite ready to trust the generated script just yet. Many of the duplicates are there for a reason other than redundancy.
The drive has 384 GB in use, and the total run took only a few minutes. Very impressive.
Ryzen 5900X 12 core/24 thread - RTX 3090 FE 24 Gb, Asus B550-F Gaming MB, 128Gb Corsair DDR4, Cooler Master N300 chassis, 5 HD (2 NvME PCI, 4SSD) + 1 x optical.
Linux user #545703
/ is the root of all problems.