#1 2014-12-18 22:42:14

SahibBommelig
Member
From: Germany
Registered: 2010-05-28
Posts: 80

rmlint-2.4.0 - a lint/duplicate finder [rewrite of old rmlint]

HELLO ARCHY PEOPLE.

rmlint finds space waste and other broken things on your file system and offers to remove them.
It is especially good at finding duplicates in your files and getting rid of
them. You might argue that most of this can be done via a few lines of bash, but
can you do it fast too? And how do you get rid of the results?
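
For readers wondering what "finding duplicates fast" involves, here is a minimal Python sketch of the usual two-stage idea: group files by size first, then hash only the files that share a size. It is an illustration only, not rmlint's actual algorithm; the function name find_duplicates and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Return sorted lists of paths under `root` with identical content."""
    # Stage 1: group by size. A file with a unique size has no duplicate,
    # so it never needs to be read at all.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Stage 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_digest = defaultdict(list)
        for path in paths:
            digest = hashlib.sha256()  # assumption: any strong hash would do
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 16), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return sorted(groups)
```

The size pass is what keeps this cheap: most files are ruled out before a single byte of them is read.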

Some of you might remember this tool since it had another thread here.

SO WHAT'S NEW?

rmlint has been completely rewritten in the meantime.
Roughly 800+ commits are included in this release which resulted in:

  • Much cleaner and more extensible code with fewer bugs.

  • More tests and fewer nonsense features (--junkchars, --oldtmp, posix regex!).

  • Speedups regularly between 2x and 8x of the original speed or even more.

  • Saner, more unix-ish commandline interface.

Okay, that was vague - Sorry. Concrete features are:

  • Exchangeable hashsum-algorithms. (cryptographic and non-cryptographic)

  • Ability to find duplicate directories. (experimental)

  • Filter duplicates by basename, file extension or only files newer than a certain mtime.

  • Localization support (help needed!)

  • More output formats. (shell/python script, json/csv dump, a progressbar...)

  • Support for reading files from stdin. (using "-" as file)

  • More options to guess the original in a set of duplicates. (--sortcriteria option)
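
To make "guessing the original" a bit more concrete, here is a small Python sketch that ranks a set of duplicates by a list of criteria and treats the first one as the original. The criteria names ("mtime", "name", "depth") and the record layout are hypothetical and chosen only for this example; they are not the syntax of the --sortcriteria option.

```python
import os

# Hypothetical criteria for this sketch (not rmlint's own option syntax).
CRITERIA = {
    "mtime": lambda f: f["mtime"],                   # oldest file first
    "name": lambda f: os.path.basename(f["path"]),   # alphabetical basename
    "depth": lambda f: f["path"].count("/"),         # shallowest path first
}


def pick_original(duplicates, order=("mtime", "name")):
    """Rank a duplicate set by `order`; return (original, other_copies)."""
    ranked = sorted(
        duplicates,
        key=lambda f: tuple(CRITERIA[name](f) for name in order),
    )
    return ranked[0], ranked[1:]
```

Later criteria only act as tie-breakers: with the default order, two files with the same mtime are separated by their basename.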

With this said:
The new version is not compatible with the old one. Do not assume it works with the same options!
Note also that the new version never deletes files itself; it only gives you the weapons to do so.

ANY HELP NEEDED?

It's still fresh software that needs packagers, translators, bugfixers and mostly testers.
People that want to port rmlint to other platforms (OSX, BSD*) are welcome too of course.
In any case, GitHub is where the action should happen.
If you know a little Python, adding a testcase to our testsuite along with your bugreport would be great.

At this point: a big thanks to my co-author SeeSpotRun, who made this happen.

I WANT IT!

The rmlint-git package in the AUR has been updated to compile the rewritten version from upstream/master (which should always contain stable software):

$ your-aur-helper -S rmlint-git

Enjoy!

Last edited by SahibBommelig (2015-10-25 14:54:26)


#2 2015-05-10 00:08:31

SahibBommelig

Re: rmlint-2.4.0 - a lint/duplicate finder [rewrite of old rmlint]

We're proud to release the new rmlint version 2.2.0 "Dreary Dropbear"!

rmlint is a fast, feature-rich, but still easy-to-use lint and duplicate file finder.
This new release includes over 400 commits and some noticeable improvements:

- Improved speed, particularly for byte-by-byte comparison option "-pp".
- Reduced memory footprint. This is particularly important for very large data sets (>5 million files), which rmlint now handles with ease.
- Fixed some annoying bugs and crashes (especially on 32bit).
- Improved testsuite to ensure internal program integrity during development.

Reminder: We still feature a nice progressbar (-g), finding duplicate
directories (-D) and fast byte-by-byte comparison (-pp).
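
For anyone unsure what byte-by-byte comparison buys you over checksums: it removes any reliance on a hash function by comparing the actual file contents. A minimal Python sketch of the idea follows; it is not how "-pp" is implemented, and the function name same_content and the chunk size are assumptions for this example.

```python
import os


def same_content(path_a, path_b, chunk_size=1 << 16):
    """Return True if both regular files hold exactly the same bytes."""
    # Different sizes can never match, so skip reading entirely.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # stop at the first differing chunk
            if not chunk_a:
                return True   # both files ended without a mismatch
```

Reading in chunks keeps memory use constant and lets the comparison bail out early on the first difference.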

Links:

- GitHub
- Documentation
- Full Changelog

Support wanted:

Non-developers:

- Testers and morale boosters. Give us some feedback via the issue tracker.
- Packagers for other distributions. You can also vote for the AUR package to get included in the official repos.
- Translators (only French and German available at present)
- Beer money is appreciated too of course.

Developers:

Here's what we're currently working on:

- An easy GUI for those in need (Prototype)
- Extend testsuite (current coverage as per lcov output)
- Automated speed regression tests (early benchmark)
- Faster re-running of rmlint (improved --cache)
- Sort output files by certain criteria (e.g. to find the biggest size suckers)
- Make the generated shell script perform sanity checks.

Have fun! :)


#3 2015-10-25 14:54:12

SahibBommelig

Re: rmlint-2.4.0 - a lint/duplicate finder [rewrite of old rmlint]

Hello,

we're happy to release the new rmlint version 2.4.0 "Myopic Micrathene".
If you wonder what a Micrathene is, look here.

Here's the newsticker:

- A new optional GUI frontend based on Python/GTK3.
- A benchmark suite to protect against performance regressions.
- Support for btrfs and reflink-capable filesystems: files can now be deduplicated by the filesystem using the BTRFS_IOC_FILE_EXTENT_SAME ioctl if the user specifies -c sh:clone.
- New --replay option that reprocesses the json file(s) of a previous run.
- New --sort-by option that sorts rmlint's output. For example, sort by size (--sort-by s) to print the biggest size suckers first.
- The shell script now does sanity checks before removing files and can be told to double-check the files before removing them.
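
As a rough illustration of what re-sorting a previous run's results can look like, here is a Python sketch that groups duplicate records and orders the groups by how much space they waste. The record layout (dicts with "checksum", "path" and "size" keys) is an assumption for this example and not a description of rmlint's json schema; ranking by reclaimable bytes is also just one reading of "biggest size suckers".

```python
from collections import defaultdict


def biggest_first(records):
    """Group records by checksum; return groups, most wasted bytes first."""
    groups = defaultdict(list)
    for record in records:
        groups[record["checksum"]].append(record)

    def wasted(group):
        # Every copy except one could be removed to reclaim space.
        return group[0]["size"] * (len(group) - 1)

    return sorted(groups.values(), key=wasted, reverse=True)
```

Because the records are plain data, this kind of post-processing does not need to touch the file system again.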

That's of course a short list for about 700 commits.

Links:

- GitHub
- Documentation
- Changelog
- IssueTracker

Support wanted:

While we're a somewhat healthy Open Source project, we can't do everything alone.
This is not only due to time constraints, but also due to our inability to test/package
rmlint on other systems or translate it to languages we don't speak.

In particular we want help on these topics:

- Packagers, particularly for Debian/Ubuntu. See here for more info.
  There is already a package for Arch, thanks to Massimiliano Torromeo!
- Translators: See here for more information.
- Testers and Patchers. Especially for the new GUI, since it is a separate codebase.
- Beer money is always welcome.

Plans for upcoming releases:
Not many. We'd like to stabilise rmlint now and go up in smaller version jumps.

Have fun while killing some files.
