Edit: there's an up-to-date thread about the Python script that resulted from this thread here.
Hey there,
I'm working on a Python script that sorts the installed packages by how recently they were used. I thought it would be pretty cool to have an application that finds packages you installed a long time ago and don't use anymore.
The hardest part of this idea, though, is finding a good sorting criterion. As of version 0.0.4a, the algorithm works as follows:
1. Get a list of all pacman packages.
2. For each pacman package, do:
   3. Get a list of all files in that package.
   4. For each file in the package, do:
      5. Get the last time the file was accessed.
   6. Assign the latest access time among all of the package's files as the package's last access time.
7. Sort the packages by their last access time.
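The steps above could be sketched roughly as follows. This is only an illustration, not the lupac source: the function names are made up, and the package-to-files mapping is assumed to have been built beforehand (for example from the output of pacman -Ql).

```python
import os

def package_last_access(files):
    """Return the latest access time (epoch seconds) among the
    regular files in `files`, or None if none of them exist."""
    times = []
    for path in files:
        # Skip directories and files that are no longer on disk.
        if os.path.isfile(path):
            times.append(os.stat(path).st_atime)
    return max(times) if times else None

def sort_by_last_access(package_files):
    """package_files: dict mapping package name -> list of file paths
    (hypothetical input, assumed to be collected elsewhere).
    Returns (package, last_access) pairs, least recently used first."""
    result = []
    for pkg, files in package_files.items():
        last = package_last_access(files)
        if last is not None:
            result.append((pkg, last))
    return sorted(result, key=lambda item: item[1])
```

Packages at the top of the returned list are the candidates for removal. Keep in mind that access times are only meaningful if the filesystem actually records them (a noatime mount would not).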
I'm not sure if that was clear enough, but here's what I've come up with so far:
http://files.venox.qc.to/lupac-0.0.1a : uses the algorithm described above, except that it looks for directories too;
http://files.venox.qc.to/lupac-0.0.2a : uses the average access time across all files and directories of the package;
http://files.venox.qc.to/lupac-0.0.3a : uses the same algorithm as 0.0.1a, but filters directories;
http://files.venox.qc.to/lupac-0.0.4a : command-line parameters and table formatting. (latest)
Version 0.0.4a gives the best results. If you have the time, try them and tell me what you think.
Output of version 0.0.4a on my machine: http://files.venox.qc.to/results
Last edited by venox (2008-02-14 22:16:40)
Offline
Hi, I have been looking forward to this kind of utility, and was even thinking about writing one myself.
I don't have a better sorting criterion either, only one suggestion:
I think we only need to sort executables, which would greatly reduce the number of items to sort.
Good luck,
bsdson.tw
Offline
Okay~
Here is a simple shell script that covers most of the requirements, except "sorting". *
*. Sorting can easily be done with
sort -t "=" -k 2 <the output of the script>
And the script itself:
(version 2)
#!/bin/sh
# For every package that no other installed package requires (pacman -Qt),
# print the average and the latest access time of its executable files.
for pkg in $(pacman -Qqt) ; do
    count=0
    total=0
    max=0
    # Note: paths containing whitespace are split incorrectly by this loop.
    for file in $(pacman -Qlq "$pkg") ; do
        # Only regular, executable files (this skips directories).
        if [ -f "$file" ] && [ -x "$file" ] ; then
            # %X = last access time in seconds since the epoch;
            # more robust than cutting a column out of "ls -lu".
            time=$(stat -c %X "$file")
            count=$((count + 1))
            total=$((total + time))
            if [ "$time" -gt "$max" ] ; then
                max=$time
            fi
        fi
    done
    if [ "$count" -ne 0 ] ; then
        avg=$((total / count))
        avg=$(date --date="@$avg" "+%Y/%m/%d %H:%M")
        max=$(date --date="@$max" "+%Y/%m/%d %H:%M")
        echo "$pkg avg_access_time=$avg last_access_time=$max"
    fi
done
BR,
bsdson.tw
Last edited by bsdson.tw (2008-02-13 11:07:01)
Offline
I think you'd do better to look at the executable that was accessed most recently. If someone has imagemagick installed and uses import and convert very often, but almost never touches the other 13 executables, he probably doesn't want that package selected (it is used often), yet the unused binaries will make your average pretty low.
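To put numbers on that, here is a small made-up example: two binaries used yesterday and thirteen untouched for a year. The timestamps are invented for illustration and the reference time is arbitrary.

```python
DAY = 86400
now = 1_200_000_000  # arbitrary reference timestamp (assumption)

# Two binaries used yesterday, thirteen untouched for a year.
atimes = [now - 1 * DAY] * 2 + [now - 365 * DAY] * 13

# How old the package looks when judged by the average access time...
avg_age_days = (now - sum(atimes) / len(atimes)) / DAY
# ...and when judged by the most recent access time.
max_age_days = (now - max(atimes)) / DAY

print(f"average says last used {avg_age_days:.0f} days ago")
print(f"latest says last used {max_age_days:.0f} days ago")
```

With these numbers the average makes the package look unused for about 316 days, while the latest access time shows it was used yesterday.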
Offline
Thank you, Ramses de Norre.
You are right, let me update my code in the original post.
Offline
I've run your script and sorted the output, but there is something wrong... I get a lot of packages whose last access is reported as 1970. And after 12 such old packages, the next one was accessed yesterday.
I'll try to look through the script in more detail if I find the time one of these days, but I've got the impression that something is going wrong in there...
Offline
I have a new version of the script. This one gets better results (far from perfect, though) by checking only files, not directories. I tried using just executable files, but that failed miserably to detect unused libraries.
http://files.venox.qc.to/lupac-0.0.3a
Actually, version 0.0.3a works just like 0.0.1a (it doesn't compute the average of the results), but it restricts the search to non-directories (as the access time of directories seems to be updated by updatedb or something).
I've posted the results of version 0.0.3a on my machine here: http://files.venox.qc.to/results
Last edited by venox (2008-02-13 17:06:54)
Offline
Thank you, venox.
But as far as I know, the script I posted earlier already checks "only executable files".
I also wonder why there are some files whose access times are "1970-1-10 00:00".
Maybe using "access time" in "ls" is not what we want.
BR,
Henry
Offline
That's because there are some files with 0 as the access time (maybe they were never accessed?). In my script I'm filtering those files out, just in case.
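A filter like that could look roughly like this. It is only a sketch with a made-up function name, and treating 0 as "never recorded" is an assumption, not necessarily what lupac does internally.

```python
def latest_nonzero(atimes):
    """Return the latest access time, ignoring entries that are 0
    (treated here as 'never recorded'); None if nothing is left."""
    valid = [t for t in atimes if t > 0]
    return max(valid) if valid else None
```

A package whose files all report 0 then yields None instead of showing up as last used in 1970.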
The output of the Python script I've posted seems to make sense. I'm working on a new version of the script with some command-line parameters, and I'll post it here later.
Offline
lupac-0.0.4a is here. It has command-line parameters and fancy table formatting.
You can get it here: http://files.venox.qc.to/lupac-0.0.4a
Usage: ./lupac-0.0.4a [OPTIONS]
  -a <N>, --ago=<N>   lists only packages that have not been used for <N> days or more
  -q, --quiet         don't display status messages
  -n, --notable       don't display output in a fancy table
  -h, --help          show this help screen
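For anyone curious how options like these could be wired up, here is a rough sketch using Python's argparse module. It is not the lupac source: the function names are hypothetical, and the --ago filter assumes packages are given as (name, last access time) pairs.

```python
import argparse
import time

def build_parser():
    # argparse adds -h/--help automatically.
    parser = argparse.ArgumentParser(prog="lupac")
    parser.add_argument("-a", "--ago", type=int, default=0, metavar="N",
                        help="list only packages not used for N days or more")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="don't display status messages")
    parser.add_argument("-n", "--notable", action="store_true",
                        help="don't display output in a fancy table")
    return parser

def unused_for(packages, days, now=None):
    """Keep (name, last_access) pairs whose last access is at least
    `days` days before `now` (defaults to the current time)."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    return [(name, t) for name, t in packages if t <= cutoff]
```

For example, build_parser().parse_args(["--ago=30", "-q"]) gives ago=30 and quiet=True, and the result of unused_for(packages, 30) would then be what gets printed.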
Offline
I've released the new 0.0.5a version of the script and posted it here: http://bbs.archlinux.org/viewtopic.php?pid=330560
It now looks at the dependencies between packages, so the results are much more accurate.
Last edited by venox (2008-02-14 21:37:31)
Offline