Edit: with v0.2 this now runs fast! (less than 1/2 second)
Want to know which packages take up the most space on your system? Well, you could just use:
pacman -Qi | grep -e "Name" -e "Installed Size"
and sort it yourself (or do some awk stuff to have it sorted for you...). But that does not really tell you the "actual" amount of space a package requires. E.g. on my system skype is the only package using qt, so having skype installed costs me at least 105MB (~20MB for skype and ~85MB for qt).
I try to address this with bigpkg (requires python). It works by calculating the size of a package as its actual size plus the sum of its share of its dependencies (for each dependency, add the dep size divided by the number of packages needing that dep).
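The weighting described above can be sketched roughly as follows. This is only an illustration, not the actual bigpkg source: the function name weighted_sizes, the input dictionaries and the sizes (loosely taken from the skype/qt figures above) are assumptions made for the example.

```python
def weighted_sizes(install_size, deps):
    """Return {package: own size + its share of each dependency}.

    install_size: {name: size}; deps: {name: set of dependency names}.
    Both structures are hypothetical inputs for this sketch.
    """
    # Count how many packages need each dependency.
    needed_by = {}
    for dep_set in deps.values():
        for dep in dep_set:
            needed_by[dep] = needed_by.get(dep, 0) + 1

    result = {}
    for pkg, size in install_size.items():
        total = float(size)
        # Add each dependency's size divided by the number of packages needing it.
        for dep in deps.get(pkg, ()):
            total += install_size[dep] / float(needed_by[dep])
        result[pkg] = total
    return result


# Sizes in MB; skype is the only package that needs qt.
sizes = {"skype": 20, "qt": 85}
deps = {"skype": {"qt"}, "qt": set()}
print(weighted_sizes(sizes, deps)["skype"])
```

If a second package also depended on qt, each of the two would be charged only half of qt's size.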
This is the output of the top 10 packages on my system:
...
deluge: 81306.0K
jabref: 83594.0K
wine: 85750.0K
r: 88727.0K
kernel26: 99609.0K
skype: 106191.0K
acroread: 150832.0K
neverball: 191562.0K
texlive-bibtexextra: 206465.0K
openoffice-base: 370234.0K
The size shown does not reflect how much can be saved by removing the package for two reasons:
1. Bad handling of dependency chains. If we have a dep chain A -> B -> C and packages B and C are only required by A, the contribution of C to A will only be size(C)/2, as both A and B have dependencies on C.
2. Each package is attributed only a portion of its dependencies' size (e.g. on my system, jabref is one of two packages using openjdk6, so removing it will not save 83MB, as ~75MB of that is still needed by the other package requiring openjdk6).
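A small worked example of the first limitation, as a sketch with made-up numbers (the package names A, B, C and the sizes are purely illustrative). It assumes dependencies are counted transitively, as in the chain described above.

```python
# Hypothetical chain A -> B -> C, sizes in MB (illustrative numbers only).
size = {"A": 10.0, "B": 20.0, "C": 40.0}
# Full (transitive) dependency sets: A needs B and C, B needs C.
all_deps = {"A": {"B", "C"}, "B": {"C"}, "C": set()}

# Number of packages that depend on each package.
needed_by = {name: 0 for name in size}
for dep_set in all_deps.values():
    for dep in dep_set:
        needed_by[dep] += 1

# Score reported for A: own size plus a share of each dependency.
score_a = size["A"] + sum(size[d] / needed_by[d] for d in all_deps["A"])

# Space actually freed if A and the then-orphaned B and C are removed.
freed = size["A"] + size["B"] + size["C"]

print(score_a, freed)
```

With these numbers A is reported as 50MB, although removing it together with B and C would free 70MB.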
Last edited by Allan (2009-05-31 13:48:30)
Online
trying this out now. btw, you can speed it up by tracking the dependencies yourself.
if i get time later i'll try and do that. it's along the lines of:
read all installed packages (/var/lib/pacman/local/*) into a dictionary of objects holding the necessary info,
then go through it and, for each package's dependencies, pull the required size details out of packages[dependency]
it shouldn't use too much memory, and each file is read only once. part of the slowness is that pactree is called so many times, and it must re-read every file it needs every time
Offline
yeah - I have a more efficient implementation (doing what you suggest) but I need to fix a small bug in it...
Online
.
.
.
yasm
zip
zlib
Assessing package sizes...
Calculating package usage...
Traceback (most recent call last):
File "/home/army/.bin/bigpkg", line 52, in <module>
pkg_usage[pkg] = install_size[pkg]
KeyError: 'aaphoto'
Python and pacman-contrib are installed, aaphoto is my first package alphabetically.
Last edited by Army (2009-05-30 09:40:39)
Offline
Hmmm.... not robust to non-English pacman usage! Another reason to move to directly reading from /var/lib/pacman/local
Replace line 35:
pkg_info = os.popen("pacman -Qi " + pkg + " 2> /dev/null").read().split("\n")
with
pkg_info = os.popen("LC_ALL=C pacman -Qi " + pkg + " 2> /dev/null").read().split("\n")
and that should fix it.
Edit: I uploaded a version of the script with that fixed.
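For anyone calling pacman from Python in a similar way, the same idea can also be expressed by passing an environment with LC_ALL=C to the subprocess module instead of prefixing the shell command. A rough sketch only; the helper names c_locale_env and query_info are made up here and are not part of bigpkg.

```python
import os
import subprocess


def c_locale_env():
    """Return a copy of the current environment with the locale forced to C."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"
    return env


def query_info(pkg):
    """Run `pacman -Qi pkg` with untranslated output and return its lines.

    Hypothetical helper; it needs pacman to be installed.
    """
    proc = subprocess.Popen(["pacman", "-Qi", pkg],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,  # errors are discarded
                            env=c_locale_env(),
                            universal_newlines=True)
    out, _ = proc.communicate()
    return out.split("\n")
```

Passing the package name as a list element also avoids going through the shell.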
Online
#!/usr/bin/env python
#
# bigpkg : find packages that require a lot of space on your system
# v0.1 (2009-05-30)
#
# Copyright (C) 2009 Allan McRae <allan@archlinux.org>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import os, re

DB_LOCAL_PATH = '/var/lib/pacman/local'
# Package names may contain '.', '+' and '@', so match everything up to
# a version comparison operator or whitespace.
RX_NAME = re.compile(r'([^<>=\s]+)', re.UNICODE)

pkg_list = {}      # package name -> set of direct dependencies
pkg_deps = {}      # package name -> set of all (recursive) dependencies
pkg_alias = {}
install_size = {}  # package name -> installed size in K
needed_by = {}     # package name -> number of packages depending on it

def parse_options(path):
    # Parse a pacman local db file into {section: [values]}.
    options = {}
    section = None
    with open(path) as fp:
        for ln in fp:
            ln = ln.strip()
            if not ln:
                continue
            if ln[0] == '%':
                section = ln[1:-1]
                options[section] = []
            elif section:
                mat = RX_NAME.match(ln) # get rid of the version comparison
                if mat:
                    options[section].append(mat.group(1))
    return options

def track_depends(name):
    # Reuse the cached set if this package has already been tracked.
    deps = pkg_deps.get(name)
    if deps:
        return deps
    for dep in pkg_list.get(name):
        if dep in pkg_list and dep not in deps:
            deps.add(dep)
            d_deps = track_depends(dep)
            deps |= d_deps
    return deps

print('Gathering package stats...')
for l_dir in os.listdir(DB_LOCAL_PATH):
    opt = parse_options('%s/%s/desc' % (DB_LOCAL_PATH, l_dir))
    if 'NAME' not in opt:
        continue
    name = opt['NAME'][0]
    if 'SIZE' in opt:
        size = int(opt['SIZE'][0]) / 1024.0
    else:
        size = 0.0
    opt = parse_options('%s/%s/depends' % (DB_LOCAL_PATH, l_dir))
    if 'DEPENDS' in opt:
        deps = set(opt.get('DEPENDS'))
    else:
        deps = set()
    install_size[name] = size
    needed_by[name] = 0
    pkg_list[name] = deps
    pkg_deps[name] = set()

print("Finding package dependencies...")
for pkg in pkg_list:
    pkg_deps[pkg] |= track_depends(pkg)

for pkg in pkg_list:
    for dep in pkg_deps[pkg]:
        needed_by[dep] += 1

# Only packages that nothing else depends on get a usage figure.
pkg_usage = {}
for pkg in pkg_list:
    if needed_by[pkg] == 0:
        pkg_usage[pkg] = install_size[pkg]
        for dep in pkg_deps[pkg]:
            pkg_usage[pkg] += install_size[dep] / needed_by[dep]

pkg_usage = [[v[1], v[0]] for v in pkg_usage.items()]
pkg_usage.sort()
for pkg in range(len(pkg_usage)):
    print(pkg_usage[pkg][1] + ": " + str(round(pkg_usage[pkg][0])) + "K")
this doesn't track the dependencies recursively so the size output is wrong but it shows openjdk6 which is missed by the original
it also shows more packages than the original, i assume that's because of the lack of recursive dep tracking
useful bits are prolly only the parse_options() and its uses
--
testing the new script
Last edited by kumyco (2009-05-30 14:37:23)
Offline
Thanks, I will have a look at it later. BTW, the original does not miss openjdk6, it just realises that it is a dep for packages needing java_runtime.
Online
Slightly self promotional thread jacking, but I'd like to point out that
pacgraph --console
does the exact same thing (but without missing random deps), and only takes a few seconds.
It's also written in python (but without pacman-contrib) so feel free to borrow the fast bits of code :-)
Last edited by keenerd (2009-05-30 13:48:24)
Offline
updated script with recursive dependency tracking, openjdk still appears in the output so you might wanna see what's going on there.
i changed pkg_list to a dictionary so i keep track of the deps listed in the depends file, then track each deps and push it into pkg_deps
each track tries to retrieve the cached set first, packages listed as provides are not tracked (maybe that has something to do with openjdk appearing)
apart from minor differences with the package sizes it appears to be similar to the output of the original, so i guess it kinda works properly.
it takes less than half a second; not sure how much of that (i doubt it's a lot) is due to disk cache
Offline
@Allan: Yepp, works now
Offline
I have uploaded a new version of this script (v0.2). It parses /var/lib/pacman/local directly to get package information, as suggested by kumyco. The major difference from kumyco's script above is that it handles "provides". This means the script no longer needs pacman-contrib installed and it is FAST!
Online
great, after looking at the output a little closer, i finally went about removing some of the stuff i don't use, now i have about 500mb more to download other useless stuff. takes a second and a half after a cold boot
Offline
Great package. Thanks Allan!
Offline
Hi,
Useful script!
Could this easily be adapted to output human-readable usage estimates, i.e. similar to du -h?
Otherwise, would getting the number in megabytes just be a matter of changing 1024 in line 69?
Cheers,
Rasmus
Arch x64 on Thinkpad X200s/W530
Offline
Adding an extra "/1024" will turn it into MB. Not sure if I will ever get around to automatically adjusting large values to MB though...
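For the du -h style output asked about above, something along these lines could be dropped in. It is only a sketch: the function name human_size is made up, and it assumes the value passed in is in K, as in the script's output.

```python
def human_size(kib):
    """Format a size given in K, in the spirit of `du -h`."""
    value = float(kib)
    for unit in ("K", "M", "G"):
        if value < 1024.0:
            return "%.1f%s" % (value, unit)
        value /= 1024.0
    # Anything larger is reported in T.
    return "%.1f%s" % (value, "T")


print(human_size(370234.0))
```

For the openoffice-base figure from the first post this prints 361.6M.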
Online
It found these old leftover files.
[fp@viron ~]$ ./bigpkg.sh
Traceback (most recent call last):
File "./bigpkg.sh", line 81, in <module>
parse_package_info(pkg, dir)
File "./bigpkg.sh", line 37, in parse_package_info
f = open(file)
IOError: [Errno 20] Not a directory: '/var/lib/pacman/local/xf86-video-vesa-2.2.0-1/depends'
[fp@viron ~]$ ls -ld /var/lib/pacman/local/*|grep -v ^d
brwxrwxr-- 1 1919090633 1814788876 128, 165 Apr 5 1935 /var/lib/pacman/local/supertuxkart-0.6.1a-1
s-w-r--rw- 1 2950775117 642501627 0 Jan 26 1992 /var/lib/pacman/local/xf86-video-vesa-2.2.0-1
p-wx-wSrwx 1 3455059560 573774742 0 Dec 20 1949 /var/lib/pacman/local/xorg-fonts-alias-1.0.1-2
Last edited by fphillips (2010-05-11 06:25:02)
Offline
No, it appears you have lost your local pacman database. Try a "pacman -Q" and see what pacman thinks you have installed...
Online
They were just old files from previous package versions that didn't get cleaned up.
Offline
Ah... pacman-3.4 (when it is released) will flag those for you. You can just delete them yourself.
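A script that walks /var/lib/pacman/local could also simply skip stray entries like these. A minimal sketch, assuming that a valid entry is a directory containing a desc file; the function name package_dirs is hypothetical and not taken from bigpkg.

```python
import os


def package_dirs(db_path):
    """Yield entries of db_path that look like package directories.

    Anything that is not a directory containing a `desc` file is skipped.
    """
    for entry in sorted(os.listdir(db_path)):
        full = os.path.join(db_path, entry)
        if os.path.isdir(full) and os.path.isfile(os.path.join(full, "desc")):
            yield full
```

Looping over package_dirs('/var/lib/pacman/local') instead of the raw os.listdir() result would then ignore leftover sockets, pipes and device nodes.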
Online
It seems that it is not working: for xmonad I end up with xmonad-contrib: 173176K
whereas pacgraph gives me 709MB for xmonad-contrib, which accounts for ghc (600MB)
Offline
Read the limitations in the intro
Online
This is a nice script. If I understand correctly, the results given to me are not a quantitative sum, but a qualitative one: it's showing how space-efficient packages are? Which will usually put skype near the top, since, for many people, it's the only program to use qt. Is this close to the real concept?
I also have a request. This is kinda a follow-up on this script. Once I see what the big programs are, I can query what their dependencies are. Then I can choose a dependency and -Qi it as well to see what programs use it. My request here: is there a way you can make a small script for me to show what packages use a chosen package without all the extra info of -i?
ex: (CMD) qt
Results-- skype opera qbittorrent
I would appreciate it. Of course, if there is already an easy way to do this, I would like to know. The pacman man pages didn't seem to show a way...
btw, you recommended this program to me on another thread; I wanted to thank you for that
Offline
This is a nice script. If I understand correctly, the results given to me are not a quantitative sum, but a qualitative one: it's showing how space-efficient packages are? Which will usually put skype near the top, since, for many people, it's the only program to use qt. Is this close to the real concept?
Correct. It does some sort of weighting of the dependencies across the packages that need them.
I also have a request. This is kinda a follow-up on this script. Once I see what the big programs are, I can query what their dependencies are. Then I can choose a dependency and -Qi it as well to see what programs use it. My request here: is there a way you can make a small script for me to show what packages use a chosen package without all the extra info of -i?
ex: (CMD) qt
Results-- skype opera qbittorrent
I would appreciate it. Of course, if there is already an easy way to do this, I would like to know. The pacman man pages didn't seem to show a way...
I do not think there is an easy way to do this with pacman. Try this (change glibc to the package you want):
grep "^glibc$" $(find /var/lib/pacman/local/ -name depends) | cut -f6 -d"/" | sed 's/-[^-]*-[^-]*$//' | sort
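The same lookup could be done in Python along these lines. This is a sketch under assumptions: it relies on the database layout discussed in this thread (a desc and a separate depends file per package directory), and the function names read_section and required_by are invented for the example. It reads the package name from %NAME% in desc rather than from the directory name, and it also matches versioned entries such as "qt>=4.5".

```python
import os


def read_section(path, section):
    """Return the lines listed under %SECTION% in a pacman db file."""
    values, current = [], None
    with open(path) as fp:
        for line in fp:
            line = line.strip()
            if len(line) > 1 and line.startswith("%") and line.endswith("%"):
                current = line[1:-1]
            elif line and current == section:
                values.append(line)
    return values


def required_by(target, db_path="/var/lib/pacman/local"):
    """List installed packages that name `target` in their depends file."""
    users = []
    for entry in os.listdir(db_path):
        desc = os.path.join(db_path, entry, "desc")
        depends = os.path.join(db_path, entry, "depends")
        if not (os.path.isfile(desc) and os.path.isfile(depends)):
            continue
        names = []
        for dep in read_section(depends, "DEPENDS"):
            # Strip version constraints such as "qt>=4.5" before comparing.
            for op in ("<", ">", "="):
                dep = dep.split(op)[0]
            names.append(dep)
        if target in names:
            users.extend(read_section(desc, "NAME"))
    return sorted(users)
```

Calling required_by("qt") would then return a sorted list of the installed packages that depend directly on qt.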
Online
o.O that works very well. thank you. As a quick result, I figured out why virtualbox is so high on my list. It has gcc (~80MB) which, according to the command you just gave me, only VB uses. Both scripts in conjunction are very helpful for analyzing my system
Last edited by Japanlinux (2010-05-26 11:14:31)
Offline