#1 2009-05-30 06:37:51

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

bigpkg - find packages that require a lot of space on your system

Edit: with v0.2 this now runs fast!  (less than 1/2 second)

Want to know which packages take up the most space on your system?  Well, you could just use:

pacman -Qi | grep -e "Name" -e "Installed Size"

and sort it yourself (or do some awk stuff for it to be sorted for you...).  But that does not really tell you the "actual" amount of space required for a package.  E.g. on my system skype is the only package using qt, so having skype installed costs me at least 105MB (~20MB for skype and ~85MB for qt).

I try to address this with bigpkg (requires python).  It works by calculating the size of a package as its actual size plus the sum of its share of its dependencies (for each dependency, add the dep size divided by the number of packages needing that dep).
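The weighting described above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the bigpkg source: the names install_size, all_deps and shared_usage are assumptions, the sizes are invented, and all_deps is assumed to already hold the full (direct and indirect) dependency set of each package.

```python
# Toy data: sizes in K and full dependency sets (all values are made up).
install_size = {"skype": 20000.0, "qt": 85000.0, "opera": 30000.0, "glibc": 10000.0}
all_deps = {
    "skype": {"qt", "glibc"},
    "opera": {"glibc"},
    "qt": {"glibc"},
    "glibc": set(),
}


def shared_usage(install_size, all_deps):
    """Own size plus an equal share of the size of every dependency."""
    # Count how many packages need each package.
    needed_by = {pkg: 0 for pkg in install_size}
    for deps in all_deps.values():
        for dep in deps:
            needed_by[dep] += 1

    usage = {}
    for pkg, deps in all_deps.items():
        # Each dependency's size is split evenly among the packages needing it.
        usage[pkg] = install_size[pkg] + sum(
            install_size[dep] / needed_by[dep] for dep in deps
        )
    return usage


usage = shared_usage(install_size, all_deps)
for pkg in sorted(usage, key=usage.get):
    print("%s: %.1fK" % (pkg, usage[pkg]))
```

In this toy setup skype carries the whole of qt (nothing else needs it) but only a third of glibc, which is shared with qt and opera.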

This is the output of the top 10 packages on my system:

...
deluge: 81306.0K
jabref: 83594.0K
wine: 85750.0K
r: 88727.0K
kernel26: 99609.0K
skype: 106191.0K
acroread: 150832.0K
neverball: 191562.0K
texlive-bibtexextra: 206465.0K
openoffice-base: 370234.0K

The size shown does not reflect how much can be saved by removing the package for two reasons:
1. Bad handling of dependency chains.  If we have a dep chain A -> B -> C and packages B,C are only required by A, the contribution of C will only be size(C)/2 as both A and B have dependencies on C.
2. Each package is attributed a portion of its dependencies' size (e.g. on my system, jabref is one of two packages using openjdk6, so removing it will not save 83MB as ~75MB of that is still needed for the other package requiring openjdk6.)
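The first limitation can be seen in a small worked example. The sketch below uses invented sizes for the chain A -> B -> C and hypothetical variable names; it only illustrates the arithmetic, not how bigpkg is written.

```python
# Made-up sizes (in K) for the chain A -> B -> C.
size = {"A": 100.0, "B": 50.0, "C": 40.0}
# Full dependency sets: A needs B and (through B) C; B needs C.
all_deps = {"A": {"B", "C"}, "B": {"C"}, "C": set()}

# Number of packages that need each package: B -> 1 (A), C -> 2 (A and B).
needed_by = {p: sum(p in deps for deps in all_deps.values()) for p in size}

# Size attributed to each package under the weighting scheme.
attributed = {
    p: size[p] + sum(size[d] / needed_by[d] for d in all_deps[p])
    for p in size
}

# Removing A would orphan B and C, so all three sizes could be freed.
freed_by_removing_a = size["A"] + size["B"] + size["C"]

print(attributed["A"], freed_by_removing_a)
```

Here A is attributed 170K (100 + 50 + 40/2), although removing A and its orphaned dependencies would free 190K.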

Last edited by Allan (2009-05-31 13:48:30)

Offline

#2 2009-05-30 08:12:19

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: bigpkg - find packages that require a lot of space on your system

trying this out now, btw, you can speed it up by tracking the dependencies yourself.
if i get time later i'll try and do that, it's along the lines of
read all installed packages (/var/lib/pacman/local/*) into a dictionary of objects holding the necessary info
then after that go through it and for each package's dependency pull the required size details out of packages[dependency]
it shouldn't use too much memory, and each file is read only once, part of the slowness is that pactree is called so many times, and it must re-read every file it needs every time

Offline

#3 2009-05-30 08:16:46

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: bigpkg - find packages that require a lot of space on your system

yeah - I have a more efficient implementation (doing what you suggest) but I need to fix a small bug in it...

Offline

#4 2009-05-30 09:40:01

Army
Member
Registered: 2007-12-07
Posts: 1,784

Re: bigpkg - find packages that require a lot of space on your system

.
.
.
        yasm
        zip
        zlib
Assessing package sizes...
Calculating package usage...
Traceback (most recent call last):
  File "/home/army/.bin/bigpkg", line 52, in <module>
    pkg_usage[pkg] = install_size[pkg]
KeyError: 'aaphoto'

Python and pacman-contrib are installed, aaphoto is my first package alphabetically.

Last edited by Army (2009-05-30 09:40:39)

Offline

#5 2009-05-30 10:03:47

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: bigpkg - find packages that require a lot of space on your system

Hmmm....  not robust to non-English pacman usage!  Another reason to move to directly reading from /var/lib/pacman/local

Replace line 35:

    pkg_info = os.popen("pacman -Qi " + pkg + " 2> /dev/null").read().split("\n")

with

    pkg_info = os.popen("LC_ALL=C pacman -Qi " + pkg + " 2> /dev/null").read().split("\n")

and that should fix it.

Edit: I uploaded a version of the script with that fixed.
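The same fix can be written without building a shell command string. The sketch below assumes Python 3 and uses two hypothetical helper names (c_locale_env, query_package_info); it is not the code in the uploaded script.

```python
import os
import subprocess


def c_locale_env(base_env=None):
    """Return a copy of the environment with LC_ALL forced to "C"."""
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = "C"
    return env


def query_package_info(pkg):
    """Run `pacman -Qi <pkg>` with untranslated field names; return its lines."""
    result = subprocess.run(
        ["pacman", "-Qi", pkg],      # argument list, so no shell quoting issues
        env=c_locale_env(),          # same effect as the LC_ALL=C prefix
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,   # mirrors the 2> /dev/null redirect
        universal_newlines=True,
    )
    return result.stdout.split("\n")
```

With the locale forced to C, field names such as "Installed Size" stay in English whatever the user's language settings are.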

Offline

#6 2009-05-30 13:02:15

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: bigpkg - find packages that require a lot of space on your system

#!/usr/bin/env python
#
# bigpkg : find packages that require a lot of space on your system
# v0.1 (2009-05-30)
#
# Copyright (C) 2009 Allan McRae <allan@archlinux.org>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

import os, re

DB_LOCAL_PATH = '/var/lib/pacman/local'
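# Leading name of an entry; stops at version operators such as ">=".
# The character class also allows '.', '+' and '@', which occur in package names.
RX_NAME = re.compile(r'([\w@.+-]+)', re.UNICODE)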


pkg_list = {}
pkg_deps = {}
pkg_alias = {}
install_size = {}
needed_by = {}

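def parse_options(path):
    """Parse a pacman local-db file into a dict of {section: [values]}."""
    options = {}
    section = None

    # A missing file (e.g. no 'depends' for a package) yields no sections.
    if not os.path.isfile(path):
        return options

    with open(path) as fp:
        for ln in fp:
            ln = ln.strip()
            if not ln:
                continue

            if ln[0] == '%':
                # A line such as "%NAME%" starts the section "NAME".
                section = ln[1:-1]
                options[section] = []
            elif section:
                mat = RX_NAME.match(ln) # get rid of the version comparison
                if mat:
                    options[section].append(mat.group(1))
    return options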

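def track_depends(name):
    # Reuse the cached dependency set if it has already been filled in.
    deps = pkg_deps.get(name)
    if deps:
        return deps
    # Otherwise add each installed direct dep, then merge in that dep's own deps.
    for dep in pkg_list.get(name):
        if dep in pkg_list and dep not in deps:
            deps.add(dep)
            d_deps = track_depends(dep)
            deps |= d_deps
    return deps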


print('Gathering package stats...')

for l_dir in os.listdir(DB_LOCAL_PATH):
    pkg_dir = os.path.join(DB_LOCAL_PATH, l_dir)
    if not os.path.isdir(pkg_dir):
        continue  # skip stray entries that are not package directories

    opt = parse_options(os.path.join(pkg_dir, 'desc'))
    if 'NAME' in opt:
        name = opt['NAME'][0]
        if 'SIZE' in opt:
            size = int(opt['SIZE'][0]) / 1024.0  # bytes -> K
        else:
            size = 0.0

        opt = parse_options(os.path.join(pkg_dir, 'depends'))
        if 'DEPENDS' in opt:
            deps = set(opt.get('DEPENDS'))
        else:
            deps = set()

        install_size[name] = size
        needed_by[name] = 0
        pkg_list[name] = deps
        pkg_deps[name] = set()
        

print ("Finding package dependencies...")
for pkg in pkg_list:
    pkg_deps[pkg] |= track_depends(pkg)

for pkg in pkg_list:
    for dep in pkg_deps[pkg]:
        needed_by[dep] += 1

pkg_usage = {}
for pkg in pkg_list:
    if needed_by[pkg] == 0:
        pkg_usage[pkg] = install_size[pkg]
        for dep in pkg_deps[pkg]:
            pkg_usage[pkg] += install_size[dep] / needed_by[dep]

# Sort by attributed size (ascending) so the largest packages are printed last.
pkg_usage = sorted([size, name] for name, size in pkg_usage.items())

for size, name in pkg_usage:
    print (name + ": " + str(round(size)) + "K")

this doesn't track the dependencies recursively so the size output is wrong but it shows openjdk6 which is missed by the original
it also shows more packages than the original, i assume that's because of the lack of recursive dep tracking
useful bits are prolly only the parse_options() and its uses
--
testing the new script

Last edited by kumyco (2009-05-30 14:37:23)

Offline

#7 2009-05-30 13:11:41

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: bigpkg - find packages that require a lot of space on your system

Thanks, I will have a look at it later.  BTW, the original does not miss openjdk6, it just realises that it is a dep for packages needing java_runtime.

Offline

#8 2009-05-30 13:34:48

keenerd
Package Maintainer (PM)
Registered: 2007-02-22
Posts: 647
Website

Re: bigpkg - find packages that require a lot of space on your system

Slightly self promotional thread jacking, but I'd like to point out that

pacgraph --console

does the exact same thing (but without missing random deps), and only takes a few seconds.

It's also written in python (but without pacman-contrib) so feel free to borrow the fast bits of code :-)

Last edited by keenerd (2009-05-30 13:48:24)

Offline

#9 2009-05-30 14:45:02

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: bigpkg - find packages that require a lot of space on your system

updated script with recursive dependency tracking, openjdk still appears in the output so you might wanna see what's going on there.
i changed pkg_list to a dictionary so i keep track of the deps listed in the depends file, then track each dep and push it into pkg_deps
each track tries to retrieve the cached set first, packages listed as provides are not tracked (maybe that has something to do with openjdk appearing)
apart from minor differences with the package sizes it appears to be similar to the output of the original, so i guess it kinda works properly.
it takes less than half a second, not sure how much of that (i doubt it's a lot) is due to disk cache

Offline

#10 2009-05-30 18:00:23

Army
Member
Registered: 2007-12-07
Posts: 1,784

Re: bigpkg - find packages that require a lot of space on your system

@Allan: Yepp, works now :)

Offline

#11 2009-05-31 13:50:46

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: bigpkg - find packages that require a lot of space on your system

I have uploaded a new version of this script (v0.2).  It parses /var/lib/pacman/local directly to get package information as suggested by kumyco.  The major difference from kumyco's script above is that it handles "provides".  This means the script no longer needs pacman-contrib installed and it is FAST!
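One way "provides" handling could work is sketched below. This is an assumption about the general idea, not the v0.2 code: the function names and the toy data are hypothetical, with openjdk6 providing java_runtime as mentioned earlier in the thread.

```python
def build_provider_map(provides):
    """Map each provided (virtual) name to the installed packages providing it."""
    providers = {}
    for pkg, names in provides.items():
        for name in names:
            providers.setdefault(name, set()).add(pkg)
    return providers


def resolve_deps(deps, installed, providers):
    """Swap dependency names that are not installed packages for their providers."""
    resolved = set()
    for dep in deps:
        if dep in installed:
            resolved.add(dep)
        else:
            # A virtual name: use whichever installed packages provide it.
            resolved |= providers.get(dep, set())
    return resolved


# Toy data: jabref depends on the virtual name java_runtime.
installed = {"jabref", "openjdk6"}
provides = {"openjdk6": {"java_runtime"}}
providers = build_provider_map(provides)

print(resolve_deps({"java_runtime"}, installed, providers))
```

With a mapping like this, the size of openjdk6 can be shared among the packages that depend on java_runtime instead of showing up as an unneeded top-level package.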

Offline

#12 2009-05-31 14:10:54

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: bigpkg - find packages that require a lot of space on your system

great, after looking at the output a little closer, i finally went about removing some of the stuff i don't use, now i have about 500mb more to download other useless stuff. takes a second and a half after a cold boot

Offline

#13 2009-06-06 03:03:28

japetto
Member
From: Chicago, IL US
Registered: 2006-07-02
Posts: 183

Re: bigpkg - find packages that require a lot of space on your system

Great package.  Thanks Allan!

Offline

#14 2009-07-07 20:56:22

Pank
Member
From: IT
Registered: 2009-06-13
Posts: 371

Re: bigpkg - find packages that require a lot of space on your system

Hi,
Useful script!
Could this easily be adapted to output human-readable usage estimates, i.e. similar to du -h?
Otherwise, would getting the number in megabytes just be a matter of changing 1024 in line 69?
Cheers,
Rasmus


Arch x64 on Thinkpad X200s/W530

Offline

#15 2009-07-07 21:23:00

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: bigpkg - find packages that require a lot of space on your system

Adding an extra "/1024" will make it into MB.  Not sure if I will ever get around to automatically adjusting large values to MB though...
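For anyone wanting du -h style output, automatic unit selection could be sketched like this. The helper name human_size and the unit letters are illustrative assumptions, not part of bigpkg; it expects a size in K, as the script prints.

```python
def human_size(kib):
    """Format a size given in K, dividing by 1024 until it drops below 1024."""
    units = ["K", "M", "G", "T"]
    value = float(kib)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024.0 or unit == units[-1]:
            return "%.1f%s" % (value, unit)
        value /= 1024.0


print(human_size(370234.0))
```

For example, the 370234.0K reported for openoffice-base in the first post would be shown as 361.6M.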

Offline

#16 2009-07-07 21:57:49

Renan Birck
Member
From: Brazil
Registered: 2007-11-11
Posts: 401
Website

Re: bigpkg - find packages that require a lot of space on your system

I was able to get about 200MB back. Thanks for this app.

Offline

#17 2010-05-11 06:16:14

fphillips
Member
From: Austin, TX
Registered: 2009-01-24
Posts: 202

Re: bigpkg - find packages that require a lot of space on your system

It found these old leftover files.

[fp@viron ~]$ ./bigpkg.sh 
Traceback (most recent call last):
  File "./bigpkg.sh", line 81, in <module>
    parse_package_info(pkg, dir)
  File "./bigpkg.sh", line 37, in parse_package_info
    f = open(file)
IOError: [Errno 20] Not a directory: '/var/lib/pacman/local/xf86-video-vesa-2.2.0-1/depends'

[fp@viron ~]$ ls -ld /var/lib/pacman/local/*|grep -v ^d
brwxrwxr-- 1 1919090633 1814788876 128, 165 Apr  5  1935 /var/lib/pacman/local/supertuxkart-0.6.1a-1
s-w-r--rw- 1 2950775117  642501627        0 Jan 26  1992 /var/lib/pacman/local/xf86-video-vesa-2.2.0-1
p-wx-wSrwx 1 3455059560  573774742        0 Dec 20  1949 /var/lib/pacman/local/xorg-fonts-alias-1.0.1-2

Last edited by fphillips (2010-05-11 06:25:02)

Offline

#18 2010-05-11 06:18:13

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: bigpkg - find packages that require a lot of space on your system

No, it appears you have lost your local pacman database.   Try a "pacman -Q" and see what pacman thinks you have installed...

Offline

#19 2010-05-11 06:26:11

fphillips
Member
From: Austin, TX
Registered: 2009-01-24
Posts: 202

Re: bigpkg - find packages that require a lot of space on your system

They were just old files from previous package versions that didn't get cleaned up.

Offline

#20 2010-05-11 06:28:18

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: bigpkg - find packages that require a lot of space on your system

Ah...  pacman-3.4 (when it is released) will flag those for you.  You can just delete them yourself.
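Stray entries like those can be listed with a few lines of Python. This is a sketch with a hypothetical function name (stray_entries); review its output before deleting anything, since newer pacman versions may keep a regular file of their own in that directory.

```python
import os


def stray_entries(db_path="/var/lib/pacman/local"):
    """Return the names in db_path that are not directories."""
    return sorted(
        name
        for name in os.listdir(db_path)
        if not os.path.isdir(os.path.join(db_path, name))
    )
```

Calling stray_entries() on the system above would be expected to report the same three names that the ls | grep -v ^d pipeline showed.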

Offline

#21 2010-05-11 09:04:19

lymphatik
Member
From: Somewhere else
Registered: 2009-03-07
Posts: 119

Re: bigpkg - find packages that require a lot of space on your system

It seems that it is not working: for xmonad I end up with xmonad-contrib: 173176K

Whereas pacgraph gives me: 709MB xmonad-contrib, which accounts for ghc (600MB)

Offline

#22 2010-05-11 11:08:00

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: bigpkg - find packages that require a lot of space on your system

Read the limitations in the intro

Offline

#23 2010-05-26 10:37:24

Japanlinux
Member
Registered: 2010-05-18
Posts: 173

Re: bigpkg - find packages that require a lot of space on your system

This is a nice script. If I understand correctly, the results given to me are not a quantitative sum, but a qualitative one: it's showing how space-efficient packages are? Which will usually put skype near the top, since, for many people, it's the only program to use qt. Is this close to the real concept?

I also have a request. This is kinda a follow-up on this script. Once I see what the big programs are, I can query what its dependencies are. Then I can choose a dependency and -Qi it as well to see what programs use it. My request here: is there a way you can make a small script for me to show what packages use a chosen package without all the extra info of -i?

ex: (CMD) qt
Results-- skype opera qbittorrent

I would appreciate it. Of course, if there is already an easy way to do this, I would like to know. The pacman man pages didn't seem to show a way...


btw, you recommended this program to me on another thread; I wanted to thank you for that :D

Offline

#24 2010-05-26 10:56:52

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: bigpkg - find packages that require a lot of space on your system

Japanlinux wrote:

This is a nice script. If I understand correctly, the results given to me are not a quantitative sum, but a qualitative one: it's showing how space-efficient packages are? Which will usually put skype near the top, since, for many people, it's the only program to use qt. Is this close to the real concept?

Correct.  It does some sort of weighting of the dependencies across the packages that need them.

Japanlinux wrote:

I also have a request. This is kinda a follow-up on this script. Once I see what the big programs are, I can query what its dependencies are. Then I can choose a dependency and -Qi it as well to see what programs use it. My request here: is there a way you can make a small script for me to show what packages use a chosen package without all the extra info of -i?

ex: (CMD) qt
Results-- skype opera qbittorrent

I would appreciate it. Of course, if there is already an easy way to do this, I would like to know. The pacman man pages didn't seem to show a way...

I do not think there is an easy way to do this with pacman.  Try this (change glibc to the package you want):

grep "^glibc$" $(find /var/lib/pacman/local/ -name depends) | cut -f6 -d"/" | cut -f1 -d"-" | sort
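Note that the one-liner only matches unversioned dependency lines, and cut -f1 -d"-" keeps only the text before the first hyphen, so a name such as xf86-video-vesa would be shortened. A Python sketch of the same lookup that avoids both issues is given below. The function names are hypothetical, and reading %DEPENDS% from desc as well as depends is an assumption made to cover databases that store it there.

```python
import os
import re

# Leading package name of a dependency line such as "glibc>=2.10".
DEP_NAME = re.compile(r"[^<>=\s]+")


def read_section(path, section):
    """Return the lines listed under %SECTION% in a pacman local-db file."""
    values, current = [], None
    if not os.path.isfile(path):
        return values
    with open(path) as fp:
        for line in fp:
            line = line.strip()
            if line.startswith("%") and line.endswith("%"):
                current = line[1:-1]
            elif line and current == section:
                values.append(line)
    return values


def required_by(target, db_path="/var/lib/pacman/local"):
    """Return installed packages that list `target` as a direct dependency."""
    users = []
    for entry in sorted(os.listdir(db_path)):
        pkg_dir = os.path.join(db_path, entry)
        if not os.path.isdir(pkg_dir):
            continue
        # Collect %DEPENDS% from both files (assumption: either may hold it).
        names = set()
        for fname in ("depends", "desc"):
            for dep in read_section(os.path.join(pkg_dir, fname), "DEPENDS"):
                match = DEP_NAME.match(dep)
                if match:
                    names.add(match.group(0))
        if target in names:
            # Prefer the real package name over the versioned directory name.
            own = read_section(os.path.join(pkg_dir, "desc"), "NAME")
            users.append(own[0] if own else entry)
    return users
```

Usage would be along the lines of print(" ".join(required_by("qt"))), which lists every installed package that names qt in its dependencies.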

Offline

#25 2010-05-26 11:11:10

Japanlinux
Member
Registered: 2010-05-18
Posts: 173

Re: bigpkg - find packages that require a lot of space on your system

o.O that works very well. thank you. ;) As a quick result, I figured out why virtualbox is so high on my list. It has gcc (~80MB) which, according to the script you just gave me, only VB uses. Both scripts in conjunction are very helpful for analyzing my system :D

Last edited by Japanlinux (2010-05-26 11:14:31)

Offline
