
#1 2009-02-15 23:14:56

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963

A script to find optimal pkg combinations.

Users are often surprised when they go to install a simple program and get smacked by a "total installed size: xxx MB" message once pacman figures out which deps they need. Often, though, this is less of a problem than it looks, because you'll want several other apps that share some of those dependencies, so the overall cost is less than the sum of the individual installations.
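
To put rough numbers on the shared-dependency point, here is a minimal standalone sketch. The package names, sizes and dependency lists are made up for illustration and have nothing to do with real packages; the only idea shown is that each dependency is counted once, however many apps need it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sizes (in K) and dependency lists, for illustration only.
my %size = (app_a => 500, app_b => 700, toolkit => 20000, libfoo => 3000);
my %deps = (app_a => ['toolkit'], app_b => ['toolkit', 'libfoo']);

# Return the given pkgs plus everything they pull in, each listed once.
sub closure
{
  my %seen;
  my @todo = @_;
  while (defined(my $pkg = shift @todo))
  {
    next if $seen{$pkg}++;
    push @todo, @{ $deps{$pkg} || [] };
  }
  return keys %seen;
}

# Sum the sizes of the pkgs and their deps.
sub total
{
  my $sum = 0;
  $sum += $size{$_} foreach closure(@_);
  return $sum;
}

printf "app_a alone:   %d K\n", total('app_a');
printf "app_b alone:   %d K\n", total('app_b');
printf "both together: %d K\n", total('app_a', 'app_b');
```

With these made-up numbers the apps cost 20500 K and 23700 K on their own but only 24200 K together, because the 20000 K toolkit is shared.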

I've written a script that can calculate the combinations of packages from different sets that will produce the smallest installation size. It's easier to make this clear with a few examples:

Let's say that you want a torrent client, a plotting program, an editor, a multimedia player, a file manager and an ftp client. You've narrowed your choices down as follows:
torrent client: transmission-gtk or deluge
plotting program: qtiplot or labplot
editor: gvim, emacs, medit or geany... but if you choose emacs, you want to have leafpad too
multimedia player: kaffeine, vlc or kdemultimedia
file manager: pcmanfm or thunar, but if you choose thunar, you want feh to set the desktop bg because unlike pcmanfm, it can't
ftp client: filezilla, kftpgrabber or gproftpd

Figuring out by hand which combination gives you the smallest install size would be a pain. To calculate this with the script, you would create a text file as follows:

transmission-gtk deluge
qtiplot labplot
gvim emacs+leafpad medit geany
kaffeine vlc kdemultimedia
pcmanfm thunar+feh
filezilla kftpgrabber gproftpd

and then run

/path/to/script /path/to/file

Each line is a separate category, and "+" expresses that two or more packages go together. Pass the path of the file to the script and it will calculate the optimal combination based on your current installation and show you the additional size (if several combinations end up being the same size, they will all be returned). If you want it to ignore installed packages and calculate the total size instead, invoke it with

/path/to/script /path/to/file 0
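
For anyone who wants to see the file format in isolation, here is a small sketch of how such a file could be read into categories, alternatives and "+" bundles. parse_sets is a made-up helper for this example and is not part of the script; it takes the file contents as a string so that it can be run without any file on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn the text of a package-set file into a list of categories.
# Each category is a list of alternatives, and each alternative is
# a list of pkgs that were joined with "+".
sub parse_sets
{
  my ($text) = @_;
  my @categories;
  foreach my $line (split /\n/, $text)
  {
    # skip empty lines and comments
    next if ($line =~ m/^\s*$/ or $line =~ m/^\s*#/);
    push @categories, [map { [split /\+/, $_] } split ' ', $line];
  }
  return @categories;
}

my $text = "pcmanfm thunar+feh\n# a comment\nfilezilla kftpgrabber gproftpd\n";
foreach my $category (parse_sets($text))
{
  print join (' | ', map { join (' and ', @{$_}) } @{$category}), "\n";
}
```

This prints one line per category, with the alternatives separated by "|" and bundled packages joined by "and".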

As another example (using groups), let's say that you have Xfce4 and want to move to either gnome or kde with openbox as your WM, vlc as your media player and either firefox or opera as your browser:

gnome kde
openbox+vlc
firefox opera

(openbox and vlc could have been on separate lines)
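
The search itself is plain brute force: try every way of picking one alternative per category and keep the cheapest. Here is a minimal sketch of that idea on toy data. All names and sizes are made up, and combinations and total_size are illustrative helpers rather than the subs used in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sizes (in K), deps and categories, for illustration only.
my %size = (gtk_app => 1000, qt_app => 800, gtk_editor => 600,
            qt_editor => 900, spell => 300, gtk => 20000, qt => 30000);
my %deps = (gtk_app => ['gtk'], qt_app => ['qt'],
            gtk_editor => ['gtk'], qt_editor => ['qt']);
my @categories = (['gtk_app', 'qt_app'], ['gtk_editor', 'qt_editor+spell']);

# Size of the given pkgs plus their deps, counting each pkg once.
sub total_size
{
  my @todo = @_;
  my %seen;
  my $sum = 0;
  while (defined(my $pkg = shift @todo))
  {
    next if $seen{$pkg}++;
    $sum += $size{$pkg};
    push @todo, @{ $deps{$pkg} || [] };
  }
  return $sum;
}

# Every way of picking one alternative from each category.
sub combinations
{
  my @combos = ([]);
  foreach my $category (@_)
  {
    @combos = map { my $combo = $_; map { [@{$combo}, $_] } @{$category} } @combos;
  }
  return @combos;
}

my ($best, @winners);
foreach my $combo (combinations(@categories))
{
  # expand "a+b" bundles into separate pkgs before sizing
  my $size = total_size(map { split /\+/, $_ } @{$combo});
  if (not defined $best or $size < $best)
  {
    $best = $size;
    @winners = ($combo);
  }
  elsif ($size == $best)
  {
    push @winners, $combo;
  }
}

print "minimum size: $best K\n";
print "set: @{$_}\n" foreach @winners;
```

With this toy data the winner is gtk_app plus gtk_editor at 21600 K, even though qt_app is the smaller package on its own, because both winners share the gtk dependency. The number of combinations is the product of the number of alternatives per line, so this approach only suits short lists.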

Here's the script. If there's interest in it, I'll flesh out the options, add a help message, package it and add it to my repo and the AUR. Also, I don't have a name for it yet. It was named "calcdep" while I wrote it and it's saved as "pkg-compare_installed_size" on my system for now, so suggestions are welcome along with any other feedback.

#!/usr/bin/perl
use strict;
use warnings;

#############################
########## GLOBALS ##########
#############################

my $pkg_info;
my $group_hash;
my $provides_hash;

#####################################
########## GET PACMAN INFO ##########
#####################################

# build hash that contains info from the available pacman repos
my $name = '';
my $sync_info = `LC_ALL=C pacman -Si`;
$sync_info =~ s/\n\s//g;
foreach my $line (split "\n",$sync_info)
{
  if ($line =~ m/^Name\s+:\s+(\S+)/)
  {
    $name = $1;
  }
  elsif ($name eq '')
  {
    next;
  }
  elsif ($line =~ m/^Groups\s+:\s+(\S.*)/)
  {
    foreach my $group (split (/\s+/,$1))
    {
      push @{ $group_hash->{$group} }, $name if ($group ne 'None');
    }
  }
  elsif ($line =~ m/^Depends On\s+:\s+(\S.*)/)
  {
    # strip version constraints such as >=1.0, =2 or <3 from each dep
    $pkg_info->{$name}->{'deps'} = [map {s/[<>=]{1,2}.+$//;$_} split (/\s+/,$1)] if ($1 ne 'None');
  }
  elsif ($line =~ m/^Provides\s+:\s+(\S.*)/)
  {
    foreach my $prov (map {s/[<>=]{1,2}.+$//;$_} split (/\s+/,$1))
    {
      push @{ $provides_hash->{$prov} }, $name if ($prov and $prov ne 'None');
    }
  }
  elsif ($line =~ m/^Installed Size\s+:\s+(\d+(?:\.\d+)?)/)
  {
    # the number is taken as-is and reported in K below; any unit suffix is ignored
    $pkg_info->{$name}->{'size'} = $1;
  }
}

# check which pkgs are installed
foreach my $pkg (split "\n",`pacman -Qq`)
{
  $pkg_info->{$pkg}->{'installed'} = 1;
}




##########################
########## MAIN ##########
##########################

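my ($file,$skip_installed) = @ARGV;
die "usage: $0 /path/to/file [0]\n" if not defined $file;
$skip_installed = 1 if not defined $skip_installed;

# index 0 is a placeholder because compute_size starts counting at 1
my $categories = [undef];
open (my $fh, '<', $file) or die "unable to open file $file: $!\n";
while (defined(my $line=<$fh>))
{
  chomp $line;
  next if ($line =~ m/^\s*$/ or $line =~ m/^\s*#/);
  # push instead of indexing by $. so that skipped lines don't leave gaps
  push @{$categories}, [split ' ',$line];
}
close $fh;
die "no package sets found in $file\n" if (@{$categories} < 2);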

my $results;
my $best;

&compute_size;


my @results = grep {$results->{$_} == $best} keys %{$results};
my $n = scalar @results;

printf "minimum size: %10.2f K\nsets: %20d\n", $best,$n;
my $i = 1;
for my $result (@results)
{
  $result =~ s/:=:/ /g;
  print "set $i: $result\n";
  $i += 1;
}



##########################
########## SUBS ##########
##########################

# appends deps to a list of pkgs
sub add_deps
{
  return keys %{&get_list_hash(\@_)};
}

# compute total size for each combination of pkgs
sub compute_size
{
  my ($i,@list) = @_;
  $i = 1 if not defined $i;
  foreach my $alternative (@{$categories->[$i]})
  {
    if (exists($categories->[$i+1]))
    {
      &compute_size($i+1,@list,$alternative);
    }
    else
    {
      my @true_list = map {split (/\+/, $_)} (@list,$alternative);
      my $size = &get_list_size(@true_list);
      $best = $size if (not defined($best) or $size < $best);
      $results->{join ':=:', @true_list}=$size;
    }
  }
}

# recurses the dep tree
sub get_list_hash
{
  my ($pkgs,$checked,$level) = @_;
  $checked = {} if not defined $checked;
  $level = 0 if not defined $level;
  foreach my $pkg (@{$pkgs})
  {
    #print (("\t" x $level),$pkg,"\n");
    if (not exists $pkg_info->{$pkg})
    {
      # don't return from here: the rest of @{$pkgs} still needs to be processed
      if (exists($provides_hash->{$pkg}))
      {
        $checked = &get_list_hash([$provides_hash->{$pkg}->[0]],$checked,$level+1);
      }
      elsif (exists($group_hash->{$pkg}))
      {
        $checked = &get_list_hash($group_hash->{$pkg}, $checked,$level+1);
      }
      else
      {
        print STDERR "unknown pkg: $pkg\n";
      }
      next;
    }
    next if ((exists $checked->{$pkg}) or ($skip_installed and exists($pkg_info->{$pkg}->{'installed'})));
    $checked->{$pkg} += 1;
    $checked = &get_list_hash($pkg_info->{$pkg}->{'deps'}, $checked,$level+1);
  }
  return $checked;
}

# computes total size for a given list of pkgs plus their deps
sub get_list_size
{
  return &get_size(&add_deps(@_));
}

# get total size for a given list
sub get_size
{
  my $size = 0;
  foreach my $pkg (@_)
  {
    # pkgs without a known size (e.g. not in any sync repo) count as 0
    $size += $pkg_info->{$pkg}->{'size'} // 0;
  }
  return $size;
}

*edit*
Added a line to support comments beginning with "#" and to ignore empty lines.

Last edited by Xyne (2010-02-15 07:47:16)


My Arch Linux Stuff | Forum Etiquette | Community Ethos - Arch is not for everyone


#2 2009-02-15 23:16:13

haxit
Member
From: /home/haxit
Registered: 2008-03-04
Posts: 1,247

Re: A script to find optimal pkg combinations.

Woah! Awesome!


Archi686 User | Old Screenshots | Old .Configs
Vi veri universum vivus vici.


#3 2009-05-30 18:35:20

demian
Member
From: Frankfurt, Germany
Registered: 2009-05-06
Posts: 709

Re: A script to find optimal pkg combinations.

Thank you, this is very useful.


no place like /home
github


#4 2010-02-14 22:20:17

markp1989
Member
Registered: 2008-10-05
Posts: 431

Re: A script to find optimal pkg combinations.

this is helpful, thanks :D


Desktop: E8400@4ghz - DFI Lanparty JR P45-T2RS - 4gb ddr2 800 - 30gb OCZ Vertex - Geforce 8800 GTS - 2*19" LCD
Server/Media Zotac GeForce 9300-ITX I-E - E5200 - 4gb Ram - 2* ecogreen F2 1.5tb - 1* wd green 500gb - PicoPSU 150xt - rtorrent - xbmc - ipazzport remote - 42" LCD


#5 2010-02-14 22:32:49

Stythys
Member
From: SF Bay Area
Registered: 2008-05-18
Posts: 878

Re: A script to find optimal pkg combinations.

Xyne++


[home page] -- [code / configs]

"Once you go Arch, you must remain there for life or else Allan will track you down and break you."
-- Bregol


#6 2010-02-14 22:34:49

cool
Member
Registered: 2008-09-12
Posts: 111

Re: A script to find optimal pkg combinations.

Xyne you are a blessing for Arch Linux, your work is always appreciated


#7 2010-02-14 23:42:06

raf_kig
Member
Registered: 2008-11-28
Posts: 143

Re: A script to find optimal pkg combinations.

Nice work - one little thing I noticed:
It won't work when LANG/LC_ALL is set to anything other than the default, because the script matches pacman's English field names.

--- test.pl.orig    2010-02-15 00:40:30.464972741 +0100
+++ test.pl    2010-02-15 00:39:33.329154231 +0100
@@ -16,7 +16,7 @@
 
 # build hash that contains info from the available pacman repos
 my $name = '';
-my $sync_info = `pacman -Si`;
+my $sync_info = `LC_ALL=C pacman -Si`;
 $sync_info =~ s/\n\s//g;
 foreach my $line (split "\n",$sync_info)
 {
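
Another way to get the same effect would be to set the variable from inside Perl instead of in the command string. This is only a sketch of that alternative, not what the script does: run_in_c_locale is a made-up helper, and a child perl process stands in for pacman so that the example runs anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command with LC_ALL forced to C and return its output.
# "local" restores the caller's environment when the sub returns.
sub run_in_c_locale
{
  my (@cmd) = @_;
  local $ENV{LC_ALL} = 'C';
  open (my $ph, '-|', @cmd) or die "unable to run $cmd[0]: $!\n";
  local $/;   # slurp the whole output in one read
  my $output = <$ph>;
  close $ph;
  return $output;
}

# a child perl prints the LC_ALL value that it was started with
print run_in_c_locale($^X, '-e', 'print $ENV{LC_ALL}'), "\n";
```

In the script this would be called with 'pacman' and '-Si' as the arguments. The list form of open also avoids passing the command through a shell.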

Regards,

raf


#8 2010-02-15 07:47:35

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963

Re: A script to find optimal pkg combinations.

updated, thanks :)


