pkgsizes - find out the relatively FAIR package sizes in your system!

dervish · 2018-06-25 10:14:02

I was looking into package sizes in Arch and wrote a script. Then I found an old script by Allan McRae (on the forum) and rewrote everything from scratch.
Most importantly, the calculated values are no longer hidden from the user: they are all printed in the table.

I decided to upload the script to a repository in case it is useful to someone.
Repository: https://github.com/AndreyBalandin/archlinux-pkgsizes/
AUR: https://aur.archlinux.org/packages/pkgsizes/
Only Python 3.6 is required.

The result of the script:

Name               Installed_Size  Depends_On  Full_Size  Used_By  Shared_Size  Relative_Size
libreoffice-still  416.7MiB        161         1.4GiB     0        0.0          638.0MiB
chromium           161.1MiB        214         1.1GiB     0        0.0          189.3MiB
.....
glibc               41.4MiB        4           51.1MiB    728      58.2KiB      41.4MiB
icu                 35.1MiB        9           202.9MiB   155      231.9KiB     35.4MiB

The 'Relative_Size' column is the most interesting: it answers the question "How much space does an installed package actually take?"
You need to consider not only the package's own size but also the sizes of all its dependencies.
Simply summing the sizes of all dependencies is also wrong, because a dependency can be used by dozens of packages.
Thus, a relatively fair package size consists of the package's own size plus its share of each dependency, where every dependency's size is divided among the packages that use it (its 'owners').
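
The idea above can be sketched in a few lines of Python. This is only an illustration inferred from the table columns, not the actual code of pkgsizes.py: the function names, the input dictionaries and the assumption that Shared_Size equals Installed_Size divided by Used_By (counted over transitive dependents) are all hypothetical.

```python
def transitive_deps(name, depends):
    """Return every package reachable from `name` through the dependency graph."""
    seen = set()
    stack = list(depends.get(name, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends.get(dep, ()))
    seen.discard(name)  # a dependency cycle must not make a package its own dependency
    return seen


def relative_sizes(sizes, depends):
    """Own size plus an equal share of each (transitive) dependency.

    `sizes` maps package name -> installed size in bytes.
    `depends` maps package name -> list of direct dependencies.
    Both structures are assumptions made for this sketch.
    """
    # Only count dependencies that are themselves known packages.
    full_deps = {
        name: {dep for dep in transitive_deps(name, depends) if dep in sizes}
        for name in sizes
    }

    # Used_By: how many packages pull in each dependency.
    used_by = {name: 0 for name in sizes}
    for deps in full_deps.values():
        for dep in deps:
            used_by[dep] += 1

    # Shared_Size: the portion of a package charged to each of its users.
    shared = {
        name: sizes[name] / used_by[name] if used_by[name] else 0.0
        for name in sizes
    }

    return {
        name: sizes[name] + sum(shared[dep] for dep in full_deps[name])
        for name in sizes
    }
```

With two applications of 100 and 50 units that both depend on a 30-unit library, each application is charged half of the library (15 units), while the library keeps its own size as its relative size.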

Copy the script to your current directory:

curl -LO https://github.com/AndreyBalandin/archlinux-pkgsizes/raw/master/pkgsizes.py

(or you can install it from AUR)

Run the script and save the table to a file:

python3 pkgsizes.py > pkgsizes.txt

Examples of usage are in README.md

Awebb · 2018-06-25 11:01:45

1. If I pipe the output of your script to something useful, instead of writing it to a file, it complains about a broken pipe.

2. cat textfile | awk ... | column | less ?

dervish · 2018-06-25 11:20:33

Awebb, did you try these examples from README.md?

View the whole table:

cat pkgsizes.txt | column -t | less

The first 20 lines are the packages with the largest relative sizes:

cat pkgsizes.txt | head -20 | column -t | less

Filter packages with Python:

cat pkgsizes.txt | grep python | column -t | less

Sort by Installed_Size(2) descending:

cat pkgsizes.txt | sort -hrk 2 | column -t | less

Show only columns Name(1) and Relative_Size(7):

cat pkgsizes.txt | awk '{print $1" "$7}' | column -t | less

Output Name(1) and Relative_Size(7) for rows where the Used_By(5) field is 0, i.e. packages not used by any other package:

cat pkgsizes.txt | awk '$5 == 0 { print $1" "$7 }' | column -t | less
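
For anyone who would rather not use awk, the same filter can be sketched in Python. It assumes the saved table is tab-separated and starts with the header row shown in the first post; the function name unused_packages is made up for this example.

```python
import csv
import io


def unused_packages(table_text):
    """Return (Name, Relative_Size) pairs for rows whose Used_By field is 0.

    Assumes a tab-separated table whose first line is the header row.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [
        (row["Name"], row["Relative_Size"])
        for row in reader
        if row["Used_By"] == "0"
    ]
```

Typical use would be to read pkgsizes.txt into a string with open("pkgsizes.txt").read() and pass it to the function.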

Awebb · 2018-06-25 11:28:59

Yes, no, I tried to see whether you can use it without writing a text file. You cannot. Something is wrong with it and it breaks the pipe. That explains why you went to such great lengths to explain how to cat stuff through awk, which will most likely send chills down most people's spines.

Try: http://catb.org/jargon/html/U/UUOC.html

WorMzy · 2018-06-25 11:29:22

Cool script, thanks for sharing!

A quick comment about your AUR package though, it is, for all intents and purposes, a -git package. You should either change it to pull the tagged versions from your repository, or rename the package to pkgsizes-git and use a pkgver() function.

https://wiki.archlinux.org/index.php/VC … guidelines

dervish · 2018-06-25 11:36:27

Awebb wrote:

Yes, no, I tried to see whether you can use it without writing a text file. You cannot. Something is wrong with it and it breaks the pipe.

python pkgsizes.py | column -t | less

Works fine.

dervish · 2018-06-25 11:43:52

WorMzy wrote:

You should either change it to pull the tagged versions from your repository, or ...

I extract the version directly from my script (which is similar to a tagged commit), so I decided not to use the -git suffix.

Awebb · 2018-06-25 11:49:34

Example of a broken pipe:

$ ./pkgsizes.py | head
Reading local database...
Processing dependency trees for packages...

...

Traceback (most recent call last):
  File "./pkgsizes.py", line 243, in <module>
    output(packages)
  File "./pkgsizes.py", line 229, in output
    humanize(pkg.relative_size), sep='\t')
BrokenPipeError: [Errno 32] Broken pipe
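
The traceback appears because head exits after ten lines while the script is still printing. A common way to handle this in Python is to catch BrokenPipeError around the output loop. The sketch below is not taken from pkgsizes.py: write_table, main and the placeholder rows are hypothetical names used only to show the pattern.

```python
import os
import sys


def write_table(rows, out=None):
    """Write tab-separated rows; return False if the reader closed the pipe early."""
    if out is None:
        out = sys.stdout
    try:
        for row in rows:
            print(*row, sep="\t", file=out)
        out.flush()
    except BrokenPipeError:
        return False
    return True


def main():
    # Placeholder data; the real script would pass its computed package table.
    rows = [("glibc", "41.4MiB"), ("icu", "35.4MiB")]
    if not write_table(rows):
        # Point stdout at /dev/null so the interpreter's final flush
        # at exit does not raise BrokenPipeError a second time.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)
```

With such a guard, piping the output to head should end quietly instead of printing a traceback.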

dervish · 2018-06-25 12:06:07

The script reads information about every installed package on the system, so it is not convenient to rerun it for each pipeline.
That is why I suggested a reasonable usage pattern: save the table to a file once, then filter the file.

WorMzy · 2018-06-25 12:22:15

Ah, I didn't notice you already had a pkgver function. Note that extracting the metadata from the script itself is not a great solution; it is not inconceivable that you will forget to update the metadata for a minor fix, at which point it will be possible to have two packages, built on different days, with the exact same pkgver but a different script inside. It is far more reliable to use the git-controlled metadata as the source for your pkgver.

In either case you need the -git suffix.

dervish · 2018-06-25 12:37:45

WorMzy wrote:

...
In either case you need the -git suffix.

I understand. I can rework the PKGBUILD.
I am just wondering which option to use in this simple case:
there is a single script that needs to be installed on the user's computer.
Why download the entire repository? Why should the user need 'git' on their computer?

WorMzy · 2018-06-25 13:00:20

If the user wants to use VCS packages, then they should be willing to install the required VCS package to facilitate that, even if they remove it immediately after building the package.

According to your wonderful script, git only uses 37.1MiB, relatively, on my system. Admittedly, that is a large amount of space if you only use git to facilitate the installation of this one package, but in the grand scheme of things, 40MiB is nothing on a modern HDD (or even SSD).

Of course, you can go the other way, and turn your existing PKGBUILD into a static versioned package. Change the source to

https://raw.githubusercontent.com/AndreyBalandin/archlinux-pkgsizes/v${pkgver}/pkgsizes.py

and drop the pkgver() function. Then, whenever you make a new tag, you just need to increment the static pkgver variable in your PKGBUILD. Then the user won't need git installed.

dervish · 2018-06-25 13:22:26

WorMzy, thank you for such a detailed explanation!
I will do as you suggest.

Xyne · 2018-06-25 13:44:01

pkgsizes PKGBUILD wrote:

source=("https://github.com/AndreyBalandin/archlinux-pkgsizes/raw/master/$pkgname.py")
md5sums=('SKIP')

pkgver() {
    cd $srcdir
    # extract version from script with pattern: $pkgname v(XX.YY.ZZ)
    sed -n "0,/.*$pkgname v\([0-9]\+.[0-9]\+.[0-9]\+\).*/s//\1/p" $pkgname.py
}

Any package that always checks out the latest sources from a VCS repository is a VCS package and must include the appropriate VCS suffix in the pkgname to conform to our packaging guidelines. Non-VCS packages should download the same version of the source files every time a package is built, or exit with an error if the source files have changed (detected via checksums). You have two options to fix the PKGBUILD:

1. Modify the current PKGBUILD to check out a fixed (possibly tagged) commit or archive. For example, you can use version tags in your git repo and set the pkgver to the appropriate tag and then use the $pkgver variable in the download URL. In this case, you update the PKGBUILD every time there is a new official release.
2. Change the name to add the "-git" suffix (you will need to re-upload the package under the new name and request the deletion of the existing package) and add a proper pkgver function that derives a pkgver from the commit without inspecting the source file. See the VCS package guidelines for examples.

A pkgver should uniquely identify the contents of the package. If you want to use a release version, then the source should be static for that version. If you want to track a VCS branch, then the pkgver must derive from the commit. You can track a branch that only uses tagged commits (e.g. via a "release" branch), but the version should never be parsed from the source files directly because, as mentioned above, you may forget to update that version before a commit. Besides, it doesn't make much sense to manually update such a version in the source code when using a VCS. If you need the version to appear in the source code for later display, you could use Git hooks to e.g. insert the most recent tag on the current branch into the source code.

dervish · 2018-06-25 13:59:38

Xyne wrote:

Modify the current PKGBUILD to check out a fixed (possibly tagged) commit or archive...

Yes, I chose this option, as WorMzy suggested.
PKGBUILD updated: https://aur.archlinux.org/packages/pkgsizes/

Xyne · 2018-06-25 22:03:38

dervish wrote:

PKGBUILD updated: https://aur.archlinux.org/packages/pkgsizes/

That's much better. Please also change md5sums to sha512sums and replace 'SKIP' with the actual checksum.
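
For illustration, the checksum of the downloaded script can be computed with Python's hashlib, as sketched below. The helper name sha512_of_file and the chunk size are arbitrary choices for this example, and packagers normally let their tooling generate the value rather than computing it by hand.

```python
import hashlib


def sha512_of_file(path, chunk_size=65536):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned 128-character hex string is what would replace 'SKIP' in the checksum array.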

dervish · 2018-06-26 05:16:17

Xyne wrote:

change md5sums to sha512sums

Done.
