I was digging into package sizes in Arch and wrote a script. Then I saw an old script by Allan McRae (on the forum) and rewrote everything from scratch once again.
Most importantly, all calculated values are now printed in the table instead of being hidden from the user.
I have decided to upload the script to a repository; maybe it will be useful to someone.
Repository: https://github.com/AndreyBalandin/archlinux-pkgsizes/
AUR: https://aur.archlinux.org/packages/pkgsizes/
Only Python 3.6 is required.
The result of the script:
Name               Installed_Size  Depends_On  Full_Size  Used_By  Shared_Size  Relative_Size
libreoffice-still  416.7MiB        161         1.4GiB     0        0.0          638.0MiB
chromium           161.1MiB        214         1.1GiB     0        0.0          189.3MiB
.....
glibc              41.4MiB         4           51.1MiB    728      58.2KiB      41.4MiB
icu                35.1MiB         9           202.9MiB   155      231.9KiB     35.4MiB
The 'Relative_Size' column is the most interesting: it answers the question "How much space does an installed package actually take?"
You need to consider not only the package's own size but also the sizes of all its dependencies.
Simply summing the sizes of all dependencies is also wrong, because a dependency can be used by dozens of other packages.
Thus, a relatively fair package size consists of the package's own size plus the portion of each dependency's size that is distributed among that dependency's 'owners'.
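To make the idea concrete, here is a minimal sketch of the calculation. It is not the actual code of pkgsizes.py. It assumes that Used_By counts every package that pulls a dependency in, directly or indirectly, and that Shared_Size is Installed_Size divided by Used_By (this matches the sample rows above: 41.4MiB / 728 is about 58.2KiB for glibc). The function names and the toy data are made up for illustration.

```python
def all_deps(pkg, deps, seen=None):
    """Collect all direct and indirect dependencies of pkg."""
    if seen is None:
        seen = set()
    for dep in deps.get(pkg, ()):
        if dep not in seen:
            seen.add(dep)
            all_deps(dep, deps, seen)
    return seen


def relative_sizes(sizes, deps):
    """Own size plus a fair share of every dependency (assumed formula)."""
    # Full dependency set of each package, excluding the package itself
    # (a package can reach itself through a dependency cycle).
    closure = {pkg: all_deps(pkg, deps) - {pkg} for pkg in sizes}

    # Used_By: how many packages have this one somewhere in their tree.
    used_by = {pkg: 0 for pkg in sizes}
    for dep_set in closure.values():
        for dep in dep_set:
            used_by[dep] += 1

    # Shared_Size: the slice of a package charged to each of its users.
    shared = {
        pkg: sizes[pkg] / used_by[pkg] if used_by[pkg] else 0.0
        for pkg in sizes
    }

    return {
        pkg: sizes[pkg] + sum(shared[dep] for dep in closure[pkg])
        for pkg in sizes
    }


# Toy data: two applications that share one library (sizes in MiB).
sizes = {"app1": 100, "app2": 50, "lib": 30}
deps = {"app1": ["lib"], "app2": ["lib"], "lib": []}
result = relative_sizes(sizes, deps)
```

With this toy data the library is used by two packages, so each application is charged half of it (15 MiB) on top of its own size.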
Copy the script to your current directory:
curl -LO https://github.com/AndreyBalandin/archlinux-pkgsizes/raw/master/pkgsizes.py
(or you can install it from AUR)
Run the script and save the table to a file:
python3 pkgsizes.py > pkgsizes.txt
Examples of usage are in README.md
Offline
1. If I pipe the output of your script to something useful, instead of writing it to a file, it complains about a broken pipe.
2. cat textfile | awk ... | column | less ?
Offline
Awebb, did you try these examples from README.md?
View the whole table:
cat pkgsizes.txt | column -t | less
The first 20 lines are the packages with the largest relative sizes:
cat pkgsizes.txt | head -20 | column -t | less
Filter packages with Python:
cat pkgsizes.txt | grep python | column -t | less
Sort by Installed_Size(2) descending:
cat pkgsizes.txt | sort -hrk 2 | column -t | less
Show only columns Name(1) and Relative_Size(7):
cat pkgsizes.txt | awk '{print $1" "$7}' | column -t | less
Show Name(1) and Relative_Size(7) only for rows where the Used_By(5) field is 0, i.e. packages not used by other packages:
cat pkgsizes.txt | awk '$5 == 0 { print $1" "$7 }' | column -t | less
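For anyone who would rather avoid awk, the same filter can be sketched in Python. This assumes the saved table is tab-separated and starts with the header row shown in the first post. The function name leaf_packages and the two sample rows are only for illustration.

```python
import csv
import io


def leaf_packages(table_text):
    """Return (Name, Relative_Size) for rows whose Used_By field is 0."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [
        (row["Name"], row["Relative_Size"])
        for row in reader
        if row["Used_By"] == "0"
    ]


# Two rows taken from the sample table, joined with tabs.
sample = (
    "Name\tInstalled_Size\tDepends_On\tFull_Size\tUsed_By\tShared_Size\tRelative_Size\n"
    "chromium\t161.1MiB\t214\t1.1GiB\t0\t0.0\t189.3MiB\n"
    "glibc\t41.4MiB\t4\t51.1MiB\t728\t58.2KiB\t41.4MiB\n"
)
leaves = leaf_packages(sample)
```

To run it on the real table, pass the contents of pkgsizes.txt (for example `open("pkgsizes.txt").read()`) instead of the sample string.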
Offline
Yes. No, I tried to see whether you can use it without writing a text file. You cannot. Something is wrong with it and it breaks the pipe. That explains why you went to such great lengths to explain how to cat stuff through awk, which will most likely send chills down most people's spines.
Offline
Cool script, thanks for sharing!
A quick comment about your AUR package, though: it is, for all intents and purposes, a -git package. You should either change it to pull the tagged versions from your repository, or rename the package to pkgsizes-git and use a pkgver() function.
Sakura:-
Mobo: MSI MAG X570S TORPEDO MAX // Processor: AMD Ryzen 9 5950X @4.9GHz // GFX: AMD Radeon RX 5700 XT // RAM: 32GB (4x 8GB) Corsair DDR4 (@ 3000MHz) // Storage: 1x 3TB HDD, 6x 1TB SSD, 2x 120GB SSD, 1x 275GB M2 SSD
Making lemonade from lemons since 2015.
Offline
Yes. No, I tried to see whether you can use it without writing a text file. You cannot. Something is wrong with it and it breaks the pipe.
python pkgsizes.py | column -t | less
Works fine.
Offline
You should either change it to pull the tagged versions from your repository, or ...
I pull the version directly from my script (which is similar to a tagged commit), so I decided not to use the -git suffix.
Offline
Example of a broken pipe:
$ ./pkgsizes.py | head
Reading local database...
Processing dependency trees for packages...
...
Traceback (most recent call last):
  File "./pkgsizes.py", line 243, in <module>
    output(packages)
  File "./pkgsizes.py", line 229, in output
    humanize(pkg.relative_size), sep='\t')
BrokenPipeError: [Errno 32] Broken pipe
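This is what Python prints when the reader (here head) exits before the script has finished writing. One possible way to handle it, following the pattern from the note on SIGPIPE in the Python signal module documentation, is sketched below. write_table and main are hypothetical names, not functions from pkgsizes.py.

```python
import os
import sys


def write_table(rows, out=sys.stdout):
    """Write tab-separated rows; return False if the reader went away."""
    try:
        for row in rows:
            print(*row, sep="\t", file=out)
        out.flush()
    except BrokenPipeError:
        return False
    return True


def main(rows):
    if not write_table(rows):
        # Point stdout at /dev/null so that the interpreter's final
        # flush at exit does not raise BrokenPipeError a second time.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)
```

With something like this in place, a pipeline ending in head would stop quietly instead of printing a traceback.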
Offline
The script reads information about every installed package on the system, so it is simply not convenient to rerun it for each new pipeline.
That is why I described saving the table to a file first as the reasonable usage pattern.
Offline
Ah, I didn't notice you already had a pkgver function. Note that extracting the metadata from the script itself is not a great solution; it is not inconceivable that you will forget to update the metadata for a minor fix, at which point it will be possible to have two packages, built on different days, with the exact same pkgver but a different script inside. It is far more reliable to use the git-controlled metadata as the source for your pkgver.
In either case you need the -git suffix.
Offline
...
In either case you need the -git suffix.
I understand. I can rework the PKGBUILD.
I am just wondering which option to use in this simple case: there is a single script that needs to be installed on the user's computer.
Why download the entire repository? Why should the user need git on their computer?
Offline
If the user wants to use VCS packages, then they should be willing to install the required VCS package to facilitate that, even if they remove it immediately after building the package.
According to your wonderful script, git only uses 37.1MiB, relatively, on my system. Admittedly, that is a large amount of space if you only use git to facilitate the installation of this one package, but in the grand scheme of things, 40MiB is nothing on a modern HDD (or even SSD).
Of course, you can go the other way, and turn your existing PKGBUILD into a static versioned package. Change the source to
https://raw.githubusercontent.com/AndreyBalandin/archlinux-pkgsizes/v${pkgver}/pkgsizes.py
and drop the pkgver() function. Then, whenever you make a new tag, you just need to increment the static pkgver variable in your PKGBUILD. Then the user won't need git installed.
Offline
WorMzy, thank you for such a detailed explanation!
I will do as you suggest.
Offline
source=("https://github.com/AndreyBalandin/archlinux-pkgsizes/raw/master/$pkgname.py")
md5sums=('SKIP')
pkgver() {
  cd $srcdir
  # extract version from script with pattern: $pkgname v(XX.YY.ZZ)
  sed -n "0,/.*$pkgname v\([0-9]\+.[0-9]\+.[0-9]\+\).*/s//\1/p" $pkgname.py
}
Any package that always checks out the latest sources from a VCS repository is a VCS package and must include the appropriate VCS suffix in the pkgname to conform to our packaging guidelines. Non-VCS packages should download the same version of the source files every time a package is built, or exit with an error if the source files have changed (detected via checksums). You have two options to fix the PKGBUILD:
1. Modify the current PKGBUILD to check out a fixed (possibly tagged) commit or archive. For example, you can use version tags in your git repo and set the pkgver to the appropriate tag and then use the $pkgver variable in the download URL. In this case, you update the PKGBUILD every time there is a new official release.
2. Change the name to add the "-git" suffix (you will need to re-upload the package under the new name and request the deletion of the existing package) and add a proper pkgver function that derives a pkgver from the commit without inspecting the source file. See the VCS package guidelines for examples.
A pkgver should uniquely identify the contents of the package. If you want to use a release version, then the source should be static for that version. If you want to track a VCS branch, then the pkgver must derive from the commit. You can track a branch that only uses tagged commits (e.g. via a "release" branch), but the version should never be parsed from the source files directly because, as mentioned above, you may forget to update that version before a commit. Besides, it doesn't make much sense to manually update such a version in the source code when using a VCS. If you need the version to appear in the source code for later display, you could use Git hooks to e.g. insert the most recent tag on the current branch into the source code.
My Arch Linux Stuff • Forum Etiquette • Community Ethos - Arch is not for everyone
Offline
Modify the current PKGBUILD to check out a fixed (possibly tagged) commit or archive...
Yes, I chose this option, as WorMzy suggested.
PKGBUILD updated: https://aur.archlinux.org/packages/pkgsizes/
Offline
PKGBUILD updated: https://aur.archlinux.org/packages/pkgsizes/
That's much better. Please also change md5sums to sha512sums and replace 'SKIP' with the actual checksum.
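If it helps, here is a small sketch of how the value for that array could be computed in Python. The helper name sha512_of is made up, and the file path in the comment is only an example.

```python
import hashlib


def sha512_of(path, chunk_size=65536):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example: print(sha512_of("pkgsizes.py")) and paste the result
# into the checksum array of the PKGBUILD in place of 'SKIP'.
```

Because the checksum is tied to one exact file, it has to be recomputed whenever the tagged script changes.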
Offline
change md5sums to sha512sums
Done.
Offline