Background: About a year ago I opened a feature request asking for lrz support in pacman/makepkg. A major step is getting decompression support into libarchive, and some pretty great news has recently come on this front thanks to Con Kolivas and Michael Blumenkrantz (author of liblrzip). The full GPL-licensed libraries have been accepted into libarchive, which means that full lrzip support can be added to libarchive!
My Proposal: Lrzip offers pretty significant advantages (speed and compression ratio) over xz. Once libarchive contains the libs and pacman/makepkg are able to use lrzip, I propose we consider switching from xz to lrz for repo packages. To this end, I have prepared an analysis of the potential savings doing so would offer the Arch community.
Please have a look at a preliminary deck of slides, which I spent most of today working on, summarizing the data I collected in support of this proposal. I'd love to hear how folks feel about this (excuse any errors in the slides, I am really tired).
Link to pdf: http://repo-ck.com/bench/lrzip_comparison_to_xz.pdf
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
Offline
For me:
I want to see a benchmark of the advantages and disadvantages of both.
I remember that one reason for the switch was the size of the packages (smaller size means less load on the server).
Which is better in this respect, lrzip or xz?
Well, I suppose that this is somekind of signature, no?
Offline
We have a ton of small packages in the repos and the mirrors are doing fine, so there's no immediate need to switch.
My questions:
- How well does lrzip work for small packages? Doesn't 'lr' stand for (among other things) 'long range'? It is optimised for large files.
- How stable is it? Will I be able to access some old archives using a newer version of lrzip and vice versa?
Offline
@jristz- what? Did you read the PDF I put together?
@karol- I should have tested more small files. The smallest one was around 8 megs. On the stability question, ck just added some code to ensure backwards compatibility. See his blog for more on that... http://ck-hack.blogspot.com/2012/03/lrzip-0612.html
Offline
Numbers are the best way to confuse people and make them believe whatever you want.
I found this other article: http://ck.kolivas.org/apps/lrzip/lrzip- … benchmarks I know it is about the kernel, but it is another point of view.
PS: about the PDF... oops... my colour settings for normal text and URL text are the same. Thanks for making me notice this on my machine... changing them.
Offline
Great idea.
Saving some bandwidth and decompressing faster are always good. Consider reporting this in the Arch bug tracker.
"After you do enough distro research, you will choose Arch."
Offline
Hi. Lrzip is quite stable and the file format is always backward compatible. At this stage no major changes are planned to the lrzip file format, since all the major compression options and features have been established, and any future changes to the format would be planned to remain backward compatible (i.e. newer versions will always decompress older files). Yes, it is optimised for large files, but that doesn't mean it performs poorly on smaller files; it's just that it performs no better on smaller files. The main advantage of lrzip is that it is a format designed with the future in mind: the compression gets greater and the speed gets faster as machines get more RAM, files get bigger, and PCs get more cores.
I did a quick comparison of the compression and decompression of the archlinux/core/os/x86_64/*.xz packages, converted to raw tar files, on a quad-core 3GHz machine:
Compression with xz
time (for i in *.tar; do xz -f "$i"; done)
real 16m38.543s
user 16m25.092s
sys 0m8.119s
Compression with lrzip
time (for i in *.tar; do lrzip -f "$i"; done)
real 7m4.677s
user 13m45.598s
sys 0m32.215s
Decompression with xz
time (for i in *.xz; do xz -dkf "$i" && sync; done)
real 2m28.022s
user 0m59.477s
sys 0m10.138s
Decompression with lrzip
time (for i in *.lrz; do lrzip -dkf "$i" && sync; done)
real 2m8.554s
user 0m53.950s
sys 0m11.721s
Directory sizes:
xz: 828260
lrz: 825292
As you can see, the overall sizes are only trivially different. What is lost on smaller packages is made up for on larger packages. However, decompression is faster and compression is much faster (on a quad core at least). This is a fairly easy comparison that anyone can do.
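To put a number on "trivially different", here is a minimal sketch that turns the two directory sizes posted above into a percentage. The function name pct_saving is made up for illustration, and the sizes are used exactly as posted (their unit is not stated).

```shell
# Hypothetical helper: percentage by which "new" is smaller than "old",
# printed with two decimal places.
pct_saving() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (old - new) * 100 / old }'
}

# Directory sizes as posted above: xz first, lrz second.
pct_saving 828260 825292
```

On these figures the lrz directory comes out well under one percent smaller than the xz one.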
Offline
Multiple things here....
1) It has not been accepted; a pull request has been made.
2) Are the advantages of lrzip actually applicable to packages? The benchmarks are not informative in that regard.
Before this is even considered, we would need this to be in a released libarchive version _and_ a series of benchmarks showing significant reductions in package size or install time (without regressing with respect to the other).
Edit: I see some results were posted while I was composing this reply. They show a fairly minimal advantage in terms of package size, so we need comparisons of speed while installing packages, i.e. benchmarks done using pacman to install packages in lrzip format.
Offline
I applied the patches to libarchive and it appears to have successfully linked to lrzip:
> readelf -d /usr/lib/libarchive.so.12.0.3
Dynamic section at offset 0x8eec0 contains 31 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libacl.so.1]
0x00000001 (NEEDED) Shared library: [libattr.so.1]
0x00000001 (NEEDED) Shared library: [libexpat.so.1]
0x00000001 (NEEDED) Shared library: [liblzma.so.5]
0x00000001 (NEEDED) Shared library: [liblrzip.so.0]
0x00000001 (NEEDED) Shared library: [liblzo2.so.2]
0x00000001 (NEEDED) Shared library: [libpthread.so.0]
0x00000001 (NEEDED) Shared library: [libbz2.so.1.0]
0x00000001 (NEEDED) Shared library: [libz.so.1]
0x00000001 (NEEDED) Shared library: [libnettle.so.4]
0x00000001 (NEEDED) Shared library: [libc.so.6]
0x0000000e (SONAME) Library soname: [libarchive.so.12]
But pacman says no...
> pacman -U pkg-config-0.26-2-i686.pkg.tar.lrz
loading packages...
error: could not open file pkg-config-0.26-2-i686.pkg.tar.lrz: Unrecognized archive format
error: 'pkg-config-0.26-2-i686.pkg.tar.lrz': cannot open package file
bsdtar cannot autodetect the format either. So the patchset appears to be insufficient for use in pacman yet.
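The readelf check above can be scripted. This is only a sketch: links_against is a made-up helper that reads `readelf -d` output on stdin and succeeds if the named library appears as a NEEDED entry. As the pacman error above shows, a successful link does not by itself mean the format will be autodetected.

```shell
# Hypothetical helper: succeed if the `readelf -d` output on stdin lists
# the given library (e.g. liblrzip.so.0) as a NEEDED shared library.
links_against() {
    grep -F "(NEEDED)" | grep -qF "[$1]"
}

# Example, using the path from the post above:
#   readelf -d /usr/lib/libarchive.so.12.0.3 | links_against liblrzip.so.0 \
#       && echo "libarchive links against liblrzip"
```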
Offline
The libarchive patches are a work in progress. I still have work to do on them and wasn't pushing for inclusion yet, simply responding to this discussion.
Offline
I'd also like to see a comparison on single core processors, as they are still relevant (netbooks, slightly older hardware).
Whatever the result will be: Nice work, graysky!
Offline
ckolivas wrote:The libarchive patches are a work in progress. I still have work to do on them and wasn't pushing for inclusion yet, simply responding to this discussion.
OK - that is good to know. I'll look forward to testing it out when these are complete.
Offline
I'd also like to see a comparison on single core processors, as they are still relevant (netbooks, slightly older hardware).
+1
Are the memory requirements for decompression the same as for xz?
On older computers, when compiling packages locally I can always use gzip or nothing at all - *.pkg.tar.* works fine.
Offline
I would also like to see lrzip added to pacman (that is, to libarchive). Good luck with upstream integration, ckolivas!
Btw, I enjoy reading your blog, and hope your project will prove itself useful as a package format.
Offline
Memory usage is about the same as xz, as is decompression speed on uniprocessor. Binding the two applications to one core, simulating running on a single CPU machine:
time (for i in *.xz; do schedtool -a 3 -e xz -dkf "$i" && sync; done)
real 2m28.941s
user 0m59.280s
sys 0m11.326s
time (for i in *.lrz; do schedtool -a 3 -e lrzip -p 1 -df "$i" && sync; done)
real 2m25.442s
user 0m54.024s
sys 0m12.280s
I also set -p 1 on lrzip just so it thinks there's only one CPU.
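For anyone who wants to compare the "real" figures directly, here is a small sketch that converts values in the format printed by `time` (such as 2m28.941s) to seconds and prints the ratio of two of them. The helper names to_seconds and speedup are made up for illustration, and only the minutes-and-seconds format shown in this thread is handled.

```shell
# Hypothetical helper: convert a duration such as "2m28.941s" to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Hypothetical helper: ratio of the first duration to the second,
# printed with two decimal places (values above 1 mean the second is faster).
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

# Single-core "real" times as posted above: xz first, lrzip second.
speedup 2m28.941s 2m25.442s
```

On these numbers the ratio is very close to 1, which matches the statement that single-core decompression speed is about the same.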
Offline
IMHO this is way too early to propose switching to a new compression format for our packages, especially when its tools are still a work in progress and not really finished yet.
Even though I didn't read all the information, it seems that with respect to size and time xz and lrz are quite similar. However, lrz seems to support multithreading, which xz lacks in its current version. AFAIK xz 5.1 will make use of multiple cores.
But there is no need for us to hurry. While the switch from gz to xz required some work on pacman and tools like dbscripts and devtools, adding support for another compression format will be trivial as long as libarchive supports it.
Offline
ckolivas wrote:The libarchive patches are a work in progress. I still have work to do on them and wasn't pushing for inclusion yet, simply responding to this discussion.
OK - that is good to know. I'll look forward to testing it out when these are complete.
Allan - those changes were merged into libarchive:master a few months ago. Would you be willing to give it a whirl in pacman?
Offline
> bsdtar -xf glibc-2.16.0-2-i686.pkg.tar.lrz
bsdtar: Error opening archive: Unrecognized archive format
> bsdtar --version
bsdtar 3.0.200a - libarchive 3.0.200a
I do not have the time to test beyond that...
Edit: for clarification, if bsdtar cannot autodetect the format, pacman cannot either.
Offline
Thanks, Allan. Guess additional work is needed.
Offline
It works when you add "archive_read_support_filter_lrzip(a);" to the filter list in archive_read_support_filter_all.c. I am not sure why the developers have not added it to the list yet; maybe the lrzip implementation is still considered unstable, as the discussion in https://github.com/libarchive/libarchive/pull/7 suggests.
Offline
@tobias_ - I added that line but got build errors. Can you clarify?
EDIT: Ah, I see now that the latest libarchive is 3.0.4 released on 26-Feb while the lrzip commits postdate this. I built it from libarchive-git and enabled lrzip per your instruction. Seems to work fine for me. Wonder when upstream plans to invoke it.
EDIT2: https://github.com/libarchive/libarchive/pull/28
Last edited by graysky (2012-10-03 00:42:00)
Offline
@Allan - Can you use the git package mentioned above to conduct whatever pacman tests you were thinking about?
Also, how can I get /usr/bin/makepkg to compress my packages to .pkg.tar.lrz natively? Setting `PKGEXT=.pkg.tar.lrz` does not work:
==> Creating package...
-> Generating .PKGINFO file...
-> Adding install file...
-> Compressing package...
==> WARNING: '.pkg.tar.lrz' is not a valid archive extension.
==> Leaving fakeroot environment.
==> Finished making: profile-sync-daemon 3.15-1 (Tue Oct 2 20:27:11 EDT 2012)
However, pacman is able to install lrz packages that I make manually!
% sudo pacman -U profile-sync-daemon-3.15-1-any.pkg.tar.lrz
loading packages...
Decompressing...
100% 30.00 / 30.00 KB
Average DeCompression Speed: 0.000MB/s
[OK] - 30720 bytes
Total time: 00:00:00.00
warning: profile-sync-daemon-3.15-1 is up to date -- reinstalling
resolving dependencies...
looking for inter-conflicts...
Targets (1): profile-sync-daemon-3.15-1
Total Installed Size: 0.02 MiB
Net Upgrade Size: 0.00 MiB
Proceed with installation? [Y/n]
(1/1) checking package integrity [########################################] 100%
(1/1) loading package files [########################################] 100%
(1/1) checking for file conflicts [########################################] 100%
Decompressing...
100% 30.00 / 30.00 KB
Average DeCompression Speed: 0.000MB/s
[OK] - 30720 bytes
Total time: 00:00:00.00
Decompressing...
100% 30.00 / 30.00 KB
Average DeCompression Speed: 0.000MB/s
[OK] - 30720 bytes
Total time: 00:00:00.00
(1/1) upgrading profile-sync-daemon [########################################] 100%
Last edited by graysky (2012-10-03 00:29:40)
Offline
graysky wrote:Also, how can I get /usr/bin/makepkg to compress my packages to .pkg.tar.lrz natively? Setting `PKGEXT=.pkg.tar.lrz` does not work
Examine the function create_package() within makepkg itself.
Offline
Has anyone reported the lack of lrz support in archive_read_support_filter_all? It is probably a mistake not to have it...
Adding lrz compression to makepkg is easy - a one line patch I think. I have not done it yet as there is little point until a released libarchive supports reading these packages.
Also, I doubt I will have any time to look into the relative benefits/disadvantages of using this compression format until an actual libarchive release with lrzip support is made.
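To illustrate why this could be a one-line change, here is a sketch of the kind of extension-to-compressor mapping that would produce the "not a valid archive extension" warning seen earlier in the thread. It is not makepkg's actual code: the function name compressor_for_ext is made up, and the compressor flags shown, including `lrzip -q`, are assumptions.

```shell
# Hypothetical sketch, NOT makepkg's real create_package(): print the
# command that would compress a tar stream for a given package extension.
compressor_for_ext() {
    case "$1" in
        *tar.gz)  echo "gzip -c -f -n" ;;
        *tar.bz2) echo "bzip2 -c -f" ;;
        *tar.xz)  echo "xz -c -z -" ;;
        *tar.lrz) echo "lrzip -q" ;;   # the single added line (flags assumed)
        *tar)     echo "cat" ;;
        *)        echo "'$1' is not a valid archive extension." >&2
                  return 1 ;;
    esac
}

compressor_for_ext .pkg.tar.lrz
```

Without the `*tar.lrz` line, the extension would fall through to the final branch, which is consistent with the warning in the makepkg output quoted above.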
Offline
@Zeke- Thank you. I opened flyspray #31782 with a patch to add support to makepkg.
@Allan- Tim K committed the fix yesterday.
Last edited by graysky (2012-10-03 08:13:47)
Offline