Hi,
pacman has great non-default options. I liked these ones:
UseDelta, TotalDownload, ShowSize
ShowSize
Display the size of individual packages for --sync and --query modes.
UseDelta
Download delta files instead of complete packages if possible. Requires the xdelta
program to be installed.
TotalDownload
When downloading, display the amount downloaded, download rate, ETA, and completed
percentage of the entire download list rather than the percent of each individual
download target. The progress bar is still based solely on the current file download.
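For reference, a minimal sketch of how these three options would sit in the [options] section of pacman.conf, assuming the option names from the man page excerpt above. The sample file is written to a temporary path rather than /etc/pacman.conf, and the check loop is only illustrative; whether a current pacman release still accepts these options is not something this sketch verifies.

```shell
#!/bin/sh
# Write a throwaway sample config; a real system keeps this in /etc/pacman.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[options]
# Non-default options quoted in the post above:
ShowSize
TotalDownload
#UseDelta
EOF

# Collect the options that are active (present and not commented out).
enabled=""
for opt in ShowSize UseDelta TotalDownload; do
    if grep -Eq "^[[:space:]]*${opt}[[:space:]]*$" "$conf"; then
        enabled="$enabled $opt"
    fi
done
echo "enabled:$enabled"
rm -f "$conf"
```

Each option is a bare keyword on its own line; commenting the line out with # disables it again.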
TotalDownload and ShowSize are working OK, but I have some questions about the UseDelta option:
How does this work? Where can I find the delta files?
The idea is to use binary diffs to patch the package if this results in a significantly smaller download. Someone posted numbers recently showing it would work great with most pkgrel bumps and minor pkgver bumps.
However, you can't find delta files for official Arch packages anywhere because this is not used yet. The Arch infrastructure is not set up for it, and I don't think it is high on anyone's priority list...
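The binary-diff idea can be sketched with xdelta3, the tool mentioned later in this thread. This is an illustration under assumptions, not how pacman invokes it internally: the file names are hypothetical stand-ins for two consecutive builds of a package, and the script skips the demo if xdelta3 is not installed.

```shell
#!/bin/sh
work=$(mktemp -d)
old="$work/pkg-1.0-1.bin"   # hypothetical "installed" version
new="$work/pkg-1.0-2.bin"   # hypothetical rebuilt version

# Two mostly identical files stand in for consecutive package builds.
awk 'BEGIN { for (i = 1; i <= 200000; i++) print i }' > "$old"
{ cat "$old"; echo "one extra line in the rebuild"; } > "$new"

if command -v xdelta3 >/dev/null 2>&1; then
    # Repository side: encode the difference between old and new.
    xdelta3 -e -f -s "$old" "$new" "$work/pkg.delta"
    # Client side: rebuild the new file from the old one plus the delta.
    xdelta3 -d -f -s "$old" "$work/pkg.delta" "$work/rebuilt.bin"
    if cmp -s "$new" "$work/rebuilt.bin"; then
        result="roundtrip ok"
    else
        result="roundtrip failed"
    fi
    echo "full: $(wc -c < "$new") bytes, delta: $(wc -c < "$work/pkg.delta") bytes"
else
    result="xdelta3 not installed, skipped"
fi
echo "$result"
rm -rf "$work"
```

The client only needs to download the delta if it still has the old file, which is why the approach pays off mostly for small version bumps.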
To make it worse, this feature is currently broken in the pacman 3.2 development version, and it will probably still be broken when 3.2 is released.
But who cares, it is not used anyway.
pacman roulette : pacman -S $(pacman -Slq | LANG=C sort -R | head -n $((RANDOM % 10)))
@ shining
But won't it be very beneficial for a rolling-release distro like Arch to have delta packages instead of regular packages when we are updating the whole system? It would help those who have low-speed connections...
I didn't say that it wasn't useful, just that no one is interested in fixing it, and no one is interested in putting it in place.
Sorry for the bump, but I'd really love to see UseDelta implemented. It saves tons of download time (on Gentoo at least) and it would use less bandwidth. It would greatly benefit dial-up users and people with slow DSL (like me, I get 13Kb/s ...).
Help then
I seem to remember a discussion about it (from the server side perspective). The reason (IIRC) that it isn't being worked on is that the bandwidth savings were not enough to outweigh the additional CPU load of creating the xdeltas. I'm not entirely sure, but I do remember it being mentioned.
Afaik the delta creation is/was handled completely by makepkg, and in most cases the CPU load should be small compared to compiling. The reasons this wasn't taken any further are more along the lines of: lack of a pacman contributor with much interest in deltas, lack of "someone" maintaining a private repo using deltas, etc.
On a side note: the way I see it, most of the people interested in delta updates don't have a broadband connection, which makes things such as maintaining a repo relatively difficult.
There was some discussion that it could be better to put it also in repo-add. Or even only in repo-add, which would be simpler than having it in both.
But I am still not sure what is best...
Indeed, we need altruistic people with broadband.
I just read up a bit on this. I actually missed the whole delta 2.0 story at first.
As always. But first we need new delta generation in makepkg or repo-add, or in both... (the last one could be a bad idea)
All the information about this stalled delta makepkg / repo-add rework is here:
http://www.nabble.com/Add-delta-creatio … #a15513733
Any help would be really appreciated, because I feel bad about this, and even regret that Dan and I tried to improve the pacman side without completing the work on the makepkg / repo-add side as well.
FYI, this is #2 (out of 2) on my "Big Things I Want to get Working in Arch" list behind getting a testing repo for [community] packages. But given I haven't started #1 yet, it may take me a while to get this working so others should do it for me
Well, I'm glad this thread didn't just go into the dustbin. Man, I'd love to have this feature, as I said earlier. Right now, for example, KDE 4.1.1 went in and I'll have to redownload at least 230M, which will take about 8-12 hours, for about 50M or less of diffs. So please get started on #1 (and what is #1, btw?) so you can get this implemented.
Last edited by g2g591 (2008-08-29 22:54:46)
I, too, would be really happy if deltas were used. I get very erratic speeds, from four kilobytes to (when I'm lucky) two or more megabytes a second, until the throttling kicks in. I am stuck in the middle of a five-hundred-megabyte update at around thirty kilobytes a second.
Last edited by Wintervenom (2009-08-05 14:04:48)
Deltas can make a *huge* difference when releasing package upgrades. When implemented, if we want to be picky about which packages are delta'd, we should probably do the largest packages at the very least.
A good example is OpenOffice.org. The latest openoffice-base-3.0.1-1-x86_64.pkg.tar.gz is 150-odd MB. If there were a package upgrade, say to 3.0.1-2 the delta would probably be a few kb. An actual application upgrade, say, to 3.0.2-1, might not have such a dramatic difference - but I'd bet it would be very good anyway, probably less than 5 MB.
Also, I hope the devs that have been involved with the delta feature so far have taken note of gzip's --rsyncable option which makes deltas much much more effective.
I manage backups for an ISP and we happen to still be using xdelta for some of our servers still using legacy "home-brew" backups. They're running fine and the daily "deltas" don't take up much space. We're not going to fix what ain't broke.
pacman russian roulette: yes | pacman -Rcs $(pacman -Q | LANG=C sort -R | head -n $((RANDOM % 10)))
(yes, I know its broken)
Delta support was originally a contribution of Nathan Jones, but unfortunately it had to be rewritten due to some design limitations :
http://projects.archlinux.org/?p=pacman … 8c3cf05797
And it was never finished due to lack of interest.
The only recent interest was from Garns who posted above in this thread, which led to this last discussion on the ML :
http://archive.netbsd.se/?ml=pacman-dev … &m=9005926
There are still very important questions that are unanswered, like where should deltas be created : makepkg, repo-add, external tool?
And then some issues like the "gzip -n" one.
There is no need to endlessly repeat how great delta is. What we need is people brainstorming on the implementation issues, and then some coding!
According to http://code.google.com/p/xdelta/wiki/Ex … ompression :
Xdelta decompresses the input stream (target) using pipes to the external compression program; it decompresses the source file to a temporary file. There is a hard-coded maximum size of 256MB for external compression.
Recognition of externally-compressed inputs can be disabled by -D.
I was looking into the backup implementation that I mentioned earlier. In the oldest versions it used xdelta3 before compression; in others it's using gzip --rsyncable before running xdelta3. Most of the files or folders being backed up are a lot bigger than 256MB, so the -D option was probably never looked at or necessary to improve performance.
Would it be simpler to add --rsyncable to the gzip call in makepkg and then use the -D option when using xdelta3?
The downside to using --rsyncable is a slight increase in the size of the .gz files, of a few kb per MB. I'm not sure if bzip has a similar feature.
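A rough sketch of the comparison being proposed: compress two nearly identical files with and without --rsyncable, then let xdelta3 diff the compressed bytes directly with -D. The generated test data and file names are assumptions made for illustration, and the script skips itself when the local gzip lacks --rsyncable or xdelta3 is missing. No particular result is implied; sizes depend entirely on the input.

```shell
#!/bin/sh
work=$(mktemp -d)

# Two stand-in "archives" that differ in a single line near the middle.
awk 'BEGIN { for (i = 1; i <= 300000; i++) print "line", i }' > "$work/old.dat"
awk 'BEGIN { for (i = 1; i <= 300000; i++) print (i == 150000 ? "changed" : "line " i) }' > "$work/new.dat"

status="skipped"
if command -v xdelta3 >/dev/null 2>&1 &&
   gzip --rsyncable -c "$work/old.dat" >/dev/null 2>&1; then
    for mode in plain rsyncable; do
        flag=""
        [ "$mode" = rsyncable ] && flag="--rsyncable"
        # -n leaves the file name and timestamp out of the gzip header.
        gzip -n $flag -c "$work/old.dat" > "$work/old.$mode.gz"
        gzip -n $flag -c "$work/new.dat" > "$work/new.$mode.gz"
        # -D: diff the .gz bytes as they are, without external decompression.
        xdelta3 -e -f -D -s "$work/old.$mode.gz" "$work/new.$mode.gz" "$work/$mode.delta"
        echo "$mode: delta is $(wc -c < "$work/$mode.delta") bytes"
    done
    status="compared"
fi
echo "$status"
rm -rf "$work"
```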
Though I can follow the concepts in the mailing lists, I feel lost in the code. If I had the time to get into it, I'd gladly do it myself.
What is that --rsyncable option? It is not even an official flag?
Could you do some tests showing the results of using --rsyncable and xdelta3 -D on the size of the delta?
Isn't the resulting delta still much much bigger?
I don't think --rsyncable is strictly POSIX. The way I understand it, it adds a (tiny) bit of fragmentation to the resulting gzip. The big upside is that the entropy caused by small changes is greatly lessened in the compressed output file. For rsync and deltas, this is a big boost.
I'm putting together some comparative data.
I took the backups of my own domain's web site, from the 9th to the 3rd of February. I unzipped each day's backups to at least tar format and then re-gzipped them, first normally and then with --rsyncable. I used no other gzip parameters.
last tar file: 134 656 000 bytes
last tgz file: 60 625 581 bytes
last rsyncable.tgz file: 63 664 000 bytes
I applied all the deltas to get each day's "full" tar and put them in separate subfolders. Then I ran xdelta3 to get a delta for each day and for each variation of the scheme. These are the total sizes for each type of delta over the 8-day period:
"gzip" "xdelta -D" : 222 525 619 bytes (all 7 deltas together, 31 789 374 bytes average per delta)
"gzip --rsyncable" "xdelta -D" : 27 676 472 bytes (3 953 781 bytes average)
"gzip" "xdelta" : 1 247 887 bytes (178 269 bytes average)
Obviously, allowing xdelta to decompress and recompress is the best way in terms of bandwidth. But at the same time, short of fixing the problem we have with xdelta, gzip --rsyncable and xdelta -D isn't too bad a stopgap, since it's also probably the easiest to implement.
The actual bash commands I used to do all the above are at http://pastebin.swiftspirit.co.za/9
The math was done by hand... silly me.
edit...
Forgot to add up the actual total bandwidth used in this comparison of "updating" each day:
no deltas, just download the fresh gzip each day : 394 404 842 bytes
"gzip" "xdelta -D" : 357 181 619 bytes
"gzip --rsyncable" "xdelta -D" : 91 340 472 bytes
"gzip" "xdelta" : 61 873 468 bytes
Last edited by zatricky (2009-02-10 18:51:39)
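The hand arithmetic in the post above can be re-checked with shell integer math. The sketch assumes each "total bandwidth" figure is one full download plus the seven deltas; that reading is inferred from the numbers, not stated in the post. The variable names are made up for this example.

```shell
#!/bin/sh
# Figures copied from the post (bytes).
tar_size=134656000
tgz_size=60625581
rsyncable_tgz_size=63664000
deltas_plain_D=222525619        # "gzip" + "xdelta -D", all 7 deltas
deltas_rsyncable_D=27676472     # "gzip --rsyncable" + "xdelta -D"
deltas_plain=1247887            # "gzip" + "xdelta"
count=7

# Average delta size per scheme (integer division, rounded down).
avg_plain_D=$(( deltas_plain_D / count ))
avg_rsyncable_D=$(( deltas_rsyncable_D / count ))
avg_plain=$(( deltas_plain / count ))
echo "averages: $avg_plain_D $avg_rsyncable_D $avg_plain"

# Assumed model: total = one full download + all deltas.
total_rsyncable_D=$(( rsyncable_tgz_size + deltas_rsyncable_D ))
total_plain=$(( tgz_size + deltas_plain ))
total_plain_D_from_tar=$(( tar_size + deltas_plain_D ))
total_plain_D_from_tgz=$(( tgz_size + deltas_plain_D ))
echo "totals: $total_plain_D_from_tar $total_rsyncable_D $total_plain"
echo "plain gzip + xdelta -D with the .tgz as base: $total_plain_D_from_tgz"
```

Under that model the averages and the last two totals match the posted figures. The posted 357 181 619 for "gzip" "xdelta -D" matches the uncompressed tar size plus the deltas; taking the plain .tgz as the base instead would give 283 151 200.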
Thanks for the numbers, they are very interesting
It would indeed be easier to implement but I think the fact that --rsyncable is not in the official gzip is a showstopper. Arch tries to be vanilla when possible. Also xdelta3 is only in AUR/unsupported, but that is not the problem. If it was really needed, I am sure someone could maintain it in an official repo.
Anyway I sent a mail about what I think is the current status of the implementation : http://www.archlinux.org/pipermail/pacm … 08129.html
There is a lot to discuss, and a lot to implement, and no one motivated to do all the jobs, so we won't go far
In all this time working on the server (where the backups are), I never realised that we don't already have --rsyncable in Arch's gzip.
CentOS seems to have --rsyncable built in by default. It's shown in gzip --help but not in the manpage.
Forgive me for reviving an old thread, but what are the reasons for not enabling TotalDownload by default in /etc/pacman.conf?
Surely the progress report for the total update process is useful information?
We just kept the old behavior by default, and added a new option to change it, that's all.
If you are not happy, feel free to open a feature request.