Here's the current delta list and some stats.
Package New Version Full Size Delta Size % Of Full
freetype2 2.3.3-1 718,792 148,671 21 %
jre6 6-3 37,364,119 3,000,289 8 %
libx11 1.1.1-4 2,012,590 163,079 8 %
wesnoth 1.2.3-1 71,365,629 2,364,189 3 %
qt 3.3.8-3 10,417,784 1,599,413 15 %
kdelibs 3.5.6-7 20,752,239 617,196 3 %
kdepim 3.5.6-4 22,407,785 226,905 1 %
kernel26 2.6.20-5 20,524,202 2,035,759 10 %
All in all, for this package set the difference is 185Mb for the full packages versus 10Mb for the deltas.
kernel26 will no doubt be updated again shortly, as there is already a -6 on kernel.org.
Seeing it in action is pretty impressive! I always hated it when there was a new kde* upgrade - they are just too big. Now I might actually download those deltas and patch manually. THX.
Offline
Ok, now I just tried patching libx11 from 1.1.1-3 to 1.1.1-4 (just to see if it works). xdelta patches without any errors or anything, but when i compare the contents of the libx11-1.1.1-4 I can download from any official mirror and the resulting xdelta file, they are different. The uncompressed contents' total size is different (du -c). Additionally I ran a md5sum check over some of the files (patched and downloaded 1.1.1-4) and there are just too many files that are not the same.
Am I doing something wrong?
I just did it like in the pacxfer-script. Download the diff, patch the old file with the diff, done. Or is this normal?
Offline
Ok, now I just tried patching libx11 from 1.1.1-3 to 1.1.1-4 (just to see if it works). xdelta patches without any errors or anything, but when i compare the contents of the libx11-1.1.1-4 I can download from any official mirror and the resulting xdelta file, they are different. The uncompressed contents' total size is different (du -c). Additionally I ran a md5sum check over some of the files (patched and downloaded 1.1.1-4) and there are just too many files that are not the same.
Am I doing something wrong?
I just did it like in the pacxfer-script. Download the diff, patch the old file with the diff, done. Or is this normal?
It's normal, because to create the deltas I am running the whole makepkg-xdelta enchilada on my workstation, not just taking the authorized package and generating a patch. I just did a compare of .FILELIST & .PKGINFO from the delta generated gz and one from a mirrored repo. The FILELIST was identical between the two packages.
.PKGINFO was different in the following areas, which will in itself make a slight difference.
--- arch/PKGINFO 2007-04-05 05:00:33.000000000 +1200
+++ dale/PKGINFO 2007-04-05 23:12:30.000000000 +1200
@@ -1,12 +1,12 @@
-# Generated by makepkg 3.0.0
-# Wed Apr 4 17:00:33 UTC 2007
+# Generated by makepkg 2.9.8
+# Thu Apr 5 23:12:30 NZST 2007
pkgname = libx11
pkgver = 1.1.1-4
pkgdesc = X11 client-side library
url = http://xorg.freedesktop.org/
-builddate = Wed Apr 4 17:00:33 2007
-packager = Jan de Groot <email@omitted>
-size = 5365475
+builddate = Thu Apr 5 11:12:30 2007
+packager = Dale Ogilvie <email@omitted>
+size = 5078375
arch = i686
depend = libxau
depend = libxdmcp
Size differences in the files themselves must come from differences in the build environment. These are not completely "lab conditions" for this experiment; I am just building the package from the ABS PKGBUILD with my workstation toolset. Does the actual package maintainer have a different toolset/build environment/etc? Probably. Does it matter? Well, my workstation is still running, and I have upgraded all these self-built packages using wget-xdelta.sh from the delta repo.
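For anyone who wants to repeat the .PKGINFO comparison, here is a minimal sketch. It assumes GNU tar, gzip-compressed packages with .PKGINFO stored at the top level of the archive, and a made-up function name (pkginfo_diff); it is not part of the pacman-xdelta scripts.

```sh
# Hypothetical helper: show how .PKGINFO differs between two package files.
# Exit status follows diff: 0 = identical, 1 = different, 2 = trouble.
pkginfo_diff() (
  tmp=$(mktemp -d) || exit 2
  trap 'rm -rf "$tmp"' EXIT
  # -O writes the extracted member to stdout instead of to disk
  tar -xzOf "$1" .PKGINFO > "$tmp/a" || exit 2
  tar -xzOf "$2" .PKGINFO > "$tmp/b" || exit 2
  diff -u "$tmp/a" "$tmp/b"
)
```

Called as `pkginfo_diff mirror/libx11-1.1.1-4.pkg.tar.gz patched/libx11-1.1.1-4.pkg.tar.gz` (paths are placeholders), it should print a unified diff like the one above.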
If we take this beyond experimental, should the advantages of a 95% saving in bandwidth seem compelling, then the repo package and the patched package will be identical, as the delta will be generated by the package maintainer.
FYI, the theoretical benefits of zsync are quite compelling (no need for a previous version on the server, client version agnostic patching) if it can be made to work. I intend to look at a makepkg-zsync, wget-zsync.sh soon...
Offline
FYI, the theoretical benefits of zsync are quite compelling (no need for a previous version on the server, client version agnostic patching) if it can be made to work. I intend to look at a makepkg-zsync, wget-zsync.sh soon...
I've had another go with zsync. For wesnoth 1.2.2->1.2.3 the zsync client seems to pull ~12Mb from the server as opposed to xdelta's 2.3Mb delta.
Offline
To make zsync work you need to patch gzip. Here's an outline of what the patch does, and how it improves things.
http://svana.org/kleptog/rgzip.html
One note: the patch it refers to is the original proof of concept; the up-to-date and much improved patch is available at http://security.debian.org/debian-secur … e2.diff.gz
"Instead, people would take pains to tell her that beauty was only skin-deep, as if a man ever fell for an attractive pair of kidneys."
(Terry Pratchett, Maskerade)
Offline
To make zsync work you need to patch gzip. Here's an outline of what the patch does, and how it improves things.
http://svana.org/kleptog/rgzip.html
One note: the patch it refers to is the original proof of concept; the up-to-date and much improved patch is available at http://security.debian.org/debian-secur … e2.diff.gz
Are you sure about that? zsync -z is supposed to create an optimized gzip. That is what I used to create a gz that yielded the 12Mb download. I have a query in with the zsync developer to check that what I'm doing is sane.
Offline
I have made a small change to the makepkg-xdelta script, pulling the latest old package version from the cache to generate the delta, rather than looking for it in the PKGBUILD directory. This removes the extra step of copying the old version to the build directory, assuming that the package maintainers actually have the previous version in cache. makepkg-xdelta only creates a delta against the last version in the cache. This is by design, to encourage frequent updates and save disk space on the mirrors. I don't see great value in maintaining deltas back much further than the previous version.
I also fixed a bug in wget-xdelta.sh when the package name had a hyphen in it (e.g. amarok-base).
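To illustrate the hyphen problem, here is a sketch of splitting a package file name from the right rather than at the first hyphen. It assumes the name-pkgver-pkgrel.pkg.tar.gz layout and that pkgver itself contains no hyphen; pkg_split is a made-up name, and this is not necessarily how wget-xdelta.sh was fixed.

```sh
# Hypothetical helper: print "<pkgname> <pkgver>-<pkgrel>" for a package file.
# The body runs in a subshell so the variables do not leak into the caller.
pkg_split() (
  base=${1##*/}              # drop any directory part
  stem=${base%.pkg.tar.gz}   # drop the extension
  rel=${stem##*-}            # last hyphen-separated field is pkgrel
  rest=${stem%-*}
  ver=${rest##*-}            # next-to-last field is pkgver
  name=${rest%-*}            # everything before that is the name
  printf '%s %s-%s\n' "$name" "$ver" "$rel"
)
```

With this, a name such as amarok-base keeps its hyphen, because only the last two fields are treated as version information.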
So here are the stats so far. I have built, packaged, and delta-downloaded all these packages (except glibc, I'm not that brave) to my KDE workstation, which continues to run... so far..
Num Packages Total Full Size Total Delta Size % Of Full
17 227,319,063 16,058,883 7 %
Package New Version Full Size Delta Size % Of Full
freetype2 2.3.3-1 718,792 148,671 21 %
freetype2 2.3.3-2 718,781 40,934 6 %
lm_sensors 2.10.3-1 296,792 85,147 29 %
ttf-dejavu 2.16-1 3,540,226 565,565 16 %
libxfont 1.2.8-1 479,796 179,633 37 %
sane 1.0.18-3 3,156,460 1,547,239 49 %
faad2 2.5-3 373,719 85,302 23 %
imagemagick 6.3.3.6-1 2,204,481 1,305,945 59 %
glibc 2.5-8 10,463,678 141,976 1 %
jre6 6-3 37,364,119 3,000,289 8 %
libx11 1.1.1-4 2,012,590 163,079 8 %
wesnoth 1.2.3-1 71,365,629 2,364,189 3 %
qt 3.3.8-3 10,417,784 1,599,413 15 %
kdelibs 3.5.6-7 20,752,239 617,196 3 %
kdepim 3.5.6-4 22,407,785 226,905 1 %
kernel26 2.6.20.5-1 20,524,202 2,035,759 10 %
kernel26 2.6.20.6-3 20,521,990 1,951,641 10 %
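The summary line can be recomputed from the per-package rows with a few lines of awk. This is only a sketch: delta_summary is a made-up name, and it assumes the column layout used in the table above (name, version, full size, delta size), with commas as thousands separators.

```sh
# Hypothetical helper: read table rows on stdin and print
# "<packages> <total full bytes> <total delta bytes> <delta % of full>".
delta_summary() {
  awk '
    # only rows whose third column looks like a size; this skips header lines
    NF >= 4 && $3 ~ /^[0-9,]+$/ {
      gsub(/,/, "", $3); gsub(/,/, "", $4)
      full += $3; delta += $4; n++
    }
    END {
      if (full > 0)
        printf "%d %.0f %.0f %.0f\n", n, full, delta, int(delta * 100 / full + 0.5)
    }'
}
```

Pasting the 17 rows above into `delta_summary` should give back the totals quoted in the summary, with the percentage rounded to the nearest whole number.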
At this point I'd like to hear from the devs as to whether they want to serve up deltas from the mirrored repositories. Here's how I see things, should the xdelta step be included in makepkg.
1. Package maintainers would have to install the xdelta package.
2. Assuming the previous version is in the cache on the maintainers machine when building the new package, the delta will be created automatically along with the full package. makepkg will take longer due to the delta creation.
3. package and delta would be copied to the mirrored repo, remove old package and old delta
Based on current numbers, there is a potential bandwidth saving of 93% times number of downloads, at a cost of 7% increase in the size of the repository on the mirrors. Realization of the full potential saving depends on uptake of the xdelta download method, which currently is "opt-in" using wget-xdelta.sh.
Where to from here?
Offline
At this point I'd like to hear from the devs as to whether they want to serve up deltas from the mirrored repositories. Here's how I see things, should the xdelta step be included in makepkg.
Seeing the benefits... I like it. 227 -> 16mb is nothing to joke about.
1. Package maintainers would have to install the xdelta package.
2. Assuming the previous version is in the cache on the maintainers machine when building the new package, the delta will be created automatically along with the full package. makepkg will take longer due to the delta creation.
3. package and delta would be copied to the mirrored repo, remove old package and old delta
with 3.... does the delta have to be in the same directory as the normal package, or a separate repo?
Given it being in the same dir, and the above, you have what looks like a slick and simple implementation. devtools could be easily modified to upload the delta, and the scripts on archlinux.org would probably only need a line or two added.
Based on current numbers, there is a potential bandwidth saving of 93% times number of downloads, at a cost of 7% increase in the size of the repository on the mirrors. Realization of the full potential saving depends on uptake of the xdelta download method, which currently is "opt-in" using wget-xdelta.sh.
Where to from here?
I think we could implement this given the above, and I can't see any drawbacks, provided that xdelta is reliable. How long does it take to apply a diff?
Don't take my post as word, but I think this is an awesome setup, and if the other devs like it, could be implemented. I take back my criticism from earlier in the thread.
James
Last edited by iphitus (2007-04-09 07:37:18)
Offline
what about if there is no previous package on your disc (if you used pacman -Scc) for delta to work? could the package be created by compressing files that are already installed? the only problem would be with the files that you changed (like config files). but then again pacman could save local package's md5sum (if it doesn't already) and check it after recompressing files. the only question is how long would this compressing, applying xdelta and installing take.
the other option could be creating delta on extracted files of packages but i guess that this would be unsafe.
Last edited by billy (2007-04-09 17:46:45)
Offline
In case someone wants to use pacman-xdelta with pacman 3, this is an ugly hack to make it work:
replace:
#!/bin/bash
o=$1
u=$2
by
#!/bin/bash
o=$1
u=$2
# under pacman 3, keep only the file name part of the first argument
if pacman --version | grep -q 'v3\.'; then
  o=$(basename "$o")
fi
in /usr/bin/wget-xdelta.sh
Any resemblance to grammar or spelling mistakes is purely coincidental and independant of the willing of the author
Offline
Well this is looking rather promising, isn't it? xdelta is looking like the right choice, it seems, given the apparently disappointing performance of zsync (although I suppose it could use a little more looking into). Yet I feel it would be nice to keep the flexibility of being able to fish for old versions in the PKGBUILD dir (or, better, in the PKGDEST (?) dir) as well as looking in the cache. Certainly, the maintainer may be used to upgrading his own packages from the local disc, so that the old versions would not end up in the cache anyway.
Also, on the md5 summing / compression issue: Unless all the maintainers are made very aware of the problem, it is easy to imagine someone creating a package without a delta, and then going back and creating the delta later. If they then only upload the freshly created delta, we have a tar.gz with a different md5 sum from that which the delta will produce, resulting in a small amount of mayhem.
Also note that, as it stands, patching in pacman requires more free disk-space (as noted far above) than the standard upgrade process. The requirements increase with unpatched package size. This might be fixed by using xdelta 3, and pipes.
edit: One notable problem is that IIRC xdelta uses /tmp as scratch-space during patching. For systems where /tmp is a tmpfs, this could quickly lead to problems (xdelta patches using the tar, not the tar.gz, so the old package needs to be fully uncompressed first, and the latest version will first be produced as a tar as well... 2xuncompressed-package-size required).
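A rough pre-flight check for the scratch-space problem could look like the sketch below. It assumes GNU gzip (whose `gzip -l` reports the uncompressed size, modulo 4 GiB) and takes twice the old package's uncompressed size as the estimate, on the guess that the new tar is about as large as the old one. Both function names are made up.

```sh
# Hypothetical helper: estimated scratch space in KiB for patching one
# package, i.e. twice the uncompressed size of the old .tar.gz.
scratch_needed_kb() {
  # line 2 of `gzip -l`: compressed size, uncompressed size, ratio, name
  bytes=$(gzip -l "$1" | awk 'NR == 2 { print $2 }')
  echo $(( (2 * bytes + 1023) / 1024 ))
}

# Hypothetical helper: succeed if directory $2 (e.g. /tmp) has enough
# free space to patch package $1.
has_scratch_space() {
  need=$(scratch_needed_kb "$1")
  free=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
  [ "$free" -ge "$need" ]
}
```

A download script could fall back to fetching the full package whenever `has_scratch_space "$old_pkg" /tmp` fails (`$old_pkg` being a placeholder for the cached package path).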
iphitus:
The delta is searched for in the same repo dir as the full package.
billy:
This would not work if any file had changed, since xdelta does its own file verification and expects the tar to be patched to have a certain md5 sum. You would, of course, also need to add the package metadata files for this to have a chance at working.
Last edited by Thikasabrik (2007-04-10 08:06:36)
Offline
does the delta have to be in the same directory as the normal package, or a separate repo?
Currently I think yes, using XferCommand, as the delta repo is a first class repo in pacman.conf. With more engineering a delta repo could be delta only - this might be attractive to mirrors prepared to serve up 7% of a full arch repo.
I think we could implement this given the above, and I can't see any drawbacks, provided that xdelta is reliable. How long does it take to apply a diff?
xdelta has never failed me in my short experience with it. This is where its maturity in the "market" has it over bdiff for example. Patching the 70Mb wesnoth with a 2Mb delta takes ~20 seconds on my AMD64 3000+.
Don't take my post as word, but I think this is an awesome setup, and if the other devs like it, could be implemented. I take back my criticism from earlier in the thread.
James
Thanks for your positive feedback, I just need to know whether it is worthwhile continuing to work on this. If the arch devs aren't interested in implementing this for whatever reason, I can find other things to do...
Last edited by dale77 (2007-04-10 09:02:59)
Offline
what about if there is no previous package on your disc (if you used pacman -Scc) for delta to work?
If you want smaller downloads you need to be less tidy (with your cache). Do people still have disk space issues in 2007? I don't think we want to try and engineer regenerating the package when the vastly simpler (Arch-ier?) solution is to just not delete the previous package version.
To help things out pacman could be modified to do a delta friendly clean i.e. clean out all but the latest versions from the cache.
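As an illustration of what a delta-friendly clean would do, the sketch below lists every cached package file except the newest version of each package. It assumes the name-pkgver-pkgrel.pkg.tar.gz layout and GNU sort's version ordering, which is not guaranteed to match pacman's own version comparison in every case. stale_packages is a made-up name, and it only prints file names; it deletes nothing.

```sh
# Hypothetical helper: print the cached package files in directory $1 that
# are NOT the newest version of their package.
stale_packages() {
  for f in "$1"/*.pkg.tar.gz; do
    [ -e "$f" ] || continue
    base=${f##*/}
    stem=${base%.pkg.tar.gz}
    rel=${stem##*-}; rest=${stem%-*}
    ver=${rest##*-}; name=${rest%-*}
    printf '%s %s-%s %s\n' "$name" "$ver" "$rel" "$base"
  done |
    # group by name, then order each group by version (GNU sort, key flag V)
    sort -k1,1 -k2,2V |
    # whenever the next line has the same name, the previous file is stale
    awk '$1 == prev_name { print prev_file } { prev_name = $1; prev_file = $3 }'
}
```

Run against the pacman cache directory, the output is the list of files that could be removed while still leaving one version of each package to patch against.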
Offline
To help things out pacman could be modified to do a delta friendly clean i.e. clean out all but the latest versions from the cache.
pacman -Sc
I use that all the time.
Offline
Taking into account the unfortunately named Thikasabrik's last post I thought an issue list might be in order:
Package Maintainer issues/cons
1. Previous version location - possibly the cache cannot be relied upon, also try PKGDEST and PKGBUILD dir in some intelligent order...
2. Additional repo storage space - 7% increase in repo size on disk
3. Additional repo bandwidth - data flows from repo to mirror increase by 7% for the deltas
4. Additional makepkg time to create delta - insignificant compared to compile time in my experience
5. deltas created post makepkg causing md5sum failures - invest in brown paper bags for guilty maintainers? Getting #1 right will help a lot.
Client Issues
1. No previous package version available - just use pacman -Sc (thanks mcover)
2. Delta requires more disk space, i.e. briefly you have an extra copy of the previous version uncompressed on disk while the patch is applied.
3. "opt-in" uptake of deltas using wget-xdelta.sh. Bandwidth savings don't materialize unless the client opts in. - modify pacman to be xdelta aware in C, and default to deltas where possible.
4. Overflowing a small /tmp with a 300Mb untar of openoffice-base
5. Patch time - seconds for a 70Mb pkg on a 3GHz equiv, traded off against shorter download time.
General Issues
xdelta3 or 1.1? Personally I would just stick with 1.1.
I think those issues are mostly corner cases... No biggies there IMHO. These are the ones I care about:
* Package Maintainer previous version location
* "opt-in" uptake of deltas using wget-xdelta.sh
Dale
Last edited by dale77 (2007-04-11 11:56:25)
Offline
Taking into account the unfortunately named Thikasabrik's last post...
I can't help it, the internet Gods have chosen.
Anyhoo, issue-lists = win.
By the way, I think a move to xdelta 3 should be on there somewhere, owing to its potential advantages (support for pipes, and supposed libification) and the fact that the diff format changed in this version. Thus, to avoid complexities later, if we switch it should be ASAP. Given that, I think we need a separate xdelta3 package. I'd write a PKGBUILD, but I can't test it for a couple of weeks yet.
Also, I thought a bit more about what billy said, and I guess I didn't take it fully as intended, so here's a better reply:
Reassembling a package from installed files is only possible, as you mention, in the case where there are no editable config files present. In this case it should be possible, but is enormously complex compared to the patching process dale77 has been working on. That said, there's nothing to stop someone modifying the wget-xdelta script in an attempt to do this, it's just not something to worry about at this stage. Let's get something simple working first.
Offline
I have updated pacman-xdelta in the delta repo
[delta]
Server = http://dale.phraktured.net/delta
I implemented a couple of new features in makepkg-xdelta:
1. Search both PKGDEST (which is build dir if not set) AND cache for latest previous version
2. Skip all xdelta funtimes if xdelta unavailable
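The "skip if unavailable" behaviour can be sketched as below. This is not the actual makepkg-xdelta code; make_delta is a made-up name, and the exit-status handling reflects my understanding of the xdelta 1.x man page (0 = no differences, 1 = differences found, 2 = error), so treat that part as an assumption.

```sh
# Hypothetical helper: create a delta from old package $1 to new package $2,
# written to $3, or skip quietly when xdelta is not installed.
make_delta() {
  if ! command -v xdelta >/dev/null 2>&1; then
    echo "xdelta not found, skipping delta"
    return 0
  fi
  xdelta delta "$1" "$2" "$3"
  # assumed xdelta 1.x convention: status 1 only means "the files differ"
  [ "$?" -le 1 ]
}
```

The full package is built either way, so a maintainer without xdelta installed simply produces no delta.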
I also cleaned out the delta repo. As the deltas are against my home-built packages, they will fail to patch up any previous version that comes from a real repo.
I reckon the next step is to find some kind package maintainer willing to run makepkg-xdelta for his next release to the mirrors. Once this happens, it should be just a quick XferCommand away from trying out arch-xdelta for real. James mentioned that some scripts might have to change to transport deltas to the mirrored repo...
As to which package to choose - how about kernel26? The fixes to the kernel are tiny little bug fixes and seem to be reasonably frequent. xdelta yielded 90% savings with the two kernels tested. Or maybe not - a failed kernel upgrade is probably not what we want to risk...
I probably need to ping the pacman-dev list again, hopefully they're not totally absorbed in pacman3. Phrakture has been surprisingly silent on this thread.
Last edited by dale77 (2007-04-11 12:05:55)
Offline
Keep up the good work, I can't wait to see it implemented in all the repos!
Offline
Any news?
Offline
The pacman devs are pretty busy at the moment. So I'm just waiting for some advice from them as to how to proceed.
Offline
Some more xdelta stats from the latest crop of upgrades. The poppler delta was pretty poor at 87% of the full download, I'd be interested to know what the change was between -3 and -4 (edit: looks pretty extensive http://cvs.archlinux.org/cgi-bin/viewcv … ag=CURRENT). On the other hand kdebase at 0.1Mb versus 30Mb is pretty sweet.
Package New Version Full Size Delta Size % Of Full
digikam 0.9.1-3 6,864,490 1,229,928 18 %
dvd+rw-tools 7.0-2 88,309 60,135 68 %
jasper 1.900.1-1 354,620 143,828 41 %
k3b 1.0.1-1 6,378,147 1,129,179 18 %
kdeartwork 3.5.6-2 16,988,168 26,752 0 %
kdebase 3.5.6-5 30,792,081 111,963 0 %
kdegraphics 3.5.6-2 10,640,123 3,947,909 37 %
kdelibs 3.5.6-8 20,773,363 260,219 1 %
kdepim 3.5.6-5 22,425,059 35,185 0 %
kernel26 2.6.20.7-2 20,593,549 2,076,348 10 %
poppler 0.5.4-4 3,412,503 2,959,379 87 %
python 2.5.1-1 12,047,990 1,102,251 9 %
sane 1.0.18-5 3,200,233 6,329 0 %
xine-lib 1.1.6-1 4,661,999 78,950 2 %
Last edited by dale77 (2007-04-23 11:59:28)
Offline
The release of openoffice-base-2.2-4 provides a good test case for the xdelta system.
I have uploaded the delta from 2.2-3 to 2.2-4 to my delta repo. See instructions to try out the system at http://dale.phraktured.net/. The delta download is 7Mb compared to the full 86Mb.
If you try this out, by all means report back to this thread.
Offline
Tried it. It works fine. It was concretely useful in saving my remaining monthly free traffic on my UMTS connection. To get wider feedback, you could also post the updated makepkg script for pacman3, since most of the users/devs ready to experiment with new things are likely to be using pacman3.
Last edited by patroclo7 (2007-04-24 13:27:25)
Mortuus in anima, curam gero cutis
Offline
Yeah, and if/when you complete this for pacman3's makepkg, I'd suggest using the build environment variables in makepkg.conf - i.e. something like adding 'xdelta' to the array will generate the deltas as well as the packages (then we could merge that into makepkg with little difficulty).
Just a question, because I haven't looked too thoroughly: how do you handle the case where I have, say, foo-1.2 and the new version is foo-1.5? Do you try and grab 1.2_to_1.3, 1.3_to_1.4, and 1.4_to_1.5? Or do you just download the full file?
I'm thinking of a case where someone hadn't updated in a few months, and some large package had multiple updates in that time period.
Another note: it might be worthwhile to send a HEAD request first, to check if the delta exists... I know curl can do that, but am unsure about wget.
Offline
Just a question, because I haven't looked too thoroughly: how do you handle the case where I have, say, foo-1.2 and the new version is foo-1.5? Do you try and grab 1.2_to_1.3, 1.3_to_1.4, and 1.4_to_1.5? Or do you just download the full file?
I'm thinking of a case where someone hadn't updated in a few months, and some large package had multiple updates in that time period.
a good idea would be to eventually set some rules on how to manage deltas on servers.
i think that deltas from previous versions should stay on server until:
- their sum exceeds some percent of the size of original package (if this number is 75% then data on server would grow to maximum 175% of its present size), in this case the oldest is erased if it's not the only one, or
- the delta is older than four/five/six months.
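The size rule above can be sketched as a small function. It covers only the size limit, not the age limit, and prune_deltas and its argument order are made up for illustration.

```sh
# Hypothetical helper: $1 = full package size, $2 = limit in percent,
# remaining arguments = delta sizes, oldest first.
# Prints the delta sizes that would be kept.
prune_deltas() (
  full=$1; limit_pct=$2; shift 2
  # never drop the last remaining delta
  while [ "$#" -gt 1 ]; do
    total=0
    for s in "$@"; do total=$((total + s)); done
    # stop as soon as the remaining deltas fit within the limit
    [ $((total * 100)) -gt $((full * limit_pct)) ] || break
    shift   # drop the oldest delta
  done
  echo "$@"
)
```

For example, with a full package of 1000 bytes, a 75% limit and deltas of 400, 300 and 200 bytes, the oldest (400) is dropped and the other two are kept.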
also deltas should be stored in a separate directory (like current, extra, community):
- so that a mirror with enough capacity could mirror them, while one without could skip them, and
- if deltas ever become official, users could easily enable/disable the delta folder in pacman.conf.
what do you think?
i just have one question. what if there were 5 or 6 deltas in repo to update from older to current version that still comply with the above "rules"? would patching a package six times take a lot of time? should the number of deltas for the same package also be limited?
Offline