You are not logged in.

#26 2007-03-05 10:24:37

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

This sounds like a great idea!

I'm just about to download glibc-2.5-6.pkg.tar.gz having been through -3,-4 & -5

Some details of my pain:

glibc-2.5-3 10M
glibc-2.5-4 11M
glibc-2.5-5 10M

The xdelta patch I just knocked up from -3 to -4 is 300K, from -4 to -5 is 770K

Is the xdelta idea moving ahead?

Dale

Offline

#27 2007-03-05 12:22:25

iBertus
Member
From: Greenville, NC
Registered: 2004-11-04
Posts: 2,228

Re: Binary Diffs for Pacman, a detailed proposal + evidence

I don't think support for deltas is going to be in pacman3. Perhaps I'm mistaken, but I haven't heard anything about this for awhile.

Offline

#28 2007-03-05 15:31:04

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

The xdelta idea is not a good one. Imagine the complexity here:
You have glibc-2.4-1 installed.
You want to update to the current glibc (2.5-5 according to the above).
pacman now has to determine any and all patches in that chain.
2.4-1 to 2.4-2
2.4-2 to 2.4-3
2.4-3 to 2.4-4
...
2.4-X to 2.5-1
2.5-1 to 2.5-2
2.5-2 to 2.5-3
2.5-3 to 2.5-4
2.5-4 to 2.5-5

Then it needs to try and patch each one.  Now, what happens if one in the middle fails? I'd assume each step along the way will need a backup (think: local disk space time 10), to prevent failures.  If any of these patches fails, it needs to rollback and try again, and move all the way up to current.

You're not saving any time doing this, and you're introducing quite a few HUGE points of failure.

Sure this will help dialup users, but arch has never been targeted at dialup users.  It has always been a "networked distro".

Secondly, multiple package updates in a short period of time are a packaging issue.  It should not happen.

Offline

#29 2007-03-06 00:52:05

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

phrakture wrote:

The xdelta idea is not a good one. Imagine the complexity here:
You have glibc-2.4-1 installed.
You want to update to the current glibc (2.5-5 according to the above).
pacman now has to determine any and all patches in that chain.
2.4-1 to 2.4-2
2.4-2 to 2.4-3
2.4-3 to 2.4-4
...
2.4-X to 2.5-1
2.5-1 to 2.5-2
2.5-2 to 2.5-3
2.5-3 to 2.5-4
2.5-4 to 2.5-5

Then it needs to try and patch each one.  Now, what happens if one in the middle fails? I'd assume each step along the way will need a backup (think: local disk space time 10), to prevent failures.  If any of these patches fails, it needs to rollback and try again, and move all the way up to current.

You're not saving any time doing this, and you're introducing quite a few HUGE points of failure.

Sure this will help dialup users, but arch has never been targeted at dialup users.  It has always been a "networked distro".

Sure the longer the chain gets the worse the "delta" approach gets. But is there a use for xdelta in the "up-to-date" networked distro space? I'm mainly thinking about the situation where you *stay current* with frequent updates. If there was *always* a delta available from last to current, then that would be enough. If we forget the chain back to package-0.1 and just concentrate on package-current and package-current-minus-1 would that make the idea more appealing? I would certainly be more inclined to pacman -Syu regularly without concern if the download was not so large (85Mb, last time I looked). And I'm not on dialup, but I do have to pay for downloaded data...

So my suggestion would be to have pacman first check for package-local-to-current.delta, if it exists, fine, download that and apply to the package-local.pkg.tar.gz in the cache. If not just grab package-current.pkg.tar.gz. This would scale in that if the package maintainer chooses to provide a delta to current for a specific "released" version (say the one that went out on the previous install ISO say) he could, and the system would work the same way... So for glibc you might have on the server:

glibc-2.5-1-to-6.delta (however much)
glibc-2.5-5-to-6.delta (770K)
glibc-2.5-6.pkg.tar.gz (10Mb)

phrakture wrote:

Secondly, multiple package updates in a short period of time are a packaging issue.  It should not happen.

glibc has released 4,5 & now 6 since I started up, so I think there is a practical reason for delta's. Think "security fix". These tiny little fixes to massive packages should get rolled out to the community as soon as possible, daily or weekly, and having a much smaller delta to apply would grease the wheels on that process...

Dale

Offline

#30 2007-03-06 04:24:11

AndyRTR
Developer
From: Magdeburg/Germany
Registered: 2005-10-07
Posts: 1,641

Re: Binary Diffs for Pacman, a detailed proposal + evidence

why not setting up a community based additional repo holding those delta packages? if it goes well and shows more pros than cons, we might think again about an implementation.

Offline

#31 2007-03-06 10:54:11

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

AndyRTR wrote:

why not setting up a community based additional repo holding those delta packages? if it goes well and shows more pros than cons, we might think again about an implementation.

Can you explain some more? For this to work makepkg would have to spit out a delta alongside the new package, and pacman would have to check for the existence of a certain delta based on currently installed package version on the user's machine... How would you set up a repo that would be useful without support from the tools?

Offline

#32 2007-03-06 11:51:30

AndyRTR
Developer
From: Magdeburg/Germany
Registered: 2005-10-07
Posts: 1,641

Re: Binary Diffs for Pacman, a detailed proposal + evidence

go on and patch pacman and makepkg ver.3(!) to make it ready for xdelta. pacman -S should check first a "delta-repo" if a delta pkg exists. if not it should fall back to usual update procedure.

i think as long as we make our packages by hand there's not much space for more work. if we will ever have something for automated package building that might change.

Offline

#33 2007-03-06 15:54:35

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

I have to agree with Andy here.  As you can probably tell, my opinion is that this is not needed.  I do, however, understand how some people would like it.  But, because I feel it is a bit overboard, I wouldn't want to really work on it until other things are ironed out.  Still, I'm never really against patches, and if you provide one, I can look into inclusion.  You;d get this feature in much faster if you provide a patch.
If you'd like, you can join the pacman-dev mailing list (or #archlinux-pacman on irc) and we can discuss implementation details.

Offline

#34 2007-03-06 17:36:40

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

There is a patch for recent makepkg in this thread, although I don't suppose anyone's looked at it... It spits out a binary diff if an old version of a package exists in the make dir. By hosting binary diffs from the last 1-3 versions of a package to the current version most people who update semi-frequently could get deltas instead of full versions - significant bandwidth saving.
The system I describe degrades gracefully - if no binary diffs are available, pacman works just like it always did since the full current package is always hosted alongside the binary diff.

edit:
There are, however, at least two potential issues:

1. Disk-space requirements during patching

2. MD5 summing issues due to xdelta operating on the tar, not the tgz (not decompressing first significantly bloats the binary diffs) - this can be worked around (see above posts).

Last edited by Thikasabrik (2007-03-06 17:41:33)

Offline

#35 2007-03-06 18:15:04

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Thikasabrik wrote:

There is a patch for recent makepkg in this thread, although I don't suppose anyone's looked at it...

It is a patch for the pacman 2.X series.  There has been significant work on the 3.X series to pretty much invalidate this patch.
Please check the ViewCVS root here: http://cvs.archlinux.org/cgi-bin/viewcv … oot=Pacman

Also, please read submitting-patches

Offline

#36 2007-03-06 19:53:21

toofishes
Developer
From: Chicago, IL
Registered: 2006-06-06
Posts: 602
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Thikasabrik wrote:

There is a patch for recent makepkg in this thread, although I don't suppose anyone's looked at it... It spits out a binary diff if an old version of a package exists in the make dir. By hosting binary diffs from the last 1-3 versions of a package to the current version most people who update semi-frequently could get deltas instead of full versions - significant bandwidth saving.
The system I describe degrades gracefully - if no binary diffs are available, pacman works just like it always did since the full current package is always hosted alongside the binary diff.

edit:
There are, however, at least two potential issues:

1. Disk-space requirements during patching

2. MD5 summing issues due to xdelta operating on the tar, not the tgz (not decompressing first significantly bloats the binary diffs) - this can be worked around (see above posts).

Realize that a patch for makepkg is quite easy compared to the work pacman will need. I saw your psuedo-code patch for pacman, and it is vastly simpler than how pacman actually work. Pacman operates on the ideas of transactions, so you will need to incorporate much of that, let alone actually code up all of the operations in C. And calling external programs should be used with care in C; libraries are a much better way of doing things.

Long and short- if someone else feels strongly about this and puts together a solid patchset, then it has a chance of inclusion. However, phrakture and I are not as motivated to do this as many other things on our own TODO lists. Look at this as the beauty of open source- developers do the things they find most important. smile

Offline

#37 2007-03-06 20:26:46

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

phrakture wrote:

I have to agree with Andy here.  As you can probably tell, my opinion is that this is not needed.  I do, however, understand how some people would like it.  But, because I feel it is a bit overboard, I wouldn't want to really work on it until other things are ironed out.  Still, I'm never really against patches, and if you provide one, I can look into inclusion.  You;d get this feature in much faster if you provide a patch.
If you'd like, you can join the pacman-dev mailing list (or #archlinux-pacman on irc) and we can discuss implementation details.

I'm not averse to the idea of looking at this and providing a patch. Thikasabrik has already started the process of figuring out how to make this work. I'll investigate further.

Thikasabrik: Do you want to collaborate on a patch?

Offline

#38 2007-03-07 14:37:32

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

dale77 wrote:

I'm not averse to the idea of looking at this and providing a patch. Thikasabrik has already started the process of figuring out how to make this work. I'll investigate further.

Thikasabrik: Do you want to collaborate on a patch?

Well, patching current (pacman 3) makepkg shouldn't be much trouble (it's still a shell script) but, as toofishes points out, I have no idea what I'm doing when it comes to pacman. I don't know C (hides)...
My original purpose in this thread was to sketch a plausible system, and to try to encourage implementation (why do it yourself when someone else might, right? wink ). I would be happy to keep thinking about it, and I will gladly cook up a new makepkg patch when I get a minute. As for pacman, a nice thing that has happened since I looked at this before is that xdelta hit version 3 (http://www.xdelta.org/) and now comes as a library - not only could this make pacman integration easier, but it may also allow some shuffling of data such that disk-space requirements aren't so bad. In fact, the command line version now seems to support stdin/out, which goes a long way towards reducing space requirements itself.
My only problem is that I have little time to spare right now; I'm in the run-up to my final exams here at Durham (UK). I'll see what I can contribute, but it ain't a priority right now.

Offline

#39 2007-03-08 09:15:42

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Well I've started to look at some code. Here's my plan:

User issues update request for package-ver-n.pkg.tar.gz

1. check to see if package-ver-n-less-one.pkg.tar.gz exists in cache
2. if it does goto 3 else 7
3. download package-ver-n-less-one-to-n.delta to cache
4. if download is successful goto 5 else 7
5. perform xdelta patch on package-ver-n-less-one.pkg.tar.gz to produce package-ver-n.pkg.tar.gz
6. if patch is successful goto 8 else 7
7. download package-ver-n.pkg.tar.gz as per normal
8. continue normal update

I've chosen the lately released kdebase-3.5.6-3.pkg.tar.gz as my testbed. This ones's a corker.

kdebase-3.5.6-2-to-3.delta   141871 
kdebase-3.5.6-3.pkg.tar.gz 30792306

Hmmm. That's close to a 30Mb saving for every arch user who cares about kdebase...

I haven't coded in C in a while, any tips as to tools? KDevelop seems to work pretty well after the initial setup.

Offline

#40 2007-03-08 12:27:40

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Well, on the whole that seems reasonable. You will, of course, need to account for md5 summing issues.
xdelta itself verifies the tar to be patched against the diff (which contains the md5 string for the original tar), but pacman operates with md5 sums of the gzipped tar. When xdelta patches an existing package the resulting tgz (due to recompression) may not have the same md5 sum as the current tgz generated by makepkg.

One way to solve this is to have makepkg generate the full tgz _from_ the patch it spits out, but this could get messy e.g. if patches are generated after the full package is in the repo.

Another is to switch pacman to consider the md5 of the tar (not really an option).

Another is to not let xdelta do the decompression/recompression and do it ourselves, trying to match the settings used by makepkg to make the md5 of the tgzs match (not sure if this is feasible)

Offline

#41 2007-03-08 13:22:40

toofishes
Developer
From: Chicago, IL
Registered: 2006-06-06
Posts: 602
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Tools? Vim with BufExplorer and a fresh set of ctags. smile

It is pretty amazing how quick you can start to work with just a text editor, although I will admit it took me some time to become familiar enough with the pacman code to be able to do this.

By the way, you two do know there is a pacman-dev mailing list- you may want to think about joining just to keep up to date on what is going on. Warning, there is quite a bit of mail that flows across here, so send it to an account you don't mind getting a lot of mail on. You can find the link to join on the main page.

Offline

#42 2007-03-08 16:05:57

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

toofishes wrote:

Tools? Vim with BufExplorer and a fresh set of ctags. smile

It is pretty amazing how quick you can start to work with just a text editor, although I will admit it took me some time to become familiar enough with the pacman code to be able to do this.

Just a note on this, a little off topic.

When I was in school, every class I had (note: not a CS student) we used some tool to code with.  This includes things like VHDL, which needed it's own special compiler/uploader to the actual hardware.  We also almost always used Borland or Visual Studio for C/C++ code.  Even Motorola assembly was done with some motorola IDE they make.

Still, it wasn't until I had an x86 assembly class where we were told to use textpad to write code with, that everything *clicked*.  It was the actual step of not using some tool that hides the details that made things make sense.

That is why I almost always recommend this route for programming.  I try to steer people away from the big environments (eclipse, kdevelop, anjuta, etc) because they hide details from you.  If you're forced to do things by hand, you actually learn the compiler flags (gcc -c is a common one people don't know unless they've compiled things by hand) and how to use make and tools like that.  So I'd recommend that here.

Pick a text editor - any will do, though the top contenders are typically vim and emacs.  Write your code with that.  Compile by hand (or use the makefiles we have).  You end up understanding a lot more than if you used an entire IDE.

Offline

#43 2007-03-12 10:59:43

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

OK, I've had a little poke around in some code. I'm thinking that the approach to take is to intercept a low-level call to "download package-current.pkg.tar.gz" and interpose the logic to "download patch, apply patch to create package-current.pkg.tar.gz". The idea is that "dl-patch, apply" is just a slightly more circuitous protocol than straight ftp/http. smile The end result of both protocols is package-current.pkg.tar.gz. For pacman 2.9.8 the code in question might be "downloadfiles_forreal", I haven't investigated the pacman 3 code yet.

Where do people see these delta's living? I would have thought they would sit right next to the package on the mirror:

package-1.0-2.pkg.tar.gz
package-1.0-1-to-2.delta

Offline

#44 2007-03-12 11:12:38

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

As you said some time ago regarding makepkg...

Thikasabrik wrote:

PS:Pehaps a better version would work in tar files until the final step. This would create a patch that works on tar files, so the gunziping and gziping would be done by pacman at patch time instead of by xdelta. This would allow greater flexibility and might even completely avoid the nasty md5 issue. I expect I'll investigate that soon.

I was thinking the same. The differences in mdsums come from potential differences in the gzip algorithm applied by makepkg versus xdelta's internal compression. If we take xdelta compression out of the loop and use the makepkg gzip, or conversely use your other suggestion of having makepkg use the xdelta created gzip, we should have no md5 mismatches.

I lean toward using makepkg gzip, as the xdelta internal compression is only for convenience in adhoc use.

Offline

#45 2007-03-12 15:36:20

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

dale77 wrote:

OK, I've had a little poke around in some code. I'm thinking that the approach to take is to intercept a low-level call to "download package-current.pkg.tar.gz" and interpose the logic to "download patch, apply patch to create package-current.pkg.tar.gz". The idea is that "dl-patch, apply" is just a slightly more circuitous protocol than straight ftp/http. smile The end result of both protocols is package-current.pkg.tar.gz. For pacman 2.9.8 the code in question might be "downloadfiles_forreal", I haven't investigated the pacman 3 code yet.

Yeah, that would be my approach - only the download handler should need to care about this, which hopefully makes cramming it into pacman minimally tortuous.

edit:
Of course, that does mean people won't be able to manually download delta files and just run pacman on them... Possibly the pacman -U response should also handle deltas. hmm Luckily, I don't think there are any other cases we need to deal with - we will not be storing deltas in the cache.

Last edited by Thikasabrik (2007-03-14 09:50:41)

Offline

#46 2007-03-14 11:25:40

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Ok, here's Thikasabrik's patch updated for makepkg 3.0

--- pacman-lib/scripts/makepkg    2007-03-09 18:11:47.000000000 +1300
+++ build/scripts/makepkg    2007-03-15 00:23:54.000000000 +1300
@@ -1040,6 +1040,34 @@
 fi
 
 cd "$startdir"
+
+# Check to see if we have any old versions to create delta's with
+base_file=""
+delta_file=""
+for old_file in $startdir/$pkgname-*-${CARCH}.pkg.tar.gz; do
+    namend=${old_file#"$startdir/$pkgname-"}
+    old_version=${namend%."pkg.tar.gz"}
+    if [ "$old_version" != "$pkgver-$pkgrel-${CARCH}" ] && [ "$old_version" != "*-${CARCH}" ]; then
+       # old_file will include the just built package, only remember the old versions
+       base_file=$old_file 
+       msg "Making delta from version $old_version"
+       delta_file=$PKGDEST/$pkgname-${old_version}_to_$pkgver-$pkgrel-${CARCH}.delta
+       # xdelta will decompress base_file & pkg_file into TMP_DIR (or /tmp if TMP_DIR is unset)
+       # then perform the delta on the resulting tars
+       xdelta delta $base_file $pkg_file $delta_file
+    fi
+done
+
+if [ "$delta_file" != "" ]; then
+  # Generate the final gz using xdelta for compression. xdelta will be our common
+  # denominator compression utility between the packager and the users
+  #
+  # makepkg and pacman must use the same compression algorithm or the compressed
+  # package will likely not match, producing md5 checksum errors.
+  #
+  xdelta patch $delta_file $base_file $pkg_file
+fi
+
 if [ "$CLEANUP" = "1" ]; then
     msg "Cleaning up..."
     rm -rf src pkg

Offline

#47 2007-03-14 13:08:17

Mikos
Member
From: Prague, Czech Republic
Registered: 2005-05-03
Posts: 228
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Why Xdelta? bsdiff seems to be much better, according to bsdiff site:

bsdiff routinely produces binary patches 50-80% smaller than those produced by Xdelta

If it is true, then please forget about Xdelta, it is much worse than bsdiff.

Offline

#48 2007-03-14 16:56:30

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Mikos wrote:

Why Xdelta? bsdiff seems to be much better, according to bsdiff site:

bsdiff routinely produces binary patches 50-80% smaller than those produced by Xdelta

If it is true, then please forget about Xdelta, it is much worse than bsdiff.

One problem with bsdiff is that it does not come as a library. Another is that it has not been updated since 2005. Since this time xdelta has hit version 3 and tends to be much faster and produce smaller deltas than RTPatch (also used as an example on the page you link to).
Btw, we should really be working against xdelta3 - only version 1 exists in the official repo (it has been marked out of date).

Offline

#49 2007-03-15 01:30:43

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Thikasabrik wrote:

One problem with bsdiff is that it does not come as a library. Another is that it has not been updated since 2005. Since this time xdelta has hit version 3 and tends to be much faster and produce smaller deltas than RTPatch (also used as an example on the page you link to).
Btw, we should really be working against xdelta3 - only version 1 exists in the official repo (it has been marked out of date).

I have had at first look at the xdelta 3 code. How to use it as a library didn't just jump out and hit me between the eyes, and the documentation on xdelta.org leaves a lot to be desired. It looks like most of the docs are in xdelta.c sad It also has a roll your own build system i.e. custom Makefile. In short xdelta3 as a library isn't the sweetest solution available...

Anyway, we'll see how things progress down the pacman route now. Calling xdelta/bsdiff at runtime from pacman is still on the shelf as an option as far as I'm concerned. The code to call an external process to obtain the package is already there in a feature to "use a custom download script" instead of the builtin pacman ftp/http. It's early days on the pacman side for me...

xdelta or bsdiff, I've seen some pretty huge improvements that look like a no-brainer to go for. The last one was the 1.2.1 to 1.2.2 Wesnoth upgrade, the full download was something like 60Mb, the patch about 6Mb. Maybe the pacman code will change my mind, but it's looking OK so far.

Offline

#50 2007-03-21 14:00:16

cr7
Member
Registered: 2006-11-28
Posts: 103

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Why not zsync or httpup?

Offline

Board footer

Powered by FluxBB