#51 2007-03-22 04:19:28

mcover
Member
From: Germany
Registered: 2007-01-25
Posts: 134

Re: Binary Diffs for Pacman, a detailed proposal + evidence

At first I thought the whole binary-diff thing might be a great idea, but since patching would take much more CPU time and memory on any Arch system, I am very concerned that users with older machines might be more annoyed than pleased. Take me, for example: I still run Arch on a PIII 800 MHz laptop with pretty limited resources. Arch just runs fastest and smoothest on my laptop, and I wouldn't want to have to switch distros if upgrading a large number of packages ended up taking 10x as long as before. Bandwidth has not been an issue for me so far, and I have always been happy with the time an upgrade takes.

Just my 2 cents...

Offline

#52 2007-03-22 17:22:56

stonecrest
Member
From: Boulder
Registered: 2005-01-22
Posts: 1,190

Re: Binary Diffs for Pacman, a detailed proposal + evidence

If the binary diff layer lives on top of the current pkg.tar.gz method (which I think is the current proposal), I imagine that it would be easy to have a pacman config option to disable it.


I am a gated community.

Offline

#53 2007-03-23 08:56:37

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Yes, it seems that some would not like to run the delta stuff, so a command line switch to turn it on or off sounds like the order of the day. As for me, I run with a 1Gb monthly data cap, of which I have currently used ~800Mb. With the python 2.5 changes I currently have a ~350Mb download ahead of me. It would be an interesting exercise to see how much that would be "xdelta" style.

Others have suggested bdiff, zsync etc. bdiff is just a "reportedly better xdelta". xdelta gives *amazingly* smaller downloads for updates, e.g. 140K instead of 29Mb. I'd be happy with that.

zsync has the nice feature that it does not generate a patch from current-1.pkg.tar.gz, rather it just defines a checksum map for "blocks" of current.pkg.tar.gz. Then the out-of-date client downloads this "map" and requests only those blocks from the server which do not match blocks of the local version *whatever* that is. So there would only ever be current.zsync + current.pkg.tar.gz up there. Nice.
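
To make the block-map idea concrete, here is a rough sketch in shell. It is not how zsync is implemented (zsync uses rsync-style rolling checksums, so it also finds blocks that have moved), it only compares blocks at fixed offsets. The function names and the 4096-byte default block size are made up for the illustration.

```bash
# block_map FILE [BLOCKSIZE]: print one "index checksum" line per block.
block_map() {
  file=$1
  bs=${2:-4096}
  size=$(($(wc -c < "$file")))
  i=0
  while [ $((i * bs)) -lt "$size" ]; do
    sum=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null | md5sum | cut -d' ' -f1)
    echo "$i $sum"
    i=$((i + 1))
  done
}

# blocks_to_fetch OLD NEW [BLOCKSIZE]: print the indices of the blocks of NEW
# that OLD does not already hold at the same offset.
blocks_to_fetch() {
  old_map=$(mktemp)
  new_map=$(mktemp)
  block_map "$1" "${3:-4096}" > "$old_map"
  block_map "$2" "${3:-4096}" > "$new_map"
  # Lines that appear only in the new map are blocks the client must download.
  grep -F -x -v -f "$old_map" "$new_map" | cut -d' ' -f1
  rm -f "$old_map" "$new_map"
}
```

In this picture the server would publish only the output of block_map for the current package, and each client would work out its own list of missing blocks from whatever old version it happens to have.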

Perhaps the zsync modus operandi would be better... My gut feeling is that it is "more complicated" but it does have a good pedigree (rsync) and the advantage of being local version agnostic. I would need to look into it in some more depth... How does it handle compressed content for instance?

Someone else on the mailing list mentioned that makepkg would break if xdelta was not present. The easy fix is to simply make pacman dependent on xdelta in the same way that it is currently dependent on tar...
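
An alternative to a hard dependency would be for the scripts to check for xdelta at run time and carry on without deltas when it is missing. A minimal sketch of that check follows; `have_tool` and `use_deltas` are made-up names, not part of makepkg or pacman.

```bash
# have_tool NAME: succeed if NAME can be run from the current PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool xdelta; then
  use_deltas=yes
else
  echo "xdelta not found, falling back to full packages" >&2
  use_deltas=no
fi
```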

I plan to get back onto this (looking at the pacman side) soon. My *personal* available time for coding is not high, but I am keen to see this feature in pacman.

Offline

#54 2007-03-25 10:15:38

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Hello all,

Change of tack... Trying to integrate xdelta/zsync etc. into pacman 3 may not be the best approach just now. How about this vastly simpler option: a custom XferCommand in pacman.conf? I see this as a good way to validate the concept before doing the engineering in pacman 3.

XferCommand = /root/mydelta.sh %o %u

And here's the code. I have now verified this against a custom repo. I'd like to test the waters: would people start generating deltas in a test repo, using a modified makepkg, for something like this method? Thoughts? Is this a goer, or do we have to have a C-coded pacman?

#!/bin/bash
# XferCommand helper: try to fetch an xdelta patch and rebuild the new package
# from the cached old one; otherwise fall back to a normal full download.
o=$1                        # output file given by pacman (%o)
u=$2                        # URL of the file to fetch (%u)
url=${u%/*}                 # repo directory, where the deltas are expected
cache=/var/cache/pacman/pkg

download_full() {
  wget --passive-ftp -c -O "$o" "$u"
}

# Split name-version-release. Strip the last two dash-separated fields rather
# than cutting at the first dash, so names such as xorg-server also work.
base=${o##*/}
base=${base%.part}
base=${base%.pkg.tar.gz}
tmp=${base%-*}
pkgname=${tmp%-*}
new_version=${base#"$pkgname"-}

cached_file=""
old_version=""
# Only check for pkg.tar.gz files in the cache, we download db.tar.gz as well
if [[ "$o" == *pkg.tar.gz* ]]; then
  for f in "$cache/$pkgname"-*.pkg.tar.gz; do
    [[ -e "$f" ]] || continue          # the glob matched nothing
    cand=${f##*/}
    cand=${cand%.pkg.tar.gz}
    tmp=${cand%-*}
    # Skip other packages that merely share the prefix (e.g. qt vs qt-doc)
    [[ "${tmp%-*}" == "$pkgname" ]] || continue
    # Keep the last match by name. I suppose we could take the latest by date...
    cached_file=$f
    old_version=${cand#"$pkgname"-}
  done
  if [[ "$old_version" == "$new_version" ]]; then
    # We already have the new version in the cache! Just continue the download.
    cached_file=""
  fi
fi

if [[ -z "$cached_file" ]]; then
  # No usable cached package: just download the file
  download_full
  exit
fi

# Great, we have a cached file, now calculate a patch name from it
delta_name=$pkgname-${old_version}_to_${new_version}.delta
# Try to download the delta and apply it to the cached file to produce the new file
if wget --passive-ftp -c "$url/$delta_name" && xdelta patch "$delta_name" "$cached_file" "$o"; then
  # Remove the delta now that we are finished with it
  rm -f "$delta_name"
else
  # No delta on the server, or xdelta failed for some reason: just download the file
  rm -f "$delta_name"
  download_full
fi

Last edited by dale77 (2007-03-26 11:10:48)

Offline

#55 2007-03-26 13:33:32

cr7
Member
Registered: 2006-11-28
Posts: 103

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Wow, I look forward to testing it soon!

Offline

#56 2007-03-26 15:37:35

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Ya know, the XferCommand is near genius 8)

Offline

#57 2007-03-27 14:10:22

hypermegachi
Member
Registered: 2004-07-25
Posts: 311

Re: Binary Diffs for Pacman, a detailed proposal + evidence

phrakture wrote:

Ya know, the XferCommand is near genius 8)

QFT.  KISS FTW!!!

Offline

#58 2007-03-28 08:22:57

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

OK, so what happens now? Both the packaging end (makepkg) and the client end (XferCommand) are "ready" for some more testing.

How can we get some packages created along with a delta, and continue testing? This system will only work if the packages are created "xdelta" aware, and these are available on a repo somewhere.

Offline

#59 2007-03-28 15:47:50

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

I'd suggest making your own public repo somewhere, and try to follow a few recent packages... that way everyone can test it out.

Offline

#60 2007-03-28 19:46:07

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

phrakture wrote:

I'd suggest making your own public repo somewhere, and try to follow a few recent packages... that way everyone can test it out.

There is a bit of a catch-22 involved here.

1. I run on a 1Gb data cap.
2. Pulling source for makepkg, uploading deltas and packages will chew that up.
3. The only public repos available to me today are my own web space (all 5 meg of it) & my own workstation (with my 1 Gb data cap & feeble upload rates).

The ideal situation for me would be something like a ssh to a system which is not subject to my bandwidth limits.

Can anyone suggest a solution? Seems like there are a lot of people who aren't bandwidth limited - any sponsors out there? :)

Offline

#61 2007-03-28 19:50:29

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 794
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Why don't you just use google pages?

Offline

#62 2007-03-28 20:14:05

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

dale77 wrote:

The ideal situation for me would be something like a ssh to a system which is not subject to my bandwidth limits.

Can anyone suggest a solution? Seems like there are a lot of people who aren't bandwidth limited - any sponsors out there? :)

I can look into giving you some access on my dreamhost account.  I just want to be careful so you can't delete some important things 8)

I'll get back to you

Offline

#63 2007-03-28 23:27:57

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

phrakture wrote:

I can look into giving you some access on my dreamhost account.  I just want to be careful so you can't delete some important things 8)

I'll get back to you

Thanks. This is how open source *should* work.

I'll have a look at google pages as well. Although that seems to be a "web site" creation framework. I wonder if a linux distro repository fits in that definition? ;)

Offline

#64 2007-03-29 20:25:16

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 794
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

You get to set up a webpage, yes, but you can also upload any files you need.  There are a couple archlinux repos that are run off of google pages.

Offline

#65 2007-03-29 20:43:14

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

I set him up with space on my dreamhost account.

Offline

#66 2007-03-30 01:28:43

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Thanks both, I'll see what I can come up with.

Offline

#67 2007-04-02 11:14:57

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Alrighty all. Thanks to Phrakture for that account, I've put it to (hopefully) good use.

See http://dale.phraktured.net/

The only things in the delta repo at the moment are Battle for Wesnoth (full package 70Mb, delta 2Mb), and jre 6-3 (full package 38Mb, delta 3Mb).

I hope to create deltas for each package that pops up from pacman -Syu on my KDE workstation. For me this means I have to "makepkg-xdelta" the updated package myself (with the associated source download), then upload the resulting delta to the repo.

Dale

P.S. All that is required to create a delta is to install the pacman-xdelta package from the delta repo, then "makepkg-xdelta" with the previous package version in the PKGBUILD directory.
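
For anyone curious what the delta step boils down to, it is roughly the following, assuming xdelta 1.x and its `delta` subcommand. This is a sketch rather than the actual makepkg-xdelta code, and `make_delta` is a name invented for the example. My understanding of the xdelta 1 man page is that `xdelta delta` exits with status 1 when the inputs differ (like diff), so only status 2 and above is treated as a failure here; please check that against your version.

```bash
# make_delta OLDPKG NEWPKG OUTFILE: write a delta that turns OLDPKG into NEWPKG.
make_delta() {
  xdelta delta "$1" "$2" "$3"
  status=$?
  # Assumption: 0 = inputs identical, 1 = differences found, 2 = error.
  [ "$status" -le 1 ]
}

# Example (the old version number is made up):
# make_delta wesnoth-1.2.2-1.pkg.tar.gz wesnoth-1.2.3-1.pkg.tar.gz \
#            wesnoth-1.2.2-1_to_1.2.3-1.delta
```

The output name follows the pkgname-oldversion_to_newversion.delta pattern that the XferCommand script asks the server for.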

Last edited by dale77 (2007-04-02 11:49:27)

Offline

#68 2007-04-03 09:53:07

pix
Member
Registered: 2005-07-07
Posts: 25

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Congratulations dale77, I've tested your xdeltas and they work. My bandwidth greatly appreciates it ;-)


Any resemblance to grammar or spelling mistakes is purely coincidental and independant of the willing of the author

Offline

#69 2007-04-03 12:05:35

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

I just thought I'd add a cheer from the sidelines.. good work dale77! I was afraid of using the xfercommand method myself, but if pacman devs are happy enough with it being used this way then why not? It can always be changed later if it gets messy.
By the way, zsync looks very interesting as a potential alternative to xdelta - it might save disc space on the servers compared to the xdelta approach, and still promises significant bandwidth savings. It is able to create specially crafted gzip files (still in standard gzip format) from which selected chunks can be downloaded according to what files in the archive have changed. Instead of a delta file you get a 'map' describing which chunks have changed between versions. Another nice thing is that makepkg could always use zsync to create its tar.gz files, whether there are old versions of the package present or not, which reminds me: did you look into the possibility of using makepkg's compression command during patching as opposed to using xdelta's compression during package creation?

Last edited by Thikasabrik (2007-04-03 12:06:27)

Offline

#70 2007-04-03 12:56:31

cr7
Member
Registered: 2006-11-28
Posts: 103

Re: Binary Diffs for Pacman, a detailed proposal + evidence

I've tested it, works fine! Thanks!

However, I agree with Thikasabrik: zsync is better.

Offline

#71 2007-04-04 02:46:55

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Thikasabrik wrote:

I just thought I'd add a cheer from the sidelines.. good work dale77! I was afraid of using the xfercommand method myself, but if pacman devs are happy enough with it being used this way then why not? It can always be changed later if it gets messy.
By the way, zsync looks very interesting as a potential alternative to xdelta - it might save disc space on the servers compared to the xdelta approach, and still promises significant bandwidth savings. It is able to create specially crafted gzip files (still in standard gzip format) from which selected chunks can be downloaded according to what files in the archive have changed. Instead of a delta file you get a 'map' describing which chunks have changed between versions. Another nice thing is that makepkg could always use zsync to create its tar.gz files, whether there are old versions of the package present or not, which reminds me: did you look into the possibility of using makepkg's compression command during patching as opposed to using xdelta's compression during package creation?

Using XferCommand enables the concept to be tested without large changes to pacman. Now is not really the time for those changes anyway, with pacman3 just around the corner. I personally won't look at C-code until 3.0 hits current. Also using XferCommand makes this an opt-in feature, which is nice. The only con is a slightly clunkier download solution when compared to native pacman.

zsync does sound quite nice. The key benefit being that it is previous package version agnostic. I actually started down the track of using this but it failed to work with my "tomcat" web server. zsync makes quite specific demands of the repository server (http protocol, support for byte range requests etc).
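
A quick way to see whether a given mirror could serve zsync at all is to look for byte-range support in its response headers. The sketch below is only a first check: a server can honour Range requests without advertising them, and `accepts_ranges` is a made-up helper name. It reads headers on stdin so it can be fed from curl, wget or a saved file.

```bash
# accepts_ranges: read HTTP response headers on stdin and succeed if the
# server advertises byte-range support.
accepts_ranges() {
  tr -d '\r' | grep -qi '^accept-ranges:[[:space:]]*bytes'
}

# Example (the URL is a placeholder):
# curl -sI http://example.org/repo/foo-1.0-1.pkg.tar.gz | accepts_ranges \
#   && echo "range requests advertised"
```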

To summarize:

xdelta
pro: market share, maturity, small delta size compared to full package
con: needs one delta per previous version

bdiff
pro: smaller delta size than xdelta
con: needs one delta per previous version

zsync
pro: only one "delta" for all previous versions
con: HTTP based, didn't work with the first web server I tried, more complicated in my view

Having only the latest delta available for each package is enough in my view. With the possible addition of a delta from the last released version to current to help people who download an Arch iso mid way through a dev cycle. If we accept this, then zsync's ability to support all previous versions becomes less important. I see this feature as enabling people to stay "current" (weekly or daily synch) at a reduced bandwidth cost to the community, not to support monthly or quarterly synching.

I did investigate using makepkg compression. In the end I decided to make xdelta's compression algorithm the common denominator between client & server, to minimize md5sum issues. Perhaps I should look at it again though: currently makepkg-xdelta compresses the package twice, once by makepkg (tar cz) and once by xdelta, which is a bit of a time-waster.

Of course all this work will be academic :( if the arch community doesn't take on creating these deltas (in whatever form) along with a standard makepkg run. For this system to work the deltas have to be on the mirrored repo I would think.

Dale

Offline

#72 2007-04-05 07:04:37

cr7
Member
Registered: 2006-11-28
Posts: 103

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Of course all this work will be academic if the arch community doesn't take on creating these deltas (in whatever form) along with a standard makepkg run. For this system to work the deltas have to be on the mirrored repo I would think.

With zsync, it will be quite simple to manage IMHO.

Offline

#73 2007-04-05 10:56:50

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

cr7 wrote:

Of course all this work will be academic if the arch community doesn't take on creating these deltas (in whatever form) along with a standard makepkg run. For this system to work the deltas have to be on the mirrored repo I would think.

With zsync, it will be quite simple to manage IMHO.

Sure cr7, I guess zsync would have the additional benefit that there is no need to have any previous version around for the makepkg step. But the $64,000 question: will it actually work with all the repos? It didn't when I tried it with *my* web server first up... With xdelta, having pkg_current-1 in the build directory is not a big inconvenience at build time, and (using wesnoth as an example) saving 68Mb per user in bandwidth sounds like a win for both individual user & mirror.

Offline

#74 2007-04-05 11:25:12

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

I added libx11 to the delta repo, in this case the package was 2Mb, the delta 160Kb.

Offline

#75 2007-04-07 08:59:30

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Here's the current delta list and some stats.

 Package     New Version Full Size   Delta Size  % Of Full  
 freetype2   2.3.3-1         718,792     148,671        21 %
 jre6        6-3          37,364,119   3,000,289         8 %
 libx11      1.1.1-4       2,012,590     163,079         8 %
 wesnoth     1.2.3-1      71,365,629   2,364,189         3 %
 qt          3.3.8-3      10,417,784   1,599,413        15 %
 kdelibs     3.5.6-7      20,752,239     617,196         3 %
 kdepim      3.5.6-4      22,407,785     226,905         1 %
 kernel26    2.6.20-5     20,524,202   2,035,759        10 %

All in all, for this package set the difference is 185Mb for the full packages versus 10Mb for the deltas.
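
The totals can be reproduced from the table with a few lines of shell and awk. `sum_sizes` is just a throwaway helper name; it expects rows of "package version full_bytes delta_bytes" with the header and the percentage column left out.

```bash
# sum_sizes: read "package version full_bytes delta_bytes" rows on stdin and
# print the two totals plus the overall delta/full percentage.
sum_sizes() {
  awk '{
    gsub(/,/, "")              # drop the thousands separators
    full += $3; delta += $4
  }
  END { printf "%d %d %.1f%%\n", full, delta, 100 * delta / full }'
}

sum_sizes <<'EOF'
freetype2 2.3.3-1 718,792 148,671
jre6 6-3 37,364,119 3,000,289
libx11 1.1.1-4 2,012,590 163,079
wesnoth 1.2.3-1 71,365,629 2,364,189
qt 3.3.8-3 10,417,784 1,599,413
kdelibs 3.5.6-7 20,752,239 617,196
kdepim 3.5.6-4 22,407,785 226,905
kernel26 2.6.20-5 20,524,202 2,035,759
EOF
# prints: 185563140 10155501 5.5%
```

So the deltas for this set come to roughly 5.5% of the size of the full packages.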

kernel26 will no doubt be updated again shortly, as there is already a -6 on kernel.org.

Offline
