
#101 2007-04-24 21:38:48

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

phrakture wrote:

Yeah, and if/when you complete this for pacman3's makepkg, I'd suggest using the build environment variables in makepkg.conf - i.e. something like adding 'xdelta' to the array will generate the deltas as well as the packages (then we could merge that into makepkg with little difficulty).

So you aren't happy with what it does currently? Today it just automatically generates a delta from the latest version found in PKGDEST (or build dir) and PKGCACHE, if xdelta is on the maintainer's system.

phrakture wrote:

Just a question, because I haven't looked too thoroughly: how do you handle the case where I have, say, foo-1.2 and the new version is foo-1.5? Do you try to grab 1.2_to_1.3, 1.3_to_1.4, and 1.4_to_1.5? Or do you just download the full file?

I'm thinking of a case where someone hadn't updated in a few months, and some large package had multiple updates in that time period.

My solution only creates a delta from previous to current. I think this is the best balance of simplicity, mirror storage, and bandwidth savings. If someone doesn't update for months, they should just get the full current version. This system is designed for those who keep Arch up to date, say once per week. There is of course nothing stopping a maintainer from generating xdeltas manually for important package versions, such as the one on the latest release ISO.

phrakture wrote:

Another note: it might be worthwhile to send a HEAD request first, to check if the delta exists... I know curl can do that, but am unsure about wget.

I'll have a look at that. Currently wget just errors out on the missing delta and the script moves on to pull the full package instead.
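
For what it's worth, the existence check could be sketched roughly like this. It is only a sketch: delta_exists is a made-up helper name, and it assumes the mirror answers HEAD-style probes. curl's --head and wget's --spider both probe a URL without fetching the body.

```shell
# Hypothetical helper: return 0 if URL appears to exist, non-zero otherwise.
# Uses curl when available and falls back to wget.
delta_exists() {
    local url="$1"
    if command -v curl >/dev/null 2>&1; then
        # --head sends a HEAD request; --fail turns HTTP errors into a non-zero exit
        curl --silent --head --fail --output /dev/null "$url"
    else
        # --spider checks that the resource exists without downloading it
        wget --quiet --spider "$url"
    fi
}
```

The downloader could call delta_exists "$url/$delta_name" before trying the delta, and go straight to the full package when it returns non-zero.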

Last edited by dale77 (2007-04-24 21:40:10)

Offline

#102 2007-04-24 22:01:27

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

billy wrote:

a good idea would be to eventually set some rules on how to manage deltas on servers.
i think that deltas from previous versions should stay on the server until:
  - their sum exceeds some percentage of the size of the original package (if this number is 75% then data on the server would grow to at most 175% of its present size), in which case the oldest is erased if it's not the only one, or
  - the delta is older than four/five/six months.
also, deltas should be stored in a separate directory (like current, extra, community), so that:
  - a mirror that has enough capacity can mirror them and one that doesn't can skip them, and
  - if deltas ever become official, users could easily enable/disable the delta folder in pacman.conf.

what do you think?

i just have one question. what if there were 5 or 6 deltas in the repo to update from an older to the current version that still comply with the above "rules"? would patching a package six times take a lot of time? should the number of deltas for the same package also be limited?
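
To make the size rule concrete, here is a rough sketch of the pruning arithmetic. prune_count is a hypothetical helper: sizes are in bytes, deltas are passed oldest first, and it prints how many of the oldest deltas would be dropped to get back under the limit (never dropping the last one).

```shell
# prune_count PKG_SIZE MAX_PERCENT DELTA_SIZE...   (hypothetical helper)
# Prints the number of oldest deltas to erase so that the sum of the
# remaining delta sizes is at most MAX_PERCENT of PKG_SIZE.
prune_count() {
    local pkg_size="$1"
    local max_percent="$2"
    shift 2
    local total=0
    local dropped=0
    local size
    # Sum the sizes of all deltas currently on the server
    for size in "$@"; do
        total=$((total + size))
    done
    # Erase the oldest delta while over budget, but always keep at least one
    while [ "$#" -gt 1 ] && [ $((total * 100)) -gt $((pkg_size * max_percent)) ]; do
        total=$((total - $1))
        shift
        dropped=$((dropped + 1))
    done
    echo "$dropped"
}
```

For example, with a 1000-byte package, a 75% limit and deltas of 400, 300 and 200 bytes, the sum (900) is over the 750-byte budget, so the oldest delta goes and the remaining 500 bytes fit.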

Billy, I think your post accurately captures the complexity and cost of maintaining a harem of deltas alongside the full package. So my suggestion is that we use two simple rules for managing the deltas on the servers:

1. We only store the delta from previous to current on the server.
2. As an exception to rule 1, the maintainer might provide a delta from the current release ISO version to current.

If you want to benefit from the delta you should pacman -Syu at least once per week. Otherwise just keep downloading the full versions every month...

I personally think the delta should just sit right alongside the full package, rather than shunting it off somewhere else. It isn't a new repo, it is "metadata" associated with a package. However, I can see advantages of putting deltas in say a subdirectory of the pkg directory. Which I guess means I don't have too strong an opinion as long as where they live is consistent.

Last edited by dale77 (2007-04-24 22:16:33)

Offline

#103 2007-04-25 00:16:49

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

dale77 wrote:
phrakture wrote:

Yeah, and if/when you complete this for pacman3's makepkg, I'd suggest using the build environment variables in makepkg.conf - i.e. something like adding 'xdelta' to the array will generate the deltas as well as the packages (then we could merge that into makepkg with little difficulty).

So you aren't happy with what it does currently? Today it just automatically generates a delta from the latest version found in PKGDEST (or build dir) and PKGCACHE, if xdelta is on the maintainer's system.

Well, makepkg3 has some decent changes compared to 2.9.8. To me, it seems more feasible to add xdelta support in the same way that makepkg supports ccache or distcc: it doesn't do it automatically, but needs a setting in makepkg.conf. It's really not all that difficult; it just requires wrapping the whole thing in an option check.

Offline

#104 2007-04-25 00:38:38

toofishes
Developer
From: Chicago, IL
Registered: 2006-06-06
Posts: 602
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

phrakture wrote:
dale77 wrote:
phrakture wrote:

Yeah, and if/when you complete this for pacman3's makepkg, I'd suggest using the build environment variables in makepkg.conf - i.e. something like adding 'xdelta' to the array will generate the deltas as well as the packages (then we could merge that into makepkg with little difficulty).

So you aren't happy with what it does currently? Today it just automatically generates a delta from the latest version found in PKGDEST (or build dir) and PKGCACHE, if xdelta is on the maintainer's system.

Well, makepkg3 has some decent changes compared to 2.9.8. To me, it seems more feasible to add xdelta support in the same way that makepkg supports ccache or distcc: it doesn't do it automatically, but needs a setting in makepkg.conf. It's really not all that difficult; it just requires wrapping the whole thing in an option check.

#########################################################################
# BUILD ENVIRONMENT
#########################################################################
#
# Defaults: BUILDENV=(!fakeroot !distcc color !ccache !xdelta)
#
#-- fakeroot: Allow building packages as a non-root user
#-- distcc:   Use the Distributed C/C++/ObjC compiler
#-- color:    Colorize output messages
#-- ccache:   Use ccache to cache compilation
#-- xdelta:   Generate xdelta from previous package version (if available)
#
BUILDENV=(fakeroot !distcc color ccache !xdelta)
#
#-- If using DistCC, your MAKEFLAGS will also need modification. In addition,
#-- specify a space-delimited list of hosts running in the DistCC cluster.
#DISTCC_HOSTS=""

There's your framework. I would recommend looking at the way these other options are used in makepkg and implementing xdelta the same way. This allows for two things:
1. Allows a user to choose whether or not xdelta is used (not dependent on whether it is installed), because you don't ever want to waste time making xdeltas for a local-only repo.
2. Allows disabling xdelta creation in a specific PKGBUILD if it isn't worth it (e.g. for the "filesystem" package, there is no _significant_ space savings from making an xdelta).
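
A minimal sketch of the kind of option check this implies. buildenv_enabled is a hypothetical helper written for illustration, not the function makepkg itself uses; it takes the option name followed by the BUILDENV entries.

```shell
# buildenv_enabled OPTION ENTRY...   (hypothetical helper)
# Returns 0 if OPTION is listed, 1 if it is negated ("!OPTION") or absent.
buildenv_enabled() {
    local want="$1"
    shift
    local entry
    for entry in "$@"; do
        case "$entry" in
            "$want")  return 0 ;;   # explicitly enabled
            "!$want") return 1 ;;   # explicitly disabled
        esac
    done
    # Not mentioned at all: treat as disabled
    return 1
}
```

In makepkg this might be called as buildenv_enabled xdelta "${BUILDENV[@]}" before the delta step, so the whole thing is skipped unless the packager has opted in.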

Offline

#105 2007-04-25 16:52:37

lietuva
Banned
Registered: 2005-09-30
Posts: 36

Re: Binary Diffs for Pacman, a detailed proposal + evidence

http://dale.phraktured.net/ wrote:

NOTE: To utilize binary xdelta diffs for pacman 2.9.8 you must:

What about pacman 3?


The password to this account is lietuvis

Offline

#106 2007-04-25 17:59:12

stonecrest
Member
From: Boulder
Registered: 2005-01-22
Posts: 1,190

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Read the post right above yours?


I am a gated community.

Offline

#107 2007-04-27 11:50:43

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

toofishes wrote:

There's your framework. I would recommend looking at the way these other options are used in makepkg and implementing xdelta the same way. This allows for two things:
1. Allows a user to choose whether or not xdelta is used (not dependent on whether it is installed), because you don't ever want to waste time making xdeltas for a local-only repo.
2. Allows disabling xdelta creation in a specific PKGBUILD if it isn't worth it (e.g. for the "filesystem" package, there is no _significant_ space savings from making an xdelta).

OK, as recommended I have modified my code for pacman 3.0.2. See http://dale.phraktured.net/, as before openoffice-base 2.2-3 to 2.2-4 is in the new delta3 repo.

Offline

#108 2007-05-10 16:03:35

cr7
Member
Registered: 2006-11-28
Posts: 103

Re: Binary Diffs for Pacman, a detailed proposal + evidence

What about the state of the project?

Offline

#109 2007-05-10 17:13:45

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

It's being worked on.  I've been sidetracked from pacman development for the time being, other things seem more pertinent.  This _will_ be merged as a makepkg feature, and the XferCommand script will be provided as well, for users wanting to use this.  I really do want to thank Dale for taking the time to provide this feature.  It has quite a bit of potential in the long run.

Offline

#110 2007-05-11 08:56:13

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

No worries, I hope this proves to be useful to the community. :)

Offline

#111 2007-05-23 00:41:41

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Oooo. I see KDE 3.5.7 has just been released.

If there were some interest I could post the deltas for this nasty big upgrade. Unfortunately I personally only have 40 MB left on my data cap 'til the 27th, so it would probably have to wait 'til then.

Anyone interested in using those deltas?

Offline

#112 2007-05-23 07:58:34

billy
Member
From: Slovenia
Registered: 2006-09-13
Posts: 164

Re: Binary Diffs for Pacman, a detailed proposal + evidence

i would be if i was using kde :)

Last edited by billy (2007-05-23 07:59:27)

Offline

#113 2007-05-29 09:45:48

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Hello, I got bored revising for my last exam, and so I did a little investigation into xdelta3 and also the md5 summing issue. Here is what I found:

1. The md5 summing issue is simply a fact-of-gzip. The reason for the issue is that gzip tries to store the timestamps of the original file during compression. If they are not available (e.g. if it is processing a piped stream) then it uses the current time. Thus we get the following result...

gzip -dc gcc-4.2.0-2.pkg.tar.gz.orig | gzip -c | md5sum
14781f26d5f2f1e7e70c58be342e0ec9  -
gzip -dc gcc-4.2.0-2.pkg.tar.gz.orig | gzip -c | md5sum
513e0303434401b3dd61a571e613a919  -
gzip -dc gcc-4.2.0-2.pkg.tar.gz.orig | gzip -c | md5sum
21d422dc8520a891ce5fb8fde92bcb62  -

However, it is easy to prevent this with the -n option, which causes gzip to skip storing this info. Now...

gzip -dc gcc-4.2.0-2.pkg.tar.gz.orig | gzip -nc | md5sum
8472a8d8c23ca872f835b02e1d22a1f0  -
gzip -dc gcc-4.2.0-2.pkg.tar.gz.orig | gzip -nc | md5sum
8472a8d8c23ca872f835b02e1d22a1f0  -

etc.
This simple fact, which I expect everyone knows except me :), solves the issue. All we need to do is change makepkg to use this option by default, and ensure that the option is also used when recompressing after patching the tar files.
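
A small sketch of that recompression step, assuming GNU gzip. repack_gzip is a hypothetical helper; it writes to a separate output file because redirecting onto the input would truncate it before gzip gets to read it.

```shell
# repack_gzip IN_GZ OUT_GZ   (hypothetical helper)
# Recompress IN_GZ with -n so no name or timestamp is stored in the header.
# Repeated runs with the same gzip and compression level should then give
# identical output. OUT_GZ must be a different file from IN_GZ.
repack_gzip() {
    local in_gz="$1"
    local out_gz="$2"
    gzip -dc "$in_gz" | gzip -nc > "$out_gz"
}
```

Running it twice on the same package and comparing the results with cmp or md5sum is an easy way to confirm the checksums now match.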

For xdelta 1 this simply involves taking control of the temporary files it uses and compressing/decompressing them ourselves. For xdelta3 the same can be achieved with pipes. E.g. for xdelta1...

gzip -dc /var/cache/pacman/pkg/origpkg.tar.gz > /tmp/origpkg.tar
xdelta patch patchfile /tmp/origpkg.tar /tmp/newpkg.tar
rm /tmp/origpkg.tar
gzip -nc /tmp/newpkg.tar > /var/cache/pacman/pkg/newpkg.tar.gz
rm /tmp/newpkg.tar

(this is pretty much what xdelta1 does itself, except for the -n option on gzip)
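
The same steps wrapped in a function with cleanup on failure, as a sketch only. apply_delta is a hypothetical name, and it assumes xdelta 1's patch command exits non-zero when patching fails.

```shell
# apply_delta OLD_PKG_GZ DELTA NEW_PKG_GZ   (hypothetical helper)
# Unpacks the old package into a private temp dir, patches the tar with
# xdelta 1, and recompresses the result with "gzip -n".
apply_delta() {
    local old_gz="$1"
    local delta="$2"
    local new_gz="$3"
    local tmp
    local rc=0
    tmp=$(mktemp -d) || return 1
    gzip -dc "$old_gz" > "$tmp/old.tar" &&
        xdelta patch "$delta" "$tmp/old.tar" "$tmp/new.tar" &&
        gzip -nc "$tmp/new.tar" > "$new_gz" || rc=1
    # Remove both temporary tars whether or not patching succeeded
    rm -rf "$tmp"
    return "$rc"
}
```

Using a fresh mktemp directory instead of fixed names under /tmp avoids two runs stepping on each other's tar files.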

2. Xdelta3 is quite nice. The source package is rather ugly (no configure script etc), but the end result is good (I will post a PKGBUILD soon). Its patch format enables it to operate in-stream, so that we can do things like:

./xdelta3 decode -cs gcc-4.1.2-4.pkg.tar < gcc-4.1.2-4_to_4.2.0-2.vcdiff | gzip -c > gcc-4.2.0-2.pkg.tar.gz.frompatch

Since three files are involved, the source file cannot be piped in and must be specified with the -s option, but the other files can. This means patching requires disk/tmp space equivalent to the size of the decompressed original package, but that's all - Xdelta 1 required twice this. Obviously it's worth checking the delta sizes xdelta3 produces, since it uses a totally new format, but it looks promising. I will do some more tests when I get bored again :).
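
For the creation side, a sketch along the same lines might look like this. make_delta3 is a hypothetical helper; it assumes xdelta3's -e (encode), -c (write to stdout) and -s (source file) options, with the new tar arriving on stdin, so only the old package is unpacked to disk.

```shell
# make_delta3 OLD_PKG_GZ NEW_PKG_GZ DELTA_OUT   (hypothetical helper)
# Only the old package is written out uncompressed; the new one is
# streamed through a pipe into xdelta3.
make_delta3() {
    local old_gz="$1"
    local new_gz="$2"
    local delta="$3"
    local src
    local rc=0
    src=$(mktemp) || return 1
    gzip -dc "$old_gz" > "$src" &&
        gzip -dc "$new_gz" | xdelta3 -e -c -s "$src" > "$delta" || rc=1
    # The uncompressed source tar is no longer needed
    rm -f "$src"
    return "$rc"
}
```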

Offline

#114 2007-05-29 11:11:12

space-m0nkey
Member
From: UK
Registered: 2007-03-26
Posts: 16

Re: Binary Diffs for Pacman, a detailed proposal + evidence

To make it easier for the devs to integrate this into pacman (there's been a lot of changes in makepkg since 3.0.4) I've created a git branch for the xdelta changes.

http://repo.or.cz/w/pacman.git?a=shortlog;h=xdelta


"Instead, people would take pains to tell her that beauty was only skin-deep, as if a man ever fell for an attractive pair of kidneys."
(Terry Pratchett, Maskerade)

Offline

#115 2007-05-29 19:08:24

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

space-m0nkey wrote:

To make it easier for the devs to integrate this into pacman (there's been a lot of changes in makepkg since 3.0.4) I've created a git branch for the xdelta changes.

http://repo.or.cz/w/pacman.git?a=shortlog;h=xdelta

How handy. I created patches against your tree with changes to eliminate the md5 summing issue (for xdelta 1):

1. This one makes makepkg ignore timestamps in creating its gzip.. it is logically independent of the xdelta mechanism, natch:

--- ./scripts/makepkg.in    2007-05-29 19:00:42.000000000 +0100
+++ ./scripts/makepkg.in.xdelta2    2007-05-29 19:05:50.000000000 +0100
@@ -576,7 +576,7 @@
     local pkg_file="$PKGDEST/${pkgname}-${pkgver}-${pkgrel}-${CARCH}.${PKGEXT}"
     comp_files="$comp_files .PKGINFO .FILELIST"
 
-    if ! tar -czf "$pkg_file" $comp_files *; then
+    if ! tar -c $comp_files * | gzip -nc > "$pkg_file"; then
         error "$(gettext "Failed to create package file.")"
         exit 1 # TODO: error code
     fi

2. This one manually handles the tar shuffling for xdelta, so that xdelta doesn't know we originally had tar.gzs. It also removes the now unnecessary recreation of the final pkg from the delta file.

--- ./scripts/makepkg.in    2007-05-29 19:09:32.000000000 +0100
+++ ./scripts/makepkg.in.xdelta2    2007-05-29 19:35:26.000000000 +0100
@@ -612,17 +612,23 @@
     if [ "$base_file" != "" ]; then
         msg "Making delta from version $latest_version"
         local delta_file="$PKGDEST/$pkgname-${old_version}_to_$pkgver-$pkgrel-$CARCH.delta"
+        local base_tar=$(basename "${base_file}" .gz)
+        local pkg_tar=$(basename "${pkg_file}" .gz)
 
-        # xdelta will decompress base_file & pkg_file into TMP_DIR (or /tmp if TMP_DIR is unset)
+        # manually decompress base_file & pkg_file into TMP_DIR (or /tmp if TMP_DIR is unset)
+        # (we want xdelta to produce a delta between the tar files and not the tar.gz files)
+        if [ "$TMP_DIR" = "" ]; then
+            local TMP_DIR="/tmp"
+        fi
+        gzip -dc "$base_file" > "$TMP_DIR/$base_tar"
+        gzip -dc "$pkg_file" > "$TMP_DIR/$pkg_tar"
+        
         # then perform the delta on the resulting tars
-        xdelta delta "$base_file" "$pkg_file" "$delta_file"
-
-        # Generate the final gz using xdelta for compression. xdelta will be our common
-        # denominator compression utility between the packager and the users
-        #
-        # makepkg and pacman must use the same compression algorithm or the delta generated
-        # package may not match, producing md5 checksum errors.
-        xdelta patch "$delta_file" "$base_file" "$pkg_file"
+        xdelta delta -p "$TMP_DIR/$base_tar" "$TMP_DIR/$pkg_tar" "$delta_file"
+        
+        # delete the tars now they are no longer needed
+        rm "$TMP_DIR/$base_tar"
+        rm "$TMP_DIR/$pkg_tar"
     else
         msg "No previous version found, skipping xdelta"
     fi

That patch goes together with changes to the xdelta downloader.
edit: I noticed that the manual tar handling might not be required... it seems xdelta might effectively use the -n option itself. However, the final step of creating the tar.gz from the delta is still unnecessary,  and making the tar handling explicit makes error handling more flexible I guess..

3. The modified downloader is here:

#!/bin/bash
o=$(basename "$1")
o_tar=$(basename "${o}" .gz)
u=$2
CARCH="i686" # Hmmm where to get this from? /etc/makepkg.conf?
cached_file=""
# "local" is only valid inside a function, so assign TMP_DIR directly
if [ "$TMP_DIR" = "" ]; then
    TMP_DIR="/tmp"
fi
# Only check for pkg.tar.gz files in the cache, we download db.tar.gz as well
if [[ "$o" =~ "pkg.tar.gz" ]] # if $o contains pkg.tar.gz
then
  pkgname=${o%-*-[0-9]-${CARCH}.pkg.tar.gz.part}   # Parse out the package name
  newend=${o##$pkgname-}                  # Parse out everything following pkgname
  new_version=${newend%-${CARCH}.pkg.tar.gz.part}  # Strip off .pkg.tar.gz.part leaving version
  url=${u%/*}
  for cached_file in $(ls -r /var/cache/pacman/pkg/${pkgname}-*-${CARCH}.pkg.tar.gz 2>/dev/null); do
    # just take the first one, by name. I suppose we could take the latest by date...
    oldend=${cached_file##*/$pkgname-}
    old_version=${oldend%-${CARCH}.pkg.tar.gz}
    if [ "$old_version" = "$new_version" ]; then
      # We already have the new version in the cache! Just continue the download.
      cached_file=""
    fi
    break
  done
fi
if [ "$cached_file" != "" ]; then
  cached_tar=$(basename "${cached_file}" .gz)
  # Great, we have a cached file, now calculate a patch name from it
  delta_name=$pkgname-${old_version}_to_${new_version}-${CARCH}.delta
  # try to download the delta
  if wget --passive-ftp -c "$url/$delta_name"; then
    # extract tar to patch
    gzip -dc "$cached_file" > "$TMP_DIR/$cached_tar"
    # Now apply the delta to the cached file to produce the new file
    echo "Applying delta..."
    if xdelta patch "$delta_name" "$TMP_DIR/$cached_tar" "$TMP_DIR/$o_tar"; then
      # Remove the delta now that we are finished with it
      rm "$delta_name"
      # gzip the resulting tar and remove the originals
      gzip -nc "$TMP_DIR/$o_tar" > "$o"
      rm "$TMP_DIR/$o_tar"
      rm "$TMP_DIR/$cached_tar"
    else
      # Hmmm. xdelta failed for some reason
      rm "$delta_name"
      # remove the extracted tar and any partial output
      rm -f "$TMP_DIR/$o_tar" "$TMP_DIR/$cached_tar"
      # just download the file
      wget --passive-ftp -c -O "$o" "$u"
    fi
  else
    # just download the file
    wget --passive-ftp -c -O "$o" "$u"
  fi
else
  # just download the file
  wget --passive-ftp -c -O "$o" "$u"
fi

It ain't terribly pretty, but it should work (this script is in need of prettifying ;) ).
Note that these are not tested, so it's probably best to give 'em a sanity check first... but what could possibly go wrong, eh? :D
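
On the CARCH question in the script's comment, one possibility is to read it from makepkg.conf. This is a sketch: detect_carch is a hypothetical helper, the default path is an assumption, and it only understands a plain CARCH=value or CARCH="value" line. It picks the value out with sed rather than sourcing the file, since makepkg.conf contains bash arrays.

```shell
# detect_carch [CONF]   (hypothetical helper)
# Prints the CARCH value found in CONF (default: /etc/makepkg.conf),
# or the output of "uname -m" if none is found.
detect_carch() {
    local conf="${1:-/etc/makepkg.conf}"
    local arch=""
    if [ -r "$conf" ]; then
        # Take the last CARCH assignment, with or without double quotes
        arch=$(sed -n 's/^CARCH="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$conf" | tail -n 1)
    fi
    # Fall back to the machine type reported by the kernel
    [ -n "$arch" ] || arch=$(uname -m)
    printf '%s\n' "$arch"
}
```

The downloader could then start with CARCH=$(detect_carch) instead of hard-coding i686.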

Last edited by Thikasabrik (2007-05-29 20:29:26)

Offline

#116 2007-05-29 19:53:14

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

space-m0nkey wrote:

To make it easier for the devs to integrate this into pacman (there's been a lot of changes in makepkg since 3.0.4) I've created a git branch for the xdelta changes.

http://repo.or.cz/w/pacman.git?a=shortlog;h=xdelta

Just saw that.  Thanks, I'll merge this in tonight when I get a chance - my first priority is package cleanup.

Offline

#117 2007-05-29 22:49:26

space-m0nkey
Member
From: UK
Registered: 2007-03-26
Posts: 16

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Thikasabrik wrote:

How handy. I created patches against your tree with changes to eliminate the md5 summing issue (for xdelta 1):

1. This one makes makepkg ignore timestamps in creating its gzip.. it is logically independent of the xdelta mechanism, natch:

--- ./scripts/makepkg.in    2007-05-29 19:00:42.000000000 +0100
+++ ./scripts/makepkg.in.xdelta2    2007-05-29 19:05:50.000000000 +0100
@@ -576,7 +576,7 @@
     local pkg_file="$PKGDEST/${pkgname}-${pkgver}-${pkgrel}-${CARCH}.${PKGEXT}"
     comp_files="$comp_files .PKGINFO .FILELIST"
 
-    if ! tar -czf "$pkg_file" $comp_files *; then
+    if ! tar -c $comp_files * | gzip -nc > "$pkg_file"; then
         error "$(gettext "Failed to create package file.")"
         exit 1 # TODO: error code
     fi

2. This one manually handles the tar shuffling for xdelta, so that xdelta doesn't know we originally had tar.gzs. It also removes the now unnecessary recreation of the final pkg from the delta file.

--- ./scripts/makepkg.in    2007-05-29 19:09:32.000000000 +0100
+++ ./scripts/makepkg.in.xdelta2    2007-05-29 19:35:26.000000000 +0100
@@ -612,17 +612,23 @@
     if [ "$base_file" != "" ]; then
         msg "Making delta from version $latest_version"
         local delta_file="$PKGDEST/$pkgname-${old_version}_to_$pkgver-$pkgrel-$CARCH.delta"
+        local base_tar=$(basename "${base_file}" .gz)
+        local pkg_tar=$(basename "${pkg_file}" .gz)
 
-        # xdelta will decompress base_file & pkg_file into TMP_DIR (or /tmp if TMP_DIR is unset)
+        # manually decompress base_file & pkg_file into TMP_DIR (or /tmp if TMP_DIR is unset)
+        # (we want xdelta to produce a delta between the tar files and not the tar.gz files)
+        if [ "$TMP_DIR" = "" ]; then
+            local TMP_DIR="/tmp"
+        fi
+        gzip -dc "$base_file" > "$TMP_DIR/$base_tar"
+        gzip -dc "$pkg_file" > "$TMP_DIR/$pkg_tar"
+        
         # then perform the delta on the resulting tars
-        xdelta delta "$base_file" "$pkg_file" "$delta_file"
-
-        # Generate the final gz using xdelta for compression. xdelta will be our common
-        # denominator compression utility between the packager and the users
-        #
-        # makepkg and pacman must use the same compression algorithm or the delta generated
-        # package may not match, producing md5 checksum errors.
-        xdelta patch "$delta_file" "$base_file" "$pkg_file"
+        xdelta delta -p "$TMP_DIR/$base_tar" "$TMP_DIR/$pkg_tar" "$delta_file"
+        
+        # delete the tars now they are no longer needed
+        rm "$TMP_DIR/$base_tar"
+        rm "$TMP_DIR/$pkg_tar"
     else
         msg "No previous version found, skipping xdelta"
     fi

This is going to be VERY slow for large packages (glibc, kernel, gcc...). But at the moment I can't think of a better solution.


"Instead, people would take pains to tell her that beauty was only skin-deep, as if a man ever fell for an attractive pair of kidneys."
(Terry Pratchett, Maskerade)

Offline

#118 2007-05-30 04:42:09

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

space-m0nkey wrote:

To make it easier for the devs to integrate this into pacman (there's been a lot of changes in makepkg since 3.0.4) I've created a git branch for the xdelta changes.

http://repo.or.cz/w/pacman.git?a=shortlog;h=xdelta

Hey Andrew,
I'm not entirely sure on this, but I thought that dale should be indicated somewhere in these patches (I mean, he did do most of the work here), via the --author flag to git-commit.

Seeing as it looks to be only two patches, could you possibly commit them with some indication of where it came from, just so I can give the proper credit when I pull this (I pulled now, but didn't merge to the master branch just yet due to this issue).

Offline

#119 2007-05-30 04:47:06

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Side note, feel free to add the xdelta xfer command to the contrib/ directory too, so we can 'git' that in there.

Offline

#120 2007-05-30 07:46:17

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

space-m0nkey wrote:

This is going to be VERY slow for large packages (glibc, kernel, gcc...). But at the moment I can't think of a better solution.

Well this is exactly what xdelta does itself - I just made it explicit. Xdelta3 can make this better (only the original needs to be gunzipped for delta creation).

But you made me notice that the patching process is now more disk-hungry than before. Xdelta1 left to its own devices (according to one of my original posts) only needs the source file gunzipped to patch. However, having done some investigation, it seems it does not produce a result tar.gz with the same md5sum as the original, even when we use gzip -nc for that, so this extra space is necessary... unless:
1. We let xdelta do its thing and then recompress the result into a separate file (gzip -dc newpkg.tar.gz | gzip -nc > newpkg.tar.gz.clean, then rename it over the original; redirecting straight back onto the input file would truncate it) to get a clean gzip.
2. We use xdelta3, where we can easily strap on our own (de)compression without increasing disk usage.

Anyway, as far as I can tell speed isn't as much of an issue as space - tmpfs on /tmp will make the xdelta1 process quite fast, but if it gets filled up then it fails.

PS: This means one of my above touted advantages of xdelta3 is gone - less space usage on patching (doh, I really should have reread my original post...). It still uses less space on delta creation, and appears faster anyway.. will test.

Last edited by Thikasabrik (2007-05-30 08:14:51)

Offline

#121 2007-05-30 09:20:26

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Thikasabrik wrote:

It ain't terribly pretty, but it should work (this script is in need of prettifying ;) ).

Hey, that's my code and it's pretty to its Dad. :D

One further issue to think about - the CARCH part of the package suffix doesn't seem to be a given these days, as it is currently missing in the repos for 2.9.8 compatibility. I wonder when packages will be in the repos with CARCH? wget-xdelta didn't cope with this (and you noticed that i686 thing too!).

Further possibilities for the gzip/md5sum thing:

1. touch the package tar on client and server with the same constant time before gzip
2. Use your own static i686 build of gzip included in the pacman-xdelta package to compress on both client and server
3. Ignore the extra disk required (shrug, who worries about disk space today - we have 8GB flash keys!)

Keep up the good work. I'll make some time to look at this again soon...

Dale

Offline

#122 2007-05-30 09:29:34

dale77
Member
From: Down under
Registered: 2007-02-10
Posts: 102
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Thikasabrik's explicit decompression of new and old packages... (But why decompress the new package? Why shouldn't it be uncompressed anyway at this point in makepkg? It's only compressed now because that's what we told it to do earlier...)

space-m0nkey wrote:

This is going to be VERY slow for large packages (glibc, kernel, gcc...). But at the moment I can't think of a better solution.

Trueish, but not many people will be sitting at their keyboard eagerly waiting for their glibc compile to finish anyway. I submit that the larger the package, the smaller the "gzip-time" in proportion to the "actual build time" for makepkg. And of course, this is only a one-off time cost for an n-user download benefit.

P.S. I did this for some pretty big packages - didn't notice the extra time...

Last edited by dale77 (2007-05-30 09:31:15)

Offline

#123 2007-05-30 10:56:13

space-m0nkey
Member
From: UK
Registered: 2007-03-26
Posts: 16

Re: Binary Diffs for Pacman, a detailed proposal + evidence

phrakture wrote:
space-m0nkey wrote:

To make it easier for the devs to integrate this into pacman (there's been a lot of changes in makepkg since 3.0.4) I've created a git branch for the xdelta changes.

http://repo.or.cz/w/pacman.git?a=shortlog;h=xdelta

Hey Andrew,
I'm not entirely sure on this, but I thought that dale should be indicated somewhere in these patches (I mean, he did do most of the work here), via the --author flag to git-commit.

Seeing as it looks to be only two patches, could you possibly commit them with some indication of where it came from, just so I can give the proper credit when I pull this (I pulled now, but didn't merge to the master branch just yet due to this issue).

Ooops my bad, I've fixed the patches, added the xfer script, and made Dale the author.


"Instead, people would take pains to tell her that beauty was only skin-deep, as if a man ever fell for an attractive pair of kidneys."
(Terry Pratchett, Maskerade)

Offline

#124 2007-05-30 11:47:56

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

dale77 wrote:

Hey, that's my code and it's pretty to its Dad. :D

I do apologise - such a healthy young script!

dale77 wrote:

(But why decompress the new package? Why shouldn't it be uncompressed anyway at this point in makepkg? It's only compressed now because that's what we told it to do earlier...)

I left it like this mainly so that the diff was easy to read, but we could of course dump a tarball to disk earlier and avoid the decompression step. The only problem there is that it increases disk usage for non-delta-producing makepkg runs. Since gzip decompression requires very little CPU time I'm inclined to leave it like it is, but I don't really care that much.. :P

dale77 wrote:

Further possibilities for the gzip/md5sum thing:

1. touch the package tar on client and server with the same constant time before gzip
2. Use your own static i686 build of gzip included in the pacman-xdelta package to compress on both client and server
3. Ignore the extra disk required (shrug, who worries about disk space today - we have 8GB flash keys!)

1. We could touch the tars I guess.. I am still trying to work out exactly how xdelta is using gzip - it may be using a null timestamp.. will look into it.
2. I don't think we need to go that far... I hope. Can different CFLAGS for gzip cause differences in the gzip files produced?
3. I'm happy enough with that, now that openoffice isn't so fat!

edit: more on 1... it seems xdelta does *not* put a timestamp in its gzip output, but that it is still somehow compressing differently - the compressed size of the gzip made by xdelta =/= the compressed size of the same tar with gzip -n. It claims to use the same default compression level (6) as gzip, so whatever is going on ain't simple.

Last edited by Thikasabrik (2007-05-30 13:59:56)

Offline

#125 2007-05-30 13:04:42

billy
Member
From: Slovenia
Registered: 2006-09-13
Posts: 164

Re: Binary Diffs for Pacman, a detailed proposal + evidence

sorry for interrupting this conversation :/.

if i'm right, now we have a full package in the repo (let's say gtkmm) and an xdelta package made against the previous version (gtkmm-xdelta). what i want to ask is, how much more space would another xdelta package like gtkmm-docs-xdelta or gtkmm-delta-docs take? could this work, having an xdelta for packages to have more options or to be compiled with different ./configure options?
i know it's a bad example with gtkmm (i use srcpac to have docs installed, and i think having another non-xdelta package like pygtk-docs just for docs is a better idea), but maybe this could be useful for having binary packages with different functionality enabled just by applying an xdelta to them.

Last edited by billy (2007-05-30 13:06:59)

Offline
