#1 2006-07-02 18:55:19

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Binary Diffs for Pacman, a detailed proposal + evidence

Evening,
The other week I was thinking about how much server/user bandwidth could be saved, and how many dialup users placated (possibly not so many, but they still exist!), by offering binary diff package updates alongside the full downloads. I know this has been discussed before, but I think previous discussions got lost because implementation details were either not really discussed or involved new server-based scripts/software. I reckon it should be done by adding a little logic, and the use of, say, xdelta, to pacman and makepkg. I came up with two slightly different systems, and the following is the one that seems simpler.

My system puts the decision of whether or not to make deltas with the packager and the decision of whether or not to use them with the user (where they should be), and degrades gracefully to the normal system if, e.g., the packager can't be bothered to keep old versions or spend the CPU time to make the diffs in some cases.

Now, it does depend on the user having a previous version of a package in the cache, but I think this is a rather common situation, and probably something dial-up users (or the otherwise bandwidth constrained) would be happy with.

So, without further delay, here it is, separated into sections depending on what part of the existing package system gets changed (note: 'delta' is used as equivalent to 'bdiff'):

In this version diffs of the full (tar.gz) packages are made using xdelta (see below). Single-space indents in numbered lists represent optional behaviour.

makepkg:
1. makes package as normal
 2. checks for previous tgz versions in the PKGBUILD dir
 3. makes delta(s) using xdelta to available previous versions (these deltas will offer a direct upgrade from each of the available versions)
 
packager:
- tries to keep a few previous (repo) versions in the build dir when making a final package
- uploads these to the repo with the package (all existing diffs for that package are cleared)

repository:
- has binary diff from the two (or so) previous .tgz files as well as the current tgz
- bdiffs have a name that reveals 'from - to' info: e.g. application-1.2-1_to_1.2-3.pkg.tar.gz.delta
- when an upgraded pkg is added to a repo. all existing deltas are removed
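
The naming rule above is simple enough to sketch as a shell function. This is only an illustration: the name `delta_name` and its argument order are assumptions made here, not anything that exists in pacman or makepkg.

```shell
# Hypothetical helper (not part of pacman/makepkg): build the delta file
# name from the rule "<pkg>-<from>_to_<to>.pkg.tar.gz.delta".
delta_name() {
    local pkgname=$1 from=$2 to=$3
    printf '%s-%s_to_%s.pkg.tar.gz.delta\n' "$pkgname" "$from" "$to"
}

delta_name application 1.2-1 1.2-3
# prints: application-1.2-1_to_1.2-3.pkg.tar.gz.delta
```

Because the name is derived purely from the two versions, a client can guess the URL without any extra metadata in the repo db.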

pacman:
1. checks existing version in cache
 2. checks repo for available delta against this version
      -could be as simple as a download attempt based on the naming rule (repo db doesn't need to change, no extra metadata)
 3. download and attempt to use this delta (xdelta automatically verifies result using embedded metadata, inc. md5)
 4. if patch fails eg. due to the result not matching the output md5 in the delta then abort/fall back to regular download
5. upgrade
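
Step 1 of the pacman list (finding which versions are already in the cache) could look roughly like the sketch below. The function name `cached_versions` is hypothetical, and the file-name parsing is deliberately naive.

```shell
# Hypothetical helper: list the versions of a package found in a cache dir
# by stripping "<pkgname>-" and ".pkg.tar.gz" from each matching file name.
# Caveat: the glob "<pkgname>-*" also matches longer package names such as
# "<pkgname>-extras-1.0-1.pkg.tar.gz"; real code would need to parse the
# name and version properly.
cached_versions() {
    local cachedir=$1 pkgname=$2 f base
    for f in "$cachedir/$pkgname"-*.pkg.tar.gz; do
        [ -e "$f" ] || continue          # glob matched nothing
        base=${f##*/}                    # drop the directory part
        base=${base#"$pkgname-"}         # drop "<pkgname>-"
        printf '%s\n' "${base%.pkg.tar.gz}"
    done
}

# Example use (each printed version is a candidate "from" side of a delta):
#   cached_versions /var/cache/pacman/pkg mysql
```

Each version printed is one delta name worth trying on the mirror; if none of them exists there, the normal full download happens as before.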

Ok. Now xdelta gunzips gzipped files before making a diff, so since we are diffing between tar.gz files, we are really making diffs between tar files. On smaller packages the results are speedy, but big ones like openoffice are, of course, the real test. Here is what I found..

time to create patch:
time xdelta delta openoffice-base-2.0.0-1.pkg.tar.gz /var/cache/pacman/pkg/openoffice-base-2.0.2-2.pkg.tar.gz  openoffice-base-2.0.0-1_to_2.0.2-2.pkg.tar.gz.delta

real    8m31.279s
user    1m37.318s
sys     0m13.201s

time to patch:
time xdelta patch openoffice-base-2.0.0-1_to_2.0.2-2.pkg.tar.gz.delta openoffice-base-2.0.0-1.pkg.tar.gz result.pkg.tar.gz

real    1m29.460s
user    0m58.208s
sys     0m3.034s

Disk space:

result:
openoffice-base-2.0.2-2.pkg.tar.gz: 117.7 mb
openoffice-base-2.0.0-1.pkg.tar.gz: 102.5 mb
openoffice-base-2.0.0-1_to_2.0.2-2.pkg.tar.gz.delta: 61.9 mb
..that's about 53% of the full download!


Diskspace requirements for patching: 
uncompressed original (temp) ~ about 212mb for OOo2 example
+ 
current compressed size ~ 117.7 mb for OOo2 example
[+ original compressed package in cache 102.5 mb + delta file 61.9 mb]
= ~330mb
[= ~494mb]
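
For anyone wanting to redo this sum for another package, here is a small sketch. `sum_mb` is a made-up helper name; awk is used because plain shell arithmetic only handles integers.

```shell
# Hypothetical helper: add up sizes given in MB and print the total.
sum_mb() {
    printf '%s\n' "$@" | LC_ALL=C awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# OOo2 example: uncompressed original (temp) + new compressed package
sum_mb 212 117.7                # prints 329.7
# ...plus the old package already in the cache and the downloaded delta
sum_mb 212 117.7 102.5 61.9     # prints 494.1
```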


disk space req. for creation of patch:
space for uncompressed current version (temp)
+
space for uncompressed prev. version (temp)
+
space for patch
[+space for compressed versions]

It should be pointed out that for most packages, which are under 10mb, the time is generally less than a few seconds at both ends and the space requirements are (obviously) much less. If large packages like this are too resource-hungry then this should at least be considered for everything below around 20mb, since this would still save a lot of bandwidth. Of course, if it's optional this is very easy to do.

Important: since xdelta gunzips the tgzs and makes a delta from the tars, the md5 of the file resulting from the patch may not be the same as for the current package. But xdelta embeds md5 info in the patch file - the result is automatically verified, and this is just as secure as checking the md5 for the current package.

here's the xdelta patch metadata, for good measure..
xdelta info openoffice-base-2.0.0-1_to_2.0.2-2.pkg.tar.gz.delta
xdelta: version 1.1.3 found patch version 1.1 in openoffice-base-2.0.0-1_to_2.0.2-2.pkg.tar.gz.delta (compressed)
xdelta: generated with a gzipped FROM file
xdelta: generated with a gzipped TO file
xdelta: output name:   openoffice-base-2.0.2-2.pkg.tar.gz
xdelta: output length: 317757440
xdelta: output md5:    c54596755567c827936a349bec74f23c
xdelta: patch from segments: 2
xdelta: MD5                                     Length  Copies  Used    Seq?    Name
xdelta: 6aac82b146e21cf2d200b547a8c58f95        105482062       1730318 105482062       no      (patch data)
xdelta: eb065df0621f1a25a73b41fea2c90312        283832320       2323310 212275378       no      openoffice-base-2.0.0-1.pkg.tar.gz
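
If a script wanted to read the expected md5 out of that listing, something like the following would do. It is a sketch that assumes the line format shown above (xdelta 1.1.3); `delta_output_md5` is a hypothetical name. Judging by the output length (about 318 million bytes, far larger than the 117.7 mb package), the sum presumably refers to the uncompressed tar rather than the .tar.gz.

```shell
# Hypothetical helper: read `xdelta info` output on stdin and print the
# value of the "output md5" line. Other xdelta versions may format this
# differently.
delta_output_md5() {
    awk '/^xdelta: output md5:/ { print $4; exit }'
}

# Example use (add 2>&1 if your xdelta writes the listing to stderr):
#   xdelta info some.delta | delta_output_md5
# To compare against a patched package, assuming the sum is of the tar:
#   gunzip -c result.pkg.tar.gz | md5sum | cut -d' ' -f1
```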

Important2: xdelta uses /tmp for working space by default, which may not be big enough for huge packages if tmpfs is used... Another dir can be specified.

FYI:

cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 3200+
stepping        : 0
cpu MHz         : 2199.465
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow ts
bogomips        : 4400.67

PS: The other system involves the packager just making a diff to the previous repo version, and pacman chaining diffs together. Of course, this would make large packages like OOo considerably more tortuous to update for the user.

PPS: Although I'd love to provide patches, learning C is rather too much of a step for me at the moment... I'm rather busy learning PHP for work-experience reasons and am currently working a full week anyway. Implementation, I hope, shouldn't be too difficult for those who work on these apps.  :oops:

Offline

#2 2006-07-02 23:12:26

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Hmm... So this would give pacman Conary-type capabilities? Sounds nice... I'm for it in 3.x, as long as it isn't a PITA for the developers.

(Maybe it would be better implemented in 4.x... On the other hand, can it be considered KISS? Would it be complicated and potentially buggy and annoying? I do not know at all...)

Edit: there should be another poll option - "Sounds nice but I don't have a clue about the implementation". lol

Offline

#3 2006-07-03 01:18:08

iphitus
Forum Fellow
From: Melbourne, Australia
Registered: 2004-10-09
Posts: 4,927

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Not all too KISS, and has questionable benefits.

Offline

#4 2006-07-03 11:46:31

wain
Member
From: France
Registered: 2005-05-01
Posts: 289
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

I don't need delta-based packages, but it sounds good!
The disadvantage is the greater need for space on the repos  :?

Offline

#5 2006-07-03 12:11:01

kth5
Member
Registered: 2004-04-29
Posts: 657
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

wain wrote:

I don't need delta-based packages, but it sounds good!
The disadvantage is the greater need for space on the repos  :?

and way more dev resources too!


I recognize that while theory and practice are, in theory, the same, they are, in practice, different. -Mark Mitchell

Offline

#6 2006-07-03 13:28:52

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Well, if it takes more dev resources it might be prudent to hold off on it for a while...

Offline

#7 2006-07-03 13:49:07

gothicknight
Member
From: Portugal
Registered: 2006-04-08
Posts: 219

Re: Binary Diffs for Pacman, a detailed proposal + evidence

I think that dialup is getting banished... even if a person has dialup internet access at home, there are a lot of wifi points out there where you can get broadband internet access.
  Even so, the number of dialup users doesn't seem big enough to justify the manpower needed to get that baby running. Arch already has a (dumb) reputation for being unstable so... what we need now is a 3.x that is stable and fast as hell.
  I sure hope the pacman devs use the pacman-drive script in 3.x; it made my pacman -Syu go from minutes to seconds.
  I'm sorry, but diffs on a binary distro don't sound right, and the repos that are out there are awesome and quite fast. So for now... it's better to go with stability  :wink:

Offline

#8 2006-07-03 14:01:28

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Well rPath and Foresight use diffs and they're binary, not sure how well that works though...

(Regarding Pacman, I'm pretty sure I've heard that it won't be using flat files in 3.x, so no need to worry.)

Offline

#9 2006-07-03 14:16:36

gothicknight
Member
From: Portugal
Registered: 2006-04-08
Posts: 219

Re: Binary Diffs for Pacman, a detailed proposal + evidence

What do you mean by "flat files"?

Offline

#10 2006-07-03 15:15:25

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: Binary Diffs for Pacman, a detailed proposal + evidence

I mean text files in /var/lib/pacman/*/*.

Offline

#11 2006-07-03 17:14:48

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Well although this took a little while to describe, I think the system I sketch out is pretty simple. The logic is simple and no new apps are needed. The hard work is all done by xdelta.
Of course it will take some developer time to implement, but once it's done it should not add more than a few seconds to the makepkg process for most packages. It may increase server space requirements by around 50% (a rough guesstimate) but with the benefit of significantly reducing bandwidth as people use it. Since bandwidth is a continuing cost while storage space is one-off, I think it's worth considering.

Offline

#12 2006-07-03 17:25:46

Dusty
Schwag Merchant
From: Medicine Hat, Alberta, Canada
Registered: 2004-01-18
Posts: 5,986
Website

Re: Binary Diffs for Pacman, a detailed proposal + evidence

In terms of developer resources, it's a good idea. While it will take up more server space, this is cheap compared to bandwidth, and this system would reduce bandwidth requirements. However, the real question is whether it will be faster for the user -- does it take less time to download and apply a diff, or to download the full file? The issue is it does take time to apply the diff... do you have any benchmarks?

Dusty

Offline

#13 2006-07-03 18:24:49

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Well I have a benchmark for OpenOffice.org 2 in my first post. It takes about 1m30s to apply the patch, but the most costly part is the disk space requirement - the old tar.gz needs to be decompressed to a tar (about 200mb for OOo) before it can be patched. Of course, for smaller packages the time and disk space is much less.

So, I've dug out some packages from an out-of-date mirror. Some minor version updates, some major. Some bigger, some smaller. Have a butcher's:

Acroread 7.0.1 to 7.0.5
----------------------------------

patch creation:
time xdelta delta acroread-7.0.1-1.pkg.tar.gz acroread-7.0.5-1.pkg.tar.gz acroread-7.0.1-1_to_7.0.5-1.pkg.tar.gz.delta

real    0m42.607s
user    0m35.018s
sys     0m1.365s


uncompressed size of full pkg ~ 100mb

pkg compressed size ~ 37mb
delta size ~ 25mb


patching time:
time xdelta patch acroread-7.0.1-1_to_7.0.5-1.pkg.tar.gz.delta acroread-7.0.1-1.pkg.tar.gz

real    0m21.611s
user    0m19.105s
sys     0m0.863s

Mono 1.1.13.2-1 to 1.1.15-1
----------------------------------------

patch creation:
time xdelta delta mono-1.1.13.2-1.pkg.tar.gz mono-1.1.15-1.pkg.tar.gz mono-1.1.13.2-1_to_1.1.15-1.kg.tar.gz.delta

real    0m28.449s
user    0m23.946s
sys     0m0.918s

uncompressed pkg size ~ 67mb

pkg compressed ~ 24mb
delta ~ 17mb

patching time:
time xdelta patch mono-1.1.13.2-1_to_1.1.15-1.kg.tar.gz.delta mono-1.1.13.2-1.pkg.tar.gz

real    0m19.062s
user    0m16.985s
sys     0m0.638s


MySQL 4.1.13-2 to 5.0.22-1
----------------------------------------

patch creation:
time xdelta delta mysql-4.1.13-2.pkg.tar.gz mysql-5.0.22-1.pkg.tar.gz mysql-4.1.13-2_5.0.22-1.pkg.tar.gz.delta

real    0m6.828s
user    0m6.117s
sys     0m0.327s

uncompressed pkg ~25mb

pkg compressed: old 10.8 mb, new 14.4mb
delta: 4.9mb

patching time:
time xdelta patch mysql-4.1.13-2_5.0.22-1.pkg.tar.gz.delta mysql-4.1.13-2.pkg.tar.gz

real    0m6.423s
user    0m5.852s
sys     0m0.226s


mplayer pre7 to pre8
------------------------------

patch creation:
time xdelta delta mplayer-1.0pre7-6.pkg.tar.gz mplayer-1.0pre8-1.pkg.tar.gz mplayer-1.0pre7-6_to_1.0pre8-1.pkg.tar.gz.delta

real    0m5.679s
user    0m5.239s
sys     0m0.182s

uncompressed pkg ~ 12mb

pkg compressed: old 5 mb, new 6.5 mb
delta: 5.3mb

patching time:
time xdelta patch mplayer-1.0pre7-6_to_1.0pre8-1.pkg.tar.gz.delta mplayer-1.0pre7-6.pkg.tar.gz

real    0m3.290s
user    0m2.799s
sys     0m0.111s


sylpheed 2.0.4 to 2.2.6
----------------------------------
patch creation:
time xdelta delta sylpheed-2.0.4-2.pkg.tar.gz sylpheed-2.2.6-1.pkg.tar.gz sylpheed-2.0.4-2_to_2.2.6-1.pkg.tar.gz.delta

real    0m1.506s
user    0m1.213s
sys     0m0.057s

uncompressed pkg ~ 4mb

pkg compressed: ~ 1.5mb
delta: ~1mb

patching time:
time xdelta patch sylpheed-2.0.4-2_to_2.2.6-1.pkg.tar.gz.delta sylpheed-2.0.4-2.pkg.tar.gz

real    0m1.217s
user    0m0.917s
sys     0m0.036s

Check out mysql!

Conclusion: you'd need a much slower system than mine or a very fast connection for the full DL to be quicker than the patch.

PS: I should point out this is using my tmpfs /tmp dir for scratch space, which is a little faster, esp. for big pkgs, than a purely disk based tmp.
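
To put a number on "a very fast connection": the full download takes full/bw seconds and the delta route takes delta/bw + patch seconds, so the two are equal at bw = (full - delta) / patch. The sketch below just evaluates that; `breakeven_mb_per_s` is a made-up name, and the model ignores latency and the extra disk activity.

```shell
# Hypothetical helper: download speed (MB/s) at which the full download
# and "delta download + patch" take the same time. Below this speed the
# delta wins.
#   $1 = full package size (MB), $2 = delta size (MB), $3 = patch time (s)
breakeven_mb_per_s() {
    LC_ALL=C awk -v full="$1" -v delta="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", (full - delta) / secs }'
}

breakeven_mb_per_s 14.4 4.9 6.423      # MySQL 4.1.13-2 to 5.0.22-1: prints 1.48
breakeven_mb_per_s 117.7 61.9 89.46    # openoffice-base (first post): prints 0.62
```

So on this machine the full MySQL download only wins above roughly 1.5 MB/s, and the OpenOffice one above roughly 0.6 MB/s.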

Offline

#14 2006-07-03 18:34:25

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Important: since xdelta gunzips the tgzs and makes a delta from the tars, the md5 of the file resulting from the patch may not be the same as for the current package. But xdelta embeds md5 info in the patch file - the result is automatically verified, and this is just as secure as checking the md5 for the current package.

I just realised what the astute among you probably already did -  that this might be a problem for repeated updates (updates from a pkg in the cache which already results from a patch application, since the patch is created against the fresh pkg). I will test whether the md5 difference is in the tar itself or the gz to work out how much of a problem this is. Watch this space.

UPDATE: I can confirm that this is not a problem. Phew :roll:. xdelta, sensibly, only works in terms of the .tar md5sums. There is no problem with upgrading from a patch-result using another patch, or using a full pkg.

Yet this does remind me that pacman needs to remember that a certain pkg stored in the cache was produced from a delta, since the md5sum in the repo db will not match the sum of the tgz. This means if one reinstalls a pkg from the cache, and the version in cache is the result of a patch, the standard verification will fail. A simple way of fixing this:

1. After patching a file in the cache, sum the result.

2. Append the MD5 to the filename of patch-resultant tgzs.

This way, when a user issues an upgrade/install of a package, pacman can check these files with md5s in the names alongside the rest of the cache and respond appropriately. This means verifying the file against the appended md5 in the case of an (re)install.

To reiterate: When testing for files in the cache in the hope of downloading a patch, Pacman would treat both kinds of files identically, since xdelta doesn't care about the tgz sum, only the tar sum, which remains the same.

Alternative to filename appendage: .sum file.
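
A rough sketch of the filename variant, to make the idea concrete. The `.md5-<sum>` suffix and the function names are invented for this example, and `md5sum` from GNU coreutils is assumed.

```shell
# Hypothetical helpers for the "append the MD5 to the filename" idea.
file_md5() {
    md5sum "$1" | cut -d' ' -f1
}

# Rename a patch-result package to "<name>.md5-<sum>" and print the new name.
tag_with_md5() {
    local tagged="$1.md5-$(file_md5 "$1")"
    mv -- "$1" "$tagged" && printf '%s\n' "$tagged"
}

# Succeed only if the file's current md5 matches the one in its name.
verify_tagged() {
    [ "${1##*.md5-}" = "$(file_md5 "$1")" ]
}
```

One thing this makes obvious: a tagged file no longer ends in `.pkg.tar.gz`, so anything that globs the cache would have to learn the new suffix. A separate `.sum` file avoids that.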

Offline

#15 2006-07-04 04:47:37

ujjwal
Member
Registered: 2006-01-31
Posts: 27

Re: Binary Diffs for Pacman, a detailed proposal + evidence

I am in favour of having such a system for some package upgrades, especially newer releases of the same package. In such a case, the update may only change a small thing, like a dependency, or a .desktop file, or a shell script etc.

But speeding up pacman and keeping it stable seems to me to be a higher priority.

Offline

#16 2006-07-04 13:55:52

test1000
Member
Registered: 2005-04-03
Posts: 834

Re: Binary Diffs for Pacman, a detailed proposal + evidence

if this is to be implemented i pray it will be done the Right Way and not some hack on top of... on top of...

pacman has been very unstable for me, not to mention slow before it's loaded in memory.

Also i'm not sure if I like the new mysql db way pacman 3 takes, when a flat db works great and fast enough the 'pacman-drive' way.

I'm concerned this will introduce new bugs in pacman and continually be dependent that mysql doesn't fsck up...

which isn't very KISS in my opinion.


KISS = "It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience." - Albert Einstein

Offline

#17 2006-07-04 14:15:02

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: Binary Diffs for Pacman, a detailed proposal + evidence

test1000 wrote:

if this is to be implemented i pray it will be done the Right Way and not some hack on top of... on top of...

pacman has been very unstable for me, not to mention slow before it's loaded in memory.

Unstable? That's quite odd, I've never seen pacman become anything near unstable...

Also i'm not sure if I like the new mysql db way pacman 3 takes, when a flat db works great and fast enough the 'pacman-drive' way.

I'm concerned this will introduce new bugs in pacman and continually be dependent that mysql doesn't fsck up...

which isn't very KISS in my opinion.

Try filing a feature request on Flyspray.

Offline

#18 2006-07-04 22:06:27

test1000
Member
Registered: 2005-04-03
Posts: 834

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Gullible Jones wrote:

Unstable? That's quite odd, I've never seen pacman become anything near unstable...

It's a very sneaky bug i'm referring to. No error messages or anything else will be printed anywhere, and the only time you will understand that something is wrong is when you realise that you're missing some files that pacman should have installed correctly in the first place.

For example, have you ever had the need to reinstall some package for some unknown reason? It's a good chance that you have run into the bug.

here's one bugreport of it, though it probably isn't the original one:
http://bugs.archlinux.org/task/4821

Gullible Jones wrote:

Try filing a feature request on Flyspray.

ok, i guess i will do that. it's just that i'm a bit defiant. since you know, once the ball starts rolling... maybe, just maybe they won't listen to reason and go back.


KISS = "It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience." - Albert Einstein

Offline

#19 2006-07-05 02:54:40

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: Binary Diffs for Pacman, a detailed proposal + evidence

test1000 wrote:
Gullible Jones wrote:

Unstable? That's quite odd, I've never seen pacman become anything near unstable...

It's a very sneaky bug i'm referring to. No error messages or anything else will be printed anywhere, and the only time you will understand that something is wrong is when you realise that you're missing some files that pacman should have installed correctly in the first place.

For example, have you ever had the need to reinstall some package for some unknown reason? It's a good chance that you have run into the bug.

here's one bugreport of it, though it probably isn't the original one:
http://bugs.archlinux.org/task/4821

Ah, like having to install fontconfig twice for fonts to work properly?

Offline

#20 2006-07-05 10:00:53

iphitus
Forum Fellow
From: Melbourne, Australia
Registered: 2004-10-09
Posts: 4,927

Re: Binary Diffs for Pacman, a detailed proposal + evidence

the kernel-headers is an example of when files are moved from one package to another, in this case, glibc to kernel-headers. a bug in pacman causes the moved files to disappear, i'm guessing that kernel-headers is installed first, and then glibc is upgraded, removing the files.

As for fontconfig, that'd be a bug in the .install script, not pacman. And you're the only one i've ever seen with that, and considering how much you screw around with your fonts, i'd put that down to pebkac.

James

Offline

#21 2006-07-05 10:51:12

test1000
Member
Registered: 2005-04-03
Posts: 834

Re: Binary Diffs for Pacman, a detailed proposal + evidence

yeah i hope he has pebkac, either way... Is someone working on this bug?


KISS = "It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience." - Albert Einstein

Offline

#22 2006-07-05 17:31:52

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Thikasabrik wrote:

1. After patching a file in the cache, sum the result.

2. Append the MD5 to the filename of patch-resultant tgzs.

This way, when a user issues an upgrade/install of a package, pacman can check these files with md5s in the names alongside the rest of the cache and respond appropriately. This means verifying the file against the appended md5 in the case of an (re)install.

To reiterate: When testing for files in the cache in the hope of downloading a patch, Pacman would treat both kinds of files identically, since xdelta doesn't care about the tgz sum, only the tar sum, which remains the same.

Alternative to filename appendage: .sum file.

Well I just thought of a much cleaner solution to this problem. When makepkg creates a new pkg and old versions are available to make deltas, then it should make the patch and then apply it, using the from-patch result as the new current version. This way there will be no md5 discrepancy and no need for hackish filename appendages or different treatment of from-patch packages in the cache.

Offline

#23 2006-07-06 19:23:12

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Ok, so I noticed that makepkg is a bash script, and knocked together a patch to do that side of things....

--- ./makepkg    2006-02-02 23:40:57.000000000 +0000
+++ ./makepkg    2006-07-06 19:48:23.000000000 +0100
@@ -29,6 +29,7 @@
 BUILDSCRIPT="./PKGBUILD"
 CLEANUP=0
 CLEANCACHE=0
+DELTAS=0
 DEP_BIN=0
 DEP_SRC=0
 DEP_SUDO=0
@@ -234,6 +235,7 @@
     echo "  -B, --noccache   Do not use ccache during build"
     echo "  -c, --clean      Clean up work files after build"
     echo "  -C, --cleancache Clean up source files from the cache"
+    echo "  -D, --makedeltas Make binary patches from old versions found in the working dir"
     echo "  -d, --nodeps     Skip all dependency checks"
     echo "  -e, --noextract  Do not extract source files (use existing src/ dir)"
     echo "  -f, --force      Overwrite existing package"
@@ -272,6 +274,7 @@
         --syncdeps)   DEP_BIN=1 ;;
         --sudosync)   DEP_SUDO=1 ;;
         --builddeps)  DEP_SRC=1 ;;
+        --makedeltas) DELTAS=1 ;;
         --noccache)   NOCCACHE=1 ;;
         --nodeps)     NODEPS=1 ;;
         --noextract)  NOEXTRACT=1 ;;
@@ -291,13 +294,14 @@
             exit 1
             ;;
         -*)
-            while getopts "bBcCdefghij:mnop:rsSw:-" opt; do
+            while getopts "bBcCdDefghij:mnop:rsSw:-" opt; do
                 case $opt in
                     b) DEP_SRC=1 ;;
                     B) NOCCACHE=1 ;;
                     c) CLEANUP=1 ;;
                     C) CLEANCACHE=1 ;;
                     d) NODEPS=1 ;;
+                    D) DELTAS=1 ;;
                     e) NOEXTRACT=1 ;;
                     f) FORCE=1 ;;
                     g) GENMD5=1 ;;
@@ -774,6 +778,19 @@
 fi
 $cmd | sort >../filelist
 
+#make deltas
+if [ "$DELTAS" = "1" ]; then
+    cd $startdir
+    for oldfile in $startdir/$pkgname-*.pkg.tar.gz; do
+        namend=${oldfile#"$startdir/$pkgname-"}
+        if [ "$oldfile" != "$startdir/$pkgname-$pkgver-$pkgrel.pkg.tar.gz" ]; then
+            msg "Making delta from version ${namend%."pkg.tar.gz"}"
+            xdelta delta $oldfile $PKGDEST/$pkgname-$pkgver-$pkgrel.pkg.tar.gz $PKGDEST/$pkgname-${namend%."pkg.tar.gz"}_to_$pkgver-$pkgrel.pkg.tar.gz.delta
+            xdelta patch $PKGDEST/$pkgname-${namend%."pkg.tar.gz"}_to_$pkgver-$pkgrel.pkg.tar.gz.delta $oldfile $PKGDEST/$pkgname-$pkgver-$pkgrel.pkg.tar.gz
+        fi
+    done
+fi
+
 cd $startdir
 if [ "$CLEANUP" = "1" ]; then
     msg "Cleaning up..."

This one does the 'creates final complete pkg from delta' thing too. This version's a little inefficient in that part, since it does it as many times as patches are created, which is silly, but it works just fine for trying stuff out.
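
A possible shape for the less silly version, applying the patch only once. This is a sketch, not a drop-in replacement: it assumes the same variables as the makepkg patch above ($startdir, $pkgname, $pkgver, $pkgrel, $PKGDEST), and it assumes that the package rebuilt from one delta is the same as the one rebuilt from any other, which the earlier tar-md5 test suggests but does not prove.

```shell
# Sketch: create a delta from every old version, but rebuild the final
# package from a delta only once.
make_deltas() {
    local new="$PKGDEST/$pkgname-$pkgver-$pkgrel.pkg.tar.gz"
    local old oldver delta applied=0
    for old in "$startdir/$pkgname"-*.pkg.tar.gz; do
        [ -e "$old" ] || continue                       # no old versions
        [ "${old##*/}" = "${new##*/}" ] && continue     # skip the new pkg
        oldver=${old##*/}
        oldver=${oldver#"$pkgname-"}
        oldver=${oldver%.pkg.tar.gz}
        delta="$PKGDEST/$pkgname-${oldver}_to_$pkgver-$pkgrel.pkg.tar.gz.delta"
        # As far as I know, xdelta 1.x "delta" uses diff-style exit codes
        # (1 = files differ), so test for a non-empty delta file instead.
        xdelta delta "$old" "$new" "$delta" || true
        [ -s "$delta" ] || continue
        if [ "$applied" = 0 ]; then
            # Replace the package with the patch result, via a temp file.
            xdelta patch "$delta" "$old" "$new.tmp" &&
                mv -- "$new.tmp" "$new" && applied=1
        fi
    done
}
```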

PS: Perhaps a better version would work in tar files until the final step. This would create a patch that works on tar files, so the gunzipping and gzipping would be done by pacman at patch time instead of by xdelta. This would allow greater flexibility and might even completely avoid the nasty md5 issue. I expect I'll investigate that soon.

PPS: I hope there's some dev interest in this..

Offline

#24 2006-07-06 19:52:48

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

And here's a pseudo-coded pacman part in line with the makepkg patch:

...in the middle of the package upgrade routine lies something new and scary....

if ($option_useDeltas = TRUE AND "/var/cache/pacman/pkg/$pkgname-$localVersion.pkg.tar.gz" exists) then

  Delta = ftp_download($REPO_BASE_URL + "$pkgname-$localVersion_to_$currentVersion.pkg.tar.gz.delta", toTMP)

  if ($Delta = the path of the download) then
    exec("xdelta patch $Delta /var/cache/pacman/pkg/$pkgname-$localVersion.pkg.tar.gz /var/cache/pacman/pkg/$pkgname-$currentVersion.pkg.tar.gz", exitOnFail)
  else
    msg("No delta from current version available, proceeding with full download")
  end if
end if

...pacman checks for current version in cache, finds it, upgrades. win.
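
For experimenting outside pacman, the same logic can be written as a shell function. This is a sketch only: it assumes curl and xdelta 1.x are installed, the function name and arguments are invented, and unlike the pseudo-code it signals "fall back to the full download" on any failure rather than exiting.

```shell
# Returns 0 if the new package was rebuilt in the cache from a delta,
# 1 if the caller should do the normal full download instead.
#   $1 = repo base URL, $2 = pkgname, $3 = cached version, $4 = new version
#   $5 = cache dir (optional)
try_delta_upgrade() {
    local repo_url=$1 pkgname=$2 localver=$3 newver=$4
    local cache="${5:-/var/cache/pacman/pkg}"
    local old="$cache/$pkgname-$localver.pkg.tar.gz"
    local new="$cache/$pkgname-$newver.pkg.tar.gz"
    # Braces matter: "$localver_to_" would be read as one variable name.
    local name="$pkgname-${localver}_to_$newver.pkg.tar.gz.delta"
    local tmp

    [ -f "$old" ] || return 1                # nothing to patch from
    tmp=$(mktemp -d) || return 1

    # -f makes curl fail on HTTP errors instead of saving the error page.
    if curl -f -s -o "$tmp/$name" "$repo_url/$name" &&
       xdelta patch "$tmp/$name" "$old" "$new.part"; then
        mv -- "$new.part" "$new"
        rm -rf "$tmp"
        return 0
    fi

    rm -rf "$tmp" "$new.part"
    return 1
}

# Example use:
#   try_delta_upgrade "$mirror" mysql 4.1.13-2 5.0.22-1 ||
#       echo "No usable delta, proceeding with full download"
```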

Offline

#25 2006-07-13 19:33:43

Thikasabrik
Member
Registered: 2004-02-23
Posts: 92

Re: Binary Diffs for Pacman, a detailed proposal + evidence

Well, it seems this one died off for a while.. It'd be nice to know what some devs think, now I've posted those patches. I'm away for the next week, so I won't be able to respond until I get back, but please post any comments you have.

I realise this adds some complexity (although not that much as far as I can see), but it could save a lot of bandwidth.

I have thought about working at the tar level with the patches, and I think it would increase patch-time/build-time disk usage. I may work out a way to prevent this, but until I do please consider the stuff I've posted so far.

Offline
