You are not logged in.

#1 2004-01-02 17:41:46

gumbo
Member
Registered: 2003-10-18
Posts: 24

Binary diffs of updated packages?

I think this could be a great bandwidth saver for the servers, as well as those of us on dialup.  Programs to do binary diffs (xdelta.sf.net is one, I think) are available.

Downloading 60MB or whatever for an Xfree update when a diff would have been 1 or 2 MB just seems silly.  I realize it might take a bit of time to generate the diff, but not having to use it on many packages (do you really want a diff of a 100K package?) would cut down on the percieved time a lot.

Offline

#2 2004-01-10 19:26:34

sarah31
Member
From: Middle of Canada
Registered: 2002-08-20
Posts: 2,975
Website

Re: Binary diffs of updated packages?

i kinda like this idea but i am not too sure how this works in a binary distro. could you elaborate with examples.

i like the idea.


AKA uknowme

I am not your friend

Offline

#3 2004-01-10 20:14:11

zen_guerrilla
Member
From: Greece
Registered: 2002-12-22
Posts: 259

Re: Binary diffs of updated packages?

I've used xdelta before in sorcerer & it's a great saver for bandwidth, however it's a damn cpu&disk-eating beast on the server (aka the box that creates xdelta patches) side, so I'm not sure Judd would like to use it on archlinux.org...

Offline

#4 2004-01-10 20:25:42

apeiro
Daddy
From: Victoria, BC, Canada
Registered: 2002-08-12
Posts: 771
Website

Re: Binary diffs of updated packages?

Dale did some testing on this and found that it took gobs of cpu time, but didn't save us much on space.

http://bugs.archlinux.org/index.php?do=details&id=313

Offline

#5 2004-01-10 20:39:48

zen_guerrilla
Member
From: Greece
Registered: 2002-12-22
Posts: 259

Re: Binary diffs of updated packages?

apeiro wrote:

Dale did some testing on this and found that it took gobs of cpu time, but didn't save us much on space.

Sorcerer used it for source tarballs (it's a source-based distro after all smile) and I remember I could fetch the whole kde enchilada in just 2 mb's in xdelta patches...

Offline

#6 2004-01-11 15:09:38

gumbo
Member
Registered: 2003-10-18
Posts: 24

Re: Binary diffs of updated packages?

I think I can answer eveyone's questions in one post.

sarah: The idea is for very large packages (KDE, GNOME, XFree86, etc) a binary patch is provided on the server.  Think of it like a patch for source, but it patches the file you have already downloaded (say Xfree 4.0) to the newer version (Xfree 4.0.1).  This change may only be 100K distributed throughout the download, and by providing a patch (and checking in pacman if getting the patch is possible), you decrease the used bandwidth especially for people that upgrade often.

zen_guerrilla: I wasn't really meaning on the server, more as a final step in making the package for upload.  However, this would require the package maintainers to have a copy of the last uploaded package on their hard drive, if it is not already required by the build system (I don't know).

apeiro: Hmm, those results don't look very promising. sad  However I don't see a program used to generate the patches.  I'll do some looking for a suitable package.  Also, I thought TrollTech (QT) used this, but now I can't find any mention of it on their site.

Offline

#7 2004-01-11 22:32:10

andy
Member
From: Germany
Registered: 2002-10-11
Posts: 374

Re: Binary diffs of updated packages?

I think (as did Xentac) there is a problem in the tests performed and posted a follow up in the bug tracker.

Offline

#8 2004-01-18 05:30:05

gumbo
Member
Registered: 2003-10-18
Posts: 24

Re: Binary diffs of updated packages?

Using BSDiff, I can get a 28K patch out of a 368K package.

Packages used:
alsa-lib-1.0.0rc2-1
alsa-lib-1.0.1-1

Both packages were uncompressed into tars and then fed into bsdiff.  Time was about 10 seconds on an 800Mhz Duron.

The problem is that bsdiff uses a ton of memory - I tried to do a diff of python-2.3.2 against alsa-lib-1.0.1 (to see memory usage), and ran out.  The box only has 96MB of memory available, and about half was full, so a better equipped box should be able to do this fine.

If anyone has larger packages to test (I thought I did, but I can't find my older copies anywhere), the results (diskspace-wise) look promising.

Offline

#9 2004-01-19 12:44:11

dariball
Member
From: Germany - Frankfurt
Registered: 2002-10-20
Posts: 118
Website

Re: Binary diffs of updated packages?

personally, i wouldn't like to see binary diffs, because I don't really trust them smile


but i did some research on this point... AND made a pkg for bsdiff which can be found here....


but now some infos:

[dariball@meissna pkg]# time bsdiff mysql-4.0.13-3.pkg.tar mysql-4.0.17-2.pkg.tar mysql-diff.diff

real    0m45.790s
user    0m33.026s
sys     0m0.661s

[dariball@meissna pkg]# ls -lh mysql-*
-rw-r--r--    1 root     root         7.3M Jun 28  2003 mysql-4.0.13-3.pkg.tar
-rw-r--r--    1 root     root         8.8M Jan 12 18:20 mysql-4.0.17-2.pkg.tar
-rw-r--r--    1 root     root         1.5M Jan 19 13:40 mysql-diff.diff

all done on an athlon 800 with 256MB RAM,
for me this looks pretty nice, as the mysql-diff.diff is only 1.5MB in size...
the diff process took 26.4 percent of my ram, what is about 53MBs, what seems quite alot.... and of course nearly anything of my cpu time

any other opinions about that?


nothing,
maybe I have a perfect signature _someday_

Offline

#10 2004-01-19 18:33:34

contrasutra
Member
From: New Jersey
Registered: 2003-07-26
Posts: 507

Re: Binary diffs of updated packages?

Since I filed the bug report, I obviously want it.

I think it should be a choice. You can choose to use a diff or just normal packages.

On dialup, every little bit helps.


Or we could help solve the problem by using .bz2 (w/ max compression) instead of .gz for the packages. This is less likely to happen though.


"Contrary to popular belief, penguins are not the salvation of modern technology.  Neither do they throw parties for the urban proletariat."

Offline

#11 2004-01-19 18:39:36

sarah31
Member
From: Middle of Canada
Registered: 2002-08-20
Posts: 2,975
Website

Re: Binary diffs of updated packages?

i am not on dial up but is support  it too because i personally find it annoying to download something like OO or X when all that has been done to it was a 1kb patch. 

that being said it does not sound troo feasible right now because i would not want some memory consumtive upgrade even if it is only temporary.


AKA uknowme

I am not your friend

Offline

#12 2004-01-19 18:49:57

dariball
Member
From: Germany - Frankfurt
Registered: 2002-10-20
Posts: 118
Website

Re: Binary diffs of updated packages?

if i take a objective look at it i have to say it would be way better....

but is it _really_ depndable? won't it maybe break the package??


nothing,
maybe I have a perfect signature _someday_

Offline

#13 2004-01-19 23:04:34

gumbo
Member
Registered: 2003-10-18
Posts: 24

Re: Binary diffs of updated packages?

I just did a rather unscientific test of taring random files, adding some files, making a diff, and patching the first file.  Sizes ranged from a few hundred K (packages I had multiple versions of) to the entire GNOME and XFree86 installs.  Everything I tried gave bit-identical results.

I think it should certaintly be made available - if a user is on broadband and doesn't want to risk a bad patch, download the entire thing.

I wonder if the memory requirements could be reduced by freeing and allocating buffers only when necessary, though I can't imagne the original author not omptimizing somewhat for memory usage.

contrastutra: Bzip compression didn't really reduce file sizes compared to gzip in my tests - not enough to be worth it anyway.  Patches save multi-megabytes of downloading, and can be applied sequentially so a person  with package-1.2.3 could upgrade to 2.3.4 by following the patch trail (assuming they all exist).  Patches + higher compession would yield the highest benfit, at the expense of time.   How much time is it worth to save uploading 1 MB?  10MB? 50?

Offline

#14 2004-01-20 00:18:58

dp
Member
From: Zürich, Switzerland
Registered: 2003-05-27
Posts: 3,363
Website

Re: Binary diffs of updated packages?

dont want to be negative --- the idea of using bin-diffs is really great --- but i see some (solvable!) problem (in realisation):

=> if you want to use something like a diff, you must have the previous pkg in your cache --- this is hdd-space-consuming! //-is this right what i think of the principle?

some of you made a some statistics with smaller packages ... what i see is not bad // average of 80-90% savings of space could be possible --- what i'm interessted in: anyone wants to try it on OO or xfree86? this are the 2 packages i waited downloading several times

bin-diff for pacman would be a really great enhancement, but i'm for keeping the standard functionality (just to make sure, arch is still running, if something goes wrong with diffing wink )

here a procedure, that would be really nice for pacman using this idea:

0) check if old package exists in cache

[yes]

1) download a checksum of the new package from the server
2) download the diff of the packages (cache <-> new)
3) patch the cached package
4) check checksum
5) if(ok){upgrade}else{download the new package the regular way and upgrade}

[no]

1) download the new package (like it is now)
2) upgrade

something like this would make upgrading in most cases faster ... and the other cases pacman would react as it does now

TODO [++]:bin-diff-package-catcher-module ?


The impossible missions are the only ones which succeed.

Offline

#15 2004-01-20 00:42:27

contrasutra
Member
From: New Jersey
Registered: 2003-07-26
Posts: 507

Re: Binary diffs of updated packages?

Well, when I filed the bug report, I did not have "binary diff" packages in mind.

What I wanted, was something that would compare the files inside two packages, and create a package that only had new/modified files.

So it wouldnt give you a diff of each binary, just the complete file, only changed files though. This would be for -1, -2 packages.

Because normally, those are only 1 or 2 files that have to be changed. For example, you screw up a config file in the XF86 package, you only have to create a "patch" package containing that one config file. That's where it would be useful.

Basically, this patch package would up the version count in the db, and overwrite the installed files with the ones in the package.


This wouldn't be used as much, but a life saver in small changes.


"Contrary to popular belief, penguins are not the salvation of modern technology.  Neither do they throw parties for the urban proletariat."

Offline

#16 2004-01-20 01:23:41

Xentac
Forum Fellow
From: Victoria, BC
Registered: 2003-01-17
Posts: 1,797
Website

Re: Binary diffs of updated packages?

Another thing you'll want to think about: What happens when the accumulated patch size (you have to apply them all in order) gets larger than the total package size?  It happens with new SuSE installs and they handle it by downloading all the patches...


I have discovered that all of mans unhappiness derives from only one source, not being able to sit quietly in a room
- Blaise Pascal

Offline

#17 2004-01-20 01:59:06

Dusty
Schwag Merchant
From: Medicine Hat, Alberta, Canada
Registered: 2004-01-18
Posts: 5,986
Website

Re: Binary diffs of updated packages?

Xentac wrote:

Another thing you'll want to think about: What happens when the accumulated patch size (you have to apply them all in order) gets larger than the total package size?

That's another good reason to allow the user to choose. An intelligent package manager would probably be able to decide whether to download the full version or the patches, depending on things like time to download patches, time to apply patches, time to download full version, etc.

I really like the idea. I don't know if it applies to Arch, but there is certainly room for a binary-patch based distro on the market.

Dusty

Offline

#18 2004-01-20 03:53:15

gumbo
Member
Registered: 2003-10-18
Posts: 24

Re: Binary diffs of updated packages?

contrasutra wrote:

Well, when I filed the bug report, I did not have "binary diff" packages in mind.

What I wanted, was something that would compare the files inside two packages, and create a package that only had new/modified files.

So it wouldnt give you a diff of each binary, just the complete file, only changed files though. This would be for -1, -2 packages.

Because normally, those are only 1 or 2 files that have to be changed. For example, you screw up a config file in the XF86 package, you only have to create a "patch" package containing that one config file. That's where it would be useful.

Basically, this patch package would up the version count in the db, and overwrite the installed files with the ones in the package.


This wouldn't be used as much, but a life saver in small changes.

That requires two complete directory trees to do correctly.  It would be simple though - if a file exists in both trees, check md5sum (or other change determining method).  If changed, add to archive.  If file is only in new tree, add to archive.  If file is only in old tree, add to file of files that need to be deleted.  If a directory is missing (either way), all the files inside that directory are automatically marked.

I wonder how much CPU time that would be though.

Offline

Board footer

Powered by FluxBB