
#1 2012-08-24 08:01:12

marko90
Member
Registered: 2011-12-04
Posts: 8

PkgDiff Implementation

Hello!
I'd like to request a new feature for the update process: pkgdiff. It computes the difference between the installed package and the new one in the repository and downloads only that difference, so the amount of downloaded data is drastically reduced.
AFAIK, this is already implemented in Fedora. It would be brilliant to add this to Arch too.
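To make the idea concrete, here is a toy sketch of a block-based delta. The new file is cut into fixed-size blocks. Blocks that already exist in the old file are sent as "copy" references, and everything else is sent literally. This is only an illustration under simplified assumptions. It is not how pkgdelta or xdelta3 actually encode deltas (xdelta3 uses the VCDIFF format and finds matches at arbitrary offsets), and `make_delta`/`apply_delta` are hypothetical names.

```python
BLOCK = 4  # deliberately tiny block size so the example is easy to trace


def make_delta(old: bytes, new: bytes, block: int = BLOCK) -> list:
    """Encode `new` as a list of ("copy", offset) and ("insert", data) ops."""
    # Index every full, block-aligned chunk of the old file by its content.
    index = {}
    for off in range(0, len(old) - block + 1, block):
        index.setdefault(old[off:off + block], off)
    ops = []
    for off in range(0, len(new), block):
        chunk = new[off:off + block]
        if chunk in index:
            ops.append(("copy", index[chunk]))   # reuse bytes already on disk
        else:
            ops.append(("insert", chunk))        # ship the new bytes literally
    return ops


def apply_delta(old: bytes, ops: list, block: int = BLOCK) -> bytes:
    """Rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg:arg + block] if kind == "copy" else arg
    return bytes(out)


old = b"hello world, this is version 1"
new = b"hello world, this is version 2"
delta = make_delta(old, new)
assert apply_delta(old, delta) == new
# Only the final two-byte block differs, so only 2 bytes are sent literally.
literal = sum(len(arg) for kind, arg in delta if kind == "insert")
print(literal)  # 2
```

The download saving comes entirely from the "copy" operations: the client already has those bytes, so only the literal inserts have to travel over the network.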


#2 2012-08-24 08:11:32

jasonwryan
Anarchist
From: .nz
Registered: 2009-05-09
Posts: 30,424
Website

Re: PkgDiff Implementation

You mean like:

man pkgdelta

See pacman.conf to enable it...


Arch + dwm   •   Mercurial repos  •   Surfraw

Registered Linux User #482438


#3 2012-08-24 08:50:27

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: PkgDiff Implementation

Enabling will do nothing though...   Arch does not supply deltas.


#4 2012-08-24 09:17:30

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: PkgDiff Implementation

Can pacman chain together deltas between pairs of sequentially released packages (e.g. use "a_to_b" and "b_to_c" to generate "c" from "a"), or does it need the delta between the existing and the target package ("a_to_c" in the previous example)?


My Arch Linux Stuff • Forum Etiquette • Community Ethos - Arch is not for everyone


#5 2012-08-24 09:21:09

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: PkgDiff Implementation

It can chain.  But if a_to_c is there, it will use it preferentially.
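A minimal sketch of the selection behaviour described above, using the version labels a, b, c from the question. It prefers a direct delta when one exists and otherwise chains deltas. This is not libalpm's code: the `pick_deltas` name and the rule for choosing between several possible chains (smallest total download size, found with Dijkstra's algorithm) are assumptions made for illustration.

```python
import heapq


def pick_deltas(deltas: dict, installed: str, target: str):
    """Choose which deltas to download to go from `installed` to `target`.

    `deltas` maps (from_version, to_version) -> delta size in bytes.
    Returns a list of (from, to) steps, or None if no chain exists.
    """
    # A direct delta is used preferentially when it is available.
    if (installed, target) in deltas:
        return [(installed, target)]
    # Otherwise chain deltas, picking the chain with the smallest total size.
    graph = {}
    for (src, dst), size in deltas.items():
        graph.setdefault(src, []).append((dst, size))
    queue = [(0, installed, [])]
    done = set()
    while queue:
        cost, version, path = heapq.heappop(queue)
        if version == target:
            return path
        if version in done:
            continue
        done.add(version)
        for nxt, size in graph.get(version, []):
            if nxt not in done:
                heapq.heappush(queue, (cost + size, nxt, path + [(version, nxt)]))
    return None


deltas = {("a", "b"): 100, ("b", "c"): 120}
print(pick_deltas(deltas, "a", "c"))  # [('a', 'b'), ('b', 'c')]
deltas[("a", "c")] = 300
print(pick_deltas(deltas, "a", "c"))  # [('a', 'c')]
```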


#6 2012-08-24 09:52:29

jasonwryan
Anarchist
From: .nz
Registered: 2009-05-09
Posts: 30,424
Website

Re: PkgDiff Implementation

Allan wrote:

Enabling will do nothing though...   Arch does not supply deltas.


Then why is there an option for it? Is it a planned feature?



#7 2012-08-24 10:06:27

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: PkgDiff Implementation

jasonwryan wrote:
Allan wrote:

Enabling will do nothing though...   Arch does not supply deltas.

Then why is there an option for it? Is it a planned feature?

Arch official packages don't use it, but you can provide your own mirror that uses deltas, like archdelta did: https://bbs.archlinux.org/viewtopic.php?id=92085

Currently only one is listed (I haven't tried it):
* https://wiki.archlinux.org/index.php/Deltup
* https://wiki.archlinux.org/index.php/Mirrors#France

http://delta.archlinux.fr/ - With Delta package support. Needs xdelta3 package from extra to run.

We already use efficient xz compression (and may switch to e.g. lrzip), so https://bbs.archlinux.org/viewtopic.php … 24#p747024

Last edited by karol (2012-08-24 10:07:11)


#8 2012-08-24 10:11:47

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: PkgDiff Implementation

jasonwryan wrote:
Allan wrote:

Enabling will do nothing though...   Arch does not supply deltas.

Then why is there an option for it? Is it a planned feature?

The first release of pacman that had delta support was 2008-01-10...   Arch devs are just too lazy to use it.


#9 2012-08-24 10:12:12

jasonwryan
Anarchist
From: .nz
Registered: 2009-05-09
Posts: 30,424
Website

Re: PkgDiff Implementation

Cheers karol.

Thanks Allan.



#10 2012-08-24 12:35:51

dolby
Member
From: 1992
Registered: 2006-08-08
Posts: 1,581

Re: PkgDiff Implementation

How CPU-intensive is applying the deltas? I've seen the same technology in action in Fedora, and quite frankly I personally prefer wasting bandwidth. It's less time-consuming.


There shouldn't be any reason to learn more editor types than emacs or vi -- mg (1)
[You learn that sarcasm does not often work well in international forums.  That is why we avoid it. -- ewaller (arch linux forum moderator)


#11 2012-08-24 13:35:24

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

Re: PkgDiff Implementation

This is one of the problems, especially with large packages: you need to recompress them just to verify the signature. I think this is still unsolved.


#12 2012-08-24 13:37:54

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: PkgDiff Implementation

It is fairly CPU-intensive... The delta management stuff provided in pacman (at least in git...) puts a limit on the maximum delta size relative to the package size and a minimum on the absolute size difference, so there should be some benefit gained when they are used.


Edit: the other issue is that it will increase our repo size by ~70%. I'd prefer more packages instead :P
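The two limits mentioned above (a relative cap on delta size and an absolute minimum saving) can be sketched as a simple filter applied when a delta is generated. The thread does not give the actual values, so `MAX_DELTA_RATIO` and `MIN_SAVING_BYTES` below are hypothetical placeholders, and `worth_publishing` is an illustrative name, not a pacman function.

```python
# Hypothetical thresholds: the thread does not state the actual limits.
MAX_DELTA_RATIO = 0.5           # delta may be at most half the new package
MIN_SAVING_BYTES = 1024 * 1024  # and must save at least 1 MiB outright


def worth_publishing(delta_size: int, pkg_size: int,
                     max_ratio: float = MAX_DELTA_RATIO,
                     min_saving: int = MIN_SAVING_BYTES) -> bool:
    """Return True if a freshly generated delta is worth keeping in the repo."""
    small_enough = delta_size <= max_ratio * pkg_size       # relative limit
    saves_enough = pkg_size - delta_size >= min_saving      # absolute limit
    return small_enough and saves_enough


print(worth_publishing(1_800_000, 297_000_000))  # True
print(worth_publishing(100_000, 600_000))        # False: saves under 1 MiB
```

The absolute limit is what keeps small packages out: even a tiny delta for a sub-megabyte package saves too few bytes to justify the extra repo space.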


#13 2012-08-24 22:50:42

Awebb
Member
Registered: 2010-05-06
Posts: 6,275

Re: PkgDiff Implementation

Now I imagine there won't be a delta for every possible package combination. If there is a gap of, say, three versions between your version and the latest, you would have to download four packages, verify them and install them. I imagine this could be even more wasted bandwidth in the long run.


#14 2012-08-25 01:36:21

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: PkgDiff Implementation

pacman does a calculation and just downloads the full package if it does not save you 30% bandwidth (by default).
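A minimal sketch of that client-side decision, assuming it compares the total size of all deltas in the chain against the full package. A ratio of 0.7 corresponds to the "must save 30%" default mentioned above; whether pacman uses `<` or `<=` at the boundary is not stated here, and `should_use_deltas` is a hypothetical name. It also addresses the concern about long chains: several deltas that each look small can still add up past the threshold.

```python
def should_use_deltas(delta_sizes, pkg_size, max_ratio=0.7):
    """Return True if the chain of deltas is worth downloading.

    With max_ratio=0.7 the deltas must save at least 30% compared with the
    full package; otherwise the full package is downloaded instead.
    """
    return sum(delta_sizes) <= max_ratio * pkg_size


# One small delta for a large package: use it.
print(should_use_deltas([1.8], 297.1))       # True
# A chain of three medium deltas adds up to more than 70% of the package.
print(should_use_deltas([30, 25, 20], 100))  # False
```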


#15 2012-08-25 02:17:47

AaronBP
Member
Registered: 2012-08-06
Posts: 149
Website

Re: PkgDiff Implementation

Allan wrote:

It is fairly CPU-intensive... The delta management stuff provided in pacman (at least in git...) puts a limit on the maximum delta size relative to the package size and a minimum on the absolute size difference, so there should be some benefit gained when they are used.


Edit: the other issue is that it will increase our repo size by ~70%. I'd prefer more packages instead :P

Would it be worthwhile to limit yourself to the largest packages? Say packages larger than 50 or 100M?
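A sketch of the size-based selection being suggested, assuming the repo scripts simply compare each package's size against a fixed cut-off. The wesnoth-data size is taken from the mirror listing quoted later in this thread; the other package names and sizes are hypothetical.

```python
SIZE_THRESHOLD_M = 50  # suggested cut-off (50 or 100M)


def delta_candidates(packages, threshold_m=SIZE_THRESHOLD_M):
    """Given (name, size_in_M) pairs, return names large enough to get deltas."""
    return [name for name, size in packages if size > threshold_m]


repo = [("wesnoth-data", 297.1), ("small-lib", 0.4), ("medium-app", 60.0)]
print(delta_candidates(repo))       # ['wesnoth-data', 'medium-app']
print(delta_candidates(repo, 100))  # ['wesnoth-data']
```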


#16 2012-08-25 12:21:29

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: PkgDiff Implementation

AaronBP wrote:
Allan wrote:

It is fairly CPU-intensive... The delta management stuff provided in pacman (at least in git...) puts a limit on the maximum delta size relative to the package size and a minimum on the absolute size difference, so there should be some benefit gained when they are used.


Edit: the other issue is that it will increase our repo size by ~70%. I'd prefer more packages instead :P

Would it be worthwhile to limit yourself to the largest packages? Say packages larger than 50 or 100M?

What do you mean by 'worthwhile'? It's a trade-off and only you can tell if it makes sense for you.
If you are on a slow or capped Internet connection, you may want to give it a go.


#17 2012-08-25 16:14:28

AaronBP
Member
Registered: 2012-08-06
Posts: 149
Website

Re: PkgDiff Implementation

karol wrote:
AaronBP wrote:
Allan wrote:

It is fairly CPU-intensive... The delta management stuff provided in pacman (at least in git...) puts a limit on the maximum delta size relative to the package size and a minimum on the absolute size difference, so there should be some benefit gained when they are used.


Edit: the other issue is that it will increase our repo size by ~70%. I'd prefer more packages instead :P

Would it be worthwhile to limit yourself to the largest packages? Say packages larger than 50 or 100M?

What do you mean by 'worthwhile'? It's a trade-off and only you can tell if it makes sense for you.
If you are on a slow or capped Internet connection, you may want to give it a go.

Worthwhile for the mirrors.


#18 2012-08-25 18:33:58

Mr.Elendig
#archlinux@freenode channel op
From: The intertubes
Registered: 2004-11-07
Posts: 4,092

Re: PkgDiff Implementation

Deltas would also probably be somewhat more useful on fixed/hybrid release distros, where there aren't as many variants in package versions, i.e. you don't have 100 users with 10 different versions of some package all wanting to upgrade to a single version.

Last edited by Mr.Elendig (2012-08-25 18:34:19)


Evil #archlinux@libera.chat channel op and general support dude.
. files on github, Screenshots, Random pics and the rest


#19 2012-08-26 19:59:09

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: PkgDiff Implementation

Before changing his mind, Xyne wrote:
Mr.Elendig wrote:

I.e. you don't have 100 users with 10 different versions of some package all wanting to upgrade to a single version.

You don't really have to support them.  Usually a package does not receive more than 1 update per week (in the non-testing repos at least), which is ample time for most users to upgrade. When a new package is uploaded, the release scripts would only need to generate deltas between it and the current release. All other combinations could be ignored. The added albeit unreliable bonus of this approach is that ARM would likely track those deltas and allow latecomers to chain them together, although there would obviously be a point where too many deltas would be needed to make it worthwhile.

The only real problem with this approach would be closely timed package release bumps, e.g. foo-2-2 (c) released the same day as foo-2-1 (b), which is an upgrade of foo-1-1 (a). The release of b would have generated a delta for a -> b (ab). Users who have already upgraded to b that day will want b -> c (bc) while those who haven't will want a -> c (ac).

Of course, generating ac in that case would require the server to still have a in the cache, which it presumably wouldn't, since it would have been replaced in the repos by b.


Actually, now that I'm thinking about it, you could simply implement a policy of generating deltas between the current version and the new version for each uploaded package. The deltas could be retained for x days and then purged.

Benefits:

  • Most users will update within x days so the saved bandwidth should offset the bandwidth cost of mirroring the deltas.

  • Deltas will be purged after x days, so at any given point there will only be a small percentage of possible deltas. Normally this should not require much storage space. Even when it does, it will only require it for x days.

  • Keeping deltas for x days will avoid the issues described above for rapid releases. Chainable deltas will persist together. Even if they don't yield any benefits over a normal download, the release scripts would not need to take them into consideration, which eliminates script complexity, i.e. the script does not need to be aware of release history.

The release scripts could even check how much bandwidth the delta would save and delete it immediately if it is not considered enough to be worth it.
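A sketch of the retention policy proposed above: keep each delta for x days, and discard any delta that does not save enough bandwidth. The value of x (`keep_days=7`), the minimum saving (`min_saving=0.3`), the `Delta` record and the package names are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Delta:
    name: str
    created: datetime
    size: int      # bytes
    pkg_size: int  # size of the full target package, bytes


def prune(deltas, now, keep_days=7, min_saving=0.3):
    """Keep only deltas younger than keep_days that save at least min_saving."""
    cutoff = now - timedelta(days=keep_days)
    return [d for d in deltas
            if d.created >= cutoff and d.size <= (1 - min_saving) * d.pkg_size]


now = datetime(2012, 8, 26)
deltas = [
    Delta("foo-1-1_to_2-1", now - timedelta(days=2), 10, 100),   # fresh, small
    Delta("bar-1-1_to_1-2", now - timedelta(days=30), 10, 100),  # too old
    Delta("baz-3-1_to_3-2", now - timedelta(days=1), 90, 100),   # saves too little
]
print([d.name for d in prune(deltas, now)])  # ['foo-1-1_to_2-1']
```

Note that the pruning rule never needs to know the release history: it looks only at each delta's age and size, which is the simplification argued for above.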

Finally, if server CPU and memory overhead is an issue (hopefully not, given current package update intervals), pkgtools could be modified to generate the deltas locally on the submitter's computer (along with threshold checks to determine if the delta is worth uploading). Signature files are already detected, so deltas shouldn't require much more effort to upload.

edit
Follow-up question: does Pacman support signed deltas?

Last edited by Xyne (2012-08-26 20:06:56)



#20 2012-08-28 03:36:05

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: PkgDiff Implementation

Xyne wrote:

Follow-up question: does Pacman support signed deltas?

The deltas are not signed, but the reconstituted package is...
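Because only the reconstituted package is signed, the rebuilt file has to be byte-for-byte identical to what the packager signed. This ties in with the earlier point about recompressing packages just to verify the signature. The sketch below uses a SHA-256 digest as a stand-in for the real PGP signature check, and a hypothetical payload. It shows that identical uncompressed content recompressed with different xz settings no longer verifies.

```python
import hashlib
import lzma

# The packager compresses the package and signs it; a SHA-256 digest stands in
# for the PGP signature check here.
payload = b"usr/bin/foo contents\n" * 1000      # hypothetical uncompressed tarball
signed_pkg = lzma.compress(payload, preset=6)
trusted_digest = hashlib.sha256(signed_pkg).hexdigest()


def verify_rebuilt(rebuilt: bytes) -> bool:
    """Accept a reconstructed package only if it is byte-identical to the signed one."""
    return hashlib.sha256(rebuilt).hexdigest() == trusted_digest


# Same content, same compressor settings: the digest matches.
print(verify_rebuilt(lzma.compress(payload, preset=6)))  # True
# Same content, different settings: the bytes differ, so verification fails.
print(verify_rebuilt(lzma.compress(payload, preset=0)))  # False
```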


#21 2012-08-29 18:51:57

AaronBP
Member
Registered: 2012-08-06
Posts: 149
Website

Re: PkgDiff Implementation

AaronBP wrote:
karol wrote:
AaronBP wrote:

Would it be worthwhile to limit yourself to the largest packages? Say packages larger than 50 or 100M?

What do you mean by 'worthwhile'? It's a trade-off and only you can tell if it makes sense for you.
If you are on a slow or capped Internet connection, you may want to give it a go.

Worthwhile for the mirrors.

Oh! Here's a great example!

How many assets in wesnoth-data actually changed? Probably not a whole lot, right? It's probably more worthwhile to generate deltas for large packages with just a few files changed than every single package, many of which are already <1M.


#22 2012-08-29 18:58:52

progandy
Member
Registered: 2012-05-17
Posts: 5,184

Re: PkgDiff Implementation

AaronBP wrote:

Oh! Here's a great example!

How many assets in wesnoth-data actually changed? Probably not a whole lot, right? It's probably more worthwhile to generate deltas for large packages with just a few files changed than every single package, many of which are already <1M.

Yes, it does make a difference. Why don't you just use the delta mirror at archlinux.fr mentioned earlier?

wesnoth-data-1.10.3-1_to_1.10.4-1-any.delta	2012-Aug-29 15:22:17	1.8M	application/octet-stream
wesnoth-data-1.10.4-1-any.pkg.tar.xz	2012-Aug-29 10:00:36	297.1M	application/octet-stream
wesnoth-data-1.10.4-1-any.pkg.tar.xz.sig	2012-Aug-29 10:00:36	0.5K	application/pgp-signature
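Working out the saving from the two sizes in this listing (1.8M delta versus the 297.1M full package):

```python
delta_size = 1.8    # M, wesnoth-data 1.10.3-1 -> 1.10.4-1 delta
full_size = 297.1   # M, full wesnoth-data-1.10.4-1 package

saving = 1 - delta_size / full_size
print(f"{saving:.1%}")  # 99.4%
```

So for this update the delta is well under 1% of the full download.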

| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |


#23 2012-08-29 21:31:40

AaronBP
Member
Registered: 2012-08-06
Posts: 149
Website

Re: PkgDiff Implementation

progandy wrote:
AaronBP wrote:

Oh! Here's a great example!

How many assets in wesnoth-data actually changed? Probably not a whole lot, right? It's probably more worthwhile to generate deltas for large packages with just a few files changed than every single package, many of which are already <1M.

Yes, it does make a difference. Why don't you just use the delta mirror at archlinux.fr mentioned earlier?

wesnoth-data-1.10.3-1_to_1.10.4-1-any.delta	2012-Aug-29 15:22:17	1.8M	application/octet-stream
wesnoth-data-1.10.4-1-any.pkg.tar.xz	2012-Aug-29 10:00:36	297.1M	application/octet-stream
wesnoth-data-1.10.4-1-any.pkg.tar.xz.sig	2012-Aug-29 10:00:36	0.5K	application/pgp-signature

Sure. It's not much of an issue for end users, though, except maybe if you have dial-up or satellite. It seems like it could save a good bit of bandwidth for the servers, without being as much work or taking up as much storage space as trying to maintain deltas for every package. I'm not sure if it's a big deal either way.


#24 2012-08-31 03:27:33

TheSaint
Member
From: my computer
Registered: 2007-08-19
Posts: 1,523

Re: PkgDiff Implementation

I don't know much about how deltas work and the rest of it, but here is my idea:

  • each package database contains a list of every file, along with the last x versions of each file

  • the repository should be structured like a Linux system, divided into directories just like a real system.
    Every directory contains the compressed files, each named with its own version, for (say) the last x versions

  • pacman should compute the difference between the local system and the repository and request only the files that differ

  • pacman should then download them in parallel to reduce the lag between packages

  • it then reconstructs each file in its proper place locally, backing up the old one if needed.
    Downgrading would then just be a matter of putting the old version back in its previous place.

This approach breaks packages down into their smallest components and transfers only the parts that have actually changed. It would also let users download only their own language instead of all the unnecessary ones.

The only drawback I can see is the repository layout, which would grow somewhat larger than one that simply holds packages (packages that may contain parts 80% of users will never look at).
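The core of this proposal is a per-file comparison between what is installed and what the repository offers. Here is a minimal sketch, assuming each side can produce a manifest mapping file paths to content digests. The manifest format, the `plan_update` name and the file paths are hypothetical; this is not an existing pacman feature.

```python
import hashlib


def manifest(files: dict) -> dict:
    """Map each path to the SHA-256 digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def plan_update(local: dict, remote: dict):
    """Compare two manifests; return (files to download, files to delete)."""
    download = sorted(p for p, digest in remote.items() if local.get(p) != digest)
    delete = sorted(p for p in local if p not in remote)
    return download, delete


installed = manifest({"usr/bin/foo": b"v1", "usr/share/foo/de.mo": b"de",
                      "usr/share/foo/old.txt": b"x"})
repo = manifest({"usr/bin/foo": b"v2", "usr/share/foo/de.mo": b"de"})
print(plan_update(installed, repo))
# (['usr/bin/foo'], ['usr/share/foo/old.txt'])
```

Unchanged files (here the `de.mo` translation) are never transferred, which is where the bandwidth saving would come from.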


do it good first, it will be faster than do it twice the saint ;)


#25 2012-08-31 03:37:05

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: PkgDiff Implementation

Some other nifty ideas: http://nixos.org/nix/
