Hello!
I'd like to request a new feature for the update process: pkgdiff. It computes the difference between the installed package and the new one in the repository and downloads only that difference. Thus the amount of downloaded data is drastically decreased.
AFAIK, this is already implemented in Fedora. It would be brilliant to add this to Arch too.
You mean like:
man pkgdelta
See pacman.conf to enable it...
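For anyone looking for it, the knob lives in the [options] section of pacman.conf. A minimal sketch; note that older pacman releases take a bare `UseDelta` while newer ones accept an optional ratio (e.g. `UseDelta = 0.7`), and either way it only helps if the configured mirror actually serves .delta files:

```ini
[options]
# Try package deltas before full downloads.  Has no effect unless the
# mirror in use actually provides .delta files alongside the packages.
UseDelta = 0.7
```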
Enabling will do nothing though... Arch does not supply deltas.
Can pacman chain together deltas between pairs of sequentially released packages (e.g. use "a_to_b" and "b_to_c" to generate "c" from "a"), or does it need the delta between the existing and the target package ("a_to_c" in the previous example)?
My Arch Linux Stuff • Forum Etiquette • Community Ethos - Arch is not for everyone
It can chain. But if a_to_c is there, it will use it preferentially.
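A rough sketch of how that chaining could be resolved; `pick_delta_chain` is a hypothetical helper for illustration, not pacman's actual code:

```python
from collections import deque

def pick_delta_chain(deltas, have, want):
    """Pick a chain of deltas leading from version `have` to `want`.

    `deltas` is a set of (from_version, to_version) pairs, one per
    available .delta file.  A direct delta is preferred; otherwise a
    breadth-first search finds the shortest chain, matching the
    behaviour described above.
    """
    if (have, want) in deltas:                # direct a_to_c wins
        return [(have, want)]
    queue = deque([[have]])                   # BFS over version paths
    seen = {have}
    while queue:
        path = queue.popleft()
        for (src, dst) in deltas:
            if src == path[-1] and dst not in seen:
                seen.add(dst)
                step = path + [dst]
                if dst == want:
                    return list(zip(step, step[1:]))
                queue.append(step)
    return None                               # no chain: fetch the full package

# a_to_b and b_to_c chain into c; a direct a_to_c would be preferred instead
chain = pick_delta_chain({("a", "b"), ("b", "c")}, "a", "c")
```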
Enabling will do nothing though... Arch does not supply deltas.
Then why is there an option for it? Is it a planned feature?
Allan wrote:Enabling will do nothing though... Arch does not supply deltas.
Then why is there an option for it? Is it a planned feature?
Arch official packages don't use it, but you can provide your own mirror that uses deltas, like archdelta did: https://bbs.archlinux.org/viewtopic.php?id=92085
Currently only one is listed (I haven't tried it):
* https://wiki.archlinux.org/index.php/Deltup
* https://wiki.archlinux.org/index.php/Mirrors#France
http://delta.archlinux.fr/ - With Delta package support. Needs xdelta3 package from extra to run.
We already use efficient xz compression (and may switch to e.g. lrzip); see https://bbs.archlinux.org/viewtopic.php … 24#p747024
Last edited by karol (2012-08-24 10:07:11)
Allan wrote:Enabling will do nothing though... Arch does not supply deltas.
Then why is there an option for it? Is it a planned feature?
The first release of pacman that had delta support was 2008-01-10... Arch devs are just too lazy to use it.
Cheers karol.
Thanks Allan.
How CPU intensive is applying the deltas? I've seen the same technology in action in Fedora and, quite frankly, I personally prefer wasting bandwidth. It's less time consuming.
This is one of the problems, especially with large packages: you need to recompress the reconstructed package just to verify the signature. I think this is still unsolved.
It is fairly CPU intensive.... The delta management stuff provided in pacman (at least in git...) puts limits on the maximum delta size compared to the package size and a minimum on the absolute size difference, so there should be some benefit gained when they are used.
Edit: the other issue is that it would increase our repo size by ~70%. I'd prefer more packages instead.
Now I imagine there won't be a delta for every possible package combination. If there is a gap of, say, three versions between your version and the latest, you would have to download four packages, verify them and install them. I imagine this could be even more wasted bandwidth in the long run.
pacman does a calculation and just downloads the full package if it does not save you 30% bandwidth (by default).
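That decision rule can be sketched in a few lines; `worth_downloading_delta` is a hypothetical helper illustrating the ratio check, not pacman source:

```python
def worth_downloading_delta(delta_size, package_size, ratio=0.7):
    """Decide whether fetching the delta beats fetching the full package.

    Mirrors the rule described above: the delta is only used if it is
    smaller than `ratio` times the full package, i.e. it saves at least
    30% bandwidth by default.
    """
    return delta_size < ratio * package_size

# a 1.8 MB delta against a 297.1 MB package is clearly worth it
worth_downloading_delta(1.8, 297.1)   # True
# a delta nearly as large as the package is not
worth_downloading_delta(80.0, 100.0)  # False
```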
It is fairly CPU intensive.... The delta management stuff provided in pacman (at least in git...) puts limits on the maximum delta size compared to the package size and a minimum on the absolute size difference, so there should be some benefit gained when they are used.
Edit: the other issue is that it will increase our repo size by ~70%. I'd prefer more packages instead
Would it be worthwhile to limit yourself to the largest packages? Say packages larger than 50 or 100M?
Allan wrote:It is fairly CPU intensive.... The delta management stuff provided in pacman (at least in git...) puts limits on the maximum delta size compared to the package size and a minimum on the absolute size difference, so there should be some benefit gained when they are used.
Edit: the other issue is that it will increase our repo size by ~70%. I'd prefer more packages instead
Would it be worthwhile to limit yourself to the largest packages? Say packages larger than 50 or 100M?
What do you mean by 'worthwhile'? It's a trade-off and only you can tell if it makes sense for you.
If you are using a slow or capped Internet, you may want to give it a go.
AaronBP wrote:Allan wrote:It is fairly CPU intensive.... The delta management stuff provided in pacman (at least in git...) puts limits on the maximum delta size compared to the package size and a minimum on the absolute size difference, so there should be some benefit gained when they are used.
Edit: the other issue is that it will increase our repo size by ~70%. I'd prefer more packages instead
Would it be worthwhile to limit yourself to the largest packages? Say packages larger than 50 or 100M?
What do you mean by 'worthwhile'? It's a trade-off and only you can tell if it makes sense for you.
If you are using a slow or capped Internet, you may want to give it a go.
Worthwhile for the mirrors.
Deltas would also probably be somewhat more useful on fixed/hybrid release distros, where you don't have as many variants in package versions, i.e. you don't have 100 users with 10 different versions of some package wanting to upgrade to any single version.
Last edited by Mr.Elendig (2012-08-25 18:34:19)
Mr.Elendig wrote:i.e. you don't have 100 users with 10 different versions of some package wanting to upgrade to any single version.
You don't really have to support them. Usually a package does not receive more than 1 update per week (in the non-testing repos at least), which is ample time for most users to upgrade. When a new package is uploaded, the release scripts would only need to generate deltas between it and the current release. All other combinations could be ignored. The added albeit unreliable bonus of this approach is that ARM would likely track those deltas and allow latecomers to chain them together, although there would obviously be a point where too many deltas would be needed to make it worthwhile.
The only real problem with this approach would be closely timed package release bumps, e.g. foo-2-2 (c) released the same day as foo-2-1 (b), which is an upgrade of foo-1-1 (a). The release of b would have generated a delta for a -> b (ab). Users who have already upgraded to b that day will want b -> c (bc) while those who haven't will want a -> c (ac).
Of course, generating ac in that case would require the server to still have a in the cache, which it presumably wouldn't, due to it having been replaced in the repos by b.
Actually, now that I'm thinking about it, you could simply implement a policy of generating deltas between the current version and the new version for each uploaded package. The deltas could be retained for x days and then purged.
Benefits:
Most users will update within x days so the saved bandwidth should offset the bandwidth cost of mirroring the deltas.
Deltas will be purged after x days, so at any given point there will only be a small percentage of possible deltas. Normally this should not require much storage space. Even when it does, it will only require it for x days.
Keeping deltas for x days will avoid the issues described above for rapid releases. Chainable deltas will persist together. Even if they don't yield any benefits over a normal download, the release scripts would not need to take them into consideration, which eliminates script complexity, i.e. the script does not need to be aware of release history.
The release scripts could even check how much bandwidth the delta would save and delete it immediately if it is not considered enough to be worth it.
Finally, if server CPU and memory overhead is an issue (hopefully not, given current package update intervals), pkgtools could be modified to generate the deltas locally on the submitter's computer (along with threshold checks to determine if the delta is worth uploading). Signature files are already detected, so deltas shouldn't require much more effort to upload.
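A minimal sketch of the proposed release-script policy, assuming a retention window of x = 14 days and the 30% savings threshold mentioned earlier; all names here (`on_package_upload`, `make_delta`, etc.) are hypothetical illustrations of the workflow, not existing tooling:

```python
import time

RETENTION_DAYS = 14     # the "x days" window; an assumption, not real policy
MIN_SAVINGS = 0.30      # discard deltas that save less than 30% bandwidth

def on_package_upload(repo, name, new_version, new_size, make_delta):
    """Generate a delta from the current release to the new one.

    `repo` maps package name -> (current_version, size).  `make_delta`
    stands in for xdelta3 and returns the delta's size.  Deltas that
    don't clear the savings threshold are dropped immediately.
    """
    deltas = []
    if name in repo:
        old_version, _ = repo[name]
        delta_size = make_delta(name, old_version, new_version)
        if delta_size <= (1 - MIN_SAVINGS) * new_size:   # worth keeping?
            deltas.append({"from": old_version, "to": new_version,
                           "size": delta_size, "born": time.time()})
    repo[name] = (new_version, new_size)                 # only current -> new
    return deltas

def purge_old(deltas, now=None):
    """Drop deltas older than the retention window."""
    now = time.time() if now is None else now
    cutoff = now - RETENTION_DAYS * 86400
    return [d for d in deltas if d["born"] >= cutoff]
```

Because only the current-to-new delta is ever generated, the script never needs release history, exactly as argued above; chaining across the window falls out for free.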
edit
Follow-up question: does Pacman support signed deltas?
Last edited by Xyne (2012-08-26 20:06:56)
Follow-up question: does Pacman support signed deltas?
The deltas are not signed, but the reconstituted package is...
karol wrote:AaronBP wrote:Would it be worthwhile to limit yourself to the largest packages? Say packages larger than 50 or 100M?
What do you mean by 'worthwhile'? It's a trade-off and only you can tell if it makes sense for you.
If you are using a slow or capped Internet, you may want to give it a go.Worthwhile for the mirrors.
How many assets in wesnoth-data actually changed? Probably not a whole lot, right? It's probably more worthwhile to generate deltas for large packages with just a few files changed than every single package, many of which are already <1M.
How many assets in wesnoth-data actually changed? Probably not a whole lot, right? It's probably more worthwhile to generate deltas for large packages with just a few files changed than every single package, many of which are already <1M.
Yes it does make a difference. Why don't you just use the already mentioned delta-mirror at archlinux.fr?
wesnoth-data-1.10.3-1_to_1.10.4-1-any.delta 2012-Aug-29 15:22:17 1.8M application/octet-stream
wesnoth-data-1.10.4-1-any.pkg.tar.xz 2012-Aug-29 10:00:36 297.1M application/octet-stream
wesnoth-data-1.10.4-1-any.pkg.tar.xz.sig 2012-Aug-29 10:00:36 0.5K application/pgp-signature
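The savings in that listing are easy to quantify:

```python
delta_size = 1.8      # MB, the wesnoth-data 1.10.3-1 -> 1.10.4-1 delta above
full_size = 297.1     # MB, the full wesnoth-data package above

# fraction of bandwidth saved by fetching the delta instead of the package
savings = 1 - delta_size / full_size
print(f"delta is {100 * savings:.1f}% smaller than the full download")
```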
AaronBP wrote:How many assets in wesnoth-data actually changed? Probably not a whole lot, right? It's probably more worthwhile to generate deltas for large packages with just a few files changed than every single package, many of which are already <1M.
Yes it does make a difference. Why don't you just use the already mentioned delta-mirror at archlinux.fr?
wesnoth-data-1.10.3-1_to_1.10.4-1-any.delta 2012-Aug-29 15:22:17 1.8M application/octet-stream
wesnoth-data-1.10.4-1-any.pkg.tar.xz 2012-Aug-29 10:00:36 297.1M application/octet-stream
wesnoth-data-1.10.4-1-any.pkg.tar.xz.sig 2012-Aug-29 10:00:36 0.5K application/pgp-signature
Sure. It's not much of an issue for end users, though, except maybe if you have dial-up or satellite. It seems like it could save a good bit of bandwidth for the servers, without being as much work or taking up as much storage space as trying to maintain deltas for every package. I'm not sure if it's a big deal either way.
I don't know much about how the deltas work and all of the rest.
My idea:
each package database contains a file list, with (say) the last x versions of every file
the repository should be structured like a Linux system and divided into directories like a real system.
every directory would contain a list of compressed files, each named with its own version, for (say) x versions
pacman would compute the difference from local to repository and request only the files that differ
pacman would then download them in parallel in order to reduce the lag between packages
later it would reconstruct each file in its own position locally, backing up the respective old one just in case.
Downgrading would then be a matter of restoring the old versions to their previous places.
This mode splits packages down into their smallest components and transfers only those parts that have really changed. Furthermore, one would also have the chance to download only one's own language, without fetching all the unnecessary ones.
The only negative aspect that I can see is the repository design, which would grow somewhat more than one simply containing packages, which might contain parts that 80% of users will never look at.
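The core of that per-file sync idea can be sketched as a manifest diff; the manifest format and `files_to_fetch` helper are hypothetical, invented just to illustrate the proposal:

```python
def files_to_fetch(local_manifest, remote_manifest):
    """Return the paths that would need downloading.

    Each manifest maps file path -> version string.  Only files whose
    version differs, or which are new on the remote side, are fetched;
    everything else stays in place locally.
    """
    return sorted(path for path, ver in remote_manifest.items()
                  if local_manifest.get(path) != ver)

local = {"/usr/bin/foo": "1", "/usr/share/foo/data": "1"}
remote = {"/usr/bin/foo": "2", "/usr/share/foo/data": "1",
          "/usr/share/locale/de/foo.mo": "2"}
files_to_fetch(local, remote)  # only the binary and the new locale file
```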
Some other nifty ideas: http://nixos.org/nix/