
#76 2010-05-23 16:32:51

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

>> today's sync was about 300mb
> that means 300mb was downloaded today in order to sync the mirror from yesterday
<facepalm> I somehow misread 'sync' as 'download'. Need more coffee.
That would mean 5 mins of cpu time to create the deltas assuming the speed of 1MB of xz-compressed per second. It would take a while to download the packages and upload the deltas: I have 1840 kbps/240 kbps here.
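A rough sketch of that arithmetic, assuming 1 MB = 8000 kilobits and no protocol overhead (both are simplifications, and the helper names are made up for illustration):

```shell
#!/bin/sh
# encode_seconds SIZE_MB RATE_MB_PER_SEC
# CPU time in whole seconds at a fixed delta-encoding rate.
encode_seconds() {
    echo $(( $1 / $2 ))
}

# transfer_seconds SIZE_MB RATE_KBPS
# Transfer time in whole seconds over a link of RATE_KBPS kilobits per second.
transfer_seconds() {
    echo $(( $1 * 8000 / $2 ))
}

echo "encode 300 MB at 1 MB/s:      $(encode_seconds 300 1) s"
echo "download 300 MB at 1840 kbps: $(transfer_seconds 300 1840) s"
```

The first figure is the 5 minutes mentioned above; the second comes out at roughly 22 minutes. The upload time depends on how big the deltas turn out to be, so it is left out here.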

> they could even register an email address or something so they could be notified that packages they depend on may be removed soon
I don't get it - if I need sth, I grab it *now* (I don't have quotas on my broadband). I'm not saying no to an archive so I can grab old versions of packages I just started to use - a year ago I didn't know they existed, now I know and love them but the last two versions have some nasty bugs so I need older ones. Yes, I know I can roll my own but ARM could help avoid that ultimate move.

> upstream is irrelevant here i think, because we care about the state of the packages as they appear in Arch repos
I disagree. ARM has
xdelta3-3.0w-1-i686.pkg.tar.gz
xdelta3-3.0y-1-i686.pkg.tar.xz
but I need an older version - where can I easily get it? If you keep the last 4 versions - even if they are years old - that could help in such situations. Remember that no one can force an unpaid open source dev to fix the bugs asap. Storage is cheap and with xdeltas you need less of it at the cost of added complexity: keep the xz packages or deltas, remove packages older than 6 months but exclude from the purging the last four versions etc.
Yes, that would mean that ARM becomes a bit of an Arch-ive(tm)
I'm not sure how it fits into scripted / pacman based downgrading: can I still point to '2009/11/01/core/os/i686' and get all then-current packages?

> there's no point in storing deltas for the kernel when it only achieves 1mb or 2mb saving

-rw-r--r-- 1 22M May 23 00:39 kernel26-2.6.33.3-1-i686.pkg.tar.xz
-rw-r--r-- 1 22M May 23 00:37 kernel26-2.6.33.3-2-i686.pkg.tar.xz
-rw-r--r-- 1 22M May 23 00:38 kernel26-2.6.33.4-1-i686.pkg.tar.xz

xdelta3 -e -9 -f -S djw -s kernel26-2.6.33.3-1-i686.pkg.tar.xz kernel26-2.6.33.3-2-i686.pkg.tar.xz 1.xd3
time: 43 sec
size: 6.3 MB

xdelta3 -e -9 -f -s kernel26-2.6.33.3-1-i686.pkg.tar.xz kernel26-2.6.33.3-2-i686.pkg.tar.xz 1.xd3
time: 40 sec
size: 6.9 MB

xdelta3 -e -1 -f -S djw -s kernel26-2.6.33.3-1-i686.pkg.tar.xz kernel26-2.6.33.3-2-i686.pkg.tar.xz 1.xd3
time: 25 sec
size: 6.9 MB

xdelta3 -e -1 -f -s kernel26-2.6.33.3-1-i686.pkg.tar.xz kernel26-2.6.33.3-2-i686.pkg.tar.xz 1.xd3
time: 23 sec
size: 7.5 MB

The second and third commands produce deltas of the same size, but with lower compression you can do it much faster. Dropping the '-S djw' option saves you about 10% of the time, but the delta comes out about 10% larger.

xz-compressed kernels take up 22 MB, the deltas 7 MB.


I'll post more later.

Offline

#77 2010-05-23 17:10:22

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

Backward xdeltas

xdelta3 -e -9 -f -S djw -s kernel26-2.6.33.4-1-i686.pkg.tar.xz kernel26-2.6.33.3-2-i686.pkg.tar.xz 1.xd3
time:  48 sec
size:  7.6 MB

xdelta3 -e -9 -f -s kernel26-2.6.33.4-1-i686.pkg.tar.xz kernel26-2.6.33.3-2-i686.pkg.tar.xz 1.xd3
time:  47 sec
size:  8.4 MB

xdelta3 -e -1 -f -S djw -s kernel26-2.6.33.4-1-i686.pkg.tar.xz kernel26-2.6.33.3-2-i686.pkg.tar.xz 1.xd3
time:  28 sec
size:  8.6 MB

xdelta3 -e -1 -f -s kernel26-2.6.33.4-1-i686.pkg.tar.xz kernel26-2.6.33.3-2-i686.pkg.tar.xz 1.xd3
time:  26 sec
size: 9.5 MB

Forward xdeltas

xdelta3 -e -9 -f -S djw -s kernel26-2.6.33.3-2-i686.pkg.tar.xz kernel26-2.6.33.4-1-i686.pkg.tar.xz 1.xd3
time: 42 sec
size: 5.7 MB

xdelta3 -e -9 -f -s kernel26-2.6.33.3-2-i686.pkg.tar.xz kernel26-2.6.33.4-1-i686.pkg.tar.xz 1.xd3
time: 40 sec
size: 6.3 MB

xdelta3 -e -1 -f -S djw -s kernel26-2.6.33.3-2-i686.pkg.tar.xz kernel26-2.6.33.4-1-i686.pkg.tar.xz 1.xd3
time: 25 sec
size: 6.8 MB

xdelta3 -e -1 -f -s kernel26-2.6.33.3-2-i686.pkg.tar.xz kernel26-2.6.33.4-1-i686.pkg.tar.xz 1.xd3
time: 24 sec
size: 7.4 MB

Seems that my assumption that forward xdeltas take the same time and space as backward xdeltas doesn't hold :-(


xdelta3 -e -1 -S djw -s kernel26-2.6.33.4-1-i686.pkg.tar.xz kernel26-2.6.33.3-2-i686.pkg.tar.xz E.xd3
time: 27 sec
size: 8.6 MB

xdelta3 -e -1 -S djw -s kernel26-2.6.33.3-2-i686.pkg.tar.xz kernel26-2.6.33.3-1-i686.pkg.tar.xz D.xd3
time: 24 sec
size: 5.3 MB

xdelta3 -e -1 -S djw -s kernel26-2.6.33.3-1-i686.pkg.tar.xz kernel26-2.6.33.2-1-i686.pkg.tar.xz C.xd3
time: 44 sec
size: 26 MB

xdelta3 -e -1 -S djw -s kernel26-2.6.33.2-1-i686.pkg.tar.xz kernel26-2.6.32.10-1-i686.pkg.tar.xz B.xd3
time: 44 sec
size: 27 MB

xdelta3 -e -1 -S djw -s kernel26-2.6.32.10-1-i686.pkg.tar.xz kernel26-2.6.32.9-1-i686.pkg.tar.xz A.xd3
time: 26 sec
size: 9 MB

If we 'skip' B.xd3 and C.xd3 and get a CB.xd3 it doesn't help much - it would be 27 MB big.
If on avg a delta is 1/n of the size of the xz-compressed package, getting n deltas is a chore from the user's perspective - the download is as big as the regular package and he still has to combine them via xdelta merge.
xz decompression takes about 10s in the above examples.


Coming up next: merging.
Update: Bad news - I get segmentation faults on every "reverse" delta merge and on half forward ones.
I've tried the current version of xdelta3, some previous ones - still no go.

Seems that xdelta is best suited for big packages with relatively minor changes between versions.
Updating is great, downgrading is good, but getting n versions back or forward (for n>1) is painful if not done in a single jump: otherwise you have to recreate every version until you get to the one you need.

Last edited by karol (2010-05-23 21:13:20)

Offline

#78 2010-05-24 07:27:02

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

> that means 300mb was downloaded today in order to sync the mirror from yesterday
<facepalm> I somehow misread 'sync' as 'download'. Need more coffee.
That would mean 5 mins of cpu time to create the deltas assuming the speed of 1MB of xz-compressed per second. It would take a while to download the packages and upload the deltas: I have 1840 kbps/240 kbps here.

if this works i don't mind doing it. since the repo is never going to be delta-only there is no urgency for the deltas to be uploaded. i have 16mbps/768kbps and in comparison to your times i can do them twice as fast.

> upstream is irrelevant here i think, because we care about the state of the packages as they appear in Arch repos
> I disagree. ARM has
> xdelta3-3.0w-1-i686.pkg.tar.gz
> xdelta3-3.0y-1-i686.pkg.tar.xz
> but I need an older version - where can I easily get it? If you keep the last 4 versions - even if they are years old - that could help in such situations. Remember that no one can force an unpaid open source dev to fix the bugs asap. Storage is cheap and with xdeltas you need less of it at the cost of added complexity: keep the xz packages or deltas, remove packages older than 6 months but exclude from the purging the last four versions etc.
> Yes, that would mean that ARM becomes a bit of an Arch-ive(tm)
> I'm not sure how it fits into scripted / pacman based downgrading: can I still point to '2009/11/01/core/os/i686' and get all then-current packages?

to put this into context: xdelta3-3.0w-1-i686.pkg.tar.gz appeared in the ARM on 2009/11/19 and was last snapped on 2010/03/07. the current policy(?) is to have a rough 6-8 months retention for the repos + maybe 1 month for packages only. i'll just use max 8 months and say 2009/11 and 2010/03 to make it simpler.

in this case the repo 2010/03 (containing the last reference) is purged in 2010/11. when put into perspective that's actually 1 year (2009/11 - 2010/11), which is completely out of scope, so if you want packages that old you'll just have to download them. you'd know which packages they are anyway. the idea of registering to keep them a little longer, or for a reminder, is for the case where you accidentally delete one. i know i do it all the time: i totally forget that i have a pkg on hold and go ahead and delete it, then upgrade some time later to see if a pkg fixed a bug and notice i now have to re-download it. of course i could just hardlink it somewhere else but you get the idea.

> there's no point in storing deltas for the kernel when it only achieves 1mb or 2mb saving

> -rw-r--r-- 1 22M May 23 00:39 kernel26-2.6.33.3-1-i686.pkg.tar.xz
> -rw-r--r-- 1 22M May 23 00:37 kernel26-2.6.33.3-2-i686.pkg.tar.xz
> -rw-r--r-- 1 22M May 23 00:38 kernel26-2.6.33.4-1-i686.pkg.tar.xz
>
> xdelta3 -e -9 -f -S djw -s kernel26-2.6.33.3-1-i686.pkg.tar.xz kernel26-2.6.33.3-2-i686.pkg.tar.xz 1.xd3
> time: 43 sec
> size: 6.3 MB

i think you uncovered this scenario in your second post where the delta was almost the same size as the pkg. in these cases there really is no point in using the delta, it just makes the user wait even longer while not really lessening the download time.

> Coming up next: merging.
> Update: Bad news - I get segmentation faults on every "reverse" delta merge and on half forward ones.
> I've tried the current version of xdelta3, some previous ones - still no go.
>
> Seems that xdelta is best suited for big packages with relatively minor changes between versions.
> Updating is great, downgrading is good, but getting n versions back or forward (for n>1) is painful if not done in a single jump: otherwise you have to recreate every version until you get to the one you need.

i'm not sure what's going on here. have you tried it through gdb? maybe there is a bug in xdelta3

Offline

#79 2010-05-24 12:34:25

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

> there is no urgency for the deltas to be uploaded
Sure, you can sync as usual and build xdeltas in the meantime. When they're ready, upload them and once a month remove the pkg.tar.xz packages if possible.
The scripts have to accept either pkg.tar.xz or xdeltas.
More work for the server - not only sync every day & purge once in a blue moon but also get xdeltas (you upload them to the server) and remove pkg.tar.xz.

htop 0.8.3-1
Build Date:      2009-09-22 00:16:11 UTC

It's the only htop in ARM, will it get purged even if there's no newer version? If I understand correctly it won't if it's in the current Arch repo at the time of the purge. If a new version arrives by that time, it will get removed. Would it be possible to keep the package on the server available through the search box but not through 2009/11/19 snapshot until there are four or five htop packages? I'm not sure whether this would work for bigger packages with many dependencies.


As for the merging - it definitely is experimental. I can't really say whether it works - often it sure does not.

[karol@black ddelta]$ time xdelta3 merge -f -m B.xd3 A.xd3 AB.xd3
xdelta3: out of memory: Cannot allocate memory
xdelta3: further input required: Cannot allocate memory

vlc doesn't want to play either:

[karol@black ddelta]$ time xdelta3 merge -f -m 112.xd3 123.xd3 113.xd3
Segmentation fault

real    0m2.659s
user    0m2.077s
sys    0m0.477s
[karol@black ddelta]$ time xdelta3 merge -f -m 123.xd3 112.xd3 113.xd3
*** glibc detected *** xdelta3: munmap_chunk(): invalid pointer: 0x0949f0b8 ***
======= Backtrace: =========
/lib/libc.so.6(+0x6c4f1)[0xb77c24f1]
/lib/libc.so.6(+0x6cc9e)[0xb77c2c9e]
xdelta3[0x8052829]
xdelta3[0x8053284]
xdelta3[0x8069708]
xdelta3[0x806c0ca]
xdelta3[0x806e106]
/lib/libc.so.6(__libc_start_main+0xe6)[0xb776cb96]
xdelta3[0x8048e11]
======= Memory map: ========
08048000-08073000 r-xp 00000000 08:03 177060     /usr/bin/xdelta3
08073000-08074000 rw-p 0002b000 08:03 177060     /usr/bin/xdelta3
08074000-0807e000 rw-p 00000000 00:00 0 
0926e000-0953c000 rw-p 00000000 00:00 0          [heap]
b31fe000-b51fb000 rw-p 00000000 00:00 0 
b54a8000-b64a5000 rw-p 00000000 00:00 0 
b6b7a000-b6b95000 r-xp 00000000 08:03 8295       /usr/lib/libgcc_s.so.1
b6b95000-b6b96000 rw-p 0001a000 08:03 8295       /usr/lib/libgcc_s.so.1
b6ba5000-b6e9f000 rw-p 00000000 00:00 0 
b749b000-b7756000 rw-p 00000000 00:00 0 
b7756000-b789b000 r-xp 00000000 08:03 196862     /lib/libc-2.11.1.so
b789b000-b789c000 ---p 00145000 08:03 196862     /lib/libc-2.11.1.so
b789c000-b789e000 r--p 00145000 08:03 196862     /lib/libc-2.11.1.so
b789e000-b789f000 rw-p 00147000 08:03 196862     /lib/libc-2.11.1.so
b789f000-b78a2000 rw-p 00000000 00:00 0 
b78a2000-b78c5000 r-xp 00000000 08:03 196832     /lib/libm-2.11.1.so
b78c5000-b78c6000 r--p 00022000 08:03 196832     /lib/libm-2.11.1.so
b78c6000-b78c7000 rw-p 00023000 08:03 196832     /lib/libm-2.11.1.so
b78d6000-b78d7000 rw-p 00000000 00:00 0 
b78d7000-b78d8000 r-xp 00000000 00:00 0          [vdso]
b78d8000-b78f4000 r-xp 00000000 08:03 196844     /lib/ld-2.11.1.so
b78f4000-b78f5000 r--p 0001b000 08:03 196844     /lib/ld-2.11.1.so
b78f5000-b78f6000 rw-p 0001c000 08:03 196844     /lib/ld-2.11.1.so
bfe2f000-bfe44000 rw-p 00000000 00:00 0          [stack]
Aborted

real    0m1.578s
user    0m1.157s
sys    0m0.300s
[karol@black ddelta]$

> have you tried it through gdb, maybe there is a bug in xdelta3
I'm a noob, but I'm good at copy-pasting commands, so if you tell me what to type I can have a go at it :-)
I'm going to post a bug report upstream if one isn't filed already.


> the delta was almost the same size of the pkg
There's no way to know beforehand whether this will happen. If it does often, there's no point in merging because by eliminating the big xdeltas you break the chain: keeping just A, B and E won't help much - we need C and D too.
We can try keeping an old version of the package and deltas wrt that version. This way the user can get the version he wants using simple xdelta decoding. The problem is, he has to download two files: the old version of the xz-compressed package and the xdelta of the version he wants. How do you script that - I don't know. Using (pkg).xd3 instead of pkg.tar.xz for xdeltas could help.
Unfortunately the xdeltas grow bigger as the distance between the packages increases.
kernel26-2.6.32.10-1-i686.pkg.tar -> kernel26-2.6.33.4-1-i686.pkg.tar is 29 MB.
BTW, how would you name that delta: kernel26-2.6.32.10-1--2.6.33.4-1-i686.pkg.tar?

I think we will keep kernel as an xz-compressed package and fiddle with some other, vlc responds nicely to xdelta treatment.
The deltas are created wrt vlc-1.0.5-6-i686.pkg.tar.

-rw-r--r-- 1 1.1M May 23 23:37 12.xd3
-rw-r--r-- 1 3.6M May 23 23:37 13.xd3
-rw-r--r-- 1 3.7M May 23 23:37 14.xd3
-rw-r--r-- 1 3.6M May 23 23:38 15.xd3
-rw-r--r-- 1  32M May 23 23:21 vlc-1.0.5-6-i686.pkg.tar
-rw-r--r-- 1  32M May 23 22:57 vlc-1.0.6-1-i686.pkg.tar
-rw-r--r-- 1  32M May 22 17:54 vlc-1.0.6-2-i686.pkg.tar
-rw-r--r-- 1  32M May 22 17:54 vlc-1.0.6-3-i686.pkg.tar
-rw-r--r-- 1  32M May 23 22:57 vlc-1.0.6-4-i686.pkg.tar

When I was working on pkg.tar archives (for both vlc and the kernel) applying the deltas (xdelta decoding) took 1-2 seconds.
When I was working with pkg.tar.xz kernel packages, recreating the packages took 3-4 minutes!
Creating a forward xdelta for the *uncompressed* kernel package took 8 seconds, a backward one - 10 seconds for the small xdeltas and about 3 times as much for the big ones (the twenty-something MB ones). Memory usage: about 120 MB.

Offline

#80 2010-05-24 13:01:00

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

1. Needs more testing, sth from community this time.
2. I'll try to tweak the memory switches - maybe we can get it to work on the server.
2a. Use 'xdelta3 -e -9 -f -S djw -s' if you want small deltas, 'xdelta3 -e -1 -f -S djw -s'  if you want to create them fast(er).
3. Xdelta works great for next / prev versions. For getting 5 or more versions back you'll be better off with a regular xz-compressed package. Keeping every fifth version as a base one for xdeltas may be a sane compromise.
4. There are forward deltas and backward deltas. A is an older version than B. AB.xd3 is a forward delta, BA.xd3 is a backward one.
5. There are base deltas and chain deltas. AB.xd3 is both at the same time. AC.xd3 is a base delta wrt version A, BC.xd3 is a chain delta. Works for backward deltas too: DA.xd3 is a base delta wrt D, ED.xd3 is a chain delta.

6. If we decide to use forward chain deltas, we can use those provided by sabooky. He uses forward chain deltas with a nice naming convention: dhcpcd-5.1.3-1_to_5.1.4-1-i686.delta
Let's say we keep every fourth package so (hopefully) the deltas don't get too big:
foo-A.pkg.tar.xz
foo-AB.xd3
foo-BC.xd3
foo-CD.xd3
foo-E.pkg.tar.xz
...

Using foo-A.pkg.tar.xz and foo-AB.xd3 will give you foo-B.pkg.tar.xz. The user has to download 2 files. If he needs foo-D.pkg.tar.xz he has to download 4 files which may be together about twice as big as foo-D.pkg.tar.xz.
If foo-A.pkg.tar.xz gets removed from ARM, all the deltas up to the next regular package are worthless.
Keeping at least two regular packages makes sure there are at least five versions of the package under current assumptions of keeping one in four versions as a regular xz-compressed package, not a delta ;P

Pros:
- less space needed on the server
- some of the work may be done by somebody else (sabooky?)
- ARM is still a regular mirror
http://bbs.archlinux.org/viewtopic.php? … 27#p763027
ARM is not an archival service in the strictest sense. It serves one purpose, to aid in  downgrade. It must appear to be just another mirror.

Cons:
- more bandwidth used
- added complexity
- more time needed to install an old version of the package (but you can turn off recompression using -R switch)
- the deltas have to be created offline and uploaded to the server because creating them uses too much cpu & mem

Questions:
- what to do w/ rogue deltas i.e. very big ones? They can be bigger than the regular package, but if we use chain deltas we need to keep them :-(
We can complicate things more - instead of:
foo-A.pkg.tar.xz
foo-AB.xd3
foo-BC.xd3 # too big
foo-CD.xd3 # too big
foo-E.pkg.tar.xz
foo-EF.xd3
foo-FG.xd3
...

we can switch to:
foo-A.pkg.tar.xz
foo-AB.xd3
foo-C.pkg.tar.xz
foo-D.pkg.tar.xz
foo-DE.xd3 # assuming it's small
foo-EF.xd3
foo-FG.xd3
foo-H.pkg.tar.xz
...

=============================
7. We can also use forward base deltas:
foo-A.pkg.tar.xz
foo-AB.xd3
foo-AC.xd3
foo-AD.xd3
foo-E.pkg.tar.xz
...

Pros:
- less space needed on the server, but more than with forward chain deltas
- with base deltas users download not more than two files (with chain deltas it could be more than two)

Cons:
- added complexity
- we have to do all the work ourselves
- the deltas have to be created offline and uploaded to the server because creating them uses too much cpu & mem
- more bandwidth used, but less than with forward chain deltas (on avg)
- more time needed to install an old version of the package, but less than with forward chain deltas (on avg)
- I think that when an unusually big xdelta occurs we're screwed, because it means that there are big changes in that package, so in many cases all the following packages will produce big deltas until the next full package (f.e. foo-E.pkg.tar.xz) acts as the base one

Questions:
- can ARM be used as a regular mirror?


I think forward chain deltas are much better than forward base deltas.

Last edited by karol (2010-05-24 15:44:54)

Offline

#81 2010-05-24 15:47:32

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

8. We can use backward base deltas ... although I think it's the worst possible combination. ARM syncs as usual, provides different versions of packages:
foo-A.pkg.tar.xz
foo-B.pkg.tar.xz
foo-C.pkg.tar.xz
foo-D.pkg.tar.xz
foo-E.pkg.tar.xz
...

Once in a while we create deltas and remove the full packages:
foo-A.pkg.tar.xz
foo-EB.xd3
foo-EC.xd3
foo-ED.xd3
foo-E.pkg.tar.xz
...

Pros:
- same as forward base deltas but:
- we can remove packages one by one instead of "batches" because the base package is newer than the deltas it was used to compute

Cons:
- same as forward base deltas but:
- we have to wait until foo-E.pkg.tar.xz comes along, defeating the space-saving purpose of deltas

Questions:
- can ARM be used as a regular mirror? It will be a regular mirror up until foo-E.pkg.tar.xz comes and we create deltas and remove the full packages

Last edited by karol (2010-05-24 15:48:41)

Offline

#82 2010-05-24 16:32:24

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

9. Last but not least, we can use backward chain deltas:
foo-A.pkg.tar.xz
foo-CB.xd3
foo-DC.xd3
foo-ED.xd3
foo-E.pkg.tar.xz

When foo-F.pkg.tar.xz comes, we create another delta - foo-FE.xd3 - and remove foo-E.pkg.tar.xz.


Pros:
- less space needed on the server
- probably the most bandwidth-efficient choice: if we get it right we can actually save some bandwidth
- ARM is still a regular mirror - foo-E.pkg.tar.xz is a regular package, foo-ED.xd3 is the delta for the previous version. Downgrading from foo-E to foo-D is simple: 'xdelta3 -d -s foo-E.pkg.tar.xz foo-ED.xd3 && pacman -U foo-D.pkg.tar.xz'
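
Going more than one step back along a backward chain means applying the deltas one after another. A dry-run sketch of that walk, using the foo-ED.xd3 naming from this post; the helper name is hypothetical and it only prints the commands instead of running them:

```shell
#!/bin/sh
# chain_commands PKG NEWEST OLDER...
# Prints one "xdelta3 -d" call per step of a backward delta chain.
# Each step uses the package rebuilt by the previous step as its source.
chain_commands() {
    pkg=$1; shift
    src=$1; shift
    for tgt in "$@"; do
        echo "xdelta3 -d -s $pkg-$src.pkg.tar.xz $pkg-$src$tgt.xd3 $pkg-$tgt.pkg.tar.xz"
        src=$tgt
    done
}

# Going from foo-E back to foo-B takes three decode steps.
chain_commands foo E D C B
```

Every intermediate package (foo-D, foo-C) has to be recreated on the way, which is why long jumps get expensive.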

Cons:
- added complexity
- we have to provide the deltas ourselves
- more time needed to install an old version of the package (but you can turn off recompression using -R switch)
- the deltas have to be created offline and uploaded to the server because creating them uses too much cpu & mem

Questions:
- what to do w/ rogue deltas i.e. very big ones? (see forward chain deltas)



Did I miss sth or get sth wrong?

Offline

#83 2010-05-24 18:28:45

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

Some "old" stuff with new comments

http://bbs.archlinux.org/viewtopic.php? … 27#p763027

kumyco wrote:

we don't care about the upgrade scenario, A - B. We only care about the downgrade scenario, B - A.

Backward chain deltas seem the right thing. They work even if the user doesn't have the latest version of the needed package.

I'll look into doing both forward and backward deltas in one step - that would be nice.
Update: Nope, it's not possible atm: http://code.google.com/p/xdelta/issues/detail?id=62

Wrt full downgrade over a long period - dear user, you're on your own; ARM may provide you with some packages / deltas but that's that.
I think keeping dated repos for 8 months and packages for a year would be great.


Here's sabooky's post again
http://bbs.archlinux.org/viewtopic.php? … 17#p724617

sabooky wrote:

Here's HD usage locally on my box:
# du -hcs repos/*/os/i686
7.3G    repos/community/os/i686
270M    repos/core/os/i686
11G    repos/extra/os/i686
18G    total

# du -hcs repos/*/os/i686/deltas
217M    repos/community/os/i686/deltas
49M    repos/core/os/i686/deltas
2.2G    repos/extra/os/i686/deltas
2.5G    total

217MB of community deltas is simply weird and it seems some packages are missing from the delta repo he provides:
nexuiz-data-2.5.2-1-..> 01-Oct-2009 12:51  842M
sauerbraten-2009_05_..> 23-May-2009 21:45  333M
flightgear-data-2.0...> 27-Feb-2010 14:00  303M
...

Last edited by karol (2010-05-24 18:35:04)

Offline

#84 2010-05-24 21:37:56

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

> It's the only htop in ARM, will it get purged even if there's no newer version? If I understand correctly it won't if it's in the current Arch repo at the time of the purge. If a new version arrives by that time, it will get removed.

that htop is in today's snapshot http://arm.konnichi.com/2010/05/24/extr … pkg.tar.gz so it will exist roughly 8 months after today. the countdown actually starts from the last day it's seen, so if it's updated tomorrow then the countdown starts tomorrow. i know it sounds strange: the ARM is a set of *separate* mirrors (each of them a snapshot from a single day). so when one mirror (for a single day) gets purged all its files are gone, but none of the others are affected, and the pkg is still in the master mirror until i remove it. the master holds the last reference to every pkg, so if i remove all the dated repos every file still exists because they are also in the master.

> Would it be possible to keep the package on the server available through the search box but not through 2009/11/19 snapshot until there are four or five htop packages? I'm not sure whether this would work for bigger packages with many dependencies.

the ARM is a strange beast. the search is not connected to the dated repos, they are connected to the master only.

> xdelta3: further input required: Cannot allocate memory

you can run a command through gdb by doing

 gdb xdelta3

then when it starts

run merge -f -m B.xd3 A.xd3 AB.xd3

then when/if it crashes

bt

and it'll tell you where it crashed and maybe some hint as to why.
from the logs i can tell it's most likely just running out of memory as it tries to allocate large buffers.

> There's no way to know beforehand whether this will happen. If it does often, there's no point in merging because by eliminating the big xdeltas you break the chain: keeping just A, B and E won't help much - we need C and D too.

this is only an issue for us, so we can simply decide before upload what's worth uploading and what's not. if the delta is almost as big as the pkg then we might as well just discard it. it's fine if the chain breaks, it just means that anyone who wants that pkg will have to download the full pkg, but the delta would have been almost as big anyway and then they would have to wait again once it downloaded, so it's just not worth it.

> BTW, how would you name that delta: kernel26-2.6.32.10-1--2.6.33.4-1-i686.pkg.tar?

preferably anything that is close to how pacman understands it. if we can't use pacman and must instead handle it ourselves then anything that's easy to parse, so probably not using more hyphens(-) or dots(.) but maybe something else like underscores or plus or colon. and maybe a .delta extension to make it clear.

> When I was working on pkg.tar archives (for both vlc and the kernel) applying the deltas (xdelta decoding) took 1-2 seconds.
> When I was working with pkg.tar.xz kernel packages, recreating the packages took 3-4 minutes!

if that behaviour is consistent throughout, then we can use uncompressed pkgs instead, or maybe just for the big pkgs. if we're to use our own tool to handle it then we can cater for both cases easily.
-----
I think we can simplify by only looking at the common use-case: i just upgraded to pkg-5 but there is a bug. i most likely still have pkg-5, and in most cases you'd have upgraded from pkg-4, so making deltas only between consecutive pkgs might not be such a bad idea. if i understand correctly there might be a chance of creating pkg-2 from pkg-5 and a couple of deltas so that should be fine. we can always calculate the download size to see which is smaller, deltas or pkg-2. so 2-1 3-2 4-3 5-4


> Questions:
> - what to do w/ rogue deltas i.e. very big ones? They can be bigger than the regular package, but if we use chain deltas we need to keep them :-(

if, like in the case of the kernel, the delta between versions is almost as big as the pkg, just don't bother with the delta. there can be a limit, say it must be at least 3-4mb smaller than the pkg, otherwise just discard it.
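
A minimal sketch of such a limit, assuming sizes are given in whole kilobytes; the helper name and the 3000 KB margin are only illustrative, the sizes are the kernel figures posted earlier in the thread:

```shell
#!/bin/sh
# keep_delta PKG_KB DELTA_KB MIN_SAVING_KB
# Succeeds (exit status 0) only if the delta is at least MIN_SAVING_KB
# smaller than the full package.
keep_delta() {
    [ $(( $1 - $2 )) -ge "$3" ]
}

# 22 MB kernel package vs. a 6.3 MB delta: worth uploading.
if keep_delta 22000 6300 3000; then echo "keep 1.xd3"; fi

# 22 MB kernel package vs. the 26 MB C.xd3: not worth it.
if ! keep_delta 22000 26000 3000; then echo "discard C.xd3"; fi
```

The check would run offline, before upload, so a discarded delta never reaches the server.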


> Questions:
> - can ARM be used as a regular mirror?

yes ARM is a set of mirrors, point your mirror in pacman.d/mirrorlist to Server = http://arm.konnichi.com/2010/03/30/$repo/os/i686 or for the master(always the latest) Server = http://arm.konnichi.com/$repo/os/i686
-------

the pkgs will never be removed so deltas can simply be added on top of it.
so b-a c-b d-c should be fine. the user will have one pkg that matches one of those if they are available, so it should be fine. no need to worry about going from d to a.

> - what to do w/ rogue deltas i.e. very big ones? (see forward chain deltas)

just discard them and if we can tell which deltas we'll need then we can calculate the size and if it's almost as big or bigger than the pkg then  just download the pkg.
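
On the download side the comparison could look roughly like this. It is only a sketch: the helper name is made up, sizes are whole kilobytes, and "almost as big" is simplified to a plain less-than test:

```shell
#!/bin/sh
# cheaper_download PKG_KB DELTA_KB...
# Prints "deltas" if the needed deltas together are smaller than the
# full package, otherwise "package".
cheaper_download() {
    pkg=$1; shift
    total=0
    for d in "$@"; do
        total=$(( total + d ))
    done
    if [ "$total" -lt "$pkg" ]; then
        echo deltas
    else
        echo package
    fi
}

cheaper_download 22000 8600 5300          # E.xd3 + D.xd3 vs. a 22 MB kernel
cheaper_download 22000 8600 5300 26000    # adding C.xd3 tips it over
```

With the kernel sizes from earlier posts, two steps back are still cheaper via deltas, while three steps back already exceed the full package.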


> Wrt full downgrade over a long period - dear user, you're on your own; ARM may provide you with some packages / deltas but that's that.
> I think keeping dated repos for 8 months and packages for a year would be great.

dated repos are purged after 8 months, the packages are still alive for maybe another month. but because of the way the ARM works, most pkgs also exist in other dated repos so the pkg itself stays alive even longer. e.g. htop will exist next year so that's over 1 year without explicit retention of 1 year.

-----

if i get time tomorrow i think i might draw up some diagrams of the ARM. it's simple but it's made up of independent parts that are strongly related to each other.

Offline

#85 2010-05-24 23:02:28

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

> the ARM is a strange beast. the search is not connected to the dated repos, they are connected to the master only.
> e.g htop it will exist next year so that's over 1 year without explicit retention of 1 year
So I will be able to get the packages even after the date repos are gone - that's what I asked for :-)

> can ARM be used as a regular mirror?
I wanted to know whether ARM would still be a mirror if we adopted -say- backward chain deltas.
It sure is a mirror _now_ but I haven't yet tried to downgrade anything via pacman + deltas.

I'm not worried about breaking the chained deltas, if the deltas get too big, we start the chain anew, but this is getting dynamic, I see many many variables and somebody will have to code this unless you want to do it by hand :-) You upload (or rather keep, because it's already synced) a full package and the next three versions are deltas *unless* they get too big and then you upload the regular packages and reset "delta-counter" to 0 - meaning it's a full package. Or you can count down and when it hits '0' you don't create a delta but upload the full package.
You can set different "delta-profiles" for different packages: you treat the kernel this way, and you compress vlc more and so on.
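
A sketch of the "delta-counter" idea, under stated assumptions: each entry is a made-up "version:pkg_kb:delta_kb" triple, where delta_kb is the size of the delta against the previous version, and the sizes below are illustrative:

```shell
#!/bin/sh
# plan_chain MAX_DELTAS MIN_SAVING_KB ENTRY...
# For each version prints "full" or "delta". A version is kept as a full
# package when MAX_DELTAS deltas have been stored since the last full
# package, or when its delta saves less than MIN_SAVING_KB.
plan_chain() {
    max=$1; margin=$2; shift 2
    count=$max            # forces the first version to be a full package
    for entry in "$@"; do
        ver=${entry%%:*}
        rest=${entry#*:}
        pkg=${rest%%:*}
        delta=${rest#*:}
        if [ "$count" -ge "$max" ] || [ $(( delta + margin )) -gt "$pkg" ]; then
            echo "$ver full"
            count=0       # reset the delta-counter
        else
            echo "$ver delta"
            count=$(( count + 1 ))
        fi
    done
}

plan_chain 3 3000 A:22000:0 B:22000:5300 C:22000:26000 \
    D:22000:6300 E:22000:5700 F:22000:6000 G:22000:6000
```

Here C becomes a full package because its delta is too big, and G becomes one because three deltas (D, E, F) have piled up since C. Per-package "delta-profiles" would then just be different MAX_DELTAS and MIN_SAVING_KB values.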

Do we use deltas for packages smaller than 3-4M? You probably will get 1M delta instead of 3M package.
I don't know what you can do on the server, but maybe creating deltas of small packages will be fine? The worst part would be xz-decompression.

How about dhcpcd-5.1.3-1_to_5.1.4-1-i686.delta for the name?
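
Building such a name from two package file names could be sketched like this, assuming the usual name-version-release-arch.pkg.tar.* layout; the helper name is hypothetical:

```shell
#!/bin/sh
# delta_name OLD_PKG_FILE NEW_PKG_FILE
# Prints NAME-OLDVER-OLDREL_to_NEWVER-NEWREL-ARCH.delta.
# Fields are split from the right, so hyphens inside the package name
# (e.g. nexuiz-data) are left alone.
delta_name() {
    old=${1%.pkg.tar.*}; new=${2%.pkg.tar.*}
    arch=${old##*-};   old=${old%-*}
    oldrel=${old##*-}; old=${old%-*}
    oldver=${old##*-}; name=${old%-*}
    new=${new%-*}                      # drop the arch field
    newrel=${new##*-}; new=${new%-*}
    newver=${new##*-}
    echo "$name-$oldver-${oldrel}_to_$newver-$newrel-$arch.delta"
}

delta_name dhcpcd-5.1.3-1-i686.pkg.tar.xz dhcpcd-5.1.4-1-i686.pkg.tar.xz
```

The single "_to_" separator keeps the name easy to split again without adding more hyphens or dots.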


Twiddling the memory switches
The memory switches work nicely and you can limit memory (but not cpu) consumption to next to nothing, but then the size difference between the delta and the original file is also minimal.
Maxing out all memory settings doesn't buy you much - if anything.
Compression: '-9' takes longer and needs more memory but that might not be an issue while doing deltas offline.
I haven't yet tested this extensively, but kernel xdeltas got smaller, the process used less memory (80M v. 120M) and was faster when changing '-W'

xdelta3 -e -1 -S djw -f -W 1048576 -s

The '-W' switch is set to 8M by default. By setting it to 1M I got the best results.

Offline

#86 2010-05-25 00:26:02

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

Actually, why can't ARM be delta-only? We replace the packages by deltas (assuming they're small enough) and just keep the newest full package (for the forward chain deltas). Yeah, some deltas will be too big but other than that?

full package
               delta
full package
               delta
full package
               delta
full package
               ...

Offline

#87 2010-05-25 18:49:17

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

> can ARM be used as a regular mirror?
> I wanted to know whether ARM would still be a mirror if we adopted -say- backward chain deltas.
> It sure is a mirror _now_ but I haven't yet tried to downgrade anything via pacman + deltas.

it's a mirror now and it will always be a mirror for its entire existence on arm.konnichi.com - deltas are just an additional service, just like /search, /find, the dated repos and the master repo: they are separate, they just happen to be related.

> I'm not worried about breaking the chained deltas, if the deltas get too big, we start the chain anew, but this is getting dynamic, I see many many variables and somebody will have to code this unless you want to do it by hand :-) You upload (or rather keep, because it's already synced) a full package and the next three versions are deltas *unless* they get too big and then you upload the regular packages and reset "delta-counter" to 0 - meaning it's a full package. Or you can count down and when it hits '0' you don't create a delta but upload the full package.
> You can set different "delta-profiles" for different packages: you treat the kernel this way, and you compress vlc more and so on.

what's wrong with b-a c-b d-c e-d? we can be sure the user will have one of the packages in that group, so we just find the appropriate single delta and use it; if it's not there then they'll just have to download the full pkg
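
A rough Python sketch of that lookup, just to show the idea. The function name and the set of (installed, wanted) version pairs are made up for illustration; the delta file name follows the naming proposed earlier in the thread.

```python
def pick_download(name, arch, installed, wanted, deltas):
    """Return the file to fetch for getting package `name` at version `wanted`.

    deltas:    set of (from_version, to_version) pairs that exist on the server.
    installed: version the user already has, or None if they have none.
    Only a single matching delta is used; there is no chaining.
    """
    if installed is not None and (installed, wanted) in deltas:
        return f"{name}-{installed}_to_{wanted}-{arch}.delta"
    # No suitable delta: fall back to the full package.
    return f"{name}-{wanted}-{arch}.pkg.tar.xz"
```

For example, with deltas = {("5.1.3-1", "5.1.4-1")} a user on dhcpcd 5.1.3-1 gets the delta, and anybody else gets the full package.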

> Do we use deltas for packages smaller than 3-4M? You probably will get 1M delta instead of 3M package.

if the delta is less than 2m then i think it's fine.

> I don't know what you can do on the server, but maybe creating deltas of small packages will be fine? The worst part would be xz-decompression.

nothing will be done on the server, the problem is cpu usage, so limiting memory doesn't solve anything

> How about dhcpcd-5.1.3-1_to_5.1.4-1-i686.delta for the name?

it's easy enough to parse
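
For instance, a minimal Python sketch of such a parser. It assumes that versions and architectures never contain a '-' while package names may (e.g. vlc-plugin); DeltaName and parse_delta_name are names invented for this example.

```python
from typing import NamedTuple


class DeltaName(NamedTuple):
    name: str
    old_version: str  # pkgver-pkgrel of the package the delta applies to
    new_version: str  # pkgver-pkgrel of the package the delta produces
    arch: str


def parse_delta_name(filename):
    """Split '<name>-<oldver>-<oldrel>_to_<newver>-<newrel>-<arch>.delta'."""
    suffix = ".delta"
    if not filename.endswith(suffix):
        raise ValueError(f"not a delta file: {filename!r}")
    stem = filename[: -len(suffix)]
    old_part, sep, new_part = stem.rpartition("_to_")
    if not sep:
        raise ValueError(f"missing '_to_' in {filename!r}")
    try:
        # Split from the right so hyphens in the package name survive.
        name, old_ver, old_rel = old_part.rsplit("-", 2)
        new_ver, new_rel, arch = new_part.split("-")
    except ValueError:
        raise ValueError(f"unexpected delta name: {filename!r}") from None
    return DeltaName(name, f"{old_ver}-{old_rel}", f"{new_ver}-{new_rel}", arch)
```

Splitting the left half from the right is what keeps a hyphenated name like vlc-plugin in one piece.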


> Twiddling the memory switches
> The memory switches work nicely and you can limit memory (but not cpu) consumption to next to nothing, but then the size difference between the delta and the original file is also minimal.
> Maxing out all memory settings doesn't buy you much - if anything.
> Compression: '-9' takes longer and needs more memory but that might not be an issue while doing deltas offline.
> I haven't yet tested this extensively, but kernel xdeltas got smaller, the process used less memory (80M v. 120M) and was faster when changing '-W'
>
> xdelta3 -e -1 -S djw -f -W 1048576 -s
>
> The '-W' switch is set to 8M by default. By setting it to 1M I got the best results.

i have the processing power and memory to handle that so no problem there.

---------

> Actually, why can't ARM be delta-only? We replace the packages with deltas (assuming they're small enough) and just keep the newest full package (for the forward chain deltas). Yeah, some deltas will be too big, but other than that?

that's just added complexity for no gain, so better to keep them separate by simply layering the deltas on top of it.

-------------------------
-------------------------

http://arm.konnichi.com/find - allows searching for packages based on files they contain, it's just a live test.
still need to revise the file format. its worst case is approx 4 seconds, vs a constant(?) 5-6 seconds for searching files.tar.gz directly, vs over 1 minute after cache for pkgfile.
I can make it faster still, so stay tuned. it doesn't yet give a download url but that's just additional data to add to the index.

Offline

#88 2010-05-25 19:17:56

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

There's a lot of packages >1M. Deltas will be even smaller but the question is: is it worth the hassle?

> i have the processing power and memory to handle that so no problem there.
So you may want to keep the default settings just in case and use '-6' or '-9' compression level.
Of course you may run a test to compare time/size effects w/ the default settings and with lowered '-W' & '-1' compression on a hundred packages :-)

> what's wrong with b-a c-b d-c e-d? we can be sure the user will have one of the packages in that group,
> so we just find the appropriate single delta and use it; if it's not there then they'll just have to download the full pkg
If you provide both deltas and full packages, that is. If somebody is interested in mirroring / leeching only deltas, will that be possible?

Correct me if I'm wrong, http://arm.konnichi.com/find is like 'pacman -Qo' but it works for all packages not only the ones I have installed, right?
'1 Result Found in 0.005ms' <- this is quite fast :-D

Offline

#89 2010-07-07 05:18:19

Duca
Member
Registered: 2009-01-12
Posts: 23

Re: Project ARM :: Arch Rollback Machine

Greetings kumyco,

I would like to contribute! The first thing that comes to my mind is that I have some space and a high-speed connection at the university lab I work at. The chief professor has already agreed to it. How much space does the entire repo require?

I could also help with scripting; I have some (not that much) experience in Python and Bash.

Offline

#90 2010-08-16 21:18:25

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

kumyco, have you tried using xdelta with cpulimit on the server?

sudo cpulimit -e xdelta3 -l 25

limits the allowed use of cpu to 25% for the first PID (lowest one?) of xdelta3 it finds. You can specify the PID using '-p' instead of '-e'.

Offline

#91 2010-08-29 08:36:28

Mohandas
Member
Registered: 2010-08-08
Posts: 4

Re: Project ARM :: Arch Rollback Machine

Something is wrong with vlc 1.1.3 and vlc-plugin 1.1.3. I get a 403 error when trying to download them ;)

Offline

#92 2010-08-29 13:45:30

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

Mohandas wrote:

Something is wrong with vlc 1.1.3 and vlc-plugin 1.1.3. I get a 403 error when trying to download them ;)

vlc-1.1.4-1-i686.pkg.tar.xz is not there either.

Same for
kernel26-2.6.35.2-1-i686.pkg.tar.xz
kernel26-2.6.35.3-1-i686.pkg.tar.xz

Last edited by karol (2010-08-29 13:46:57)

Offline

#93 2010-09-01 22:17:04

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

The last updates are from 16th August. I think the Ibiblio server stopped syncing, and ARM mirrors that server.
I've already contacted kumyco.

Offline

#94 2010-09-02 12:18:14

DisposaBoy
Member
Registered: 2010-09-02
Posts: 8

Re: Project ARM :: Arch Rollback Machine

fixed, for now. Arch repos started spamming the place with symlinks instead of actually moving the files, leaving broken symlinks everywhere, meaning the files appeared to be there when they were in fact not.

Offline

#95 2010-09-02 12:27:31

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

DisposaBoy wrote:

fixed, for now. Arch repos started spamming the place with symlinks instead of actually moving the files, leaving broken symlinks everywhere, meaning the files appeared to be there when they were in fact not.

OK, thanks, umm, DisposaBoy? :-)

I think the symlinks have been in use for some time; one reason was to symlink files from testing to e.g. core without redownloading them.

Offline

#96 2010-09-02 20:56:44

Mustard
Member
From: Noblesville, Indiana
Registered: 2010-03-02
Posts: 39
Website

Re: Project ARM :: Arch Rollback Machine

karol wrote:

OK, thanks, umm, DisposaBoy? :-)

I think he hangs out with Kick Ass and Red Mist.  A real superhero!

Offline

#97 2010-09-10 14:34:49

DisposaBoy
Member
Registered: 2010-09-02
Posts: 8

Re: Project ARM :: Arch Rollback Machine

Yes, I'm making a new start at life so I needed to become someone with a less sinister name (kumyco comes from kumiko = Suicide Club 2). Disposa-Boy is an episode of Oliver Beene, and at least methinks it's the coolest name ever.
But anyway....

I'm aware of the broken packages (403 errors) from August and September, maybe July as well. I checked and there aren't that many. I just haven't cleaned the symlinks because I'm planning to try and recover as many of them as possible, hopefully over the weekend, followed by purge 1 and plans for ARM3, and maybe start using the mailing list cos I'm out of touch with this forum thing these days.

Offline

#98 2010-10-14 16:28:18

steve___
Member
Registered: 2008-02-24
Posts: 452

Re: Project ARM :: Arch Rollback Machine

Where is the mailing list?

Offline

#99 2011-01-09 13:37:08

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

@DisposaBoy
Have you thought about the xdelta compression? There's a new stable release: Release 3.0.0 http://xdelta.org/

It (says it) fixes some bugs http://code.google.com/p/xdelta/issues/ … can=1&q=xz

Offline

#100 2011-01-26 01:38:04

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

karol wrote:

@DisposaBoy
Have you thought about the xdelta compression? There's a new stable release: Release 3.0.0 http://xdelta.org/

And sabooky says he needs somebody to take over archdelta.net.

Offline
