
#51 2009-11-26 10:11:42

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

I'll just write a general reply - to clarify things.

I have unlimited bandwidth and space, but as we know there is no such thing; it all depends on reasonable use.
I decided to start over because processing that much data was too much work for no benefit, IMHO.
Showing checksums and other pkg details in the search interface was overkill; I can't think of a need for it at all.
ARM1 kept track of what was available and when. ARM2 uses hardlinks instead, meaning fewer duplicates; actually, duplication should be eliminated by now due to the Arch mirror re-organization (the -any architecture) and my rsync setup (specifying all directories that could share the same package - [*testing] and the other repos).

I thought about a keep-latest-pkgver setup on purging: let's say retention was 3 months; when the 3 months were up, the latest pkgrel of the latest pkgver would be kept for up to a year or so. The problem with this is that the latest version is often the one with the problems, and I've had the latest pkgrel break as well. If any additional processing was to be done to keep, say, the 2 latest pkgrels etc., then you might as well just keep all of them. People don't submit bug reports, so there is no way to be sure that the latest pkg isn't in fact breaking something.
For other mirror layouts all the info should be available to allow it.
I personally don't think it's necessary but if you think it is, convince me.

For an idea of how the system (ARM2) works:
The repos are synced to the latest on ibiblio, but no packages are deleted; that's what http://arm.konnichi.com/core et al. are (the global repos).
Then a date-organised directory is created and the latest repo is hardlinked there - no duplication.
A list of the packages is generated, and from this the search index (it's literally just a list of "pkgname filename\n" lines); no other info is stored. During this, if purging is enabled, a stat is done to check the hardlink count and all packages that can be purged are deleted; if the hardlink count is 1 then the pkg is removed from the global repo.
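In shell terms the daily job is roughly this - an untested sketch, with made-up paths and mirror URL:

#!/bin/bash
today=$(date +%Y/%m/%d)
repos=(core extra testing community community-testing)
for repo in "${repos[@]}"; do
    # 1. sync the global repo; deletions are never propagated
    rsync -a rsync://mirror.example.org/archlinux/$repo/os/i686/ /srv/arm/$repo/os/i686/
    # 2. hardlink the current state into today's dated snapshot - no duplication on disk
    mkdir -p /srv/arm/$today/$repo/os/i686
    cp -al /srv/arm/$repo/os/i686/. /srv/arm/$today/$repo/os/i686/
    # 3. regenerate the index: literally "pkgname filename" lines
done
# purging (if enabled): drop the expired snapshot, then delete any package
# whose hardlink count fell to 1 - only the global repo still held it
rm -rf /srv/arm/2009/05/26    # example date
for repo in "${repos[@]}"; do
    find /srv/arm/$repo/os/i686 -name '*.pkg.tar.*' -links 1 -delete
done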

It really is that simple. Rationale:
If I have a broken system I prefer to revert the last upgrade; pacman -Suu is here now and it works in most cases (some version requirements may break it), so I didn't bother writing a script. There are other scripts that allow downgrading a single pkg; beyond that, the waters are murky, because any script that attempts a full rollback will likely break the system.
A simple way to allow this, if you want, is to make a note of the packages that were installed, including their versions, then write a script that will download and install them as necessary; there is a script from Ghost1227 that does this (I don't remember if it does it by version). Something like the sketch below.
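A sketch only - it glosses over which repo each package lives in, 'any'-arch packages and the exact file extension, and Ghost1227's script may work differently:

# record what's installed, with versions ("name version" per line)
pacman -Q > /root/pkglist-$(date +%F).txt

# later: turn each line into a package URL on a dated ARM snapshot and install
awk '{ printf "http://arm.konnichi.com/2009/11/26/core/os/i686/%s-%s-i686.pkg.tar.gz\n", $1, $2 }' \
    /root/pkglist-2009-11-26.txt | xargs pacman -U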

I don't plan to deviate from the date-based retention, which will last at least until shortly after GNOME 2.x.1 is released - GNOME because it usually (always?) gets released after KDE. There should be no problem with this, as only a particular dated repo is removed and many of the packages are the same in later repos.
This means that if you want to create a specific mirror type then all the packages and the necessary info should be available.

----------------------------------------------------

note: the domain is changing to the main arm.konnichi.com - so script-writers, please remember to update this in your next release. arm.kh.nu won't be gone for about a year, so no rush.

I will set up a mailing list so everyone can get updates more frequently and promptly.

Offline

#52 2009-11-26 12:35:03

DonVla
Member
From: Bonn, Germany
Registered: 2007-06-07
Posts: 997

Re: Project ARM :: Arch Rollback Machine

Hi kumyco,

it seems that the regex search still doesn't work.
http://arm.konnichi.com/search/index.ph … ommunity=1
does not show anything.

Offline

#53 2009-11-26 16:49:42

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

DonVla wrote:

Hi kumyco,

it seems that the regex search still doesn't work.
http://arm.konnichi.com/search/index.ph … ommunity=1
does not show anything.

Sorry, bad revert (of the indexing script). I took the time to make the pkg lists visible at http://arm.konnichi.com/lists/ for anyone who wants them.

Offline

#54 2010-05-07 16:21:35

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

Does anyone want the packages?

I can't be sure how much longer I'll be using Arch, and if I decide to shut the ARM down I don't want it to simply disappear like last time, when people complained that there was no warning.

Offline

#55 2010-05-07 16:46:36

DonVla
Member
From: Bonn, Germany
Registered: 2007-06-07
Posts: 997

Re: Project ARM :: Arch Rollback Machine

kumyco wrote:

Does anyone want the packages?

Yes! Definitely!

I can't be sure how much longer I'll be using Arch, and if I decide to shut the ARM down I don't want it to simply disappear like last time, when people complained that there was no warning.

ARM is a pretty cool project which simplifies Arch's downgrade issues.
Perhaps someone else is interested in maintaining it.

Offline

#56 2010-05-07 21:26:06

Gen2ly
Member
From: Sevierville, TN
Registered: 2009-03-06
Posts: 1,529
Website

Re: Project ARM :: Arch Rollback Machine

Yes, for me too; I had to use ARM a couple times when I had problems. Good of you to give advance notice - hopefully there is someone out there who can host them.


Setting Up a Scripting Environment | Proud donor to wikipedia - link

Offline

#57 2010-05-08 04:43:59

Stythys
Member
From: SF Bay Area
Registered: 2008-05-18
Posts: 878
Website

Re: Project ARM :: Arch Rollback Machine

kumyco wrote:

Does anyone want the packages?

I can't be sure how much longer I'll be using Arch and if I decide to shut the ARM down I don't want it to simply disappear like last time when people complained that there was no warning.

I'll host them :D - sending ya an email with more details.


[home page] -- [code / configs]

"Once you go Arch, you must remain there for life or else Allan will track you down and break you."
-- Bregol

Offline

#58 2010-05-08 05:57:14

DonVla
Member
From: Bonn, Germany
Registered: 2007-06-07
Posts: 997

Re: Project ARM :: Arch Rollback Machine

Woohoo! Thanks Stythys!

Offline

#59 2010-05-08 11:00:35

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

@Stythys Responding here cos' I'm lazy.
I'll let you know if I do leave Arch so we can work out a hand-over process.

I'm not leaving yet, and I do have some ideas though I'm not sure they're all ARM specific so I'll continue with that when I get time.
It's nice to know that this won't just die if I do leave, though.

P.S. It's nothing against Arch; I just feel I've reached a point of burn-out and am looking into ways of coping with it.

Offline

#60 2010-05-21 18:26:35

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

Ok, time to do something about the upgrade/rollback situation.
For months I've had the idea of how to handle this (sorry, I haven't done anything about it), but anyway...

The idea is: relying on a script or such to do this automatically is like relying on undefined behaviour in C, nuff said.
The proper way, IMO, is to record the packages that were installed before the update (and after).
With this information we can do a straight install of everything to roll back; we could even uninstall existing packages if needed.

The advantage of this approach is less worry about dependency breaks. As I've said many times, pacman can actually downgrade,
but there are sometimes dependency breaks, and I didn't write an automated tool to resolve them because it's unsafe.
Many of these breaks don't result in system breakage anyway; they are just conflicts.

The basic process is:


* record the installed packages and versions
* do the update
* maybe record new installed packages
      this allows applying that same update again, e.g. you wanna test some bug that caused the initial downgrade

for downgrading

* the user specifies a snapshot to install - yes, install
* we figure out all the deps and what we don't already have locally we download
* with all files sitting in the cache we instruct pacman to install all the files we want installed

At this point we pray that pacman doesn't have any dependency issues.
I've guessed that it'd be safe to force the install; pacman isn't gonna install anything if there are file conflicts anyway. A sketch of that step:
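Assuming a snapshot file of "name version" lines, and that arch and file extension are as guessed here:

cache=/var/cache/pacman/pkg
# build the full file list from the snapshot, then hand everything to
# pacman as a single transaction
pkgs=$(awk -v c="$cache" '{ printf "%s/%s-%s-i686.pkg.tar.xz ", c, $1, $2 }' snapshot.txt)
pacman -U $pkgs    # add -f/--force only if you accept the risk above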

That's the basic idea of that approach; how well it works I don't know.

Another idea I had - since pacman has issues replacing packages installed via pacman -U (soon to be fixed, in the next pacman release I think) -
would be to construct a repo db from the files we want to install; to roll back, we attach this to a copy of pacman.conf and use that to do an update -- pacman would see these files and downgrade some or all of them.

This might even be the nicest way to do it; that way pacman figures it all out for us.
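In pacman terms, something along these lines (untested; file names and paths are placeholders):

# build a one-off repo from exactly the files we want on the system
mkdir -p /tmp/rollback
cp /var/cache/pacman/pkg/foo-1.0-1-i686.pkg.tar.xz /tmp/rollback/   # etc.
repo-add /tmp/rollback/rollback.db.tar.gz /tmp/rollback/*.pkg.tar.xz

# in a copy of pacman.conf, list this repo before the real ones so its
# versions take priority:
#   [rollback]
#   Server = file:///tmp/rollback

pacman --config /etc/pacman-rollback.conf -Syuu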

---------------------------------------------------------------------------------------------

@all that have used the ARM, I'd like to hear about how you used it:
did you just downgrade a single package, or did you attempt a full rollback?
how far back have you downgraded? When Purging Day(tm) comes, I don't want to remove files people are likely to need,
so I was thinking of creating a service whereby certain packages could have references added to them, and those would be held
for, say, up to a month longer. This ties in with the tool: it'd add a reference when it does the snapshot, and if, say, you made a second update to any of those packages, then it decreases the reference.

Are there any requests for things that could make it better? Currently I'm researching ideas for efficiently packing the file lists so that you can also search for a package based on just a filename - which is done, but I have to compromise on one of space, time, or memory, none of which I want to do on the server.

Offline

#61 2010-05-21 20:17:44

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

I've successfully used up/downgrading from cache a couple times (back and forth - I have an old intel card):
* record the installed packages and versions
* do the update
* record new installed packages
etc.

Are btrfs snapshots easier?
* create snapshot
* update
* test
if OK, go on w/ your life, if not - revert to snapshot.
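Sth like this, I guess (untested; assumes / is a btrfs subvolume, names made up):

btrfs subvolume snapshot / /snap-pre-update   # 1. create snapshot
pacman -Syu                                   # 2. update
# 3. test; to revert, boot a live CD and make the snapshot the default
#    subvolume (exact steps depend on the layout):
#   btrfs subvolume set-default <snapshot-id> /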


I've used ARM
- to grab several versions of foo package and play with dar, xdelta3, bsdiff and lrzip
- to get a couple of old packages I needed -> manual downgrade

I have the copies I need, so as far as I'm concerned you can remove everything :-)

Offline

#62 2010-05-22 07:08:58

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

karol wrote:

I've successfully used up/downgrading from cache a couple times (back and forth - I have an old intel card):
[...]
Are btrfs snapshots easier?
[...]

I've used ARM
- to grab several versions of foo package and play with dar, xdelta3, bsdiff and lrzip
- to get a couple of old packages I needed -> manual downgrade

I have the copies I need, so as far as I'm concerned you can remove everything :-)

With regard to btrfs, I don't know how easy it is, but like LVM it serves a slightly different purpose and requires setting up. It is, however, better, so if you are up to doing that I think you should. The rollback here is focused on catering for broken packages, not necessarily a broken system.
So if you upgrade and, say, a new lib was released and some older app starts crashing, you just roll everything back. With btrfs you could restore everything as it was.

Offline

#63 2010-05-22 07:48:19

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

> if you are up to doing that i think you should
I think I'll wait a bit.

> with btrfs you could restore everything as it was.
Would that also mean restoring old data from the /home partition? I hope not; if it's on a separate partition it shouldn't be affected.


How much space / transfer does ARM need atm? How much does it cost?

Offline

#64 2010-05-22 08:22:45

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

This is mostly assumptions, so take it with a grain of salt...
You would have control over what is rolled back, so AFAIK you could roll back a specific directory such as /usr or something along those lines. Only /etc, /home and maybe /usr/local are typically touched by users, so there should be few problems with blindly reverting changes in directories where packages were installed. The advantage of LVM or btrfs is that you can cover things such as config files, but the only issues I've ever had were resolved by simply downgrading a package, with no config issues.

I can't really say much about actual bandwidth - it's hard to track - maybe 2-3GB a day avg. Yesterday, AFAICT, less than 10 megs were synced, but when there are big rebuilds that number rises to gigs per day.

Likewise, I'm a little clueless as to how much it costs - konnichi.com has unlimited bandwidth and disk usage, within reasonable use of course. It's already paid for until next January, so I'll find out how much it costs then.

Below are the current stats:

27G    community
3.5G    community-testing
2.4G    core
55G    extra
14G    testing

A total of 33,288 packages over almost 7 months.

Offline

#65 2010-05-22 14:24:59

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

So let's say over 100GB of storage (and counting) + 100GB bandwidth /mo.

xdelta3 could reduce that amount by a fair bit but would require more cpu power to compute the deltas. It would also make the process less straightforward from the user's perspective.
Unfortunately reverse deltas aren't yet supported
http://code.google.com/p/xdelta/issues/detail?id=62

Let's say I have foo-3.4.1 and for some reason I need foo-3.2.2. The nearest full foo package in ARM is foo-3.1.1 - all later versions are xdeltas computed wrt that package.
I would have to download the full foo-3.1.1.pkg.tar.xz package and foo-3.2.2.xd3, compute foo-3.2.2.pkg.tar.xz and install it.
I hope pacman won't complain about a foo-3.2.2.pkg.tar.gz package - gzip compression is faster and I prefer to use it for the apps I get from AUR, so the option of recompressing with gzip rather than xz would be nice.
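The commands would presumably look like this (foo names as above):

# server side: encode a delta from the nearest full package
xdelta3 -e -s foo-3.1.1.pkg.tar.xz foo-3.2.2.pkg.tar.xz foo-3.2.2.xd3
# user side: download both, reconstruct and install
xdelta3 -d -s foo-3.1.1.pkg.tar.xz foo-3.2.2.xd3 foo-3.2.2.pkg.tar.xz
pacman -U foo-3.2.2.pkg.tar.xz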

It would be possible to keep the older files for a longer time, because deltas are pretty small, but it would mean more bandwidth consumption until somebody writes reverse deltas, because I have to download the full package *and* the xdelta of the version I need. On the user's side it would require a script to do all the delta-fiddling & installing / downgrading.

pacman & xdeltas - read shining's posts
http://bbs.archlinux.org/viewtopic.php?id=92085&p=2


Feel free to tell me why it is a bad idea :-)


Edit: sabooky provides deltas for i686 so you may want to join forces.

Last edited by karol (2010-05-22 14:32:48)

Offline

#66 2010-05-22 15:28:16

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

not necessarily a bad idea, i just think it has little gain:

* increased complexity
* will require a lot of CPU, which lands me even further into the grey area of 'reasonable use' - I can't do it offline because that requires constant updates (remember it's a daily snapshot)
* it might save space, but space isn't really an issue; it's more a question of having 5GiB worth of packages for no reason.
Arch is rolling release, so after a certain point it can be reasonably expected that there are no systems out there that are out of date to that extent, and even in that case it's questionable whether a rollback would be possible

If there's a download size issue: I haven't tested it so I can't say, but if deltas were fast enough to be done on the fly then I'd have no problem with using that. There needn't be any support in pacman, since we'd be using a separate tool anyway, so we can reconstruct the package ourselves.

I much prefer extracting the files though; I did some looking into it and the result is not much larger than the delta. Just looking at an extreme example now:
the extracted difference between vlc-1.0.6-3-i686.pkg.tar.xz and vlc-1.0.6-2-i686.pkg.tar.xz comes to 1.3mb and 2.1mb (one per direction), which drops to 594kb and 595kb when compressed. xdelta3 gives me 3.1mb and 5.3mb, taking 5 seconds each.

Note the higher size going from rel 3 to rel 2. Of course this is only 1 case, so it isn't necessarily the case everywhere, but in all my tests the extracted difference was always comparable to the delta while being fast (faster? requires some testing). For reference, the extraction method is roughly the sketch below.
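A sketch only - it ignores deleted files and the package metadata:

mkdir old new
tar -C old -xf vlc-1.0.6-2-i686.pkg.tar.xz
tar -C new -xf vlc-1.0.6-3-i686.pkg.tar.xz
# keep only the files that are new or whose contents changed
(cd new && find . -type f) | while read -r f; do
    cmp -s "old/$f" "new/$f" || echo "$f"
done > changed.txt
# compress just those files; this is the "extracted difference" size above
(cd new && tar -cJf ../vlc-diff.tar.xz -T ../changed.txt)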

Offline

#67 2010-05-22 15:42:03

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

> it's more a question of having 5GiB worth of packages for no reason.
These are my thoughts exactly. If somebody needs the packages, he will download them for himself while they're still available from ARM and keep them in his backup as long as he wants/needs.

Last edited by karol (2010-05-22 15:55:27)

Offline

#68 2010-05-22 15:53:26

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

In that case there should be no issue removing them. The aim is to have at least 6 months' worth of packages, so if there are users out there who run Arch and don't update that frequently (hell, even 3 months is pushing it), then we can talk about keeping older packages around longer, using the referencing system or similar - but really, in that case Arch doesn't sound like the right distro for such a setup.

Offline

#69 2010-05-22 15:55:36

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

I don't know how well xdelta3 handles xz compression - that may be the reason you get such big deltas - I'll have a go at it after dinner: I'm starving.

Update 1: using gz instead of xz gives me a small delta in 5 secs.

Update 2:
sabooky's stats
http://bbs.archlinux.org/viewtopic.php? … 17#p724617

sabooky wrote:

Here's HD usage locally on my box:
# du -hcs repos/*/os/i686
7.3G    repos/community/os/i686
270M    repos/core/os/i686
11G    repos/extra/os/i686
18G    total

# du -hcs repos/*/os/i686/deltas
217M    repos/community/os/i686/deltas
49M    repos/core/os/i686/deltas
2.2G    repos/extra/os/i686/deltas
2.5G    total

The difference between the community repo and its deltas is staggering - hope it's not a typo :-)


kumyco wrote:

* will require a lot of CPU which lands me even further into the grey area of 'reasonable use' - I can't do it offline because that requires constant updates (remember it's a daily snapshot)

I'll have to re-read how ARM works ;-P
I don't know how efficient that would be, but I think I could have a base set of packages and create deltas for the new ones offline, then upload just the small deltas to the server. If I can create deltas at a speed of 1MB/s (process 1 MB of an xz-compressed file - somehow), "converting" 100GB of *.pkg.tar.xz files into deltas should take just a day or two. Something like the loop below.

100 GB in 7 months means an avg. of 500MB/day, which means we need 10 mins/day of CPU time capable of creating the deltas at a speed of 1MB/s. I have no idea whether this is reasonable or excessive use.
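The batch job could be as dumb as this (find_previous_version is a made-up helper, not a real tool):

# delta every freshly synced package against its predecessor
for new in today/*.pkg.tar.xz; do
    old=$(find_previous_version "$new")   # hypothetical
    [ -n "$old" ] || continue             # no predecessor: keep the full pkg
    xdelta3 -e -1 -S djw -s "$old" "$new" "deltas/${new##*/}.xd3"
done
rsync -a deltas/ arm.konnichi.com:deltas/   # upload only the small deltas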

Last edited by karol (2010-05-22 16:49:28)

Offline

#70 2010-05-22 21:17:18

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

Doing a delta between kernel versions takes me 20 seconds and uses something over 150MB of RAM. Now put this into context: it's not 500MB per day, it's 2 and 3 GB. Remember, it's not a uniform sync; some days it's very high, others it's low. I'd estimate that it could be done in 10 seconds on the server, which doesn't sound bad until you add them all up.

But still, I don't think it's necessary. The main issue is reducing the download size on the fly to make it faster for the user; it doesn't need to be done in advance, i.e. converting to a delta repo, which IMO is a waste of time since it's not a normal repo, so usage is quite low.

Still, I'm open to convincing about using deltas on the fly. I can't get it working well where it matters most, that is, with the bigger packages. It takes time: gcc is done in 5 seconds, the kernel in 20, and that's for 1 package - this adds up.
--------
test before submit
--------

Yikes! One of your processes (xdelta3, pid 5798) was just killed because your
processes are, as a whole, consuming too much memory. If you believe you've
received this message in error, please contact Support.

I think that pretty much puts the final nail in the coffin of server-side deltas.

Offline

#71 2010-05-22 21:53:55

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

> i can't really say much about actual bandwidth - it's hard to track - maybe 2-3gb a day avg.
And that would stay the same until there's a working reverse xdelta implementation. Actual downloads would be a tad bigger: the base package (the oldest you keep) *plus* the xdelta of the version the user needs.
When you periodically purge the ARM archive, you have to create another set of base packages, or you decide beforehand that you keep 1 in 10 versions or so.
Until reverse xdeltas arrive, downloads won't be smaller (than xz-compressed) and thus users won't download them faster (ceteris paribus).

> convert to a delta repo that imo is a waste of time since it's not a normal repo so usage is quite low
sabooky is making deltas anyway, so maybe you can grab them. Maybe there *is* a reason for keeping things a year or two - tiny xdeltas are the way then.
I don't think it's a problem if ARM is a day or two behind - it's for archival use. Even if it takes a day to create the deltas it doesn't really matter. I still have to read your post about ARM's modus operandi, maybe I'm misunderestimating things.

> still i'm open to convincing about using delta on the fly
I don't see how this can be done; you would have to tap into a huge amount of CPU and RAM to somehow serve 20 kernels to 20 people at the same time.

Why can't you create the xdeltas beforehand and delete the new package? The repos would shrink by 80% and you could mirror ARM if more bandwidth were needed. It's easier to find sb to host 20GB than 100GB.

As for the 'Yikes!' note, I think you can limit the memory usage, I'll have a look right away.
Update: using compression '-1' instead of '-9' produces bigger deltas but is much faster and uses less memory.
http://code.google.com/p/xdelta/wiki/TuningMemoryBudget
" -9 takes about four times as much memory as -1. "


The way I see it: you sync ~500MB/day on avg. from the Arch mirrors (100GB after almost 7 months). On my P4 with 1GB RAM it shouldn't take longer than 10 minutes to create all the xdeltas and remove the synced packages. You do this once, and users download files the same way they're doing now, only they get deltas plus the base package, so they can recreate the needed packages locally on their computers.

Last edited by karol (2010-05-22 22:35:44)

Offline

#72 2010-05-22 23:17:59

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

A couple thoughts:

[karol@black ddelta]$ time xdelta3 -e -1 -f -S djw -s kernel26-2.6.32.2-2-i686.pkg.tar kernel26-2.6.33.4-1-i686.pkg.tar 1.xd3

real    0m34.785s
user    0m28.145s
sys    0m0.990s


-rw-r--r-- 1 29M May 23 01:15 1.xd3
-rw-r--r-- 1 22M May 23 00:38 kernel26-2.6.33.4-1-i686.pkg.tar.xz

Not much sense in doing "long-range" xdeltas - at least for the kernel.


Update 1:

pacman -U xdelta3-3.0v-2-i686.pkg.tar.gz
--
[karol@black ~]$ xdelta3 -V
Xdelta version 3.0u, Copyright (C) 2007, 2008, Joshua MacDonald

This version works w/ xz-compressed packages - it can deal with de/recompression transparently.


Update 2:
I have no idea how the downgrade / rollback scripts work and whether there will be much pain to go through to adopt xdelta packages (*if* indeed xdeltas are useful for ARM).


Update 3:
I think I didn't read http://code.google.com/p/xdelta/issues/detail?id=62 carefully - reverse xdeltas *are* possible.

Were you suggesting sth along the lines of

xdelta3 -e -9 -f -S djw -s vlc-1.0.6-3-i686.pkg.tar.xz vlc-1.0.6-2-i686.pkg.tar.xz 1.xd3

Let's call it "backward" xdelta. Yes, I'm aware that '1.xd3' isn't a very informative name.
This would make sense to do on-demand: the user specifies which version they have and which they need (the closer the two are, the better), then they wait patiently for ARM to create the needed "backward" xdelta for download. In some cases (kernel?) the delta may be too big and the xz-compressed package will be a better choice.
Would ARM cache the created xdeltas?

That would mean that ARM has to be up-to-date to have the recent packages for "backward" deltas, but as ARM has testing and community-testing repos mirrored that shouldn't be a problem.

This approach would mean that the user downloads one file per package - either a small xdelta or the regular package. Storage needs would stay as they are, or go up a bit if ARM keeps the created xdeltas. If the user downloads only xdeltas, it would be a quick download followed by a bit longer recreating the needed package + conserving the bandwidth. The option to create a pkg.tar.gz file instead of pkg.tar.xz would be a plus.

I've run a couple quick tests and it seems that forward and "backward" xdeltas take the same amount of time to create and are about the same size, so it doesn't really matter which ones we use from the time / space point of view.

Last edited by karol (2010-05-23 01:21:46)

Offline

#73 2010-05-23 09:14:31

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

I'll just lay out some points, let me know what needs clarifying and where my assumptions are wrong or could be improved.

ARM is not an archival service in the strictest sense. It serves one purpose: to aid in downgrading. It must appear to be just another mirror.

The ARM, then, is literally just a set of mirrors that happen to contain the same packages as the official repos, from certain dates. This means the user should be able to simply change the mirror (URL) and do an upgrade; as a consequence of how pacman works - i.e. the mirror is preferred - if the mirror has an older version, a downgrade is done (see the example below). Generally this works, however sometimes there are new packages with complicated versioning and dependencies. This is where the helper tool comes in; how it'll work isn't finalised, so I won't comment on it now.
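In practice that's just this (the date is only an example):

# /etc/pacman.d/mirrorlist - comment out the normal mirrors and point
# at a dated snapshot instead
Server = http://arm.konnichi.com/2010/05/01/$repo/os/i686

# refresh and let pacman downgrade to match the mirror
pacman -Syyuu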

We want a way to provide smaller downloads. This can be done with deltas, but it cannot be done server-side because it uses too much CPU; normally this isn't a problem - the problem is that it uses that much CPU for too long. Another effect of this is that -9 compression can never be used; I wouldn't recommend it in any other case either, it takes a lot longer for negligible gain.

xz uses a lot more resources than gz; the compression isn't going to change back to gz, and recompression on our side makes no reasonable sense.

It must be up to date; it cannot lag behind because, though it appears to be a regular mirror, it's more like a literal reflective mirror.
By this I mean: we don't care about the upgrade scenario, A - B. We only care about the downgrade scenario, B - A.
If there is another repo that provides suitable deltas I am willing to copy them, but I don't see how this helps, unless I totally misread that link.

In our case, we have the newest pkg locally because it was just downloaded to do the upgrade; what we don't have is the old pkg, otherwise we wouldn't need to download anything. And since we just upgraded, we also don't have the old files from which to reconstruct a package. We could hardlink the files before the upgrade, but this requires more space on the user's machine, and in that case we wouldn't even need the ARM's aid. I thought of that idea before, but I think it'd be better to just use LVM or something designed for that sort of job.

As to the -testing repos: yes, they are mirrored, but that doesn't solve the problem for the users most likely to use the ARM.
Most breakages happen in -testing. I know - I see/hear the cries when a libjpeg is rebuilt (I've always been lucky).

If my assumptions about the main users are wrong please correct me.

With that said the use cases and assumptions are:

* full downgrade after upgrade
    pacman can mostly handle this, and only downgrades over a long period have issues, which isn't exactly in line with the Arch way, so it's not a high priority.

* single package downgrade
    there are a script or two that help with this; it can even be done by pointing pacman at a URL

I can't think of any other use cases. The former is a requirement of [testing] users.
The latter mostly of [community-testing] and regular repo users.

Creating deltas doesn't actually save any bandwidth, and I have unlimited bandwidth and space, so that's not a problem anyway.
We can pretty much be sure that no package will be deleted before purging. If on Purging Day someone wants to create deltas for real archival purposes, then fine with me; the packages won't be deleted immediately anyway. The dated repo is removed first, but the packages remain for a short while - maybe a week or 2 - in the master repo at arm.konnichi.com/[repo].

Arch is rolling release, so it can be reasonably expected that everyone is up to date. If updating is left for months, then maybe another distro or a reinstall is better, because updating with such large gaps is almost certainly going to result in a broken system, based on how much would have changed during that time.

With that said, I think it's reasonable that retention lasts till after a major release of KDE and GNOME, since these are the biggest projects. So dated repos can stay roughly 8 months to give people time to upgrade, and then to upgrade to the bugfix .1 release; after that the dated repos are gone, and shortly after, the actual packages are removed or transferred elsewhere or something.

--------------------------------------------------------------------------------

How does this backward delta thing work? I'm not sure I understand it, because if we have deltas for A - B, A - C, etc., does it work for D - C, where D comes from A - D? I'm assuming we need the old package to resolve forward deltas, but our case requires backward deltas, which would mean we'd need to recalculate them based on the new package, which we can assume the user has. That is never going to happen, because we can't do the delta server-side - you're welcome to attempt to keep that up to date, though. Since no packages are going to be deleted, deltas can be added without disruption, so there would be no need for the deltas to always be up to date.

--
today's sync was about 300mb

Offline

#74 2010-05-23 14:35:07

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Project ARM :: Arch Rollback Machine

> today's sync was about 300mb
I'm not 100% sure what 'today' means, but I downloaded about 150MB.

> If there is another repo that provide suitable deltas I am willing to copy them
Deltas are flexible, they can work in many ways, so let's "imagine all the people ..." ;-)
Let's imagine there's a service that provides deltas *for upgrade*. When a new version of package 'foo' arrives, they create the delta and upload it to their server. Let's assume the deltas follow the pattern A-B, B-C, C-D etc. Let's assume merging deltas works, so you can create version D of package 'foo' by grabbing version A of that package and applying the above-mentioned deltas in one go.
They don't keep the deltas for long and that's where ARM steps in. The problem is, those deltas were meant for upgrading, so we have to keep version A of the package and all the subsequent deltas and create the packages on-demand - which can stress the cpu.

> xz, uses a lot resources more than gz, the compression isn't going to change back to gz
> and recompression on our side makes no reasonable sense.
xdelta has to do decompression - whether it's gzip, bzip2 or xz. If you don't like that, you have to provide uncompressed tar archives. Recompression is optional - you don't have to compress with xz when you recreate a package, and you don't have to compress the deltas.
I've installed xdelta3-3.0v-2-i686.pkg.tar.gz and it works well with xz-compressed archives.

> We only care about the downgrade scenario, B - A.
atm no one provides such deltas, so we have to do it ourselves, offline if needed.
First, I have to check whether merging works.

> Most breakages happen in -testing. I know I see/hear the cries when a libjpeg is rebuilt(I've always been lucky).
The idea that having testing repos helps was tongue-in-cheek :-)

> Arch is rolling release so it can be reasonably expected that everyone is up to date
I don't see how Arch can influence upstream devs. If they want to take their time, 6 months of package retention may be a bit on the short side.
If an upgrade breaks sth (f.e. X freezes every half an hour), I file a bug report and use a simple script to install the previous versions of some packages from my pacman cache. I add them temporarily to IgnorePkg and I'm all set. It may take *years* for this bug to get fixed, and this has nothing to do with rolling release. I need ARM only for packages I freshly installed, i.e. there was no 'foo' package on my system yesterday, today I got one, but this version is broken, so I need the previous one or even the one before that -> see xdelta woes. It's not my fault that the last working release is 2yo, so maybe ARM should keep at least the last 4 versions of a package, but this can be tough with bigger packages - dependency hell. ARM would remove all packages older than 6mo iff there are more than 4 versions of that package left after the purge.

> Creating deltas doesn't actually save any bandwidth, and I have unlimited bandwidth and space so that's not a problem anyway.
What if you are hit by a truck? What if you get bored and say you don't want to do it anymore in 2011? I have absolutely no idea how much $50/mo can buy me, and I don't know how easy it is to find a "replacement" for the service you provide. Let me say it again: the way ARM works atm is great, but it's still a single point of failure, and at least to me this means "don't trust it too much: it may be here tomorrow but it may be as well gone".

> how does this backward delta thing work.
If I understand correctly the user has some package, that may *or may not* be the current version. This doesn't bother us at all if we create reverse / backward deltas.
For upgrading you have version A and you want version B, C, D or E. For downgrading it's the other way round.
So no A-B, but B-A where A is the older package, B is the newer.
If the user has B and wants A, he downloads B-A delta, applies it locally on his computer and installs A.
If the user has C and wants A, he downloads C-B and B-A deltas, applies (merges) them locally on his computer and installs A.
If the user has D and wants B, he downloads D-C and C-B deltas, applies (merges) them locally on his computer and installs B.
This way we don't force the user to first pointlessly upgrade to the current version only to downgrade to the version he needs.
Creating the deltas is fast. Recreating a package takes more time, so it's better done on the user's computer.
ARM keeps only reverse deltas so it needs less space than keeping every version of the package.
The user downloads only deltas, so should be fast and light on the bandwidth, *but* a couple deltas may be actually bigger than one xz-compressed package.
If the user has E and wants A, he downloads E-D, D-C, C-B and B-A deltas, applies (merges) them locally on his computer and installs A. An xz-compressed A is 3 MB, and each of the four deltas is 1 MB. This means the user had to download 4 MB instead of 3 MB and also had to recreate A from the merged deltas.
How often would it occur? Do we care? If we do: what are the most downloaded packages and what are the biggest packages? If 100 people have E and want A we lose 100 MB in bandwidth on the xdelta deal.
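In xdelta3 terms, applying a chain is just repeated decoding (package names here are schematic; merging the deltas into one first would be nicer, if that works):

# the user has version C and wants version A
xdelta3 -d -s foo-C.pkg.tar.xz foo-C-B.xd3 foo-B.pkg.tar.xz
xdelta3 -d -s foo-B.pkg.tar.xz foo-B-A.xd3 foo-A.pkg.tar.xz
pacman -U foo-A.pkg.tar.xz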

I have yet to test if this actually works or I just made it up ;-P

Offline

#75 2010-05-23 15:01:18

kumyco
Member
From: somewhere
Registered: 2008-06-23
Posts: 153
Website

Re: Project ARM :: Arch Rollback Machine

karol wrote:

> today's sync was about 300mb

I'm not 100% sure what 'today' means, but I downloaded about 150MB.

That means 300MB were downloaded today in order to sync the mirror from yesterday, if that makes sense. That's just for syncing the mirrors, not people downloading.

karol wrote:

> We only care about the downgrade scenario, B - A.

atm no one provides such deltas, so we have to do it ourselves, offline if needed.
First, I have to check whether merging works.

Check away - the packages aren't going anywhere, so if a delta can be created for, say, gcc and the difference is small, then it can be added. There's no point in storing deltas for the kernel when it only achieves a 1MB or 2MB saving, but there are many text-heavy packages with lots of translations and documentation; these typically result in small deltas, so keep those.

karol wrote:

> Arch is rolling release so it can be reasonably expected that everyone is up to date

I don't see how Arch can influence upstream devs. If they want to take their time, 6 months of package retention may be a bit on the short side.
[...] maybe ARM should keep at least the last 4 versions of a package, but this can be tough with bigger packages - dependency hell. ARM would remove all packages older than 6mo iff there are more than 4 versions of that package left after the purge.

I think you're forgetting a piece of this story. Purging doesn't necessarily mean a package is gone. Purging means that the repo dated 6 months ago to the day is gone. Most of the files still exist in the master, so they are still accessible, just not via /2009/11/01/core/os/i686/. Upstream is irrelevant here, I think, because we care about the state of the packages as they appear in the Arch repos, not that they are x months old. To clarify: purging is based on the particular snapshot - that's what's removed. Some time later the packages are removed from the master, *but* only if it's the final copy, and with the referencing system anyone who wants to lag that far behind can add a reference to the packages they want, and those will be kept a little longer. Hell, they could even register an email address or something, so they could be notified that packages they depend on may be removed soon.


karol wrote:

> Creating deltas doesn't actually save any bandwidth, and I have unlimited bandwidth and space so that's not a problem anyway.

What if you are hit by a truck? What if you get bored and say you don't want to do it anymore in 2011? [...] the way ARM works atm is great, but it's still a single point of failure, and at least to me this means "don't trust it too much: it may be here tomorrow but it may be as well gone".

Then someone can mirror it; maybe in the future arch-games will support the ARM, I dunno. konnichi.com is in a somewhat special situation in that I have a contract that is worth a fortune. I think I pay maybe £100 a year or less, but that includes costs for other things like the domain names. If you work out the monthly rate, you prolly won't be getting *unlimited* anything for that price.

karol wrote:

> how does this backward delta thing work?

If I understand correctly the user has some package, that may *or may not* be the current version. This doesn't bother us at all if we create reverse / backward deltas.
[...]
I have yet to test if this actually works or I just made it up ;-P

If it works, I say do it and we can have a test or something.

Offline
