
#1 2018-12-26 21:01:08

nkukard
Member
Registered: 2018-12-26
Posts: 24

Delta repositories and fate of archdelta.net

Hi guys,

I've been doing a bit of work on repository deltas over the past week, mostly to see whether it's viable to run a repository (or a delta-only repository?) that includes deltas, for a number of colleagues of mine at universities in developing countries with slow internet connections (especially for large updates such as texlive).

What I was wondering is what happened to archdelta.net. I know it was i686-only, but I don't see anything about its fate in the thread at https://bbs.archlinux.org/viewtopic.php?id=92085 (apart from the people going MIA). For the official repositories it looks like a back-burner topic: https://bugs.archlinux.org/task/18590

It may be a pretty niche user base looking for something like this, and maybe something already exists? If not, I wouldn't mind putting something up.

I already have a working set of bash and perl scripts for generating the delta repo, and can put everything in git for anyone interested in contributing or testing. In future I could perhaps work on using the official build scripts, with some surgical patches to add support for creating deltas.

-N


irc.libera.chat ~ nkukard
Discord ~ discord.gg/linuxchat ~ OpenSourceCoder

#2 2018-12-26 22:29:43

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Delta repositories and fate of archdelta.net

I guess the MIA nature of the previous project explains rather a lot. :)

It would be interesting to get it into the official repositories, though. The question remains, of course, how to implement it. Our repositories are maintained using dbscripts (https://git.archlinux.org/dbscripts.git/about/), which is all written in bash. Implementation notes:

  • repo-add can be told to automatically create deltas when adding a package to the database. Running /usr/bin/pkgdelta costs some processing time, especially on larger packages.

  • repo-remove will automatically remove the deltas from the database when a package is completely deleted. EDIT: It doesn't, but only because a regression broke this.

  • We would need to implement cleanup of delta files we no longer want; that might be time-based or number-of-versions-based.
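A number-of-versions policy could be as small as the following sketch. `deltas_to_prune` is a hypothetical helper, not part of repo-add or dbscripts. It assumes the delta filenames arrive on stdin ordered oldest first and prints every one except the newest N, which are the files a cleanup step would delete.

```shell
# Hypothetical helper (not part of repo-add or dbscripts): print the delta
# files that a "keep the newest N" policy would delete.
# stdin: one delta filename per line, oldest first (assumed ordering).
deltas_to_prune() {
    awk -v keep="$1" '
        { lines[NR] = $0 }
        # Everything except the last "keep" lines is surplus.
        END { for (i = 1; i <= NR - keep; i++) print lines[i] }
    '
}
```

When there are N or fewer deltas, nothing is printed, so nothing would be deleted.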

I'm not sure what the best solution is w.r.t. the time repo-add takes. I think the servers would have enough space for the deltas, however...

Last edited by eschwartz (2018-12-27 17:05:20)


Managing AUR repos The Right Way -- aurpublish (now a standalone tool)

#3 2018-12-27 03:18:26

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365

Re: Delta repositories and fate of archdelta.net

eschwartz wrote:

  • repo-remove will automatically remove the deltas from the database when a package is completely deleted.

Are you sure?

https://bugs.archlinux.org/task/53041

#4 2018-12-27 03:29:47

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Delta repositories and fate of archdelta.net

I was looking at the repo-add script's remove() function and I saw this:

        if db_remove_entry "$pkgname"; then
                rm -f "$tmpdir/tree/$pkgname.deltas"

From a cursory inspection it seemed like it was *supposed* to do so. :/

But double-checking, nope...

EDIT: Found the bug.

Last edited by eschwartz (2018-12-27 03:44:25)


#5 2018-12-27 03:59:55

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Delta repositories and fate of archdelta.net

Fixed https://lists.archlinux.org/pipermail/p … 23030.html

BTW this is now three patches I've submitted to handle the results of commit https://git.archlinux.org/pacman.git/co … 55df3c2d8a

:D :D

Last edited by eschwartz (2018-12-27 04:01:50)


#6 2018-12-27 08:26:36

nkukard
Member
Registered: 2018-12-26
Posts: 24

Re: Delta repositories and fate of archdelta.net

eschwartz wrote:

Fixed https://lists.archlinux.org/pipermail/p … 23030.html

BTW this is now three patches I've submitted to handle the results of commit https://git.archlinux.org/pacman.git/co … 55df3c2d8a

:D :D

Thanks for that, @eschwartz! You actually fixed the issue I was trying to track down, which I asked about on IRC yesterday!


#7 2018-12-27 16:52:55

nkukard
Member
Registered: 2018-12-26
Posts: 24

Re: Delta repositories and fate of archdelta.net

I've set up a VM to test with. I'll start copying the updated packages into a local dir for testing and pull dbscripts to see what I can do.

It would probably be best to add an option to repo-add, maybe --delta-versions X to keep X versions around, and perhaps another, --delta-expiry-days D, to keep the deltas for a certain number of days?
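For the time-based variant, the core check could look like the following sketch. `delta_expired` is hypothetical, as are the option names above. It compares a delta's age in seconds against D days.

```shell
# Hypothetical helper for an expiry-days policy (the option name is only
# a proposal). Succeeds when a delta created at epoch second $1 is more
# than $3 days old at epoch second $2.
delta_expired() {
    local created=$1 now=$2 days=$3
    [ $((now - created)) -gt $((days * 86400)) ]
}
```

For example, `delta_expired "$(stat -c %Y "$f")" "$(date +%s)" 14 && rm -- "$f"` would drop a delta file `$f` older than 14 days. This assumes GNU `stat` syntax and that the file's mtime is when the delta was generated.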

The delta cleanup would probably only run when repo-add operates on that package, right? Or should there be a cleanup at the end of repo-add that looks at the deltas to see if they should be removed?

Thinking about multiple deltas now, though... from my understanding, pacman takes the version currently installed on the local system and the version it's upgrading to, and checks whether a delta is available? What would be the point of keeping deltas for versions A -> B -> C -> D if the user is upgrading from A to D? Or do we want to generate deltas for A -> D, B -> D and C -> D too?
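The "A -> D, B -> D and C -> D" option can be made concrete with the sketch below, which prints one `old new` pair per retained old version. `delta_pairs` is hypothetical. Each pair could then be handed to `pkgdelta`, assuming it takes the old package first and the new one second.

```shell
# Hypothetical helper: for the "every retained old version -> newest"
# scheme, print one "old new" pair per line.
# $1 = newest package file, $2... = retained older package files.
delta_pairs() {
    local new old
    new=$1
    shift
    for old in "$@"; do
        printf '%s %s\n' "$old" "$new"
    done
}
```

Usage might look like `delta_pairs D.pkg.tar.xz A.pkg.tar.xz B.pkg.tar.xz C.pkg.tar.xz | while read -r old new; do pkgdelta "$old" "$new"; done`, where the filenames are placeholders.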


#8 2018-12-27 17:01:35

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Delta repositories and fate of archdelta.net

Well, repo-add will generate a pkgdelta for the current db package -> new db package.

But let's say I update a package twice in two days (A -> B -> C), and you only update once every three days and want deltas -- the only delta path is to download both deltas and use A + delta-A-B to generate B, then use B + delta-B-C to generate C. The server cannot generate a delta-A-C, since A was deleted by the server cleanup scripts which run every three hours and remove packages that are no longer required by the database.
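The "A + delta-A-B, then B + delta-B-C" path can be sketched as a walk over `from to` pairs. `delta_chain` is a hypothetical helper. It assumes a linear history where at most one delta leaves each version; a client facing several candidate deltas would need a real shortest-path search instead.

```shell
# Hypothetical helper: find the chain of deltas that turns the installed
# package ($1) into the target package ($2). stdin: "from to" pairs.
# Prints each hop as "from to", in order; fails if no chain reaches the
# target. Assumes at most one delta leaves any given version.
delta_chain() {
    awk -v cur="$1" -v target="$2" '
        { next_of[$1] = $2 }
        END {
            path = ""
            for (hops = 0; cur != target; hops++) {
                # Dead end, or more hops than deltas (a cycle): no usable chain.
                if (!(cur in next_of) || hops >= NR) exit 1
                path = path cur " " next_of[cur] "\n"
                cur = next_of[cur]
            }
            printf "%s", path
        }
    '
}
```

With input lines `A B` and `B C`, `delta_chain A C` prints both hops in order. If the `B C` delta is missing, it fails instead of printing a partial path.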

That being said, it is possible to run /usr/bin/pkgdelta yourself, and repo-add foo.db.tar.gz *.delta to add a delta file to the database. This is an alternative to using repo-add -d to autogenerate deltas (since as I said the autogenerated deltas only occur between the current package and the new one).
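For the manual route it helps to know which file pkgdelta will produce. The sketch below predicts the name from the two package filenames, following the `name-oldver_to_newver-arch.delta` pattern visible in the julia-docs deltas listed later in this thread. That pattern is inferred from those filenames, not from documentation. `pkg_split` and `delta_name` are hypothetical, and both assume `*.pkg.tar.xz` filenames of the form `name-[epoch:]pkgver-pkgrel-arch`.

```shell
# Hypothetical helpers. Naming pattern inferred from the julia-docs deltas
# in this thread: name-oldver_to_newver-arch.delta, where each version
# is [epoch:]pkgver-pkgrel.

# Print "name version arch" for a name-[epoch:]pkgver-pkgrel-arch.pkg.tar.xz file.
pkg_split() {
    local base arch rel ver
    base=${1##*/}                     # drop any leading directory
    base=${base%.pkg.tar.xz}
    arch=${base##*-}; base=${base%-*} # strip -arch
    rel=${base##*-};  base=${base%-*} # strip -pkgrel
    ver=${base##*-};  base=${base%-*} # strip -[epoch:]pkgver
    printf '%s %s-%s %s\n' "$base" "$ver" "$rel" "$arch"
}

# Predict the delta filename for old package $1 -> new package $2.
delta_name() {
    local old new name oldver newver arch
    old=$(pkg_split "$1")
    new=$(pkg_split "$2")
    name=${old%% *}
    oldver=${old#* }; oldver=${oldver%% *}
    newver=${new#* }; newver=${newver%% *}
    arch=${new##* }
    printf '%s-%s_to_%s-%s.delta\n' "$name" "$oldver" "$newver" "$arch"
}
```

A script could use this to check that the expected delta file exists before passing it to repo-add.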

Last edited by eschwartz (2018-12-27 17:03:23)


#9 2018-12-27 17:30:19

nkukard
Member
Registered: 2018-12-26
Posts: 24

Re: Delta repositories and fate of archdelta.net

eschwartz wrote:

Well, repo-add will generate a pkgdelta for the current db package -> new db package.

But let's say I update a package twice in two days (A -> B -> C), and you only update once every three days and want deltas -- the only delta path is to download both deltas and use A + delta-A-B to generate B, then use B + delta-B-C to generate C. The server cannot generate a delta-A-C, since A was deleted by the server cleanup scripts which run every three hours and remove packages that are no longer required by the database.

That being said, it is possible to run /usr/bin/pkgdelta yourself, and repo-add foo.db.tar.gz *.delta to add a delta file to the database. This is an alternative to using repo-add -d to autogenerate deltas (since as I said the autogenerated deltas only occur between the current package and the new one).

That's kind of what I was thinking... I assume there is a master server that pulls packages into the repo. What if it kept 3 or so previous versions of each package in an 'attic' directory (not mirrored) to generate deltas from previous versions? Just a thought. On the negative side, it would obviously increase repo size. Probably not worth it?


#10 2018-12-27 22:37:37

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365

Re: Delta repositories and fate of archdelta.net

nkukard wrote:

or should there be a cleanup at the end of repo-add that looks at the deltas to see if they should be removed?

Such as the "cleanupdelta" script?

#11 2018-12-27 23:08:47

nkukard
Member
Registered: 2018-12-26
Posts: 24

Re: Delta repositories and fate of archdelta.net

Allan wrote:
nkukard wrote:

or should there be a cleanup at the end of repo-add that looks at the deltas to see if they should be removed?

Such as the "cleanupdelta" script?

Pretty much (I think?). I'm not sure we need another script, though; doing it within the function in repo-add may take fewer lines of code.

We can easily grab the list of deltas from the deltas file in the repo db, which has both the old and new package names. Keeping a certain number of them, with an option to specify how many, should be pretty easy.

Once we have a list of previous package filenames, it is pretty easy to generate deltas from each of the N previous versions to the current version.

The question, however, is whether we want to generate deltas from the last N versions to the latest version too. That means an increase in mirror size (for each delta), plus the system(s) running dbscripts would need access to the previous package versions.
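Pulling the old and new package names out of a deltas entry takes one awk line. `delta_edges` is hypothetical. It assumes the entry format shown in the julia-docs listings in this thread: a `%DELTAS%` header, then one `deltafile md5sum size frompkg topkg` line per delta.

```shell
# Hypothetical helper: read a package's "deltas" entry from the repo db
# on stdin and print "frompkg topkg" for each delta line.
delta_edges() {
    # Skip the %DELTAS% header; delta lines have exactly five fields.
    awk '$0 == "%DELTAS%" { next } NF == 5 { print $4, $5 }'
}
```

It could be fed with something like `bsdtar -xOf custom.db.tar.gz 'julia-docs-2:1.0.3-2/deltas' | delta_edges`, which is the same extraction used later in this thread.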


#12 2018-12-27 23:45:46

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Delta repositories and fate of archdelta.net

Allan wrote:

Such as the "cleanupdelta" script?

Which is not a script; it is a C executable with inscrutable help text that doesn't seem to do anything AFAICT. :)

Mind explaining how to use it?


#13 2018-12-28 02:26:42

progandy
Member
Registered: 2012-05-17
Posts: 5,184

Re: Delta repositories and fate of archdelta.net

eschwartz wrote:
Allan wrote:

Such as the "cleanupdelta" script?

...
Mind explaining how to use it?

It basically calls alpm_pkg_unused_deltas for each package in a repository and prints the result.

Some example invocations:

cleanupdelta core extra
cleanupdelta extra
cleanupdelta -b /var/lib/pacman core

| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |

#14 2018-12-28 02:51:31

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Delta repositories and fate of archdelta.net

And I cannot get it to print unused deltas that I know exist in my custom repo...

The usage message is vastly confusing, and attempts to figure out what -b does (the message implies that you must use "-b" to specify which repositories to check) have invariably been met with the help text again... If it expects the pacman.conf DBPath to initialize an alpm session, then that makes sense, but I'm still left with "it prints nothing for me".


#15 2018-12-28 05:53:36

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365

Re: Delta repositories and fate of archdelta.net

Are they unused because the package was removed, as in the bug mentioned above? Those aren't looked at, because it loops over packages within the database.

#16 2018-12-28 06:00:46

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Delta repositories and fate of archdelta.net

Nope:

$ bsdtar -tvf custom.db.tar.gz | grep julia
drwxr-xr-x  0 eschwartz eschwartz   0 Dec 27 21:34 julia-docs-2:1.0.3-2/
-rw-r--r--  0 eschwartz eschwartz 174 Dec 27 21:34 julia-docs-2:1.0.3-2/deltas
-rw-r--r--  0 eschwartz eschwartz 502 Dec 27 18:43 julia-docs-2:1.0.3-2/desc
$ bsdtar -xOf custom.db.tar.gz julia-docs-2:1.0.3-2/deltas
%DELTAS%
julia-docs-2:1.0.3-2_to_2:0.6.2-6-x86_64.delta 8ebc8a55fad0fd24f0e43fb2af4b097e 456156 julia-docs-2:1.0.3-2-x86_64.pkg.tar.xz julia-docs-2:0.6.2-6-x86_64.pkg.tar.xz
julia-docs-2:0.6.2-6_to_2:1.0.3-1-x86_64.delta 4c43982999af5fced09c3dcca384a1c5 654120 julia-docs-2:0.6.2-6-x86_64.pkg.tar.xz julia-docs-2:1.0.3-1-x86_64.pkg.tar.xz
$ cleanupdelta custom
$

#17 2018-12-28 08:38:23

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: Delta repositories and fate of archdelta.net

Who created those? They are deltas going in the wrong direction. I'd bet libalpm is just erroring on them.
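The specific problem in the listing above can be detected mechanically. The first delta starts from julia-docs-2:1.0.3-2, the very package the entry belongs to, so it can only lead away from that version. `backward_deltas` is hypothetical and only catches that exact case; a general direction check would need a version comparison such as pacman's `vercmp`.

```shell
# Hypothetical check: print the deltas in a %DELTAS% entry (stdin) whose
# source file is the package the entry belongs to ($1). Such a delta can
# only lead away from that version, never towards it.
backward_deltas() {
    awk -v cur="$1" 'NF == 5 && $4 == cur { print $1 }'
}
```

Run against the two julia-docs lines above with `julia-docs-2:1.0.3-2-x86_64.pkg.tar.xz` as the argument, it flags only the first delta.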

#18 2018-12-28 09:59:40

nkukard
Member
Registered: 2018-12-26
Posts: 24

Re: Delta repositories and fate of archdelta.net

Nonetheless, we cannot just keep N deltas around from A -> B: when C comes along, the only deltas that are really usable are A -> C and B -> C. When D comes along, A -> D, B -> D and C -> D.

I have a proof-of-concept set of changes working now with repo-add, which keeps deltas from each of the last N versions to the latest and automatically removes the ones no longer applicable. I'm parsing the $pkgentry/deltas file, and with a few lines of code it's easy to determine what we need and what we don't.
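Here is a minimal sketch of the "N old versions -> latest" rule, assuming the same `%DELTAS%` entry format as in the listings above. A delta is kept only if it goes from one of the retained older package files straight to the current one. `stale_deltas` is a hypothetical name, and this is not the proof-of-concept code described in this post.

```shell
# Hypothetical sketch: $1 = current package file, $2... = retained older
# package files. Reads a %DELTAS% entry on stdin and prints the delta files
# that do not go from a retained version straight to the current package.
stale_deltas() {
    local cur
    cur=$1
    shift
    awk -v cur="$cur" -v keep="$*" '
        BEGIN { n = split(keep, k, " "); for (i = 1; i <= n; i++) ok[k[i]] = 1 }
        NF == 5 && !($5 == cur && ($4 in ok)) { print $1 }
    '
}
```

With current package D and retained versions B and C, deltas A -> B and A -> D are reported as stale, while B -> D and C -> D are kept.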


#19 2018-12-28 12:03:42

apg
Developer
Registered: 2012-11-10
Posts: 211

Re: Delta repositories and fate of archdelta.net

nkukard wrote:

Nonetheless, we cannot just keep N deltas around from A -> B: when C comes along, the only deltas that are really usable are A -> C and B -> C. When D comes along, A -> D, B -> D and C -> D.

I have a proof-of-concept set of changes working now with repo-add, which keeps deltas from each of the last N versions to the latest and automatically removes the ones no longer applicable. I'm parsing the $pkgentry/deltas file, and with a few lines of code it's easy to determine what we need and what we don't.

Why does C make A -> B unusable?  Use it to update A to B then update that to C.

#20 2018-12-28 12:20:02

nkukard
Member
Registered: 2018-12-26
Posts: 24

Re: Delta repositories and fate of archdelta.net

apg wrote:

Why does C make A -> B unusable?  Use it to update A to B then update that to C.

1. If you have A -> B -> C, the combined size of the A->B and B->C deltas that you would need to download is likely to exceed the size of C. A direct A->C delta would likely be smaller.

2. Isn't B also removed from the mirrors when C comes along? How would we know that B existed? Maybe I'm missing something?

3. I cannot find anywhere in the pacman code where it uses the A->B delta when upgrading from A to C, only A->C... maybe I'm wrong and missed it?
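Point 1 boils down to adding up the size column of the deltas along a chain and comparing the total with the full package. `chain_beats_package` is hypothetical. It reads `%DELTAS%`-style lines for the chain on stdin, and the sizes used below are illustrative only.

```shell
# Hypothetical helper: sum the size column (field 3) of the delta lines on
# stdin, i.e. the bytes a client would download for that chain, and
# succeed only if the total is below the full package size $1.
chain_beats_package() {
    awk -v pkgsize="$1" '
        NF == 5 { total += $3 }
        END { exit !(total < pkgsize + 0) }
    '
}
```

For instance, deltas of 400000 and 500000 bytes would lose to an 800000-byte package (900000 bytes in total), while a single 300000-byte direct delta would win.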


#21 2018-12-28 12:31:20

apg
Developer
Registered: 2012-11-10
Posts: 211

Re: Delta repositories and fate of archdelta.net

nkukard wrote:

2. Isn't B also removed from the mirrors when C comes along? How would we know that B existed? Maybe I'm missing something?

Because we have deltas for B.

nkukard wrote:

3. I cannot find anywhere in the pacman code where it uses the A->B delta when upgrading from A to C, only A->C... maybe I'm wrong and missed it?

I don't know where you see in alpm that it only uses a single delta.  The delta operations use lists.

#22 2018-12-28 12:44:15

nkukard
Member
Registered: 2018-12-26
Posts: 24

Re: Delta repositories and fate of archdelta.net

apg wrote:
nkukard wrote:

2. Isn't B also removed from the mirrors when C comes along? How would we know that B existed? Maybe I'm missing something?

Because we have deltas for B.

nkukard wrote:

3. I cannot find anywhere in the pacman code where it uses the A->B delta when upgrading from A to C, only A->C... maybe I'm wrong and missed it?

I don't know where you see in alpm that it only uses a single delta.  The delta operations use lists.

OK, you're right, sorry... I don't know how I missed that.

So do you only want deltas generated between the current version and the one before it? Or is N old versions to the current version a better idea? Or both?

I can add code for the first case, but I already have code for the second working.


#23 2018-12-28 18:26:32

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Delta repositories and fate of archdelta.net

Allan wrote:

Who created those? They are deltas going in the wrong direction. I'd bet libalpm is just erroring on them.

Why should it be erroring on "them"? The second delta is going in the right direction. The first delta was created with repo-add -d custom.db.tar.gz ${downgraded-package}.pkg.tar.xz, because I wanted to add the old version and then the new version to generate a delta going in the right direction, and I was too lazy to remove the -d flag. :D

At any rate, I removed the first delta.

$ bsdtar -tvf /var/lib/pacman/sync/custom.db | grep julia
drwxr-xr-x  0 eschwartz eschwartz   0 Dec 28 13:15 julia-docs-2:1.0.3-2/
-rw-r--r--  0 eschwartz eschwartz 174 Dec 28 13:15 julia-docs-2:1.0.3-2/deltas
-rw-r--r--  0 eschwartz eschwartz 502 Dec 27 18:43 julia-docs-2:1.0.3-2/desc
$ bsdtar -xOf /var/lib/pacman/sync/custom.db julia-docs-2:1.0.3-2/deltas
%DELTAS%
julia-docs-2:0.6.2-6_to_2:1.0.3-1-x86_64.delta 4c43982999af5fced09c3dcca384a1c5 654120 julia-docs-2:0.6.2-6-x86_64.pkg.tar.xz julia-docs-2:1.0.3-1-x86_64.pkg.tar.xz
$ cleanupdelta custom
$

This is definitely going in the right direction, and it's definitely operating on the copy of the db in pacman's syncdir. Yet cleanupdelta does not list this delta as unneeded, even though it only offers a migration path from a previous pkgver to the current pkgver with pkgrel 1, whereas the repo only offers pkgrel 2, with no second delta to bridge pkgrel 1 and pkgrel 2.


#24 2018-12-28 18:28:35

apg
Developer
Registered: 2012-11-10
Posts: 211

Re: Delta repositories and fate of archdelta.net

"unused" in this context means "too big relative to the actual package".
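To illustrate "too big relative to the actual package", here is a simplified check that flags any single delta larger than a ratio of the package size. `oversized_deltas` is hypothetical. Its 0.7 default mirrors pacman.conf's default `UseDelta` ratio, and libalpm's own `alpm_pkg_unused_deltas` may weigh whole delta paths rather than individual files.

```shell
# Hypothetical, simplified "unused" test: print each delta (from a
# %DELTAS% entry on stdin) whose own size exceeds ratio ($2, default 0.7)
# times the full package size ($1).
oversized_deltas() {
    awk -v pkgsize="$1" -v ratio="${2:-0.7}" '
        NF == 5 && $3 + 0 > ratio * pkgsize { print $1 }
    '
}
```

With a 1000000-byte package and the default ratio, the cutoff is 700000 bytes: an 800000-byte delta is flagged and a 456156-byte one is not.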

#25 2018-12-28 18:47:29

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Delta repositories and fate of archdelta.net

Ah, so cryptic binaries with no documentation, not even in the commit message that added them, strike again...

This is entirely unintuitive, and furthermore I don't know when I would use it, since we already know, at the time we generate a delta, how big it is relative to the package.



Powered by FluxBB