
#1 2022-04-12 08:59:53

franckbret
Member
Registered: 2022-04-12
Posts: 6
Website

Best strategy to list all existing core packages?

Hi, I'm looking for the best way to list all existing Arch Linux packages, together with their versions and artifacts.

This is related to the Software Heritage project ( https://archive.softwareheritage.org/ ), whose ambition is to collect, preserve, and share all software that is publicly available in source code form.

Basically, I would like to get a list of packages with their versions, something like:

{"name": "pkgname", "versions": ["0.2.1", "0.2.0"]}

Additionally, if there are fields like "last_update", "branch", "target", "archive_name", "archive_url", "checksum", etc., that's even better!

I can see at least two ways to go; the goal is to have the lightest and most resilient pattern for staying up to date.

A - Cloning the main repository once, extracting data from PKGBUILD files, and then regularly svn/git fetching to get the differential
B - Scraping the JSON API with something like GET https://archlinux.org/packages/search/j … ate&page=1 . For the differential, browse back until the previous last_update date
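
For option B, the differential walk could look roughly like the sketch below. It assumes the results come back newest first and that each result carries an ISO 8601 `last_update` field. `fetch_page` is a hypothetical callable standing in for the real HTTP request, so nothing here depends on the exact query string.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def updated_since(fetch_page, last_visit):
    """Yield results newer than last_visit, stopping at the first older one.

    fetch_page(n) is a hypothetical callable returning the list of result
    dicts for page n (1-based), sorted newest first, or [] past the end.
    last_visit must be a timezone-aware datetime.
    """
    page = 1
    while True:
        results = fetch_page(page)
        if not results:
            return
        for item in results:
            if parse_ts(item["last_update"]) <= last_visit:
                return  # everything after this point is already known
            yield item
        page += 1
```

Stopping at the first already-seen entry keeps the number of HTTP calls proportional to the number of changes rather than to the size of the repositories.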

What do you think? Is there another way to get up to date?

Also, are there any guidelines on scraping and API consumption frequency?


#2 2022-04-12 12:18:15

lahwaacz
Wiki Admin
From: Czech Republic
Registered: 2012-05-29
Posts: 762

Re: Best strategy to list all existing core packages?

The best way is to extract information from the binary package manager database which is available on any Arch mirror. This is trivial if you installed Arch Linux (just run "pacman -Ss" or "pacman -Si pkgname"). There are also Python bindings which are somewhat more flexible.


#3 2022-04-12 13:20:00

franckbret
Member
Registered: 2022-04-12
Posts: 6
Website

Re: Best strategy to list all existing core packages?

At first glance, running pacman seems impossible to me, but maybe I'm wrong; I will check with the team whether it's an option.
We usually work with APIs, DVCS repositories, plain-text listings, etc., and try to keep bandwidth consumption, HTTP calls, and dependencies as low as we can.
By the way, the project is Python-based, so using the library may be an option too.

If we go the pacman way, is it possible to list all newly released versions since a given date, for example?

And sorry for asking so many questions, but what looks bad from your point of view about solution A (i.e. the svn/git repository)? Isn't it a trusted source?


#4 2022-04-12 13:36:19

lahwaacz
Wiki Admin
From: Czech Republic
Registered: 2012-05-29
Posts: 762

Re: Best strategy to list all existing core packages?

The largest binary database file, community.db.tar.gz, is just about 7 MB. I don't know how large the svn/git repository is, but my guess is that it's orders of magnitude larger. Also, the binary database is structured, so you wouldn't need to implement your own PKGBUILD parser, for example.

franckbret wrote:

If we go the pacman way, is it possible to list all newly released versions since a given date, for example?

pacman itself does not have such a filter, but the last update timestamp is stored in the database, so you can use it in a custom filter.

Last edited by lahwaacz (2022-04-12 13:41:16)


#5 2022-04-12 13:39:53

a821
Member
Registered: 2012-10-31
Posts: 381

Re: Best strategy to list all existing core packages?

The pacman database is a tarball that contains flat files (with package information) that can be easily parsed in Python. I recall that someone even wrote a Python library for that and posted it on this forum.
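
To give an idea of how little code that takes, here is a minimal sketch using only the standard library. It assumes each package has a `desc` entry in the tarball, made of `%FIELD%` header lines followed by one value per line and closed by a blank line. The function names are made up for illustration.

```python
import io
import tarfile


def parse_desc(text):
    """Parse one 'desc' entry into {field: [values]}.

    Assumed layout: a '%FIELD%' header line, one value per following
    line, and a blank line closing the field.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith("%") and line.endswith("%") and len(line) > 2:
            current = line.strip("%").lower()
            fields[current] = []
        elif line and current is not None:
            fields[current].append(line)
        else:
            current = None  # blank line ends the current field
    return fields


def read_repo_db(fileobj):
    """Yield one parsed dict per package found in a repo database tarball."""
    # "r:*" lets tarfile detect the compression (gzip, bzip2, xz) itself.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith("/desc"):
                raw = tar.extractfile(member).read().decode("utf-8")
                yield parse_desc(raw)


def read_repo_db_bytes(data):
    """Same as read_repo_db, for a database already downloaded into memory."""
    return read_repo_db(io.BytesIO(data))
```

Usage would be something like opening a downloaded core.db.tar.gz in binary mode and iterating over `read_repo_db(fh)`; every value is a list because some fields (dependencies, licenses) hold several entries.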


#6 2022-04-12 13:45:24

Slithery
Administrator
From: Norfolk, UK
Registered: 2013-12-01
Posts: 5,776

Re: Best strategy to list all existing core packages?

franckbret wrote:

If we go the pacman way, is it possible to list all newly released versions since a given date, for example?

The pacman DB only contains details for the currently available version of each package, so you'd have to keep your own history. There is no way to get the history from before you first downloaded the DB.
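
Keeping that history yourself can be as simple as merging each new snapshot into a running record. A rough sketch, where `merge_snapshot` and both dict layouts are only assumptions made for illustration:

```python
def merge_snapshot(history, snapshot):
    """Record versions from snapshot that history has not seen yet.

    history:  {pkgname: [versions, oldest first]}, updated in place
    snapshot: {pkgname: current version} taken from a fresh database
    Returns the list of (pkgname, version) pairs that were new.
    """
    added = []
    for name, version in sorted(snapshot.items()):
        seen = history.setdefault(name, [])
        if version not in seen:
            seen.append(version)
            added.append((name, version))
    return added
```

Any version that comes and goes between two downloads is still missed, so how often you poll decides how complete the history is.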


No, it didn't "fix" anything. It just shifted the brokeness one space to the right. - jasonwryan
Closing -- for deletion; Banning -- for muppetry. - jasonwryan

aur - dotfiles


#7 2022-04-12 13:54:05

franckbret
Member
Registered: 2022-04-12
Posts: 6
Website

Re: Best strategy to list all existing core packages?

Ah, thanks, this looks like a good option; I will explore it too.
I have the core repository on my disk, and for comparison it's 36 MB against 156 kB for core.db.tar.gz.


#8 2022-04-12 14:07:51

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,330
Website

Re: Best strategy to list all existing core packages?

I doubt you really want literally just the 'core' repository.  Arch Linux has one repository actually named "core", but it would never be used on its own (it might be theoretically possible, but it'd not be supported, and that's just not how it's meant to be used).  In the more general meaning of the word, Arch Linux has three core repos that are the default or base for the distro: core, extra, and community.

For your "new releases since a given date" question, whether you can answer this with just the repo database depends on exactly what is meant by the question.  For example, if you want to know every incremental update to package X that has happened in the last 6 months, that will not be in the database file.  However, if you just want to know every package that has been updated within the past 6 months, that *is* in the database.  Specifically, the database will include the *current* version of every package along with the date it was built / added to the repo, so this can be used to filter out any packages that have not been updated since a given date (thus leaving those that have been).
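
As a sketch of that kind of filter, assuming you already have one dict per package with `name`, `version` and `builddate` keys, and assuming the build date is stored as seconds since the Unix epoch (`built_since` is a made-up name):

```python
from datetime import datetime, timezone


def built_since(packages, since):
    """Return (name, version, build time) for packages built at or after since.

    packages: iterable of dicts with 'name', 'version' and 'builddate',
              where builddate is assumed to be seconds since the Unix epoch.
    since:    timezone-aware datetime.
    """
    cutoff = since.timestamp()
    recent = []
    for pkg in packages:
        built = int(pkg["builddate"])
        if built >= cutoff:
            when = datetime.fromtimestamp(built, tz=timezone.utc)
            recent.append((pkg["name"], pkg["version"], when))
    # Newest builds first.
    return sorted(recent, key=lambda row: row[2], reverse=True)
```

This only tells you which packages currently have a build newer than the cutoff, not how many intermediate releases happened in between.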

Last edited by Trilby (2022-04-12 14:12:20)


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman


#9 2022-04-12 16:37:11

papajoke
Member
From: france
Registered: 2014-10-10
Posts: 40

Re: Best strategy to list all existing core packages?

a821 wrote:

can be easily parsed in Python. I recall that someone even wrote a Python library for that and posted it on this forum.

Here it is; first, download the *.db files from a mirror:
https://bbs.archlinux.org/viewtopic.php … 4#p1969414

Last edited by papajoke (2022-04-12 16:49:23)


lts - zsh - Kde - Intel Core i3 - 6Go RAM - GeForce 405 video-nouveau


#10 2022-04-12 17:19:24

a821
Member
Registered: 2012-10-31
Posts: 381

Re: Best strategy to list all existing core packages?

papajoke wrote:
a821 wrote:

can be easily parsed in Python. I recall that someone even wrote a Python library for that and posted it on this forum.

Here it is; first, download the *.db files from a mirror:
https://bbs.archlinux.org/viewtopic.php … 4#p1969414

Thanks, I was sure I had seen it posted here somewhere :)


#11 2022-04-12 19:36:34

Alad
Wiki Admin/IRC Op
From: Bagelstan
Registered: 2014-05-04
Posts: 2,418
Website

Re: Best strategy to list all existing core packages?

The Arch Linux Archive contains all released versions of a package since it was released, e.g. https://archive.archlinux.org/packages/a/abiword/

Apart from the checksums, you can get the information you need directly from there.
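
If you go through the archive listings, the version can be recovered from the file names alone. A rough sketch, assuming the usual `pkgname-pkgver-pkgrel-arch.pkg.tar.*` naming; `split_pkg_filename` is just an illustrative helper name:

```python
import re

# Assumed naming scheme: <pkgname>-<pkgver>-<pkgrel>-<arch>.pkg.tar[.<ext>]
# Only pkgname may contain hyphens, so the last three fields are split off.
_PKG_FILE = re.compile(
    r"^(?P<name>.+)-(?P<pkgver>[^-]+)-(?P<pkgrel>[^-]+)-(?P<arch>[^-]+)"
    r"\.pkg\.tar(?:\.[a-z0-9]+)?$"
)


def split_pkg_filename(filename):
    """Split a package file name into name, version and arch.

    Returns None for anything that does not match, such as '.sig' files.
    """
    match = _PKG_FILE.match(filename)
    if match is None:
        return None
    return {
        "name": match["name"],
        "version": f"{match['pkgver']}-{match['pkgrel']}",
        "arch": match["arch"],
    }
```

Feeding it every link found in a package's directory listing and dropping the `None` results gives the list of versions for that package.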


Mods are just community members who have the occasionally necessary option to move threads around and edit posts. -- Trilby


#12 2022-04-13 07:54:12

franckbret
Member
Registered: 2022-04-12
Posts: 6
Website

Re: Best strategy to list all existing core packages?

Trilby wrote:

I doubt you really want literally just the 'core' repository.  Arch Linux has one repository actually named "core", but it would never be used on its own (it might be theoretically possible, but it'd not be supported, and that's just not how it's meant to be used).  In the more general meaning of the word, Arch Linux has three core repos that are the default or base for the distro: core, extra, and community.

Right, so Arch repos are core + extra + community

Trilby wrote:

For your "new releases since a given date" question, whether you can answer this with just the repo database depends on exactly what is meant by the question.  For example, if you want to know every incremental update to package X that has happened in the last 6 months, that will not be in the database file.  However, if you just want to know every package that has been updated within the past 6 months, that *is* in the database.  Specifically, the database will include the *current* version of every package along with the date it was built / added to the repo, so this can be used to filter out any packages that have not been updated since a given date (thus leaving those that have been).

Behind the scenes, discovering all packages and their existing versions is a two-step process. First, get the whole list through a database, API, DVCS, or whatever source we can trust. This step is done once and sets a last-visit date.

Second, regularly launch an update that checks what's new since the last visit.
In a DVCS it's quite easy to get the list of files changed since a date.
With an API it depends, but generally speaking there are date fields.

If there is no way to get more than the last 6 months of versions, well, that's OK, we must start somewhere :-)


#13 2022-04-13 07:54:50

franckbret
Member
Registered: 2022-04-12
Posts: 6
Website

Re: Best strategy to list all existing core packages?

Interesting, Thanks!

papajoke wrote:
a821 wrote:

can be easily parsed in Python. I recall that someone even wrote a Python library for that and posted it on this forum.

Here it is; first, download the *.db files from a mirror:
https://bbs.archlinux.org/viewtopic.php … 4#p1969414


#14 2022-04-13 12:07:36

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,330
Website

Re: Best strategy to list all existing core packages?

Given your goals I suspect the svn/git repos of the build files would be best.  This will have all the history of every package in the given repo.  So you'd clone three svn (or git mirror) repos of core, extra, and community, then all the data you need is there.
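
One caveat if you go that way: PKGBUILDs are bash scripts, so the only fully reliable parser is bash itself. For a first pass you could pick up just the literal assignments, along the lines of this sketch (`literal_fields` is a made-up name, and anything using arrays or variable expansion is deliberately skipped):

```python
import re

# Matches simple top-level assignments such as: pkgver=1.2.3  or  pkgrel="2"
# Values containing quotes, whitespace, parentheses or '$' are not matched.
_ASSIGN = re.compile(
    r"""^(pkgname|pkgver|pkgrel|epoch)=(["']?)([^"'\s()$]+)\2\s*$""",
    re.MULTILINE,
)


def literal_fields(pkgbuild_text):
    """Collect literal pkgname/pkgver/pkgrel/epoch assignments.

    Arrays, variable expansions and anything computed in shell are
    skipped on purpose; a missing key means "could not tell".
    """
    return {key: value for key, _quote, value in _ASSIGN.findall(pkgbuild_text)}
```

Running this over each revision of a PKGBUILD in the history gives the (pkgver, pkgrel) pairs that were committed, with gaps wherever the values are computed.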

The Arch Linux Archive was mentioned, but you'd have to iterate through *every* individual package to get a list of dates/versions from different urls for each package.  For a single package, this is great - but for all packages it'd be a bad idea.


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman


#15 2022-04-13 13:08:03

franckbret
Member
Registered: 2022-04-12
Posts: 6
Website

Re: Best strategy to list all existing core packages?

Good point. I will experiment with svn/git first and post my progress here.

Trilby wrote:

Given your goals I suspect the svn/git repos of the build files would be best.  This will have all the history of every package in the given repo.  So you'd clone three svn (or git mirror) repos of core, extra, and community, then all the data you need is there.

The Arch Linux Archive was mentioned, but you'd have to iterate through *every* individual package to get a list of dates/versions from different urls for each package.  For a single package, this is great - but for all packages it'd be a bad idea.
