Idea for speeding up Pacman: Indexes

LeafStorm · 2011-03-22 02:01:43

Searching with pacman (e.g. pacman -Ss) is kind of slow. (My laptop is pretty old, but still.) I read some threads a few years back about the possibility of moving pacman to a database, but that never really went anywhere. That is possibly because it would be a lot of work, but also because the folders-and-text-files layout is very simple and Arch-like, and a database would complicate the system.

Still, for search operations, opening and closing every package file in sequence creates quite a bit of overhead. RAM caching helps, but the first pacman run in a session can be quite slow. My idea is to add "index" files for use in search operations. They would be formatted like this:

bash<TAB>4.2.008-1<TAB>The GNU Bourne Again Shell<TAB>core<TAB>base
bzip2<TAB>1.0.6-1<TAB>A high-quality data compression program<TAB>core<TAB>base
gdbm<TAB>1.8.3-8<TAB>GNU database library<TAB>core<TAB><NEWLINE>

In other words, it's a tab-separated values file. Two of them would be maintained in /var/lib/pacman/index (or /var/cache/pacman/index), named local-packages and sync-packages. local-packages has the format "name, version, description, 0/1 (depending on whether it was installed explicitly or as a dependency), groups (comma-separated)". sync-packages has the same format, but with the package's repository in place of the 0/1.
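A minimal Python sketch of reading and writing one line of the proposed index format. The `IndexEntry` type and both function names are hypothetical, not part of pacman; the fourth column is kept as a plain string because it holds "0"/"1" in local-packages and the repository name in sync-packages.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One line of the proposed local-packages / sync-packages index."""
    name: str
    version: str
    description: str
    extra: str  # "0"/"1" in local-packages, repository name in sync-packages
    groups: list[str] = field(default_factory=list)


def parse_index_line(line: str) -> IndexEntry:
    """Split one tab-separated index line into its five fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}: {line!r}")
    name, version, description, extra, groups = fields
    # An empty last column (trailing tab) means the package is in no group.
    return IndexEntry(name, version, description, extra,
                      groups.split(",") if groups else [])


def format_index_line(entry: IndexEntry) -> str:
    """Inverse of parse_index_line: serialize an entry back to one line."""
    for value in (entry.name, entry.version, entry.description, entry.extra):
        if "\t" in value or "\n" in value:
            raise ValueError("fields must not contain tabs or newlines")
    return "\t".join([entry.name, entry.version, entry.description,
                      entry.extra, ",".join(entry.groups)]) + "\n"
```

Because a bare tab is the separator, whatever writes the index has to reject (or escape) descriptions containing tabs or newlines; this sketch simply rejects them.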

Under this scheme, a simple -Ss or -Qs operation would only need to open the index files. The same applies to -Qe, -Qd, -Qg, and -Sg. local-packages would be regenerated whenever a package is installed, removed, etc., and sync-packages would be regenerated after a -Sy operation. A package-files index could also be used for -Qo. (Its format would be even simpler: filename, tab, package, newline.)
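As a rough illustration of why -Ss/-Qs, -Sg/-Qg and -Qo would each need only one file, here is a Python sketch that scans the proposed index lines. The function names are hypothetical; treating the search terms as case-insensitive regular expressions that must all match the name or description is an assumption meant to approximate pacman's search, not a copy of libalpm's code.

```python
import re
from typing import Iterable, Iterator, Optional


def _fields(line: str) -> list[str]:
    """Split one index line: name, version, desc, 0/1-or-repo, groups."""
    return line.rstrip("\n").split("\t")


def search_index(lines: Iterable[str], patterns: list[str]) -> Iterator[list[str]]:
    """-Ss/-Qs style search: yield entries whose name or description
    matches every pattern (regular expressions, case-insensitive)."""
    regexes = [re.compile(p, re.IGNORECASE) for p in patterns]
    for line in lines:
        entry = _fields(line)
        name, description = entry[0], entry[2]
        if all(r.search(name) or r.search(description) for r in regexes):
            yield entry


def packages_in_group(lines: Iterable[str], group: str) -> Iterator[str]:
    """-Sg/-Qg style query: yield the names of packages listing `group`."""
    for line in lines:
        entry = _fields(line)
        groups = entry[4].split(",") if entry[4] else []
        if group in groups:
            yield entry[0]


def owner_of(file_index_lines: Iterable[str], path: str) -> Optional[str]:
    """-Qo style lookup in a 'filename<TAB>package' index; paths are
    compared exactly as stored."""
    for line in file_index_lines:
        filename, package = line.rstrip("\n").split("\t")
        if filename == path:
            return package
    return None
```

Each function reads its index sequentially from top to bottom, so the cost is one open and one linear read instead of one open per package.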

This would help with a lot of the problems that people have suggested databases for (e.g. search performance), without actually adding a database and while keeping things simple. What do you think?

ngoonee · 2011-03-22 02:07:23

I think a bit of searching (both through the forum and through arch-dev-public/pacman-dev) would have told you that pacman-3.5 has already been sped up using a database.

LeafStorm · 2011-03-22 02:15:18

ngoonee wrote:

I think a bit of searching (both through the forum and through arch-dev-public/pacman-dev) would have told you that pacman-3.5 has already been sped up using a database.

Wait, really?

(searches the Internet)

Hey, you're right! I should probably subscribe to arch-dev-public.

Well, it's not exactly a database. But the fact that it reads directly from the tarball should definitely make sync searches faster, with less fragmentation and the like. That's pretty cool. (I still think indexes are a good idea, though.)

ngoonee · 2011-03-22 02:58:19

LeafStorm wrote:

Hey, you're right! I should probably subscribe to arch-dev-public.

Very good idea.

Well, it's not exactly a database. But the fact that it reads directly from the tarball should definitely make sync searches faster, with less fragmentation and the like. That's pretty cool. (I still think indexes are a good idea, though.)

Well, most people are more concerned with the speed than with the implementation. All external access should go through pacman and/or libalpm in any case.

Allan · 2011-03-22 03:34:13

Just to clarify: with pacman-3.5 the sync databases are now a single tar file per repo. The local database had some of its files merged, but it still has a "desc" and a "files" file per package (and potentially a changelog and install file).
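With each repo now in one tarball, regenerating a sync-packages index like the one proposed above after -Sy could be a single sequential read of that file. The Python sketch below assumes each package has a `<name>-<version>/desc` member made of `%KEY%` header lines, value lines, and a terminating blank line, with NAME, VERSION, DESC and GROUPS keys; that layout should be checked against a real database before relying on it, and both function names are hypothetical.

```python
import tarfile


def parse_desc(text: str) -> dict[str, list[str]]:
    """Parse '%KEY%' sections: a header line, then value lines, ended by
    a blank line. Returns {KEY: [values...]}."""
    sections: dict[str, list[str]] = {}
    key = None
    for line in text.splitlines():
        if len(line) > 2 and line.startswith("%") and line.endswith("%"):
            key = line[1:-1]
            sections[key] = []
        elif line == "":
            key = None
        elif key is not None:
            sections[key].append(line)
    return sections


def sync_index_from_db(db_path: str, repo: str) -> list[str]:
    """Build sync-packages lines (name, version, description, repo, groups)
    from every '*/desc' member of one repo's database tarball."""
    lines = []
    with tarfile.open(db_path, "r:*") as tar:  # "r:*" detects gzip, bzip2 and xz
        for member in tar:
            if not (member.isfile() and member.name.endswith("/desc")):
                continue
            raw = tar.extractfile(member).read().decode("utf-8")
            desc = parse_desc(raw)
            lines.append("\t".join([
                desc["NAME"][0],
                desc["VERSION"][0],
                (desc.get("DESC") or [""])[0],
                repo,
                ",".join(desc.get("GROUPS", [])),
            ]) + "\n")
    lines.sort()
    return lines
```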

So the local database format could be improved and everything from indexes to full blown databases has been suggested. However there has been no real agreement (or even decent discussion...) on the alternatives and what their advantages/disadvantages are.

I sort of looked at something similar to what you suggest for file lists. Having a single file with the complete list would be faster than reading lots of small files for a -Qo operation (although -Ql currently only needs to read one of these files...). But updating probably requires rewriting the entire file to filter out old entries and add new ones.
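The "rewrite the entire file" update can be sketched in Python using the filename-tab-package format suggested earlier in the thread. `update_files_index` is a hypothetical name, and writing to a temporary file and then renaming it is one common way to keep readers from seeing a half-written file, not something pacman is known to do.

```python
import os
import tempfile


def update_files_index(index_path: str, package: str, new_files: list[str]) -> None:
    """Rewrite the whole 'filename<TAB>package' index (hypothetical format).

    Drops every line owned by `package`, adds one line per path in
    `new_files`, and keeps the result sorted by filename. Passing an
    empty list removes the package's entries entirely.
    """
    kept: list[tuple[str, str]] = []
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            for line in f:
                filename, owner = line.rstrip("\n").split("\t")
                if owner != package:  # filter out the package's old entries
                    kept.append((filename, owner))
    kept.extend((path, package) for path in new_files)
    kept.sort()

    # Write a complete new file next to the old one, then rename it over
    # the old one, so readers never see a partially written index.
    directory = os.path.dirname(os.path.abspath(index_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".files-index.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            out.writelines(f"{name}\t{owner}\n" for name, owner in kept)
            out.flush()
            os.fsync(out.fileno())
        os.replace(tmp_path, index_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Keeping the lines sorted by filename would also allow a binary search for -Qo instead of a full scan, at the price of the full rewrite on every install, upgrade or removal.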

One option is to put the local database in a tarball, much like the sync data is now. The issue is that libarchive offers no easy way to remove or update files within that tarball, but I believe that can be coded around. Then there is the question of whether the whole database goes in the tarball, or whether we break it up into parts, e.g. the "desc" files in one tarball and the "files" files in another. "changelog" files could go with the "desc" ones, but "install" files probably need to stay separate.
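One way to code around the missing in-place removal is to rebuild the archive: copy every member except the outdated package's, append the new files, and swap the result in. The sketch below uses Python's `tarfile` module rather than libarchive; the `<name>-<version>/` directory layout, the gzip output, and the name `rewrite_tarball` are all assumptions for illustration.

```python
import io
import os
import tarfile
import tempfile


def rewrite_tarball(path: str, package_dir: str, add: dict[str, bytes]) -> None:
    """Rebuild the archive at `path` without anything under `package_dir`
    (e.g. an outdated 'name-version' directory), append the entries in
    `add`, then atomically replace the old archive with the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)  # tarfile reopens the path itself
    try:
        with tarfile.open(path, "r:*") as src, tarfile.open(tmp_path, "w:gz") as dst:
            for member in src:
                # Skip the package's directory entry and everything inside it.
                if member.name == package_dir or member.name.startswith(package_dir + "/"):
                    continue
                data = src.extractfile(member) if member.isfile() else None
                dst.addfile(member, data)
            for name, payload in add.items():
                info = tarfile.TarInfo(name)
                info.size = len(payload)
                dst.addfile(info, io.BytesIO(payload))
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The cost is a full copy of the archive on every change, which is one reason splitting the small "desc" files from the much larger "files" lists into separate tarballs could pay off.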

Anyway, I agree there is something further to be done here and it is something I am interested in coding... It just requires a bit of discussion to figure out what is a good way to do this. The pacman-dev mailing list is the place for this to take place.

Arch Linux
