
#1 2007-05-22 12:42:12

cxzuk
Member
Registered: 2007-05-22
Posts: 4

Pacman Development

Hi all :)

I'm new to Arch Linux, and so far I'm loving it and would like to help with its development.

I have little knowledge of the ins and outs of ABS and Pacman, but after a little research on them I would like to discuss how the current implementations work and what the plans are for the future, and to offer some ideas too.

Pacman,

I have read that the current system uses a file based package management, which *could* be a problem with scale and speed. I have read many comments on moving to a DB system; here are mine:

A database system is a very good choice and should improve both scalability and speed. It could also create opportunities such as more complex searching, "pacman" history, rollback features, etc. One problem raised with a database system is how easily people could customize the database, create packages, etc. With FUSE (http://fuse.sourceforge.net/) we could mount the database as a file system that resembles the current layout, allowing people to edit the database with the current methods (bash scripts etc.).

I would recommend SQLite as the database system, as it has a large community, is well documented, and is used by many other packages, which could create opportunities for pacman to be used with other projects. However, I am unaware of any other database system that can be compiled into the project (to save a dependency if needed) and that is lightweight, fast, etc.
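As a rough illustration of the "more complex searching" and "history" ideas above, here is a minimal sketch using Python's built-in `sqlite3` module. The two-table schema (`packages`, `history`), the helper names, and all version numbers are invented for the example; none of this is taken from pacman.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical package schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE packages (
            name    TEXT PRIMARY KEY,
            version TEXT NOT NULL,
            repo    TEXT NOT NULL,
            descr   TEXT NOT NULL
        );
        CREATE TABLE history (
            id      INTEGER PRIMARY KEY AUTOINCREMENT,
            name    TEXT NOT NULL,
            action  TEXT NOT NULL,   -- 'install', 'upgrade' or 'remove'
            version TEXT NOT NULL
        );
    """)
    # Made-up sample rows; the versions are placeholders, not real releases.
    conn.executemany(
        "INSERT INTO packages VALUES (?, ?, ?, ?)",
        [
            ("bmpx", "1.1-1", "extra", "A media player"),
            ("beagle", "1.0-1", "extra", "Desktop search tool"),
            ("sqlite3", "1.0-1", "current", "An embeddable SQL database engine"),
        ],
    )
    conn.executemany(
        "INSERT INTO history (name, action, version) VALUES (?, ?, ?)",
        [("bmpx", "install", "1.0-1"), ("bmpx", "upgrade", "1.1-1")],
    )
    conn.commit()
    return conn


def search(conn, term, repo=None):
    """Search package names and descriptions, optionally within one repo."""
    pattern = f"%{term}%"
    sql = "SELECT name FROM packages WHERE (name LIKE ? OR descr LIKE ?)"
    params = [pattern, pattern]
    if repo is not None:
        sql += " AND repo = ?"
        params.append(repo)
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]


def previous_version(conn, name):
    """Return the version recorded before the latest history entry, or None."""
    rows = conn.execute(
        "SELECT version FROM history WHERE name = ? ORDER BY id DESC LIMIT 2",
        (name,),
    ).fetchall()
    return rows[1][0] if len(rows) == 2 else None
```

A rollback feature could, under these assumptions, look up `previous_version(conn, "bmpx")` and reinstall that version from a package cache.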

Once the information is in a database, we could in fact take the whole pacman project "online" and provide a web service to search the pacman database. This has the added advantage of being 100% up to date and only returning the information your system requires. Although a server would have the additional CPU overhead of providing the search, I predict that because we would only be sending required information and not "everything", we would in fact lower bandwidth. Hopefully this would make everything much more scalable.

However, if someone did want an offline cache of the database, I would recommend zsync (http://zsync.moria.org.uk/). I haven't looked up its license yet, but zsync as a utility is much better than rsync. zsync provides excellent features for single-file sharing and requires nothing but an HTTP server. It supports compression (into blocks with gzip), checksums, mirrors, etc. There may be a more efficient database syncing tool which may be more appropriate; comments welcome.

OK, well, I think that's it. I would love to know some more in-depth details on how pacman works, such as the language it's written in and its current development, so please reply!

Mike

Offline

#2 2007-05-22 13:34:36

chicha
Member
From: France
Registered: 2007-04-20
Posts: 271

Re: Pacman Development

Hello cxzuk

If you want to know more about how pacman is coded, http://archlinux.org/pacman/ is a good place to start.
I hope you won't be too disappointed, but I am not sure you are going about improving pacman, or talking about it, in the best way.
Also have a look at this page: http://wiki.archlinux.org/index.php/The_Arch_Way
You will see that KISS (Keep It Simple, Stupid) is one of Arch's major features.

In my opinion the best way would be:
You think a feature would be nice? Code it and share it. If people like it, you can be sure it will be integrated and that some people will help you improve your work!

Another way to become an Arch developer: contribute to the wiki and the forum. Open/fix bugs, create/maintain some packages in the AUR (Arch User Repository). Help with translation if English is not your native language... This way you will become a "Trusted User" and then a "Developer".

Once people know you better, the way you work, etc., you can be sure they will ask you to become a developer.
Have a look at the wiki documentation: it clearly explains the differences between the developers and the community.

Do not worry: you can do a lot for the Arch community without being a developer :)
I hope you will find your place here,
Cheers (and welcome),

Chicha

Offline

#3 2007-05-22 15:37:20

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Pacman Development

Everyone always seems to have grand ideas for some ultimate pacman database system.  The simple fact is, I think it's a poor design choice for many reasons (and you've probably seen a handful of these arguments).

I will never add a database to pacman.  That doesn't mean pacman will never have a database, though. You are welcome to do it and send a patch.

As for pacman development, join the pacman-dev mailing list, we also have #archlinux-pacman on freenode.  And here is our git "integration" tree: http://projects.archlinux.org/git/gitwe … ;a=summary (Dan and Andrew both have online trees somewhere, but I'm lazy so mine has no web interface just yet).

Offline

#4 2007-05-22 16:06:55

wain
Member
From: France
Registered: 2005-05-01
Posts: 289
Website

Re: Pacman Development

chicha wrote:

If people like it you can be sure it will be integrated

@cxzuk: good luck :)

Offline

#5 2007-05-22 16:15:47

STiAT
Member
From: Vienna, Austria
Registered: 2004-12-23
Posts: 606

Re: Pacman Development

Dependency tracking with sqlite without cycle checks would give you quite a hard time during development.
Doing it inside the database would be possible with one that supports stored procedures, but that would require a more complex database.
If you step down, query package by package, and afterwards chunk the results together "layer by layer", you won't gain much speed. Try it out with sqlite; you probably have some more ideas than I previously had.
A simple join statement does not protect you from cycles in the dependencies of a package.
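To make the cycle problem concrete, here is a small sketch of the "layer by layer" approach with an explicit cycle guard, written in Python against SQLite. The `depends(pkg, dep)` table and the package names are hypothetical. Without the `seen` set, the loop below would never terminate on the a -> b -> c -> a cycle used in the example.

```python
import sqlite3


def make_db(edges):
    """Build an in-memory table of (package, dependency) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE depends (pkg TEXT NOT NULL, dep TEXT NOT NULL)")
    conn.executemany("INSERT INTO depends VALUES (?, ?)", edges)
    return conn


def resolve_layers(conn, root):
    """Return the dependencies of root as layers, nearest layer first.

    Every package is recorded in `seen` the first time it shows up, so a
    dependency cycle cannot make the loop run forever.
    """
    seen = {root}
    frontier = [root]
    layers = []
    while frontier:
        # One query per layer rather than one query per package.
        placeholders = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT DISTINCT dep FROM depends WHERE pkg IN ({placeholders})",
            frontier,
        ).fetchall()
        next_layer = sorted({dep for (dep,) in rows} - seen)
        if not next_layer:
            break
        seen.update(next_layer)
        layers.append(next_layer)
        frontier = next_layer
    return layers
```

For example, with the edges a -> b, b -> c, c -> a and a -> d, `resolve_layers(conn, "a")` yields one layer containing `b` and `d` and a second layer containing `c`; the edge back to `a` is ignored because `a` has already been seen.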

// STi


Ability is nothing without opportunity.

Offline

#6 2007-05-22 16:50:47

Husio
Member
From: Europe
Registered: 2005-12-04
Posts: 359
Website

Re: Pacman Development

Show us the code...

IMO pacman is good, but with a database it would be faster.

Offline

#7 2007-05-22 17:46:14

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Pacman Development

phrakture wrote:

I will never add a database to pacman.  That doesn't mean pacman will never have a database, though. You are welcome to do it and send a patch.

Just to further this a bit.  If you patch pacman3's backend to support sqlite, and provide a clear upgrade path (i.e. conversion scripts and things of that nature) for local DBs, I am willing to include it.

Offline

#8 2007-05-22 19:31:32

bboozzoo
Member
From: Poland
Registered: 2006-08-01
Posts: 125

Re: Pacman Development

Why sqlite? Is bdb broken or something? rpm uses bdb as far as I remember, and it doesn't seem to be lacking in speed.

Offline

#9 2007-05-22 19:33:31

test1000
Member
Registered: 2005-04-03
Posts: 834

Re: Pacman Development

quote: I have read that the current system uses a file based package management, which *could* be a problem with scale and speed.

It's very, very fast with ext3 and not so fast on the first run (after a reboot) with XFS (for example; I've heard there are other filesystems that are slow with it too).

The cool thing, though, is that this doesn't really present a problem for scale and speed, since Linux already has very good tools to remedy it. You simply:

make a loop file on your XFS, JFS, whatever, and format it as ext3. Then you mount it under the pacman db dir, copy over your backed-up db files, and you're done.

Problem solved. Now how's that for avoiding complexity? This should be included by default...

Also, I believe pacman still has lots of room for speedup?

Another question, though: if I run pacman from a live CD/rescue CD/Arch CD (note: not always an option), would this present problems when running from a chroot? And on that live CD, who's to say I have the right versions of fuse and sqlite installed (or have them at all)? Will this make it harder to save my Arch system should it get b0rked?


KISS = "It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience." - Albert Einstein

Offline

#10 2007-05-22 21:24:13

cxzuk
Member
Registered: 2007-05-22
Posts: 4

Re: Pacman Development

test1000 wrote:

quote: I have read that the current system uses a file based package management, which *could* be a problem with scale and speed.

It's very, very fast with ext3 and not so fast on the first run (after a reboot) with XFS (for example; I've heard there are other filesystems that are slow with it too).

The cool thing, though, is that this doesn't really present a problem for scale and speed, since Linux already has very good tools to remedy it. You simply:

make a loop file on your XFS, JFS, whatever, and format it as ext3. Then you mount it under the pacman db dir, copy over your backed-up db files, and you're done.

Problem solved. Now how's that for avoiding complexity? This should be included by default...

Also, I believe pacman still has lots of room for speedup?

Another question, though: if I run pacman from a live CD/rescue CD/Arch CD (note: not always an option), would this present problems when running from a chroot? And on that live CD, who's to say I have the right versions of fuse and sqlite installed (or have them at all)? Will this make it harder to save my Arch system should it get b0rked?

Just to clarify, SQLite can be compiled into pacman, meaning there would be no dependencies (no need to download sqlite). And FUSE is part of the kernel (modprobe fuse); it would also only be needed if you are manually editing the entries. So devs only.

Mike

Offline

#11 2007-05-22 21:45:02

cactus
Taco Eater
From: t͈̫̹ͨa͖͕͎̱͈ͨ͆ć̥̖̝o̫̫̼s͈̭̱̞͍̃!̰
Registered: 2004-05-25
Posts: 4,622
Website

Re: Pacman Development

This has been talked about to death many many times over.
Search the forums... it is in there...

Search - pacman+sqlite


"Be conservative in what you send; be liberal in what you accept." -- Postel's Law
"tacos" -- Cactus' Law
"t̥͍͎̪̪͗a̴̻̩͈͚ͨc̠o̩̙͈ͫͅs͙͎̙͊ ͔͇̫̜t͎̳̀a̜̞̗ͩc̗͍͚o̲̯̿s̖̣̤̙͌ ̖̜̈ț̰̫͓ạ̪͖̳c̲͎͕̰̯̃̈o͉ͅs̪ͪ ̜̻̖̜͕" -- -̖͚̫̙̓-̺̠͇ͤ̃ ̜̪̜ͯZ͔̗̭̞ͪA̝͈̙͖̩L͉̠̺͓G̙̞̦͖O̳̗͍

Offline

#12 2007-05-22 22:03:51

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Pacman Development

cactus wrote:

This has been talked about to death many many times over.
Search the forums... it is in there...

Search - pacman+sqlite

sqlite is also crap... you could probably find me saying that about 40 times in that search 8)

Offline

#13 2007-05-23 03:58:30

iphitus
Forum Fellow
From: Melbourne, Australia
Registered: 2004-10-09
Posts: 4,927

Re: Pacman Development

And, with pacman3, the filesystem database really isn't that bad. It works well, and is quicker than many of the alternative package management systems.

Offline

#14 2007-05-23 04:09:21

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Pacman Development

iphitus wrote:

And, with pacman3, the filesystem database really isn't that bad. It works well, and is quicker than many of the alternative package management systems.

I have a plan to improve it even more, but I need to do some tests (in essence, cutting the number of files down by 66%)

Offline

#15 2007-05-23 04:12:40

cactus
Taco Eater
From: t͈̫̹ͨa͖͕͎̱͈ͨ͆ć̥̖̝o̫̫̼s͈̭̱̞͍̃!̰
Registered: 2004-05-25
Posts: 4,622
Website

Re: Pacman Development

Yeah, there is plenty of room to improve the existing FS backend...
such as concatenating the depends and desc files... that would probably remove a third of the stat calls right there...
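As a sketch of what concatenating the two files could look like, the Python below appends each package's `depends` file to its `desc` file and then deletes `depends`. It assumes one directory per package under a database root and that both files use `%SECTION%`-style headers separated by blank lines; the function names are invented, and a real migration would also need pacman itself to read the merged layout.

```python
import os


def merge_depends_into_desc(pkg_dir):
    """Append pkg_dir/depends to pkg_dir/desc and delete depends.

    Returns True if a merge happened, False if either file is missing.
    """
    desc_path = os.path.join(pkg_dir, "desc")
    depends_path = os.path.join(pkg_dir, "depends")
    if not (os.path.isfile(desc_path) and os.path.isfile(depends_path)):
        return False
    with open(desc_path, encoding="utf-8") as f:
        desc_text = f.read()
    with open(depends_path, encoding="utf-8") as f:
        depends_text = f.read()
    # Keep exactly one blank line between the old desc content and the
    # appended depends sections.
    merged = desc_text.rstrip("\n") + "\n\n" + depends_text.strip("\n") + "\n"
    with open(desc_path, "w", encoding="utf-8") as f:
        f.write(merged)
    os.remove(depends_path)
    return True


def merge_tree(db_root):
    """Merge every package directory under db_root; return the merge count."""
    merged = 0
    for entry in sorted(os.listdir(db_root)):
        pkg_dir = os.path.join(db_root, entry)
        if os.path.isdir(pkg_dir) and merge_depends_into_desc(pkg_dir):
            merged += 1
    return merged
```

Running `merge_tree` a second time on the same tree merges nothing, since the `depends` files are gone after the first pass.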


"Be conservative in what you send; be liberal in what you accept." -- Postel's Law
"tacos" -- Cactus' Law
"t̥͍͎̪̪͗a̴̻̩͈͚ͨc̠o̩̙͈ͫͅs͙͎̙͊ ͔͇̫̜t͎̳̀a̜̞̗ͩc̗͍͚o̲̯̿s̖̣̤̙͌ ̖̜̈ț̰̫͓ạ̪͖̳c̲͎͕̰̯̃̈o͉ͅs̪ͪ ̜̻̖̜͕" -- -̖͚̫̙̓-̺̠͇ͤ̃ ̜̪̜ͯZ͔̗̭̞ͪA̝͈̙͖̩L͉̠̺͓G̙̞̦͖O̳̗͍

Offline

#16 2007-05-23 07:20:54

STiAT
Member
From: Vienna, Austria
Registered: 2004-12-23
Posts: 606

Re: Pacman Development

phrakture wrote:
cactus wrote:

This has been talked about to death many many times over.
Search the forums... it is in there...

Search - pacman+sqlite

sqlite is also crap... you could probably find me saying that about 40 times in that search 8)

I totally agree on this. I've run tests (not with a fully imported pacman database, but with a similar relational structure, with about 3000 imported datasets) and found that it wasn't much faster than pacman on ext3, since I had to query the database often and also had to compare the values. So the access alone (fopen/stat) could be slower, but I doubt that fopen/stat on a few hundred files takes longer than properly importing a (not that small) sqlite db that holds more values than I actually need.

Reducing the number of files pacman uses could actually increase the speed a lot once more.

// STi


Ability is nothing without opportunity.

Offline

#17 2007-05-23 11:56:21

cxzuk
Member
Registered: 2007-05-22
Posts: 4

Re: Pacman Development

STiAT wrote:
phrakture wrote:
cactus wrote:

This has been talked about to death many many times over.
Search the forums... it is in there...

Search - pacman+sqlite

sqlite is also crap... you could probably find me saying that about 40 times in that search 8)

I totally agree on this. I've run tests (not with a fully imported pacman database, but with a similar relational structure, with about 3000 imported datasets) and found that it wasn't much faster than pacman on ext3, since I had to query the database often and also had to compare the values. So the access alone (fopen/stat) could be slower, but I doubt that fopen/stat on a few hundred files takes longer than properly importing a (not that small) sqlite db that holds more values than I actually need.

Reducing the number of files pacman uses could actually increase the speed a lot once more.

// STi

heya

From what I understand so far, pacman currently searches /var/lib/pacman/[extra|community|current|local] for any directories which match the search criteria. Pacman then searches /var/lib/pacman/[extra|community|current|local]/*/desc for another match.
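The two-stage search described above can be sketched in Python as follows. This is only a model of that description, not pacman's actual code: the `search_db` name is invented, and the layout (one directory per repo, one subdirectory per package, each holding a `desc` file) is assumed from the paths mentioned in this post.

```python
import os


def search_db(db_root, term):
    """Return sorted (repo, package_dir) pairs matching term.

    Stage 1 matches the package directory name; stage 2 only opens the
    package's desc file when the directory name did not match.
    """
    term = term.lower()
    hits = set()
    for repo in os.listdir(db_root):
        repo_dir = os.path.join(db_root, repo)
        if not os.path.isdir(repo_dir):
            continue
        for pkg in os.listdir(repo_dir):
            # Stage 1: directory name match, no file needs to be opened.
            if term in pkg.lower():
                hits.add((repo, pkg))
                continue
            # Stage 2: fall back to the contents of the desc file.
            desc_path = os.path.join(repo_dir, pkg, "desc")
            try:
                with open(desc_path, encoding="utf-8", errors="replace") as f:
                    if term in f.read().lower():
                        hits.add((repo, pkg))
            except OSError:
                # Missing or unreadable desc file: skip this entry.
                continue
    return sorted(hits)
```

Every stage 2 check costs an open and a read per package, which is the per-file overhead the rest of the thread is arguing about.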

The kernel is in fact caching every file apart from "depends". Maybe the kernel is better at caching than a database? I suspect that a database-driven setup would not have a great deal of speed benefit at the current package size of Arch. I believe, though, that a database-driven setup would make future features easier, such as linking ABS and Pacman closer together, so you could specify a particular configuration of a package (enable/disable), and pacman would automatically know that this package must be compiled for your needs. If, however, you decide to install an extra package which may be usable in another program, pacman can ask you if you wish to recompile/upgrade.

e.g.

Say I install bmpx, which has these options:
aac alsa cdparanoia debug flac ffmpeg hal mad modplug musepack nls ofa ogg oss p2p python sid theora vorbis

If I specifically want only alsa, pacman would compile bmpx rather than use binaries, and would ignore all the other dependencies it wants to install (such as ogg, hal, sid, ffmpeg, etc.).

If then, at a later date, I install beagle, which uses HAL, pacman could see I have a package compiled without hal support and offer me the choice to recompile to include it.
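A minimal sketch of that check, assuming pacman recorded which build options each installed package supports and which were enabled at compile time. The `InstalledPackage` record and `rebuild_candidates` function are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstalledPackage:
    """Hypothetical record of how a package was built."""
    name: str
    available_options: frozenset
    enabled_options: frozenset


def rebuild_candidates(installed, new_feature):
    """Names of packages that support new_feature but were built without it."""
    return sorted(
        pkg.name
        for pkg in installed
        if new_feature in pkg.available_options
        and new_feature not in pkg.enabled_options
    )


# Example data taken from the bmpx option list above, built with alsa only.
BMPX_OPTIONS = frozenset(
    "aac alsa cdparanoia debug flac ffmpeg hal mad modplug musepack nls ofa "
    "ogg oss p2p python sid theora vorbis".split()
)
bmpx = InstalledPackage("bmpx", BMPX_OPTIONS, frozenset({"alsa"}))
```

Installing something that brings in HAL would then trigger `rebuild_candidates([bmpx], "hal")`, and pacman could offer to recompile whatever that call returns.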

It would hopefully work the other way round too, which is where it gets most interesting. I have many a time been forced to uninstall EVERYTHING (on rpm- and deb-based systems) because there is a new low-level library update (libc or something). This cascading effect has broken my system countless times: a mirror would fail or a package wouldn't install, and it's too late, it has uninstalled everything already!

Has anyone else had this problem?

As this mainly happens with libraries, and as far as I'm aware you can have multiple versions of a library installed, it would be far better to install the new version of the library and keep the old one if a program still requires it. Once all packages have been updated or recompiled to use the newer library, the old one can be safely removed.
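The "keep the old library until nothing needs it" rule can be sketched as a simple bookkeeping function. Everything here is an assumption for illustration: the function name, the idea that each package records the exact library version it was built against, and the placeholder library and package names.

```python
def removable_library_versions(installed_versions, current_versions, requirements):
    """Return {library: [versions]} that no installed package still needs.

    installed_versions: {library: set of installed versions}
    current_versions:   {library: newest version, which is always kept}
    requirements:       {package: {library: version it was built against}}
    """
    # Collect every library version that some package still depends on.
    needed = {}
    for libs in requirements.values():
        for lib, version in libs.items():
            needed.setdefault(lib, set()).add(version)

    removable = {}
    for lib, versions in installed_versions.items():
        keep = set(needed.get(lib, set()))
        if lib in current_versions:
            keep.add(current_versions[lib])
        unused = sorted(versions - keep)
        if unused:
            removable[lib] = unused
    return removable
```

While one package is still built against the old version, nothing is reported as removable; once that package has been rebuilt against the new version, the old one shows up in the result and can be cleaned out.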

Mike

Offline

#18 2007-05-23 16:01:04

STiAT
Member
From: Vienna, Austria
Registered: 2004-12-23
Posts: 606

Re: Pacman Development

Let me get a bit more into detail.

Pacman can search files, but searching files / directories isn't that slow. Remember, there are not too many files / dirs (yeah yeah, sizing and so on... I fully agree with you there).

The kernel caches, as you already know. I'm not sure how it works with sqlite; normally, a database WITH caching functionality handles this itself, loading and restoring the cache when the db restarts. A database implementation also implements its own storage / cache algorithm (I think sqlite does as well). Since pacman isn't a program which is constantly running and needing fast access to the data here and there, I don't see a real benefit (for pacman as an application).
The difference will show in permanently running programs, such as GUIs for ALPM, since they actually get the full benefit of the loaded sqlite database.

I agree with you that databases have more staying power, especially when it comes to saving information about installed packages when you customize them. But hey, customizing packages? If I want Gentoo, I set up Gentoo :D.

I think it's worth a try all the same. As I mentioned, I've built test scripts with some thousand entries in sqlite with cycle detection, but I did not get any speed benefit out of it. I'm not perfect, not even close, when it comes to sqlite; I don't know any of its internals and have no clue about performance tuning there (my usual db backend is Oracle *sigh*).

I totally agree with you that a database interface could be the way to an "easier" alpm or backend in general. It could also speed up development. It could, but I don't know how much time the devs actually spend on their data storage procedures. I also don't have the knowledge of pacman / alpm that Aaron or Dan have.

// STi


Ability is nothing without opportunity.

Offline

#19 2007-05-23 16:05:36

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Pacman Development

cxzuk wrote:

would make future features easier, such as linking ABS and Pacman closer together, so you could specify a particular configuration of a package (enable/disable), and pacman would automatically know that this package must be compiled for your needs. If, however, you decide to install an extra package which may be usable in another program, pacman can ask you if you wish to recompile/upgrade.

srcpac already does this.  It is a wrapper around pacman to handle things like this (custom compiles).  I, however, usually just recompile the package, -U it, and add the package to IgnorePkg, in which case I use the "ignoring package upgrade" output to know when to rebuild.

Offline
