
#1 2010-09-17 13:22:22

solstice
Member
Registered: 2006-10-27
Posts: 236

pkgfile rewritten in python with new feature

Hi.

I have rewritten pkgfile (from the pkgtools package) in Python. It speeds up searches by using an sqlite db file.

$ pkgfile -h
Usage: pkgfile [ACTIONS] [OPTIONS] filename

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -b, --binaries        only show files in a {s}bin/ directory. Works with -s,
                        -l
  -c, --case-sensitive  make searches case sensitive
  -g, --glob            allow the use of * and ? as wildcards.
  -r, --regex           allow the use of regex in searches
  -L, --local           search only in the local pacman repository
  -v, --verbose         enable verbose output

  ACTIONS:
    -i, --info          provides information about the package owning a file
    -l, --list          list files of a given package; similar to "pacman -Ql"
    -s, --search        search which package owns a file
    -u, --update        update to the latest filelist. This requires write
                        permission to /var/cache/pkgtools/lists

A new feature is that it does not download a file list again when the repo has no newer one, which saves bandwidth.
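
A minimal Python sketch of how such a check could work, assuming the decision is made by comparing the server's HTTP Last-Modified header with the modification time of the cached list. This is only an illustration: needs_download is a hypothetical name, and the thread does not say which mechanism pkgfile.py really uses.

```python
import os
from email.utils import parsedate_to_datetime


def needs_download(local_path, last_modified_header):
    """Return True if the remote file list looks newer than the cached copy.

    Hypothetical helper. `last_modified_header` is the value of an HTTP
    Last-Modified header, e.g. "Fri, 17 Sep 2010 13:22:22 GMT".
    """
    if not os.path.exists(local_path):
        return True  # nothing cached yet
    if not last_modified_header:
        return True  # server gave no date, so download to be safe
    remote_time = parsedate_to_datetime(last_modified_header).timestamp()
    # Only download when the server copy is newer than what we already have
    return remote_time > os.path.getmtime(local_path)
```

The header can be obtained with a HEAD request, so the tarball itself is only transferred when the function returns True.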

It's a fork on GitHub, so you can try it from there: git://github.com/solsticedhiver/pkgtools.git
Or just change the _gitroot variable to git://github.com/solsticedhiver/pkgtools.git in the PKGBUILD for pkgtools-git from the AUR.

I talked it over with Daenyth, who showed some interest in it and made me clean up my coding style ;-). We're still waiting for him to accept the pull request on GitHub (into a new branch in his git repo?) and possibly merge it upstream later.

Tell me what you think about it.
If you find bugs, you can report them on GitHub.


#2 2010-09-17 13:34:13

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: pkgfile rewritten in python with new feature

After just changing the _gitroot, make complained:

==> Starting make...
rm: cannot remove `/home/karol/test/t1/pkgtools-git/src/pkgtools-build': No such file or directory
    Aborting...

'touch pkgtools-build' "fixed it" ;P

Core, extra, community and local repo were OK, but I got errors for other repos:

:: Checking [heftig] for files list ...
:: Downloading http://archlinux.ro/~heftig/repo/i686/heftig.files.tar.gz ...
:: Converting [heftig] file list ...
Error: Unable to open /tmp/tmpxe6Gjg.gz
:: Checking [xyne-any] for files list ...
:: Downloading http://xyne.archlinux.ca/repos/xyne-any/xyne-any.files.tar.gz ...
:: Converting [xyne-any] file list ...
Error: Unable to open /tmp/tmpt1QgB0.gz
:: Checking [unarch] for files list ...
:: Downloading http://us4all.info/unarch/arch/i686/unarch.files.tar.gz ...
:: Converting [unarch] file list ...
Error: Unable to open /tmp/tmpcbuuaW.gz
:: Checking [archlinuxfr] for files list ...
:: Downloading http://repo.archlinux.fr/i686/archlinuxfr.files.tar.gz ...
:: Converting [archlinuxfr] file list ...
Done
:: Checking [archstuff] for files list ...
:: Downloading http://archstuff.vs169092.vserver.de/i686/archstuff.files.tar.gz ...
:: Converting [archstuff] file list ...
Error: Unable to open /tmp/tmpycP5bc.gz
:: Checking [arch-games] for files list ...
:: Downloading http://pseudoform.org/arch-games/games/i686/arch-games.files.tar.gz ...
:: Converting [arch-games] file list ...
Done
:: Checking [dragonlord] for files list ...
:: Downloading http://repo.dragonlord.cz/arch/i686/dragonlord.files.tar.gz ...
:: Converting [dragonlord] file list ...
Error: Unable to open /tmp/tmpy5RGe8.gz

Can I just post here or do I have to report it on github?


This pkgfile implementation still takes about 30s to find what I'm looking for, but it does so without thrashing my disk. Good work :-)

Last edited by karol (2010-09-17 13:55:22)


#3 2010-09-17 16:17:59

solstice
Member
Registered: 2006-10-27
Posts: 236

Re: pkgfile rewritten in python with new feature

About the error in the PKGBUILD: changing _gitroot was only a suggestion for getting a package easily, and I had not checked it. I was using the PKGBUILD-git from the GitHub repo, and I can't fix it.

About the errors you got with pkgfile.py:
The file lists simply don't exist on those repo servers! They all give me 404 errors, so these repos don't provide a file list.

I will change my code to handle those 404 errors better and not try to open a downloaded file that doesn't exist.
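
A minimal sketch of that kind of handling, assuming the download goes through Python's urllib. It is not the code from pkgfile.py: fetch_file_list and its opener parameter are hypothetical names, and the opener is only injectable so the function can be tried without network access.

```python
import urllib.error
import urllib.request


def fetch_file_list(url, opener=urllib.request.urlopen):
    """Return the raw bytes of a repo's file list, or None if it is missing.

    Hypothetical helper. `opener` defaults to urllib.request.urlopen.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # this repo does not publish a file list; skip it
        raise  # any other HTTP error is still reported to the caller
```

The caller would then run the conversion step only when the result is not None, so no attempt is made to open a temporary file that was never written.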

Thanks for the report, Karol


#4 2010-09-17 17:23:40

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: pkgfile rewritten in python with new feature

[karol@black test]$ pkgfile -vsb sudo
core/sudo (1.7.4.p4-1) : /etc/pam.d/sudo
core/sudo (1.7.4.p4-1) : /usr/bin/sudo
community/logwatch (7.3.6-3) : /usr/share/logwatch/scripts/services/sudo
[karol@black test]$ pkgfile -vb sudo
core/sudo (1.7.4.p4-1) : /etc/pam.d/sudo
core/sudo (1.7.4.p4-1) : /usr/bin/sudo
community/logwatch (7.3.6-3) : /usr/share/logwatch/scripts/services/sudo

Why is /usr/share/logwatch/scripts/services/sudo included in binaries search ('-b' switch)?


#5 2010-09-17 19:41:53

solstice
Member
Registered: 2006-10-27
Posts: 236

Re: pkgfile rewritten in python with new feature

Oops, I forgot to implement that. I just fixed it.
I also fixed the first bug, but I need to rewrite that code, which looks bad right now.
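
For illustration, the -b filter could be sketched in Python as below. This assumes that "in a {s}bin/ directory" means the file's immediate parent directory is named bin or sbin. in_bin_directory is a hypothetical name, not the function used in pkgfile.py.

```python
import posixpath


def in_bin_directory(path):
    """Return True if `path` sits directly inside a bin/ or sbin/ directory."""
    parent = posixpath.basename(posixpath.dirname(path))
    return parent in ("bin", "sbin")
```

With this test, /usr/bin/sudo is kept, while /etc/pam.d/sudo and /usr/share/logwatch/scripts/services/sudo are dropped.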


#6 2010-09-17 20:11:44

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: pkgfile rewritten in python with new feature

solstice wrote:

Oops, I forgot to implement that. I just fixed it.
I also fixed the first bug, but I need to rewrite that code, which looks bad right now.

OK, no need to rush :-)

If I know what I'm looking for, I still prefer

farm () { curl "http://arm.konnichi.com/find/?raw=1&fn=$1"; }

It's a service provided by the same guy who runs the Arch Rollback Machine.

[karol@black test]$ time pkgfile -bv sendmail
community/esmtp (1.2-1) : /usr/lib/sendmail
community/esmtp (1.2-1) : /usr/sbin/sendmail
community/exim (4.72-3) : /usr/lib/sendmail
community/exim (4.72-3) : /usr/sbin/sendmail
community/logwatch (7.3.6-3) : /usr/share/logwatch/scripts/services/sendmail
extra/courier-mta (0.62.1-6) : /usr/bin/sendmail
extra/courier-mta (0.62.1-6) : /usr/sbin/sendmail
extra/postfix (2.7.1-1) : /usr/sbin/sendmail
extra/quilt (0.48-2) : /usr/share/quilt/compat/sendmail
extra/ssmtp (2.64-2) : /usr/sbin/sendmail

real    0m20.262s
user    0m17.439s
sys    0m0.553s
[karol@black test]$ time farm /usr/sbin/sendmail
extra/courier-mta 0.62.1-6
extra/postfix 2.7.0-3
extra/ssmtp 2.64-2
community/esmtp 1.2-1
community/exim 4.71-5

real    0m1.401s
user    0m0.113s
sys    0m0.010s


#7 2010-09-17 20:43:44

solstice
Member
Registered: 2006-10-27
Posts: 236

Re: pkgfile rewritten in python with new feature

$ time pkgfile -svb sendmail
community/esmtp (1.2-1) : /usr/sbin/sendmail
community/exim (4.72-3) : /usr/sbin/sendmail
extra/courier-mta (0.62.1-6) : /usr/bin/sendmail
extra/courier-mta (0.62.1-6) : /usr/sbin/sendmail
extra/postfix (2.7.1-1) : /usr/sbin/sendmail
extra/ssmtp (2.64-2) : /usr/sbin/sendmail

real    0m10.660s
user    0m10.413s
sys    0m0.237s

Or you could also try pkgfile -sv usr/sbin/sendmail

What's your CPU? Here it takes half the time you reported.


#8 2010-09-18 09:59:35

solstice
Member
Registered: 2006-10-27
Posts: 236

Re: pkgfile rewritten in python with new feature

Fixes pushed to github


#9 2010-09-18 21:50:32

Daenyth
Forum Fellow
From: Boston, MA
Registered: 2008-02-24
Posts: 1,244

Re: pkgfile rewritten in python with new feature

I should have this pulled to master soon. Hopefully by the end of the weekend, maybe by tonight.

Once I do, it's going to sit in git for a while before release, since it's going to need some polish :)


#10 2010-09-28 09:34:36

solstice
Member
Registered: 2006-10-27
Posts: 236

Re: pkgfile rewritten in python with new feature

$ time ./pkgfile.py -svb sendmail
extra/courier-mta (0.62.1-6) : /usr/sbin/sendmail
extra/courier-mta (0.62.1-6) : /usr/bin/sendmail
extra/postfix (2.7.1-1) : /usr/sbin/sendmail
extra/ssmtp (2.64-2) : /usr/sbin/sendmail
community/esmtp (1.2-1) : /usr/sbin/sendmail
community/exim (4.72-3) : /usr/sbin/sendmail

real    0m0.976s
user    0m0.883s
sys    0m0.087s

I don't think you can do much better than that. @Karol: is that fast enough?

It uses the pkgfile2 Python module, written in C by Thomas Bächler, with a few patches.

This should land in my GitHub repo soon.


#11 2010-09-28 11:41:33

Dieter@be
Forum Fellow
From: Belgium
Registered: 2006-11-05
Posts: 2,001

Re: pkgfile rewritten in python with new feature

What's the idea behind the sqlite db file? What do you intend to store in it, and where will you store it?


< Daenyth> and he works prolifically
4 8 15 16 23 42


#12 2010-09-28 12:49:58

solstice
Member
Registered: 2006-10-27
Posts: 236

Re: pkgfile rewritten in python with new feature

The idea was to replace the database tree of files with something that takes less space on disk, to speed up I/O.
The sqlite db stores the details of every package: name, version, etc.
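
To give an idea of what such a database could look like, here is a small sketch using Python's sqlite3 module. The schema is an assumption made for illustration. The table names pkg and file, the columns, and the helper functions are hypothetical and are not taken from the sqlite-based pkgfile.py.

```python
import sqlite3


def create_db(path=":memory:"):
    """Create (or open) a database with one table for packages, one for files."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS pkg (
            id INTEGER PRIMARY KEY,
            repo TEXT NOT NULL,
            name TEXT NOT NULL,
            version TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS file (
            pkg_id INTEGER NOT NULL REFERENCES pkg(id),
            path TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS file_path ON file(path);
    """)
    return conn


def add_package(conn, repo, name, version, paths):
    """Store one package together with the list of files it owns."""
    cur = conn.execute(
        "INSERT INTO pkg (repo, name, version) VALUES (?, ?, ?)",
        (repo, name, version),
    )
    conn.executemany(
        "INSERT INTO file (pkg_id, path) VALUES (?, ?)",
        [(cur.lastrowid, p) for p in paths],
    )
    conn.commit()


def owners(conn, path):
    """Return (repo, name, version) for every package owning exactly `path`."""
    rows = conn.execute(
        "SELECT pkg.repo, pkg.name, pkg.version FROM pkg "
        "JOIN file ON file.pkg_id = pkg.id WHERE file.path = ? "
        "ORDER BY pkg.repo, pkg.name",
        (path,),
    )
    return rows.fetchall()
```

The index on file.path is what would let an exact-path lookup avoid scanning every row.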

But the latest version does not use the sqlite db. Instead, the module reads extra.files.tar.gz directly.


#13 2010-09-28 12:58:40

Dieter@be
Forum Fellow
From: Belgium
Registered: 2006-11-05
Posts: 2,001

Re: pkgfile rewritten in python with new feature

solstice wrote:

The idea was to replace the database tree of files with something that takes less space on disk, to speed up I/O.
The sqlite db stores the details of every package: name, version, etc.

But the latest version does not use the sqlite db. Instead, the module reads extra.files.tar.gz directly.

Okay, that was my question. If you can use extra.files.tar.gz, there's probably no need for an sqlite db.
Dan told me you can generate database files that include a list of files for each package. I guess that's exactly what extra.files.tar.gz is? Where do you get it from?


< Daenyth> and he works prolifically
4 8 15 16 23 42


#14 2010-09-28 13:01:30

wonder
Developer
From: Bucharest, Romania
Registered: 2006-07-05
Posts: 5,941

Re: pkgfile rewritten in python with new feature

Right now the plan is to use pkgfile2, written in C, which reads the tarball directly and can be used from Python.

http://projects.archlinux.org/users/tho … file2.git/
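
To illustrate what reading the tarball directly involves, here is a pure-Python sketch using the standard tarfile module. It is not pkgfile2, which is a C module. It assumes the usual layout of a *.files.tar.gz database: one "<name>-<version>/files" member per package, containing a "%FILES%" header line followed by one path per line. search_files_db is a hypothetical name.

```python
import tarfile


def search_files_db(fileobj, wanted):
    """Yield (package_dir, path) for every path whose basename equals `wanted`.

    `fileobj` is an open binary file object for a *.files.tar.gz database.
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            # Only the per-package "files" entries carry the file lists
            if not member.isfile() or not member.name.endswith("/files"):
                continue
            package_dir = member.name.rsplit("/", 1)[0]
            for raw in tar.extractfile(member):
                path = raw.decode("utf-8", "replace").rstrip("\n")
                # Skip blank lines and section headers such as %FILES%
                if not path or path.startswith("%"):
                    continue
                if path.rsplit("/", 1)[-1] == wanted:
                    yield package_dir, path
```

Nothing is unpacked to disk here: each member is streamed out of the archive and scanned line by line.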


Give what you have. To someone, it may be better than you dare to think.


#15 2010-09-28 16:19:33

Daenyth
Forum Fellow
From: Boston, MA
Registered: 2008-02-24
Posts: 1,244

Re: pkgfile rewritten in python with new feature

I've pulled Thomas' changes into the repo at the c-pkgfile branch. Solstice has been doing some of the hard work of integrating it. Once we get it nicely put together and polished, I'll make a new release with these changes.

Related to that, I need some help with distribution setup. Right now I have a Makefile and I don't know where I'm supposed to install the built pkgfile .so to. I was told I should switch to distutils, which I'm open to, but have little experience with. If someone can help me get that set up, I'd appreciate it greatly.


#16 2010-11-21 10:15:03

solstice
Member
Registered: 2006-10-27
Posts: 236

Re: pkgfile rewritten in python with new feature

With the help of brain0/Thomas, who wrote a well-made C Python module that reads *.files.tar.gz directly, pkgfile.py is now super fast.

Give it a try at https://github.com/solsticedhiver/pkgtools/tree/plan-A or use the PKGBUILD at http://paste.pocoo.org/show/294166/

You could even enable the CMD_SEARCH_ENABLED=1 feature in /etc/pkgtools/pkgfile.conf to get something like this:

$ nano

nano may be found in the following packages:
core/nano (2.2.5-1) : /usr/bin/nano

when nano is not installed, instead of getting a "command not found" error from bash.
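
The message part of that hint can be sketched in a few lines of Python. This is only an illustration of the output format shown above. format_hint and the shape of its matches argument are hypothetical, and what pkgfile-hook.bash does internally is not shown in this thread (in bash 4 and later, hooks of this kind are implemented as a shell function named command_not_found_handle).

```python
def format_hint(command, matches):
    """Build the hint printed when `command` is not installed.

    Hypothetical helper. `matches` is a list of (repo, package, version,
    path) tuples, such as a search restricted to bin/ and sbin/ might return.
    """
    if not matches:
        return ""  # nothing to suggest; let the shell print its usual error
    lines = ["%s may be found in the following packages:" % command]
    for repo, package, version, path in matches:
        lines.append("%s/%s (%s) : %s" % (repo, package, version, path))
    return "\n".join(lines)
```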

There may be some bugs left. Let me know.

Edit: due to bug #21771, you will have to source /usr/share/pkgtools/pkgfile-hook.bash manually, or source it in your ~/.bashrc, to get the command-not-found hint.

Last edited by solstice (2010-11-21 11:07:36)


#17 2011-01-16 16:48:10

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: pkgfile rewritten in python with new feature

I've picked the PKGBUILD you mentioned on the ML and now I get much better results than in mid-September.

[karol@black test]$ time pkgfile -svb sendmail
community/courier-mta (0.65.2-1) : /usr/bin/sendmail
community/courier-mta (0.65.2-1) : /usr/sbin/sendmail
community/esmtp (1.2-3) : /usr/sbin/sendmail
community/exim (4.73-2) : /usr/sbin/sendmail
extra/postfix (2.7.2-1) : /usr/sbin/sendmail
extra/ssmtp (2.64-2) : /usr/sbin/sendmail

real    0m2.253s
user    0m1.707s
sys    0m0.113s

Now it's almost as fast as

curl "http://arm.konnichi.com/find/?raw=1&fn=$1"

and is more powerful and easier to use (no need to use absolute paths). For me, it's a keeper :-)
pkgfile has the benefit of using any files.tar.gz, not just the ones from the official repos. Sadly, many user repos don't provide one.

My CPU is a 2GHz P4, and I'm using ext3.
