hi.
I have rewritten pkgfile (from the pkgtools package) in Python to speed up searches by using an sqlite db file.
$ pkgfile -h
Usage: pkgfile [ACTIONS] [OPTIONS] filename
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-b, --binaries only show files in a {s}bin/ directory. Works with -s, -l
-c, --case-sensitive make searches case sensitive
-g, --glob allow the use of * and ? as wildcards.
-r, --regex allow the use of regex in searches
-L, --local search only in the local pacman repository
-v, --verbose enable verbose output
ACTIONS:
-i, --info provides information about the package owning a file
-l, --list list files of a given package; similar to "pacman -Ql"
-s, --search search which package owns a file
-u, --update update to the latest filelist. This requires write permission to /var/cache/pkgtools/lists
A new feature: it does not download a files list again if the repo has no new update, which saves bandwidth.
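One way such a check could work is to compare the server's Last-Modified header with the timestamp of the cached list. This is only a sketch under that assumption; the thread does not say how pkgfile.py actually decides, and the function name needs_download is made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def needs_download(remote_last_modified, local_mtime):
    """Return True if the remote files list looks newer than the cached copy.

    remote_last_modified: value of the HTTP Last-Modified header, or None
    local_mtime: POSIX timestamp of the cached list, or None if nothing is cached
    (hypothetical helper, not part of pkgtools)
    """
    if remote_last_modified is None or local_mtime is None:
        # No date from the server, or no cached copy: download to be safe.
        return True
    remote = parsedate_to_datetime(remote_last_modified)
    if remote.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; treat it as UTC.
        remote = remote.replace(tzinfo=timezone.utc)
    local = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote > local
```

The header could be obtained with a HEAD request before fetching the full tarball, so an unchanged repo costs only a few hundred bytes.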
It's a fork on github, so you can try it from there: git://github.com/solsticedhiver/pkgtools.git
Or just change the _gitroot variable to git://github.com/solsticedhiver/pkgtools.git in the PKGBUILD for pkgtools-git from the AUR.
I have been in touch with Daenyth, who showed some interest in it and made me correct my coding style ;-). But we're still waiting for him to accept the pull request on github (into a new branch in his git repo?), and possibly to merge it upstream later.
Tell me what you think about it.
If you find bugs, you can report them on github.
Offline
After just changing the _gitroot make complained:
==> Starting make...
rm: cannot remove `/home/karol/test/t1/pkgtools-git/src/pkgtools-build': No such file or directory
Aborting...
'touch pkgtools-build' "fixed it" ;P
Core, extra, community and local repo were OK, but I got errors for other repos:
:: Checking [heftig] for files list ...
:: Downloading http://archlinux.ro/~heftig/repo/i686/heftig.files.tar.gz ...
:: Converting [heftig] file list ...
Error: Unable to open /tmp/tmpxe6Gjg.gz
:: Checking [xyne-any] for files list ...
:: Downloading http://xyne.archlinux.ca/repos/xyne-any/xyne-any.files.tar.gz ...
:: Converting [xyne-any] file list ...
Error: Unable to open /tmp/tmpt1QgB0.gz
:: Checking [unarch] for files list ...
:: Downloading http://us4all.info/unarch/arch/i686/unarch.files.tar.gz ...
:: Converting [unarch] file list ...
Error: Unable to open /tmp/tmpcbuuaW.gz
:: Checking [archlinuxfr] for files list ...
:: Downloading http://repo.archlinux.fr/i686/archlinuxfr.files.tar.gz ...
:: Converting [archlinuxfr] file list ...
Done
:: Checking [archstuff] for files list ...
:: Downloading http://archstuff.vs169092.vserver.de/i686/archstuff.files.tar.gz ...
:: Converting [archstuff] file list ...
Error: Unable to open /tmp/tmpycP5bc.gz
:: Checking [arch-games] for files list ...
:: Downloading http://pseudoform.org/arch-games/games/i686/arch-games.files.tar.gz ...
:: Converting [arch-games] file list ...
Done
:: Checking [dragonlord] for files list ...
:: Downloading http://repo.dragonlord.cz/arch/i686/dragonlord.files.tar.gz ...
:: Converting [dragonlord] file list ...
Error: Unable to open /tmp/tmpy5RGe8.gz
Can I just post here or do I have to report it on github?
This pkgfile implementation still takes about 30s to find what I'm looking for, but it does so w/o thrashing my disk - good work :-)
Last edited by karol (2010-09-17 13:55:22)
Offline
About the error in the PKGBUILD: that was only a suggestion for getting a package easily, and I had not checked it. I was using the PKGBUILD-git from the github repo, and I can't fix it.
About the errors you got with pkgfile.py:
The files lists simply don't exist on those repo servers! They all give me 404 errors, so these repos don't provide a files list.
I will change my code to handle those 404 errors better and not try to open a downloaded file that doesn't exist.
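Roughly, the fix could look like the sketch below: catch the 404 at download time and skip the conversion step for that repo. The function name fetch_files_list and its opener parameter are hypothetical, not taken from pkgfile.py.

```python
import urllib.error
import urllib.request


def fetch_files_list(url, dest, opener=urllib.request.urlopen):
    """Download url to dest.

    Returns True on success and False if the repo has no files list (HTTP 404).
    `opener` is only a hook so the function can be tried without a network.
    (hypothetical helper, not part of pkgtools)
    """
    try:
        with opener(url) as response:
            data = response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # The repo does not publish a files list: nothing to convert.
            return False
        raise  # any other HTTP error is still reported to the caller
    with open(dest, "wb") as out:
        out.write(data)
    return True
```

The caller would then print something like "no files list for [repo]" and move on, instead of failing with "Unable to open /tmp/....gz".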
Thanks for the report, Karol
Offline
[karol@black test]$ pkgfile -vsb sudo
core/sudo (1.7.4.p4-1) : /etc/pam.d/sudo
core/sudo (1.7.4.p4-1) : /usr/bin/sudo
community/logwatch (7.3.6-3) : /usr/share/logwatch/scripts/services/sudo
[karol@black test]$ pkgfile -vb sudo
core/sudo (1.7.4.p4-1) : /etc/pam.d/sudo
core/sudo (1.7.4.p4-1) : /usr/bin/sudo
community/logwatch (7.3.6-3) : /usr/share/logwatch/scripts/services/sudo
Why is /usr/share/logwatch/scripts/services/sudo included in binaries search ('-b' switch)?
Offline
Oops, I forgot to implement that. I've just fixed it.
I've also fixed the first bug, but I need to rewrite that code, which looks bad right now.
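For reference, the -b filter boils down to a test on the directory that directly contains the file. A minimal sketch, assuming "a {s}bin/ directory" means the immediate parent is named bin or sbin (the helper name in_bin_dir is made up):

```python
import posixpath


def in_bin_dir(path):
    """Return True if path is a file located directly in a bin/ or sbin/ directory.

    (hypothetical helper, not the actual pkgfile.py code)
    """
    if path.endswith("/"):
        # Entries ending in "/" are directories, not files.
        return False
    parent = posixpath.basename(posixpath.dirname(path))
    return parent in ("bin", "sbin")
```

With this test, /usr/bin/sudo is kept, while /etc/pam.d/sudo and /usr/share/logwatch/scripts/services/sudo are dropped.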
Offline
solstice wrote:
Oops, I forgot to implement that. I've just fixed it.
I've also fixed the first bug, but I need to rewrite that code, which looks bad right now.
OK, no need to rush :-)
If I know what I'm looking for, I still prefer
farm () { curl "http://arm.konnichi.com/find/?raw=1&fn=$1"; }
It's a service provided by the same guy who runs Arch Rollback Machine
[karol@black test]$ time pkgfile -bv sendmail
community/esmtp (1.2-1) : /usr/lib/sendmail
community/esmtp (1.2-1) : /usr/sbin/sendmail
community/exim (4.72-3) : /usr/lib/sendmail
community/exim (4.72-3) : /usr/sbin/sendmail
community/logwatch (7.3.6-3) : /usr/share/logwatch/scripts/services/sendmail
extra/courier-mta (0.62.1-6) : /usr/bin/sendmail
extra/courier-mta (0.62.1-6) : /usr/sbin/sendmail
extra/postfix (2.7.1-1) : /usr/sbin/sendmail
extra/quilt (0.48-2) : /usr/share/quilt/compat/sendmail
extra/ssmtp (2.64-2) : /usr/sbin/sendmail
real 0m20.262s
user 0m17.439s
sys 0m0.553s
[karol@black test]$ time farm /usr/sbin/sendmail
extra/courier-mta 0.62.1-6
extra/postfix 2.7.0-3
extra/ssmtp 2.64-2
community/esmtp 1.2-1
community/exim 4.71-5
real 0m1.401s
user 0m0.113s
sys 0m0.010s
Offline
$ time pkgfile -svb sendmail
community/esmtp (1.2-1) : /usr/sbin/sendmail
community/exim (4.72-3) : /usr/sbin/sendmail
extra/courier-mta (0.62.1-6) : /usr/bin/sendmail
extra/courier-mta (0.62.1-6) : /usr/sbin/sendmail
extra/postfix (2.7.1-1) : /usr/sbin/sendmail
extra/ssmtp (2.64-2) : /usr/sbin/sendmail
real 0m10.660s
user 0m10.413s
sys 0m0.237s
Or you could also try pkgfile -sv usr/sbin/sendmail
What's your CPU? Here it takes half the time you reported.
Offline
Fixes pushed to github
Offline
I should have this pulled to master soon. Hopefully by the end of the weekend, maybe by tonight.
Once I do, it's going to sit in git for a while before release, since it's going to need some polish.
[git] | [AURpkgs] | [arch-games]
Offline
$ time ./pkgfile.py -svb sendmail
extra/courier-mta (0.62.1-6) : /usr/sbin/sendmail
extra/courier-mta (0.62.1-6) : /usr/bin/sendmail
extra/postfix (2.7.1-1) : /usr/sbin/sendmail
extra/ssmtp (2.64-2) : /usr/sbin/sendmail
community/esmtp (1.2-1) : /usr/sbin/sendmail
community/exim (4.72-3) : /usr/sbin/sendmail
real 0m0.976s
user 0m0.883s
sys 0m0.087s
I don't think you can do better than that. @Karol: is that fast enough?
It uses the pkgfile2 Python module, written in C by Thomas Bächler, with a few patches.
This should land in my github repo soon.
Offline
what's the idea behind the sqlite db file? what do you intend to store in it, and where will you store it?
< Daenyth> and he works prolifically
4 8 15 16 23 42
Offline
The idea was to replace the database tree of files with something that takes less space on disk, to speed up I/O.
What's stored in the sqlite db is every package's details: name, version, etc.
But the latest version does not use the sqlite db. Instead, the module reads extra.files.tar.gz directly.
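To give an idea of the sqlite approach, here is a small sketch with an in-memory database. The schema (tables pkg and file) and the function names are assumptions for illustration only; the real layout used by the earlier pkgfile.py may differ.

```python
import sqlite3


def build_db(packages):
    """Build an in-memory db from (repo, name, version, [paths]) tuples.

    The two-table schema is a guess, not the one from pkgfile.py.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pkg (id INTEGER PRIMARY KEY, repo TEXT, name TEXT, version TEXT)")
    db.execute("CREATE TABLE file (pkg_id INTEGER REFERENCES pkg(id), path TEXT)")
    for repo, name, version, paths in packages:
        cur = db.execute(
            "INSERT INTO pkg (repo, name, version) VALUES (?, ?, ?)",
            (repo, name, version),
        )
        db.executemany(
            "INSERT INTO file (pkg_id, path) VALUES (?, ?)",
            [(cur.lastrowid, path) for path in paths],
        )
    return db


def search(db, filename):
    """Return (repo/name, version, path) rows whose last path component is filename.

    LIKE is case-insensitive for ASCII in sqlite by default, and % or _ in
    filename would act as wildcards here; a real tool would have to escape them.
    """
    rows = db.execute(
        "SELECT repo || '/' || name, version, path "
        "FROM pkg JOIN file ON pkg.id = file.pkg_id "
        "WHERE path = ? OR path LIKE ? ORDER BY 1, 3",
        (filename, "%/" + filename),
    )
    return rows.fetchall()
```

A single db file like this replaces thousands of small per-package files on disk, which is where the I/O saving was expected to come from.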
Offline
solstice wrote:
The idea was to replace the database tree of files with something that takes less space on disk, to speed up I/O.
What's stored in the sqlite db is every package's details: name, version, etc. But the latest version does not use the sqlite db. Instead, the module reads extra.files.tar.gz directly.
okay, that was my question. If you can use extra.files.tar.gz, there's probably no need for an sqlite db
Dan told me you can generate database files that include a list of files for each package. I guess that's exactly what extra.files.tar.gz is? Where do you get it from?
Offline
Right now the plan is to use pkgfile2, written in C, which reads the tarball directly and can be used from Python.
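The same idea can be shown in pure Python with the standard tarfile module (much slower than the C module, of course). This sketch assumes the usual layout of a *.files.tar.gz: one directory per package (name-version) containing a "files" entry that starts with a %FILES% line followed by one path per line. The function name search_files_db is made up.

```python
import io
import tarfile


def search_files_db(fileobj, filename):
    """Yield (package_dir, path) for every listed file named filename.

    fileobj is an open binary file object for a *.files.tar.gz.
    (hypothetical helper; pkgfile2 does this in C)
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile() or not member.name.endswith("/files"):
                continue  # skip directories and the desc/depends entries
            pkgdir = member.name.rsplit("/", 1)[0]
            listing = tar.extractfile(member).read().decode("utf-8", "replace")
            for line in listing.splitlines():
                if not line or line.startswith("%"):
                    continue  # blank line or a section header such as %FILES%
                if line.rsplit("/", 1)[-1] == filename:
                    yield pkgdir, line
```

Usage would be something like opening extra.files.tar.gz in "rb" mode and looping over search_files_db(f, "sendmail").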
Give what you have. To someone, it may be better than you dare to think.
Offline
I've pulled Thomas' changes into the repo at the c-pkgfile branch. Solstice has been doing some of the hard work of integrating it. Once we get it nicely put together and polished, I'll make a new release with these changes.
Related to that, I need some help with distribution setup. Right now I have a Makefile and I don't know where I'm supposed to install the built pkgfile .so to. I was told I should switch to distutils, which I'm open to, but have little experience with. If someone can help me get that set up, I'd appreciate it greatly.
Offline
With the help of brain0/thomas, who wrote a well-made C Python module that reads *.files.tar.gz directly, pkgfile.py is now super fast.
Give it a try at https://github.com/solsticedhiver/pkgtools/tree/plan-A or use the PKGBUILD at http://paste.pocoo.org/show/294166/
You could even enable the CMD_SEARCH_ENABLED=1 feature in /etc/pkgtools/pkgfile.conf. If nano is not installed, then instead of bash's "command not found" error you get something like this:
$ nano
nano may be found in the following packages:
core/nano (2.2.5-1) : /usr/bin/nano
There may be some bugs left. Let me know.
Edit: due to bug #21771, you will have to source /usr/share/pkgtools/pkgfile-hook.bash manually, or source it in your ~/.bashrc, to get the command-not-found hint.
Last edited by solstice (2010-11-21 11:07:36)
Offline
I've picked the PKGBUILD you mentioned on the ML and now I get much better results than in mid-September.
[karol@black test]$ time pkgfile -svb sendmail
community/courier-mta (0.65.2-1) : /usr/bin/sendmail
community/courier-mta (0.65.2-1) : /usr/sbin/sendmail
community/esmtp (1.2-3) : /usr/sbin/sendmail
community/exim (4.73-2) : /usr/sbin/sendmail
extra/postfix (2.7.2-1) : /usr/sbin/sendmail
extra/ssmtp (2.64-2) : /usr/sbin/sendmail
real 0m2.253s
user 0m1.707s
sys 0m0.113s
Now it's almost as fast as
curl "http://arm.konnichi.com/find/?raw=1&fn=$1"
and is more powerful and easier to use (no need to use absolute paths). For me, it's a keeper :-)
pkgfile has the benefit of using any files.tar.gz, not just the ones from the official repos. Sadly, many user repos don't provide one.
My cpu is a P4 2GHz + I'm using ext3.
Offline