You are not logged in.
I am planning to import all repos into a database (sqlite).
Eaxmple:
1. Download:
wget ftp://ftp.archlinux.org/current/os/i686 … ckages.txt
2. Clean up:
cat /etc/pacmenu.txt | sed 's|/| |g' | sed 's|.pkg.tar.gz||g' | sed 's|-([^-]*-[^-]*)$| 1|g' > pacmenu.pkgs
Result of pacmenu.pkgs (partial). Shows category, name and version:
base which 2.16-1
base wireless_tools 27-1
base zlib 1.2.2-1
daemons acpid 1.0.4-1
daemons apache 2.0.53-2
daemons apmd 3.0.2-4
Database: "test.db"
Table: "packages"
id INTEGER
name VARCHAR(20)
version VARCHAR(10)
category VARCHAR(10)
Bash sytanx to insert new data:
# sqlite3 test.db "insert into packages (name,version,category) values ('wireless_tools','27-1','base');"
We are speaking about 1,800 packages (current+extra). What's best way to import?... bash script.
Markku
Offline
1800 is nothing... I'd just do it using shell scriptiness... like awk and sed and things...
{ print '('$2','$1','$3')' }
should work after a "cat > awk -e"... you can prepend the "insert into" junk youself 8)
I'd also like to suggest splitting out the release number as it's own column.... it'd be easier for searching by version number then, and then you can simply get the max(release)...
Offline
cat /etc/pacmenu.txt | sed 's|/| |g' | sed 's|.pkg.tar.gz||g' | sed 's|-([^-]*)-([^-]*)$| 1 2|g' > pacmenu.pkgs
split release field
Offline
I am making it into a dump file. The text line like this:
INSERT INTO packages VALUES ('x11', 'desktop-file-utils', '0.10-1');
I managed quite close with following syntax. Don't know how to add the ' (single quote mark).
cat packages.txt | sed 's|^|INSER INTO packages VALUES (|g' | sed 's|/|, |g' | sed 's|.pkg.tar.gz||g' | sed 's|-([^-]*-[^-]*)$| 1|g' | sed 's/$/);/'
Result:
INSER INTO packages VALUES (x11, desktop-file-utils 0.10-1);
Markku
Offline
cat packages.txt |
sed "s|^|insert into packages values ('|g" |
sed "s|/|','|g" |
sed "s|.pkg.tar.gz||g" |
sed "s|-([^-]*-[^-]*)$|','1|g" |
sed "s|$|');|"
i split it up line by line so it's readable...
solution, use double quotes for sed expressions, allowing you to escape single quote (') and make it printable. single quote around each comma, and at the edges of the parentheses ().
that above should work
Offline
that above should work
Thanks, works fine!
Markku
Offline
Why don't you just gunzip the db, but keep it as a tar, and then use the libtar API to access and find the files? This avoids bloating Pacman, hd seeks and fragmentation. As the db fits in ram anyway the actual searching isn't the problem, only loading the db into ram. (When I have time I can hack Pacman to show how it would be, if there is enough interest.)
Offline
I can supply you with some python code, exactly doing this.
getting the db from a webserver, saving it to a temp file and then read it into memory.
Offline
I personally like the sqlite idea... I was against it at first, but that's because I didn't fully understand sqlite... it's not a db really, as it has no RDBMS or anything... it's just an "interpreter", so to speak... I think it'd be a great idea for pacman...
no rasat, you need to insert all the package info in another table so we can have descriptions, and all that jazz
Offline
That was also what I liked with sqlite, no daemon, simple but does well easy tasks.
I will test if two databases can be linked with one common field.
db1
name
category
version
repo
db2
name
description
From where can I fetch the packages description?
Markku
Offline
That's what tables are for. One database consists of multiple tables, often linked with common fields as you have.
Offline
>> From where can I fetch the packages description?
from the package database (current.db.tar.gz)?
Offline
here's what I'd do:
database: pacman.sqlite
table: packages
column: package_id
column: category
column: name
column: version
column: release (pkgrel)
column: description
column: .... all unique fields go in this table....
table: package_files
column: package_file_id
column: package_id (reference to above)
column: file_name
table: package_depends
column: package_file_id
column: package_id (reference to above)
column: depend_name (not referenced, due to versioning)
you can get all the unique fields from the current db... check out /var/lib/pacman/local/* for installed packages and then something like /var/lib/pacman/current/* for repo packages
the data is different... installed packages have 3 files, repo packages have one
Offline
you'd want all the "one per package" fields to be normalized (in one table), each "multiple per package" field, such as installed files, dependancies, etc, should get their own table joined by the unique id of the package...
can you index tables in sqlite?
Offline
I installed "SQLiteManager" tool to get a better idea what sqlite does and how to work.
http://freshmeat.net/projects/sqlitemanager
Ok, we go for one database with multiple tables.
Markku
Offline
post what kind of database layout you end up with, some people here can aide in normalizing it if it needs to be done...
I don't know sqlite too well, but I'm assuming it's fairly barebones... don't add referential integrity if it's an option...
Offline
>> From where can I fetch the packages description?
from the package database (current.db.tar.gz)?
Ok, then the descrptions can be fetched from /usr/lib/pacman/.
How to "grep" all descriptions in /var/lib/pacman/current/<package>/desc into one dump file?
Example desc:
%NAME%
abiword%VERSION%
2.2.4-1%DESC%
A fully-featured word processor%CSIZE%
3626225%MD5SUM%
65796b61b7684004f370041995935efd
Output:
INSERT INTO desc VALUES ('abiword', 'A fully-featured word processor');
Markku
Offline
pacman it self relies on the fact that the description follows directly "%DESCRIPTION%" so you could parse it from hand, but there should also be some nifty greg trick, but I'm not that good with grep.
Offline
Got stucked with a simple problem: How to replace "|" with space. If not with sed ("|" is a command) then ed.
base|autoconf|2.59-1|yes|current
Markku
Offline
sed 's/|/ /' or sed 's/|/ /'?
Offline
tr?
[18:03] police:~ echo "a|b|c" | tr "|" " "
a b c
Offline
Ah sorry, try the above, or
sed 'y/|/ /'
Offline
about pacman sections: each one is named and parsed as follows:
%SECTIONNAME% <---- section begins here
...predefined stuff....
<--- section ends here on empty line
so I guess you'll have to embed newlines into the grep... or soemthing... match "^%NAME%n[.n]*nn$" maybe... it's prolly off w/o testing
Offline
Try:
# sed -n "/%DESC%/,$ { /%DESC%/ d; /^$/ Q; p }" /var/lib/pacman/local/mjpegtools-1.6.2-1/desc
The mjpeg programs are a set of tools that can do recording of videos and playback, simple cut-and-paste editing and the MPEG compression of audio and video under Linux.
#
Offline
Ah sorry, try the above, or
sed 'y/|/ /'
Thanks, with your direction I got it correct.... sed 's/|/ /g'
I managed to get my first test run adding db to Trigger's pacmenu's first option checking packages what are not yet installed and then able to select with tick marks. It works fine.
It makes a big difference comparing+parsing two flat files (all current+extra packages with installed packages) or using DB. Flat files about 40s and DB 1 to 2s.
The categories are already in DB. Next stage is to add description, I will take a kook at your suggestions.
Markku
Offline