pkg to db

rasat · 2005-03-28 17:03:27

I am planning to import all repos into a database (sqlite).

Example:
1. Download:
wget ftp://ftp.archlinux.org/current/os/i686 … ckages.txt
2. Clean up:

cat /etc/pacmenu.txt | sed 's|/| |g' | sed 's|\.pkg\.tar\.gz||g' | sed 's|-\([^-]*-[^-]*\)$| \1|g' > pacmenu.pkgs

Partial contents of pacmenu.pkgs, showing category, name and version:

base which 2.16-1
base wireless_tools 27-1
base zlib 1.2.2-1
daemons acpid 1.0.4-1
daemons apache 2.0.53-2
daemons apmd 3.0.2-4

Database: "test.db"
Table: "packages"
id INTEGER
name VARCHAR(20)
version VARCHAR(10)
category VARCHAR(10)

Bash syntax to insert a new row:

# sqlite3 test.db  "insert into packages (name,version,category) values ('wireless_tools','27-1','base');"

We are talking about 1,800 packages (current+extra). What's the best way to import them? A bash script?
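
One possible shape for such a script, as a minimal sketch: it assumes pacmenu.pkgs holds the three space-separated fields shown above and that the sqlite3 command-line shell is installed; the function name pkgs_to_sql is hypothetical. It builds one SQL script and wraps every INSERT in a single BEGIN/COMMIT, because in autocommit mode sqlite commits (and syncs to disk) after each statement, which is what makes row-by-row imports slow.

```shell
# Turn "category name version" lines into one SQL script inside a transaction.
# Assumption: fields are space-separated, as in pacmenu.pkgs above.
pkgs_to_sql() {
    awk -v q="'" '
        BEGIN { print "BEGIN TRANSACTION;" }
        NF >= 3 {
            # Double any single quotes so the values are valid SQL strings.
            for (i = 1; i <= 3; i++) gsub(q, q q, $i)
            printf "INSERT INTO packages (name,version,category) VALUES (%s%s%s,%s%s%s,%s%s%s);\n", q, $2, q, q, $3, q, q, $1, q
        }
        END { print "COMMIT;" }
    ' "$@"
}

# Usage (assumes the packages table already exists in test.db):
#   pkgs_to_sql pacmenu.pkgs | sqlite3 test.db
```

Feeding sqlite3 one script through a pipe also means a single process for all 1,800 rows instead of one sqlite3 call per package.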

phrakture · 2005-03-28 17:48:30

1800 is nothing... I'd just do it using shell scriptiness... like awk and sed and things...

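awk -v q="'" '{ print "(" q $2 q "," q $1 q "," q $3 q ")" }' pacmenu.pkgs

should work run over pacmenu.pkgs (awk strings need double quotes, so the single quote goes in a variable)... you can prepend the "insert into" junk yourself 8)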

I'd also like to suggest splitting out the release number as its own column.... it'd be easier to search by version number then, and you can simply get the max(release)...

phrakture · 2005-03-28 17:56:16

cat /etc/pacmenu.txt | sed 's|/| |g' | sed 's|\.pkg\.tar\.gz||g' | sed 's|-\([^-]*\)-\([^-]*\)$| \1 \2|g' > pacmenu.pkgs

split release field
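
If you'd rather split the release in the shell than in sed, POSIX parameter expansion can do it without starting another process. A small sketch: the helper name split_release is hypothetical, and it assumes the release is everything after the last "-", which holds for the version strings shown above.

```shell
# Print "version release" for a pacman version string such as 2.16-1.
# Assumption: the pkgrel is the part after the last '-'.
split_release() {
    local full=$1
    local version=${full%-*}   # drop the shortest "-*" suffix, e.g. 2.16
    local release=${full##*-}  # drop the longest "*-" prefix, e.g. 1
    echo "$version $release"
}

# Usage: split_release 2.0.53-2   -> 2.0.53 2
```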

rasat · 2005-03-28 19:20:27

I am turning it into a dump file, with each line like this:

INSERT INTO packages VALUES ('x11', 'desktop-file-utils', '0.10-1');

I got quite close with the following command, but don't know how to add the ' (single quote mark).

cat packages.txt | sed 's|^|INSERT INTO packages VALUES (|g' | sed 's|/|, |g' | sed 's|\.pkg\.tar\.gz||g' | sed 's|-\([^-]*-[^-]*\)$| \1|g' | sed 's/$/);/'

Result:
INSERT INTO packages VALUES (x11, desktop-file-utils 0.10-1);

phrakture · 2005-03-28 19:51:03

cat packages.txt |
sed "s|^|insert into packages values ('|g" |
sed "s|/|','|g" |
sed "s|\.pkg\.tar\.gz||g" |
sed "s|-\([^-]*-[^-]*\)$|','\1|g" |
sed "s|$|');|"

I split it up line by line so it's readable...
The solution: use double quotes around the sed expressions, so the single quote (') can appear in them literally. Put a single quote on each side of every comma, and just inside the parentheses ().
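
The same five substitutions can also run in one sed process with several -e expressions, applied in order to each line. A sketch assuming the same packages.txt format (category/name-version-release.pkg.tar.gz); the function name pkgs_to_inserts is hypothetical.

```shell
# One sed, five expressions, applied in order to every input line.
# Double quotes let the single quotes through; \( \) and \1 survive them unchanged.
pkgs_to_inserts() {
    sed -e "s|^|insert into packages values ('|" \
        -e "s|/|','|g" \
        -e "s|\.pkg\.tar\.gz$||" \
        -e "s|-\([^-]*-[^-]*\)$|','\1|" \
        -e "s|$|');|" "$@"
}

# Usage: pkgs_to_inserts packages.txt > packages.sql
```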

that above should work

rasat · 2005-03-28 20:00:44

phrakture wrote:

that above should work

Thanks, works fine!

i3839 · 2005-03-28 21:59:06

Why don't you just gunzip the db, but keep it as a tar, and then use the libtar API to access and find the files? This avoids bloating Pacman, as well as disk seeks and fragmentation. As the db fits in RAM anyway, the actual searching isn't the problem, only loading the db into RAM. (When I have time I can hack Pacman to show how it would work, if there is enough interest.)
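
libtar itself is a C library; the same idea can be tried from the shell with tar, listing the archive and printing a single member to stdout (-O) without unpacking anything to disk. This is only an illustration, not the libtar API: it assumes the db was gunzipped once to current.db.tar and that each package is stored as <pkgname-pkgver-pkgrel>/desc inside it. The function names are hypothetical.

```shell
# Assumes: gunzip current.db.tar.gz  (leaves current.db.tar)
# and that every package entry is stored as <name-version-release>/desc.

# List every desc entry in the archive without unpacking anything.
db_list_desc() {
    tar -tf "$1" | grep '/desc$'
}

# Print one archive member to stdout (-O) instead of writing it to disk.
db_cat() {
    tar -xOf "$1" "$2"
}

# Usage:
#   db_list_desc current.db.tar
#   db_cat current.db.tar abiword-2.2.4-1/desc
```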

cmp · 2005-03-29 10:05:04

I can supply you with some Python code that does exactly this:
it gets the db from a web server, saves it to a temp file and then reads it into memory.

phrakture · 2005-03-29 16:05:01

I personally like the sqlite idea... I was against it at first, but that's because I didn't fully understand sqlite... it's not a db server really, as it has no daemon or anything... it's just an "interpreter", so to speak... I think it'd be a great idea for pacman...
Now, rasat, you need to put all the package info into another table so we can have descriptions and all that jazz

rasat · 2005-03-29 16:23:31

That was also what I liked about sqlite: no daemon, simple, and it does easy tasks well.

I will test if two databases can be linked with one common field.

db1
name
category
version
repo

db2
name
description

From where can I fetch the packages description?

i3839 · 2005-03-29 16:25:57

That's what tables are for. One database consists of multiple tables, often linked with common fields as you have.

cmp · 2005-03-29 16:29:39

>> From where can I fetch the packages description?
from the package database (current.db.tar.gz)?

phrakture · 2005-03-29 16:33:34

here's what I'd do:

database: pacman.sqlite
table: packages
   column: package_id
   column: category
   column: name
   column: version
   column: release (pkgrel)
   column: description
   column: .... all unique fields go in this table....
table: package_files
   column: package_file_id
   column: package_id (reference to above)
   column: file_name
table: package_depends
   column: package_depend_id
   column: package_id (reference to above)
   column: depend_name (not referenced, due to versioning)
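
One way that layout might look as SQLite DDL, wrapped in a shell function so it can be piped into the sqlite3 shell. The column types, the INTEGER PRIMARY KEY choices, naming the first column of package_depends package_depend_id, and naming the release column pkgrel (to stay clear of SQL keywords) are my assumptions. The package_id columns in the child tables are left as plain integers with no REFERENCES constraint.

```shell
# Print CREATE TABLE statements for the proposed layout.
# Assumption: types and keys are illustrative only.
pacman_schema() {
    cat <<'EOF'
CREATE TABLE packages (
    package_id  INTEGER PRIMARY KEY,  -- one row per package
    category    TEXT,
    name        TEXT,
    version     TEXT,
    pkgrel      TEXT,                 -- the "release" column, kept apart from version
    description TEXT
);
CREATE TABLE package_files (
    package_file_id INTEGER PRIMARY KEY,
    package_id      INTEGER,          -- refers to packages.package_id (not enforced)
    file_name       TEXT
);
CREATE TABLE package_depends (
    package_depend_id INTEGER PRIMARY KEY,
    package_id        INTEGER,        -- refers to packages.package_id (not enforced)
    depend_name       TEXT            -- plain name, not a key (versioned deps)
);
EOF
}

# Usage: pacman_schema | sqlite3 pacman.sqlite
```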

you can get all the unique fields from the current db... check out /var/lib/pacman/local/* for installed packages and then something like /var/lib/pacman/current/* for repo packages

the data is different... installed packages have 3 files, repo packages have one

phrakture · 2005-03-29 16:36:43

you'd want all the "one per package" fields to be normalized into one table; each "multiple per package" field, such as installed files, dependencies, etc., should get its own table, joined on the unique id of the package...
can you index tables in sqlite?
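
SQLite does have CREATE INDEX. A sketch of indexes that would likely help the layout above, assuming lookups by package name and joins on package_id; the index names are made up. A column declared INTEGER PRIMARY KEY is the table's rowid in SQLite, so it needs no separate index.

```shell
# Print CREATE INDEX statements for the lookups this layout is likely to need.
# Assumption: table and column names follow the layout sketched above.
pacman_indexes() {
    cat <<'EOF'
CREATE INDEX idx_packages_name       ON packages (name);
CREATE INDEX idx_package_files_pkg   ON package_files (package_id);
CREATE INDEX idx_package_depends_pkg ON package_depends (package_id);
EOF
}

# Usage: pacman_indexes | sqlite3 pacman.sqlite
```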

rasat · 2005-03-29 16:46:47

I installed the "SQLiteManager" tool to get a better idea of what sqlite does and how to work with it.
http://freshmeat.net/projects/sqlitemanager

Ok, we go for one database with multiple tables.

phrakture · 2005-03-29 17:05:52

post what kind of database layout you end up with; some people here can aid in normalizing it if that needs to be done...
I don't know sqlite too well, but I'm assuming it's fairly barebones... don't add referential integrity if it's an option...

rasat · 2005-03-30 04:32:26

cmp wrote:

>> From where can I fetch the packages description?
from the package database (current.db.tar.gz)?

Ok, then the descriptions can be fetched from /var/lib/pacman/.
How can I "grep" all the descriptions in /var/lib/pacman/current/<package>/desc into one dump file?

Example desc:

%NAME%
abiword
%VERSION%
2.2.4-1
%DESC%
A fully-featured word processor
%CSIZE%
3626225
%MD5SUM%
65796b61b7684004f370041995935efd

Output:
INSERT INTO desc VALUES ('abiword', 'A fully-featured word processor');
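
One way to do it without grep: let a single awk read every desc file, remember the line after %NAME% and the line after %DESC%, and print one INSERT per file. A minimal sketch, assuming each value sits on the single line right after its %SECTION% header (as in the example above) and that the db lives in /var/lib/pacman/current; desc_dump is a hypothetical name. Single quotes inside descriptions are doubled so the SQL stays valid.

```shell
# Emit one INSERT per desc file found under the given db directory.
# Assumption: the %NAME% and %DESC% values are the single line after the header.
desc_dump() {
    local dbdir=${1:-/var/lib/pacman/current}
    set -- "$dbdir"/*/desc
    [ -e "$1" ] || return 0              # glob matched no desc files
    awk -v q="'" '
        function emit() {
            gsub(q, q q, name)           # double single quotes for SQL
            gsub(q, q q, desc)
            printf "INSERT INTO desc VALUES (%s%s%s, %s%s%s);\n", q, name, q, q, desc, q
            name = desc = prev = ""
        }
        FNR == 1 && NR > 1 { emit() }    # a new file starts: flush the previous package
        prev == "%NAME%" { name = $0 }
        prev == "%DESC%" { desc = $0 }
        { prev = $0 }
        END { if (NR > 0) emit() }
    ' "$@"
}

# Usage: desc_dump > desc.sql
```

Running one awk over all files keeps it to a single process instead of one per package.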

cmp · 2005-03-30 11:08:58

pacman itself relies on the fact that the description directly follows "%DESC%", so you could parse it by hand, but there should also be some nifty grep trick; I'm just not that good with grep.
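
The grep trick could look like this, assuming the description is exactly one line, as in the example desc above: -A1 prints the line after each match, and tail keeps only that line. Note that grep -A is supported by GNU and BSD grep but is not POSIX; desc_of is a hypothetical name.

```shell
# Print the line that directly follows %DESC% in a pacman desc file.
desc_of() {
    grep -A1 '^%DESC%$' "$1" | tail -n 1
}

# Usage: desc_of /var/lib/pacman/current/abiword-2.2.4-1/desc
```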

rasat · 2005-03-30 16:36:25

Got stuck on a simple problem: how do I replace "|" with a space? If not with sed ("|" is a command), then with ed?

base|autoconf|2.59-1|yes|current

i3839 · 2005-03-30 16:50:09

sed 's/|/ /' or sed 's/|/ /'?

cmp · 2005-03-30 17:00:35

tr?

[18:03] police:~ echo "a|b|c" | tr "|" " "
a b c

i3839 · 2005-03-30 17:18:28

Ah sorry, try the above, or

sed 'y/|/ /'

phrakture · 2005-03-30 17:22:39

about pacman sections: each one is named and parsed as follows:

%SECTIONNAME% <---- section begins here
...predefined stuff....
 <--- section ends here on empty line

so I guess you'll have to embed newlines into the grep... or something... match "^%NAME%\n[.\n]*\n\n$" maybe... it's prolly off w/o testing
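
Since every section ends at an empty line, awk's paragraph mode fits that format without embedding newlines in a regex: with RS set to the empty string each blank-line-separated block becomes one record, and with FS set to a newline $1 is the %SECTIONNAME% line and $2 onwards are its values. A sketch, assuming the files really do separate sections with blank lines as described above; section_of is a hypothetical name.

```shell
# Print the body of one %SECTION% from a pacman desc file, one value per line.
# Assumption: sections are separated by blank lines.
section_of() {
    awk -v want="%$1%" '
        BEGIN { RS = ""; FS = "\n" }     # paragraph mode: a blank line ends a record
        $1 == want {
            for (i = 2; i <= NF; i++) print $i
        }' "$2"
}

# Usage: section_of DESC /var/lib/pacman/local/mjpegtools-1.6.2-1/desc
```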

kpiche · 2005-03-30 17:59:25

Try:

# sed -n "/%DESC%/,$ { /%DESC%/ d; /^$/ Q; p }" /var/lib/pacman/local/mjpegtools-1.6.2-1/desc
The mjpeg programs are a set of tools that can do recording of videos and playback, simple cut-and-paste editing and the MPEG compression of audio and video under Linux.
#

rasat · 2005-03-30 18:29:24

i3839 wrote:

Ah sorry, try the above, or
sed 'y/|/ /'

Thanks, with your pointers I got it right.... sed 's/|/ /g'
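
For whole lines like base|autoconf|2.59-1|yes|current, the shell can also split on "|" directly by setting IFS for read, with no sed or tr. A sketch in which the meaning of the fourth field ("yes") is a guess (an installed flag) and the function and variable names are hypothetical:

```shell
# Split pipe-separated pacmenu lines into named fields.
# Assumption: field order is category|name|version|installed|repo.
parse_pacmenu() {
    while IFS='|' read -r category name version installed repo; do
        echo "$name $version ($category, $repo, installed: $installed)"
    done
}

# Usage: parse_pacmenu < pacmenu.txt
```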

I managed my first test run, adding the db to the first option of Trigger's pacmenu, which checks which packages are not yet installed and then lets you select them with tick marks. It works fine.

It makes a big difference whether you compare and parse two flat files (all current+extra packages against the installed packages) or use the DB: about 40 s with flat files versus 1 to 2 s with the DB.

The categories are already in the DB. The next stage is to add the descriptions; I will take a look at your suggestions.

Arch Linux
