You are not logged in.

#1 2005-03-28 17:03:27

rasat
Forum Fellow
From: Finland, working in Romania
Registered: 2002-12-27
Posts: 2,176
Website

pkg to db

I am planning to import all repos into a database (sqlite).

Eaxmple:
1. Download:
wget ftp://ftp.archlinux.org/current/os/i686 … ckages.txt
2. Clean up:

cat /etc/pacmenu.txt | sed 's|/| |g' | sed 's|.pkg.tar.gz||g' | sed 's|-([^-]*-[^-]*)$| 1|g' > pacmenu.pkgs

Result of pacmenu.pkgs (partial). Shows category, name and version:

base which 2.16-1
base wireless_tools 27-1
base zlib 1.2.2-1
daemons acpid 1.0.4-1
daemons apache 2.0.53-2
daemons apmd 3.0.2-4

Database: "test.db"
Table: "packages"
id  INTEGER
name  VARCHAR(20)
version  VARCHAR(10)
category  VARCHAR(10)

Bash sytanx to insert new data:

# sqlite3 test.db  "insert into packages (name,version,category) values ('wireless_tools','27-1','base');"

We are speaking about 1,800 packages (current+extra). What's best way to import?... bash script.

Offline

#2 2005-03-28 17:48:30

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: pkg to db

1800 is nothing... I'd just do it using shell scriptiness... like awk and sed and things...

{ print '('$2','$1','$3')' }

should work after a "cat > awk -e"... you can prepend the "insert into" junk youself 8)

I'd also like to suggest splitting out the release number as it's own column.... it'd be easier for searching by version number then, and then you can simply get the max(release)...

Offline

#3 2005-03-28 17:56:16

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: pkg to db

cat /etc/pacmenu.txt | sed 's|/| |g' | sed 's|.pkg.tar.gz||g' | sed 's|-([^-]*)-([^-]*)$| 1 2|g' > pacmenu.pkgs

split release field

Offline

#4 2005-03-28 19:20:27

rasat
Forum Fellow
From: Finland, working in Romania
Registered: 2002-12-27
Posts: 2,176
Website

Re: pkg to db

I am making it into a dump file. The text line like this:

INSERT INTO packages VALUES ('x11', 'desktop-file-utils', '0.10-1');

I managed quite close with following syntax. Don't know how to add the ' (single quote mark).

cat packages.txt | sed 's|^|INSER INTO packages VALUES (|g' | sed 's|/|, |g' | sed 's|.pkg.tar.gz||g' | sed 's|-([^-]*-[^-]*)$| 1|g' | sed 's/$/);/' 

Result:
INSER INTO packages VALUES (x11, desktop-file-utils 0.10-1);

Offline

#5 2005-03-28 19:51:03

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: pkg to db

cat packages.txt |
sed "s|^|insert into packages values ('|g" |
sed "s|/|','|g" |
sed "s|.pkg.tar.gz||g" |
sed "s|-([^-]*-[^-]*)$|','1|g" |
sed "s|$|');|"

i split it up line by line so it's readable...
solution, use double quotes for sed expressions, allowing you to escape single quote (') and make it printable.  single quote around each comma, and at the edges of the parentheses ().

that above should work

Offline

#6 2005-03-28 20:00:44

rasat
Forum Fellow
From: Finland, working in Romania
Registered: 2002-12-27
Posts: 2,176
Website

Re: pkg to db

phrakture wrote:

that above should work

Thanks, works fine!

Offline

#7 2005-03-28 21:59:06

i3839
Member
Registered: 2004-02-04
Posts: 1,185

Re: pkg to db

Why don't you just gunzip the db, but keep it as a tar, and then use the libtar API to access and find the files? This avoids bloating Pacman, hd seeks and fragmentation. As the db fits in ram anyway the actual searching isn't the problem, only loading the db into ram. (When I have time I can hack Pacman to show how it would be, if there is enough interest.)

Offline

#8 2005-03-29 10:05:04

cmp
Member
Registered: 2005-01-03
Posts: 350

Re: pkg to db

I can supply you with some python code, exactly doing this.
getting the db from a webserver, saving it to a temp file and then read it into memory.

Offline

#9 2005-03-29 16:05:01

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: pkg to db

I personally like the sqlite idea... I was against it at first, but that's because I didn't fully understand sqlite... it's not a db really, as it has no RDBMS or anything... it's just an "interpreter", so to speak... I think it'd be a great idea for pacman...
no rasat, you need to insert all the package info in another table so we can have descriptions, and all that jazz

Offline

#10 2005-03-29 16:23:31

rasat
Forum Fellow
From: Finland, working in Romania
Registered: 2002-12-27
Posts: 2,176
Website

Re: pkg to db

That was also what I liked with sqlite, no daemon, simple but does well easy tasks.

I will test if two databases can be linked with one common field.

db1
name
category
version
repo

db2
name
description

From where can I fetch the packages description?

Offline

#11 2005-03-29 16:25:57

i3839
Member
Registered: 2004-02-04
Posts: 1,185

Re: pkg to db

That's what tables are for. One database consists of multiple tables, often linked with common fields as you have.

Offline

#12 2005-03-29 16:29:39

cmp
Member
Registered: 2005-01-03
Posts: 350

Re: pkg to db

>> From where can I fetch the packages description?
from the package database (current.db.tar.gz)?

Offline

#13 2005-03-29 16:33:34

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: pkg to db

here's what I'd do:

database: pacman.sqlite
table: packages
   column: package_id
   column: category
   column: name
   column: version
   column: release (pkgrel)
   column: description
   column: .... all unique fields go in this table....
table: package_files
   column: package_file_id
   column: package_id (reference to above)
   column: file_name
table: package_depends
   column: package_file_id
   column: package_id (reference to above)
   column: depend_name (not referenced, due to versioning)

you can get all the unique fields from the current db... check out /var/lib/pacman/local/* for installed packages and then something like /var/lib/pacman/current/* for repo packages

the data is different... installed packages have 3 files, repo packages have one

Offline

#14 2005-03-29 16:36:43

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: pkg to db

you'd want all the "one per package" fields to be normalized (in one table), each "multiple per package" field, such as installed files, dependancies, etc, should get their own table joined by the unique id of the package...
can you index tables in sqlite?

Offline

#15 2005-03-29 16:46:47

rasat
Forum Fellow
From: Finland, working in Romania
Registered: 2002-12-27
Posts: 2,176
Website

Re: pkg to db

I installed "SQLiteManager" tool to get a better idea what sqlite does and how to work.
http://freshmeat.net/projects/sqlitemanager

Ok, we go for one database with multiple tables.

Offline

#16 2005-03-29 17:05:52

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: pkg to db

post what kind of database layout you end up with, some people here can aide in normalizing it if it needs to be done...
I don't know sqlite too well, but I'm assuming it's fairly barebones... don't add referential integrity if it's an option...

Offline

#17 2005-03-30 04:32:26

rasat
Forum Fellow
From: Finland, working in Romania
Registered: 2002-12-27
Posts: 2,176
Website

Re: pkg to db

cmp wrote:

>> From where can I fetch the packages description?
from the package database (current.db.tar.gz)?

Ok, then the descrptions can be fetched from /usr/lib/pacman/.
How to "grep" all descriptions in /var/lib/pacman/current/<package>/desc into one dump file?

Example desc:

%NAME%
abiword

%VERSION%
2.2.4-1

%DESC%
A fully-featured word processor

%CSIZE%
3626225

%MD5SUM%
65796b61b7684004f370041995935efd

Output:
INSERT INTO desc VALUES ('abiword', 'A fully-featured word processor');

Offline

#18 2005-03-30 11:08:58

cmp
Member
Registered: 2005-01-03
Posts: 350

Re: pkg to db

pacman it self relies on the fact that the description follows directly "%DESCRIPTION%" so you could parse it from hand, but there should also be some nifty greg trick, but I'm not that good with grep.

Offline

#19 2005-03-30 16:36:25

rasat
Forum Fellow
From: Finland, working in Romania
Registered: 2002-12-27
Posts: 2,176
Website

Re: pkg to db

Got stucked with a simple problem: How to replace "|" with space. If not with sed ("|" is a command) then ed.

base|autoconf|2.59-1|yes|current

Offline

#20 2005-03-30 16:50:09

i3839
Member
Registered: 2004-02-04
Posts: 1,185

Re: pkg to db

sed 's/|/ /' or sed 's/|/ /'?

Offline

#21 2005-03-30 17:00:35

cmp
Member
Registered: 2005-01-03
Posts: 350

Re: pkg to db

tr?

[18:03] police:~ echo "a|b|c" | tr "|" " "
a b c

Offline

#22 2005-03-30 17:18:28

i3839
Member
Registered: 2004-02-04
Posts: 1,185

Re: pkg to db

Ah sorry, try the above, or

sed 'y/|/ /'

Offline

#23 2005-03-30 17:22:39

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: pkg to db

about pacman sections: each one is named and parsed as follows:

%SECTIONNAME% <---- section begins here
...predefined stuff....
 <--- section ends here on empty line

so I guess you'll have to embed newlines into the grep... or soemthing... match "^%NAME%n[.n]*nn$" maybe... it's prolly off w/o testing

Offline

#24 2005-03-30 17:59:25

kpiche
Forum Fellow
From: Ottawa, ON, Canada
Registered: 2004-03-30
Posts: 246
Website

Re: pkg to db

Try:

# sed -n "/%DESC%/,$ { /%DESC%/ d; /^$/ Q; p }" /var/lib/pacman/local/mjpegtools-1.6.2-1/desc
The mjpeg programs are a set of tools that can do recording of videos and playback, simple cut-and-paste editing and the MPEG compression of audio and video under Linux.
#

Offline

#25 2005-03-30 18:29:24

rasat
Forum Fellow
From: Finland, working in Romania
Registered: 2002-12-27
Posts: 2,176
Website

Re: pkg to db

i3839 wrote:

Ah sorry, try the above, or

sed 'y/|/ /'

Thanks, with your direction I got it correct.... sed 's/|/ /g'

I managed to get my first test run adding db to Trigger's pacmenu's first option checking packages what are not yet installed and then able to select with tick marks. It works fine.

It makes a big difference comparing+parsing two flat files (all current+extra packages with installed packages) or using DB. Flat files about 40s and DB 1 to 2s.

The categories are already in DB. Next stage is to add description, I will take a kook at your suggestions.

Offline

Board footer

Powered by FluxBB