You are not logged in.

#1 2004-10-28 17:58:02

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

[proposition] pacscan

Considering there tend to be a decent amount of posts dealing with the searching of installed packages and orphan checking and all these things, I would like to propose pacscan:

pacscan would use the pacman local DB for optimized searching/sorting routines.  Some of the functionality is listed as follows:

* Display all installed packages
* Display installed packages in a tree format (dependancy based)
* Display all installed files (basically a -Ql on all installed packages)
* Check for missing files (similar to above except tests for file)
* Check for unowned files (similar to skeeterbug's script in another thread, but done in C/C++ and optimized to work on a cached pacman database listing)

Now this is just a random idea - I could probably kick this out pretty quickly.

Does anyone think this would be helpful?  If so let me know - if you'd want to veto go right ahead.

Offline

#2 2004-10-28 18:49:53

skeeterbug
Member
From: Oklahoma, USA
Registered: 2004-10-24
Posts: 92
Website

Re: [proposition] pacscan

sounds like a heck of a plan to me - I'd love to use it, I especially like the addition of searching for missing files - sounds sweet

did I mention I like the idea - a lot smile

Offline

#3 2004-10-28 18:54:56

Dusty
Schwag Merchant
From: Medicine Hat, Alberta, Canada
Registered: 2004-01-18
Posts: 5,986
Website

Offline

#4 2004-10-28 19:51:10

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: [proposition] pacscan

skeeterbug wrote:

sounds like a heck of a plan to me - I'd love to use it, I especially like the addition of searching for missing files - sounds sweet

did I mention I like the idea - a lot smile

i got the idea from your response to dusty's post
i'll hack some crap together later on tonight and see where I'm at tomorrow - expect a prelim around sunday or so

Offline

#5 2004-10-28 20:01:37

Dusty
Schwag Merchant
From: Medicine Hat, Alberta, Canada
Registered: 2004-01-18
Posts: 5,986
Website

Re: [proposition] pacscan

phrakture wrote:

expect a prelim around sunday or so

knowing how optomistic coders can be, I'll assume that's Sunday Nov 7... :-D

Offline

#6 2004-10-28 20:09:40

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: [proposition] pacscan

no way... the pacman code is done really well - i'll nab the db.h and db.c files - for parsing the pacman database - and extend them a bit with my own functions.  the fact that package parsing is done with a linked list already makes this much easier.

i'm debating converting this to c++ so I could use the STL containers that do internal sorting on insertion (map<T>)...

conversion wouldnt be that hard...PMList becomes list<T> and then minor semantics changes....

I might actually do that - as more of an exercise than anything.

Offline

#7 2004-10-28 21:03:53

dtw
Forum Fellow
From: UK
Registered: 2004-08-03
Posts: 4,439
Website

Re: [proposition] pacscan

this would definitely fill the biggest gap in Arch - i just want to be able to know more about my packages installed - i feel a bit blind at the moment!

Offline

#8 2004-10-29 00:20:07

skeeterbug
Member
From: Oklahoma, USA
Registered: 2004-10-24
Posts: 92
Website

Re: [proposition] pacscan

phrakture wrote:

i got the idea from your response to dusty's post
i'll hack some crap together later on tonight and see where I'm at tomorrow - expect a prelim around sunday or so

Yeah, that's what I figgered - I know my script is kind of junky and was hoping that someone would improve on it but the pacscan program sounds a lot better - especially if it's faster.

By the way, I hope I didn't offend anyone with the error message in that script - I was just writing it for myself and figured that before I showed it to anyone else I'd clean all that up but I cut / pasted without reading it.  I doubt if it's a big deal but sorry just in case.

Offline

#9 2004-10-29 02:25:40

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: [proposition] pacscan

hey, i thought it was funny

Offline

#10 2004-10-31 17:52:45

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: [proposition] pacscan

Sorry guys, got real busy at work.
Had to work all Sat/Sun (being salaried sucks) just to get shit done.

So my development has been delayed.  I most likely will not have time until around Thursday.  But right now I have some of it - it's just not in a state to throw together yet.

Offline

#11 2004-10-31 19:12:50

neotuli
Lazy Developer
From: London, UK
Registered: 2004-07-06
Posts: 1,204
Website

Re: [proposition] pacscan

skeeterbug wrote:

sounds like a heck of a plan to me - I'd love to use it, I especially like the addition of searching for missing files - sounds sweet

did I mention I like the idea - a lot smile

http://bbs.archlinux.org/viewtopic.php?t=6591  I posted this some time ago, it goes through the tree and spits out missing files and the package that it was supposed to be part of.


The suggestion box only accepts patches.

Offline

#12 2004-11-05 22:41:37

skeeterbug
Member
From: Oklahoma, USA
Registered: 2004-10-24
Posts: 92
Website

Re: [proposition] pacscan

Hey phrakture, is this still in the works?  I know you got busy with work - just aksing smile

Offline

#13 2004-11-05 23:34:48

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: [proposition] pacscan

yeah, yes it is - I have some scraps throw together - and a python script which *almost* works - it was for performance testing really (c vs python).

the python script works with almost all files except for a few... but it takes a bit:

riddle me this you algorithm masters out there: what's the fastest way to sort through the file system: I am going to use a cache for the installed files, to make it easier... say pacscan --buildcache to build the cache - right now that part works (in C).... reads the installed files (all of them) and caches it to a file (/var/cache/pacscan right now) - a simple line by line list... for simple parsing later on.

Now i need to determine the fastest way to step through each and every file in the file system and check it against this list of installed files. right now, the list is unsorted, but I may implement a simple quick-sort to make it all nice...

So here's the setup - I have a single linked list of installed files (may or may not be in order) and I want to go through the file system and verify that each file is part of a package.  Best way to do this?

Offline

#14 2004-11-06 03:02:31

rasat
Forum Fellow
From: Finland, working in Romania
Registered: 2002-12-27
Posts: 2,293
Website

Re: [proposition] pacscan

phrakture wrote:

* Display all installed packages
* Display installed packages in a tree format (dependancy based)
* Display all installed files (basically a -Ql on all installed packages)

There is one disadvantage if the packages cannot be divided into categories as per in Arch home page package list, the display with pacman's current search engine becomes too long for study.


Markku

Offline

#15 2004-11-27 08:42:13

IceRAM
Member
From: Bucharest, Romania
Registered: 2004-03-04
Posts: 772
Website

Re: [proposition] pacscan

:ding:
any news?

and.. to add another feature that I'd personally like:

* Display modified config files (the files in the backup=(..) lines) - shouldn't be very difficult because the MD5 are already stored

Offline

#16 2004-11-30 16:33:45

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: [proposition] pacscan

well, yeah it's getting closer to a milestone release, however this project ends on the 5th, so I'm working like a dog.
Also, my laptop's in the shop and I probably won't get it back for a while - I ganked the harddrive before I sent it in, as I have a feeling installing linux might void their warranty, but I don't know.  Faulty power socket.... sigh

Offline

#17 2004-12-04 19:47:43

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: [proposition] pacscan

end of the project at work... been here 12 hours on a saturday... i hate my job....

anyway, when my laptop gets back from HP I will finish up what I have so there's at least SOME results...

Offline

#18 2005-01-12 22:08:58

tranquility
Member
From: Portugal
Registered: 2004-08-06
Posts: 136

Re: [proposition] pacscan

phrakture wrote:

yeah, yes it is - I have some scraps throw together - and a python script which *almost* works - it was for performance testing really (c vs python).

the python script works with almost all files except for a few... but it takes a bit:

riddle me this you algorithm masters out there: what's the fastest way to sort through the file system: I am going to use a cache for the installed files, to make it easier... say pacscan --buildcache to build the cache - right now that part works (in C).... reads the installed files (all of them) and caches it to a file (/var/cache/pacscan right now) - a simple line by line list... for simple parsing later on.

Now i need to determine the fastest way to step through each and every file in the file system and check it against this list of installed files. right now, the list is unsorted, but I may implement a simple quick-sort to make it all nice...

So here's the setup - I have a single linked list of installed files (may or may not be in order) and I want to go through the file system and verify that each file is part of a package.  Best way to do this?

IANAAM (I Am Not An Algorith Master) but,
to me seems like if the most common operation to be performed is a search, that a strategy of sorting the filenames after all the insert's are performed is correct and to my knowledge binary search is the best search strategy. If the case is not so, and further insert's or remove's are required, you might try considering building some kind of search tree.

Some links that might be useful:
http://en.wikipedia.org/wiki/Binary_search
http://en.wikipedia.org/wiki/Search_algorithm

Offline

#19 2005-01-19 20:57:09

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: [proposition] pacscan

Well holy crap guys.  If it wasn't for dibblethewrecker, I would have completely forgot about this.  With my laptop being in the shop for way longer than it should have been, and some real heavy projects at work, quitting that job, starting a new one, and moving to a new apartment, I haven't had time to do much else (besides read on the train).

I'll get back into this tonight and see where I was at.  Last thing I recall was attempting to figure out an optimal filesystem searching algorithm....

Expect an update soon.  And again, I apologize.

Offline

#20 2005-01-19 22:26:13

cactus
Taco Eater
From: t͈̫̹ͨa͖͕͎̱͈ͨ͆ć̥̖̝o̫̫̼s͈̭̱̞͍̃!̰
Registered: 2004-05-25
Posts: 4,622
Website

Re: [proposition] pacscan

I recommend doing a simple B-tree.
I would build a tree as I read in the filenames. Since you will add far less than you will search, the overhead involved with the initial build would be tolerable. There would be no need for sorting either (bonus).
I dont recall the max depth of a B-tree offhand, but it is less than a binary tree by some factor of the size of the nodes. For a true binary it is something like log-base-2 of the sample size.

IE, with 24K files, the max tree depth of a simple binary tree would be about 15 (log-base-2 of 24,000.)..That means no more than 15 comparisons to find any file name, with a well balanced tree of course (red-black maybe).
With a B-tree, it can be considerably less.

B-trees aren't too hard to code. The hardest part is the splits and merges, but looking in most classical algorithm books should present you with an almost verbatim chunk of code you can use.


"Be conservative in what you send; be liberal in what you accept." -- Postel's Law
"tacos" -- Cactus' Law
"t̥͍͎̪̪͗a̴̻̩͈͚ͨc̠o̩̙͈ͫͅs͙͎̙͊ ͔͇̫̜t͎̳̀a̜̞̗ͩc̗͍͚o̲̯̿s̖̣̤̙͌ ̖̜̈ț̰̫͓ạ̪͖̳c̲͎͕̰̯̃̈o͉ͅs̪ͪ ̜̻̖̜͕" -- -̖͚̫̙̓-̺̠͇ͤ̃ ̜̪̜ͯZ͔̗̭̞ͪA̝͈̙͖̩L͉̠̺͓G̙̞̦͖O̳̗͍

Offline

#21 2005-01-20 00:58:08

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: [proposition] pacscan

cactus wrote:

I recommend doing a simple B-tree.
I would build a tree as I read in the filenames. Since you will add far less than you will search, the overhead involved with the initial build would be tolerable. There would be no need for sorting either (bonus).
I dont recall the max depth of a B-tree offhand, but it is less than a binary tree by some factor of the size of the nodes. For a true binary it is something like log-base-2 of the sample size.

IE, with 24K files, the max tree depth of a simple binary tree would be about 15 (log-base-2 of 24,000.)..That means no more than 15 comparisons to find any file name, with a well balanced tree of course (red-black maybe).
With a B-tree, it can be considerably less.

B-trees aren't too hard to code. The hardest part is the splits and merges, but looking in most classical algorithm books should present you with an almost verbatim chunk of code you can use.

the STL map<> template is red-black... I was planning on using it

Offline

Board footer

Powered by FluxBB