
#1 2005-04-29 19:31:43

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879

web scraping + AUR = fun

seeing as cvsup isn't working with aur yet, I made a little script to download the entire set of aur package directories... yeah, it takes a long time... it's using HTTP and scraping apache's directory listing format... but it works...

anyway, here we go - feel free to do what you want:

#!/bin/bash
# AUR Web Scraping to get all PKGBUILDs
# Aaron Griffin [phrakture]

BASEDIR="$HOME/aur"
PKGURL="http://aur.archlinux.org/packages/"
PKGFILE="index.html"

#get_dir http://www.xyz.com/a
# This function will get all files
# listed in an apache formatted directory list
function get_dir()
{
   local thisdir=`basename $1`

   if [ "x$thisdir" != "x" ]; then
      mkdir $thisdir
      cd $thisdir
      wget -q $1
      if [ $? -eq 0 ]; then
         local files=`grep "\[   \]" $PKGFILE |
                      sed 's@.*href="\(.*\)".*@\1@g'`
         #skip parent dir, infinite recursion
         local dirs=`grep "\[DIR\]" $PKGFILE |
                     grep -v "Parent Directory" |
                     sed 's@.*href="\(.*\)".*@\1@g'`
         rm $PKGFILE

         for f in $files; do
            echo "downloading $thisdir::$f"
            wget -q $1$f
         done

         for d in $dirs; do
            get_dir $1$d
         done

      else
         echo "error downloading directory list : $1"
      fi
      # go back up even when the listing could not be downloaded
      cd ..
   else
      echo "usage: get_dir <apache url>"
   fi
}

mkdir -p "$BASEDIR" && cd "$BASEDIR" || exit 1
[ -f $PKGFILE ] && rm -f $PKGFILE

get_dir $PKGURL

and don't expect jesus to come out of this and make everything fine and dandy... it's just a script to download the PKGBUILD in whatever format/state they're in on the server
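
To see what the grep/sed part of get_dir is doing, here is a minimal sketch run against a few made-up listing lines. The sample HTML and the extract_links helper are assumptions for illustration, not part of the script above; real Apache output may differ in detail.

```shell
#!/bin/bash
# Sample lines imitating an Apache directory listing (invented for illustration).
listing='<img src="/icons/back.gif" alt="[DIR]"> <a href="/packages/">Parent Directory</a>
<img src="/icons/folder.gif" alt="[DIR]"> <a href="foo/">foo/</a>
<img src="/icons/unknown.gif" alt="[   ]"> <a href="PKGBUILD">PKGBUILD</a>'

# Hypothetical helper: print the href target of every line whose icon
# alt text contains the fixed string given as $1, skipping the parent link.
extract_links() {
    grep -F "$1" | grep -v "Parent Directory" | sed 's@.*href="\([^"]*\)".*@\1@'
}

files=$(printf '%s\n' "$listing" | extract_links '[   ]')
dirs=$(printf '%s\n' "$listing" | extract_links '[DIR]')
echo "files: $files"
echo "dirs: $dirs"
```

Using grep -F avoids having to escape the square brackets, which would otherwise be read as a character class.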


#2 2005-04-29 21:14:51

JGC
Developer
Registered: 2003-12-03
Posts: 1,664

Re: web scraping + AUR = fun

You know you can use wget --mirror for this?

Hmm, --mirror is only useful if you're sure you can exclude stuff from it. Having complete packages isn't very useful :P


#3 2005-04-29 21:19:38

IceRAM
Member
From: Bucharest, Romania
Registered: 2004-03-04
Posts: 772

Re: web scraping + AUR = fun

wget --mirror copies index.html files as well, not to mention that it goes over index.html?C=X (sorted lists).
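
For what it's worth, wget has options that should cover both complaints: --no-parent keeps it from climbing to the parent directory, and --reject takes a comma-separated list of file name patterns to discard. A rough sketch, untested against the AUR server; the mirror_cmd helper is hypothetical and only prints the command so it can be inspected before running:

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) a wget command that mirrors
# the tree under $1 while discarding the directory listing pages.
mirror_cmd() {
    echo wget --mirror --no-parent --no-host-directories \
        --reject "'index.html*'" "$1"
}

mirror_cmd "http://aur.archlinux.org/packages/"
```

wget still has to fetch the listing pages to find the links, so this saves disk clutter rather than requests. More patterns can be appended to the --reject list to skip other unwanted files.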


#4 2005-04-29 22:08:27

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879

Re: web scraping + AUR = fun

yeah, doesn't --mirror grab everything, including the html files? (and I'd assume it might follow the "parent directory" links too)

this could also be extended to get one package at a time... *shrug*
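
A sketch of the one-package-at-a-time idea, assuming each package lives at packages/<name>/ on the server (an assumption based on how the script walks the tree). pkg_url and fetch_pkg are hypothetical names; fetch_pkg relies on get_dir from the script in post #1 being defined in the same shell.

```shell
#!/bin/bash
PKGURL="http://aur.archlinux.org/packages/"

# Build the listing URL for one package. The trailing slash matters:
# get_dir appends file names directly to the URL it is given.
pkg_url() {
    echo "${PKGURL%/}/$1/"
}

# Fetch a single package directory (needs get_dir from the original script).
fetch_pkg() {
    get_dir "$(pkg_url "$1")"
}

pkg_url foo
```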


#5 2005-06-23 02:50:26

lazychris2000
Member
Registered: 2004-11-29
Posts: 35

Re: web scraping + AUR = fun

I ran the script, and now I have a lot of packages/PKGBUILDs in my home folder.  Is there a way to add that folder to /etc/pacman.conf so I can just do pacman -S <packagename>?
I tried adding:
[aur]
Server = file:///home/chris/aur/packages
to pacman.conf, but it spit out this error:
failed copying /home/chris/aur/packages/aur.db.tar.gz

failed to synchronize aur
error: could not open sync database: aur
       have you used --refresh yet?

Am I just being stupid, or is it not possible to do that?


Arch .7.1, 2.6.15, 1024 MB PC2700, Athlon XP 2600+, Soyo KT400 Dragon Ultra Platinum, NVIDIA GeForceFX 5700 256MB

Compaq Armada E500--Arch .7.1, 2.6.14, 256MB PC100, 900MHz PIII, Netgear WG511T


#6 2005-06-23 03:01:03

Snowman
Developer/Forum Fellow
From: Montreal, Canada
Registered: 2004-08-20
Posts: 5,212

Re: web scraping + AUR = fun

It's not possible.  The above script only fetches PKGBUILDs.  You need to build the packages with makepkg and install them with pacman -A.

Take a look at srcpac.  IIRC, it makes the package and installs it in one step.
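
If you'd rather do it by hand, a rough sketch of the build step could look like this. The function names are made up for illustration, and it assumes the downloaded tree has one PKGBUILD per package directory; dependencies between packages are not handled.

```shell
#!/bin/bash
# Print every directory under $1 that contains a PKGBUILD.
pkgbuild_dirs() {
    find "$1" -type f -name PKGBUILD -exec dirname {} \; | sort
}

# Run makepkg in each of those directories; installing the resulting
# package files is left as a manual step.
build_all() {
    local dir
    pkgbuild_dirs "$1" | while read -r dir; do
        (cd "$dir" && makepkg) || echo "build failed: $dir"
    done
}

# Usage: build_all "$HOME/aur/packages"
```

Each successful build leaves a package file in its directory, which is what pacman -A expects.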


#7 2005-06-23 03:15:51

lazychris2000
Member
Registered: 2004-11-29
Posts: 35

Re: web scraping + AUR = fun

Thanks.  I'll give that a shot.




#8 2005-06-23 09:29:56

T-Dawg
Forum Fellow
From: Charlotte, NC
Registered: 2005-01-29
Posts: 2,736

Re: web scraping + AUR = fun

You'll have to move them into /var/abs/local, since srcpac does a top-level find in /var/abs.
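
Something along these lines should do it. This is a sketch that copies instead of moving, so the downloaded tree stays intact; copy_to_abs is a hypothetical helper, and the ~/aur/packages source path is taken from the earlier posts.

```shell
#!/bin/bash
# Copy every package directory from $1 into $2 (default: /var/abs/local).
copy_to_abs() {
    local src=$1 dest=${2:-/var/abs/local}
    mkdir -p "$dest" && cp -r "$src"/. "$dest"/
}

# Usage (probably needs root): copy_to_abs "$HOME/aur/packages"
```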

