#1 2006-10-09 23:55:17

metzen
Member
From: New York City
Registered: 2004-08-27
Posts: 14

wget-autoindex

For a while now, I've been searching for an easy way to download all the files in an HTTP directory autoindex (the type of page you see when a folder has no index.html file and autoindex is enabled in Apache). I learned some tricks with wget, but it still wasn't as easy as I was hoping, since it required memorizing several flags and options. So I whipped up a nifty script to accomplish the task.

#!/bin/bash
# wget-autoindex
#   script to recursively download all the files in a specified autoindex URL
#
# usage:
# wget-autoindex http://mydomain.com/dir1/dir2/
#
# This will retrieve all the files at mydomain.com/dir1/dir2/ into the current
# directory, while maintaining the remote directory structure

# get-url-item URL N: print the Nth slash-separated component of URL,
# e.g. get-url-item http://mydomain.com/dir1/dir2/ 4 prints "dir1"
function get-url-item {
    echo "$1" | awk -F/ '{ print $'$2' }'
}

cut=0
url=$1

# count what our --cut-dirs value should be; field 4 is the first path
# component after http://host/, so keep counting until we run out of fields.
# This is nasty to the max.
while [[ -n $(get-url-item "$url" $(( cut + 4 ))) ]]; do
    cut=$(( cut + 1 ))
done

#[[ $url =~ http://.+..+..+/(.+) ]]
#[[ $BASH_REMATCH[1] =~ ]]

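# flags: -c resume partial downloads, -N only fetch files newer than local
# copies, -nH don't create a hostname directory, -r recurse, -R skip the
# autoindex pages themselves, -np never ascend to the parent directory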
wget -c -N -nH -r -R "index.html*" -np --cut-dirs=$cut "$url/"

The method of computing the --cut-dirs value is a little sloppy, but I'm still fairly new to bash scripting. If anyone can think of a better way, let me know.
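One possible alternative (just an untested sketch, and it assumes the URL always begins with a scheme like http:// and contains at least one path component) is to strip the scheme and host with parameter expansion, then count the remaining components with an array split:

url=${1%/}                      # drop any trailing slash
path=${url#*://*/}              # strip "scheme://host/", leaving e.g. "dir1/dir2"
IFS=/ read -r -a parts <<< "$path"   # split the remaining path on "/"
cut=${#parts[@]}                # number of path components to cut

This avoids the external awk calls, but it hasn't been checked against edge cases like URLs with query strings or no path at all.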
