For a while now, I've been searching for an easy way to download all the files in an HTTP directory autoindex (the type of page you see when a folder has no index.html file and autoindex is enabled in Apache). I learned some tricks with wget, but it still wasn't as easy as I'd hoped, since it meant memorizing several flags and options. So, I whipped up a nifty script to accomplish the task.
#!/bin/bash
# wget-autoindex
# script to recursively download all the files in a specified autoindex URL
#
# usage:
# wget-autoindex http://mydomain.com/dir1/dir2/
#
# This will retrieve all the files at mydomain.com/dir1/dir2/ into the current
# directory, while maintaining the remote directory structure
# print the Nth "/"-separated field of a URL
function get-url-item {
    echo "$1" | awk -F/ '{ print $'"$2"' }'
}
cut=0
url=$1

# Count what our --cut-dirs value should be; this is nasty to the max.
# Splitting the URL on "/" puts "http:", "", and the hostname in fields
# 1-3, so we probe from field 4 onward until we run out of components.
while [[ -n $(get-url-item "$url" $(( cut + 4 ))) ]]; do
    cut=$(( cut + 1 ))
done
wget -c -N -nH -r -R "index.html*" -np --cut-dirs=$cut "${url%/}/"
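For the URL in the usage comment, the loop counts two path components (dir1 and dir2), so the script ends up running roughly:

wget -c -N -nH -r -R "index.html*" -np --cut-dirs=2 http://mydomain.com/dir1/dir2/

which mirrors everything below dir2/ into the current directory.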
The method of computing the --cut-dirs value is a little sloppy, but I'm still fairly new to bash scripting. If someone can think of a better way, let me know.
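One cleaner possibility (an untested sketch, and it assumes the URL always has a path after the hostname) is to strip the scheme and hostname with parameter expansion and let awk count what's left:

# strip "http://hostname/", keeping only the path portion, e.g. "dir1/dir2/"
path=${url#*://*/}
# drop a trailing slash if there is one
path=${path%/}
# the number of remaining "/"-separated fields is the --cut-dirs value
cut=$(echo "$path" | awk -F/ '{ print NF }')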