#1 2012-05-21 23:00:04

jimwalton7
Member
Registered: 2012-05-21
Posts: 1

wget Scheme missing

I'm trying to download a group of files from the Internet Archive. I have used this script in the past and it worked, but something is keeping it from working this time. I've tried everything I could find here, but it's still not working. Here is my wget script, based on instructions from Archive.org on the Internet Archive Blogs:

wget -A pdf -r -H -nc -np -nH -nd --cut-dirs=2 -D archive.org --exclude-domains blog.archive.org,web.archive.org -e robots=off -i ../itemlist.txt -B ‘http://www.archive.org/download/’

Here is a sample return:
../itemlist.txt: Invalid URL ‘http://www.archive.org/download/federalandstate04statgoog: Scheme missing
../itemlist.txt: Invalid URL ‘http://www.archive.org/download/federalandstate03statgoog: Scheme missing
../itemlist.txt: Invalid URL ‘http://www.archive.org/download/1156151.0002.001.umich.edu: Scheme missing

The itemlist.txt file is a CSV file built by Archive.org for the express purpose of downloading files from their site. federalandstate04statgoog is the first item on the list, and the URL for it appears to be formatted with no problem. The basic URL format I'm parsing is http://{datanode}.us.archive.org/{drive}/items/{identifier}/{identifier}.pdf. The command skips the {datanode} and jumps over the {drive}/items/ directories to put all files in a single list. The final download URL should look like archive.org/download/federalandstate04statgoog/federalandstate04statgoog.pdf
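
To make the mapping I expect concrete, here is a minimal shell sketch of how one identifier should expand into a download URL. The function name build_url is made up for illustration and is not part of wget or of Archive.org's instructions. I'm also assuming the http://www.archive.org/download/ prefix from my -B option and that each item contains a PDF named after its identifier.

```shell
#!/bin/sh
# Hypothetical helper, only to illustrate the expected URL shape.
# Assumes the base from my -B option and a PDF named after the identifier.
build_url() {
    printf 'http://www.archive.org/download/%s/%s.pdf\n' "$1" "$1"
}

# Expand the first identifier from the sample output above.
build_url federalandstate04statgoog
```

Comparing what this prints with the URL shown in wget's "Invalid URL" message should show where the two differ.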

Technically, I'm not a newbie, but I haven't used Unix in nearly 30 years. So, what am I missing?
