I am looking for a program that will automatically download all the pages within a website, including pictures and text. I have been doing this by hand, simply saving each page, but there must be a smarter way.
Offline
..did you even bother googling this?
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
Offline
How do these kinds of programs deal with 'JavaScript-heavy'/'stream-based' websites that don't have actual HTML content?
https://ugjka.net
paru > yay | vesktop > discord
pacman -S spotify-launcher
mount /dev/disk/by-...
Offline
It can be hard to search when you don't really know what they're called. I know them as "web crawlers". The only one I've used is httrack. It's both easy and difficult to use. Thanks to this stupid age of "Web 2.0", deciding how much of the website the program should download can be tricky.
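For reference, a basic httrack mirror might look something like this (just a sketch; example.com, the output directory, and the depth limit are placeholders):
# mirror example.com into ~/mirror, ignore robots.txt, limit recursion depth to 3
httrack "http://example.com/" -O ~/mirror -s0 -r3 "+*.example.com/*" -%v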
EDIT: I bet there are convenient Firefox addons to do this as well.
Last edited by drcouzelis (2014-12-10 19:28:57)
Offline
I found some like wget and httrack thanks to graysky's link. I just didn't know what to call them! I cannot get either of those to work though.
wget -r -Nc -mk URL
That only downloaded the first page but not any of the 1, 2, 3 page links inside the main page.
httrack URL -W -O "~/website" -%v
That did the same as wget.
Mirror launched on Wed, 10 Dec 2014 14:30:49 by HTTrack Website Copier/3.48-19 [XR&CO'2014]
Bytes saved: 210,64KiB Links scanned: 12/13 (+0)
Time: 10s Files written: 10
Transfer rate: 13,33KiB/s (21,19KiB/s) Files updated: 0
Active connections: 1 Errors: 0
Maybe the website has some robots.txt that breaks this?
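If robots.txt is the issue, wget can be told to ignore it and to send a browser-like user agent; something along these lines might work (a sketch, with URL as a placeholder):
# recurse with no depth limit, grab page requisites, rewrite links for offline
# viewing, fix file extensions, and ignore robots.txt
wget -r -l inf -p -k -E -e robots=off -U "Mozilla/5.0" URL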
Offline
I notice now that the website uses something in the URL to bring up page 2, 3, and so on:
http://url.com?2
and
http://url.com?3
and so on. I think that is messing up the programs.
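If the pagination really is just ?2, ?3, and so on, one workaround is to feed those URLs to wget explicitly in a small loop (a sketch; url.com and the page count are placeholders):
# fetch pages 1 through 20 of a site paginated via ?N query strings
for n in $(seq 1 20); do
    wget -p -k -E "http://url.com?${n}"
done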
Offline
wget can do it for you, like this one for google.com:
nohup wget --limit-rate=2000k --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla google.com & tail -f nohup.out
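For what it's worth, the same command with each option spelled out (google.com is just the example target from above; pick your own rate limit and user agent):
# --limit-rate=2000k : throttle the download speed
# --no-clobber       : skip files that already exist locally
# --convert-links    : rewrite links so the copy can be browsed offline
# --random-wait      : randomize the delay between requests
# -r -p -E           : recurse, fetch page requisites, fix file extensions
# -e robots=off      : ignore robots.txt
# -U mozilla         : send a browser-like user agent
nohup wget --limit-rate=2000k --no-clobber --convert-links --random-wait \
    -r -p -E -e robots=off -U mozilla google.com &
tail -f nohup.out    # follow the log while the mirror runs in the background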
Offline