I just want to share a small tip for anyone who uses wget heavily (e.g. scripting downloads with wget, or using wget as pacman's downloader).
If you happen to see a delay of a second or so at this step when downloading:
--2010-11-12 07:31:19-- http://abc.def/hij.klm
Resolving abc.def...
you can add the "--no-dns-cache" and "-4" options to your wgetrc or to your wget command.
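For example, a minimal sketch of the wgetrc equivalents (as far as I know, these are the matching wgetrc commands for the two options):

# ~/.wgetrc
dns_cache = off    # same as --no-dns-cache
inet4_only = on    # same as -4

or directly on the command line:

wget --no-dns-cache -4 -c http://abc.def/hij.klm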
The one-second delay does not matter if you only download one or two files, but if you are mirroring or downloading with a shell script that calls wget in a loop, e.g.,
for (( x=1; x < 100; x++ )); do
    wget -c "http://abc.def/hij__${x}.klm"
done
those small seconds add up to minutes.
The wget manual page even mentions this problem, yet caching is still on by default:
However, it has been reported that in some situations it is not desirable to cache host names, even for the duration of a short-running application like Wget. With this option Wget issues a new DNS lookup (more precisely, a new call to "gethostbyname" or "getaddrinfo") each time it makes a new connection. Please note that this option will not affect caching that might be performed by the resolving library or by an external caching layer, such as NSCD.
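The same trick also covers the "pacman with wget" case I mentioned above: if pacman is configured to download through wget via XferCommand in /etc/pacman.conf, the options can go there too. A sketch, based on the usual wget example line in pacman.conf:

XferCommand = /usr/bin/wget --no-dns-cache -4 --passive-ftp -c -O %o %u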
Well, that's it. I would love to hear if anyone else has tips on using wget.
Update:
added the '-4' option, thanks to brebs.
Last edited by ms (2010-11-12 23:27:00)
Offline
Isn't it a good idea not to use DNS caching even within Chromium? You should disable it, as most modern Internet connections don't really need it. I thought it was only useful on slow connections, mobile devices, etc.
Offline
don't really need it
Try comparing a half-second or second's delay, VERSUS practically zero delay (if running BIND locally, for example, like I do). Those half-seconds add up, and get noticeable, and thus annoying.
But, the OP is saying that wget's own internal cache is ridiculously slow?? Weird. Maybe it's an IPv6-unavailable screwup.
Offline
[TIP] Add tips to the wiki.
Offline
But, the OP is saying that wget's own internal cache is ridiculously slow?
Yeah, that is a bit unclear. If wget's cache is slow, then report a bug; the point of a cache is to be faster than a full lookup.
Offline
@Google: With a fast Internet connection, probably yes, if you are not bothered by a one-second delay, like brebs said.
@brebs: Well, since you mentioned IPv6 being unavailable, I tried the "-4" option and it removes the delay at the resolving step too.
@ijanos: The point is that wget only runs for a short duration; it is not a long-running application (like kget or Firefox, which stay open continuously) that you can keep passing URLs to and let it download them for you. If you run wget 100 times, each in a different process, do you think you still need an internal DNS cache? I think not.
Offline
@ijanos: The point is that wget only runs for a short duration; it is not a long-running application (like kget or Firefox, which stay open continuously) that you can keep passing URLs to and let it download them for you. If you run wget 100 times, each in a different process, do you think you still need an internal DNS cache? I think not.
OK, I checked: the DNS cache is in-memory and every wget process starts with a fresh, empty one, so it makes some sense that disabling it could gain you some time. Still, I am a bit reluctant to believe it is measurable. Do you have some numbers?
Also, if you are starting 100+ wget processes, you are doing it wrong. wget can batch download, you know, and the internal DNS cache will speed up batch downloads. You mention a pacman speedup; I did not check, but I think pacman does not start a new wget process for each package, and if I am right about this, then disabling the wget DNS cache will slow down pacman downloads.
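For example, a rough sketch of a batch download with a single wget process (reusing the made-up URLs from the first post), so the DNS cache actually gets reused across files:

for (( x=1; x < 100; x++ )); do echo "http://abc.def/hij__${x}.klm"; done > urls.txt
wget -c -i urls.txt

You can also pass "-i -" to read the URL list from stdin instead of a file.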
Offline
ijanos,
No, I don't have exact numbers. Measuring with the 'time' command will be inaccurate, because I have an unstable connection (3G dialup, currently at ~4-7 KB/s). But I will try:
* wget without '--no-dns-cache' and '-4' options
rv77ax@bubu$ time wget -c "[url]http://www.google.com/images/srpr/nav_logo25.png[/url]"
--2010-11-13 05:53:21-- [url]http://www.google.com/images/srpr/nav_logo25.png[/url]
Resolving [url=http://www.google.com]www.google.com[/url]... 64.233.181.104
Connecting to [url=http://www.google.com]www.google.com[/url]|64.233.181.104|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [image/png]
Saving to: "nav_logo25.png"
[ <=> ] 31,317 3.38K/s in 9.1s
2010-11-13 05:53:45 (3.38 KB/s) - "nav_logo25.png" saved [31317]
real 0m24.764s
user 0m0.020s
sys 0m0.007s
* wget with '--no-dns-cache' and '-4' options
rv77ax@bubu$ rm nav_logo25.png
rv77ax@bubu$ time wget -c --no-dns-cache -4 "http://www.google.com/images/srpr/nav_logo25.png"
--2010-11-13 05:54:43-- http://www.google.com/images/srpr/nav_logo25.png
Resolving www.google.com... 64.233.181.104
Connecting to www.google.com|64.233.181.104|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [image/png]
Saving to: "nav_logo25.png"
[ <=> ] 31,317 2.65K/s in 12s
2010-11-13 05:54:56 (2.65 KB/s) - "nav_logo25.png" saved [31317]
real 0m12.952s
user 0m0.013s
sys 0m0.017s
You can see that in the second test the download of the same file actually takes longer (12s vs 9.1s), but the total run time is much shorter (0m12.952s vs 0m24.764s). If we subtract the download time from the real time:
first test: 24.764 - 9.1 = 15.664
second test: 12.952 - 12 = 0.952
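To factor out the unstable download speed, a rough sketch would be to time repeated header-only requests with and without the options (--spider skips the actual download; the URL and loop count here are just examples), though I have not run this yet:

time for (( x=1; x <= 10; x++ )); do wget -q --spider "http://www.google.com/images/srpr/nav_logo25.png"; done
time for (( x=1; x <= 10; x++ )); do wget -q --spider --no-dns-cache -4 "http://www.google.com/images/srpr/nav_logo25.png"; done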
Batch download with wget is a little bit tricky. Some servers allow recursive download and some do not (they only accept direct file names), so the only option is a script that calls wget one URL at a time. Oh, about pacman: the last time I remember, it used a new wget process for each package.
After this comment I will update my original post to add the "-4" option; thanks to brebs.
Offline