#1 2010-11-12 00:40:35

ms
Member
From: Bandung, Indonesia
Registered: 2010-07-28
Posts: 81
Website

[TIPS] Improve wget process time with "--no-dns-cache" and "-4" option

I just want to share a small tip for anyone who uses wget heavily (e.g. scripted downloads with wget, or pacman with wget).

If you happen to see a delay of a second or so when downloading,

--2010-11-12 07:31:19--  http://abc.def/hij.klm
Resolving abc.def...

you can add the "--no-dns-cache" and "-4" options to your wgetrc or to your wget command line.
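
For the wgetrc route, the matching settings should be `dns_cache = off` and `inet4_only = on`. A minimal sketch, assuming the per-user file lives at ~/.wgetrc; the helper name `add_wget_dns_opts` is made up for illustration and is not part of wget:

```shell
# Append the wgetrc equivalents of --no-dns-cache and -4 to a wgetrc file.
# add_wget_dns_opts is a hypothetical helper name; the default path is an
# assumption (the usual per-user wgetrc location).
add_wget_dns_opts() {
    rc="${1:-$HOME/.wgetrc}"
    printf '%s\n' 'dns_cache = off' 'inet4_only = on' >> "$rc"
}
```

Run `add_wget_dns_opts` once to change the default for every later wget call, or pass a different path as the first argument to try it on a scratch file first.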

The delay does not matter if you only download one or two files, but if you are mirroring, or downloading from a shell script that calls wget, e.g.,

for (( x=1; x < 100; x++ )); do
  wget -c "http://abc.def/hij__${x}.klm"
done

those seconds add up to minutes.

The wget manual page even mentions this problem, but DNS caching is still on by default.

$man wget wrote:

However, it has been reported that in some situations it is not desirable to cache host names, even for the duration of a short-running application like Wget.  With this option Wget issues a new DNS lookup (more precisely, a new call to "gethostbyname" or "getaddrinfo") each time it makes a new connection.  Please note that this option will not affect caching that might be performed by the resolving library or by an external caching layer, such as NSCD.

Well, that's it. I would love to see if anyone has any other tips on using wget.

Update:

  • added the '-4' option, thanks to brebs.

Last edited by ms (2010-11-12 23:27:00)

Offline

#2 2010-11-12 07:00:40

Google
Member
From: Mountain View, California
Registered: 2010-05-31
Posts: 484
Website

Re: [TIPS] Improve wget process time with "--no-dns-cache" and "-4" option

Isn't it a good idea to not use DNS caching even within Chromium? You should disable it as most modern internet speeds don't really need it. I thought it was only useful on slow connections/mobile devices etc.

Offline

#3 2010-11-12 08:05:44

brebs
Member
Registered: 2007-04-03
Posts: 3,742

Re: [TIPS] Improve wget process time with "--no-dns-cache" and "-4" option

Google wrote:

don't really need it

Try comparing a half-second or second's delay, VERSUS practically zero delay (if running BIND locally, for example, like I do). Those half-seconds add up, and get noticeable, and thus annoying.

But, the OP is saying that wget's own internal cache is ridiculously slow?? Weird. Maybe it's an IPv6-unavailable screwup.

Offline

#4 2010-11-12 17:43:52

Misfit138
Misfit Emeritus
From: USA
Registered: 2006-11-27
Posts: 4,189

Re: [TIPS] Improve wget process time with "--no-dns-cache" and "-4" option

[TIP] Add tips to the wiki.
;)

Offline

#5 2010-11-12 19:04:43

ijanos
Member
From: Budapest, Hungary
Registered: 2008-03-30
Posts: 443

Re: [TIPS] Improve wget process time with "--no-dns-cache" and "-4" option

brebs wrote:

But, the OP is saying that wget's own internal cache is ridiculously slow?

Yeah, that is a bit unclear. If wget's cache is slow, then report a bug; the point of a cache is to be faster than a full lookup.

Offline

#6 2010-11-12 20:29:22

ms
Member
From: Bandung, Indonesia
Registered: 2010-07-28
Posts: 81
Website

Re: [TIPS] Improve wget process time with "--no-dns-cache" and "-4" option

@Google: On a fast Internet connection, probably yes, if you aren't bothered by a second of delay, as brebs said.

@brebs: Well, since you mentioned IPv6 being unavailable, I tried the "-4" option, and it also removes the delay when resolving the host.

@ijanos: The point is that wget only runs for a short time; it is not a long-running application (like kget or Firefox, which stay open) that you can pass a URL to and let it download for you. If you run wget 100 times, each in a different process, do you think you still need the internal DNS cache? I think not.

Offline

#7 2010-11-12 21:02:15

ijanos
Member
From: Budapest, Hungary
Registered: 2008-03-30
Posts: 443

Re: [TIPS] Improve wget process time with "--no-dns-cache" and "-4" option

ms wrote:

@ijanos: The point is that wget only runs for a short time; it is not a long-running application (like kget or Firefox, which stay open) that you can pass a URL to and let it download for you. If you run wget 100 times, each in a different process, do you think you still need the internal DNS cache? I think not.

OK, I checked: the DNS cache is in-memory and every wget process starts with a fresh, empty one, so it makes some sense that disabling it gains you some time. Still, I am a bit reluctant to believe it is measurable. Do you have some numbers?

Also, if you are starting 100+ wget processes, you are doing it wrong. wget can batch download, you know, and the internal DNS cache will speed up batch downloads. You mention a pacman speedup; I did not check, but I think pacman does not start a new wget process for each package, and if I am right about this, then disabling the wget DNS cache will slow down pacman downloads.
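
A rough sketch of the batch approach, assuming the placeholder URLs from the first post: generate the list and hand it to a single wget with `-i -`, which reads URLs from standard input. The function names `gen_urls` and `fetch_all` are made up for illustration.

```shell
# Print the 99 placeholder URLs from the first post, one per line.
# (abc.def is the thread's stand-in host, not a real server.)
gen_urls() {
    x=1
    while [ "$x" -lt 100 ]; do
        printf 'http://abc.def/hij__%s.klm\n' "$x"
        x=$((x + 1))
    done
}

# Fetch the whole list in one wget process, so the host only has to be
# resolved once and the in-memory DNS cache can be reused.
fetch_all() {
    gen_urls | wget -c -i -
}
```

Calling `fetch_all` replaces the 99 separate wget invocations with one; this only needs direct file URLs, not recursive download support on the server.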

Offline

#8 2010-11-12 23:23:55

ms
Member
From: Bandung, Indonesia
Registered: 2010-07-28
Posts: 81
Website

Re: [TIPS] Improve wget process time with "--no-dns-cache" and "-4" option

ijanos,

No, I don't have exact numbers. Measuring with the 'time' command will give inaccurate results, because my connection is unstable (3G dial-up, currently at ~4-7 KB/s). But I will try:

* wget without '--no-dns-cache' and '-4' options

rv77ax@bubu$ time wget -c "http://www.google.com/images/srpr/nav_logo25.png"
--2010-11-13 05:53:21--  http://www.google.com/images/srpr/nav_logo25.png
Resolving www.google.com... 64.233.181.104
Connecting to www.google.com|64.233.181.104|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [image/png]
Saving to: "nav_logo25.png"

    [                 <=>                    ] 31,317      3.38K/s   in 9.1s

2010-11-13 05:53:45 (3.38 KB/s) - "nav_logo25.png" saved [31317]


real    0m24.764s
user    0m0.020s
sys    0m0.007s

* wget with '--no-dns-cache' and '-4' options

rv77ax@bubu$ rm nav_logo25.png
rv77ax@bubu$ time wget -c --no-dns-cache -4 "http://www.google.com/images/srpr/nav_logo25.png"
--2010-11-13 05:54:43--  http://www.google.com/images/srpr/nav_logo25.png
Resolving www.google.com... 64.233.181.104
Connecting to www.google.com|64.233.181.104|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [image/png]
Saving to: "nav_logo25.png"

    [                    <=>                 ] 31,317      2.65K/s   in 12s

2010-11-13 05:54:56 (2.65 KB/s) - "nav_logo25.png" saved [31317]


real    0m12.952s
user    0m0.013s
sys    0m0.017s

You can see that in the second test the download itself takes longer (12s vs 9.1s), but the total run time is much shorter (0m12.952s vs 0m24.764s). If we subtract the download time from the real time,

  • first test: 24.764 - 9.1 = 15.664

  • second test: 12.952 - 12 = 0.952
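
The same subtraction as a small shell helper, in case anyone wants to repeat the comparison with their own numbers. This is only a sketch: `overhead` is a made-up name, and it assumes awk is available for the floating-point arithmetic.

```shell
# overhead REAL DOWNLOAD: print the non-download time in seconds,
# i.e. "real" from time(1) minus the "in N s" figure wget reports.
overhead() {
    awk -v real="$1" -v dl="$2" 'BEGIN { printf "%.3f\n", real - dl }'
}

overhead 24.764 9.1    # first test, without the options
overhead 12.952 12     # second test, with --no-dns-cache -4
```

Keep in mind this is one sample per case on an unstable link, so the difference is suggestive rather than a proper benchmark.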

Batch downloading with wget is a little bit tricky. Some servers allow recursive downloads and some do not (they only accept direct file names), so the only option is to use a script and call wget once per file. Oh, about pacman: as far as I remember, it uses a new wget process for each package.

After this comment I will update my original post to add the "-4" option, thanks to brebs.

Offline
