Topic closed
Hi folks
For the last few days I have had the following problem on a fully updated x86_64 install. DNS lookups take 5-6 seconds when running network clients such as ssh, telnet and wget: the client hangs while resolving the hostname. If I use the IP address or put the host in /etc/hosts, there is no issue. The really weird thing is that DNS utils like dig, host and nslookup always resolve immediately (dig query times ~10 msec).
- problem is independent of nameservers used, I have tested with my ISP's, my company's (tier 1), and opendns: no change
- not an /etc/resolv.conf problem, the following trivial file exhibits the issue:
nameserver 208.67.222.222
nameserver 208.67.220.220
- not an /etc/nsswitch.conf issue:
hosts: files dns
- /etc/host.conf looks like this:
order hosts,bind
multi on
- I disabled IPv6: it is not that
- sometimes the first connection to a server has no problem, but subsequent lookups are slow; there is a refractory period on this: if I wait a while (minutes rather than hours) and try again the cycle repeats
- if I use dnsmasq the problem goes away
This is not a network issue. This is my home machine, but yesterday I installed Arch on my office machine, which had been running Fedora 9 with no issue, and it immediately showed exactly the same behaviour.
I suspected /lib/libnss_dns.so, and recently got a glibc upgrade, so I tried downgrading from glibc-2.10.1-3 to glibc-2.10.1-2. Made no difference.
Anyone else seeing this issue?
Here is what I think is the relevant part of an strace on wget:
strace wget www.google.com
...
open("/lib/libnss_dns.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\17\0\0\0\0\0\0@"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=27260, ...}) = 0
mmap(NULL, 2117888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2176e41000
mprotect(0x7f2176e46000, 2093056, PROT_NONE) = 0
mmap(0x7f2177045000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x7f2177045000
close(3) = 0
open("/lib/libresolv.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\2408\0\0\0\0\0\0@"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=94587, ...}) = 0
mmap(NULL, 2185864, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2176c2b000
mprotect(0x7f2176c3e000, 2093056, PROT_NONE) = 0
mmap(0x7f2176e3d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12000) = 0x7f2176e3d000
mmap(0x7f2176e3f000, 6792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f2176e3f000
close(3) = 0
mprotect(0x7f2176e3d000, 4096, PROT_READ) = 0
mprotect(0x7f2177045000, 4096, PROT_READ) = 0
munmap(0x7f21785d0000, 64550) = 0
socket(PF_INET, 0x802 /* SOCK_??? */, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("208.67.222.222")}, 28) = 0
poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, ".w\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1"..., 32, MSG_NOSIGNAL, NULL, 0) = 32
poll([{fd=3, events=POLLIN|POLLOUT}], 1, 5000) = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "\274\5\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\34\0\1"..., 32, MSG_NOSIGNAL, NULL, 0) = 32
poll([{fd=3, events=POLLIN}], 1, 4999) = 1 ([{fd=3, revents=POLLIN}])
ioctl(3, FIONREAD, [104]) = 0
recvfrom(3, ".w\201\200\0\1\0\3\0\0\0\0\3www\6google\3com\0\0\1\0\1\300"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("208.67.222.222")}, [16]) = 104
poll([{fd=3, events=POLLIN}], 1, 4989
and that is where it hangs for 5-6 seconds.
Cheers
harpo
Last edited by harpo (2009-07-13 17:44:52)
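The two `sendto()` payloads in the strace above are ordinary DNS query packets sent back to back on one UDP socket. The first asks for an A record (type 1) and the second for an AAAA record (type 28, the `\34` byte). The one reply that arrives carries the first query's ID (`.w`), so the resolver is left polling for the AAAA answer. The sketch below rebuilds those packets from the RFC 1035 wire format to help read such traces. The helper names are hypothetical, this is not glibc code, and the IDs 0x2E77 and 0xBC05 are simply the bytes visible in the trace.

```python
import struct

# Record type codes: A (RFC 1035) and AAAA (RFC 3596); class IN.
QTYPE_A = 1
QTYPE_AAAA = 28
QCLASS_IN = 1


def encode_name(hostname: str) -> bytes:
    """Encode a dotted hostname as DNS length-prefixed labels."""
    out = b""
    for label in hostname.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # the empty root label terminates the name


def build_query(query_id: int, hostname: str, qtype: int) -> bytes:
    """Build a standard recursive query with a single question."""
    # Header: ID, flags 0x0100 (RD = recursion desired), QDCOUNT=1,
    # ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(hostname) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question


def query_id(packet: bytes) -> int:
    """Return the 16-bit ID a resolver uses to match a reply to its query."""
    return struct.unpack("!H", packet[:2])[0]


if __name__ == "__main__":
    a = build_query(0x2E77, "www.google.com", QTYPE_A)        # ".w" in the trace
    aaaa = build_query(0xBC05, "www.google.com", QTYPE_AAAA)  # "\274\5" in the trace
    print(len(a), len(aaaa))  # both match sendto(...) = 32
    print(a)
    print(aaaa)
```

Comparing `query_id()` of the received 104-byte reply (`.w\201\200...`) with the two queries shows it answers the A request, which is why the final `poll()` is still waiting.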
"I'm Winston Wolfe. I solve problems."
~ Need moar games? [arch-games] ~ [aurcheck] AUR haz updates? ~
Thanks, but that made no difference.
Solved ... kinda. I have at least found out the reason for the issue. As described here:
http://udrepper.livejournal.com/20948.html (see DNS NSS improvement).
It seems that glibc 2.10 implements a timeout to handle broken DNS servers (or firewalls) that can't handle a parallel lookup of IPv4 and IPv6 addresses. The following workarounds all work for me:
- run dnsmasq
- run nscd
- add 'options single-request' to /etc/resolv.conf
I am using the single-request fix now.
What I don't understand yet is why there aren't more reports of people encountering this problem, and why I seem to have this problem with the OpenDNS servers, which others report work fine.
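Of the three workarounds, `options single-request` is the one-line change. Below is a minimal sketch of applying it idempotently. It assumes you run it as root against the file your system actually reads; on setups where a DHCP client regenerates /etc/resolv.conf, a hand edit may be overwritten. `ensure_resolv_option` is a hypothetical helper, not part of any tool.

```python
import tempfile
from pathlib import Path


def ensure_resolv_option(path, option="single-request"):
    """Append 'options <option>' to a resolv.conf-style file unless it is set.

    Returns True if the file was modified, False if the option was present.
    """
    p = Path(path)
    text = p.read_text() if p.exists() else ""
    for line in text.splitlines():
        fields = line.split()
        # One 'options' line may carry several space-separated options.
        if fields and fields[0] == "options" and option in fields[1:]:
            return False
    if text and not text.endswith("\n"):
        text += "\n"  # don't glue the new line onto the last existing one
    p.write_text(text + f"options {option}\n")
    return True


if __name__ == "__main__":
    # Demo on a scratch file rather than the real /etc/resolv.conf.
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "resolv.conf"
        conf.write_text("nameserver 208.67.222.222\nnameserver 208.67.220.220\n")
        print(ensure_resolv_option(str(conf)))  # True: option appended
        print(ensure_resolv_option(str(conf)))  # False: already present
        print(conf.read_text(), end="")
```

Checking before appending means running it twice (or from a boot script) does not pile up duplicate `options` lines.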
I remember that when kernel 2.6.17 came out some years ago, my internet would hardly work. After some investigation I found that it was due to the kernel implementing TCP Window Scaling for the first time. Disabling it has kept the problem away ever since.
I thought there would be LOTS of people complaining about this problem, but it didn't happen. Just a small percentage of users were affected. It turns out that most routers do support TCP Window Scaling correctly, but not mine and a few others.
I guess it's the same in your case. You said above that you have the problem with OpenDNS and with your ISP's own DNS servers, so the problem is somewhere in your own setup. Most probably your router cannot deal with the parallel DNS lookups well, while most others can. That's what U. Drepper was saying in his post: some users will have problems, but most won't. It just feels strange when YOU are the one affected and everyone else seems to be doing fine. But as long as there's an easy workaround, it's all fine.
Thanks for the response. I see that most others have no issue, which is why this is getting interesting. Please bear in mind that the issue is occurring for me at work as well (Tier 1 IP), on a different host, so this is not due to some hokey router setup.
I decided to fire up wireshark and see what is actually going on. These packets are from an ssh to www.google.com (doesn't matter that I can't connect, it is just the DNS lookup that we care about, and I can always reproduce the problem with ssh), and using OpenDNS nameservers.
The first attempt invariably has no problem. We see two requests, for A (IPv4) and AAAA records (IPv6), and two immediate replies with DNS answers:
No. Time Source Destination Protocol Info
28 2.750721 10.181.35.108 208.67.222.222 DNS Standard query A www.google.com
29 2.750754 10.181.35.108 208.67.222.222 DNS Standard query AAAA www.google.com
30 2.769332 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com A 208.69.32.230 A 208.69.32.231
31 2.773500 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com
A few seconds later I rerun the command. This time we see the two requests and a valid response for IPv4, then a 5-second timeout, after which the two requests are resent one at a time, each getting an answer, and we're done:
No. Time Source Destination Protocol Info
14 1.668597 10.181.35.108 208.67.222.222 DNS Standard query A www.google.com
15 1.668627 10.181.35.108 208.67.222.222 DNS Standard query AAAA www.google.com
16 1.675936 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com A 208.69.32.230 A 208.69.32.231
... PAUSE 5 seconds here ...
42 6.671249 10.181.35.108 208.67.222.222 DNS Standard query A www.google.com
43 6.685842 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com A 208.69.32.231 A 208.69.32.230
44 6.685895 10.181.35.108 208.67.222.222 DNS Standard query AAAA www.google.com
45 6.692457 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com
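You can read the pause straight off the capture. Below is a small sketch that parses the rows above, copied verbatim with the PAUSE marker dropped, and finds the longest silence between consecutive packets. The function names are illustrative.

```python
# Frames 14-45 from the second capture above (PAUSE annotation removed).
CAPTURE = """\
14 1.668597 10.181.35.108 208.67.222.222 DNS Standard query A www.google.com
15 1.668627 10.181.35.108 208.67.222.222 DNS Standard query AAAA www.google.com
16 1.675936 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com A 208.69.32.230 A 208.69.32.231
42 6.671249 10.181.35.108 208.67.222.222 DNS Standard query A www.google.com
43 6.685842 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com A 208.69.32.231 A 208.69.32.230
44 6.685895 10.181.35.108 208.67.222.222 DNS Standard query AAAA www.google.com
45 6.692457 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com
"""


def parse_rows(text):
    """Split each capture line into (frame number, time in seconds, info)."""
    rows = []
    for line in text.splitlines():
        no, t, _src, _dst, _proto, info = line.split(maxsplit=5)
        rows.append((int(no), float(t), info))
    return rows


def largest_gap(rows):
    """Return (gap_seconds, frame_before, frame_after) for the longest silence."""
    best = (0.0, None, None)
    for (n1, t1, _), (n2, t2, _) in zip(rows, rows[1:]):
        if t2 - t1 > best[0]:
            best = (t2 - t1, n1, n2)
    return best


if __name__ == "__main__":
    gap, before, after = largest_gap(parse_rows(CAPTURE))
    print(f"{gap:.3f} s of silence between frames {before} and {after}")
```

On this capture, the gap falls between the IPv4 answer (frame 16) and the retried A query (frame 42) and lasts about 5.0 s. That matches the 5-second default per-try timeout documented in resolv.conf(5).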
Looks like the resolver sends parallel requests, fails to see the IPv6 response, waits 5 sec and sends sequential requests because it thinks the nameserver is broken. Any DNS gurus out there who can explain what is happening?
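A toy model makes that reading concrete. It assumes something on the path loses the second of two back-to-back replies, in line with the broken servers or firewalls the Drepper post cited above describes. This is a simplified simulation with a fake clock, not glibc's actual resolver code. It tries A and AAAA in parallel; if either reply is missing after `timeout`, it falls back to asking for one record type at a time. That fallback is what `options single-request` does from the start. `LossyPath` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LossyPath:
    """Hypothetical network path: when two queries leave the same socket
    back to back, the reply to the second one is lost (the assumed fault)."""
    drop_second_parallel_reply: bool = True
    sent: list = field(default_factory=list)

    def exchange(self, qtypes):
        """Send every query in qtypes at once; return the qtypes answered."""
        self.sent.extend(qtypes)
        answered = set(qtypes)
        if self.drop_second_parallel_reply and len(qtypes) > 1:
            answered.discard(qtypes[1])
        return answered


def resolve(path, qtypes=("A", "AAAA"), timeout=5.0, rtt=0.007,
            single_request=False):
    """Return (answered qtypes, simulated seconds spent)."""
    elapsed = 0.0
    if not single_request:
        answered = path.exchange(list(qtypes))
        if answered == set(qtypes):
            return answered, elapsed + rtt
        # A reply never arrived: the whole timeout is spent waiting for it.
        elapsed += timeout
    # Fallback (or single-request from the start): one query, one reply.
    answered = set()
    for q in qtypes:
        answered |= path.exchange([q])
        elapsed += rtt
    return answered, elapsed


if __name__ == "__main__":
    for single in (False, True):
        path = LossyPath()
        got, secs = resolve(path, single_request=single)
        print(f"single_request={single}: {sorted(got)} in {secs:.3f} s, "
              f"sent {path.sent}")
```

In parallel mode, the model resends A and AAAA one at a time after a 5-second wait, which has the same shape as frames 42-45. With `single_request=True`, the same lossy path costs only two round trips.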
Thank you, that solved the issue for me! (Also, sorry for bumping an old thread.)
It's strange: if I told a program to use IPv4 specifically, there was no hang (I had disabled IPv6). For example, tnftp wouldn't hang when passed -4.
I suspect this is happening in getaddrinfo(), because by default tnftp passes AF_UNSPEC, and that somehow causes the hang (the -4 option just changes it to AF_INET, and then it works fine).
Anyway, I'm glad I have some workaround, even if it's not a stellar one.
EDIT: oh yeah, I just realised it's a caching daemon.
Last edited by Xartrix (2009-08-21 10:09:21)
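You can check Xartrix's observation from Python, whose `socket.getaddrinfo` wraps the same libc call. The sketch below uses only the standard library. With `AF_UNSPEC` the resolver should ask for both A and AAAA records, as in the two queries in the strace earlier in the thread. With `AF_INET` only the A query should be needed, which would explain why `tnftp -4` never hung. `timed_lookup` is a hypothetical helper.

```python
import socket
import sys
import time


def timed_lookup(host, family=socket.AF_UNSPEC):
    """Resolve host via getaddrinfo() and time it.

    Returns (seconds taken, sorted list of address family names returned).
    """
    start = time.monotonic()
    infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    families = sorted({socket.AddressFamily(info[0]).name for info in infos})
    return elapsed, families


if __name__ == "__main__":
    # Hostnames come from the command line; nothing is looked up otherwise.
    for host in sys.argv[1:]:
        for fam in (socket.AF_UNSPEC, socket.AF_INET):
            secs, fams = timed_lookup(host, fam)
            print(f"{host} {fam.name:9} {secs:6.3f} s  {fams}")
```

Save it as, say, lookup_timing.py and run `python3 lookup_timing.py www.google.com`. On an affected machine, the AF_UNSPEC line would be expected to show the multi-second delay and the AF_INET line not. Run it more than once, since the thread reports that the first lookup is sometimes fast.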
Hey thanks, Harpo. This was one of those things where I saw the thread, thought "that's gonna happen to me", and it did. I knew I had to do a reinstall soon (doing it now) and got the 5-second pause between lookups you described. I found this a bit odd, because I never ran across it while doing normal updates, but on a new install, sure enough. Anyways, thanks for the temporary fix. Using 'options single-request' worked for me too.
echo "options single-request" >> /etc/resolv.conf.head
Setting Up a Scripting Environment | Proud donor to wikipedia - link
Huh, I'm getting this again after a large update the other day. I have rmmod'd the ipv6 module and have the single-request option in /etc/resolv.conf. I saw a suggestion on another forum to use "options timeout:1", and that does seem to help with some websites. I've gone through some new websites: some load quickly (3-5 seconds) while others take about 20 seconds. I just did a "ping -c2 google.com" and it took about 15 seconds, but shows me:
PING google.com (74.125.47.103) 56(124) bytes of data.
--- google.com ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms
Is anyone else having these types of troubles, or does anyone have ideas for a fix?
Last edited by Gen2ly (2010-06-11 14:57:06)
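The 5-second stalls in this thread match glibc's default per-try timeout, which resolv.conf(5) gives as 5 seconds, with 2 attempts by default. `options timeout:1` lowers the wait after a lost reply to 1 second, so it shortens the stall rather than preventing it. Below is a simplified sketch of how those options read from the file. It uses a hypothetical parser that ignores `domain`, `search`, `sortlist` and the RES_OPTIONS environment variable.

```python
DEFAULT_TIMEOUT = 5   # seconds; resolv.conf(5) default for timeout:n
DEFAULT_ATTEMPTS = 2  # resolv.conf(5) default for attempts:n


def parse_resolv_conf(text):
    """Return (nameservers, options) from resolv.conf-style text.

    Flag options such as single-request map to True; n-valued options
    such as timeout:1 map to ints. Unset timeout/attempts keep defaults.
    """
    nameservers = []
    options = {"timeout": DEFAULT_TIMEOUT, "attempts": DEFAULT_ATTEMPTS}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue  # blank line or comment
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, sep, value = opt.partition(":")
                options[name] = int(value) if sep and value.isdigit() else True
    return nameservers, options


if __name__ == "__main__":
    conf = """nameserver 208.67.222.222
nameserver 208.67.220.220
options single-request
options timeout:1
"""
    servers, opts = parse_resolv_conf(conf)
    print(servers)
    print(opts)
```

With the file above, the timeout is 1 second instead of the default 5, and `single-request` is set. A lost reply should then cost about a second per try, while a problem outside DNS (like the 100% packet loss in the ping output) is unaffected.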
I'm lonely
** waves **
Old thread, but still an issue. I think it showed up with the updates after a fresh install; adding that option to resolv.conf fixes the slow DNS lookups.
Good find harpo
in /etc/resolv.conf
add the line to the end of the file
options single-request
Glad it is solved.
This thread has run its course. Closing.
Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way