Good find harpo
in /etc/resolv.conf
add the line to the end of the file
options single-request
PING google.com (74.125.47.103) 56(124) bytes of data.
--- google.com ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms
Anyone else having this type of trouble, or have any ideas on a fix?
echo "options single-request" >> /etc/resolv.conf.head
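Running the echo above more than once appends the line each time. A minimal sketch of an idempotent variant follows, assuming the target is a plain resolv.conf-style file; the function name ensure_resolv_option is made up for illustration and is not part of any tool mentioned in this thread.

```python
from pathlib import Path


def ensure_resolv_option(path, option="single-request"):
    """Append 'options <option>' to the file unless it is already set.

    Returns True if the file was changed, False if the option was present.
    (Hypothetical helper, not part of glibc or any resolver tooling.)
    """
    target = Path(path)
    text = target.read_text() if target.exists() else ""
    for line in text.splitlines():
        fields = line.split()
        # Only an active 'options' line counts; commented lines are ignored.
        if fields and fields[0] == "options" and option in fields[1:]:
            return False
    if text and not text.endswith("\n"):
        text += "\n"
    target.write_text(text + "options " + option + "\n")
    return True
```

Called as ensure_resolv_option("/etc/resolv.conf.head") with root privileges, it should leave the file untouched on a second run.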
- run nscd
- add 'options single-request' to /etc/resolv.conf
I am using the single-request fix now.
What I don't understand yet is why there aren't more reports of people encountering this problem, and why I seem to have this problem with OpenDNS servers, which others report to work fine.
Thank you, that solved the issue for me! (Also, sorry for bumping an old thread.)
It's strange, if I instructed a program to specifically use IPv4, there would be no hang (I disabled IPv6). For example, tnftp, if passed -4, wouldn't hang.
I suspect this is happening in getaddrinfo(), because, by default, tnftp passes AF_UNSPEC, and that somehow causes it to hang (the -4 option just changes it to AF_INET and it works fine).
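The AF_UNSPEC versus AF_INET difference can be timed outside tnftp. Here is a rough sketch using Python's socket.getaddrinfo(), which wraps the C library call; the helper name time_lookup is an assumption for illustration only.

```python
import socket
import time


def time_lookup(host, family=socket.AF_UNSPEC):
    """Return (seconds, addresses) for one getaddrinfo() call.

    AF_UNSPEC asks for both IPv4 and IPv6 addresses; AF_INET asks for
    IPv4 only, which is what the -4 switch is said to select.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    except socket.gaierror:
        infos = []  # resolution failed; still report how long it took
    elapsed = time.monotonic() - start
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses
```

On an affected machine I would expect time_lookup("www.google.com") to take around five seconds on repeated runs, and time_lookup("www.google.com", socket.AF_INET) to return almost immediately, but that is an expectation based on the reports here, not something this sketch guarantees.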
Anyway, I'm glad I have some workaround, even if it's not a stellar one.
EDIT: oh yeah, I just realised it's a cache daemon.
I guess it's the same in your case. You said above that you have the problem with opendns and your ISP's own DNS servers, so the problem is somewhere in your own setup. Most probably your router cannot deal with the parallel DNS lookups well. But most others can. That's what U. Drepper was saying in his post: some users will have problems, but most won't. It just feels strange when YOU are the one affected and all the rest seem to be doing fine. But as long as there's an easy workaround, it's all fine.
Thanks for the response. I see that most others have no issue, which is why this is getting interesting. Please bear in mind that the issue is occurring for me at work as well (Tier 1 IP), on a different host, so this is not due to some hokey router setup.
I decided to fire up wireshark and see what is actually going on. These packets are from an ssh to www.google.com (doesn't matter that I can't connect, it is just the DNS lookup that we care about, and I can always reproduce the problem with ssh), and using OpenDNS nameservers.
The first attempt invariably has no problem. We see two requests, for A (IPv4) and AAAA records (IPv6), and two immediate replies with DNS answers:
No. Time Source Destination Protocol Info
28 2.750721 10.181.35.108 208.67.222.222 DNS Standard query A www.google.com
29 2.750754 10.181.35.108 208.67.222.222 DNS Standard query AAAA www.google.com
30 2.769332 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com A 208.69.32.230 A 208.69.32.231
31 2.773500 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com
A few seconds later I rerun the command. This time we see the two requests and a valid response for IPv4 only, then a 5-second timeout, after which the requests are resent one at a time, each getting an answer, and we're done:
No. Time Source Destination Protocol Info
14 1.668597 10.181.35.108 208.67.222.222 DNS Standard query A www.google.com
15 1.668627 10.181.35.108 208.67.222.222 DNS Standard query AAAA www.google.com
16 1.675936 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com A 208.69.32.230 A 208.69.32.231
... PAUSE 5 seconds here ...
42 6.671249 10.181.35.108 208.67.222.222 DNS Standard query A www.google.com
43 6.685842 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com A 208.69.32.231 A 208.69.32.230
44 6.685895 10.181.35.108 208.67.222.222 DNS Standard query AAAA www.google.com
45 6.692457 208.67.222.222 10.181.35.108 DNS Standard query response CNAME google.navigation.opendns.com
Looks like the resolver sends parallel requests, fails to see the IPv6 response, waits 5 sec and sends sequential requests because it thinks the nameserver is broken. Any DNS gurus out there who can explain what is happening?
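To check whether a given nameserver (or something on the path to it) drops one of two back-to-back queries, the pattern in the capture can be approximated by hand. This is only a sketch: the function names and the query IDs 0x1001/0x1002 are arbitrary choices of mine, and it imitates what the capture shows rather than reproducing glibc's actual resolver code.

```python
import socket
import struct

QTYPE_A = 1      # IPv4 address record
QTYPE_AAAA = 28  # IPv6 address record


def build_query(name, qtype, query_id):
    """Build a minimal DNS query packet with the recursion-desired flag."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii") for part in name.split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


def probe_parallel(server, name, timeout=5.0):
    """Send A and AAAA queries back to back from one UDP socket.

    Returns the set of record types ("A", "AAAA") whose replies arrived.
    The timeout applies to each wait for a reply, not to the whole probe.
    """
    pending = {0x1001: "A", 0x1002: "AAAA"}  # arbitrary query IDs
    answered = set()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((server, 53))
        sock.send(build_query(name, QTYPE_A, 0x1001))
        sock.send(build_query(name, QTYPE_AAAA, 0x1002))
        while len(answered) < len(pending):
            try:
                reply = sock.recv(2048)
            except OSError:
                break  # timed out, or the network reported an error
            if len(reply) < 2:
                continue
            (reply_id,) = struct.unpack("!H", reply[:2])
            if reply_id in pending:
                answered.add(pending[reply_id])
    return answered
```

Something like probe_parallel("208.67.222.222", "www.google.com") returning only {"A"} on repeated runs would match the second capture, where the AAAA reply never shows up.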
What I don't understand yet is why there aren't more reports of people encountering this problem, and why I seem to have this problem with OpenDNS servers, which others report to work fine.
I remember that when kernel 2.6.17 came out some years ago my internet would hardly work. After some investigation I found that it was due to implementing TCP Window Scaling in the kernel for the first time. Disabling it solved the problem ever since.
I thought there would be LOTS of people complaining about this problem, but it didn't happen. Just a small percentage of users were affected. It turns out that most routers do support TCP Window Scaling correctly, but not mine and a few others.
I guess it's the same in your case. You said above that you have the problem with opendns and your ISP's own DNS servers, so the problem is somewhere in your own setup. Most probably your router cannot deal with the parallel DNS lookups well. But most others can. That's what U. Drepper was saying in his post: some users will have problems, but most won't. It just feels strange when YOU are the one affected and all the rest seem to be doing fine. But as long as there's an easy workaround, it's all fine.
Hi folks
The last few days I have had the following problem on fully updated x86_64. DNS lookups are taking 5-6 seconds when running network clients such as ssh, telnet, wget. The client hangs while resolving the hostname. If I use IP address or put host in /etc/hosts, there is no issue. The really weird thing is that DNS utils like dig, host, nslookup always resolve immediately (dig query times ~10msec).
Solved ... kinda. I have at least found out the reason for the issue. As described here:
http://udrepper.livejournal.com/20948.html (see DNS NSS improvement).
It seems that glibc 2.10 implements a timeout to handle broken DNS servers (or firewalls) that can't handle a parallel lookup of IPv4 and IPv6 addresses. The following workarounds all work for me:
- run dnsmasq
- run nscd
- add 'options single-request' to /etc/resolv.conf
I am using the single-request fix now.
What I don't understand yet is why there aren't more reports of people encountering this problem, and why I seem to have this problem with OpenDNS servers, which others report to work fine.
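For anyone checking whether the single-request workaround is actually active on a machine, here is a small sketch that reads resolv.conf-style text and lists the nameservers and options it finds. The parser is a simplification I wrote for illustration (it only looks at nameserver and options lines), and the sample text uses the OpenDNS addresses quoted elsewhere in this thread.

```python
def parse_resolv_conf(text):
    """Return (nameservers, options) found in resolv.conf-style text."""
    nameservers, options = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            options.extend(fields[1:])
    return nameservers, options


# Sample input only; on a real system pass open("/etc/resolv.conf").read().
sample = """\
nameserver 208.67.222.222
nameserver 208.67.220.220
options single-request
"""
servers, opts = parse_resolv_conf(sample)
print(servers)
print("single-request" in opts)
```

If the second print shows False on a real file, the option has probably been overwritten, for example by a DHCP client regenerating /etc/resolv.conf.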
Thanks, but that made no difference.
The last few days I have had the following problem on fully updated x86_64. DNS lookups are taking 5-6 seconds when running network clients such as ssh, telnet, wget. The client hangs while resolving the hostname. If I use IP address or put host in /etc/hosts, there is no issue. The really weird thing is that DNS utils like dig, host, nslookup always resolve immediately (dig query times ~10msec).
- problem is independent of nameservers used, I have tested with my ISP's, my company's (tier 1), and OpenDNS: no change
- not an /etc/resolv.conf problem, the following trivial file exhibits the issue:
nameserver 208.67.222.222
nameserver 208.67.220.220
- not an /etc/nsswitch.conf issue:
hosts: files dns
- /etc/host.conf looks like this:
order hosts,bind
multi on
- I disabled IPv6: it is not that
- sometimes the first connection to a server has no problem, but subsequent lookups are slow; there is a refractory period on this: if I wait a while (minutes rather than hours) and try again the cycle repeats
- if I use dnsmasq the problem goes away
This is not a network issue; this is my home machine, but yesterday I installed Arch on my office machine, which had been running Fedora 9 with no issue, and it immediately showed the exact same behaviour.
I suspected /lib/libnss_dns.so, and recently got a glibc upgrade, so I tried downgrading from glibc-2.10.1-3 to glibc-2.10.1-2. Made no difference.
Anyone else seeing this issue?
Here is what I think is the relevant part of an strace on wget:
strace wget www.google.com
...
open("/lib/libnss_dns.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\17\0\0\0\0\0\0@"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=27260, ...}) = 0
mmap(NULL, 2117888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2176e41000
mprotect(0x7f2176e46000, 2093056, PROT_NONE) = 0
mmap(0x7f2177045000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x7f2177045000
close(3) = 0
open("/lib/libresolv.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\2408\0\0\0\0\0\0@"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=94587, ...}) = 0
mmap(NULL, 2185864, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2176c2b000
mprotect(0x7f2176c3e000, 2093056, PROT_NONE) = 0
mmap(0x7f2176e3d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12000) = 0x7f2176e3d000
mmap(0x7f2176e3f000, 6792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f2176e3f000
close(3) = 0
mprotect(0x7f2176e3d000, 4096, PROT_READ) = 0
mprotect(0x7f2177045000, 4096, PROT_READ) = 0
munmap(0x7f21785d0000, 64550) = 0
socket(PF_INET, 0x802 /* SOCK_??? */, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("208.67.222.222")}, 28) = 0
poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, ".w\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1"..., 32, MSG_NOSIGNAL, NULL, 0) = 32
poll([{fd=3, events=POLLIN|POLLOUT}], 1, 5000) = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "\274\5\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\34\0\1"..., 32, MSG_NOSIGNAL, NULL, 0) = 32
poll([{fd=3, events=POLLIN}], 1, 4999) = 1 ([{fd=3, revents=POLLIN}])
ioctl(3, FIONREAD, [104]) = 0
recvfrom(3, ".w\201\200\0\1\0\3\0\0\0\0\3www\6google\3com\0\0\1\0\1\300"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("208.67.222.222")}, [16]) = 104
poll([{fd=3, events=POLLIN}], 1, 4989
and that is where it hangs for 5-6 seconds.
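The two sendto() payloads in the strace can be decoded to see what was asked. A short sketch follows; decode_query is a made-up helper that only handles a single uncompressed question, which is all these packets contain. Python bytes literals accept the same octal escapes strace prints, so the strings are pasted as shown above.

```python
import struct

QTYPE_NAMES = {1: "A", 28: "AAAA"}


def decode_query(packet):
    """Decode the ID, name and record type of a single-question DNS query."""
    (query_id,) = struct.unpack("!H", packet[:2])
    labels = []
    pos = 12  # the fixed DNS header is 12 bytes long
    while packet[pos] != 0:
        length = packet[pos]
        labels.append(packet[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    # After the terminating zero byte come the 16-bit type and class.
    qtype, _qclass = struct.unpack("!HH", packet[pos + 1:pos + 5])
    return query_id, ".".join(labels), QTYPE_NAMES.get(qtype, str(qtype))


# Payloads copied from the two sendto() calls in the strace.
first = b".w\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1"
second = b"\274\5\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\34\0\1"

print(decode_query(first))
print(decode_query(second))
```

The first payload is the A query and the second the AAAA query, sent on the same socket. The one recvfrom() reply begins with the same two ID bytes (".w") as the first query, so it is the A answer, and the final poll() appears to be waiting for an AAAA answer that has not arrived.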
Cheers
harpo