I just wanted to point out that there is a bug in the current 4.10.1 kernel which causes connection timeouts.
I ran into it when I upgraded my home server yesterday: running web services simply stopped being able to connect to local databases, and other local connections failed the same way. I downgraded to the latest 4.9.11 kernel and the issue is gone.
More details here: https://bugzilla.kernel.org/show_bug.cgi?id=194723
Easy way to verify:
ncat -k -l 19999 &
C=1; while true; do echo -n "$C "; echo ping | ncat localhost 19999; C=$((C + 1)); sleep 1; done
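The ncat loop above runs forever and simply stops printing when a connection stalls. A bounded variant that gives each client connection a time limit and reports the first failing attempt could look like the sketch below. It assumes bash, nmap's ncat and coreutils' timeout; the function name probe_loopback and its defaults are hypothetical choices, not taken from the bug report. It forces IPv4 (127.0.0.1), since the issue was reported to affect only services listening on IPv4.

```shell
#!/usr/bin/env bash
# Bounded variant of the ncat loop: every client connection gets a time
# limit, so a stalled connect is reported instead of hanging forever.
# Hypothetical helper; defaults (200 attempts, 5 s limit, 1 s pause) are
# arbitrary choices. Requires ncat and coreutils' timeout.

probe_loopback() {
    local port=${1:-19999} count=${2:-200} limit=${3:-5} pause=${4:-1}
    local i listener status=0

    # IPv4 listener that keeps accepting connections (-k); whatever the
    # clients send is discarded.
    ncat -4 -k -l "$port" >/dev/null &
    listener=$!
    sleep 1  # give the listener a moment to bind

    for ((i = 1; i <= count; i++)); do
        # Send one line over IPv4 loopback; timeout kills a hung client
        # and makes the pipeline return a non-zero status.
        if echo ping | timeout "$limit" ncat -4 127.0.0.1 "$port"; then
            printf '%d ok\n' "$i"
        else
            printf 'connection %d failed or took longer than %ss\n' "$i" "$limit" >&2
            status=1
            break
        fi
        sleep "$pause"
    done

    # Stop the background listener before returning.
    kill "$listener" 2>/dev/null
    wait "$listener" 2>/dev/null
    return "$status"
}
```

For example, `probe_loopback 19999 200` mirrors the original run (one attempt per second); a non-zero exit status means some connection failed or stalled.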
Last edited by sulaweyo (2017-03-12 13:26:23)
I can confirm this bug; I'm running the LTS kernel until it is resolved.
I can't reproduce this here, and I wonder why. localhost resolves to an IPv6 address by default:
$ getent hosts localhost
::1 localhost
@damjan, it is only an issue if you have a service listening on IPv4 (127.0.0.1).
I can confirm this problem. The update to kernel 4.10.1-1, gcc, etc. broke bind9. During startup, bind doesn't get past the 'libseccomp sandboxing active' message and does not load /etc/named.conf. As a result, bind is left dead and nothing can connect to 127.0.0.1#953 (e.g. rndc: connect failed: 127.0.0.1#953: connection refused).
I tried downgrading the packages installed during the 3/10 update one at a time, but the only thing that worked was downgrading the kernel, gcc and the firmware. Otherwise I get:
# rndc -V sync --clean
create memory context
create socket manager
<snip>
using server 127.0.0.1 (127.0.0.1#953)
create socket
bind socket
connect
rndc: connect failed: 127.0.0.1#953: connection refused
After downgrading linux (4.10.1-1 -> 4.9.9-1) together with linux-api-headers, linux-firmware (20170227.5abb924-1 -> 20170217.12987ca-2), gcc, gcc-libs, glibc, openresolv, binutils, cifs-utils, libinput, xf86-input-libinput and valgrind, bind9 is working well again.
Last edited by drankinatty (2017-03-12 22:30:54)
David C. Rankin, J.D.,P.E.
So, forcing IPv4 only:
ncat -4 -k -l 19999 &
C=1; while true; do echo -n "$C "; echo ping | ncat -4 localhost 19999; C=$((C + 1)); sleep 1; done
I still don't have the issue.
Damjan, are there any other tests I can run on my end that might help narrow this down? I have another server that was broken by the 4.10 upgrade, but instead of downgrading I switched it to 4.9-lts. Bind is OK there, but X will not start (strange; I never had linux/linux-lts problems running X with the basic display drivers before). Anyway, I can test the current config on that box. When I first discovered the issue I wanted to check for updates, which is hard without name resolution, so I ended up pinging a repo and putting its IP in the mirrorlist. That worked, but there was no update that fixes this problem :(
I'm not sure what your ncat test is supposed to show, but with 4.9-lts it dies at 107:
1 ping
...
103 ping
104 ping
105 ping
106 ping
107
Last edited by drankinatty (2017-03-13 12:19:24)
@drankinatty
I don't know; different people are complaining about different things (name resolution, localhost issues, the kernel, glibc?).
You should narrow down what the issue is: does `traceroute -n ...` work, does `ping -n localhost` work, does `getent hosts some.domain` work, etc.
I can confirm the bind/named issue here as well. I had been running the 4.10.0 kernel for a couple of weeks without any problem until my update yesterday. named loads and then dies, and I cannot get rid of the defunct process until I reboot:
376 ? 00:00:00 named <defunct>
When I use Google's DNS servers instead of 127.0.0.1, internet access works fine.
I suspect one of these packages:
dnssec-anchors
openresolv
network-manager
@slick517d I can rule out network-manager from that list, as I don't have it installed.
They fixed it with an upgrade to the bind and bind-tools packages. I had to re-enable named and reboot.
Yesterday evening I was still able to reproduce it on all my machines. None of them has bind installed, only bind-tools.
I just tested on my work machine and cannot reproduce it there with the latest updates. I'll verify when I get back home.
@sulaweyo I use my desktop as its own DNS server, which in turn uses the 127.0.0.1 IP address for lookups. That part broke with the update two days ago: for some reason the named daemon (bind) would load and then die, leaving a defunct process. Yesterday's bind (named) update fixed that for me.
There seem to be other issues going on in this thread, but for those here who were using 127.0.0.1 for DNS, your issue is probably fixed.
Last edited by slick517d (2017-03-15 14:44:31)
I can still reproduce it on all my machines at home, but not at work. I'll have to dig deeper.
Last edited by sulaweyo (2017-03-15 16:59:50)
@sulaweyo:
It looks like, as @damjan said, there is more than one issue going on.
To clarify, I do not use the Arch kernel or its .config. I follow another kernel/.config with modified DVB modules designed for DVB blind scanning and higher bit-rate capabilities.
So it appears that either you are not running your own local DNS server, or you are but with a resolver other than bind, or the problem lies with Arch's new kernel and/or its .config. That last possibility would not apply, though, if your home and work computers are currently running the same kernel.
Good luck hunting down the issue.
Last edited by slick517d (2017-03-15 18:58:22)
@sulaweyo, on the affected systems, what is the output of
$ cat /proc/sys/net/ipv4/tcp_tw_recycle
Have you been able to reproduce the bisection result in https://bugzilla.kernel.org/show_bug.cgi?id=194723#c15?
Yep, that workaround fixes the issue on all my nodes.
To test (as root):
echo 0 >/proc/sys/net/ipv4/tcp_tw_recycle
To make it permanent via sysctl:
net.ipv4.tcp_tw_recycle = 0
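One way to make that sysctl setting persistent is a drop-in file under /etc/sysctl.d, which systemd-sysctl applies at boot. The sketch below assumes such a systemd-based setup; the function name write_tw_recycle_dropin and the file name 99-tcp-tw-recycle.conf are hypothetical choices.

```shell
#!/usr/bin/env bash
# Write the workaround into a sysctl drop-in file and print its path.
# Hypothetical helper: the target directory defaults to /etc/sysctl.d
# (root required there), and the file name is an arbitrary choice.

write_tw_recycle_dropin() {
    local dir=${1:-/etc/sysctl.d}
    local file="$dir/99-tcp-tw-recycle.conf"
    printf 'net.ipv4.tcp_tw_recycle = 0\n' > "$file" || return 1
    echo "$file"
}

# As root:
#   write_tw_recycle_dropin
#   sysctl --system                          # reload all sysctl config files
#   cat /proc/sys/net/ipv4/tcp_tw_recycle    # should now print 0
```

The directory argument only exists so the helper can be tried against a scratch directory before touching /etc.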
I wonder why it was set to 1 on your systems; on this system it is set to 0.
After today's updates, everything appears to be working fine (I'm one of the initial reporters, relying on bind9 for mail host/web host name resolution, dhcpd with dynamic updates, etc.). So whatever was broken in 4.10.1/glibc 2.25/bind9 now appears to work, at least for my setup, which relies completely on bind9.
Last edited by drankinatty (2017-03-16 23:57:49)
@drankinatty In your working setup, what values do you currently have for these two settings:
cat /proc/sys/net/ipv4/tcp_tw_recycle
cat /proc/sys/net/ipv4/tcp_timestamps
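To answer that question on any given machine, both values can be read and compared in one step. This is a minimal bash sketch; the function names and the PROC_SYS override are hypothetical. It flags tcp_tw_recycle=1 because that is the value the workaround earlier in the thread turns off, and it reports a missing file as "absent" (tcp_tw_recycle was removed from the kernel in Linux 4.12).

```shell
#!/usr/bin/env bash
# Print tcp_tw_recycle and tcp_timestamps, and flag tcp_tw_recycle=1.
# PROC_SYS is a hypothetical override (default /proc/sys) so the functions
# can be pointed at a copy of the tree; "absent" means the running kernel
# does not expose that file.

read_tcp_setting() {
    local path="${PROC_SYS:-/proc/sys}/net/ipv4/$1"
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo absent
    fi
}

check_tcp_settings() {
    local recycle timestamps
    recycle=$(read_tcp_setting tcp_tw_recycle)
    timestamps=$(read_tcp_setting tcp_timestamps)
    echo "tcp_tw_recycle=$recycle tcp_timestamps=$timestamps"
    if [ "$recycle" = "1" ]; then
        echo "warning: tcp_tw_recycle is enabled" >&2
        return 1
    fi
    return 0
}

# Usage: check_tcp_settings   (exit status 1 when tcp_tw_recycle is 1)
```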