#1 2017-03-12 13:26:05

sulaweyo
Member
From: Austria
Registered: 2013-07-08
Posts: 24

Connect timeouts in kernel 4.10.1

I just wanted to point out that there is a bug in the current 4.10.1 kernel that causes connection timeouts.
I ran into it when I upgraded my home server yesterday. Running web services simply stopped being able to connect to the local DB, and stuff like that. I downgraded to the latest 4.9.11 kernel and the issue is gone.

More details here: https://bugzilla.kernel.org/show_bug.cgi?id=194723

Easy way to verify: ncat -k -l 19999 & C=1 ; while true ; do echo -n "$C " ; echo ping | ncat localhost 19999 ; C=`expr $C + 1` ; sleep 1 ; done
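
For anyone who wants each attempt classified rather than just echoed, here is a minimal sketch of the client side only. It assumes bash and coreutils `timeout`; the names `probe_tcp` and `probe_loop` and the 3-second limit are illustrative choices, not taken from the bug report. Keep the `ncat -k -l 19999` listener from the one-liner above running in another terminal.

```shell
#!/usr/bin/env bash
# probe_tcp HOST PORT [SECONDS]
# Tries one TCP connect using bash's /dev/tcp redirection.
# Prints "ok", "timeout" or "failed" and returns 0 only for "ok".
# probe_tcp and the 3 s default are illustrative, not from the bug report.
probe_tcp() {
    local host=$1 port=$2 limit=${3:-3} rc=0
    # timeout(1) exits with 124 when the time limit is hit.
    timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null || rc=$?
    case $rc in
        0)   echo ok ;;
        124) echo timeout; return 1 ;;
        *)   echo failed; return 1 ;;
    esac
}

# Same idea as the one-liner: one numbered connect attempt per second.
# Call as: probe_loop 127.0.0.1 19999
probe_loop() {
    local host=$1 port=$2 n=1
    while true; do
        printf '%s %s\n' "$n" "$(probe_tcp "$host" "$port")"
        n=$((n + 1))
        sleep 1
    done
}
```

Passing 127.0.0.1 instead of localhost keeps the test on IPv4. Presumably an affected machine would show some `timeout` lines between the `ok` ones, but that is inferred from the report, not verified here.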

Last edited by sulaweyo (2017-03-12 13:26:23)

Offline

#2 2017-03-12 15:39:22

aboe
Member
From: Netherlands
Registered: 2006-10-23
Posts: 19

Re: Connect timeouts in kernel 4.10.1

I can confirm this bug. I'm running the LTS kernel until it is resolved.

Offline

#3 2017-03-12 16:00:54

damjan
Member
Registered: 2006-05-30
Posts: 452

Re: Connect timeouts in kernel 4.10.1

Can't reproduce this here. I wonder why? localhost resolves to an IPv6 address by default.

$ getent hosts localhost
::1             localhost

Offline

#4 2017-03-12 19:35:58

aboe
Member
From: Netherlands
Registered: 2006-10-23
Posts: 19

Re: Connect timeouts in kernel 4.10.1

@damjan, it is only an issue if you have a service running on IPv4 (127.0.0.1).

Offline

#5 2017-03-12 22:17:51

drankinatty
Member
From: Nacogdoches, Texas
Registered: 2009-04-24
Posts: 70

Re: Connect timeouts in kernel 4.10.1

I can confirm this problem. The update to kernel 4.10.1-1, gcc, etc. broke bind9. During startup bind doesn't get past the 'libseccomp sandboxing active' message and does not load /etc/named.conf. As a result, bind is left dead and nothing can connect to 127.0.0.1#953 (e.g. rndc: connect failed: 127.0.0.1#953: connection refused).

I tried individually downgrading packages installed during update on 3/10, but the only thing that worked was downgrading the kernel, gcc and the firmware. Otherwise I get:

    # rndc -V sync --clean
    create memory context
    create socket manager
    <snip>
    using server 127.0.0.1 (127.0.0.1#953)
    create socket
    bind socket
    connect
    rndc: connect failed: 127.0.0.1#953: connection refused

After downgrade of linux (4.10.1-1 -> 4.9.9-1) and associated downgrade of linux-api-headers, linux-firmware (20170227.5abb924-1 -> 20170217.12987ca-2), gcc, gcc-libs, glibc, openresolv, binutils, cifs-utils, libinput, xf86-input-libinput, and valgrind -- bind9 is working well again.

Last edited by drankinatty (2017-03-12 22:30:54)


David C. Rankin, J.D.,P.E.

Offline

#6 2017-03-12 22:45:14

damjan
Member
Registered: 2006-05-30
Posts: 452

Re: Connect timeouts in kernel 4.10.1

so, for ipv4 only …

ncat -4 -k -l 19999 & C=1 ; while true ; do echo -n "$C " ; echo ping | ncat -4 localhost 19999 ; C=`expr $C + 1` ; sleep 1 ; done

still don't have the issue

Offline

#7 2017-03-13 12:11:07

drankinatty
Member
From: Nacogdoches, Texas
Registered: 2009-04-24
Posts: 70

Re: Connect timeouts in kernel 4.10.1

Damjan, are there any other tests I can do on my end that may help narrow this down? I have another server that was broken by the 4.10 upgrade, but instead of downgrading, I switched it to 4.9-lts. Bind is OK there, but X will not start (strange, I never had linux/linux-lts problems running X with the basic display drivers before...). Anyway, I can test the current config on that box. When I first discovered the issue, I wanted to check for updates, which is hard without name resolution, so I ended up pinging a repo and just putting the IP in mirrorlist. That worked, but there was no update to fix this problem :(

Not sure what your ncat test is supposed to show, but with 4.9-lts, it dies at 107, e.g.

    1 ping
    ...
    103 ping
    104 ping
    105 ping
    106 ping
    107

Last edited by drankinatty (2017-03-13 12:19:24)


David C. Rankin, J.D.,P.E.

Offline

#8 2017-03-14 11:39:20

damjan
Member
Registered: 2006-05-30
Posts: 452

Re: Connect timeouts in kernel 4.10.1

@drankinatty

I don't know, different people complain about different things (resolving, localhost issues, kernel, glibc??).


You should narrow down what the issue is: does `traceroute -n ...` work, does `ping -n localhost` work, does `getent hosts some.domain` work, etc.
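
Those checks could be run in one pass with something like the sketch below. `run_check` and `diagnose` are hypothetical helper names, `example.org` is only a placeholder domain, and the `ping` flags assume the iputils version. The traceroute step is left out because its target was not given.

```shell
#!/usr/bin/env bash
# run_check LABEL COMMAND [ARGS...]
# Runs the command quietly and prints "PASS LABEL" or "FAIL LABEL".
# run_check is a hypothetical helper, not an existing tool.
run_check() {
    local label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# diagnose [DOMAIN]: local resolution and reachability first, then remote.
# example.org is a placeholder; substitute a name you expect to resolve.
diagnose() {
    local domain=${1:-example.org} fails=0
    run_check "resolve localhost" getent hosts localhost     || fails=$((fails + 1))
    run_check "ping localhost" ping -n -c 1 -W 2 localhost   || fails=$((fails + 1))
    run_check "resolve $domain" getent hosts "$domain"       || fails=$((fails + 1))
    run_check "ping $domain" ping -n -c 1 -W 2 "$domain"     || fails=$((fails + 1))
    echo "$fails check(s) failed"
}
```

Running `diagnose some.domain` and posting the output should make it easier to tell a resolver problem from a loopback connection problem.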

Offline

#9 2017-03-14 12:24:39

slick517d
Member
Registered: 2013-01-15
Posts: 18

Re: Connect timeouts in kernel 4.10.1

I confirm the bind/named issue here also. I have been running the 4.10.0 kernel here for a couple of weeks without any problem until my update yesterday. Named loads and then dies, and I cannot get rid of the defunct process until after a reboot:

376 ?        00:00:00 named <defunct>

I put in the Google servers instead of 127.0.0.1 and get internet just fine.

I suspect one of these packages:

dnssec-anchors
openresolv
network-manager

Offline

#10 2017-03-14 14:48:30

sulaweyo
Member
From: Austria
Registered: 2013-07-08
Posts: 24

Re: Connect timeouts in kernel 4.10.1

@slick517d I can drop network-manager from that list, as I don't have it installed.

Offline

#11 2017-03-15 03:27:19

slick517d
Member
Registered: 2013-01-15
Posts: 18

Re: Connect timeouts in kernel 4.10.1

They fixed it with an upgrade of the bind and bind-tools packages. I had to re-enable named and reboot.

Offline

#12 2017-03-15 07:13:22

sulaweyo
Member
From: Austria
Registered: 2013-07-08
Posts: 24

Re: Connect timeouts in kernel 4.10.1

Yesterday evening I was still able to reproduce it on all my machines. None of them has bind installed, only bind-tools.
Now I just tested on my work machine and I cannot reproduce it there with the latest updates. I'll verify when I get back home.

Offline

#13 2017-03-15 14:36:51

slick517d
Member
Registered: 2013-01-15
Posts: 18

Re: Connect timeouts in kernel 4.10.1

@sulaweyo I use my desktop as its own DNS server, which in turn uses the 127.0.0.1 IP address for lookups. That part broke with the update two days ago. For some reason the named daemon (bind) would load and then die, leaving a defunct process. The bind (named) update yesterday fixed that for me.

There seem to be other issues going on in this thread, so for those here who were using 127.0.0.1 for DNS, your issue is probably fixed.

Last edited by slick517d (2017-03-15 14:44:31)

Offline

#14 2017-03-15 16:55:35

sulaweyo
Member
From: Austria
Registered: 2013-07-08
Posts: 24

Re: Connect timeouts in kernel 4.10.1

I can still reproduce it on all my machines at home, while I can't at work. I'll have to dig deeper.

Last edited by sulaweyo (2017-03-15 16:59:50)

Offline

#15 2017-03-15 18:55:10

slick517d
Member
Registered: 2013-01-15
Posts: 18

Re: Connect timeouts in kernel 4.10.1

@sulaweyo:

As @damjan stated, it looks like there is more than one issue going on here.

For clarification, I do not use the Arch kernel or its .config. I follow another kernel / .config with modified DVB modules designed for DVB blind scanning and higher bit rate capabilities.

So it appears that either you are not using your own local DNS server, or, if you are, you may be using some resolver other than bind. Otherwise the problem may come back to Arch's new kernel and/or its .config, though that would not be the case if your home and work computers are presently running the same kernel.

Good luck hunting down the issue.

Last edited by slick517d (2017-03-15 18:58:22)

Offline

#16 2017-03-16 16:02:20

loqs
Member
Registered: 2014-03-06
Posts: 17,376

Re: Connect timeouts in kernel 4.10.1

@sulaweyo on the affected systems what is the output of

$ cat /proc/sys/net/ipv4/tcp_tw_recycle

Have you been able to produce the same bisection as in https://bugzilla.kernel.org/show_bug.cgi?id=194723#c15?

Offline

#17 2017-03-16 18:04:23

sulaweyo
Member
From: Austria
Registered: 2013-07-08
Posts: 24

Re: Connect timeouts in kernel 4.10.1

Yep, that workaround fixes the issue on all my nodes.

To test:

echo 0 >/proc/sys/net/ipv4/tcp_tw_recycle

Permanent via sysctl:

net.ipv4.tcp_tw_recycle = 0
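
One way to make the setting permanent is a drop-in file under /etc/sysctl.d. The sketch below is only an illustration: the function name `write_tw_recycle_dropin` and the file name `99-tcp-tw-recycle.conf` are arbitrary choices, and writing to the default directory needs root.

```shell
#!/usr/bin/env bash
# write_tw_recycle_dropin [DIR]
# Writes a sysctl drop-in that pins net.ipv4.tcp_tw_recycle to 0 and
# prints the path of the file it wrote.
# DIR defaults to /etc/sysctl.d (needs root); the file name is an
# arbitrary choice for this sketch.
write_tw_recycle_dropin() {
    local dir=${1:-/etc/sysctl.d}
    local file="$dir/99-tcp-tw-recycle.conf"
    printf '%s\n' 'net.ipv4.tcp_tw_recycle = 0' > "$file" || return 1
    echo "$file"
}

# As root, to write the file and load it without rebooting:
#   write_tw_recycle_dropin && sysctl --system
```

Passing a scratch directory as the argument lets you inspect the generated file before touching /etc.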

Offline

#18 2017-03-16 18:22:05

loqs
Member
Registered: 2014-03-06
Posts: 17,376

Re: Connect timeouts in kernel 4.10.1

I wonder why it was set to 1 on your systems. On this system it is set to 0.

Offline

#19 2017-03-16 23:57:33

drankinatty
Member
From: Nacogdoches, Texas
Registered: 2009-04-24
Posts: 70

Re: Connect timeouts in kernel 4.10.1

After updates today, all appears to be working fine (I'm one of the initial reporters that rely on bind9 for mail host/web host name resolution, dhcpd w/dyn_updates, etc.). So what was broken in 4.10.1/glibc 2.25/bind9 now appears working. (at least for my setup, which relies completely on bind9)

Last edited by drankinatty (2017-03-16 23:57:49)


David C. Rankin, J.D.,P.E.

Offline

#20 2017-03-22 14:25:12

twelveeighty
Member
From: Alberta, Canada
Registered: 2011-09-04
Posts: 1,096

Re: Connect timeouts in kernel 4.10.1

drankinatty wrote:

After updates today, all appears to be working fine (I'm one of the initial reporters that rely on bind9 for mail host/web host name resolution, dhcpd w/dyn_updates, etc.). So what was broken in 4.10.1/glibc 2.25/bind9 now appears working. (at least for my setup, which relies completely on bind9)

@drankinatty In your working setup, what settings do you currently have for these two values:

cat /proc/sys/net/ipv4/tcp_tw_recycle
cat /proc/sys/net/ipv4/tcp_timestamps
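
To collect both values in one go, including the case where a kernel does not expose one of the files, something like this sketch may help. `show_tcp_knobs` is a made-up name, and the optional directory argument exists only so the function can be pointed at a copy of the files.

```shell
#!/usr/bin/env bash
# show_tcp_knobs [DIR]
# Prints tcp_tw_recycle and tcp_timestamps as "name = value".
# A knob the running kernel does not expose is reported as "(not present)".
# DIR defaults to /proc/sys/net/ipv4.
show_tcp_knobs() {
    local dir=${1:-/proc/sys/net/ipv4} name
    for name in tcp_tw_recycle tcp_timestamps; do
        if [ -r "$dir/$name" ]; then
            echo "$name = $(cat "$dir/$name")"
        else
            echo "$name = (not present)"
        fi
    done
}

show_tcp_knobs
```

The output is two lines that can be pasted straight into a reply.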

Offline
