I'm having a problem that I'm reasonably certain is Linux-related, but I have run out of ideas on how to debug it.
As the title says, download speeds are unstable when I am connected by ethernet cable, but everything works as expected over wifi. Over ethernet, downloads start and ramp up to my connection speed, which is around 1.4 to 1.5 MB/s (I'm on an ADSL connection), but then it's as if something chokes: the speed drops, often all the way to zero, stays there for a while, then climbs back up to 1.4 to 1.5 MB/s and chokes again, rinse and repeat.
If I do the same download under exactly the same conditions, except over wifi, the download speed stays at the connection maximum.
I'm seeing this problem on two separate machines connected over ethernet, my laptop and an ARM box, so I guess this rules out a driver bug. The ARM box (a Cubieboard2) is running ALARM, and both my laptop and that box are up to date. The laptop is running the current stable kernel and the ARM box is running kernel 4.18.1.
The same download from the same laptop, using the same cable, but this time from Windows, works as expected.
Transfers between machines on the LAN over ethernet run at the expected link speed (100 Mb/s).
What I've tried so far:
- reset the router: didn't help, and since it's the router provided by the ISP, it seems to restore some of the settings after a reset anyway. I'm not happy about this, but from the evidence I have so far I would say the router is not to blame, or at least not entirely.
- lowering the MTU size: no change
- adjusting the ethernet card's tx/rx buffers to match what the driver on Windows uses: no change
- disabling TCP window scaling (maybe I missed some detail on this one): no change
- forcing MTU probing: no change
- using the LTS kernel: doesn't help either
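For anyone retracing these steps, here is a minimal read-only sketch that prints the current values of the knobs listed above. It assumes the usual Linux /proc and /sys layout; the interface name `eth0` is only a placeholder default, and the ring size 256 in the comments is an arbitrary example. The commands that would actually change each setting are left as comments, since they need root.

```shell
#!/bin/sh
# Read-only check of the settings tried above (no root needed).
# Changing them would look like this (root required; "eth0" is a placeholder):
#   ip link set dev eth0 mtu 1400              # lower the MTU
#   ethtool -g eth0                            # show current rx/tx ring sizes
#   ethtool -G eth0 rx 256 tx 256              # resize the rx/tx rings
#   sysctl -w net.ipv4.tcp_window_scaling=0    # disable TCP window scaling
#   sysctl -w net.ipv4.tcp_mtu_probing=2       # 2 = always probe the path MTU

# Print "name = value" for a sysctl by reading its /proc/sys file.
show_sysctl() {
    path="/proc/sys/$(printf '%s' "$1" | tr '.' '/')"
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$1" "$(cat "$path")"
    else
        printf '%s = (unavailable)\n' "$1"
    fi
}

# Print the MTU of one interface from sysfs.
show_mtu() {
    if [ -r "/sys/class/net/$1/mtu" ]; then
        printf '%s mtu = %s\n' "$1" "$(cat "/sys/class/net/$1/mtu")"
    else
        printf '%s mtu = (no such interface)\n' "$1"
    fi
}

show_sysctl net.ipv4.tcp_window_scaling
show_sysctl net.ipv4.tcp_mtu_probing
show_mtu "${1:-eth0}"
```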
I have no more ideas of what to test; maybe someone will have more ideas given the symptoms.
R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K
Maybe Windows is using ipv6, and your Linux box is using ipv4 - or even vice-versa.
Any funky iptables rules involved (e.g. dumbly blocking ICMP)?
Edit: Try with different values for Explicit Congestion Notification (especially disabling it), in case your router (or something else along the route) behaves badly.
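For reference, the kernel's `net.ipv4.tcp_ecn` sysctl took three values at the time, as described in the kernel's ip-sysctl documentation. A small sketch that prints what the current value means; any other value is reported as unknown rather than guessed at:

```shell
#!/bin/sh
# Explain a net.ipv4.tcp_ecn value (meanings from the kernel's ip-sysctl docs).
describe_ecn() {
    case "$1" in
        0) echo "disabled: never request or accept ECN" ;;
        1) echo "enabled: request ECN on outgoing connections and accept it on incoming ones" ;;
        2) echo "passive: accept ECN on incoming connections only (kernel default)" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Show the value the running kernel is using, if readable.
if [ -r /proc/sys/net/ipv4/tcp_ecn ]; then
    val="$(cat /proc/sys/net/ipv4/tcp_ecn)"
    echo "tcp_ecn = $val ($(describe_ecn "$val"))"
fi
```

With value 1 the machine itself asks for ECN on every outgoing connection, so a router that mishandles ECN-marked traffic is only exercised in that mode.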
Last edited by brebs (2018-08-26 13:11:25)
I forgot to mention that I did try with IPv6 both disabled and enabled, and with getting the IP via DHCP, just in case the router does not like statically assigned IPs (although that did work fine before).
I guess the firewall rules are not to blame, since it works fine over wifi. I also tried the oldest available install media and saw the problem there too.
Windows is using IPv6, or at least gets an IPv6 address, and I confirmed that it uses an MTU of 1500, same as Linux. I suspect IPv4 is in use regardless, since I was using
wget https://glua.ua.pt/pub/centos/7/isos/x86_64/CentOS-7-x86_64-DVD-1804.iso -O - > /dev/null
to test, and glua.ua.pt seems to have only an IPv4 address.
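To make the stalls visible without relying on wget's progress output, one option is to sample the interface's received-byte counter once a second while the download runs. A minimal sketch, assuming the standard Linux sysfs statistics files; the interface name is whatever `ip link` shows on your machine.

```shell
#!/bin/sh
# Print received throughput once per second for COUNT seconds,
# using the kernel's per-interface rx_bytes counter in sysfs.
sample_rx() {
    iface="$1"
    count="${2:-10}"
    stats="/sys/class/net/$iface/statistics/rx_bytes"
    if [ ! -r "$stats" ]; then
        echo "no such interface: $iface" >&2
        return 1
    fi
    prev="$(cat "$stats")"
    i=0
    while [ "$i" -lt "$count" ]; do
        sleep 1
        cur="$(cat "$stats")"
        # Whole KiB received during the last second
        echo "$(( (cur - prev) / 1024 )) KiB/s"
        prev="$cur"
        i=$((i + 1))
    done
}
```

Running something like `sample_rx eth0 60` in one terminal while the wget test runs in another should show a stall as a run of near-zero lines.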
In any case my current firewall rules are:
# Generated by iptables-save v1.6.2 on Sat Aug 25 21:10:13 2018
*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -m rpfilter --invert -j DROP
COMMIT
# Completed on Sat Aug 25 21:10:13 2018
# Generated by iptables-save v1.6.2 on Sat Aug 25 21:10:13 2018
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
# Completed on Sat Aug 25 21:10:13 2018
# Generated by iptables-save v1.6.2 on Sat Aug 25 21:10:13 2018
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -s 192.168.56.0/24 -j MASQUERADE
COMMIT
# Completed on Sat Aug 25 21:10:13 2018
# Generated by iptables-save v1.6.2 on Sat Aug 25 21:10:13 2018
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:OPEN - [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate INVALID -j DROP
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT ! -i brkvm -m set --match-set portscan src -j SET --add-set portscan src --exist
-A INPUT ! -i brkvm -m set --match-set portscan src -j REJECT --reject-with icmp-port-unreachable
-A INPUT -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -m conntrack --ctstate NEW -j OPEN
-A INPUT -p udp -m conntrack --ctstate NEW -j OPEN
-A INPUT ! -i brkvm -j SET --add-set portscan src
-A INPUT -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i brkvm -j ACCEPT
-A FORWARD -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT ! -d 127.0.0.1/32 -m owner --uid-owner 1009 -j REJECT --reject-with icmp-port-unreachable
-A OPEN -i brkvm -p tcp -m multiport --dports 139,445 -j ACCEPT
-A OPEN -i brkvm -p udp -m udp --dport 53 -j ACCEPT
COMMIT
# Completed on Sat Aug 25 21:10:13 2018
Edit:
Use code tags for wget command.
Last edited by R00KIE (2018-08-26 13:23:02)
R00KIE
These 3 rules are in a suboptimal order:
-A INPUT -m conntrack --ctstate INVALID -j DROP
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
The order should be:
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -m conntrack --ctstate INVALID -j DROP
... because of performance (the vast majority of the traffic will match with "RELATED,ESTABLISHED"), and because you want to ensure that icmp is accepted, rather than being (maybe) caught by "ctstate INVALID".
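To confirm the reordering took effect, the saved ruleset can be checked mechanically. A minimal sketch that compares the positions of the two conntrack rules in an `iptables-save` dump; the rule strings are the exact ones from this thread, so other rulesets would need different patterns. The commented `iptables -I INPUT 4` position assumes the INPUT chain exactly as posted above.

```shell
#!/bin/sh
# Check that RELATED,ESTABLISHED is accepted before INVALID is dropped
# in an iptables-save dump (e.g. produced with: iptables-save > rules.txt).
#
# On the INPUT chain exactly as posted in this thread, the reorder itself
# could be done as root with:
#   iptables -D INPUT -m conntrack --ctstate INVALID -j DROP
#   iptables -I INPUT 4 -m conntrack --ctstate INVALID -j DROP
check_order() {
    est="$(grep -n -F -- '-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT' "$1" | head -n 1 | cut -d: -f1)"
    inv="$(grep -n -F -- '-A INPUT -m conntrack --ctstate INVALID -j DROP' "$1" | head -n 1 | cut -d: -f1)"
    if [ -z "$est" ] || [ -z "$inv" ]; then
        echo "rule not found"
        return 2
    fi
    if [ "$est" -lt "$inv" ]; then
        echo "ok"
    else
        echo "INVALID drop is evaluated first"
        return 1
    fi
}
```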
Tcpdump and wireshark will hopefully shed some light on the slowdowns.
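When reading such a capture, the ECN negotiation is visible in the TCP flags byte (byte 13 of the TCP header, where CWR is 0x80 and ECE is 0x40). Per RFC 3168, an ECN-setup SYN carries both ECE and CWR, and a SYN-ACK that accepts ECN carries ECE without CWR. Packets with either bit set can be captured with something like `tcpdump -n -i eth0 'tcp[13] & 0xc0 != 0'` (interface name assumed). The sketch below classifies a handshake packet from that flags byte given in hex, e.g. the low eight flag bits as shown in Wireshark.

```shell
#!/bin/sh
# Classify a TCP handshake packet's ECN negotiation (RFC 3168)
# from the 8-bit flags byte, e.g. 0xc2 or 0x52.
ecn_role() {
    b=$(( $1 & 0xff ))
    syn=$(( b & 0x02 ))
    ack=$(( b & 0x10 ))
    ece=$(( b & 0x40 ))
    cwr=$(( b & 0x80 ))
    if [ "$syn" -ne 0 ] && [ "$ack" -eq 0 ]; then
        # SYN: ECN is requested only when both ECE and CWR are set
        if [ "$ece" -ne 0 ] && [ "$cwr" -ne 0 ]; then
            echo "ECN-setup SYN"
        else
            echo "non-ECN SYN"
        fi
    elif [ "$syn" -ne 0 ]; then
        # SYN-ACK: ECN is accepted with ECE set and CWR clear
        if [ "$ece" -ne 0 ] && [ "$cwr" -eq 0 ]; then
            echo "ECN-setup SYN-ACK"
        else
            echo "non-ECN SYN-ACK"
        fi
    else
        echo "not a handshake packet"
    fi
}
```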
Hmm, did you try to change the congestion control algorithm?
You can do that with
/proc/sys/net/ipv4/tcp_congestion_control
https://en.wikipedia.org/wiki/TCP_congestion_control
https://securenetweb.wordpress.com/2017 … comparing/
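A minimal sketch for checking what the kernel currently offers before switching; it only reads the standard /proc/sys files. Switching itself is a root operation, e.g. `sysctl -w net.ipv4.tcp_congestion_control=bbr`, and BBR may first need its module loaded with `modprobe tcp_bbr`.

```shell
#!/bin/sh
# Inspect TCP congestion control settings (read-only).
CC_DIR=/proc/sys/net/ipv4

# Algorithm used for new connections.
cc_current() {
    cat "$CC_DIR/tcp_congestion_control"
}

# Succeeds if $1 appears in the space-separated list of available algorithms.
cc_available() {
    for alg in $(cat "$CC_DIR/tcp_available_congestion_control"); do
        [ "$alg" = "$1" ] && return 0
    done
    return 1
}
```

For example, `cc_available bbr || echo "bbr not loaded"` before trying to select it. Note that this only changes the sender side on this machine, which matters little for downloads where the remote server is the sender.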
Last edited by progandy (2018-08-26 14:54:13)
| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |
@brebs
I did adjust my firewall rules according to your suggestion, but it didn't make a difference. Still you have a good point about performance.
I have done a couple of captures with wireshark that I can share. I'll leave links once I upload them somewhere.
I have tried a few more things that make me start to think the problem is above the kernel and drivers, see below.
@progandy
I did try reno and bbr but it didn't help. I suppose it might work if the change was done server side.
Now for the head scratcher:
If I run Windows 7 in qemu and do the same download with Firefox, the download speeds are fine.
If I enter a CentOS 5 chroot with systemd-nspawn and use wget, the download speeds are fine.
The only difference is that both the virtual machine and the chroot use a bridge configured specifically for them, and my laptop then acts as a router doing NAT/masquerading. So it looks like:
qemu/chroot -> bridge -> nat -> ethernet -> my router -> internet at large.
What really puzzles me is that it works fine over wifi.
Edit:
Captures link: https://www.dropbox.com/s/b32lnzkwgm3x9 … ar.gz?dl=0
Last edited by R00KIE (2018-08-26 16:59:26)
R00KIE
If you are using a bridge, that means the VMs have their own IP address. The wifi has its own IP address.
Just a WAG, could this be an IP address collision? Any chance the Arch Arm box and the Arch box are configured to the same IP address?
Any chance there are other IP address collisions on the LAN?
How about hostnames? Any collisions between machines there?
Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way
What networking services are you running? Noob question I know
@ewaller
The qemu VM and the chroot are on a bridge, but I'm doing NAT, so from my router's point of view it should all look the same as traffic coming from my laptop over the ethernet cable.
I don't think this is an IP address collision: the cubieboard is statically set to 192.168.10.10, the router is 192.168.10.1 and hands out DHCP addresses from 192.168.10.110 upward, and I've given my laptop a static address of 192.168.10.100.
I see the same problem if I get the ethernet address by dhcp.
My laptop has the hostname set to 'arch' and the cubieboard has the hostname set to 'cubieboard2' so that's not it either.
@Slithery
On my laptop I'm not running much, just unbound and network manager.
On my cubieboard I have openssh, unbound, shadowsocks, nzbget, nfs, transmission, darkhttpd, lighttpd and a feed aggregator for which updates are triggered by a systemd timer.
Edit:
I should add that I haven't changed any configuration on the ARM board, or any network settings on my laptop, in a long while, and this instability also happens when the ARM board is the only client connected to the router: I see it when logging in with ssh while I'm not home. So in that case there is no chance of IP collisions.
Last edited by R00KIE (2018-08-26 20:29:42)
R00KIE
Differences in the traces:
* Arch sets the ECN (Explicit Congestion Notification) at the start, and 193.136.175.23 acknowledges it. Windows 7 does not set the ECN flag.
* In the Windows 7 VM, 193.136.175.23 keeps its window size at 131328. With Arch, the window size increases to 651156.
So, try turning ECN off in Arch. This can be done at several levels - easiest is at sysctl level:
cat /proc/sys/net/ipv4/tcp_ecn
echo 0 > /proc/sys/net/ipv4/tcp_ecn
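Writing to /proc this way needs root and lasts only until the next reboot; a plain `sudo echo 0 > /proc/sys/net/ipv4/tcp_ecn` also fails, because the redirection is done by the calling, unprivileged shell. A minimal sketch of a persistent variant, assuming a system that reads drop-ins from /etc/sysctl.d; the file name `99-disable-ecn.conf` is an arbitrary choice. The target directory is a parameter so it can be tried against a scratch directory first.

```shell
#!/bin/sh
# Write a sysctl drop-in that disables TCP ECN at boot.
# Run as root for the real directory, then apply it immediately with:
#   sysctl --system
# (or, for the running system only: sysctl -w net.ipv4.tcp_ecn=0)
write_ecn_conf() {
    dir="${1:-/etc/sysctl.d}"
    # sysctl.d files are applied in lexicographic order, so a 99- prefix
    # overrides the same key set by lower-numbered drop-ins.
    conf="$dir/99-disable-ecn.conf"
    printf '%s\n' '# Disable TCP Explicit Congestion Notification' 'net.ipv4.tcp_ecn = 0' > "$conf"
    echo "$conf"
}
```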
@brebs
That seems to be the magic potion that makes things work. I'm not at home at the moment (I'll only be back home next weekend), but changing that on the cubieboard (ain't remote login nice?) restores the stable download speed I was expecting.
@loqs
It seems then that it will be automagically fixed soonish when a new release of systemd comes out with the patch/revert.
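If the value is being set by a sysctl.d drop-in, as a fix on the systemd side would suggest, a quick way to see which file sets it is to grep the usual locations. A minimal sketch; the default list is the sysctl.d(5) directories plus the legacy /etc/sysctl.conf, and can be overridden (e.g. for testing).

```shell
#!/bin/sh
# List every config line that sets net.ipv4.tcp_ecn, prefixed by its file.
# Accepts both the dotted and the slash-separated key spelling.
find_ecn_setting() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/sysctl.d /run/sysctl.d /usr/local/lib/sysctl.d \
               /usr/lib/sysctl.d /lib/sysctl.d /etc/sysctl.conf
    fi
    pattern='^[[:space:]]*net[./]ipv4[./]tcp_ecn[[:space:]]*='
    for loc in "$@"; do
        if [ -d "$loc" ]; then
            for f in "$loc"/*.conf; do
                [ -f "$f" ] && grep -H -E "$pattern" "$f"
            done
        elif [ -f "$loc" ]; then
            grep -H -E "$pattern" "$loc"
        fi
    done
    return 0
}
```

Since the lexicographically last file wins, the bottom-most match in sort order by file name is the one that takes effect at boot.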
I'll try to read more on this soon, it's already way past sleep time and tomorrow is work day.
Thanks for the help figuring this one out.
R00KIE
brebs wrote:
Differences in the traces:
* Arch sets the ECN (Explicit Congestion Notification) at the start, and 193.136.175.23 acknowledges it. Windows 7 does not set the ECN flag.
* In the Windows 7 VM, 193.136.175.23 keeps its window size at 131328. With Arch, the window size increases to 651156.
So, try turning ECN off in Arch. This can be done at several levels - easiest is at sysctl level:
cat /proc/sys/net/ipv4/tcp_ecn
echo 0 > /proc/sys/net/ipv4/tcp_ecn
I'm impressed, but I haven't found anything about that on the Wiki. Would you kindly add some lines to ?
Thank you, if possible.
I've added an entry to Network_configuration.
Hopefully it's just a consumer-level router at fault, which has an updated firmware available
Quote: Hopefully it's just a consumer-level router at fault, which has an updated firmware available
Consumer-level and available firmware updates are usually mutually exclusive, even more so if the equipment is provided by the ISP: they like to have their own backdoors (er, customizations) so they can provide remote support.
R00KIE