
#1 2023-11-12 15:45:21

8472
Member
From: Slovakia
Registered: 2010-05-15
Posts: 88

[SOLVED] nftables & port forwarding not working

Hi,
I've been trying to set up simple port forwarding with nftables, but I can't get it to work.
For a start, I've tested the following solutions, but none of them work and I can't figure out why:
- https://bbs.archlinux.org/viewtopic.php … 7#p1710367
- https://wiki.nftables.org/wiki-nftables … nation_NAT

What I want is simply this:

curl http://127.0.0.1 => forwarded to => http://192.168.1.2
[root@archlinux ~]#  cat /etc/nftables.conf
#!/usr/bin/nft -f

table inet filter {
  chain input {
    type filter hook input priority 0;

    # Connection state based:
    ct state {established, related} accept
    ct state invalid drop

    # Allow loopback and ICMP:
    iifname lo accept
    ip protocol icmp accept
    ip6 nexthdr icmpv6 accept

    # Allow local traffic to port 80
    tcp dport 80 accept

    # Reject everything else
    reject with icmp type port-unreachable
  }

  chain forward {
    type filter hook forward priority 0;
  }

  chain output {
    type filter hook output priority 0;
  }
}

table ip nat {
  chain prerouting {
    type nat hook prerouting priority dstnat;
    tcp dport http dnat to 192.168.1.2
  }
  chain postrouting {
    type nat hook postrouting priority srcnat;
    masquerade
  }
}
[root@archlinux ~]# nft flush ruleset; systemctl restart nftables.service; nft -a list ruleset
table inet filter { # handle 51
	chain input { # handle 1
		type filter hook input priority filter; policy accept;
		ct state { established, related } accept # handle 5
		ct state invalid drop # handle 6
		iifname "lo" accept # handle 7
		ip protocol icmp accept # handle 8
		ip6 nexthdr ipv6-icmp accept # handle 9
		tcp dport 80 accept # handle 10
		reject with icmp port-unreachable # handle 11
	}

	chain forward { # handle 2
		type filter hook forward priority filter; policy accept;
	}

	chain output { # handle 3
		type filter hook output priority filter; policy accept;
	}
}
table ip nat { # handle 52
	chain prerouting { # handle 1
		type nat hook prerouting priority dstnat; policy accept;
		tcp dport 80 dnat to 192.168.1.2 # handle 3
	}

	chain postrouting { # handle 2
		type nat hook postrouting priority srcnat; policy accept;
		masquerade # handle 4
	}
}
[root@archlinux ~]# ss -tlnp
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN  0       4096    127.0.0.53%lo:53    0.0.0.0:*          users:(("systemd-resolve",pid=378,fd=21))
LISTEN  0       128     0.0.0.0:22          0.0.0.0:*          users:(("sshd",pid=389,fd=3))
LISTEN  0       4096    127.0.0.54:53       0.0.0.0:*          users:(("systemd-resolve",pid=378,fd=23))
LISTEN  0       4096    0.0.0.0:5355        0.0.0.0:*          users:(("systemd-resolve",pid=378,fd=12))
LISTEN  0       128     [::]:22             [::]:*             users:(("sshd",pid=389,fd=4))
LISTEN  0       4096    [::]:5355           [::]:*             users:(("systemd-resolve",pid=378,fd=14))
### calling the destination directly, without the forwarding, works just fine
[root@archlinux ~]# curl http://192.168.1.2
<html>
<head><title>302 Found</title></head>
<body>
<center><h1>302 Found</h1></center>
<hr><center>nginx/1.21.3</center>
</body>
</html>
### here I expect that, once port forwarding works, the request is forwarded to http://192.168.1.2
[root@archlinux ~]# curl http://127.0.0.1
curl: (7) Failed to connect to 127.0.0.1 port 80 after 0 ms: Couldn't connect to server

What am I doing wrong?

Edit: I forgot to mention:

[root@archlinux ~]# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

Last edited by 8472 (2023-11-19 12:27:27)


Logic clearly dictates that the needs of the many outweigh the needs of the few.


#2 2023-11-12 21:17:31

-thc
Member
Registered: 2017-03-15
Posts: 714

Re: [SOLVED] nftables & port forwarding not working

AFAIK this is not possible - packets to or from the "lo" (loopback) interface will only traverse the output and the input chain.
NAT rules in the pre- or postrouting chains have no effect whatsoever.


#3 2023-11-15 10:30:25

8472
Member
From: Slovakia
Registered: 2010-05-15
Posts: 88

Re: [SOLVED] nftables & port forwarding not working

OK, then let me ask it another way.
Basically, what I'm trying to achieve is this: https://wiki.archlinux.org/title/Nftabl … ith_Docker
Using exactly the same configuration as in that wiki article for /etc/systemd/system/docker.service.d/netns.conf (with the same interface names and IPs), I've already achieved:
- a working network namespace
- a working docker.service in this network namespace
- learned (and, I hope, understood) a lot about how docker.service works within this network namespace


To summarize:
- my host's main network is 192.168.1.0/24, with 192.168.1.224 as the main IP (enp0s3) of the test VM.
- the network namespace running on this host uses the subnet 10.0.0.0/24, with 10.0.0.1/24 on interface docker0 from the host's perspective and 10.0.0.100/24 on interface eth0 from the namespace's perspective.
- docker.service and my running test container use docker's internal bridge with the default addresses: 172.17.0.1/16 for its gateway and 172.17.0.2/16 for the running container.
- the test container also publishes port 80:80 via docker.service and is perfectly reachable within the network namespace, both by ping and via curl.
- I can also reach the host's main IP from within the network namespace.
- when I add 'ip route add 172.17.0.0/16 via 10.0.0.100' on the host, I can also reach eth0 using 'ping 10.0.0.100' or 'curl http://10.0.0.100', and the docker container via 'ping 172.17.0.2' or 'curl http://172.17.0.2', from the host itself. Works perfectly.
- as mentioned previously, "net.ipv4.ip_forward = 1", i.e. forwarding is enabled.


But I still can't make the container's published port/HTTP service (published and reachable in the network namespace) available to other hosts on the host's main network, and I don't understand why.
I'm using basically the same nftables config as shown before, with minor adjustments to forward the port to the network namespace's eth0 (or to the container's IP):

[root@archlinux]# grep -v '^#' /etc/nftables.conf

flush ruleset

table inet filter {
  chain input {
    type filter hook input priority 0;

    # Connection state based:
    ct state {established, related} accept
    ct state invalid drop

    # Allow loopback and ICMP:
    iifname lo accept
    ip protocol icmp accept
    ip6 nexthdr icmpv6 accept

    # Allow local traffic to port 80
    tcp dport 80 meta nftrace set 1
    tcp dport 80 accept

    # Reject everything else
    reject with icmp type port-unreachable
  }

  chain forward {
    type filter hook forward priority 0;
    tcp dport 80 meta nftrace set 1
  }

  chain output {
    type filter hook output priority 0;
    tcp dport 80 meta nftrace set 1
  }
}

table nat {
  chain prerouting {
    type nat hook prerouting priority dstnat;
    tcp dport 80 meta nftrace set 1
    tcp dport 80 log prefix "4:nat:prerouting:dport:dnat - " dnat to 10.0.0.100
  }
  chain postrouting {
    type nat hook postrouting priority srcnat;
    tcp dport 80 meta nftrace set 1
    masquerade
  }
}

Thanks to the enabled nftrace, I can at least see the following:

Nov 15 10:36:01 archlinux kernel: 4:nat:prerouting:dport:dnat - IN=enp0s3 OUT= MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=192.168.1.224 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6960 DF PROTO=TCP SPT=41520 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
Nov 15 10:36:01 archlinux kernel: TRACE: filter:FORWARD:policy:1 IN=enp0s3 OUT=docker0 MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=10.0.0.100 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=6960 DF PROTO=TCP SPT=41520 DPT=80 SEQ=4150330551 ACK=0 WINDOW=64240 RES=0x00 SYN URGP=0 OPT (020405B40402080A2BE7626B0000000001030307)
Nov 15 10:36:02 archlinux kernel: 4:nat:prerouting:dport:dnat - IN=enp0s3 OUT= MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=192.168.1.224 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6961 DF PROTO=TCP SPT=41520 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
Nov 15 10:36:02 archlinux kernel: TRACE: filter:FORWARD:policy:1 IN=enp0s3 OUT=docker0 MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=10.0.0.100 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=6961 DF PROTO=TCP SPT=41520 DPT=80 SEQ=4150330551 ACK=0 WINDOW=64240 RES=0x00 SYN URGP=0 OPT (020405B40402080A2BE7666A0000000001030307)
Nov 15 10:36:04 archlinux kernel: 4:nat:prerouting:dport:dnat - IN=enp0s3 OUT= MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=192.168.1.224 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6962 DF PROTO=TCP SPT=41520 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
Nov 15 10:36:04 archlinux kernel: TRACE: filter:FORWARD:policy:1 IN=enp0s3 OUT=docker0 MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=10.0.0.100 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=6962 DF PROTO=TCP SPT=41520 DPT=80 SEQ=4150330551 ACK=0 WINDOW=64240 RES=0x00 SYN URGP=0 OPT (020405B40402080A2BE76F100000000001030307)
Nov 15 10:36:08 archlinux kernel: 4:nat:prerouting:dport:dnat - IN=enp0s3 OUT= MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=192.168.1.224 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6963 DF PROTO=TCP SPT=41520 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
Nov 15 10:36:08 archlinux kernel: TRACE: filter:FORWARD:policy:1 IN=enp0s3 OUT=docker0 MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=10.0.0.100 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=6963 DF PROTO=TCP SPT=41520 DPT=80 SEQ=4150330551 ACK=0 WINDOW=64240 RES=0x00 SYN URGP=0 OPT (020405B40402080A2BE77EE20000000001030307)
Nov 15 10:36:16 archlinux kernel: 4:nat:prerouting:dport:dnat - IN=enp0s3 OUT= MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=192.168.1.224 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6964 DF PROTO=TCP SPT=41520 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
Nov 15 10:36:16 archlinux kernel: TRACE: filter:FORWARD:policy:1 IN=enp0s3 OUT=docker0 MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=10.0.0.100 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=6964 DF PROTO=TCP SPT=41520 DPT=80 SEQ=4150330551 ACK=0 WINDOW=64240 RES=0x00 SYN URGP=0 OPT (020405B40402080A2BE79E900000000001030307)
...
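Each pair of lines above shows the same SYN twice: once at the prerouting log rule (DST=192.168.1.224, OUT empty) and once after DNAT on the FORWARD trace (DST=10.0.0.100, OUT=docker0). To compare such lines field by field, a small helper like the one below can pull single KEY=value fields out of them. This is only a sketch; `nf_field` is a hypothetical name, not an existing tool, and it assumes each log record sits on one line with space-separated KEY=value tokens.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from one netfilter LOG/TRACE line.
# Prints an empty string if the key is present but empty (e.g. "OUT=").
# The function body runs in a subshell so its variables do not leak.
nf_field() (
    key=$1
    line=$2
    # One token per line, then print the first token whose name equals KEY.
    printf '%s\n' "$line" | tr ' ' '\n' |
        awk -F= -v k="$key" '$1 == k { print substr($0, length(k) + 2); exit }'
)

# Example: the forwarded packet from the log above (shortened).
fwd='TRACE: filter:FORWARD:policy:1 IN=enp0s3 OUT=docker0 SRC=192.168.1.123 DST=10.0.0.100 PROTO=TCP SPT=41520 DPT=80'
# prints: IN=enp0s3 OUT=docker0 DST=10.0.0.100
printf 'IN=%s OUT=%s DST=%s\n' "$(nf_field IN "$fwd")" "$(nf_field OUT "$fwd")" "$(nf_field DST "$fwd")"
```

Both lines of a pair share the same ID and SPT=41520, so they are the same packet; that suggests the DNAT on the host itself is applied and the packet is handed to docker0.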

But I still don't understand where the problem is.
I've also been searching for and testing other suggestions for setups similar to mine, like this one: https://unix.stackexchange.com/question … 468#393468, but still no success.
What I want is to reach the test container's HTTP service, running in a network namespace on a test VM, from the main network.
I don't get why port forwarding in nftables isn't working for me.



#4 2023-11-15 18:29:34

-thc
Member
Registered: 2017-03-15
Posts: 714

Re: [SOLVED] nftables & port forwarding not working

Since I do not like docker and its complicated network architecture, I can only try to help you from the "outside".

8472 wrote:

- this running test container is also publishing port 80:80 to the docker.service, and is perfectly reachable within the running network namespace, either by ping or via curl.
- I can also reach the host's main IP from within the network namespace.
- when I add 'ip route add 172.17.0.0/16 via 10.0.0.100' to the host, I can likewise reach the eth0 using 'ping 10.0.0.100' or 'curl http://10.0.0.100', or the docker container via 'ping 172.17.0.2' or 'curl http://172.17.0.2' from the host himself. Works perfectly.

No surprises here; that's how it should be, since the host initiating those requests itself has addresses in all those address spaces.

8472 wrote:
Nov 15 10:36:01 archlinux kernel: 4:nat:prerouting:dport:dnat - IN=enp0s3 OUT= MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=192.168.1.224 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6960 DF PROTO=TCP SPT=41520 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
Nov 15 10:36:01 archlinux kernel: TRACE: filter:FORWARD:policy:1 IN=enp0s3 OUT=docker0 MAC=08:00:27:7f:bc:2e:f0:2f:76:db:7h:2f:08:00 SRC=192.168.1.123 DST=10.0.0.100 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=6960 DF PROTO=TCP SPT=41520 DPT=80 SEQ=4150330551 ACK=0 WINDOW=64240 RES=0x00 SYN URGP=0 OPT (020405B40402080A2BE7626B0000000001030307)
...

This looks suspiciously like the container either (A) doesn't receive those packets or (B) doesn't know where to send the responses.

A) Can you trace incoming HTTP requests inside the container?

B) Does the IP stack in the container know where to route packets to 192.168.1.x? And if not, is the default gateway, from the container's point of view, the correct interface for the way back through NAT?
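One way to answer question A is to watch the namespace end of the veth pair while a client connects. The sketch below assumes root, installed tcpdump and nft, docker.service running with PrivateNetwork=yes (so its MainPID lives in the namespace) and the namespace interface being eth0, as in the wiki setup; the function names are hypothetical. Trace events from rules that set `meta nftrace set 1` in that namespace can be followed with `nft monitor trace`.

```shell
#!/bin/sh
# Sketch only; assumptions are listed in the text above.

# Hypothetical helper: PID of docker.service, used as the nsenter target.
docker_pid() {
    pid=$(systemctl show --property=MainPID --value docker.service)
    # systemd reports MainPID=0 when the service is not running.
    if [ -z "$pid" ] || [ "$pid" -eq 0 ]; then
        echo "docker.service is not running" >&2
        return 1
    fi
    echo "$pid"
}

# Print packets to or from TCP port 80 on the namespace end of the veth pair.
trace_http_netns() {
    pid=$(docker_pid) || return 1
    nsenter -t "$pid" -n -- tcpdump -n -i eth0 'tcp port 80'
}

# Follow nftables trace events (from 'meta nftrace set 1' rules) in that namespace.
nft_trace_netns() {
    pid=$(docker_pid) || return 1
    nsenter -t "$pid" -n -- nft monitor trace
}
```

Run `trace_http_netns`, then `curl http://192.168.1.224` from another machine. SYNs arriving on eth0 without any SYN-ACK going back would point at the namespace; no packets at all would point at the path before the veth pair.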


#5 2023-11-15 18:39:14

Tarqi
Member
From: Ixtlan
Registered: 2012-11-27
Posts: 179

Re: [SOLVED] nftables & port forwarding not working


Knowing others is wisdom, knowing yourself is enlightenment. ~Lao Tse


#6 2023-11-17 10:01:23

8472
Member
From: Slovakia
Registered: 2010-05-15
Posts: 88

Re: [SOLVED] nftables & port forwarding not working

-thc wrote:

Since I do not like docker and it's complicated network architecture I can only try to help you from the "outside".


This looks suspiciously like the container either (A) doesn't receive those packets or (B) doesn't know where to send the responses.

A) Can you trace incoming HTTP requests inside the container?

B) Does the IP stack in the container know where to route packets to 192.168.1.x? And if not - is the default gateway from the containers point of view the correct interface for the way back through NAT?

a) Incoming requests:
    - No, if the request comes from the main host: then there is no activity in the container log whatsoever.
    - Yes, if I make such a request from within the network namespace:

[root@archlinux ~]#  nsenter -t $(systemctl status docker.service | grep 'Main PID' | awk '{print $3}') -n -- curl -k http://localhost
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>


[root@archlinux ~]#   docker container logs --tail 5 my-nginx-1
172.17.0.1 - - [17/Nov/2023:08:28:33 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/8.4.0" "-"
172.17.0.1 - - [17/Nov/2023:08:29:35 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/8.4.0" "-"
172.17.0.1 - - [17/Nov/2023:08:30:05 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/8.4.0" "-"
172.17.0.1 - - [17/Nov/2023:08:32:37 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/8.4.0" "-"
172.17.0.1 - - [17/Nov/2023:08:32:55 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/8.4.0" "-"

      If you meant tracing it at the nftables level inside the network namespace: I can add some nftables rules of my own there (like 'meta nftrace set 1'), but they trace nothing in the network namespace's log. No idea why; perhaps because there are other rules managed by docker. hmm

b)
      - "Does the IP stack in the container know where to route packets to 192.168.1.x?" - The network namespace is aware of the main host's interface/IP and can ping it, but not the other machines on the main network.
      - "And if not - is the default gateway from the containers point of view the correct interface for the way back through NAT?" - I'm not sure whether it's correct or not (but I hope so):

[root@archlinux ~]#  nsenter -t $(systemctl status docker.service | grep 'Main PID' | awk '{print $3}') -n -- ip r
default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1

      - However, shouldn't this be taken care of by the "nat:postrouting:masquerade" rule in the host's main nftables? At least, my understanding of NAT/masquerading is that nftables keeps its own internal record (table) so it knows how to route such requests back.
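The record the masquerade and DNAT rules rely on is the kernel's connection-tracking table, and it can be inspected on the host. This is a sketch assuming root and the conntrack-tools package (which provides the `conntrack` command); the function names are hypothetical. An entry flagged [UNREPLIED] means the translated SYN was sent on but no answer ever came back through the host, so the NAT mapping exists but the other side is not replying.

```shell
#!/bin/sh
# Sketch: inspect host conntrack entries for TCP port 80 (needs root + conntrack-tools).

# All tracked port-80 connections, with original and reply tuples.
show_http_conntrack() {
    conntrack -L -p tcp --dport 80
}

# Only entries that never saw a reply; conntrack prints its summary on stderr.
show_unreplied_http() {
    conntrack -L -p tcp --dport 80 2>/dev/null | grep UNREPLIED
}
```

With the catch-all masquerade rule, packets leaving via docker0 should carry the source 10.0.0.1, which the namespace reaches over its directly connected 10.0.0.0/24 route, so the return path itself does not depend on the default gateway.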


I don't think so, because this is not that kind of localhost redirect.
It is all on localhost, but the destination is hidden in another network namespace, which, at least in my eyes, makes it like another host on the network.
Also, the destination port is available only in this network namespace, not on the main host itself.
At least, that is my understanding of nftables redirect.
Therefore, I'm not sure whether something like this is possible with nftables redirect; at least I haven't found any such examples.
But feel free to correct me if I'm wrong.



#7 2023-11-17 11:13:00

-thc
Member
Registered: 2017-03-15
Posts: 714

Re: [SOLVED] nftables & port forwarding not working

8472 wrote:

a) Incomming requests:
    - No if it's comming from the main host, then there is no activity in the container log whatsoever
    - Yes if I make such request from within the network namespace:

The nftables ruleset correctly forwards the packets to 10.0.0.100 on interface "docker0" but they never reach their destination.
That means inside the namespace forwarding from 10.0.0.1 to 10.0.0.100 isn't working.
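net.ipv4.ip_forward is a per-network-namespace setting, so the value reported on the host says nothing about the namespace docker.service runs in, and vice versa. A sketch for printing both, assuming root and the PrivateNetwork=yes setup from this thread; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print net.ipv4.ip_forward for the host and for docker.service's namespace.
compare_ip_forward() {
    pid=$(systemctl show --property=MainPID --value docker.service)
    if [ -z "$pid" ] || [ "$pid" -eq 0 ]; then
        echo "docker.service is not running" >&2
        return 1
    fi
    # /proc/sys/net/* reflects the network namespace of the reading process,
    # so the same path gives the namespace's value once nsenter has switched.
    echo "host:  $(cat /proc/sys/net/ipv4/ip_forward)"
    echo "netns: $(nsenter -t "$pid" -n -- cat /proc/sys/net/ipv4/ip_forward)"
}
```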


#8 2023-11-19 13:19:29

8472
Member
From: Slovakia
Registered: 2010-05-15
Posts: 88

Re: [SOLVED] nftables & port forwarding not working

Thank you for all the hints.
I finally found out that, for some unknown reason, "net.ipv4.ip_forward=1" was apparently not applied correctly, and that was the whole issue. hmm

Here is the whole solution as a very simple example, in case anybody is interested:

pacman -S nftables docker
echo "net.ipv4.ip_forward=1" > /etc/sysctl.d/30-ipforward.conf

Restart the OS.
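A file in /etc/sysctl.d/ only changes the live value once it is loaded: at boot, or when `sysctl --system` is run as root. The hypothetical helpers below sketch a way to check whether the persisted setting and the live value agree; the file and live paths are parameters, and by default the live path is derived from the key under /proc/sys.

```shell
#!/bin/sh
# Hypothetical helpers: compare a sysctl.d setting with the live kernel value.

# Print the value assigned to KEY in FILE (last assignment wins).
# Skips comment lines and tolerates spaces or tabs around '='.
conf_value() {
    awk -F= -v k="$1" '
        /^[ \t]*[#;]/ { next }
        {
            name = $1
            gsub(/[ \t]/, "", name)
            if (name == k) { val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val) }
        }
        END { if (val != "") print val }' "$2"
}

# Usage: check_applied KEY FILE [LIVE_PATH]
# LIVE_PATH defaults to /proc/sys/ with the dots in KEY turned into slashes.
check_applied() {
    want=$(conf_value "$1" "$2")
    live=${3:-/proc/sys/$(printf '%s' "$1" | tr . /)}
    have=$(cat "$live")
    if [ "$want" = "$have" ]; then
        echo "applied"
    else
        echo "not applied (run: sysctl --system)"
    fi
}
```

For this setup: `check_applied net.ipv4.ip_forward /etc/sysctl.d/30-ipforward.conf`.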

The /etc/nftables.conf content (the most important parts being "tcp dport 80 accept" and the whole "table nat"):

#!/usr/bin/nft -f

flush ruleset

table inet filter {
  chain input {
    type filter hook input priority 0;

    # Connection state based:
    ct state {established, related} accept
    ct state invalid drop

    # Allow loopback and ICMP:
    iifname lo accept
    ip protocol icmp accept
    ip6 nexthdr icmpv6 accept

    # Allow local traffic to port 22 and 80
    tcp dport {22, 80} accept

    # Reject everything else
    reject with icmp type port-unreachable
  }
}

table nat {
  chain prerouting {
    type nat hook prerouting priority dstnat;
    tcp dport 80 dnat to 10.0.0.100
    # replacing 'tcp dport 80 dnat to 10.0.0.100' with 'tcp dport 80 dnat to 172.17.0.2' forwards directly to the container instead of to the network namespace's eth0
    # this requires 'ExecStartPre=nsenter -t 1 -n -- ip route add 172.17.0.0/16 via 10.0.0.100' in netns.conf, so that the host has a route to the container network
  }
  chain postrouting {
    type nat hook postrouting priority srcnat;
    masquerade
  }
}
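Before restarting nftables.service, the file can be checked without touching the live ruleset: `nft -c` (`--check`) validates the commands but does not apply them. A small sketch with hypothetical function names; running it as root is assumed, since validation may still need access to the kernel.

```shell
#!/bin/sh
# Sketch: validate an nftables file without applying it.
check_nft_conf() {
    conf=${1:-/etc/nftables.conf}
    if nft -c -f "$conf"; then
        echo "$conf: OK"
    else
        echo "$conf: check failed" >&2
        return 1
    fi
}

# After 'systemctl restart nftables', show the NAT table actually loaded.
# 'table nat' without a family in the file defaults to the 'ip' family.
show_nat_table() {
    nft list table ip nat
}
```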

The network namespace:

mkdir -p /etc/systemd/system/docker.service.d/

The /etc/systemd/system/docker.service.d/netns.conf content:

[Service]
PrivateNetwork=yes
PrivateMounts=No

# cleanup
ExecStartPre=-nsenter -t 1 -n -- ip link delete docker0

# add veth
ExecStartPre=nsenter -t 1 -n -- ip link add docker0 type veth peer name docker0_ns
ExecStartPre=sh -c 'nsenter -t 1 -n -- ip link set docker0_ns netns "$$BASHPID" && true'
ExecStartPre=ip link set docker0_ns name eth0

# bring host online
ExecStartPre=nsenter -t 1 -n -- ip addr add 10.0.0.1/24 dev docker0
ExecStartPre=nsenter -t 1 -n -- ip link set docker0 up

# bring ns online
ExecStartPre=ip addr add 10.0.0.100/24 dev eth0
ExecStartPre=ip link set eth0 up
ExecStartPre=ip route add default via 10.0.0.1 dev eth0

# route to the docker container's bridge network - optional
## by enabling 'ExecStartPre=nsenter -t 1 -n -- ip route add 172.17.0.0/16 via 10.0.0.100', one can reach the container network directly from the host
## can also be combined with the 'tcp dport 80 dnat to 172.17.0.2', to port forward directly to the container, instead of the global network namespace interface
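Once docker.service has started with this drop-in, both ends of the veth pair can be checked. The sketch assumes root and the names and addresses used above (docker0 with 10.0.0.1/24 on the host, eth0 with 10.0.0.100/24 in the namespace); the function name is hypothetical. If the drop-in is edited later, `systemctl daemon-reload` is needed before restarting docker.service.

```shell
#!/bin/sh
# Sketch: show both ends of the veth pair created by netns.conf.
check_veth() {
    pid=$(systemctl show --property=MainPID --value docker.service)
    if [ -z "$pid" ] || [ "$pid" -eq 0 ]; then
        echo "docker.service is not running" >&2
        return 1
    fi
    echo "host side:"
    ip -br addr show dev docker0
    echo "namespace side:"
    nsenter -t "$pid" -n -- ip -br addr show dev eth0
    # The host should reach the namespace end directly over the veth link.
    ping -c 1 -W 1 10.0.0.100
}
```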

Start the services:

#usermod -aG docker YOURUSERNAME;  # only required in case you're running the docker commands as non-root user
systemctl enable nftables --now;
systemctl enable docker --now;

Create a test container:

docker create --name my-nginx-1 -p80:80 --restart=unless-stopped nginx;
docker container start my-nginx-1;
docker ps -a;

From this point on, the docker container running inside the network namespace should be reachable from another host:

$  curl http://192.168.1.224
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Also, this is the whole routing table inside the network namespace; nothing else is required:

# nsenter -t $(systemctl status docker.service | grep 'Main PID' | awk '{print $3}') -n -- ip r
default via 10.0.0.1 dev eth0 
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1

Last edited by 8472 (2023-11-19 14:21:53)

