#1 2018-01-03 15:52:42

nmiculinic
Member
Registered: 2015-12-25
Posts: 63

How to build docker-like network isolation from scratch?

I'm learning about network namespaces, virtual ethernet interfaces, iptables, routing tables, etc. I have a basic understanding of them from reading the Arch wiki page on iptables and watching a few YouTube videos.

What I'm missing is how to put everything together into something like what docker does for its network handling. My Google searches turned up only low-quality, beginner-level blog posts and tutorials, and the YouTube videos are no better. I cannot find an in-depth guide on building something from scratch to understand the core concepts. For example, with docker I see a docker0 interface, and the docs say it's connected to various veth interfaces, but I don't know how to recreate it, debug it, or see exactly what is connected to what.

Something like this: https://www.youtube.com/watch?v=MCs5OvhV9S4, just for understanding how docker and other production services work.

Offline

#2 2018-01-03 16:03:03

nmiculinic
Member
Registered: 2015-12-25
Posts: 63

Re: How to build docker-like network isolation from scratch?

Also is this the right place to post? Should I have gone to StackOverflow, Quora, or something else instead?

Offline

#3 2018-01-03 19:50:47

nesk
Member
Registered: 2011-03-31
Posts: 181

Re: How to build docker-like network isolation from scratch?

I'd start with formulating what specifically you're trying to accomplish. Your post is rather vague.

Offline

#4 2018-01-03 19:59:43

nmiculinic
Member
Registered: 2015-12-25
Posts: 63

Re: How to build docker-like network isolation from scratch?

Ok. I want to build docker-style container network isolation from scratch, including setting up bridge interfaces, forwarding rules, namespaces (as far as I can see docker doesn't use those... why?), iptables rules, etc.

So I want to start a process in my new container, isolated in its own network the same way as in docker, and understand the tooling around it, how I can tweak it, etc.

Basically, a "how to build docker from scratch" guide, as if I were to rebuild docker. In Feynman fashion: "What I cannot create, I do not understand"

Offline

#5 2018-01-03 20:22:54

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,523
Website

Re: How to build docker-like network isolation from scratch?

Docker is open source.  Read the code.


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#6 2018-01-04 01:07:09

hiciu
Member
Registered: 2010-08-11
Posts: 84

Re: How to build docker-like network isolation from scratch?

"What I cannot create, I do not understand"

<3. OK, so, docker networking IMO is a bit of a mess but I can show you how to use network namespaces from scratch.

Let's start with network namespaces:

$ man namespaces
...
   Network namespaces (CLONE_NEWNET)

       Network  namespaces provide isolation of the system resources associated with net‐
       working: network devices, IPv4 and IPv6 protocol stacks, IP routing tables,  fire‐
       walls,  the /proc/net directory, the /sys/class/net directory, port numbers (sock‐
       ets), and so on.  A physical network device can live in exactly one network names‐
       pace.   A  virtual  network (veth(4)) device pair provides a pipe-like abstraction
       that can be used to create tunnels between network namespaces, and can be used  to
       create a bridge to a physical network device in another namespace.

       When  a  network  namespace is freed (i.e., when the last process in the namespace
       terminates), its physical network devices are moved back to  the  initial  network
       namespace (not to the parent of the process).

       Use  of  network  namespaces  requires  a  kernel that is configured with the CON‐
       FIG_NET_NS option.

Basically, it works like this: let's say you have a process. The only way this process can communicate with the outside world is by calling a kernel function (a syscall). The kernel is in a position where it can.. I don't want to write "lie" to your process, but let's say that your process knows only as much as the kernel tells it. This turns out to be a useful feature: the kernel can say different things to different processes running on the same system. For example, if my shell asks "please give me a list of running network interfaces", the kernel will respond with a list such as ["lo", "eth0", "eth1"]; but if my media player asks the same question, the kernel can respond with "there are no running network interfaces".

I am going to assume you know about namespaces in the Linux kernel. OK, so, technical details: if you have a recent enough iproute2 package on the system, your "/usr/bin/ip" should be able to manage network namespaces with the "ip netns" command or with the "ip -n" switch. You can access the docs with "man ip-netns". Try this:

$ ip addr
$ sudo ip netns add vpn
$ sudo ip netns exec vpn bash
# ip addr

You'll notice that after "ip netns exec" you can see only the loopback interface. That's because your process is in a separate network namespace. I think I should point out here that "ip netns" does more than switch namespaces: "ip netns add" registers the namespace in the system under a name (a bind mount under /var/run/netns), and "ip netns exec" also applies per-namespace configuration from /etc/netns/. See the manual for details. This is all done by convention; nothing forces you to do the same if you are using network namespaces directly from your application. Docker is not using the same convention as far as I know.

On the "C" level I believe this comes down to creating the namespace with the CLONE_NEWNET flag (passed to "clone()" or "unshare()") and joining an already existing namespace with "setns()".
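
If you want to poke at those two operations without "ip netns", util-linux ships "unshare" and "nsenter", which are thin wrappers around them. This is only a rough sketch: the "run" helper, the DRY_RUN variable and the function names are made up for this post, and the /var/run/netns path assumes the namespace was created with "ip netns add".

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 the commands are printed instead of executed
# (executing them needs root).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a brand new, unnamed network namespace (CLONE_NEWNET) and run a
# command inside it. The namespace goes away when the command exits.
new_netns_exec() {
    run unshare --net -- "$@"
}

# Join an existing namespace (setns) that "ip netns add NAME" registered
# under /var/run/netns, then run a command inside it.
join_netns_exec() {
    name=$1
    shift
    run nsenter --net="/var/run/netns/$name" -- "$@"
}
```

For example, as root, "new_netns_exec ip addr" should show only a downed loopback, and "join_netns_exec vpn ip addr" should show the same thing as "ip netns exec vpn ip addr" (minus the /etc/netns/ handling).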


At this point you have a new network namespace, but it's useless: the only interface in it is a loopback that is down. In Linux you can create many different kinds of network interfaces; one of them is "veth", a virtual ethernet interface. See "man ip-link" for more info. It works like a pipe: there are two ends, and whatever you send into one end comes out of the other.

I'm using "ip link" to create two new linked interfaces:

$ sudo ip link add vpn0 type veth peer name vpn1
$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
...
9: vpn1@vpn0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ae:04:d8:26:a8:14 brd ff:ff:ff:ff:ff:ff
10: vpn0@vpn1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether aa:c2:16:18:1f:a7 brd ff:ff:ff:ff:ff:ff
$ sudo ip -n vpn link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

I'm moving one of them to the namespace:

$ sudo ip link set vpn1 netns vpn

And that interface disappears from my main namespace:

$ ip link show
...
10: vpn0@if9: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether aa:c2:16:18:1f:a7 brd ff:ff:ff:ff:ff:ff link-netnsid 1
$ sudo ip -n vpn link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
9: vpn1@if10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:04:d8:26:a8:14 brd ff:ff:ff:ff:ff:ff link-netnsid 0

Now all I need to do is set an address on vpn0 in this namespace, set an address on vpn1 in the vpn namespace, and optionally set up routing tables / firewall / IP forwarding in both namespaces. This is easy; do it exactly like you would with another computer.

I'm using a setup like this one with openvpn to make sure that all my traffic is encrypted, with iptables rules inside the namespace to make sure only openvpn traffic is sent to my ISP. And I can have a few different browser windows, each of them sending traffic via a different vpn connection :).
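
To make that step concrete, here is a rough sketch of one way it could look. Everything specific in it is an assumption and not something from my actual setup: the 10.200.1.0/24 subnet, the "eth0" default for the uplink interface, the "run" helper and the DRY_RUN variable. It gives the namespace a default route through the host and lets the host NAT the traffic.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 the commands are printed instead of executed
# (executing them needs root).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumed addressing: host end vpn0 = 10.200.1.1, namespace end vpn1 = 10.200.1.2.
# $1 is the host's uplink interface (hypothetical default: eth0).
setup_veth_addressing() {
    uplink=${1:-eth0}

    # Host side of the veth pair.
    run ip addr add 10.200.1.1/24 dev vpn0
    run ip link set dev vpn0 up

    # Namespace side, plus loopback, plus a default route via the host end.
    run ip -n vpn addr add 10.200.1.2/24 dev vpn1
    run ip -n vpn link set dev vpn1 up
    run ip -n vpn link set dev lo up
    run ip -n vpn route add default via 10.200.1.1

    # Let the host forward packets and masquerade them on the uplink.
    run sysctl -w net.ipv4.ip_forward=1
    run iptables -t nat -A POSTROUTING -s 10.200.1.0/24 -o "$uplink" -j MASQUERADE
}
```

Run it with DRY_RUN=1 first to see the command list, then check the result with "sudo ip netns exec vpn ping 10.200.1.1".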


You've also asked about docker-like bridge - try this:

$ sudo ip link add dev asdf0 type bridge
$ sudo ip link set up dev asdf0
$ echo assuming vpn1, vpn3 and vpn5 are already in separate namespaces
$ sudo ip link set dev vpn0 master asdf0
$ sudo ip link set dev vpn2 master asdf0
$ sudo ip link set dev vpn4 master asdf0

(you should also read about macvlan, maybe it will better fit your use case)
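
The bridge commands above skip the part where the namespaces and veth pairs get created. A rough sketch of the whole thing could look like this. Only the bridge name asdf0 is from the example above; the namespace names (ns1, ns2, ...), the veth names (vethN on the host, cethN inside), the 10.200.2.0/24 subnet, the "run" helper and DRY_RUN are invented for illustration.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 the commands are printed instead of executed
# (executing them needs root).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create bridge asdf0 with address 10.200.2.1 and attach $1 namespaces to it.
# Namespace number i gets the address 10.200.2.(i+1).
setup_bridge() {
    count=$1

    run ip link add dev asdf0 type bridge
    run ip addr add 10.200.2.1/24 dev asdf0
    run ip link set dev asdf0 up

    i=1
    while [ "$i" -le "$count" ]; do
        run ip netns add "ns$i"
        # One veth pair per namespace: vethN stays on the host, cethN moves in.
        run ip link add "veth$i" type veth peer name "ceth$i"
        run ip link set "ceth$i" netns "ns$i"
        # Plug the host end into the bridge.
        run ip link set dev "veth$i" master asdf0
        run ip link set dev "veth$i" up
        # Configure the end that lives inside the namespace.
        run ip -n "ns$i" addr add "10.200.2.$((i + 1))/24" dev "ceth$i"
        run ip -n "ns$i" link set dev "ceth$i" up
        run ip -n "ns$i" route add default via 10.200.2.1
        i=$((i + 1))
    done
}
```

After "setup_bridge 2" the two namespaces should be able to ping each other through the bridge, and "ip link show master asdf0" on the host lists which veth ends are plugged into it.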

Last edited by hiciu (2018-01-04 01:24:14)

Offline

#7 2018-01-05 14:03:45

nmiculinic
Member
Registered: 2015-12-25
Posts: 63

Re: How to build docker-like network isolation from scratch?

Thank you, this seems nice & useful!

Does docker use network namespaces? As far as I can tell it doesn't, and relies on iptables rules instead (I'm looking at the -t nat and -t filter tables; the others (raw, mangle, security) are untouched by docker).

I'm writing a seminar essay for a university assignment and I've picked this topic. When I finish in a week or two I'll post the link here.

Offline

#8 2018-01-09 22:02:00

hiciu
Member
Registered: 2010-08-11
Posts: 84

Re: How to build docker-like network isolation from scratch?

Yes, yes it does, but namespaces created by docker are not registered in the system the same way "ip netns" registers them.

Try this:

$ readlink /proc/self/ns/net
net:[4026531993]
$ sudo docker run -it --rm busybox readlink /proc/self/ns/net
net:[4026532500]

And see "man namespaces" under "The /proc/[pid]/ns/ directory" section.
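
Those link targets are easy to compare from a script. A small sketch, with helper names I made up for this post; it only reads /proc, so it needs no root for your own processes (reading another user's /proc/PID/ns usually does).

```shell
#!/bin/sh
# Sketch only: helpers for comparing network namespaces via /proc.

# Extract the inode number from a link target such as "net:[4026531993]".
# Prints nothing if the string does not look like a net namespace link.
netns_inode() {
    printf '%s\n' "$1" | sed -n 's/^net:\[\([0-9][0-9]*\)]$/\1/p'
}

# Print the network namespace inode of the given PID.
netns_of_pid() {
    netns_inode "$(readlink "/proc/$1/ns/net")"
}

# Succeed only if both PIDs are readable and share a network namespace.
same_netns() {
    a=$(netns_of_pid "$1")
    b=$(netns_of_pid "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

To look inside a container's namespace from the host, get its PID with "docker inspect -f '{{.State.Pid}}' CONTAINER" and then run "sudo nsenter -t PID -n ip addr". That shows the container end of the veth pair, the same way "ip -n vpn link show" did earlier.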

Last edited by hiciu (2018-01-09 22:02:29)

Offline
