Network connects on arch VM, fails in Domain Name Resolution on host

zZzUP3RzZz · 2024-11-09 22:42:34

Hi! I have a perplexing error. I have a box with Arch on it (called BASE for ease of reading), and that has an Arch VM on it (I'll call it VM).

When I run `ping -c 3 www.google.com` on BASE, it will either hang indefinitely, or hang for a really long time and respond correctly (meaning, like this: https://wiki.archlinux.org/title/Networ … tion#Ping).
It does this regardless of whether or not the VM is running. When I run the same command on the VM, it responds quickly and correctly (as defined above).

I don't know where to begin troubleshooting this. I've retried on BASE with nginx disabled; I also use resilio sync, jellyfin, tailscale, freshrss and wallabag, and tried disabling those too.
I am trying to install nethogs to get a better look but... no connection. Strangely, all of the applications listed work!

I strongly believe this is a failure in domain name resolution -- pinging 8.8.8.8 instead of www.google.com responds quickly and correctly...
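
One way to separate the two cases is to time the name lookup on its own, apart from the ping. What follows is only a sketch, assuming a bash-like shell with coreutils `timeout`; the helper name `timed` and the `LIMIT` variable are made up for illustration.

```shell
# Run a command with a time limit and report exit status and duration.
# `timed` and LIMIT are illustrative names, not part of any existing tool.
LIMIT=${LIMIT:-10}

timed() {
    start=$(date +%s)
    timeout "$LIMIT" "$@" >/dev/null 2>&1
    rc=$?
    end=$(date +%s)
    echo "rc=$rc secs=$((end - start)) cmd=$*"
}

# Intended use on BASE (commented out so nothing runs by accident):
#   timed ping -c 3 8.8.8.8                # raw connectivity, no DNS involved
#   timed getent hosts www.google.com      # name lookup only, via nsswitch
#   timed resolvectl query www.google.com  # name lookup via systemd-resolved
```

With GNU `timeout`, `rc=124` means the time limit was hit. If the first command returns quickly and the other two are slow or time out, that would support the DNS suspicion.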

PS: The internet connection just worked long enough to download nethogs. I didn't really find anything. BUT I did realize in addition to the apps listed, I'm connected to the BASE os over SSH (on LAN)?

I'm trying to be thorough, but I probably overlooked something obvious.

/etc/systemd/network/25-br0-en.network:

[Match]
Name=en*

[Network]
Bridge=br0

/etc/systemd/network/25-br0.network:

[Match]
Name=br0

[Network]
DNS=192.168.40.1
Address=192.168.40.xx/24
Gateway=192.168.40.1

/etc/systemd/network/25-br0.netdev:

[NetDev]
Name=br0
Kind=bridge

/etc/systemd/network/90-old-wired.network:

[Match]
Name=eno1

[Network]
DHCP=yes

resolvectl status on BASE:

Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: stub
Fallback DNS Servers: 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net 8.8.8.8#dns.google 2606:4700:4700::1111#cloudflare-dns.com
                      2620:fe::9#dns.quad9.net 2001:4860:4860::8888#dns.google

Link 2 (br0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.40.1
       DNS Servers: 192.168.40.1

Link 3 (wlp2s0)
    Current Scopes: none
         Protocols: -DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 4 (eno1)
    Current Scopes: none
         Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 9 (tailscale0)
    Current Scopes: none
         Protocols: +DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 10 (vnet0)
    Current Scopes: LLMNR/IPv6 mDNS/IPv6
         Protocols: -DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported

resolvectl status on VM:

Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: foreign
Fallback DNS Servers: 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net 8.8.8.8#dns.google 2606:4700:4700::1111#cloudflare-dns.com
                      2620:fe::9#dns.quad9.net 2001:4860:4860::8888#dns.google
          DNS Domain: ~.

Link 2 (enp1s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.40.1
       DNS Servers: 192.168.40.1

Thanks!

Last edited by zZzUP3RzZz (2024-11-09 23:24:46)

cryptearth · 2024-11-09 23:11:26

please post console output as text in code-tags instead of screenshots

zZzUP3RzZz · 2024-11-09 23:25:08

Fixed!

EDIT: The image vs code block thing, not the issue

Last edited by zZzUP3RzZz (2024-11-10 02:31:18)

-thc · 2024-11-10 08:00:30

Just a few things I noticed:

Why is "/etc/systemd/network/90-old-wired.network" still around? Do you know that adding an adapter to a bridge makes it a slave to that bridge and it loses any meaningful OSI level 3 functionality? Take a look at the output of "ip a" on the host.

You somehow tinkered with "/etc/resolv.conf" inside the VM but not on the host ("resolv.conf mode: foreign"). Have you compared "/etc/resolv.conf" on the host and the VM?

zZzUP3RzZz · 2024-11-10 20:29:51

-thc wrote:

Just a few things I noticed:
Why is "/etc/systemd/network/90-old-wired.network" still around? Do you know that adding an adapter to a bridge makes it a slave to that bridge and it loses any meaningful OSI level 3 functionality? Take a look at the output of "ip a" on the host.
You somehow tinkered with "/etc/resolv.conf" inside the VM but not on the host ("resolv.conf mode: foreign"). Have you compared "/etc/resolv.conf" on the host and the VM?

I thought keeping 90-old-wired.network was necessary. I have deleted it, but the issue still persists.

Here is output of ip a:

on host:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:31:f2:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.xx.xx/xx brd 192.168.xx.255 scope global br0
       valid_lft forever preferred_lft forever
    inet6 fe80::31:f2ff:xxxx:xxxx/xx scope link proto kernel_ll 
       valid_lft forever preferred_lft forever
3: wlp2s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0c:7a:15:xx:xx:xx brd ff:ff:ff:ff:ff:ff
4: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
    link/ether a8:a1:59:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
9: tailscale0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none 
    inet6 fe80::43bf:84d5:xxxx:xxxx/xx scope link stable-privacy proto kernel_ll 
       valid_lft forever preferred_lft forever
10: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:xxxx:xxxx/xx scope link proto kernel_ll 
       valid_lft forever preferred_lft forever

on VM:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.xx.xx/xx brd 192.168.xx.255 scope global enp1s0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:xxxx:xxxx/xx scope link proto kernel_ll 
       valid_lft forever preferred_lft forever

I think the reason it looks that way on the VM vs the host is that when I set up the VM in virt-manager I had already configured the bridge and entered it during setup.
I have, however, compared the resolv.conf files and they are exactly the same:

nameserver 127.0.0.53
options edns0 trust-ad
search .

-thc · 2024-11-10 20:52:38

More tests:

Is "/etc/resolv.conf" on both instances a link:

lrwxrwxrwx 1 root root 37 Oct 30  2023 /etc/resolv.conf -> /run/systemd/resolve/stub-resolv.conf

Is resolved on both instances listening on the ports:

[thc@box ~]$ sudo ss -l -p -u -n

State        Recv-Q       Send-Q                                    Local Address:Port                Peer Address:Port       Process                                          
UNCONN       0            0                                            127.0.0.54:53                       0.0.0.0:*           users:(("systemd-resolve",pid=xxx,fd=yy))       
UNCONN       0            0                                         127.0.0.53%lo:53                       0.0.0.0:*           users:(("systemd-resolve",pid=xxx,fd=yy))

What happens if you try

resolvectl query www.google.com

on the host?
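
For the first test, a small helper can classify the file instead of reading `ls -l` by eye. This is a sketch, assuming the usual systemd-resolved paths; the function name `resolv_conf_kind` is made up here, and its labels are meant to mirror the `resolv.conf mode` values that `resolvectl status` prints.

```shell
# Classify a resolv.conf path by where it points.
# `resolv_conf_kind` is an illustrative name, not an existing tool.
resolv_conf_kind() {
    f=${1:-/etc/resolv.conf}
    if [ ! -e "$f" ] && [ ! -L "$f" ]; then
        echo missing
    elif [ -L "$f" ]; then
        # Match on the link target; both absolute and ../relative targets fit.
        case "$(readlink "$f")" in
            */run/systemd/resolve/stub-resolv.conf) echo stub ;;
            */run/systemd/resolve/resolv.conf)      echo uplink ;;
            */usr/lib/systemd/resolv.conf)          echo static ;;
            *)                                      echo foreign ;;
        esac
    else
        echo foreign   # a regular file, written by something other than resolved
    fi
}
```

Run `resolv_conf_kind` with no argument on both instances. Going by the `resolv.conf mode` lines in the first post, it would be expected to print `stub` on the host and `foreign` on the VM.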

zZzUP3RzZz · 2024-11-10 21:05:19

-thc wrote:

More tests:

Is "/etc/resolv.conf" on both instances a link:

lrwxrwxrwx 1 root root 37 Oct 30  2023 /etc/resolv.conf -> /run/systemd/resolve/stub-resolv.conf

Is resolved on both instances listening on the ports:

[thc@box ~]$ sudo ss -l -p -u -n

State        Recv-Q       Send-Q                                    Local Address:Port                Peer Address:Port       Process                                          
UNCONN       0            0                                            127.0.0.54:53                       0.0.0.0:*           users:(("systemd-resolve",pid=xxx,fd=yy))       
UNCONN       0            0                                         127.0.0.53%lo:53                       0.0.0.0:*           users:(("systemd-resolve",pid=xxx,fd=yy))

What happens if you try

resolvectl query www.google.com

on the host?

Hm! Resolv.conf on host is a link but on VM it is not!
host:

lrwxrwxrwx 1 root root 39 Sep 22 23:03 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

vm:

-rw-r--r-- 1 root root 920 Oct 22 17:22 /etc/resolv.conf

The results of ss are not as interesting:

UNCONN 0      0                                 127.0.0.54:53         0.0.0.0:*     users:(("systemd-resolve",pid=478,fd=22)) 
UNCONN 0      0                              127.0.0.53%lo:53         0.0.0.0:*     users:(("systemd-resolve",pid=478,fd=20))

as well as

UNCONN   0        0                       0.0.0.0:5353            0.0.0.0:*       users:(("systemd-resolve",pid=432,fd=15))   
UNCONN   0        0                       0.0.0.0:5355            0.0.0.0:*       users:(("systemd-resolve",pid=432,fd=11))

and

UNCONN 0      0                                       [::]:5353          [::]:*     users:(("systemd-resolve",pid=478,fd=16)) 
UNCONN 0      0                                       [::]:5355          [::]:*     users:(("systemd-resolve",pid=478,fd=13))

appear on both outputs.

resolvectl query www.google.com hangs the same as ping

Thanks!

-thc · 2024-11-10 21:18:06

Hmmm.

Can you ping 192.168.40.1 from your host?

Is there a firewall involved?

zZzUP3RzZz · 2024-11-10 21:26:03

both machines can ping 192.168.40.1 fine, and I'm almost definitely sure there is not a firewall in the mix.

A piece of the puzzle that may be important (I mentioned this in the OP but it bears rementioning) is that the host machine will (not always) respond to `ping -c 3 www.google.com` correctly, just after a very long time. But it does respond! Sometimes.

-thc · 2024-11-10 21:34:47

What happens if you dissolve the bridge (temporarily remove all bridge-"conf"s and replace it with your old wired.conf)?

zZzUP3RzZz · 2024-11-10 21:48:55

Ok, so that fixes the issue on the host machine!

But now, how do I set the bridge up for the VM?
(I originally set it up by following this)

Last edited by zZzUP3RzZz (2024-11-11 03:28:37)

-thc · 2024-11-11 07:57:54

Does the problem reappear if you just set up "half a bridge" (bridge and en*-slave only - no vnet* attached or even active)?

Last edited by -thc (2024-11-11 07:58:03)

zZzUP3RzZz · 2024-11-11 17:25:33

-thc wrote:

Does the problem reappear if you just set up "half a bridge" (bridge and en*-slave only - no vnet* attached or even active)?

I don't know how to do that...

Would that be deleting the 25-br0.network file?

-thc · 2024-11-11 17:43:21

No - just reinstate all three bridge conf-files and control via

brctl show

(as root) that the en*-Adapter is the only bridge member.
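
If brctl is not installed, iproute2 can show the same membership. A minimal sketch, assuming the bridge is named br0 as in the configs earlier in the thread; `bridge_members` is a made-up helper name.

```shell
# List the interfaces enslaved to a bridge, one name per line.
# Reads the output of `ip -o link show master <bridge>` on stdin.
# `bridge_members` is an illustrative name, not an existing tool.
bridge_members() {
    # Field 2 is the interface name; strip any "@peer" suffix.
    awk -F': ' '{ sub(/@.*/, "", $2); print $2 }'
}

# Intended use on the host (commented out so nothing runs by accident):
#   ip -o link show master br0 | bridge_members
#   bridge link show     # iproute2 view of all bridge ports
```

For the "half a bridge" test, the list should contain only the en* adapter and no vnet* entry.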

zZzUP3RzZz · 2024-11-11 17:47:12

-thc wrote:

No - just reinstate all three bridge conf-files and control via
brctl show
(as root) that the en*-Adapter is the only bridge member.

I still don't know what you mean. Reinstate? Also, there is no brctl because it is deprecated and replaced by 'bridge link'

Last edited by zZzUP3RzZz (2024-11-11 17:48:07)

cryptearth · 2024-11-11 18:18:40

zZzUP3RzZz wrote:

Also, there is no brctl because it is deprecated and replaced by 'bridge link'

doesn't look like it to me
https://man.archlinux.org/man/brctl.8.en
https://archlinux.org/packages/extra/x8 … dge-utils/ - which is a dependency for docker - hence I doubt it's deprecated
anyway - what thc is requesting: does the issue on the host also happen with NO vm running (i.e. no vnet0 connected to br0)?

zZzUP3RzZz · 2024-11-11 18:27:52

Huh. this was my source for it being deprecated. I'll install it.

Yes, the problem still exists with no VM running. I thought I mentioned that in the OP but I did not.

progandy · 2024-11-11 19:36:48

It is marked deprecated upstream, but it should still work. https://wiki.linuxfoundation.org/networking/bridge

The wiki page has documentation for using iproute2 as well as brctl: https://wiki.archlinux.org/title/Network_bridge

-thc · 2024-11-11 19:57:47

I've set up a simple bridge like yours in my Arch VM with active systemd-resolved and everything works:

# 10-br0.netdev
[NetDev]
Name=br0
Kind=bridge

# 10-tapvm.netdev
[NetDev]
Name=tapvm
Kind=tap

# 15-ens33.network
[Match]
Name=ens33

[Network]
Bridge=br0

# 15-tapvm.network
[Match]
Name=tapvm

[Network]
Bridge=br0

# 20-br0.network
[Match]
Name=br0

[Network]
DHCP=yes

[DHCPv4]
UseDNS=yes

I have no idea why your setup behaves differently.

zZzUP3RzZz · 2024-11-11 21:50:08

-thc wrote:

I have no idea why your setup behaves differently.

Ah. Thank you for your help anyway

Piece of the puzzle (should have tried this sooner)... replacing 25-br0.network with:

[Match]
Name=br0

[Network]
DHCP=yes

fixes the issue -- so it must have something to do with the static IP address. Which doesn't make sense, because the network on the VM is set up exactly the same (the only difference is a number in the IP address).
Again, I followed the directions in here. I just went back through a second time as well.

EDIT: Maybe the VM is ignoring the static IP address file and running DHCP anyway. This would explain why the VM works and the host doesn't -- both static IP address setups are wrong? How do you check if DHCP is running? But I used dhcping on the VM and it replied no answer.
EDIT EDIT: That's not it -- I changed the static IP address field on the VM and it changed to the new IP correctly.

Last edited by zZzUP3RzZz (2024-11-11 23:46:59)

cryptearth · 2024-11-12 07:16:14

well, maybe someone else can enlighten us both as I, too, have difficulty really understanding how a bridge works:
as I also play around with VMs but also want to take advantage of the PXE setup I have in place, using a bridge instead of nat is far easier for me (I don't know if qemu even supports pxe like virtualbox does)
I have my bridge configured to just use dhcp instead of a static ip and also do not have vlans in place (which is a good idea if one uses vms for something like hosting or cameras or other stuff that should mix with the rest of the lan)
from how I understand it: a network bridge is a somewhat virtual switch to which the host system as well as vms connect to share one physical network interface
in order for this virtual switch to work it requires an upstream link, which is done by enslaving the physical interface so anything that connects to it (the vms with their vnetX) can get a connection to the physical lan
now what I fail to understand is this: as the host also requires a connection, the bridge itself somewhat becomes its new main network interface and hence now it requires an IP
to me this somewhat contradicts the analogy that a bridge is merely a more or less dumb switch - which, although it requires handling of MAC address tables, usually doesn't get its own IP unless it's a smart managed switch - and the host itself should also get something like a vnet virtual interface
I'm not quite sure how it's supposed to work but it should work with assigning a static ip the same as with dhcp - or are bridges supposed to work with dhcp only?

-thc · 2024-11-12 11:17:38

It's not that complicated.

A bridge connects two separate network segments on OSI level 2 via a MAC address table (its forwarding database) and doesn't care about IP addresses (like an OSI level 2 switch).

A bridge is represented as an interface (e.g. br0).

The bridge works independently from the OSI level 3 (IP) status of the interface. Without an IP address it's a "headless" bridge.

On a host with only one physical interface you probably need to assign IPv4/IPv6 addresses to the bridge interface and it will work like a physical interface on OSI level 3. Additionally the bridge still works "below" that on OSI level 2 (if you have more than one physical adapter you may choose bridges, headless bridges or unbridged interfaces).
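
The layering shows in how such a bridge would be built by hand with iproute2: the link steps are pure level 2, and only the last step gives the host a level 3 address. The sketch below merely prints the commands and changes nothing on the system; `bridge_cmds` and its arguments are made up for illustration, and the address has to be supplied by the caller.

```shell
# Print (not run) the iproute2 commands for a bridge with one physical port.
# `bridge_cmds` is an illustrative name; usage: bridge_cmds BRIDGE PORT [ADDR/PREFIX]
bridge_cmds() {
    br=$1
    port=$2
    addr=$3
    echo "ip link add name $br type bridge"   # level 2: create the bridge
    echo "ip link set dev $port master $br"   # level 2: enslave the physical NIC
    echo "ip link set dev $port up"
    echo "ip link set dev $br up"
    # Level 3 is optional: without an address the bridge stays headless.
    if [ -n "$addr" ]; then
        echo "ip address add $addr dev $br"
    fi
}
```

Calling it without the third argument prints the headless variant, which still forwards frames between its ports.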

Last edited by -thc (2024-11-13 06:07:22)

zZzUP3RzZz · 2024-11-12 20:10:48

Do you think I need to assign IP addresses to the bridge (to make it act like a physical interface on OSI level 3)? There is more than one interface -- but only one is connected to the internet

I know my issue is connected to Static IP vs DHCP, but the IP address config is the same on VM and host, and the VM works. Beyond that, if I use ip a on host, the br0 interface shows as up, with the same IP that I specified in the config. (If I change the IP address in the config, it shows the change in ip a, but the problem persists.)

EDIT: Replacing the DNS line with DNS=1.1.1.1 8.8.8.8 does work! I had a feeling it might, but I still feel like it's just working around the problem (esp. because the same config without the change works on the VM)
So I don't think this is solved yet.

Last edited by zZzUP3RzZz (2024-11-12 20:39:41)

-thc · 2024-11-13 06:15:25

zZzUP3RzZz wrote:

Do you think I need to assign IP addresses to the bridge (to make it act like a physical interface on OSI level 3)? There is more than one interface -- but only one is connected to the internet

It should not matter. Regardless of how the bridge interface acquires its IP configuration, it should work either way.

zZzUP3RzZz wrote:

I know my issue is connected to Static IP vs DHCP, but the IP address config is the same on VM and host, and the VM works.

Not exactly the same?

zZzUP3RzZz wrote:

EDIT: Replacing the DNS line with DNS=1.1.1.1 8.8.8.8 does work! I had a feeling it might, but I still feel like it's just working around the problem (esp. because the same config without the change works on the VM)
So I don't think this is solved yet.

This sounds like your router (192.168.40.1) has a (DNS) problem with two different MACs/IP addresses on a single physical link.
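
One way to test that guess is to ask each server directly, bypassing systemd-resolved. A sketch, assuming `dig` is installed; `probe_dns` is a made-up helper and the 2-second limit is arbitrary.

```shell
# Ask each listed DNS server directly for a name; print "ok" or "FAIL" per server.
# `probe_dns` is an illustrative name; usage: probe_dns NAME SERVER...
probe_dns() {
    name=$1
    shift
    for server in "$@"; do
        # One attempt, 2 s timeout; +short prints only the answer records.
        if answer=$(dig +short +time=2 +tries=1 "@$server" "$name") && [ -n "$answer" ]; then
            echo "$server ok"
        else
            echo "$server FAIL"
        fi
    done
}

# Intended use on the host and on the VM (commented out, needs network):
#   probe_dns www.google.com 192.168.40.1 1.1.1.1 8.8.8.8
```

If 192.168.40.1 fails or stalls from the host but answers from the VM, that would point at the router; if every server answers directly and only lookups through 127.0.0.53 hang, the problem is more likely on the systemd-resolved side.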

zZzUP3RzZz · 2024-11-19 01:13:37

Sorry, I just now saw this.

The only difference between the two config files is the IP set, other than that it is the same.

This sounds like your router (192.168.40.1) has a (DNS) problem with two different MACs/IP addresses on a single physical link.

That sounds like the likely problem (considering I have no idea what else it would be). However, if that were the case I think the host connection would function when the vm is down. Also, I don't know how to troubleshoot/fix that. The DNS (1.1.1.1) setting is a hack but at least it lets me use the internet, making this issue less urgent; I would like to get it "right" though
