
#1 2022-11-24 21:46:13

BlackMastermind
Member
Registered: 2017-01-17
Posts: 45

virtualbox 7.x causes adapter reset on host machine

This isn't really a request for support; I just want to write down my findings about an issue I ran into, in case someone else is experiencing the same thing.

Situation:

I have an Arch Linux machine running at a remote location. It has an Intel 4770K on a Z87-Pro motherboard with an integrated Intel I217-V NIC. VirtualBox (the package from the Arch repositories) is installed on it and runs several virtual machines. One of these virtual machines acts as an ssh jumphost. I access the jumphost over the public internet to admin machines on the remote network, so I basically do everything through the jumphost and ssh tunnels.

Since the end of October, I noticed that my ssh session would occasionally freeze and then start working again a minute later. At first I attributed this to an internet glitch or whatever, but when it started happening more regularly I looked deeper into it. Since October 26, 2022, each time a session froze, I was getting these errors in the log of the Arch Linux host:

Oct 26 17:37:07 archhost kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Oct 26 17:37:09 archhost kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Oct 26 17:37:11 archhost kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Oct 26 17:37:13 archhost kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Oct 26 17:37:15 archhost kernel: e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
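
For anyone who wants to check whether their own freezes line up with the same messages, here is a rough Python sketch that counts the hang and reset lines per day and interface. It assumes the log lines have the same syslog-style layout as the excerpt above; the function name `summarize` and the regex are just my own illustration, not anything from the driver or systemd.

```python
import re
from collections import Counter

# Message fragments copied from the e1000e lines quoted above.
HANG = "Detected Hardware Unit Hang"
RESET = "Reset adapter unexpectedly"

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> kernel: e1000e <pci> <iface>: <msg>"
LINE_RE = re.compile(
    r"^(?P<date>\w{3}\s+\d+) (?P<time>[\d:]{8}) \S+ kernel: "
    r"e1000e \S+ (?P<iface>\S+): (?P<msg>.*)$"
)


def summarize(lines):
    """Count hang and reset messages per (date, interface)."""
    hangs, resets = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an e1000e kernel line in the assumed layout
        key = (m["date"], m["iface"])
        if HANG in m["msg"]:
            hangs[key] += 1
        elif RESET in m["msg"]:
            resets[key] += 1
    return hangs, resets


# The five lines from the excerpt above, used as sample input.
SAMPLE = """\
Oct 26 17:37:07 archhost kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Oct 26 17:37:09 archhost kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Oct 26 17:37:11 archhost kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Oct 26 17:37:13 archhost kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Oct 26 17:37:15 archhost kernel: e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
"""

hangs, resets = summarize(SAMPLE.splitlines())
for key in sorted(set(hangs) | set(resets)):
    print(key, "hangs:", hangs[key], "resets:", resets[key])
```

To run it against a real log, the lines could be read from a saved copy of the kernel log (for example the output of `journalctl -k` written to a file) and passed to `summarize` in place of the sample.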

I also noticed that it was always occurring when I was doing heavy network I/O through one of the virtual machines. I found out I could reliably trigger it by tunneling a graphically intensive VNC session through the ssh connection, or by simply doing an rsync of some large files.

When I was on location at the remote machine, and so not working through ssh tunnels, I could never trigger the issue. This, and the fact that I've never had an issue with this NIC in 8 years, made me skeptical about it being a hardware or driver issue.
Googling the error messages, I found several older posts about using ethtool to turn off various features of the NIC, but all that accomplished was to nuke the performance; it didn't solve the issue.

Rolling back the kernel and firmware to previous releases, from before I started having the issue, didn't help initially, until I got to the point where I was forced to roll back virtualbox 7.0.x to a 6.1.x release to make the vbox kernel modules play well with the older kernel. Lo and behold ... the issue disappeared. Then I upgraded the kernel and firmware back to the current version and the issue didn't reappear. As soon as I install virtualbox 7.0.x though, it's disconnect time again.

I'm guessing one of the vbox kernel modules in 7.0 is doing something screwy that causes the NIC to lose its connection. So for now I have reverted virtualbox to 6.1.40 and added all the virtualbox packages to IgnorePkg in my pacman.conf.
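
To double-check that the hold actually covers every package you care about, something like the following Python sketch could help. It parses `IgnorePkg` lines from pacman.conf text and tests package names against them (pacman allows shell-style glob patterns there). The package names and the config excerpt below are assumptions for illustration only; substitute whatever virtualbox packages are installed on your system.

```python
from fnmatch import fnmatchcase


def ignored_patterns(conf_text):
    """Collect all patterns from IgnorePkg lines, skipping comments."""
    patterns = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing or full-line comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "IgnorePkg":
            patterns.extend(value.split())
    return patterns


def is_held(package, patterns):
    """True if the package name matches any IgnorePkg pattern."""
    return any(fnmatchcase(package, p) for p in patterns)


# Hypothetical pacman.conf excerpt, not the one from my machine.
EXAMPLE_CONF = """
[options]
#IgnorePkg   = linux
IgnorePkg   = virtualbox virtualbox-host-modules-arch
"""

patterns = ignored_patterns(EXAMPLE_CONF)
# Assumed package names; adjust to what is installed locally.
for pkg in ("virtualbox", "virtualbox-host-modules-arch", "virtualbox-guest-iso"):
    print(pkg, "held" if is_held(pkg, patterns) else "NOT held")
```

With the example excerpt, the third package is not covered, which is the kind of gap this is meant to catch. A glob such as `virtualbox*` on the `IgnorePkg` line would cover all three names.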

My plan now is to migrate my virtual machines away from virtualbox to KVM. I've long held off on this, because virtualbox and KVM can't coexist on the same machine, which makes migration a pain since it has to be done in one shot.
