Hi all,
after upgrading libvirt from 1:9.3.0-1 to 1:9.4.0-1, "sudo virsh start" and virt-manager no longer work. The virsh command I used to use to start a QEMU-based VM just hangs; virt-manager starts but hangs while connecting to "qemu:///system".
I am using QEMU/KVM for my VMs. I tried to follow the installation guide in the wiki to check whether everything is working, but did not find much, as it mostly covers installing the packages. I checked whether the daemon is running; the result is below:
$ systemctl status libvirtd
● libvirtd.service - Virtualization daemon
Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; preset: disabled)
Active: active (running) since Tue 2023-06-06 13:59:26 CEST; 54min ago
TriggeredBy: ● libvirtd-ro.socket
● libvirtd-admin.socket
● libvirtd.socket
Docs: man:libvirtd(8)
https://libvirt.org
Process: 509 ExecStart=/usr/bin/libvirtd $LIBVIRTD_ARGS (code=exited, status=0/SUCCESS)
Main PID: 509 (code=exited, status=0/SUCCESS)
Tasks: 2 (limit: 32768)
Memory: 48.5M
CPU: 1.060s
CGroup: /system.slice/libvirtd.service
├─667 /usr/bin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper
└─668 /usr/bin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper
Jun 06 14:03:55 hostname dnsmasq[667]: using nameserver XXX.XXX.52.11#53
Jun 06 14:03:55 hostname dnsmasq[667]: using nameserver XXX.XXX.52.131#53
Jun 06 14:04:31 hostname dnsmasq[667]: reading /etc/resolv.conf
Jun 06 14:04:31 hostname dnsmasq[667]: using nameserver XXX.XXX.52.11#53
Jun 06 14:04:31 hostname dnsmasq[667]: using nameserver XXX.XXX.52.131#53
Jun 06 14:04:31 hostname dnsmasq[667]: using nameserver XXX.XXX.252.252#53
Jun 06 14:04:31 hostname dnsmasq[667]: reading /etc/resolv.conf
Jun 06 14:04:31 hostname dnsmasq[667]: using nameserver XXX.XXX.252.252#53
Jun 06 14:04:31 hostname dnsmasq[667]: using nameserver XXX.XXX.52.131#53
Jun 06 14:04:31 hostname dnsmasq[667]: using nameserver XXX.XXX.52.11#53
The libvirt wiki page says to check the following commands (all run from my user):
$ virsh -c qemu:///system
^C
$ virsh -c qemu:///session
Welcome to virsh, the virtualization interactive terminal.
Type: 'help' for help with commands
'quit' to quit
virsh # quit
$ sudo virsh -c qemu:///system
^C
$ sudo virsh -c qemu:///session
^C
As you can see, only the session connection without root rights works; the others hung until I aborted them.
So now I am at a loss as to where to check what is hanging and where to look for the actual problem.
Can anybody point me to where to start poking, or does anyone have an idea of what broke with this point release? I looked into libvirt's changelog, but there seems to be nothing suspicious.
Thank you very much and best regards
Uli
Last edited by cybuzuma (2023-07-14 08:15:13)
Offline
I am experiencing the same problem. Previously (for the past couple of years) everything has been working flawlessly, but suddenly after the last update the system session can't connect - it just hangs. Restarting the libvirtd service allows the connection to be made successfully, so everything still works. But after a reboot or a random period of time, the connection fails. Nothing obvious in any logs, no obvious changes in config files that I've found yet, but I have to restart the libvirtd service to have things work as expected.
The only hint that something may not be right is an error in the libvirtd.service logs that says "internal error: Cannot find start time for pid NNNN".
I'm still trying to track down the cause. I'll post if I find something.
Offline
I should mention that I also have an active user session that connects successfully without problem. It's just the system session that suddenly can't connect without a little nudge.
Offline
Same problem here; I had to restart the libvirtd service.
The result of systemctl status libvirtd (after restarting the service) is:
internal error: Cannot find start time for pid ....
Last edited by archbobo (2023-06-06 20:17:52)
Offline
OK, so I have bad/good news from my side. I cannot reproduce this anymore; it just works again, "sudo virsh" as well as virt-manager.
The action that correlates with it (more I dare not say) was that I rebooted cleanly and restarted libvirtd right away before checking anything else; now it works even after a reboot and some uptime.
I cannot find any obvious changes in my system that explain this, apart from being in a different network environment (my VM setup relies on dnsmasq, which in turn reads the DNS config as far as I know).
Maybe you can check with a different internet connection, no internet connection or something else.
I also use a WireGuard VPN, but could not produce any failures related to it at the moment. All in all, this is wild guessing...
I will check again tomorrow when I am back in the other network.
Last edited by cybuzuma (2023-06-06 20:38:00)
Offline
The action that correlates with it (more I dare not say) was that I rebooted cleanly and restarted libvirtd right away before checking anything else; now it works even after a reboot and some uptime.
That's good troubleshooting, except that restarting libvirtd any time causes things to work, whether immediately after reboot or sometime later when the problem is noticed. I've done the same. Reboot again, though, and the problem reappears. So that's really the problem, I suppose: libvirtd no longer behaves as previously after a reboot (which occurs often with Arch since there have been a lot of kernel updates lately, which triggers a "reboot needed" message). So the problem is really: Why does libvirtd no longer work properly after a reboot after the latest update?
Edit: Not just after reboot. Earlier I restarted libvirtd service while verifying the behaviour on my system for this thread. Everything worked as normal. *I have not rebooted since*. I just did some other stuff until this thread was updated. So I checked again, and the connection error is back again. So after some unknown time, the problem reappears even after restarting libvirtd.service.
Last edited by AurGlass (2023-06-06 20:55:20)
Offline
cybuzuma wrote: The action that correlates with it (more I dare not say) was that I rebooted cleanly and restarted libvirtd right away before checking anything else; now it works even after a reboot and some uptime.
That's good troubleshooting, except that restarting libvirtd any time causes things to work, whether immediately after reboot or sometime later when the problem is noticed. I've done the same. Reboot again, though, and the problem reappears.
Not sure if I wasn't clear on that or if you misread: after the actions described above, everything continues to work through multiple additional reboots without intervention.
Offline
Not sure if I wasn't clear on that or if you misread: after the actions described above, everything continues to work through multiple additional reboots without intervention.
Interesting. It seems like whatever's going on is somewhat nuanced. Thanks for clarifying. I continue to be baffled.
Offline
I am also affected by this. Here's what I found:
- The libvirt 9.4.0 changelog is nearly empty at the time of writing; only one author in the git log bothered to document the changes during the development. Look at the git log to see the actual changes. https://gitlab.com/libvirt/libvirt/-/co … ight=false
- Found this issue report; it appears that libvirtd exits after 120 seconds of no activity (no VMs at boot). Not really sure if it's the same issue, but looks at least related. https://gitlab.com/libvirt/libvirt/-/issues/483
Work-around for now that does the job for me: restart libvirtd.service then 'quickly' connect and start a VM.
Offline
I found that upstream report too.. Cannot spot anything obvious in the diff...
If downgrading to 9.3.0 resolves it, a git bisect to find the offending commit would be useful. Who's up for it?
Offline
I just ran into this thread and am experiencing the same as the rest of you. I never tried to trace it, but I tried libvirt-git and everything went back to normal. I probably should have checked the logs first.
Offline
> If downgrading to 9.3.0 resolves it
It doesn't. I'm still on 9.3.0 and now I've got the issue. It could be the latest systemd upgrade...investigating...
Offline
Additional observation after reading the upstream report:
If I stop libvirtd.service after the daemon has exited (i.e. after 120 seconds without activity) and then run "virsh", the daemon gets restarted and it works as intended.
As the upstream report indicates, in some cases the daemon exits after the 120 s timeout, but the dnsmasq processes linger and keep the service from being considered stopped. Stopping the service manually kills the lingering dnsmasq processes and allows the service to be started again on request, which is the intended behaviour as far as I understand.
It seems that systemd's handling of this state (main process exited, some leftover processes still running) changed in a way that broke libvirtd.service, which relies on being stopped when the timeout hits so that it can be started again on request.
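A minimal sketch of how one might detect that state from a shell. It assumes that `systemctl show` reports `MainPID=0` once the main process has exited while `ActiveState` stays `active`; the function name `unit_looks_stuck` is made up for this illustration.

```shell
# unit_looks_stuck: read `systemctl show` KEY=VALUE lines on stdin and
# succeed if the unit is reported active although no main process is left.
# Assumption: MainPID is 0 once the main process has exited.
unit_looks_stuck() (
    active=""
    mainpid=""
    while IFS='=' read -r key value; do
        case "$key" in
            ActiveState) active=$value ;;
            MainPID)     mainpid=$value ;;
        esac
    done
    [ "$active" = "active" ] && [ "$mainpid" = "0" ]
)

# Example (not run here):
#   systemctl show libvirtd.service -p ActiveState -p MainPID \
#       | unit_looks_stuck && echo "libvirtd.service looks stuck"
```

The function only parses text, so it can be tried on saved output without touching the running service.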
Offline
I can verify that killing the dnsmasq processes (2 on my system) allows the connection to complete, so these leftover processes indeed seem to be the issue.
Details:
1. Stop libvirtd service
2. kill -5 the dnsmasq processes
3. Launch virt-manager or virsh. Connection succeeds.
4. Exit virt-manager or virsh.
5. Wait awhile (>120s)
6. Launch virt-manager or virsh. Connection hangs (if waited long enough).
7. kill -5 dnsmasq process. Connection immediately completes.
I don't have to restart libvirtd service first - killing dnsmasq is sufficient to stop the connection from hanging. Libvirtd immediately starts new ones and everything works until the daemon times out again.
Offline
I have bisected the offending systemd commit. More information in this systemd issue [1]
It's possible this will need a fix on the libvirt side...not sure yet...
Offline
I have bisected the offending systemd commit. More information in this systemd issue [1]
It's possible this will need a fix on the libvirt side...not sure yet...
Thanks for that. Lennart definitely on brand with the blunt commentary. Reminds me of Linus from back in the day. :-)
Offline
This affects me too, and the work-around also works:
Work-around for now that does the job for me: restart libvirtd.service then 'quickly' connect and start a VM.
Offline
Hello guys this is a duplicate of this issue - https://bbs.archlinux.org/viewtopic.php?id=286377
According to the r/vfio Discord server, you have to do this to "temporarily" fix the issue - https://i.imgur.com/ySny413.png
Last edited by RounakDutta (2023-06-10 16:14:01)
Offline
Hello guys this is a duplicate of this issue - https://bbs.archlinux.org/viewtopic.php?id=286377
According to the r/vfio Discord server, you have to do this to "temporarily" fix the issue - https://i.imgur.com/ySny413.png
Same problem here. I am trying this temporary fix, but it did not work for me right after creating the file. I understand I should reboot the computer, because this is a problem between the systemd and libvirtd components.
EDIT:
After rebooting the system, everything works perfectly.
Last edited by Mario156090 (2023-06-13 14:14:56)
Offline
To summarize so far (credits to this post here and @RounakDutta for the solution and the explanation):
The problem
Tools such as virsh or virt-manager hang indefinitely while connecting to qemu:///system, but importantly not when connecting to qemu:///session. For example,
virsh --connect qemu:///session
gives a shell prompt (exit with Ctrl+D), while,
virsh --connect qemu:///system
hangs indefinitely.
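To run this check without having to abort a hung session by hand, the connection attempt can be bounded with a timeout. A rough sketch, assuming GNU coreutils `timeout` (which exits with status 124 when the time limit is hit); `probe` is a hypothetical helper name and the 10-second limit in the example is arbitrary.

```shell
# probe SECONDS COMMAND...: run COMMAND, but give up after SECONDS.
# Prints "ok", "hang" (timed out) or "error"; returns 0, 124 or 1.
probe() {
    probe_secs=$1
    shift
    probe_rc=0
    timeout "$probe_secs" "$@" >/dev/null 2>&1 || probe_rc=$?
    case "$probe_rc" in
        0)   echo ok;    return 0 ;;
        124) echo hang;  return 124 ;;
        *)   echo error; return 1 ;;
    esac
}

# Example (not run here):
#   probe 10 virsh --connect qemu:///system list --all
#   probe 10 virsh --connect qemu:///session list --all
```

On an affected system one would expect "hang" for the system URI and "ok" for the session URI.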
If one restarts libvirtd,
systemctl restart libvirtd
then one can connect to qemu:///system and run VMs again. However, once libvirtd has been idle for 120 seconds (for example, after shutting down the last VM and disconnecting), trying to connect again hangs indefinitely.
A last check that this is your issue is to kill all dnsmasq processes. Get the PIDs of the processes, for example with,
ps ax | grep dnsmasq
Then kill the processes:
kill -5 [list of PIDs]
Then try to connect to qemu:///system within 120 seconds. It should go through.
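Since `ps ax | grep dnsmasq` also matches the grep process itself and any unrelated dnsmasq instance, here is a hedged sketch that narrows the list to the instances started by libvirt. It assumes those can be recognised by a config path under /var/lib/libvirt/dnsmasq/, as in the status output earlier in this thread; `libvirt_dnsmasq_pids` is a made-up helper name, and the plain `kill` (SIGTERM) in the example is my substitution for the `kill -5` used above.

```shell
# libvirt_dnsmasq_pids: read "PID COMMAND ARGS..." lines on stdin
# (as printed by `ps -eo pid=,args=`) and print the PIDs of dnsmasq
# processes whose command line mentions libvirt's dnsmasq directory.
libvirt_dnsmasq_pids() {
    awk '$2 ~ "(^|/)dnsmasq$" && index($0, "/var/lib/libvirt/dnsmasq/") { print $1 }'
}

# Example (not run here; xargs -r is a GNU extension):
#   ps -eo pid=,args= | libvirt_dnsmasq_pids | xargs -r sudo kill
```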
The explanation
The issue is not caused by a libvirtd update but by a systemd update (253.5 at the time of writing). The conditions under which systemd considers a service inactive have changed: libvirtd exits after 120 seconds of inactivity, by design, but systemd still considers the service active because the dnsmasq processes are still running.
The temporary solution
Tell libvirtd not to die after 120 seconds:
sudo bash -c "echo 'LIBVIRTD_ARGS=\"\"' >> /etc/conf.d/libvirtd" && sudo systemctl restart libvirtd
This appends a line to /etc/conf.d/libvirtd which should be removed once the issue is fixed upstream.
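Because the one-liner appends a new line every time it is run, a repeat-safe variant may be handy. A sketch under the same assumption as above (an empty LIBVIRTD_ARGS in /etc/conf.d/libvirtd is what disables the timeout); `set_libvirtd_args` is a hypothetical helper, not something shipped with libvirt.

```shell
# set_libvirtd_args FILE: make sure FILE ends up with exactly one
# LIBVIRTD_ARGS="" line, keeping every other line untouched.
set_libvirtd_args() {
    sla_file=$1
    sla_tmp=$(mktemp) || return 1
    # Drop any existing LIBVIRTD_ARGS assignment (grep -v exits 1 if
    # nothing is left over, which is fine here).
    if [ -f "$sla_file" ]; then
        grep -v '^LIBVIRTD_ARGS=' "$sla_file" > "$sla_tmp" || true
    fi
    printf 'LIBVIRTD_ARGS=""\n' >> "$sla_tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$sla_tmp" > "$sla_file"
    rm -f "$sla_tmp"
}

# Example (not run here, needs root):
#   set_libvirtd_args /etc/conf.d/libvirtd && systemctl restart libvirtd
```

To undo the workaround later, delete the LIBVIRTD_ARGS line from the file and restart libvirtd again.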
Last edited by sbr (2023-06-14 10:15:31)
Offline