I am a new user of Arch Linux, and my system crashes when I try to start a Windows virtual machine (VM) using QEMU/KVM. The VM freezes during a Windows update, and the mouse becomes unresponsive. After forcing a power off and reboot, I get disk errors, and the filesystem becomes read-only. I am trying to recover my system, but I am not sure how to proceed. This is actually the second time this issue has occurred. The first crash happened during a Windows update in the VM as well, and the system behavior was the same: after rebooting, the filesystem became read-only, and I had to reinstall the entire system.
Operating System: Arch Linux
Desktop Environment: KDE
Filesystem: Btrfs
Kernel Version: linux-lts 6.12.66-1
Virtualization Tool: QEMU/KVM
Windows VM: Windows 11 Enterprise LTSC 2024
GPU Passthrough: Nvidia GPU passthrough using kvmfr module with Looking Glass
The system freezes during a Windows update process in the VM, which is running under QEMU/KVM. The VM shows the "Updating" screen, then freezes, and the mouse stops moving. The system becomes unresponsive, and I have to force a reboot.
After rebooting, I get disk errors, and all files have become read-only.
I booted into a rescue system and collected the journalctl -p 4 logs, which contain important information. I am not sure if QEMU is causing the issue, so I saved the entire journalctl -p 4 logs and have attached them here for further analysis.
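For anyone going through the attached logs, a small filter can cut the `journalctl -p 4` output down to storage and passthrough messages. The following is only a minimal Python sketch: the keyword list is a guess at what might be relevant (it is not derived from the attached logs), and it expects one JSON record per line, as produced by `journalctl -o json`.

```python
import json

# Illustrative keywords only (an assumption, not taken from the attached logs).
KEYWORDS = ("btrfs", "nvme", "i/o error", "vfio", "kvmfr")


def filter_journal(lines, keywords=KEYWORDS):
    """Return (priority, message) pairs whose message mentions a keyword."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        if not isinstance(entry, dict):
            continue
        message = entry.get("MESSAGE")
        if not isinstance(message, str):
            continue  # non-UTF-8 messages are emitted as byte arrays
        lowered = message.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((entry.get("PRIORITY"), message))
    return hits
```

From a rescue system, the input could come from something like `journalctl -D /mnt/var/log/journal -b -1 -p 4 -o json`, where `/mnt` is an assumed mount point for the installed root and `-b -1` selects the boot before the most recent one recorded in that journal.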
Note: The coredump files are too large to attach, and there are many different coredumps. I have only included the logs for now. If you need the original coredump files or any other information, please let me know, and I will provide them.
Disk Check:
I ran smartctl, but it did not show any errors.
System Logs:
I collected the journalctl -p 4 logs, which I think are relevant to the crash.
Memory Test:
The night before the crash, I ran MemTest86 overnight, and it didn't show any errors.
Disk Issues: The filesystem turning read-only could be caused by filesystem corruption (possibly Btrfs), but I am unsure why the filesystem suddenly became corrupted.
KVM/QEMU Configuration: I am not sure if QEMU is the cause of the issue, but the crash always seems to happen during a Windows update process. This could be related to the GPU passthrough or resource allocation for the VM.
Log Analysis: Could you help me analyze the journalctl -p 4 logs to find out what caused the system crash?
Coredump Analysis: Could you also analyze the original coredump files to help identify the cause of the crash? I have many different coredumps and will provide them if needed.
Disk Recovery: What are the best steps to fix a read-only Btrfs filesystem and recover my data?
Virtualization Settings: Can you suggest any changes to my QEMU/KVM setup, especially with the GPU passthrough using the kvmfr module? I am using Nvidia GPU passthrough with Looking Glass, and I think this might be causing the problem.
Recovery Steps: What would you recommend as the next steps to restore the system and prevent this issue from happening again?
Resource Allocation: I have allocated a GPU to the Windows VM using kvmfr for the framebuffer. Could allocating too many resources to the VM cause the system to crash?
as it isn't really clear (at least to me): does only the vm crash or the entire host system?
a few troubleshooting ideas:
- have you tried the update without the gpu passthrough?
- as Btrfs is already CoW: do you use qcow2 for the image? have you tested with RAW? test the vm image on another non-btrfs partition with ext4 - iirc CoW can be disabled on a file or subvol level
- how's the host configured? is the passed gpu the only one? is it detected as primary by the uefi? are there other gpus in the system? have you tried a different gpu or swapped them around in different slots?
currently way too many options - need to narrow down possible cause and chain of trigger
Thank you for your response!
To clarify:
The entire Arch Linux system crashes, not just the VM. Both the host and the Windows VM become unresponsive.
I am using qcow2 format for the VM image, and I have already disabled COW for the virtual machine image directory.
I have passed through an Nvidia discrete GPU to the Windows VM, and the host still has an Intel integrated GPU. Since I am using a laptop, I am unable to swap GPU slots.
The system had been running fine before, but after starting the Windows VM, Windows started updating automatically, and then the entire Arch Linux system froze. This issue seems similar to a previous crash I had, which also happened during a Windows update in the VM. While I am not certain that the Windows update is the direct cause, I am mentioning it here since it seems to coincide with the crashes.
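On the No_COW point above: as far as I know, `chattr +C` on a directory only applies to files created in it afterwards, so an image that already existed when the attribute was set may still be copy-on-write. A minimal Python sketch for checking the flag on the image itself, assuming `lsattr` (from e2fsprogs) is installed; the function names are made up for illustration.

```python
import subprocess


def has_nocow(lsattr_line):
    """True if the flags field of one `lsattr` output line contains 'C' (No_COW)."""
    fields = lsattr_line.split(maxsplit=1)
    if not fields:
        return False
    # Uppercase 'C' is No_COW; lowercase 'c' is the compression flag.
    return "C" in fields[0]


def image_is_nocow(path):
    """Run `lsattr -d` on path and report whether the No_COW flag is set."""
    result = subprocess.run(
        ["lsattr", "-d", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return has_nocow(result.stdout)
```

Calling `image_is_nocow()` with the path of the qcow2 file would show whether the attribute actually reached the image, rather than only the directory that contains it.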
well - the https://wiki.archlinux.org/title/Hybrid_graphics setup of the system (laptop) could be a contributing factor - please share the exact model of the system
have you checked for bios updates?