Hello.
Here's my setup:
1. Supermicro X9DRFF-iG+/-7G+/-iTG+/-7TG+ motherboard, BIOS 3.0 07/29/2013
2. Custom-built kernel 3.16.3 (from kernel.org) with the patches from the OP applied
3. Cmdline intel_iommu=on,igfx_off pci_stub.ids=10de:1005,10de:0e1a4,1002:0b0c,1002:aac8 vfio_iommu_type1.allow_unsafe_interrupts=1 kvm_intel.emulate_invalid_guest_state=0 nohz=off
4. Latest QEMU from git, built myself.
5. Latest SeaBIOS from git, built myself.
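A note on the pci_stub.ids list above: PCI vendor and device IDs are 16-bit values, i.e. exactly four hex digits on each side of the colon, so an entry like 10de:0e1a4 (five digits in the device field) looks like it may be a typo. Below is a minimal shell sketch for checking such a list and for collecting the [vendor:device] pairs that `lspci -nn` prints for one slot. `check_stub_ids` and `ids_from_lspci_nn` are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: check that each entry of a comma-separated
# pci_stub.ids-style list is vendor:device with exactly 4 hex digits per side.
# Prints "ok" or "BAD" per entry and returns 1 if any entry is malformed.
check_stub_ids() {
    bad=0
    for id in $(printf '%s\n' "$1" | tr ',' ' '); do
        if printf '%s\n' "$id" | grep -Eqi '^[0-9a-f]{4}:[0-9a-f]{4}$'; then
            echo "ok  $id"
        else
            echo "BAD $id"
            bad=1
        fi
    done
    return "$bad"
}

# Hypothetical helper: read `lspci -nn` output on stdin and print the
# [vendor:device] pairs of every function whose address starts with $1
# (e.g. "83:00"), with the brackets stripped and joined by commas.
ids_from_lspci_nn() {
    grep "^$1" | grep -oE '\[[0-9a-fA-F]{4}:[0-9a-fA-F]{4}\]' | sed 's/[][]//g' | paste -s -d, -
}
```

For example, `check_stub_ids "10de:1005,10de:0e1a4,1002:0b0c,1002:aac8"` flags only the second entry, and `lspci -nn | ids_from_lspci_nn 83:00` lists the IDs of every function of the card at 83:00 in the comma-separated form pci_stub.ids expects.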
With this I've managed to pass through an NVIDIA GTX TITAN. Everything works fine.
Now I'm trying to pass through the AMD FirePro W8100 (and W9100). I set up a VM with emulated VGA and the W8100 passed through, and installed the driver. The driver sees the W8100, but the card shows a yellow warning sign, since it is not the primary adapter.
And now the problem:
I turn on the VM with the W8100 as the primary adapter and I get a kernel panic on the host (!).
Here's VM:
/usr/local/bin/qemu-system-x86_64 -enable-kvm -M q35 -m 16000 -cpu host,kvm=off \
-smp 4,sockets=1,cores=4,threads=1 \
-bios /usr/share/qemu/bios.bin \
-nodefaults \
-nographic \
-vga none \
-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
-device vfio-pci,host=83:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
-device vfio-pci,host=83:00.1,bus=root.1,addr=00.1 \
-net bridge,br=br0 -net nic,model=virtio,macaddr=52:54:00:12:34:60 \
-drive file=/vmstorage/win7test2kvm-firepro-node3.img,id=disk,format=raw,if=virtio \
-mon chardev=monitor0 \
-chardev socket,id=monitor0,path=/home/admin/tmp/win7test2kvm-firepro-node3.monitor,nowait,server
Here's the kernel panic:
...
[ 2013.702012] vfio_ecap_init: 0000:83:00.0 hiding ecap 0x19@0x270
[ 2013.707946] vfio_ecap_init: 0000:83:00.0 hiding ecap 0x1b@0x2d0
[ 2020.563698] kvm: zapping shadow pages for mmio generation wraparound
[ 2028.031369] vfio-pci 0000:83:00.0: irq 146 for MSI/MSI-X
[ 2050.912282] dmar: DRHD: handling fault status reg 40
.... lots of the same dmar: DRHD: handling fault status reg 40 here
[ 2064.032487] dmar: DRHD: handling fault status reg 40
[ 2064.033072] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 31
[ 2064.033074] CPU: 31 PID: 4197 Comm: qemu-system-x86 Tainted: G W 3.16.3 #1
[ 2064.033075] Hardware name: Supermicro X9DRFF-iG+/-7G+/-iTG+/-7TG+/X9DRFF-iG+/-7G+/-iTG+/-7TG+, BIOS 3.0 07/29/2013
[ 2064.033077] 0000000000000000 000000006cf33090 ffff88307fde6c10 ffffffff817eb37c
[ 2064.033079] ffffffff81ab62c0 ffff88307fde6c90 ffffffff817e3c66 0000000000000010
[ 2064.033080] ffff88307fde6ca0 ffff88307fde6c40 000000006cf33090 0000000000000000
[ 2064.033080] Call Trace:
[ 2064.033087] <NMI> [<ffffffff817eb37c>] dump_stack+0x45/0x56
[ 2064.033089] [<ffffffff817e3c66>] panic+0xd8/0x20c
[ 2064.033093] [<ffffffff81121250>] ? restart_watchdog_hrtimer+0x50/0x50
[ 2064.033095] [<ffffffff81121312>] watchdog_overflow_callback+0xc2/0xd0
[ 2064.033097] [<ffffffff8115dd8d>] __perf_event_overflow+0x9d/0x250
[ 2064.033098] [<ffffffff8115e884>] perf_event_overflow+0x14/0x20
[ 2064.033102] [<ffffffff81032bcd>] intel_pmu_handle_irq+0x1fd/0x410
[ 2064.033105] [<ffffffff811a3381>] ? unmap_kernel_range_noflush+0x11/0x20
[ 2064.033110] [<ffffffff81436974>] ? ghes_copy_tofrom_phys+0x124/0x210
[ 2064.033113] [<ffffffff81029f5b>] perf_event_nmi_handler+0x2b/0x50
[ 2064.033115] [<ffffffff81017e90>] nmi_handle+0x90/0x130
[ 2064.033116] [<ffffffff810184ae>] default_do_nmi+0xde/0x140
[ 2064.033117] [<ffffffff81018598>] do_nmi+0x88/0xc0
[ 2064.033120] [<ffffffff817f7671>] end_repeat_nmi+0x1e/0x2e
[ 2064.033124] [<ffffffff816700d6>] ? qi_submit_sync+0x196/0x400
[ 2064.033125] [<ffffffff816700d6>] ? qi_submit_sync+0x196/0x400
[ 2064.033127] [<ffffffff816700d6>] ? qi_submit_sync+0x196/0x400
[ 2064.033129] <<EOE>> [<ffffffff81670536>] qi_flush_dev_iotlb+0x86/0xd0
[ 2064.033130] [<ffffffff81672534>] iommu_flush_dev_iotlb+0xa4/0xd0
[ 2064.033132] [<ffffffff81672612>] iommu_flush_iotlb_psi+0xb2/0xe0
[ 2064.033133] [<ffffffff81674fce>] intel_iommu_unmap+0x1ce/0x1e0
[ 2064.033135] [<ffffffff81668340>] iommu_unmap+0xb0/0x190
[ 2064.033140] [<ffffffff81599553>] vfio_remove_dma+0xc3/0x1a0
[ 2064.033142] [<ffffffff817f36e2>] ? mutex_lock+0x12/0x2f
[ 2064.033144] [<ffffffff81599c71>] vfio_iommu_type1_ioctl+0x3e1/0xa20
[ 2064.033163] [<ffffffffa0005486>] ? kvm_set_memory_region+0x36/0x40 [kvm]
[ 2064.033169] [<ffffffffa0005902>] ? kvm_vm_ioctl+0x472/0x730 [kvm]
[ 2064.033171] [<ffffffff81597769>] vfio_fops_unl_ioctl+0x79/0x2b0
[ 2064.033174] [<ffffffff811ef660>] do_vfs_ioctl+0x2e0/0x4a0
[ 2064.033175] [<ffffffff811ef8a1>] SyS_ioctl+0x81/0xa0
[ 2064.033177] [<ffffffff8110ec66>] ? __audit_syscall_exit+0x1f6/0x2a0
[ 2064.033179] [<ffffffff817f5369>] system_call_fastpath+0x16/0x1b
[ 2065.078698] Shutting down cpus with NMI
[ 2065.082550] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 2065.092720] drm_kms_helper: panic occurred, switching back to text console
[ 2065.402288] ------------[ cut here ]------------
[ 2065.406910] kernel BUG at mm/vmalloc.c:1320!
[ 2065.411178] invalid opcode: 0000 [#1] SMP
[ 2065.415312] Modules linked in: tun ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables sg rpcsec_gss_krb5 nls_utf8 mlx4_ib ib_sa ib_mad ib_core ib_addr x86_pkg_temp_thermal coretemp crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel iTCO_wdt iTCO_vendor_support ghash_clmulni_intel mgag200 syscopyarea sysfillrect sysimgblt aesni_intel ttm lrw ahci gf128mul drm_kms_helper mlx4_core glue_helper libahci mei_me drm ablk_helper sb_edac lpc_ich ioatdma cryptd ipmi_si libata edac_core pcspkr mei shpchp mfd_core i2c_i801 wmi dca ipmi_msghandler binfmt_misc kvm_intel nfsd kvm
[ 2065.469011] CPU: 31 PID: 4197 Comm: qemu-system-x86 Tainted: G W 3.16.3 #1
[ 2065.476839] Hardware name: Supermicro X9DRFF-iG+/-7G+/-iTG+/-7TG+/X9DRFF-iG+/-7G+/-iTG+/-7TG+, BIOS 3.0 07/29/2013
[ 2065.487183] task: ffff882ff4e8d220 ti: ffff882fee33c000 task.ti: ffff882fee33c000
[ 2065.494665] RIP: 0010:[<ffffffff811a2980>] [<ffffffff811a2980>] __get_vm_area_node+0x150/0x160
[ 2065.503388] RSP: 0018:ffff88307fde65f8 EFLAGS: 00010006
[ 2065.508701] RAX: 0000000080110000 RBX: 00000000ffffffff RCX: ffffc90000000000
[ 2065.515830] RDX: 0000000000000022 RSI: 0000000000000001 RDI: 0000000000002000
[ 2065.522964] RBP: ffff88307fde6658 R08: ffffe8ffffffffff R09: 00000000ffffffff
[ 2065.530102] R10: ffffffffa08bfe84 R11: ffff882ff2eb2218 R12: 0000000000001800
[ 2065.537236] R13: 0000000000300000 R14: 00000000000080d2 R15: ffffea0001dbd780
[ 2065.544372] FS: 00007f23dd4b69c0(0000) GS:ffff88307fde0000(0000) knlGS:0000000000000000
[ 2065.552461] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2065.558210] CR2: 00007f23de4693f8 CR3: 0000002ffa139000 CR4: 00000000000427e0
[ 2065.565338] Stack:
[ 2065.567348] ffffffff811a4040 ffffffff000080d2 ffffffffa05022b9 8000000000000163
[ 2065.574806] 000080d200000001 ffff88307fde675d 000000006cf33090 ffff882ff4d73de0
[ 2065.582262] ffff882ff2eb21f8 0000000000300000 0000000000000080 ffffea0001dbd780
[ 2065.589719] Call Trace:
[ 2065.592163] <NMI>
[ 2065.594089] [<ffffffff811a4040>] ? __vmalloc_node_range+0x80/0x280
[ 2065.600586] [<ffffffffa05022b9>] ? ttm_tt_init+0x69/0xb0 [ttm]
[ 2065.606506] [<ffffffff811a4281>] __vmalloc+0x41/0x50
[ 2065.611555] [<ffffffffa05022b9>] ? ttm_tt_init+0x69/0xb0 [ttm]
[ 2065.617474] [<ffffffffa05022b9>] ttm_tt_init+0x69/0xb0 [ttm]
[ 2065.623233] [<ffffffffa08bfea8>] mgag200_ttm_tt_create+0x58/0x90 [mgag200]
[ 2065.630204] [<ffffffffa0502a5d>] ttm_bo_add_ttm+0x8d/0xc0 [ttm]
[ 2065.636213] [<ffffffffa05040e1>] ttm_bo_handle_move_mem+0x571/0x5b0 [ttm]
[ 2065.643096] [<ffffffffa0504756>] ? ttm_bo_mem_space+0x116/0x340 [ttm]
[ 2065.649633] [<ffffffffa0504e47>] ttm_bo_validate+0x247/0x260 [ttm]
[ 2065.655919] [<ffffffff8105e959>] ? iounmap+0x79/0xa0
[ 2065.660982] [<ffffffff81050059>] ? native_safe_x2apic_wait_icr_idle+0x9/0x10
[ 2065.668119] [<ffffffffa08c0522>] mgag200_bo_push_sysram+0x82/0xe0 [mgag200]
[ 2065.675160] [<ffffffffa08bba95>] mga_crtc_do_set_base.isra.8.constprop.20+0x85/0x470 [mgag200]
[ 2065.683857] [<ffffffffa08bcebb>] mga_crtc_mode_set+0x103b/0x2160 [mgag200]
[ 2065.690836] [<ffffffff8139b088>] ? __const_udelay+0x28/0x30
[ 2065.696508] [<ffffffffa01d6939>] drm_crtc_helper_set_mode+0x2e9/0x520 [drm_kms_helper]
[ 2065.704519] [<ffffffffa01d76bf>] drm_crtc_helper_set_config+0x87f/0xaa0 [drm_kms_helper]
[ 2065.712709] [<ffffffffa03ff6d1>] drm_mode_set_config_internal+0x61/0xe0 [drm]
[ 2065.719935] [<ffffffffa01d9ca3>] restore_fbdev_mode+0xb3/0xe0 [drm_kms_helper]
[ 2065.727243] [<ffffffffa01d9ea5>] drm_fb_helper_force_kernel_mode+0x75/0xb0 [drm_kms_helper]
[ 2065.735680] [<ffffffffa01dabb9>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
[ 2065.743084] [<ffffffff8109b7ac>] notifier_call_chain+0x4c/0x70
[ 2065.749005] [<ffffffff8109b80a>] atomic_notifier_call_chain+0x1a/0x20
[ 2065.755534] [<ffffffff817e3c93>] panic+0x105/0x20c
[ 2065.760416] [<ffffffff81121250>] ? restart_watchdog_hrtimer+0x50/0x50
[ 2065.766936] [<ffffffff81121312>] watchdog_overflow_callback+0xc2/0xd0
[ 2065.773456] [<ffffffff8115dd8d>] __perf_event_overflow+0x9d/0x250
[ 2065.779630] [<ffffffff8115e884>] perf_event_overflow+0x14/0x20
[ 2065.785552] [<ffffffff81032bcd>] intel_pmu_handle_irq+0x1fd/0x410
[ 2065.791733] [<ffffffff811a3381>] ? unmap_kernel_range_noflush+0x11/0x20
[ 2065.798428] [<ffffffff81436974>] ? ghes_copy_tofrom_phys+0x124/0x210
[ 2065.804869] [<ffffffff81029f5b>] perf_event_nmi_handler+0x2b/0x50
[ 2065.811051] [<ffffffff81017e90>] nmi_handle+0x90/0x130
[ 2065.816270] [<ffffffff810184ae>] default_do_nmi+0xde/0x140
[ 2065.821836] [<ffffffff81018598>] do_nmi+0x88/0xc0
[ 2065.826623] [<ffffffff817f7671>] end_repeat_nmi+0x1e/0x2e
[ 2065.832102] [<ffffffff816700d6>] ? qi_submit_sync+0x196/0x400
[ 2065.837930] [<ffffffff816700d6>] ? qi_submit_sync+0x196/0x400
[ 2065.843765] [<ffffffff816700d6>] ? qi_submit_sync+0x196/0x400
[ 2065.849589] <<EOE>>
[ 2065.851687] [<ffffffff81670536>] qi_flush_dev_iotlb+0x86/0xd0
[ 2065.857733] [<ffffffff81672534>] iommu_flush_dev_iotlb+0xa4/0xd0
[ 2065.863827] [<ffffffff81672612>] iommu_flush_iotlb_psi+0xb2/0xe0
[ 2065.869915] [<ffffffff81674fce>] intel_iommu_unmap+0x1ce/0x1e0
[ 2065.875835] [<ffffffff81668340>] iommu_unmap+0xb0/0x190
[ 2065.881142] [<ffffffff81599553>] vfio_remove_dma+0xc3/0x1a0
[ 2065.886796] [<ffffffff817f36e2>] ? mutex_lock+0x12/0x2f
[ 2065.892103] [<ffffffff81599c71>] vfio_iommu_type1_ioctl+0x3e1/0xa20
[ 2065.898463] [<ffffffffa0005486>] ? kvm_set_memory_region+0x36/0x40 [kvm]
[ 2065.905252] [<ffffffffa0005902>] ? kvm_vm_ioctl+0x472/0x730 [kvm]
[ 2065.911429] [<ffffffff81597769>] vfio_fops_unl_ioctl+0x79/0x2b0
[ 2065.917428] [<ffffffff811ef660>] do_vfs_ioctl+0x2e0/0x4a0
[ 2065.922907] [<ffffffff811ef8a1>] SyS_ioctl+0x81/0xa0
[ 2065.927955] [<ffffffff8110ec66>] ? __audit_syscall_exit+0x1f6/0x2a0
[ 2065.934300] [<ffffffff817f5369>] system_call_fastpath+0x16/0x1b
[ 2065.940299] Code: 00 00 00 0f bd cf 83 c1 01 83 f9 0c 0f 4c c8 b0 13 83 f9 13 0f 4f c8 49 d3 e4 e9 fd fe ff ff 4c 89 ff e8 f4 b3 01 00 31 c0 eb b3 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
[ 2065.960268] RIP [<ffffffff811a2980>] __get_vm_area_node+0x150/0x160
[ 2065.966631] RSP <ffff88307fde65f8>
I've also tried the romfile option with a video BIOS dumped using ATIFlash (from a FreeDOS bootable USB stick), but it did not change anything.
I googled but did not find anyone's experience with W8100 or W9100 GPUs, so I'm unsure where to go from here. I've read quite a lot, but I'm really stuck. Any help is really appreciated.
PS: I have to admit that I don't actually know which of the options on my cmdline made the GTX TITAN passthrough possible. I've been reading this post and Alex Williamson's VFIO blog, adding things and experimenting. After one more try I got a working VM and stopped there.
Thank you,
Grigory.
Last edited by GrigoryPtashko (2014-10-02 17:46:41)
Hi,
I am running a setup similar to OP's (KVM VGA passthrough with VFIO), based on Ubuntu 14.04 with kernel 3.13 and KVM 2.0. I have a weird issue similar to the one OP mentioned: my sound sometimes "lags". It gets completely choppy and makes the games lag. It only occurs in games; video playback and music are fine. It happens both with audio through HDMI and with a dedicated USB sound card passed through (along with the keyboard and mouse).
I noticed OP fixed this (in his 2nd post); however, the fix only applies to AMD boards (I tried it, just to be sure), and I have an Intel one.
Does anyone have the same issue or know how to fix it?
Thanks in advance
PS: Talking about sound, I have another issue. I am pretty sure it's a case of "this is not a bug, it's a feature", but I'll ask anyway. When playing a Windows game, if I change input via an HDMI switch (e.g. I switch back to Linux), then when I go back to the game there is no more sound. Sound still works in Windows if I Alt+Tab, but not in the game; I have to quit and relaunch it. It also happens with a USB sound card when I switch back to Linux. If someone has a fix, I'd love to know about it.
EDIT: this issue is actually software-specific, so it's no longer an issue. For example, Watch_Dogs does it, while Middle-earth: Shadow of Mordor doesn't.
Last edited by Nesousx (2014-10-03 16:34:01)
If you have a dedicated USB sound card, you can use it, well, dedicated to Linux, without getting Linux sound out of it. I think that should help with your latest issue, and maybe the former too.
The forum rules prohibit requesting support for distributions other than arch.
I gave up. It was too late.
What I was trying to do.
The reference about VFIO and KVM VGA passthrough.
I have been doing this for months, but it doesn't help at all. I am back with full sound over HDMI and will sell the sound card.
Durr, I meant dedicated to the VM, not to Linux. I wasn't fast enough to edit the message.
So, like, Linux sound goes out through HDMI and VM sound goes out through USB.
@Duelist, I have tried many combinations, with one or several sound cards (one for the host and one for the VM)... the same bug always happens (the first issue I was talking about): the sound gets choppy, then the game lags. I'm starting to think this might not be sound-related; the choppy sound is probably just a side effect, but where should I look? I haven't found anything relevant in the logs yet. It also seems to happen randomly.
As for my second issue, it is no longer an issue since it is software-specific; I edited my post. I had been playing one game that causes the bug all the time, and I remembered having it in other games before, so I assumed it was system-wide before testing properly.
Last edited by Nesousx (2014-10-03 16:48:52)
Audio might be fairly sensitive to interrupt latency. Have you tried my instructions here for making Windows use MSI interrupts for the audio device? I've done this successfully for both AMD and NVIDIA audio functions on Win8. Pinning vCPUs and using hugepages can also help latency issues. You can also pin the host interrupt for the device to a host CPU not used by the VM to try to further improve latency.
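A minimal shell sketch of the pinning suggestions above, plus the hugepage arithmetic. It assumes that vfio-pci labels its handlers in /proc/interrupts as vfio-intx(<dev>), vfio-msi[N](<dev>) or vfio-msix[N](<dev>), that QEMU's HMP `info cpus` prints a thread_id=<tid> for each vCPU, and 2 MiB hugepages by default; all function names are hypothetical.

```shell
#!/bin/sh
# List host IRQ numbers that vfio-pci registered for one device, matching the
# "(<dev>)" suffix of the vfio-intx/vfio-msi/vfio-msix handler names.
# $1: full PCI address (e.g. 0000:01:00.1); $2: optional interrupts file.
vfio_irqs() {
    awk -v dev="($1)" 'index($0, "vfio-") && index($0, dev) {
        irq = $1; sub(/:$/, "", irq); print irq
    }' "${2:-/proc/interrupts}"
}

# Pin those IRQs to a host CPU list the VM does not use (needs root).
# $1: PCI address; $2: CPU list, e.g. "0" or "0-1".
pin_vfio_irqs() {
    for irq in $(vfio_irqs "$1"); do
        echo "$2" > "/proc/irq/$irq/smp_affinity_list"
    done
}

# Read QEMU HMP "info cpus" output on stdin and print one vCPU thread ID per
# line; each can then be pinned with e.g. `taskset -pc 2 <tid>`.
vcpu_tids() {
    sed -n 's/.*thread_id=\([0-9][0-9]*\).*/\1/p'
}

# Number of hugepages needed to back a guest of $1 MiB, rounded up,
# for a page size of $2 KiB (default 2048, i.e. 2 MiB pages).
hugepages_needed() {
    size_kib=${2:-2048}
    echo $(( ($1 * 1024 + size_kib - 1) / size_kib ))
}
```

For a 16000 MiB guest like the one in the first post, `hugepages_needed 16000` gives 8000, which could be written to /proc/sys/vm/nr_hugepages before starting QEMU with `-mem-path` pointing at a mounted hugetlbfs (for example /dev/hugepages, an assumed mount point). The `info cpus` text can be fetched from a monitor socket such as the one OP configured, for instance with something like `echo 'info cpus' | socat - UNIX-CONNECT:<socket path> | vcpu_tids`.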
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
This is probably a question for redger, but anyone with an idea can chime right in.
I've had a working Win7 passthrough for quite some time, but have wanted to migrate from the qemu commandline to libvirt for a variety of reasons. I followed the excellent guide on page 94 of the thread, but Windows 7 gets caught in a "Windows cannot start" loop of repair or reboot. Interestingly, if one tries to restart the VM from inside, it bluescreens.
I can post the details of the qemu script and the xml if needed, but there is nothing remarkable. Using kernel 3.15.5 and qemu 2.0.0, libvirt 1.2.9. I use the virtio drivers for the hd and network. Other VMs (new to libvirt, not migrated) seem to work OK, so I assume this is a Windows problem having to do with a "change" in hardware between qemu commandline and libvirt.
Presumably this is a common problem with migrations and the unpalatable answer is to reinstall Windows under libvirt, but I'd rather avoid that. (If I did that I'd probably also change from q35 to 440fx and give OVMF a try.) Is there a VM preparation maneuver to make the transition less likely to fail? A registry hack? Google fails to enlighten in this case.
It should be entirely possible to convert from qemu commandline to libvirt in a compatible way, but the differences can be subtle. Often people don't specify addresses for every device on the commandline and libvirt is likely to re-order the device declaration, so at a minimum you'd want to figure out the PCI address for each device and specify it explicitly in the xml. Windows gets confused enough when devices move around, but if you end up changing devices at the same time, you're probably in for a world of hurt. You probably also want to switch to a dummy disk image or snapshot so that you can make a few attempts comparing the libvirt generated commandline to your own before you risk the real VM disk image. libvirt's restrictive support for q35 just makes the conversion all that much harder too.
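One way to do the comparison described above is to pull the -device arguments out of both command lines and diff them. A minimal shell sketch: `device_args` is a hypothetical helper, it assumes no argument contains embedded spaces, and it assumes you can obtain libvirt's generated command line, for instance from the domain's log under /var/log/libvirt/qemu/ or via `virsh domxml-to-native qemu-argv <xml file>`.

```shell
#!/bin/sh
# Print every argument that follows a "-device" flag in a QEMU command line
# read from stdin, one per line. Works on a single-line command (as libvirt
# logs it) or a backslash-continued script; arguments containing spaces are
# not handled.
device_args() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if (prev == "-device") print $i
            prev = $i
        }
    }'
}
```

In bash, something like `diff <(device_args < my-vm.sh | sort) <(device_args < libvirt-cmdline.txt | sort)` (both file names hypothetical) then shows which device declarations differ. Sorting deliberately hides pure reordering, so any remaining difference in `bus=`/`addr=` is a device that actually moved.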
aw wrote: Audio might be fairly sensitive to interrupt latency. Have you tried my instructions here for making Windows use MSI interrupts for the audio device? I've done this successfully for both AMD and NVIDIA audio functions on Win8. Pinning vCPUs and using hugepages can also help latency issues. You can also pin the host interrupt for the device to a host CPU not used by the VM to try to further improve latency.
Thanks. I'll give it a shot tomorrow.
Last edited by Nesousx (2014-10-03 21:28:36)
Nesousx wrote: Thanks. It looks like my device doesn't support MSI. Sorry if I missed something from your post; I am about to go to bed and am quite tired.
sudo lspci -v -s 1:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti PRO [Radeon HD 7950/8950 OEM / R9 280] (prog-if 00 [VGA controller])
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
We can see MSI: Enable- instead of Enable+.
The fact that it has an MSI capability means that it supports MSI. -/+ tells you whether it's enabled. However, what I'm suggesting is that you assign and enable MSI interrupts for the audio function in order to reduce interrupt latency and perhaps improve the choppy audio. What you've listed here is the GPU function. Enabling MSI for both is even better, but MSI on the GPU is typically enabled by default on AMD.
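A small shell sketch of the distinction drawn above: having an MSI capability line means the function supports MSI, while the Enable+/Enable- flag says whether it is currently enabled. `msi_status` is a hypothetical helper; it reads `lspci -v` output (run as root, otherwise lspci may not show the capabilities) and deliberately ignores MSI-X lines. The audio function's address (for example 01:00.1) is an assumption; check yours with lspci.

```shell
#!/bin/sh
# Read `lspci -v -s <addr>` output on stdin and report whether the function
# lists an MSI capability and, if so, whether MSI is enabled (Enable+).
# "MSI-X: ..." and "Express ... MSI 00" lines do not match the pattern.
msi_status() {
    awk '/Capabilities: .* MSI: Enable/ {
            found = 1
            if ($0 ~ /Enable[+]/) print "MSI capable, enabled"
            else print "MSI capable, not enabled"
        }
        END { if (!found) print "no MSI capability listed" }'
}
```

Usage would be `sudo lspci -v -s 01:00.1 | msi_status`; for a listing like the 01:00.0 output quoted above (MSI: Enable-) it prints "MSI capable, not enabled".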
Last edited by aw (2014-10-03 21:33:29)
@aw Thanks; I'll give it a try. However, I'm unsure about something: you mentioned a "libvirt generated commandline".
I've got the XML (which has some autogenerated PCI lines in it), e.g.
<address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
for non-VFIO devices such as the pci-bridge. I presume these are what you mean I should compare with their equivalents somewhere in Windows' Device Manager?
I've also got the qemu command line, eg.
-device vfio-pci,host=01:00.1,bus=root.1,addr=00.1
which just gets translated as
<qemu:arg value='vfio-pci,host=01:00.1,bus=root.1,addr=00.1'/>
These look identical, so they are presumably not the problem.
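To line the XML up against a hand-written command line, it may help to translate libvirt's `<address>` elements into QEMU's `addr=SLOT.FUNCTION` notation (hex). A minimal shell sketch: `libvirt_addr_to_qemu` is hypothetical, it assumes libvirt's usual attribute order (domain, bus, slot, function), and it only reports the bus index. libvirt typically refers to the controller with index N as pci.N on the generated command line (pcie.0 for the q35 root bus), but check the generated command line to be sure.

```shell
#!/bin/sh
# Read libvirt <address type='pci' .../> elements on stdin (one per line) and
# print the bus index plus the equivalent QEMU addr=SLOT.FUNCTION value.
# printf accepts the 0x-prefixed values libvirt writes as numeric arguments.
libvirt_addr_to_qemu() {
    sed -n "s/.*bus='\(0x[0-9a-fA-F]*\)'.*slot='\(0x[0-9a-fA-F]*\)'.*function='\(0x[0-9a-fA-F]*\)'.*/\1 \2 \3/p" |
    while read -r bus slot fn; do
        printf 'bus index %d, addr=%02x.%x\n' "$bus" "$slot" "$fn"
    done
}
```

For the pci-bridge element quoted above, this prints `bus index 2, addr=02.0`; running `grep "<address type='pci'" <domain xml> | libvirt_addr_to_qemu` lists every address in the file.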
Hi.
I've always had this question in mind:
Does using PLX chips on motherboards to expand PCI-E lanes make it harder to pass through devices installed in such PCI-E slots?
I used to have an ASRock Z77 Extreme 11, which has PLX chips; whenever I tried to pass through GPUs, the whole host hung.
I might opt for an ASUS X99-E WS, which has PLX chips as well. Can anyone confirm whether it is possible or not?
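Not a definitive answer, but one thing worth checking on a board with PLX switches: if the switch's downstream ports don't provide ACS isolation, the devices behind it may end up sharing an IOMMU group, and VFIO requires every device in a group to be released from host drivers before any of them can be used. A minimal shell sketch that lists devices per group from sysfs; the `iommu_groups` name is hypothetical, and the root directory is a parameter only so it can be tried on a fake tree.

```shell
#!/bin/sh
# List every device in every IOMMU group as "group N: <PCI address>".
# $1: optional sysfs root (defaults to /sys/kernel/iommu_groups).
iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob matched nothing (no IOMMU groups)
        group=${dev#"$root"/}          # strip the root: "N/devices/<addr>"
        group=${group%%/*}             # keep just "N"
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

If the GPU shows up in the same group as the switch ports or unrelated devices, that group would have to be handed over as a whole.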
aw wrote: The fact that it has an MSI capability means that it supports MSI. -/+ tells you whether it's enabled. However, what I'm suggesting is that you assign and enable MSI interrupts for the audio function in order to reduce interrupt latency and perhaps improve the choppy audio. What you've listed here is the GPU function. Enabling MSI for both is even better, but MSI on the GPU is typically enabled by default on AMD.
I read too fast yesterday, and you replied before my edit.
I made some changes and now MSI is enabled. I'll try it and let you know.
sudo lspci -v -s 1:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti PRO [Radeon HD 7950/8950 OEM / R9 280] (prog-if 00 [VGA controller])
Subsystem: PC Partner Limited / Sapphire Technology Device e210
Flags: bus master, fast devsel, latency 0, IRQ 59
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at f7b00000 (64-bit, non-prefetchable) [size=256K]
I/O ports at e000 [size=256]
Expansion ROM at f7b40000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [270] #19
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] #13
Capabilities: [2d0] #1b
Kernel driver in use: vfio-pci
GrigoryPtashko wrote: I turn on the VM with the W8100 as the primary adapter and I get a kernel panic on the host (!).
Note that the quirks we currently have in place are designed and developed for Radeon cards. We have no idea whether they're relevant or correct for FirePro cards. Quadro cards are similar; they also do not work in primary VGA mode using the GeForce quirks. NVIDIA does, however, support secondary assignment for those cards, so we don't bother coming up with quirks for them.
GrigoryPtashko wrote:
[ 2050.912282] dmar: DRHD: handling fault status reg 40
.... lots of the same dmar: DRHD: handling fault status reg 40 here
dmar faults are usually two lines: one indicating the status register, which you've included, and another indicating the faulting device, type of fault, and address, which isn't included. The line that isn't included is generally much more insightful.
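To find the second, more informative line, `dmesg | grep -A1 'DRHD: handling fault status'` shows each status line together with the line that follows it. The shell sketch below goes one step further and summarises faults by device, fault type and address. It assumes the detail line looks roughly like `dmar: DMAR:[DMA Read] Request device [83:00.0] fault addr 3f000` (address illustrative), as printed by kernels of this era; the exact wording varies between versions, so the pattern may need adjusting. `dmar_faults` is a hypothetical name.

```shell
#!/bin/sh
# Summarise DMAR fault detail lines from dmesg output on stdin as
# "<count> <device> [<fault type>] <address>". Assumes lines of the form
# "... DMAR:[DMA Read] Request device [83:00.0] fault addr 3f000 ...".
dmar_faults() {
    sed -n 's/.*DMAR:\[\([^]]*\)] Request device \[\([0-9a-f:.]*\)] fault addr \([0-9a-f]*\).*/\2 [\1] \3/p' |
        sort | uniq -c
}
```

Usage: `dmesg | dmar_faults`. A single device with one repeating address points at one bad mapping, whereas many scattered addresses suggest something broader.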
GrigoryPtashko wrote:
[ 2064.032487] dmar: DRHD: handling fault status reg 40
[ 2064.033072] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 31
...
[ 2064.033133] [<ffffffff81674fce>] intel_iommu_unmap+0x1ce/0x1e0
[ 2064.033135] [<ffffffff81668340>] iommu_unmap+0xb0/0x190
[ 2064.033140] [<ffffffff81599553>] vfio_remove_dma+0xc3/0x1a0
...
It looks like the watchdog kicked in while we were trying to unmap some pages from the IOMMU, and then the mga driver exploded, as it's prone to do. What was going on with the VM at this point? The intel-iommu code has had some issues with stalling the kernel during large unmaps, but that shouldn't happen for a small, 16G VM. You can try disabling the hardware watchdog to see whether the unmap is simply taking longer than the watchdog allows. Note that we're only looking at ~50s from VM start to lockup.
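To try the watchdog suggestion above: the detector that fired here ("Watchdog detected hard LOCKUP") is the kernel's NMI watchdog. It can be toggled at runtime through /proc/sys/kernel/nmi_watchdog, or at boot with nmi_watchdog=0. Below is a minimal sketch. The helper name is hypothetical, and its optional file argument exists only so it can be exercised without root. Disabling the watchdog only tells you whether the unmap eventually completes; it does not fix a genuine stall.

```shell
# set_nmi_watchdog: write 0 (disable) or 1 (enable) to the NMI watchdog
# sysctl. The optional second argument overrides the target file
# (a hypothetical convenience for testing); normally leave the default.
set_nmi_watchdog() {
    local value="$1"
    local knob="${2:-/proc/sys/kernel/nmi_watchdog}"
    case "$value" in
        0|1) ;;
        *) echo "usage: set_nmi_watchdog 0|1 [file]" >&2; return 2 ;;
    esac
    echo "$value" > "$knob"
}

# Usage, as root, before starting the VM:
#   set_nmi_watchdog 0
#   cat /proc/sys/kernel/nmi_watchdog   # should now print 0
```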
In general though, I would not expect FirePro cards to work. AFAIK, we don't have any evidence that would suggest that they necessarily use the same set of quirks as Radeon cards.
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Hi.
I've always had this question in mind:
Does using PLX chips on motherboards to expand PCI-E lanes make it harder to pass through devices installed in those PCI-E slots?
I used to have an ASRock Z77 Extreme 11, which has PLX chips. Whenever I tried to pass through GPUs, the whole host hung.
I might opt for an ASUS X99-E WS, which has PLX chips as well. Can anyone confirm whether it is possible or not?
I assume you mean PLX branded PCIe switches. If they support ACS and aren't crap, then in theory they don't pose any issue for device assignment. If they don't support ACS, it pretty much guarantees that you'll always need the ACS override patch. In general, more components are just more opportunities for things to be broken.
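You can check from the host whether the switch ports on a given board actually expose ACS. Below is a minimal sketch. It assumes lspci's verbose output labels the capability "Access Control Services" and separates devices with blank lines. The function name is hypothetical. A capability being listed doesn't by itself prove the switch implements the isolation correctly.

```shell
# acs_capable_devices: read `lspci -vvv` output on stdin and print the
# address of every device whose capability list mentions Access Control
# Services. Run lspci as root; otherwise capabilities show "<access denied>".
acs_capable_devices() {
    # RS="" puts awk in paragraph mode: each blank-line-separated block
    # of lspci output is one record, and $1 is the device address.
    awk 'BEGIN { RS = "" } /Access Control Services/ { print $1 }'
}

# Usage:
#   sudo lspci -vvv | acs_capable_devices
```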
PLX advertises that most (all?) of their PEX chips support ACS. Maybe it was the motherboard that was causing issues, not the PLX chip itself.
I've got a kernel panic just now:
[ 42.002633] irq 28: nobody cared (try booting with the "irqpoll" option)
[ 42.002633] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O 3.16.3-1-ARCH #1
[ 42.002633] Hardware name: ASUS All Series/X99-DELUXE, BIOS 0904 09/22/2014
[ 42.002633] 0000000000000000 e24784535cef3a79 ffff88085fc03ad8 ffffffff8152b3bc
[ 42.002633] ffff8808323f9a00 ffff88085fc03b00 ffffffff810d02a2 ffff8808323f9a00
[ 42.002633] 0000000000000000 000000000000001c ffff88085fc03b38 ffffffff810d0657
[ 42.002633] Call Trace:
[ 42.002633] <IRQ> [<ffffffff8152b3bc>] dump_stack+0x4d/0x6f
[ 42.002633] [<ffffffff810d02a2>] __report_bad_irq+0x32/0xd0
[ 42.002633] [<ffffffff810d0657>] note_interrupt+0x257/0x2a0
[ 42.002633] [<ffffffff810cdbae>] handle_irq_event_percpu+0xae/0x1f0
[ 42.002633] [<ffffffff810cdd2d>] handle_irq_event+0x3d/0x60
[ 42.002633] [<ffffffff810d1281>] handle_fasteoi_irq+0x81/0x170
[ 42.002633] [<ffffffff8101717e>] handle_irq+0x1e/0x40
[ 42.002633] [<ffffffff81533bed>] do_IRQ+0x4d/0xe0
[ 42.002633] [<ffffffff81531bad>] common_interrupt+0x6d/0x6d
[ 42.002633] [<ffffffff810b6710>] ? __wake_up_bit+0x40/0x60
[ 42.002633] [<ffffffff8114be63>] unlock_page+0x23/0x30
[ 42.002633] [<ffffffff8114cf3e>] page_endio+0x1e/0x60
[ 42.002633] [<ffffffff811ff892>] mpage_end_io+0x42/0x60
[ 42.002633] [<ffffffff8126f4db>] bio_endio+0x6b/0xa0
[ 42.002633] [<ffffffff812766a4>] blk_update_request+0x94/0x380
[ 42.002633] [<ffffffff812769ae>] blk_update_bidi_request+0x1e/0xa0
[ 42.002633] [<ffffffff81276d01>] blk_end_bidi_request+0x21/0x60
[ 42.002633] [<ffffffff81276d50>] blk_end_request+0x10/0x20
[ 42.002633] [<ffffffffa003238d>] scsi_io_completion+0xad/0x710 [scsi_mod]
[ 42.002633] [<ffffffffa00293d2>] scsi_finish_command+0xa2/0xe0 [scsi_mod]
[ 42.002633] [<ffffffffa00321fe>] scsi_softirq_done+0x10e/0x130 [scsi_mod]
[ 42.002633] [<ffffffff8127d5eb>] blk_done_softirq+0x8b/0xb0
[ 42.002633] [<ffffffff810736c2>] __do_softirq+0xf2/0x2e0
[ 42.002633] [<ffffffff81073a06>] irq_exit+0x86/0xb0
[ 42.002633] [<ffffffff81533bf6>] do_IRQ+0x56/0xe0
[ 42.002633] [<ffffffff81531bad>] common_interrupt+0x6d/0x6d
[ 42.002633] <EOI> [<ffffffff813e88bc>] ? cpuidle_enter_state+0x4c/0xc0
[ 42.002633] [<ffffffff813e8a17>] cpuidle_enter+0x17/0x20
[ 42.002633] [<ffffffff810b710f>] cpu_startup_entry+0x32f/0x520
[ 42.002633] [<ffffffff815216c4>] rest_init+0x84/0x90
[ 42.002633] [<ffffffff818f5fc9>] start_kernel+0x45e/0x47f
[ 42.002633] [<ffffffff818f5120>] ? early_idt_handlers+0x120/0x120
[ 42.002633] [<ffffffff818f54d7>] x86_64_start_reservations+0x2a/0x2c
[ 42.002633] [<ffffffff818f5626>] x86_64_start_kernel+0x14d/0x170
[ 42.002633] handlers:
[ 42.002633] [<ffffffffa05c43e0>] vfio_intx_handler [vfio_pci]
[ 42.002633] Disabling IRQ #28
It happened when I started copying large files from my SMB share to my VM's virtio qcow2 disk file. It crashes only that VM, not the other VMs or the host.
Thank you.
Last edited by Denso (2014-10-04 15:05:53)
This is not a panic. The kernel is reporting that IRQ 28 keeps firing and that none of the handlers for that IRQ (vfio) are claiming the interrupt, so it is announcing that it is disabling that interrupt. vfio is the only handler, and it thinks the device is masked (otherwise it would claim the interrupt). That might mean INTx disable on the device doesn't actually work. You can force vfio to mask at the APIC rather than at the device by passing the nointxmask=1 parameter to the vfio-pci module on the host. This does, however, require that no other devices share the interrupt, which can make configuration difficult. If the device supports MSI, you can also try making the guest use it to avoid the problem. See my post on the blog for more info and how to do that.
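Below is a minimal sketch of the checks implied above. It assumes the usual /proc/interrupts layout: one line per IRQ, starting with "NN:", with the handler names at the end. The helper name is hypothetical, and 01:00.0 in the usage comments is a placeholder for the assigned device's address. nointxmask is the vfio-pci module option named above.

```shell
# irq_line: print the /proc/interrupts entry for one IRQ number, so you
# can see whether any handler besides vfio is registered on it. The
# optional second argument overrides the file (hypothetical, for testing).
irq_line() {
    local irq="$1"
    local file="${2:-/proc/interrupts}"
    # Entries look like " 28:  <per-CPU counts>  <chip>  <handlers>".
    awk -v irq="$irq" '$1 == irq ":"' "$file"
}

# Usage:
#   irq_line 28                                  # anything else on IRQ 28?
#   sudo lspci -vv -s 01:00.0 | grep -i 'MSI'    # does the device offer MSI?
#   # If IRQ 28 is not shared, have vfio-pci mask at the APIC instead:
#   echo 'options vfio-pci nointxmask=1' | sudo tee /etc/modprobe.d/vfio.conf
```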
Audio bug still here after enabling MSI.
I'm getting the following error:
vfio: error, group 1 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
Here is the list of devices in that group:
/sys/bus/pci/devices/0000:02:00.0/iommu_group/devices$ dir
0000:00:01.0 0000:00:01.1 0000:01:00.0 0000:01:00.1 0000:02:00.0 0000:02:00.1
I would go ahead and list all of those devices in /etc/vfio-pci.cfg, but the problem with that is 01:00.0 and 01:00.1 are my primary display adapter. If I include that device in my vfio-pci.cfg file, it crashes the primary display when I try to bind it.
Is there a way to remove my primary display adapter from the iommu_group/devices list so that I don't get this error message? I've seen other people post this error message before, but never saw any instructions on how to change iommu_group device lists.
EDIT: After some more digging, it seems like I might need to patch something. I thought I only needed to patch my kernel if my primary display adapter was the integrated Intel graphics. I'll keep digging.
Last edited by dwightjl (2014-10-04 22:28:14)
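The vfio error asks for every device in the IOMMU group to be bound to its vfio bus driver. It can therefore help to see what each group member is currently bound to. Below is a minimal sketch that walks the same sysfs path used above. The function name and its optional sysfs-root argument (used only for testing) are hypothetical.

```shell
# iommu_group_members: for one PCI device, list every device in the same
# IOMMU group together with the driver currently bound to it, or "(none)".
iommu_group_members() {
    local dev="$1"
    local sys="${2:-/sys}"   # hypothetical override, for testing only
    local member drv
    for member in "$sys/bus/pci/devices/$dev/iommu_group/devices/"*; do
        [ -e "$member" ] || continue   # the glob matched nothing
        if [ -L "$member/driver" ]; then
            # "driver" is a symlink into .../drivers/<name>
            drv="$(basename "$(readlink "$member/driver")")"
        else
            drv="(none)"
        fi
        printf '%s %s\n' "$(basename "$member")" "$drv"
    done
}

# Usage:
#   iommu_group_members 0000:02:00.0
```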
So you guys have helped me get set up and it's been running OK, but there is still a major issue: my machine hangs when I reboot the VM. I wouldn't mind having to reboot my PC whenever I reboot the VM, but the VM crashes when it reboots to install Windows updates, which completely corrupts the machine. This has happened multiple times, so I need to find a solution. I've searched but haven't been able to find anything. I wish there were a way to search within this thread.
My setup
Running the patched 3.16 kernel from the first page.
lspci -nn
...
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF110 [GeForce GTX 580] [10de:1080] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GF110 High Definition Audio Controller [10de:0e09] (rev a1)
...
00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor DRAM Controller [8086:0c00] (rev 06)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06)
00:01.1 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller [8086:0c05] (rev 06)
00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller [8086:0412] (rev 06)
00:03.0 Audio device [0403]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller [8086:0c0c] (rev 06)
00:14.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB xHCI Controller [8086:8cb1]
00:16.0 Communication controller [0780]: Intel Corporation 9 Series Chipset Family ME Interface #1 [8086:8cba]
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I218-V [8086:15a1]
00:1a.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2 [8086:8cad]
00:1c.0 PCI bridge [0604]: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 [8086:8c90] (rev d0)
00:1c.2 PCI bridge [0604]: Intel Corporation 9 Series Chipset Family PCI Express Root Port 3 [8086:8c94] (rev d0)
00:1c.3 PCI bridge [0604]: Intel Corporation 9 Series Chipset Family PCI Express Root Port 4 [8086:8c96] (rev d0)
00:1c.6 PCI bridge [0604]: Intel Corporation 9 Series Chipset Family PCI Express Root Port 7 [8086:8c9c] (rev d0)
00:1d.0 USB controller [0c03]: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1 [8086:8ca6]
00:1f.0 ISA bridge [0601]: Intel Corporation 9 Series Chipset Family Z97 LPC Controller [8086:8cc4]
00:1f.2 SATA controller [0106]: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode] [8086:8c82]
00:1f.3 SMBus [0c05]: Intel Corporation 9 Series Chipset Family SMBus Controller [8086:8ca2]
...
02:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller [1912:0014] (rev 03)
04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 11)
05:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1184]
06:01.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1184]
06:03.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1184]
06:05.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1184]
06:07.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1184]
07:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 04)
08:04.0 Multimedia audio controller [0401]: C-Media Electronics Inc CMI8788 [Oxygen HD Audio] [13f6:8788]
09:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
0a:00.0 Network controller [0280]: Qualcomm Atheros AR93xx Wireless Network Adapter [168c:0030] (rev 01)
0b:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
0c:00.0 USB controller [0c03]: ASMedia Technology Inc. Device [1b21:1142]
from syslinux.cfg
LABEL arch-qemu
MENU LABEL Arch Qemu
LINUX ../vmlinuz-linux-mainline
APPEND root=/dev/sdb1 rootflags=subvol=__active/rootvol intel_iommu=on i915.enable_hd_vgaarb=1 pcie_acs_override=downstream pci-stub.ids=10de:1080,10de:0e09,8086:0c0c nouveau.modeset=0 radeon.modeset=0 rw
INITRD ../initramfs-linux-mainline.img
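After booting with the pci-stub.ids line above, you can confirm that pci-stub actually claimed each device from the "Kernel driver in use" line that lspci -k prints. Below is a minimal sketch. The helper name is hypothetical, and 01:00.0 in the usage comment is just the GTX 580 from the listing above.

```shell
# driver_in_use: read `lspci -nnk` output for one device on stdin and
# print the value of its "Kernel driver in use:" line, if there is one.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Usage:
#   lspci -nnk -s 01:00.0 | driver_in_use   # expect pci-stub before vfio-bind
```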
script for booting the vm:
sudo vfio-bind 0000:01:00.0 0000:01:00.1 0000:00:03.0
#qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
#-smp 6,sockets=1,cores=2,threads=1 \
#-bios /usr/share/qemu/bios.bin -vga none \
#-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
##-device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
#-device vfio-pci,host=01:00.1,bus=root.1,addr=00.1
qemu-system-x86_64 -enable-kvm -M q35 -m 8196 -mem-path /dev/hugepages -cpu host,kvm=off \
-smp 6,sockets=1,cores=3,threads=2 \
-boot menu=on \
-bios /usr/share/qemu/bios.bin -vga none \
-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
-device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on -nographic \
-device vfio-pci,host=01:00.1,bus=root.1,addr=00.1 \
-drive file=/home/anonymous/Downloads/Win.iso,id=isocd -device ide-cd,bus=ide.1,drive=isocd \
-drive file=/dev/sda,id=disk,format=raw -device ide-hd,bus=ide.0,drive=disk \
-usb -usbdevice host:046d:c52e -usbdevice host:041e:30e0 -usbdevice host:045e:0719 \
# -device ich9-intel-hda,bus=pcie.0,addr=1b.0,id=sound0 \
# -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0
# -device vfio-pci,host=07:00.0,bus=pcie.0 \
# -device vfio-pci,host=08:04.0,bus=pcie.0
# -device vfio-pci,host=0c:00.0,bus=pcie.0
# -drive file=/dev/sda,id=disk,format=raw -device ide-hd,bus=ahci.0,drive=disk \
# -boot menu=on \
# -rtc base=localtime \
FWIW, I don't know if this is relevant, but running the VM with huge pages appears to fail:
qemu-system-x86_64: unable to map backing store for hugepages: Cannot allocate memory
However, the VM was already hanging on reboot before I started trying to use huge pages.
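One common cause of "unable to map backing store for hugepages: Cannot allocate memory" is that fewer huge pages are reserved than the guest's -m size needs. hugetlbfs also has to be mounted at the -mem-path directory. Below is a minimal sketch of the arithmetic. It assumes the default huge page size reported by /proc/meminfo is the one backing /dev/hugepages. The helper name is hypothetical. With -m 8196 and 2048 kB pages, it comes out to 4098 pages, all of which must be free when the VM starts.

```shell
# hugepages_needed: number of huge pages required to back a guest of
# MEM_MB megabytes, given the huge page size in kB (the "Hugepagesize"
# value from /proc/meminfo, typically 2048 on x86_64). Rounds up.
hugepages_needed() {
    local mem_mb="$1"
    local page_kb="$2"
    echo $(( (mem_mb * 1024 + page_kb - 1) / page_kb ))
}

# Usage:
#   page_kb=$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo)
#   n=$(hugepages_needed 8196 "$page_kb")
#   echo "$n" | sudo tee /proc/sys/vm/nr_hugepages
#   grep HugePages_Total /proc/meminfo   # check how many were really reserved
```

The kernel may reserve fewer pages than requested if memory is already fragmented, so it is worth checking HugePages_Total afterwards, or reserving the pages early (for example, at boot).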
Bus 002 Device 002: ID 8087:8001 Intel Corp.
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 008 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 007 Device 003: ID 041e:30e0 Creative Technology, Ltd
Bus 007 Device 002: ID 0409:005a NEC Corp. HighSpeed Hub
Bus 007 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 8087:8009 Intel Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 002: ID 174c:3074 ASMedia Technology Inc.
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 007: ID 046d:c52b Logitech, Inc. Unifying Receiver
Bus 003 Device 006: ID 9886:0001
Bus 003 Device 005: ID 24f0:0137
Bus 003 Device 004: ID 174c:2074 ASMedia Technology Inc.
Bus 003 Device 003: ID 046d:c52e Logitech, Inc. MK260 Wireless Combo Receiver
Bus 003 Device 002: ID 045e:0719 Microsoft Corp. Xbox 360 Wireless Adapter
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
edit:
By the way, my CPU is an Intel 4790K and the motherboard is an ASRock Z97 Extreme6.
Last edited by risho (2014-10-05 00:00:56)
aw wrote: Have you installed a guest, installed a VNC server, installed Catalyst drivers, and have the VM working so that you can connect to that VNC server from the host?
I did this, but I can't connect to the guest's VNC server from the host. After some googling, I found that the host and guest can't see each other on the default network.
There appear to be a lot of ways around this problem, none of which I understand. Which method do you suggest I look into?
Use a bridge
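Below is a minimal sketch of a host bridge built with iproute2, so that a guest attached with -net bridge,br=br0 -net nic,model=virtio sits on the same LAN as the host. The NIC name, the bridge name, and the use of dhcpcd as the DHCP client are assumptions; substitute your own. The helper names are hypothetical. Setting DRY_RUN=1 prints the commands instead of running them. Running this for real over SSH on the same NIC may drop your connection.

```shell
# run: execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_bridge: create a bridge and move a wired NIC into it (run as root).
setup_bridge() {
    local nic="${1:-eth0}"   # hypothetical name of the host's wired NIC
    local br="${2:-br0}"
    run ip link add name "$br" type bridge
    run ip addr flush dev "$nic"          # the host's address moves to the bridge
    run ip link set dev "$nic" master "$br"
    run ip link set dev "$nic" up
    run ip link set dev "$br" up
    run dhcpcd "$br"                      # assumption: dhcpcd is the DHCP client
}

# Usage:
#   DRY_RUN=1 setup_bridge enp3s0 br0   # review the commands first
#   sudo setup_bridge enp3s0 br0        # hypothetical: the function must be
#                                       # defined in root's shell for this
```

QEMU's -net bridge goes through qemu-bridge-helper, which on most setups only attaches to bridges allowed in /etc/qemu/bridge.conf (a line such as allow br0). Once the guest is bridged, its VNC server should be reachable at the guest's LAN address.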
After going back through journalctl, this is the last bit of information before the crash, in case it helps:
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:0c:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8807c0e4be88
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:0c:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8807c0e4be40
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:0c:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8807bfe96ea0
Oct 04 18:42:11 arch kernel: usb 7-2: ep 0x83 - rounding interval to 64 microframes, ep desc says 80 microframes
Oct 04 18:42:11 arch kernel: usb 3-1: reset full-speed USB device number 2 using xhci_hcd
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 1.
Oct 04 18:42:11 arch kernel: usb 3-1: hub failed to enable device, error -22
Oct 04 18:42:11 arch kernel: usb 3-1: reset full-speed USB device number 2 using xhci_hcd
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 1.
Oct 04 18:42:11 arch kernel: usb 3-1: hub failed to enable device, error -22
Oct 04 18:42:11 arch kernel: usb 3-1: reset full-speed USB device number 2 using xhci_hcd
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff88080235d048
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff88080235d000
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff88080235d1c8
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff88080235d180
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff88080235de88
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff88080235de40
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880801fa2048
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880801fa2000
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880801fa2108
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880801fa20c0
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880801fa21c8
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880801fa2180
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880801fa2288
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880801fa2240
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880801fa2348
Oct 04 18:42:11 arch kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880801fa2300
Oct 04 18:42:32 arch kernel: BUG: unable to handle kernel paging request at 00007fff00000000
Oct 04 18:42:32 arch kernel: IP: [<00007fff00000000>] 0x7fff00000000
Oct 04 18:42:32 arch kernel: PGD 75ed0d067 PUD 0
Oct 04 18:42:32 arch kernel: Oops: 0010 [#1] PREEMPT SMP
Oct 04 18:42:32 arch kernel: Modules linked in: bridge stp llc md4 md5 hmac nls_utf8 cifs dns_resolver fscache ctr ccm hid_logitech_dj uas snd_usb_audio usb_storage snd_usbmidi_lib snd_hda_codec_hdmi vfio_pci vfio_iommu_type1 vfio nct6775 hwmon_vid ext4 crc16 mbcache jbd
Oct 04 18:42:32 arch kernel: i2c_designware_platform tpm e1000e i2c_designware_core i2c_core battery dw_dmac dw_dmac_core snd_timer 8250_dw gpio_lynxpoint mei_me video spi_pxa2xx_platform snd mei ptp shpchp soundcore pps_core wmi processor acpi_pad button btrfs xor raid
Oct 04 18:42:32 arch kernel: CPU: 1 PID: 1110 Comm: qemu-system-x86 Not tainted 3.16.0-1-mainline #1
Oct 04 18:42:32 arch kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Extreme6, BIOS P1.30 05/22/2014
Oct 04 18:42:32 arch kernel: task: ffff8807bfd54750 ti: ffff8800a90f0000 task.ti: ffff8800a90f0000
Oct 04 18:42:32 arch kernel: RIP: 0010:[<00007fff00000000>] [<00007fff00000000>] 0x7fff00000000
Oct 04 18:42:32 arch kernel: RSP: 0018:ffff8800a90f3d98 EFLAGS: 00010006
Oct 04 18:42:32 arch kernel: RAX: ffff8807bf3c3e38 RBX: 000000008101478d RCX: 0000000000000000
Oct 04 18:42:32 arch kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff8807bf3c3e38
Oct 04 18:42:32 arch kernel: RBP: ffff8800a90f3dd8 R08: 0000000000000000 R09: 0000000000000000
Oct 04 18:42:32 arch kernel: R10: 00000000000103c0 R11: 0000000000000293 R12: ffffffff8189eb08
Oct 04 18:42:32 arch kernel: R13: 00000000004a0418 R14: 0000000000000000 R15: 0000000000000003
Oct 04 18:42:32 arch kernel: FS: 00007f1923fff700(0000) GS:ffff88082fa40000(0000) knlGS:0000000000000000
Oct 04 18:42:32 arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 04 18:42:32 arch kernel: CR2: 00007fff00000000 CR3: 0000000780c26000 CR4: 00000000001427e0
Oct 04 18:42:32 arch kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 04 18:42:32 arch kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Oct 04 18:42:32 arch kernel: Stack:
Oct 04 18:42:32 arch kernel: ffffffff810b6425 00000000bfd54750 0000000000000000 ffffffff8189eb00
Oct 04 18:42:32 arch kernel: 0000000000000046 0000000000000003 0000000000000000 0000000000000000
Oct 04 18:42:32 arch kernel: ffff8800a90f3e10 ffffffff810b66a9 ffff880803ab0000 0000000000000000
Oct 04 18:42:32 arch kernel: Call Trace:
Oct 04 18:42:32 arch kernel: [<ffffffff810b6425>] ? __wake_up_common+0x55/0x90
Oct 04 18:42:32 arch kernel: [<ffffffff810b66a9>] __wake_up+0x39/0x50
Oct 04 18:42:32 arch kernel: [<ffffffff8139aed8>] __vga_put+0x98/0x150
Oct 04 18:42:32 arch kernel: [<ffffffff8139b56f>] vga_put+0x5f/0x90
Oct 04 18:42:32 arch kernel: [<ffffffffa010d104>] vfio_pci_vga_rw+0x1f4/0x240 [vfio_pci]
Oct 04 18:42:32 arch kernel: [<ffffffffa010a4e5>] vfio_pci_rw+0x35/0x80 [vfio_pci]
Oct 04 18:42:32 arch kernel: [<ffffffffa010b2cf>] vfio_pci_write+0x1f/0x30 [vfio_pci]
Oct 04 18:42:32 arch kernel: [<ffffffffa00492eb>] vfio_device_fops_write+0x2b/0x30 [vfio]
Oct 04 18:42:32 arch kernel: [<ffffffff811c1b87>] vfs_write+0xb7/0x200
Oct 04 18:42:32 arch kernel: [<ffffffff811c29ca>] SyS_pwrite64+0x9a/0xc0
Oct 04 18:42:32 arch kernel: [<ffffffff811d48bb>] ? SyS_ioctl+0x6b/0xa0
Oct 04 18:42:32 arch kernel: [<ffffffff81530869>] system_call_fastpath+0x16/0x1b
Oct 04 18:42:32 arch kernel: Code: Bad RIP value.
Oct 04 18:42:32 arch kernel: RIP [<00007fff00000000>] 0x7fff00000000
It happens either right as, or a few moments after, I shut down a VM.