Slabity wrote:Wow, that works perfectly with the test script! Thank you so much!
Does radeon still work ok for the host?
Anyone using radeon on the host (APU users) may want to try this too.
Many thanks Alex. This also works for me with 2 discrete Radeon cards (no APU). With the shell scripts I'm able to switch my boot VGA in BIOS to any card and VGA passthrough is always working. The configuration is the following:
+-02.0-[01]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn XT [Radeon HD 7870 GHz Edition]
| \-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
+-0b.0-[04]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
| \-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
I will give the patch mentioned here http://pastebin.com/2ukyQbkG a try in the next few hours. I hope it will solve the issue permanently for me. Many thanks again!
Hey,
I switched my hardware to accomplish this, compiled the kernel, etc., and everything went well.
The problem now is that I can't get QEMU to start:
sudo qemu-system-x86_64 -enable-kvm -M q35 -m 1024 -cpu host -smp 6,sockets=1,cores=6,threads=1 -bios /usr/share/qemu/bios.bin -vga none \
-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
-device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
-device vfio-pci,host=01:00.1,bus=root.1,addr=00.1
qemu-system-x86_64: -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: vfio: error no iommu_group for device
qemu-system-x86_64: -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device initialization failed.
qemu-system-x86_64: -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device 'vfio-pci' could not be initialized
I can't find anything on the net by searching for "vfio: error no iommu_group for device", and I have no idea what to do.
BTW, I get the same error with both GPUs!
System:
- CPU: Intel Xeon E3-1245v3 (Host uses iGPU)
- MB: MSI Z97 Gaming 7 (Supports VT-d)
- GPU1: Nvidia GeForce GTX 770 (for Gaming)
- GPU2: AMD Radeon HD5450 (for virtual HTPC)
- OS: Arch Linux (Kernel 3.14.4-3-mainline with all the patches in the package from the first post, qemu-git (2.1) and seabios-git (1.7.5))
I know that the GTX 770 works, since I have seen others get it working.
I'm not sure, but I don't think the problem is caused by having two GPUs.
FYI: /sys/bus/pci/devices/0000\:01\:00.0/ does NOT contain an iommu_group entry.
Last edited by shawly (2014-05-24 15:29:37)
Check your kernel parameters or BIOS settings (it looks to me like the IOMMU might be disabled in your case); see the first post for details.
Oh my god, I just needed to add intel_iommu=on to the kernel parameters. I feel really stupid now.
Thanks <3
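For anyone hitting the same "no iommu_group" error, a rough way to confirm the fix took effect is to check the running kernel's command line and look for the device's iommu_group link in sysfs. A minimal Python sketch, assuming the usual sysfs layout where /sys/bus/pci/devices/<BDF>/iommu_group is a symlink into /sys/kernel/iommu_groups/<N>; the helper names and the sysfs_root parameter are made up for illustration (the parameter only exists so the functions can be pointed at a test tree):

```python
import os
from typing import List, Optional


def iommu_on_cmdline(cmdline: str) -> bool:
    """True if the command line has intel_iommu=on (possibly combined, e.g. intel_iommu=on,igfx_off)."""
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            if "on" in token.split("=", 1)[1].split(","):
                return True
    return False


def iommu_group_of(bdf: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the IOMMU group number of a PCI device, or None if it has none."""
    link = os.path.join(sysfs_root, "bus/pci/devices", bdf, "iommu_group")
    if not os.path.islink(link):
        # IOMMU off (or device not covered): this is the "no iommu_group" case.
        return None
    return os.path.basename(os.readlink(link))


def group_members(group: str, sysfs_root: str = "/sys") -> List[str]:
    """List every device in an IOMMU group (vfio works on whole groups)."""
    devices = os.path.join(sysfs_root, "kernel/iommu_groups", group, "devices")
    return sorted(os.listdir(devices))
```

On a real host you would feed the contents of /proc/cmdline to iommu_on_cmdline() and call iommu_group_of("0000:01:00.0"); a None result corresponds to the missing iommu_group entry reported above.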
aw wrote:Slabity wrote:Wow, that works perfectly with the test script! Thank you so much!
Does radeon still work ok for the host?
Anyone using radeon on the host (APU users) may want to try this too.
Many thanks Alex. This also works for me with 2 discrete Radeon cards (no APU). With the shell scripts I'm able to switch my boot VGA in BIOS to any card and VGA passthrough is always working. The configuration is the following:
+-02.0-[01]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn XT [Radeon HD 7870 GHz Edition]
| \-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
+-0b.0-[04]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
| \-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
I will give the patch mentioned here http://pastebin.com/2ukyQbkG a try in the next few hours. I hope it will solve the issue permanently for me. Many thanks again!
That patch isn't effective; don't bother with it. This one seems to work for me, but it's more heavyweight than I'd like. The problem sequence is that Xorg tries to lock VGA resources, which triggers some first-access callbacks. The callback in the radeon code tells the arbiter that the device no longer decodes VGA resources. Now, the arbiter still lets a process lock resources that the device doesn't decode, but it loses track of them when they're released. So when vfio comes along and tries to lock VGA resources on the other card, the arbiter skips disabling VGA on the radeon bridge because the device doesn't decode VGA. The patch below gratuitously disables VGA on any potentially conflicting bridge on every vga_get, which makes VGA access even slower (not that we care much, since VGA stops being used shortly after the guest OS starts running). I'll probably rework the code more for upstream to allow devices to own resources they don't decode rather than just lock them. Enjoy.
--- a/drivers/gpu/vga/vgaarb.c
+++ b/drivers/gpu/vga/vgaarb.c
@@ -244,8 +244,12 @@ static struct vga_device *__vga_tryget(struct vga_device *v
*/
WARN_ON(conflict->owns & ~conflict->decodes);
match = lwants & conflict->owns;
- if (!match)
+ if (!match) {
+ if (change_bridge)
+ pci_set_vga_state(conflict->pdev, false, 0,
+ PCI_VGA_STATE_CHANGE_BRIDGE);
continue;
+ }
/* looks like he doesn't have a lock, we can steal
* them from him
dropbox link since the forum will break formatting - https://dl.dropboxusercontent.com/u/198 … able.patch
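To make the failure sequence described above easier to follow, here is a toy Python model of the conflict loop the patch touches. It is a deliberately simplified sketch, not the kernel code: it assumes every conflicting device sits behind its own bridge (so change_bridge is always true), ignores locks and reference counts, and the class, field and function names are hypothetical. bridge_vga_on stands for the VGA Enable bit of the device's upstream bridge, which is the hardware state the arbiter loses track of.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical bit values for this model; the kernel has its own VGA_RSRC_* flags.
LEGACY_IO = 0x1
LEGACY_MEM = 0x2


@dataclass
class VgaDevice:
    name: str
    decodes: int         # legacy resources the arbiter thinks the device decodes
    owns: int            # legacy resources the arbiter thinks the device owns
    bridge_vga_on: bool  # real VGA Enable state of the device's upstream bridge


def tryget(target: VgaDevice, others: List[VgaDevice], wants: int, patched: bool) -> None:
    """Toy version of the conflict loop in __vga_tryget().

    Assumes each device is behind its own bridge (change_bridge always true)
    and ignores locks, refcounts and device-level decode disabling.
    """
    for conflict in others:
        match = wants & conflict.owns
        if not match:
            if patched:
                # The patch: turn off VGA routing on the conflicting bridge anyway.
                conflict.bridge_vga_on = False
            continue
        # Ownership is tracked, so the arbiter takes it over and disables the bridge.
        conflict.owns &= ~match
        conflict.bridge_vga_on = False
    target.owns |= wants
    target.bridge_vga_on = True


def vga_routed_to(devices: List[VgaDevice]) -> List[str]:
    """Names of devices whose bridge currently forwards legacy VGA cycles."""
    return [d.name for d in devices if d.bridge_vga_on]
```

With the arbiter believing the radeon owns nothing (owns=0 after Xorg's lock is released), the unpatched loop leaves both bridges forwarding legacy VGA, while the patched loop turns off the radeon bridge before granting the lock to the guest card.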
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
I know it is kind of wrong to bring my problems to the Arch Linux boards, but I am pretty desperate at this point :)
I'm not running arch either, don't feel bad
MB: ASRock Z87 Pro4
CPU: Intel i7 4770
OS: Fedora 20
host VGA: IGD (details below)
kernel (with the i915 module recompiled with i915_314.patch from linux-mainline.tar.gz in the first post):
[s@localhost ~]$ uname -r
3.14.4-200.fc20.x86_64
[s@localhost ~]$ modinfo i915 | head -n1
filename: /lib/modules/3.14.4-200.fc20.x86_64/updates/i915.ko
This doesn't prove anything, in fact it just looks like you're running the f20 distro kernel and not one recompiled to include the i915 patch. Are you trying to rebuild the kernel rpm?
I couldn't get it working at all for 5 days while trying different variations, and I am completely out of ideas about what may be wrong; there is too much outdated info out there.
The target guest OS is Win7.
When trying to pass through the HD 6450:
- [test command from first post] the host screen usually gets corrupted (it looks like the palette was overwritten randomly, though I doubt X is running in indexed mode); this is fixable by switching to a tty console and then back to X
- [virt-manager/kvm] Code 10 (device cannot be started) in Device Manager; the latest Catalyst driver setup terminates while trying to detect hardware
This also sounds like the host kernel doesn't actually include the i915 patch. I'd start out with making sure you're really running a kernel patched the way you think it is.
This doesn't prove anything, in fact it just looks like you're running the f20 distro kernel and not one recompiled to include the i915 patch. Are you trying to rebuild the kernel rpm?
Well, all I wanted to "prove" is that the module currently loaded is the one I rebuilt and manually placed in the module updates directory.
Correct me if I'm wrong, but the VGA arbiter patch only involves the i915 module, not the kernel itself.
This also sounds like the host kernel doesn't actually include the i915 patch. I'd start out with making sure you're really running a kernel patched the way you think it is.
Well, I took the i915_314.patch from linux-mainline.tar.gz in the first post. It differs from the VGA arbiter patch v3 that can be found on the internet, but I tried the v3 one as well with no success.
I had a similar problem. Change this line below
-device vfio-pci,host=01:00.1,bus=root.1,addr=00.1 \
TO THIS:
-device vfio-pci,host=01:00.1,bus=pcie.0 \
Actually, this solved one of my more bizarre issues, where my new R9 290X would stop working whenever I physically removed my old 4850 (which was not passed through). I always got a BSoD from atikmdag.sys “PAGE_FAULT_IN_NONPAGED_AREA”. Changing this solved the issue.
Now the only remaining issue is that my card is not reset on VM reboots, which causes the same BSoD. Sending the host to ACPI S3 (suspend to RAM, which puts the card in D3cold and thus resets it, if I got this right) does the trick, but of course I have to go to standby and back every time I want to (re-)start the VM.
Powering off the slot is unfortunately not an option:
root@myhost ~ # ls /sys/bus/pci/slots
total 0
Let’s see what my card supports:
root@myhost ~ # lspci -vvv -s 01:00
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
Subsystem: PC Partner Limited / Sapphire Technology Device e285
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 11
Region 0: Memory at e0000000 (64-bit, prefetchable) [disabled] [size=256M]
Region 2: Memory at f0000000 (64-bit, prefetchable) [disabled] [size=8M]
Region 4: I/O ports at e000 [disabled] [size=256]
Region 5: Memory at f0800000 (32-bit, non-prefetchable) [disabled] [size=256K]
Expansion ROM at f0840000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [270 v1] #19
Capabilities: [2b0 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable-, Smallest Translation Unit: 00
Capabilities: [2c0 v1] #13
Capabilities: [2d0 v1] #1b
Kernel driver in use: vfio-pci
Kernel modules: radeon
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
Subsystem: PC Partner Limited / Sapphire Technology Device aac8
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 10
Region 0: Memory at f0860000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
As you can see, Function Level Reset is not supported (FLReset-). I didn’t check this on my old card (which worked perfectly fine including reboots of the virtual machine).
D3hot should be supported, but I haven’t found a way yet to convince the kernel to put the card in this state, and I guess it wouldn’t help anyway since the card reports NoSoftRst+ (it does not reset itself on the D3hot to D0 transition).
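The two flags discussed above can be pulled out of lspci -vvv output mechanically. A small Python sketch, assuming the text of a single function's lspci -vvv section is passed in (the helper names are hypothetical); it only looks at FLReset and NoSoftRst and says nothing about bus resets or vendor-specific reset methods:

```python
import re
from typing import Dict, List, Optional


def reset_flags(lspci_vvv: str) -> Dict[str, Optional[bool]]:
    """Extract reset-related flags from one function's `lspci -vvv` text.

    FLReset+/- comes from the PCIe DevCap block and NoSoftRst+/- from the
    Power Management Status line; a flag that is missing maps to None.
    """
    def flag(name: str) -> Optional[bool]:
        m = re.search(r"\b" + name + r"([+-])", lspci_vvv)
        return None if m is None else m.group(1) == "+"

    return {"flr": flag("FLReset"), "no_soft_reset": flag("NoSoftRst")}


def device_level_resets(flags: Dict[str, Optional[bool]]) -> List[str]:
    """List the device-level reset methods these flags suggest could work."""
    methods = []
    if flags["flr"]:
        methods.append("FLR")
    # NoSoftRst- means a D3hot -> D0 transition performs an internal reset.
    if flags["no_soft_reset"] is False:
        methods.append("PM (D3hot -> D0)")
    return methods
```

Fed the DevCap and PM Status lines of the 01:00.0 dump above, this yields flr=False and no_soft_reset=True, so neither device-level method is available, which matches the conclusion in the post.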
Setting power control to auto, unbinding, and rebinding didn’t help either. Neither did removing the device and rescanning the bus, though maybe I wasn’t patient enough:
root@myhost ~ # echo auto > /sys/bus/pci/devices/0000:00:01.0/power/control
root@myhost ~ # echo '0000:01:00.0' > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
root@myhost ~ # echo '0000:01:00.1' > /sys/bus/pci/devices/0000:01:00.1/driver/unbind
root@myhost ~ # echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
root@myhost ~ # echo 1 > /sys/bus/pci/devices/0000:01:00.1/remove
root@myhost ~ # echo 1 > /sys/bus/pci/rescan
root@myhost ~ # echo '0000:01:00.0' > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
root@myhost ~ # echo '0000:01:00.1' > /sys/bus/pci/devices/0000:01:00.1/driver/unbind
root@myhost ~ # echo '0000:01:00.1' > /sys/bus/pci/drivers/vfio-pci/bind
root@myhost ~ # echo '0000:01:00.0' > /sys/bus/pci/drivers/vfio-pci/bind
See the sysfs-bus-pci ABI documentation for an explanation of these files.
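The manual unbind/remove/rescan/rebind sequence above can be made repeatable with a small script. A Python sketch, assuming the same root port and device addresses as in the post and that vfio-pci will accept the devices (for example because their IDs were added to it earlier); the function names are hypothetical. Note that this only replays the steps; in the post they did not fix the reset problem.

```python
import os
from typing import List, Tuple

Write = Tuple[str, str]  # (sysfs file, value to write)


def rebind_plan(root_port: str, functions: List[str], sysfs_root: str = "/sys") -> List[Write]:
    """Return the sysfs writes from the post, in order, without performing them."""
    def dev(bdf: str, leaf: str) -> str:
        return os.path.join(sysfs_root, "bus/pci/devices", bdf, leaf)

    plan: List[Write] = [(dev(root_port, "power/control"), "auto")]
    plan += [(dev(f, "driver/unbind"), f) for f in functions]  # detach current driver
    plan += [(dev(f, "remove"), "1") for f in functions]        # drop from the PCI core
    plan.append((os.path.join(sysfs_root, "bus/pci/rescan"), "1"))
    plan += [(dev(f, "driver/unbind"), f) for f in functions]  # host driver may rebind after rescan
    plan += [(os.path.join(sysfs_root, "bus/pci/drivers/vfio-pci/bind"), f) for f in functions]
    return plan


def apply_plan(plan: List[Write]) -> None:
    """Perform the writes (needs root); skip unbinds when no driver is bound."""
    for path, value in plan:
        if path.endswith("/driver/unbind") and not os.path.exists(path):
            continue
        with open(path, "w") as fh:
            fh.write(value)
```

The plan binds functions in the order given, so pass ["0000:01:00.1", "0000:01:00.0"] to bind the audio function first as the post did.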
I also explored other options like setpci and AMD ZeroCore but did not find anything useful I could leverage so far.
For today I’m giving up; at least I was able to solve one of my issues and, quite frankly, it was the more important one. The fans of the old card were either dead or close to it resulting in nice “atmospheric sound” and the card itself did not really improve the thermal landscape. Keeping you posted.
For further reference:
Motherboard: ASRock Z87 Extreme6
Processor: Intel Core i7-4771
Host graphics card: Intel HD Graphics 4600
Passthrough device: Sapphire Radeon R9 290X Tri-X OC
OS: Arch Linux x86_64
Kernel: linux-mainline (3.14.1, includes ACS override patch, i915 VGA arbiter fixes, debug registers patch; custom config)
qemu: default (2.0.0-3)
seabios: default (1.7.3.1-2)
Status: Working
Constraints: Bus reset and CCC not working, loading VBIOS.
Last edited by blacky (2014-05-24 18:19:42)
aw wrote:This doesn't prove anything, in fact it just looks like you're running the f20 distro kernel and not one recompiled to include the i915 patch. Are you trying to rebuild the kernel rpm?
Well, all I wanted to "prove" is that the module currently loaded is the one I rebuilt and manually placed in the module updates directory.
Correct me if I'm wrong, but the VGA arbiter patch only involves the i915 module, not the kernel itself.
But it doesn't do that either. I think all it shows is that modinfo thinks that's the module that would load, but that's after you've already booted, there may be a different i915.ko in your initramfs.
aw wrote:This also sounds like the host kernel doesn't actually include the i915 patch. I'd start out with making sure you're really running a kernel patched the way you think it is.
Well, I took the i915_314.patch from linux-mainline.tar.gz in the first post. It differs from the VGA arbiter patch v3 that can be found on the internet, but I tried the v3 one as well with no success.
I've tried to get something upstream again recently, so this is the latest patch - https://lkml.org/lkml/2014/5/9/517
The maintainer won't accept it because he thinks that adding a module option is admitting defeat and will prevent anyone from solving the problem (while at the same time not coming up with any workable solution himself). The nice thing about this patch in your situation is that we can tell whether you're running the right module by whether /sys/module/i915/parameters/enable_hd_vgaarb is present (that reflects the currently running module, not the one on disk somewhere). You will need to set this with a module option in /etc/modprobe.d/ (and rebuild the initramfs) to make it work.
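A sketch of that check in Python, plus the modprobe.d line. It assumes the patched i915 exposes enable_hd_vgaarb under /sys/module/i915/parameters/ as described above (module parameters only show up there when registered with non-zero sysfs permissions) and that 1 is the value that enables it, which is an assumption worth checking against the patch. The helper names are hypothetical. After writing the file, the initramfs still has to be regenerated with the distribution's tool (dracut -f on Fedora, mkinitcpio -p <preset> on Arch) so the option applies to the copy of i915 loaded at boot.

```python
import os


def loaded_module_has_param(module: str, param: str, sysfs_root: str = "/sys") -> bool:
    """True if the currently loaded module exposes `param` in sysfs.

    This reflects the running module, not whatever .ko is on disk or in the initramfs.
    """
    return os.path.exists(os.path.join(sysfs_root, "module", module, "parameters", param))


def modprobe_options_line(module: str, **params: object) -> str:
    """Format an /etc/modprobe.d/*.conf line, e.g. 'options i915 enable_hd_vgaarb=1'."""
    opts = " ".join(f"{key}={value}" for key, value in params.items())
    return f"options {module} {opts}"
```

If loaded_module_has_param("i915", "enable_hd_vgaarb") is False after a reboot, the running i915 is not the patched one, regardless of what modinfo reports.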
OK, you were right about the initramfs; I completely forgot about it.
I tried applying your latest patch to my stock kernel sources with no luck; it seems the whole DRM code has received quite a bit of work since.
After that I tried rebuilding the i915 module with i915_314.patch from linux-mainline.tar.gz in the first post, this time rebuilding the initramfs as well, and got corrupted graphics on boot as a result. I could not even see the tty consoles. The good point is that now I can be sure the rebuilt module was loaded :)
I will try to figure out what to do next.
For now, thank you very much for pointing out my lack of understanding of what I am doing.
Last edited by sinny (2014-05-24 19:46:58)
Google for "ACS override", apply the patch you find and enable it via kernel options. If one of the cards can be attached to a PCH root port (00:1c.*), the v3.15 kernel may help.
I just installed the new 3.15-rc6 kernel, which in my understanding contains the ACS override patch.
Nope, 3.15 includes quirks approved by Intel to advertise the ACS-like isolation capabilities of PCH-based root ports and make sure they're enabled. This means that the hardware is actually providing the isolation required. The ACS override patch is a way for the user to override the hardware isolation requirement and is not going upstream.
With 3.14 and the ACS patch I got 3 VFIO groups, one for every card, and I am able to pass them to any VM in any combination I want while running them in parallel. (I didn't know I had to enable the ACS override on the GRUB kernel command line.)
Just for others: my setup is an i7 4770 with an ASRock Z87 Extreme6 and 3x 290X running Arch with the 3.14-1 mainline kernel provided by the OP. It looks like I have 4 PCH root ports. Everything is now running fine and smooth.
Thank you very much !
Hello,
Thanks again for the help so far. I patched in 'i915_314.patch' and recompiled my kernel. I used 3.15-rc6 since the page 1 linux-mainline link appears to direct to 3.15-rc6 anyway (though other posts seem to refute what I'm seeing). This cleared up the graphical issues on my host. Now everything appears to be mostly working. I haven't encountered any issues with my Debian test guest (so far).
There are Windows issues, though. I am able to successfully install Windows 7 SP1 but hit a snag as soon as I attempt to update the guest. I've tried a few different update strategies and noted different behaviors. When I tried installing all 150 or so updates at once, everything appeared fine until Windows attempted to reboot. After the reboot I am greeted with the Windows repair utilities, which are unable to fix the now-broken install. This problem seems to stem from cumulative security updates and other major system updates. Some updates are okay on their own (for example, Internet Explorer 11).
When I try applying just the Radeon driver for my 4770 (via Windows Update), my host system freezes. After restarting my host I am able to reboot the Windows guest *once* with the device software having installed "successfully". Subsequent Windows guest restarts lead to BSODs, however.
Does this sound like a previously discussed issue that I've missed? Maybe an issue related to another .patch from the page 1 download? Or might I have introduced odd behavior by applying i915_314.patch to a kernel newer than 3.14.1?
I found an issue that was possibly related on page 62/66:
"EDIT: Of course, starting over with a fresh Win7 x64 install using ide instead of virtio, it works fine up until I try to install Windows updates, even only a few of them, at which point after reboot it dies with a different BSOD, code c000021a. -sigh-"
Except I do not get a blue screen in all my tests. Fruitless restarts with the repair utilities are more common. I am using ide-hd for my disk though.
I'm not sure what logs might provide useful information related to this issue. I noted some logs in /var/log/libvirt/qemu which seem to be restricted to VMs created with virt-manager. /var/log/libvirt/libvirtd.log doesn't appear to have related information either. Please let me know if anybody has any suggestions. I can't thank the contributors to this thread enough!
shelladept
Switched to the Fedora mainline kernel (3.15-rc5), applied your latest VGA arbiter patch to the i915 module, and rebuilt it; this resulted in the same behavior I got in my previous attempt (3.14 with i915_314.patch). The screen gets corrupted when the X login manager starts (I use SLiM, but I don't think it actually matters). Switching to a tty turns the screen black, and it stays that way whatever I try.
What can I do to investigate this further? Maybe some X settings are needed as well?
Last edited by sinny (2014-05-25 03:19:52)
Did you set the enable_hd_vgaarb i915 option?
AKSN74 wrote:aw wrote:Upgrade to 3.15 and you won't need the ACS override patch. That doesn't mean that the Marvell card is going to work, because it's terribly broken (remember those DMA patches you've applied?)
Hi, aw.
I found that the VM hangs on reboot not only with 2 VMs running together, but also with only 1 VM running.
When I shut it down instead of rebooting, it returns to the tty successfully.
But when I try to start it again with the same command, I get another error message:
qemu-system-x86_64: vfio-pci: Cannot read device rom at 0000:03:00.0
Device option ROM contents are probably invalid (check dmesg).
Skip option ROM probe with rombar=0, or load from file with romfile=
This may be the reason why it hangs on reboot.
But I found that when the guest OS is Windows 7, the VM sometimes reboots successfully.
So this problem is very strange. I'll try the 3.14.4 kernel and see whether the problem persists.
Same problem here. I cannot reboot or reissue the qemu command without this error. Do I need to undo the vfio-bind stuff?
Fixed. I needed to download the ROM file from http://www.techpowerup.com/vgabios/ and then add romfile=/path/to/romfile.rom to the -device line for the GPU.
I also changed -cpu from "host" to "qemu64".
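Since a downloaded ROM can easily be for a different board or be a corrupt dump, it may be worth sanity-checking the file before pointing romfile= at it. A Python sketch based on the standard PCI expansion ROM layout (0x55 0xAA signature, a little-endian pointer at offset 0x18 to a "PCIR" data structure holding the vendor and device IDs); it only inspects the first image in the file, and the function name is hypothetical:

```python
import struct
from typing import Optional, Tuple


def rom_ids(rom: bytes) -> Optional[Tuple[int, int]]:
    """Return (vendor_id, device_id) from the first image of a PCI option ROM, or None.

    Layout: bytes 0-1 are 0x55 0xAA; the little-endian 16-bit word at offset
    0x18 points to the "PCIR" data structure, whose vendor ID is at +4 and
    device ID at +6 (both little-endian 16-bit).
    """
    if len(rom) < 0x1A or rom[:2] != b"\x55\xaa":
        return None
    (pcir,) = struct.unpack_from("<H", rom, 0x18)
    if len(rom) < pcir + 8 or rom[pcir:pcir + 4] != b"PCIR":
        return None
    vendor, device = struct.unpack_from("<HH", rom, pcir + 4)
    return vendor, device
```

Usage: rom_ids(open("/path/to/romfile.rom", "rb").read()), then compare the result with the IDs lspci -n reports for the card (AMD's vendor ID is 0x1002); None or a mismatch suggests the wrong file.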
Did you set the enable_hd_vgaarb i915 option?
Previously, no.
I just tried setting it, with no change in behavior: the display is corrupted when X starts. On the other hand, to see whether it's just the GPU that dies or the whole system, I tried to ssh in, and that succeeded.
The journal is full of lines like these:
...
May 25 08:19:35 localhost.localdomain kernel: dmar: DRHD: handling fault status reg 2
May 25 08:19:35 localhost.localdomain systemd-journal[477]: Missed 43 kernel messages
May 25 08:19:35 localhost.localdomain kernel: dmar: DRHD: handling fault status reg 3
May 25 08:19:35 localhost.localdomain systemd-journal[477]: Missed 44 kernel messages
May 25 08:19:35 localhost.localdomain kernel: dmar: DRHD: handling fault status reg 3
May 25 08:19:35 localhost.localdomain systemd-journal[477]: Missed 44 kernel messages
May 25 08:19:35 localhost.localdomain kernel: dmar: DRHD: handling fault status reg 2
May 25 08:19:35 localhost.localdomain systemd-journal[477]: Missed 44 kernel messages
May 25 08:19:35 localhost.localdomain kernel: dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr 7fcd0000
DMAR:[fault reason 06] PTE Read access is not set
May 25 08:19:35 localhost.localdomain systemd-journal[477]: Missed 59 kernel messages
May 25 08:19:35 localhost.localdomain kernel: dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr 7fce4000
DMAR:[fault reason 06] PTE Read access is not set
May 25 08:19:35 localhost.localdomain systemd-journal[477]: Missed 44 kernel messages
May 25 08:19:35 localhost.localdomain kernel: dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr 7fcf4000
DMAR:[fault reason 06] PTE Read access is not set
May 25 08:19:35 localhost.localdomain systemd-journal[477]: Missed 45 kernel messages
May 25 08:19:35 localhost.localdomain kernel: dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr 7fd03000
DMAR:[fault reason 06] PTE Read access is not set
May 25 08:19:35 localhost.localdomain systemd-journal[477]: Missed 43 kernel messages
May 25 08:19:35 localhost.localdomain kernel: dmar: DRHD: handling fault status reg 2
May 25 08:19:35 localhost.localdomain systemd-journal[477]: Missed 45 kernel messages
May 25 08:19:35 localhost.localdomain kernel: dmar: DRHD: handling fault status reg 3
May 25 08:19:35 localhost.localdomain systemd-journal[477]: Missed 46 kernel messages
May 25 08:19:35 localhost.localdomain kernel: dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr 7fd32000
DMAR:[fault reason 06] PTE Read access is not set
...
Once the screen corruption starts, it just keeps spamming these. I doubt they are meaningful, though; the useful errors were probably at the very beginning.
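Because the flood of repeated faults hides the earliest messages, a small helper for pulling the first DMAR lines out of the kernel log, and for counting faults per requesting device, may help. This is a minimal sketch; the function names are hypothetical, and both functions read log text on stdin, so they can be fed from `journalctl -k -b` or from a saved log.

```sh
#!/bin/sh
# Hypothetical helpers for triaging DMAR (Intel VT-d) fault floods.

# first_dmar_lines [N]: print the first N (default 20) lines mentioning
# DMAR (case-insensitive) from the log text on stdin.
first_dmar_lines() {
  grep -i -m "${1:-20}" 'dmar'
}

# dmar_faults_by_device: count "Request device [BB:DD.F]" entries on stdin
# and print one "BB:DD.F COUNT" line per device, sorted by address.
dmar_faults_by_device() {
  grep -o 'Request device \[[0-9a-f:.]*]' \
    | sed 's/.*\[\(.*\)]/\1/' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'
}
```

For example, `journalctl -k -b | first_dmar_lines 40` or `journalctl -k -b | dmar_faults_by_device`. In the excerpt above, every fault names device 00:02.0, which is where Intel integrated graphics normally sits.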
Offline
So vfio comes along and tries to lock VGA resources on the other card and the arbiter skips disabling VGA on the radeon bridge because the device doesn't decode VGA. The patch below gratuitously disables VGA on any potentially conflicting bridge on every vga_get, which makes VGA access even slower (not that we care so much since VGA stops being used shortly after the guest OS starts running). I'll probably rework the code more for upstream to allow devices to own resources they don't decode rather than just lock them. Enjoy.
This seems like something a headless host (like mine) would benefit from. Do you plan to push this, or similar behaviour, to the official kernel with a kernel option to switch it on? Ideally I would like to be able to use two GPUs, each assigned to one VM (I have enough resources to run two VMs concurrently, just not enough slots for more GPUs), while the host runs headless.
Thanks for your work on passthrough, and please let us know when this gets into an official RH release (I will probably use Fedora until then).
Offline
As you can see, Function Level Reset is not supported (FLReset-). I didn’t check this on my old card (which worked perfectly fine, including reboots of the virtual machine).
D3hot should be supported, but I haven’t yet found a way to convince the kernel to put the card into this state, and I guess it wouldn’t help anyway, since the card reports NoSoftRst+.
It looks like no AMD Radeon R9 290X supports any kind of reset anymore. You are not alone; I have the same problem.
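Both flags discussed above come from `lspci -vv`, which needs root to show the capability details. A minimal sketch, with a hypothetical function name, that summarizes them from lspci text on stdin. It assumes that NoSoftRst- is the condition for a usable D3hot-to-D0 power-management reset, which matches the reasoning in the post above.

```sh
#!/bin/sh
# Hypothetical helper: summarize reset-related capabilities from `lspci -vv`
# text on stdin.
#   FLReset+   -> the device advertises PCIe Function Level Reset
#   NoSoftRst- -> a D3hot/D0 power-management transition resets the device
reset_summary() {
  local caps
  caps=$(cat)
  if printf '%s\n' "$caps" | grep -q 'FLReset+'; then
    echo "FLR: yes"
  else
    echo "FLR: no"
  fi
  if printf '%s\n' "$caps" | grep -q 'NoSoftRst-'; then
    echo "PM reset: yes"
  else
    echo "PM reset: no"
  fi
}
```

For example, `sudo lspci -vvs 01:00.0 | reset_summary`. A card reporting `FLR: no` and `PM reset: no` leaves a secondary bus reset on its upstream bridge as the remaining option.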
Offline
That patch isn't effective, don't bother with it. This one seems to work for me, but it's more heavyweight than I'd like. The problem sequence is that Xorg tries to lock VGA resources, which triggers some first-access callbacks. The callback in the radeon code tells the arbiter that the device no longer decodes VGA resources. Now, the arbiter still lets a process lock resources that the device doesn't decode, but it loses track of them when they're released. So vfio comes along and tries to lock VGA resources on the other card, and the arbiter skips disabling VGA on the radeon bridge because the device doesn't decode VGA. The patch below gratuitously disables VGA on any potentially conflicting bridge on every vga_get, which makes VGA access even slower (not that we care so much, since VGA stops being used shortly after the guest OS starts running). I'll probably rework the code more for upstream to allow devices to own resources they don't decode rather than just lock them. Enjoy.
--- a/drivers/gpu/vga/vgaarb.c
+++ b/drivers/gpu/vga/vgaarb.c
@@ -244,8 +244,12 @@ static struct vga_device *__vga_tryget(struct vga_device *v
 		 */
 		WARN_ON(conflict->owns & ~conflict->decodes);
 		match = lwants & conflict->owns;
-		if (!match)
+		if (!match) {
+			if (change_bridge)
+				pci_set_vga_state(conflict->pdev, false, 0,
+						  PCI_VGA_STATE_CHANGE_BRIDGE);
 			continue;
+		}
 
 		/* looks like he doesn't have a lock, we can steal
 		 * them from him
dropbox link since the forum will break formatting - https://dl.dropboxusercontent.com/u/198 … able.patch
Alex, this patch works fine. I will apply it until a better solution is upstream. So the only problem left is reset, which is well known for not working reliably on AMD GPUs.
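The vgaarb fix above comes down to the VGA Enable bit (bit 3, mask 0x08) in a bridge's Bridge Control register at config offset 0x3e; pci_set_vga_state() with PCI_VGA_STATE_CHANGE_BRIDGE sets or clears that bit on the bridges above a device. A minimal sketch for inspecting the bit by hand, with hypothetical function names, to check whether the radeon's bridge still forwards VGA after vfio has taken the VGA lock on the other card:

```sh
#!/bin/sh
# Hypothetical helpers: check the VGA Enable bit (0x08) of a PCI bridge's
# Bridge Control register (config offset 0x3e, 16 bits wide).

# bctl_vga_enabled HEX: succeed if VGA Enable is set in the 16-bit value HEX
bctl_vga_enabled() {
  [ $(( 0x$1 & 0x08 )) -ne 0 ]
}

# show_bridge_vga BRIDGE...: report VGA Enable for each bridge (needs root)
show_bridge_vga() {
  local bridge bctl
  for bridge in "$@"; do
    bctl=$(setpci -s "$bridge" 3e.w) || return 1
    if bctl_vga_enabled "$bctl"; then
      echo "$bridge: VGA Enable set (bridge control $bctl)"
    else
      echo "$bridge: VGA Enable clear (bridge control $bctl)"
    fi
  done
}
```

Run `show_bridge_vga` as root with the upstream bridges of both cards, as shown by `lspci -tv`. With the patch applied, you would expect at most one of them to report VGA Enable set at a time.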
Offline
Alex, this patch works fine. I will apply it until a better solution is upstream. So the only problem left is reset, which is well known for not working reliably on AMD GPUs.
diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 76715cc..bdb6126 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -3350,7 +3350,7 @@ static void vfio_pci_reset_handler(void *opaque)
     QLIST_FOREACH(group, &group_list, next) {
         QLIST_FOREACH(vdev, &group->device_list, next) {
-            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
+            if (!vdev->reset_works || !vdev->has_flr) {
                 vdev->needs_reset = true;
             }
         }
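The patch only changes which devices get `needs_reset` set in `vfio_pci_reset_handler`. To see exactly which devices that affects, here is a small sketch that evaluates the old and new conditions as shell arithmetic over every combination of the three flags. The function names are hypothetical; the arguments mirror the `reset_works`, `has_flr` and `has_pm_reset` fields in the diff.

```sh
#!/bin/sh
# Old and new predicates from the diff, with 1 = true and 0 = false.
# Each function succeeds (exit 0) when the device would be marked needs_reset.
needs_reset_old() { # reset_works has_flr has_pm_reset
  [ $(( !$1 || (!$2 && $3) )) -eq 1 ]
}
needs_reset_new() { # reset_works has_flr
  [ $(( !$1 || !$2 )) -eq 1 ]
}

# reset_table: print every flag combination and mark rows the patch changes.
reset_table() {
  local works flr pm old new note
  for works in 0 1; do
    for flr in 0 1; do
      for pm in 0 1; do
        needs_reset_old "$works" "$flr" "$pm" && old=1 || old=0
        needs_reset_new "$works" "$flr" && new=1 || new=0
        [ "$old" -ne "$new" ] && note="  <- changed" || note=""
        echo "reset_works=$works has_flr=$flr has_pm_reset=$pm old=$old new=$new$note"
      done
    done
  done
}
```

Running `reset_table` shows that only the row with reset_works=1, has_flr=0, has_pm_reset=0 changes (old=0, new=1): a device whose reset works but that has neither FLR nor a usable PM reset, such as a card reporting FLReset- and NoSoftRst+.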
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Offline
Alex, I already tried it when you posted it a couple of weeks ago on qemu-devel, but it doesn't help on my R9 290X. The Windows VM still gets a BSOD on reboot. A Linux VM with fglrx gets a kernel oops once X is started. The only thing that works is a Linux VM with the radeon driver, but this is still a more or less useless test, as Hawaii XT is completely unaccelerated in X for now due to stability concerns.
Last edited by mbroemme (2014-05-25 18:41:18)
Offline
I guess I keep posting it because I don't know why it doesn't work. I added an HD8570 to my collection, since it seems to be based on the same generation of chips as the latest cards, but it works well. Can you manually test whether the card is affected by a bus reset? Try getting the device into a state with something on the screen, e.g. quit QEMU while the device is at SeaBIOS. Find the bridge upstream from the device, e.g.:
$ lspci -tv | grep AMD
+-01.0-[01]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240 OEM]
| \-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
00:01.0 is the upstream bridge.
Read the bridge control register:
$ sudo setpci -s 00:01.0 3e.w
0018
OR 0x40 (the Secondary Bus Reset bit) into the result:
$ printf %04x $(( 0x0018 | 0x40 ))
0058
Write this to the same register:
$ sudo setpci -s 00:01.0 3e.w=0058
Then restore the original value:
$ sudo setpci -s 00:01.0 3e.w=0018
If the bus reset works, then the screen should have cleared and the monitor gone out of sync. If nothing happened, then I don't know how we can reset the device. I think one of my old GeForce cards is also impervious to bus resets.
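The manual steps above can be wrapped into a small script. This is a sketch under a few assumptions: the function names are hypothetical, the upstream bridge is found by following the device's sysfs link instead of reading `lspci -tv`, and the one-second pause between asserting and clearing the reset is a guess (the manual steps use no explicit delay). It resets every function behind the bridge, so run it only with QEMU stopped.

```sh
#!/bin/sh
# Hypothetical helpers automating the manual bus-reset test.

# sbr_value HEX: Bridge Control value HEX with Secondary Bus Reset (0x40) set
sbr_value() {
  printf '%04x\n' $(( 0x$1 | 0x40 ))
}

# upstream_bridge BDF: print the parent bridge of a device using sysfs.
# BDF includes the domain, e.g. 0000:01:00.0. SYSFS overrides /sys for testing.
upstream_bridge() {
  local devpath parent
  devpath="${SYSFS:-/sys}/bus/pci/devices/$1"
  [ -e "$devpath" ] || return 1
  parent=$(basename "$(dirname "$(readlink -f "$devpath")")")
  case $parent in
    pci*) return 1 ;;  # device sits directly on a root bus: no bridge to reset
  esac
  echo "$parent"
}

# bus_reset_test BDF: pulse Secondary Bus Reset on BDF's upstream bridge (root).
bus_reset_test() {
  local bridge orig
  bridge=$(upstream_bridge "$1") || return 1
  orig=$(setpci -s "$bridge" 3e.w) || return 1
  setpci -s "$bridge" 3e.w="$(sbr_value "$orig")"
  sleep 1  # assumption: hold reset briefly before restoring the register
  setpci -s "$bridge" 3e.w="$orig"
}
```

For example, `bus_reset_test 0000:01:00.0` as root; for the tree above, that toggles bridge 0000:00:01.0.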
Last edited by aw (2014-05-25 18:58:50)
Offline
Did you set the enable_hd_vgaarb i915 option?
How do you achieve this? I have applied the i915 patch to the 3.15-rc6 branch here (I needed it for my Marvell controller):
https://github.com/awilliam/linux-vfio release dma-alias-v4
And on my system I have:
for i in /sys/module/i915/parameters/*; do echo $i=$(cat $i); done
/sys/module/i915/parameters/disable_display=N
/sys/module/i915/parameters/disable_power_well=1
/sys/module/i915/parameters/enable_cmd_parser=0
/sys/module/i915/parameters/enable_fbc=-1
/sys/module/i915/parameters/enable_hangcheck=Y
/sys/module/i915/parameters/enable_hd_vgaarb=N
/sys/module/i915/parameters/enable_ips=1
/sys/module/i915/parameters/enable_ppgtt=1
/sys/module/i915/parameters/enable_psr=0
/sys/module/i915/parameters/enable_rc6=-1
/sys/module/i915/parameters/fastboot=N
/sys/module/i915/parameters/invert_brightness=0
/sys/module/i915/parameters/lvds_channel_mode=0
/sys/module/i915/parameters/lvds_downclock=0
/sys/module/i915/parameters/lvds_use_ssc=-1
/sys/module/i915/parameters/modeset=-1
/sys/module/i915/parameters/panel_ignore_lid=1
/sys/module/i915/parameters/powersave=1
/sys/module/i915/parameters/prefault_disable=N
/sys/module/i915/parameters/preliminary_hw_support=0
/sys/module/i915/parameters/reset=Y
/sys/module/i915/parameters/semaphores=-1
/sys/module/i915/parameters/vbt_sdvo_panel_type=-1
Also, I am reading about FLR, which I am sure my R9 270X does not support:
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
Is this a dead end for me? I want to know if I should swap to one of my NVIDIA cards before trying any more troubleshooting.
Right now, when running:
qemu-system-x86_64 -enable-kvm -M q35 -m 1024 -cpu qemu64 -smp 6,sockets=1,cores=6,threads=1 -bios /usr/share/qemu/bios.bin -vga none \
-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
-device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
-device vfio-pci,host=01:00.1,bus=root.1,addr=00.1 -vnc :1
I get an output on my monitor but it is just a black screen.
Thanks in advance!
Offline
aw wrote:Did you set the enable_hd_vgaarb i915 option?
How do you achieve this? I have applied the i915 patch to the 3.15-rc6 branch here (I needed it for my Marvell controller):
https://github.com/awilliam/linux-vfio release dma-alias-v4
And on my system I have:
for i in /sys/module/i915/parameters/*; do echo $i=$(cat $i); done ... /sys/module/i915/parameters/enable_hd_vgaarb=N ...
Boot with i915.enable_hd_vgaarb=1, or add "options i915 enable_hd_vgaarb=1" to a file in /etc/modprobe.d and rebuild your initramfs.
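A minimal sketch of the modprobe.d route plus a check that the option took effect. The file name `i915-vgaarb.conf` and the function names are hypothetical, and the usage below assumes an Arch-style initramfs built by mkinitcpio; a custom kernel will have its own preset instead of `linux`.

```sh
#!/bin/sh
# Hypothetical helpers for enabling and verifying i915.enable_hd_vgaarb.

# write_i915_vgaarb_conf [DIR]: write the module option into DIR
# (default /etc/modprobe.d; writing there needs root).
write_i915_vgaarb_conf() {
  printf 'options i915 enable_hd_vgaarb=1\n' > "${1:-/etc/modprobe.d}/i915-vgaarb.conf"
}

# i915_vgaarb_enabled [PARAMDIR]: succeed if the loaded i915 module reports
# enable_hd_vgaarb=Y (default PARAMDIR /sys/module/i915/parameters).
i915_vgaarb_enabled() {
  [ "$(cat "${1:-/sys/module/i915/parameters}/enable_hd_vgaarb" 2>/dev/null)" = "Y" ]
}
```

As root: `write_i915_vgaarb_conf && mkinitcpio -p linux`, reboot, then `i915_vgaarb_enabled && echo enabled`. The parameter listing above shows `enable_hd_vgaarb=N`, so after a successful change it should read `Y`.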
Also, I am reading about FLR, which I am sure my R9 270X does not support:
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
Is this a dead end for me? I want to know if I should swap to one of my NVIDIA cards before trying any more troubleshooting.
No graphics cards support FLR. The question is whether the newer AMD cards respond to bus reset or not. See previous post to test.
Right now, when running:
qemu-system-x86_64 -enable-kvm -M q35 -m 1024 -cpu qemu64 -smp 6,sockets=1,cores=6,threads=1 -bios /usr/share/qemu/bios.bin -vga none \
-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
-device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
-device vfio-pci,host=01:00.1,bus=root.1,addr=00.1 -vnc :1
I get an output on my monitor but it is just a black screen.
You won't get any output until i915 VGA arbiter support is enabled.
Offline
--- a/drivers/gpu/vga/vgaarb.c
+++ b/drivers/gpu/vga/vgaarb.c
@@ -244,8 +244,12 @@ static struct vga_device *__vga_tryget(struct vga_device *v
 		 */
 		WARN_ON(conflict->owns & ~conflict->decodes);
 		match = lwants & conflict->owns;
-		if (!match)
+		if (!match) {
+			if (change_bridge)
+				pci_set_vga_state(conflict->pdev, false, 0,
+						  PCI_VGA_STATE_CHANGE_BRIDGE);
 			continue;
+		}
 
 		/* looks like he doesn't have a lock, we can steal
 		 * them from him
dropbox link since the forum will break formatting - https://dl.dropboxusercontent.com/u/198 … able.patch
Here's the version proposed upstream https://lkml.org/lkml/2014/5/25/94
Offline