When I set intel_iommu=on I get a nasty bug (less than 5 seconds after starting, the screen stalls and fills with code-like garbage)
and I have no idea what I can do to get this working.
If you can't enable the IOMMU, you can't do PCI device assignment, end of story. If you provide us a snapshot of some of the errors you're seeing when you enable it, maybe we can help. You might also try a newer kernel to see if it can be enabled there, you haven't specified your host environment.
EDIT: if that screenshot is what happens when you try to boot with intel_iommu=on, try intel_iommu=on,igfx_off
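For reference, a rough sketch of what that looks like on a GRUB-based host (the file path and the other flags here are only the common defaults, adjust for your distro and bootloader):

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on,igfx_off"
# then regenerate the config and reboot
grub-mkconfig -o /boot/grub/grub.cfg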
Last edited by aw (2014-11-04 18:53:30)
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Offline
edit: too late
Last edited by slis (2014-11-04 18:46:08)
Offline
Hello again !
I managed to get VGA passthrough working on another machine.
It's a desktop with a more modern configuration, but the GPU is quite old.
CPU: i5-4670T (Haswell family)
GPU: Nvidia 9500 GS
It's working fine; now I'm going to do some tuning, but I have some questions first.
Question 1: While I could do VGA-PT on my laptop without applying any patches (in particular the vga-arbiter patch), I can't with this new config (black screen). From what I understood of Alex Williamson's article (What's the deal with VGA arbitration ?), this is because my brand new CPU's IGP is too recent?
Alex Williamson wrote:The native IGD driver would really like to continue running when VGA routing is directed elsewhere, so the PCI method of disabling access is sub-optimal. At some point Intel knew this and included mechanisms in the device that allowed VGA MMIO and I/O port space to be disabled separately from PCI MMIO and I/O port space. These mechanisms are what the i915 driver uses today when it opts-out of VGA arbitration.
The problem is that modern IGD devices don't implement the same mechanisms for disabling legacy VGA. The i915 driver continues to twiddle the old bits and opt-out of arbitration, but the IGD VGA is still claiming VGA transactions. This is why many users who claim they don't need the i915 patch finish their proclamation of success with a note about the VGA color palate being corrupted or noting some strange drm interrupt messages on the host dmesg. [...]
So why can't we simply fix the i915 driver to correctly disable legacy VGA support? Well, in their infinite wisdom, hardware designers have removed (or broken) that feature of IGD.
My laptop's CPU is an Intel 2nd-gen; I have some color problems whenever I shut down the VM, but this doesn't bother me much. So I guess the feature is still there and not broken on that one, but it is broken on my Intel 4th-gen CPU, right?
The only IGDs that work with VGA arbitration are the ones where the GPU is on the chipset, not the CPU. If you saw screen corruption on your laptop, even if it didn't bother you, it's the same problem. It's probably more a matter of the GPU being assigned and whether the guest drivers are able to initialize it despite not having VGA access to it.
Question 2: My laptop's configuration needs a vbios ROM in order to work, but that's not the case for my desktop's GPU. In fact, whenever I specify an extracted vbios it won't work: black screen (I extracted it myself from a live-cd, strictly following the same procedure as for my laptop's vbios). My first thought was that the GPU is a bit too old, but I think that's a stupid idea. I used rom-parser to check the vbios and it says it's fine, reporting correct vendor IDs. Where could this come from? Any suggestions? It really puzzles me!
Was the GPU the primary graphics device when you extracted the BIOS? PCs will load the primary graphics BIOS into a different region and it may be modified at runtime. If that's the copy you extracted, it may not work correctly. If QEMU/VFIO is able to get the correct ROM from the card, why do you care?
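If you do want to re-extract it for comparison, a rough sketch of the usual sysfs method, assuming the card sits at 0000:01:00.0 (an example address) and is not driving the host display at the time:

cd /sys/bus/pci/devices/0000:01:00.0
echo 1 > rom                # enable reading the option ROM
cat rom > /tmp/vbios.rom
echo 0 > rom                # disable it again
rom-parser /tmp/vbios.rom   # sanity check the result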
Question 3: Now that I'm beginning to understand PCI passthrough in qemu, should I try to write a tutorial for French speakers? Isn't it a bit risky, since I don't have deep knowledge of the internal mechanics and couldn't properly troubleshoot specific issues?
Those of us who don't speak French won't be able to identify any mistakes, but more helpful documentation is always a good thing.
Last edited by aw (2014-11-04 18:47:38)
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Offline
[sorry, meant to edit, not re-post]
Last edited by aw (2014-11-04 18:47:10)
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Offline
You are missing intel_iommu=on
edit: forgot intel_
When I enable intel_iommu=on I get the result shown in the image.
*edit*
Tried with intel_iommu=on,igfx_off:
exactly the same.
*edit2*
I'm recompiling the git kernel with some optimizations disabled to see how it does.
I'll post here if I run into problems.
*final edit*
Got it all working with less optimized code; it was GCC's fault.
Last edited by Uramekus (2014-11-04 19:51:28)
Offline
mouton_sanglant wrote:Hello again !
I managed to get VGA passthrough working on another machine.
It's a desktop with a more modern configuration, but the GPU is quite old.
CPU: i5-4670T (Haswell family)
GPU: Nvidia 9500 GS
It's working fine; now I'm going to do some tuning, but I have some questions first.
Question 1: While I could do VGA-PT on my laptop without applying any patches (in particular the vga-arbiter patch), I can't with this new config (black screen). From what I understood of Alex Williamson's article (What's the deal with VGA arbitration ?), this is because my brand new CPU's IGP is too recent?
Alex Williamson wrote:The native IGD driver would really like to continue running when VGA routing is directed elsewhere, so the PCI method of disabling access is sub-optimal. At some point Intel knew this and included mechanisms in the device that allowed VGA MMIO and I/O port space to be disabled separately from PCI MMIO and I/O port space. These mechanisms are what the i915 driver uses today when it opts-out of VGA arbitration.
The problem is that modern IGD devices don't implement the same mechanisms for disabling legacy VGA. The i915 driver continues to twiddle the old bits and opt-out of arbitration, but the IGD VGA is still claiming VGA transactions. This is why many users who claim they don't need the i915 patch finish their proclamation of success with a note about the VGA color palate being corrupted or noting some strange drm interrupt messages on the host dmesg. [...]
So why can't we simply fix the i915 driver to correctly disable legacy VGA support? Well, in their infinite wisdom, hardware designers have removed (or broken) that feature of IGD.
My laptop's CPU is an Intel 2nd-gen; I have some color problems whenever I shut down the VM, but this doesn't bother me much. So I guess the feature is still there and not broken on that one, but it is broken on my Intel 4th-gen CPU, right?
The only IGDs that work with VGA arbitration are the ones where the GPU is on the chipset, not the CPU. If you saw screen corruption on your laptop, even if it didn't bother you, it's the same problem. It's probably more a matter of the GPU being assigned and whether the guest drivers are able to initialize it despite not having VGA access to it.
Not sure I understand what that implies. Should I care about it even if it works perfectly fine (even while gaming)?
Question 2: My laptop's configuration needs a vbios ROM in order to work, but that's not the case for my desktop's GPU. In fact, whenever I specify an extracted vbios it won't work: black screen (I extracted it myself from a live-cd, strictly following the same procedure as for my laptop's vbios). My first thought was that the GPU is a bit too old, but I think that's a stupid idea. I used rom-parser to check the vbios and it says it's fine, reporting correct vendor IDs. Where could this come from? Any suggestions? It really puzzles me!
Was the GPU the primary graphics device when you extracted the BIOS? PCs will load the primary graphics BIOS into a different region and it may be modified at runtime. If that's the copy you extracted, it may not work correctly. If QEMU/VFIO is able to get the correct ROM from the card, why do you care?
I think it was; the (motherboard) BIOS has an option to choose the main graphics device.
Anyway, I don't need it. I was just curious.
Question 3: Now that I'm beginning to understand PCI passthrough in qemu, should I try to write a tutorial for French speakers? Isn't it a bit risky, since I don't have deep knowledge of the internal mechanics and couldn't properly troubleshoot specific issues?
Those of us who don't speak French won't be able to identify any mistakes, but more helpful documentation is always a good thing.
I'll do it.
One extra question: I've seen videos of many people managing to display the VM's output in a window on the host screen. How can I do that?
I tried with xfreerdp but couldn't get it to work. RemoteFX doesn't seem to be what I'm looking for. I noticed VNC has been mentioned many times in the previous posts, but a friend told me never to use VNC because it's total crap (slow and not free). I've never used it, but is it the right way to get the GPU output onto my host? Is it time for me to form my own opinion of VNC?
Offline
aw wrote:mouton_sanglant wrote:Hello again !
I managed to get VGA passthrough working on another machine.
It's a desktop with a more modern configuration, but the GPU is quite old.
CPU: i5-4670T (Haswell family)
GPU: Nvidia 9500 GS
It's working fine; now I'm going to do some tuning, but I have some questions first.
Question 1: While I could do VGA-PT on my laptop without applying any patches (in particular the vga-arbiter patch), I can't with this new config (black screen). From what I understood of Alex Williamson's article (What's the deal with VGA arbitration ?), this is because my brand new CPU's IGP is too recent?
My laptop's CPU is an Intel 2nd-gen; I have some color problems whenever I shut down the VM, but this doesn't bother me much. So I guess the feature is still there and not broken on that one, but it is broken on my Intel 4th-gen CPU, right?
The only IGDs that work with VGA arbitration are the ones where the GPU is on the chipset, not the CPU. If you saw screen corruption on your laptop, even if it didn't bother you, it's the same problem. It's probably more a matter of the GPU being assigned and whether the guest drivers are able to initialize it despite not having VGA access to it.
Not sure I understand what that implies. Should I care about it even if it works perfectly fine (even while gaming)?
Do you care that your VM is doing reads and writes to the host GPU in ways that can corrupt its state? I do. I'd expect you should care too. Besides, without VGA going to the correct device, why even bother with x-vga=on, you might as well assign the GPU as a secondary device, or at least disable VGA access to it.
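For comparison, a rough sketch of the two ways to hand the card to the guest; the 01:00.0/01:00.1 addresses are only examples:

# VGA-routed primary assignment, which is what actually needs working VGA arbitration:
-device vfio-pci,host=01:00.0,multifunction=on,x-vga=on -device vfio-pci,host=01:00.1 -vga none
# secondary assignment, no VGA routing involved, the guest keeps an emulated primary display:
-device vfio-pci,host=01:00.0,multifunction=on -device vfio-pci,host=01:00.1 -vga std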
One extra question: I've seen videos of many people managing to display the VM's output in a window on the host screen. How can I do that?
I tried with xfreerdp but couldn't get it to work. RemoteFX doesn't seem to be what I'm looking for. I noticed VNC has been mentioned many times in the previous posts, but a friend told me never to use VNC because it's total crap (slow and not free). I've never used it, but is it the right way to get the GPU output onto my host? Is it time for me to form my own opinion of VNC?
It's usually done through VNC but I think the videos make it look far more functional than it really is. Conceptually it's the same as trying to play a game from a remote system over VNC, the only difference for the VM is that you can get a high performance local network connection between the host and guest. VNC also consumes quite a bit of CPU to do the encoding and decoding, so you need extra cores for that. I think that most people here are using a 2nd monitor for the assigned GPU output or switching inputs on their monitor.
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Offline
OK, so I will apply the vga-arb patch once I get this remote control working.
I don't really care about the extra CPU cycles because I won't use it often (once or twice a month). I just added "-vnc 0.0.0.0:1,mypasswd" to my qemu command, but when I connect with a client, the qemu console shows up. So I guess I can't go that way and I must install a VNC server inside the guest. Right?
Offline
OK, so I will apply the vga-arb patch once I get this remote control working.
I don't really care about the extra CPU cycles because I won't use it often (once or twice a month). I just added "-vnc 0.0.0.0:1,mypasswd" to my qemu command, but when I connect with a client, the qemu console shows up. So I guess I can't go that way and I must install a VNC server inside the guest. Right?
Yes, -vnc only works with emulated graphics devices.
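In other words, something along these lines (the options are real, the address and display number are only examples):

# host side: this exposes QEMU's emulated display over VNC, not the assigned GPU
qemu-system-x86_64 ... -vga std -vnc 127.0.0.1:1
# for the assigned GPU's output you run a VNC (or RDP) server inside the guest instead
# and connect to it over the virtual NIC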
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Offline
Out of curiosity, why is the linux-mainline-3.17.2.tar.gz (3.17.2 + ACS override patch + i915 VGA arbiter fixes) from the first post not in the AUR?
Offline
Out of curiosity, why is the linux-mainline-3.17.2.tar.gz (3.17.2 + ACS override patch + i915 VGA arbiter fixes) from the first post not in the AUR?
You wanna maintain it? Go ahead.
Offline
I'm passing two GTX 780 Ti cards through to one Win7 guest simultaneously, and they now work perfectly in the guest system (with the latest drivers), though without SLI.
To try and get SLI working I tried a few different configurations, with and without ioh3420 in between, but in every case the NVidia control panel does not offer SLI settings.
Since the cards run perfectly fine in the VM as two single cards side by side, I guess it's not the driver disabling SLI on purpose due to VM detection, but rather a missing requirement in the emulated hardware.
Since real hardware requires an SLI-enabled mainboard, and possibly SLI enabled in the BIOS or through jumper settings, does anyone know what happens there on real hardware and whether any of that is implemented in qemu/seabios?
Offline
I'm passing two GTX 780 Ti cards through to one Win7 guest simultaneously, and they now work perfectly in the guest system (with the latest drivers), though without SLI.
To try and get SLI working I tried a few different configurations, with and without ioh3420 in between, but in every case the NVidia control panel does not offer SLI settings.
Since the cards run perfectly fine in the VM as two single cards side by side, I guess it's not the driver disabling SLI on purpose due to VM detection, but rather a missing requirement in the emulated hardware.
Since real hardware requires an SLI-enabled mainboard, and possibly SLI enabled in the BIOS or through jumper settings, does anyone know what happens there on real hardware and whether any of that is implemented in qemu/seabios?
From the wikipedia page:
Not all motherboards with multiple PCI-Express x16 slots support SLI. Recent motherboards as of August 2014 that support it are Intel's Z and X series chipsets (Z68, Z78, Z87, Z97, and X79) and AMD's 990FX chipset. Aside from a few exceptions, older motherboards needed the nForce series of chipsets to allow for SLI.
Sounds like it's pretty tied to specific hardware, which QEMU does not implement.
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Offline
From the wikipedia page:
Not all motherboards with multiple PCI-Express x16 slots support SLI. Recent motherboards as of August 2014 that support it are Intel's Z and X series chipsets (Z68, Z78, Z87, Z97, and X79) and AMD's 990FX chipset. Aside from a few exceptions, older motherboards needed the nForce series of chipsets to allow for SLI.
Sounds like it's pretty tied to specific hardware, which QEMU does not implement.
Hmm, interesting. I wonder if the chipset actually has to do any additional work since there was a hypervisor which faked it: http://www.techpowerup.com/forums/threa … ds.153046/
Also, since Duelist got Crossfire working, I'm rather optimistic that a whole new emulated chipset implementation isn't required for SLI.
Last edited by mmm (2014-11-06 16:34:04)
Offline
aw wrote:From the wikipedia page:
Not all motherboards with multiple PCI-Express x16 slots support SLI. Recent motherboards as of August 2014 that support it are Intel's Z and X series chipsets (Z68, Z78, Z87, Z97, and X79) and AMD's 990FX chipset. Aside from a few exceptions, older motherboards needed the nForce series of chipsets to allow for SLI.
Sounds like it's pretty tied to specific hardware, which QEMU does not implement.
Hmm, interesting. I wonder if the chipset actually has to do any additional work since there was a hypervisor which faked it: http://www.techpowerup.com/forums/threa … ds.153046/
Also, since Duelist got Crossfire working, I'm rather optimistic that a whole new emulated chipset implementation isn't required for SLI.
Ahem!
First, I'm dumb. Do not forget that.
Second, there is a clarification note that I am writing now.
Third, my crossfire system DOES NOT HAVE any external bridges; it works solely through PCI-E. It is XDMA, which is very different from NVidia's SLI. I even have to enable the IOMMU in the BIOS to get crossfire working under Linux.
Fourth, NVidia intends to break everything at will. Too many software blocks. You've seen Linus Torvalds' reaction to NVidia? I agree with him.
Last edited by Duelist (2014-11-06 18:38:15)
The forum rules prohibit requesting support for distributions other than arch.
I gave up. It was too late.
What I was trying to do.
The reference about VFIO and KVM VGA passthrough.
Offline
Awesome! Now I understand what is happening with my VM and why it crashes. But I'm still not getting why it sometimes doesn't.
Preface: I do not know anything about Linux memory management or x86 memory management, and I DO NOT know how the IOMMU works, so there MAY be huge misunderstandings and errors. Do not rely on my brain and train of thought.
A short story about how fat Firefox saved the day multiple times and yet managed to show me webpages with the power of Google.
I have the IOMMU enabled, but working in its default noforce mode, without iommu=pt.
I thought it was enough just to get the IOMMU working and it would handle things itself.
So, I have rather sparse system RAM ranges in /proc/iomem: 256k here, 256m there, 1g over there, etc., and a huge chunk (~4-6g) at the end, 0x100001000-0x24EFFFFFF.
Without -realtime mlock=on, the VM gets a memory range that can be paged and moved around by the kernel. PCI device drivers in the guest, however, do not expect this to happen: they do DMA to some address, that DMA gets translated by the IOMMU, it ends up outside the VM's memory range, and BAM: IO_PAGE_FAULT, VM crash, and sometimes the host shuts down due to the resulting memory corruption or something. (Previously I thought there was an interrupt storm happening, but MSI gets enabled normally, and watch -n1 "cat /proc/interrupts" until the host shutdown showed me nothing unusual.)
So we have problems due to memory paging and moving, and the IOMMU not translating everything right. I pass -realtime mlock=on and QEMU mlocks itself SOME, maybe even MULTIPLE, chunks of memory somewhere. When I run QEMU right after system startup, very little memory is consumed and QEMU is one of the first processes launched, so it gets memory near zero, where it is very sparse. A guest PCI device driver does DMA to some address, the IOMMU remaps it again, and we have a page fault again.
BUT EVENTUALLY it starts working, once ~800MB of memory is consumed or some time has passed. That criterion was met at my first QEMU startup: it mlocked memory in a range where it could get large, contiguous chunks, and the guest doing DMA was fine. It can break if the kernel gives QEMU sparse enough memory to mlock, and then we get page faults again.
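By the way, here's roughly how I check whether QEMU actually mlocked its memory; the flag and the VmLck field are real, the memory size and the PID lookup are just examples:

qemu-system-x86_64 -m 8192 -realtime mlock=on ... &
grep VmLck /proc/$(pidof qemu-system-x86_64)/status   # should be roughly the full VM size if the mlock worked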
So, as I see it, the problem hides in the IOMMU: it does redirect some DMA. Then I tried reading the Linux kernel parameters documentation, and there is no explanation given at all:
iommu= [x86]
off
force
noforce
biomerge
panic
nopanic
merge
nomerge
forcesac
soft
pt [x86, IA-64]
Yeah. What do any of these mean?
So I went to Google and found this archived LKML message. Dated 2007. Better than nothing, but never accepted, and there is no clarification of the iommu=pt option, since that option didn't exist back then.
Awesome.
Maybe boot-options will help?
Yay, some clarification, but again no iommu=pt option. Yeah, I could've used the nice old iommu=soft, but something tells me that would be slow.
Then I went to Google again and found a piece of kernel source code here with a nice comment:
/*
 * This variable becomes 1 if iommu=pt is passed on the kernel command line.
 * If this variable is 1, IOMMU implementations do no DMA translation for
 * devices and allow every device to access to whole physical memory. This is
 * useful if a user wants to use an IOMMU only for KVM device assignment to
 * guests and not for driver dma translation.
 */
Hmm... That seems to FIT MY USE EXACTLY!
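So, roughly, the whole fix boils down to one extra kernel parameter; a sketch for a GRUB setup, paths as usual, adjust as needed:

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="... iommu=pt"
grub-mkconfig -o /boot/grub/grub.cfg   # then reboot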
But that's not the end. Things are getting interesting.
Let's see what it looks like when the VM is working right.
After a brief read of this...
We start qemu and give it some time until Windows boots and starts the GPU drivers.
Then we do ps -ef | grep qemu to determine its PID.
As root: cat /proc/QEMU-PID/maps | grep vfio
We have six ranges of memory mapped, looking like this:
7fc908000000-7fc918000000 rw-s b0000000 00:09 8529 anon_inode:[vfio-device]
7fc918000000-7fc928000000 rw-s c0000000 00:09 8529 anon_inode:[vfio-device]
7fca63b08000-7fca63b48000 rw-s fe200000 00:09 8529 anon_inode:[vfio-device]
7fca63b48000-7fca63b88000 rw-s fe300000 00:09 8529 anon_inode:[vfio-device]
7fca63c30000-7fca63c34000 rw-s fe260000 00:09 8529 anon_inode:[vfio-device]
7fca63c38000-7fca63c3c000 rw-s fe360000 00:09 8529 anon_inode:[vfio-device]
Well, as lspci -v hints us, these are really PCI device memory ranges!
But these addresses are virtual; that's how it looks from the process's side. That isn't very useful to us, but it's still interesting.
How do we translate a virtual address to a physical one?
The power of Google showed me this neat program that does exactly that.
Copy-paste-save-gcc...
Big endian? 0
Vaddr: 0x7fc908000000, Page_size: 4096, Entry_size: 8
Reading /proc/3469/pagemap at 0x3fe4840000
[0]0xc3 [1]0x32 [2]0x1d [3]0x0 [4]0x0 [5]0x0 [6]0x0 [7]0x86
Result: 0x86000000001d32c3
PFN: 0x1d32c3
Yay, a physical address! And it fits. And the VM works nicely. And it doesn't crash.
Apparently I didn't manage to crash the VM again on purpose, but when it does crash, it looks like...
AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0021 address=0x00000001198c0000 flags=0x0010]
Yet the address is different (the example used here is very old) and it touches some other process's memory or even exceeds the system RAM range.
What I don't fully understand is why the DMA gets remapped normally as long as it isn't outside qemu's memory bounds.
Maybe that whole story is a huge lie, heh, read the preface.
Moral:
IOMMU=PT IN KERNEL PARAMETERS OR DEATH!
Neat line in dmesg:
AMD-Vi: Initialized for Passthrough Mode
P.S.
Sorry for the double post, heh.
Last edited by Duelist (2014-11-06 19:30:00)
The forum rules prohibit requesting support for distributions other than arch.
I gave up. It was too late.
What I was trying to do.
The reference about VFIO and KVM VGA passthrough.
Offline
Did you try "-mem-prealloc"? If dynamic memory allocation is your problem, I feel like that should help.
Edit: nvm - saw it in your previous post now. Still wondering though how dynamic allocation could be a problem then.
Last edited by mmm (2014-11-06 20:28:50)
Offline
Oh dear, where to begin...
So, I have rather sparse system RAM ranges in /proc/iomem: 256k here, 256m there, 1g over there, etc., and a huge chunk (~4-6g) at the end, 0x100001000-0x24EFFFFFF.
/proc/iomem is not RAM, it's device MMIO addresses
Without -realtime mlock=on, the VM gets a memory range that can be paged and moved around by the kernel.
Only when an assigned device is NOT present.
PCI device drivers in the guest, however, do not expect this to happen: they do DMA to some address, that DMA gets translated by the IOMMU, it ends up outside the VM's memory range, and BAM: IO_PAGE_FAULT, VM crash,
This is why when there's an assigned device guest memory is pinned, making the mlock redundant.
and sometimes the host shuts down due to the resulting memory corruption or something. (Previously I thought there was an interrupt storm happening, but MSI gets enabled normally, and watch -n1 "cat /proc/interrupts" until the host shutdown showed me nothing unusual.)
So we have problems due to memory paging and moving, and the IOMMU not translating everything right. I pass -realtime mlock=on and QEMU mlocks itself SOME, maybe even MULTIPLE, chunks of memory somewhere. When I run QEMU right after system startup, very little memory is consumed and QEMU is one of the first processes launched, so it gets memory near zero, where it is very sparse. A guest PCI device driver does DMA to some address, the IOMMU remaps it again, and we have a page fault again.
BUT EVENTUALLY it starts working, once ~800MB of memory is consumed or some time has passed. That criterion was met at my first QEMU startup: it mlocked memory in a range where it could get large, contiguous chunks, and the guest doing DMA was fine. It can break if the kernel gives QEMU sparse enough memory to mlock, and then we get page faults again.
So, as I see it, the problem hides in the IOMMU: it does redirect some DMA. Then I tried reading the Linux kernel parameters documentation, and there is no explanation given at all:
iommu= [x86] off force noforce biomerge panic nopanic merge nomerge forcesac soft pt [x86, IA-64]
Yeah. What do any of these mean?
So I went to Google and found this archived LKML message. Dated 2007. Better than nothing, but never accepted, and there is no clarification of the iommu=pt option, since that option didn't exist back then.
Awesome.
Maybe boot-options will help?
Yay, some clarification, but again no iommu=pt option. Yeah, I could've used the nice old iommu=soft, but something tells me that would be slow.
Then I went to Google again and found a piece of kernel source code here with a nice comment:
/*
 * This variable becomes 1 if iommu=pt is passed on the kernel command line.
 * If this variable is 1, IOMMU implementations do no DMA translation for
 * devices and allow every device to access to whole physical memory. This is
 * useful if a user wants to use an IOMMU only for KVM device assignment to
 * guests and not for driver dma translation.
 */
Hmm... That seems to FIT MY USE EXACTLY!
But that's not the end. Things are getting interesting.
Let's see what it looks like when the VM is working right.
After a brief read of this...
We start qemu and give it some time until Windows boots and starts the GPU drivers.
Then we do ps -ef | grep qemu to determine its PID.
As root: cat /proc/QEMU-PID/maps | grep vfio
We have six ranges of memory mapped, looking like this:
7fc908000000-7fc918000000 rw-s b0000000 00:09 8529 anon_inode:[vfio-device]
7fc918000000-7fc928000000 rw-s c0000000 00:09 8529 anon_inode:[vfio-device]
7fca63b08000-7fca63b48000 rw-s fe200000 00:09 8529 anon_inode:[vfio-device]
7fca63b48000-7fca63b88000 rw-s fe300000 00:09 8529 anon_inode:[vfio-device]
7fca63c30000-7fca63c34000 rw-s fe260000 00:09 8529 anon_inode:[vfio-device]
7fca63c38000-7fca63c3c000 rw-s fe360000 00:09 8529 anon_inode:[vfio-device]
Well, as lspci -v hints us, these are really PCI device memory ranges!
But these addresses are virtual; that's how it looks from the process's side. That isn't very useful to us, but it's still interesting.
How do we translate a virtual address to a physical one?
The power of Google showed me this neat program that does exactly that.
Copy-paste-save-gcc...
Big endian? 0
Vaddr: 0x7fc908000000, Page_size: 4096, Entry_size: 8
Reading /proc/3469/pagemap at 0x3fe4840000
[0]0xc3 [1]0x32 [2]0x1d [3]0x0 [4]0x0 [5]0x0 [6]0x0 [7]0x86
Result: 0x86000000001d32c3
PFN: 0x1d32c3
Yay, a physical address! And it fits. And the VM works nicely. And it doesn't crash.
Um, ok. What does PFN 0x1d32c3 tell you? That's a Page Frame Number, so you need to multiply by the page size (4k) to get a physical address; 0x1d32c3000 is the physical address. What does that tell us?
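A quick way to do that conversion in a shell, using the PFN from your own output:

printf '0x%x\n' $(( 0x1d32c3 * 4096 ))   # -> 0x1d32c3000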
You can also note in this example:
01:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
Region 0: [virtual] Memory at f21a0000 (64-bit, non-prefetchable) [size=16K]
^^^^^^^^^^^
Region 3: [virtual] Memory at f21c0000 (64-bit, non-prefetchable) [size=16K]
^^^^^^^^^^^
7f96f35a2000-7f96f35a5000 rw-s f21c1000 00:09 7271 anon_inode:[vfio-device]
^^^^^^^^^^
7f96f35a5000-7f96f35a9000 rw-s f21a0000 00:09 7271 anon_inode:[vfio-device]
^^^^^^^^^^
But I'm still not sure what you're reading into that.
Apparently I didn't manage to crash the VM again on purpose, but when it does crash, it looks like...
AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0021 address=0x00000001198c0000 flags=0x0010]
Yet the address is different (the example used here is very old) and it touches some other process's memory or even exceeds the system RAM range.
Adjusting the guest memory size is how I typically test whether these are stray DMAs or something is broken. That address is just over 4.5G, so if you boot your guest with 2G of RAM, do you still get these types of faults?
What I don't fully understand is why the DMA gets remapped normally as long as it isn't outside qemu's memory bounds.
Maybe that whole story is a huge lie, heh, read the preface.
<ahem> I think you're jumping to some inaccurate conclusions here.
Moral:
IOMMU=PT IN KERNEL PARAMETERS OR DEATH!
Not true at all.
What happens if you use the vfio_iommu_type1 module option disable_hugepages=1? By default vfio will attempt to map the largest contiguous memory chunk it can find through the IOMMU. AMD-Vi supports nearly any power of two size, but with AMD going out of favor with server vendors, I don't expect it gets much testing. The disable_hugepages forces the vfio code to map each page individually, avoiding superpage bugs in the IOMMU driver.
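A sketch of how to set that, assuming vfio_iommu_type1 is built as a module rather than built into your kernel:

# one-off test, with no VM running and no vfio users loaded:
modprobe -r vfio_iommu_type1 && modprobe vfio_iommu_type1 disable_hugepages=1
# or persistently:
echo "options vfio_iommu_type1 disable_hugepages=1" > /etc/modprobe.d/vfio_iommu_type1.conf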
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Offline
Hello everyone !
I noticed a hiccup that hits every VM, along with the host, every half a minute.
To my surprise, rebuilding qemu-git solved the issue for me, but wait! I did make one little change:
I removed the Haswell "optimizations" from makepkg.conf:
-march=core-avx2 -mtune=haswell (if I remember these values correctly), and returned everything to the defaults: -march=x86-64 -mtune=generic.
Rebuilt qemu-git + ovmf-svn, rebooted, and all hiccups are gone!
So the generic GCC profile (for me at least) was "optimized" a lot better than the CPU-specific optimizations.
Hope this helps anyone having similar issues!
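For reference, the relevant lines in my /etc/makepkg.conf now look roughly like this (plus whatever hardening flags your stock file already carries):

CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe"
CXXFLAGS="${CFLAGS}"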
Last edited by Denso (2014-11-07 01:36:47)
Offline
This is why when there's an assigned device guest memory is pinned, making the mlock redundant.
Then why does it help?
Um, ok. What does PFN 0x1d32c3 tell you? That's a Page Frame Number, so you need to multiply by the page size (4k) to get a physical address; 0x1d32c3000 is the physical address. What does that tell us?
Yes, I forgot to mention that this needs to be multiplied by the page size. I did that. That part of my wall of text is kind of useless because I haven't managed to capture my VM's faulty memory chunks. Once I got a host shutdown. But the idea was simple: I determine the physical address, add 256M to it, and if it looks way too bad compared to the MMIO table, that could mean the page faults are inevitable.
Adjusting the guest memory size is how I typically test whether these are stray DMAs or something is broken. That address is just over 4.5G, so if you boot your guest with 2G of RAM, do you still get these types of faults?
Heh, why didn't I come up with such an idea... But digging through the error logs, I've had different addresses, some even pointing to near-zero (~<600M) system RAM ranges.
<ahem> I think you're jumping to some inaccurate conclusions here.
Preface.
Moral:
IOMMU=PT IN KERNEL PARAMETERS OR DEATH!
Not true at all.
Well, it appears to fix the page faults on my system. After some reboots mixed with complete power cycles I couldn't reproduce the page faults again. Or maybe I'm just very lucky.
What happens if you use the vfio_iommu_type1 module option disable_hugepages=1? By default vfio will attempt to map the largest contiguous memory chunk it can find through the IOMMU. AMD-Vi supports nearly any power of two size, but with AMD going out of favor with server vendors, I don't expect it gets much testing. The disable_hugepages forces the vfio code to map each page individually, avoiding superpage bugs in the IOMMU driver.
It sounds very different from "just not using hugepages". I'll test that.
The forum rules prohibit requesting support for distributions other than arch.
I gave up. It was too late.
What I was trying to do.
The reference about VFIO and KVM VGA passthrough.
Offline
Hello everyone !
I noticed a hiccup that hits every VM, along with the host, every half a minute.
To my surprise, rebuilding qemu-git solved the issue for me, but wait! I did make one little change:
I removed the Haswell "optimizations" from makepkg.conf:
-march=core-avx2 -mtune=haswell (if I remember these values correctly), and returned everything to the defaults: -march=x86-64 -mtune=generic.
Rebuilt qemu-git + ovmf-svn, rebooted, and all hiccups are gone!
So the generic GCC profile (for me at least) was "optimized" a lot better than the CPU-specific optimizations.
Hope this helps anyone having similar issues!
Please note that since GCC 4.9, generic optimization is based on the Intel Core and AMD Bulldozer microarchitectures. What GCC version were you using when the hiccups appeared?
The forum rules prohibit requesting support for distributions other than arch.
I gave up. It was too late.
What I was trying to do.
The reference about VFIO and KVM VGA passthrough.
Offline
Denso wrote:Hello everyone !
I noticed a hiccup that hits every VM, along with the host, every half a minute.
To my surprise, rebuilding qemu-git solved the issue for me, but wait! I did make one little change:
I removed the Haswell "optimizations" from makepkg.conf:
-march=core-avx2 -mtune=haswell (if I remember these values correctly), and returned everything to the defaults: -march=x86-64 -mtune=generic.
Rebuilt qemu-git + ovmf-svn, rebooted, and all hiccups are gone!
So the generic GCC profile (for me at least) was "optimized" a lot better than the CPU-specific optimizations.
Hope this helps anyone having similar issues!
Please note that since GCC 4.9, generic optimization is based on the Intel Core and AMD Bulldozer microarchitectures. What GCC version were you using when the hiccups appeared?
4.9.2
Maybe GCC knows what's best for my CPU better than I do
Offline
Shader and TM unit counts doubled on the slave crossfire card. Maybe that is related to video memory magic.
The cards are identical; the only differences are the heat pipe ends and the serial number. I don't know if it's a GPU-Z bug or a VM error (or maybe it actually works that way), but it works and looks funny.
Well, it appears I was extremely lucky yesterday to have had no IO page faults.
Today the first VM launch resulted in a page fault, and I've recorded it. The address in question was located somewhere near 2.6GB of the MMIO address space.
Now I've disabled hugepages and the current VM instance seems to be stable.
Also, aw, modinfo kvm tells me that there is
parm: allow_unsafe_assigned_interrupts:Enable device assignment on platforms without interrupt remapping support. (bool)
which seems to be very similar to...
parm: allow_unsafe_interrupts:Enable VFIO IOMMU support for on platforms without interrupt remapping support. (bool)
in your vfio_iommu_type1 module. Am I right that it does the same thing, but for pci-assign? Or does it enable interrupt remapping system-wide, regardless of the method used?
Oh, and also, add a fix for that copy-paste "for on platforms" typo to your TODO list ;)
The forum rules prohibit requesting support for distributions other than arch.
I gave up. It was too late.
What I was trying to do.
The reference about VFIO and KVM VGA passthrough.
Offline
Also, aw, modinfo kvm tells me that there is
parm: allow_unsafe_assigned_interrupts:Enable device assignment on platforms without interrupt remapping support. (bool)
which seems to be very similar to...
parm: allow_unsafe_interrupts:Enable VFIO IOMMU support for on platforms without interrupt remapping support. (bool)
in your vfio_iommu_type1 module. Am I right that it does the same thing, but for pci-assign? Or does it enable interrupt remapping system-wide, regardless of the method used?
Oh, and also, add a fix for that copy-paste "for on platforms" typo to your TODO list ;)
Right, those are equivalent options, one for pci-assign and one for vfio-pci. Neither does anything to enable interrupt remapping; they just allow use of the driver without interrupt remapping enabled.
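For anyone who actually needs them (i.e. the platform really lacks interrupt remapping and you accept the security trade-off), a sketch of the modprobe.d entries; the file name is arbitrary, and only the line matching the assignment method you use is needed:

# /etc/modprobe.d/unsafe-interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1
options kvm allow_unsafe_assigned_interrupts=1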
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Offline
Right, those are equivalent options, one for pci-assign and one for vfio-pci. Neither does anything to enable interrupt remapping; they just allow use of the driver without interrupt remapping enabled.
But how does it work without interrupt remapping then? Using the emulated KVM vCPU's PIC? Or something?
The forum rules prohibit requesting support for distributions other than arch.
I gave up. It was too late.
What I was trying to do.
The reference about VFIO and KVM VGA passthrough.
Offline