The right approach for Intel iGPU + Nvidia GPU hardware acceleration

thraizz · 2021-11-25 15:49:35

Hi,
I hope this first post is okay, as it's not a direct question but more a general lack of knowledge on my part.
I have been using Arch for the past two years and recently switched to it as a daily driver for work, using a ThinkPad P14s.
It has an i7-10610U (Comet Lake) with the CometLake-U GT2 iGPU, as well as a Quadro P520.

Now here comes the struggle.

I tried everything to get good performance out of the P520: various kernel parameters, early loading of the nvidia modules, just the intel module, or both, and setting the card as PrimaryGPU. Nothing gave me satisfying performance.

Now, what is the right way to use the Quadro P520 GPU?

For a long time I believed that performance would be best if I ran the P520 by using the OutputClass section with the Option "PrimaryGPU" "yes" for the nvidia output and the intel iGPU only as modesetting for display output. But when using the P520 as primary GPU, I don't have any hardware acceleration available, or so it seems.
I see that the Xorg server and various other processes are running when checking with nvidia-smi, and I also see that nothing is using the iGPU with intel_gpu_top. But playing YouTube videos in Firefox (with or without h.264, doesn't matter) gives me a high load (something around 3.20).

Only specifying the nvidia OutputClass, marking the iGPU as PrimaryGPU, and setting the VAAPI + VDPAU env vars to point to intel seems to work better.
The load goes down to 0.70 and I can see that the iGPU actually provides hardware acceleration by looking at intel_gpu_top. I use the most recent (git version) intel-media-drivers from the AUR, as well as the most recent nvidia drivers.
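
For reference, here is a minimal sketch of the environment described above, written as a small Python helper. The driver names are assumptions and not something taken from my actual config: `iHD` is the VA-API driver name used by intel-media-driver, and `va_gl` is the VDPAU backend from libvdpau-va-gl that forwards to VA-API. The helper name `intel_decode_env` is made up for illustration.

```python
import os

# Assumed values: "iHD" selects the intel-media-driver VA-API backend,
# "va_gl" selects the libvdpau-va-gl backend that forwards VDPAU to VA-API.
INTEL_DECODE_ENV = {
    "LIBVA_DRIVER_NAME": "iHD",
    "VDPAU_DRIVER": "va_gl",
}


def intel_decode_env(base=None):
    """Return a copy of the environment with video decoding pointed at the iGPU.

    `base` defaults to the current process environment; the input is not modified.
    """
    env = dict(os.environ if base is None else base)
    env.update(INTEL_DECODE_ENV)
    return env
```

Something like `subprocess.run(["firefox"], env=intel_decode_env())` would then start the browser with those variables set, leaving the rest of the session untouched.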

Why is the iGPU so much better at video decoding / hw acceleration than the P520? Am I doing something wrong? Is it just the missing support for various formats on the P520? And why do I have to use modesetting for the iGPU? Shouldn't I use intel?
I am at a complete loss with this.

Thanks for any feedback / thoughts on this.

V1del · 2021-12-01 17:15:18

The biggest advantage of HW accelerated video playback is the possibility to show a decoded video frame directly on the screen without roundtripping back to the CPU. An Optimus system by its very design prevents this optimisation (your nvidia card does not have a screen attached, so it needs to render whatever, then tell the CPU/intel GPU "I'm done, plaster this on the screen please", which will always incur a certain overhead), not even including the fact that video HW accel in general is wonky on Linux.

So by this logic alone, if you have a workload that the intel GPU is sufficient to handle on its own, you'll almost always be better off with the iGPU rather than invoking the nvidia GPU. You'll have to find workloads that the dGPU is so much better at that you have a net positive in comparison to the CPU overhead. For the majority of normal video decoding workloads the intel GPU will be adequate.

This is just the general theory behind the hardware. In addition to that, on Linux specifically, the predominant HW decoding API has turned out to be VAAPI, and all these browser methods will use VAAPI. Nvidia support for VAAPI is basically nonexistent, since they initially did VDPAU (... where they were technically first, so if intel had played ball we could have just one decoding API, but I digress) and more recently have nvdec/nvenc, which you'll find in ffmpeg, obs and mpv but not in browsers. Nvidia has VAAPI support via an old wrapper library that wraps VAAPI over VDPAU, but that hasn't really been maintained in years and is "slowish" in general, so that's an additional Linux-specific perf hit you will have for this particular workload.

As for modesetting and xf86-video-intel, that has no relation to any of this; it only defines which driver Xorg will use for rendering 2D operations. The "problem" here is that xf86-video-intel is barely, if at all, maintained and hasn't seen a stable release in nearly 8 years. It has a plethora of bugs and strange behaviors that haven't been fixed in years. Ubuntu and Red Hat officially don't set it up anymore for GPUs newer than Sandy Bridge. It should really only be used if you have an older GPU or a specific use case that you can't or don't want to handle differently. E.g. you can enable built-in tearing prevention on it, but that is often handled better and in a more fine-grained way by standalone compositors. And it technically has faster 2D code compared to modesetting, but you'd probably only notice that in synthetic benchmarks.

If you want to use the nvidia GPU more, I'd probably suggest you just set up nvidia-prime and then use it on demand, for example for games or other heavier tasks where you'd see an actual benefit. Alternatively, as it's a Quadro, you might have a BIOS/UEFI option to disable the intel GPU in HW, which should prevent the overheads mentioned in the first paragraph and allow full direct access to the HW.
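
To illustrate the on-demand part: the nvidia-prime package ships a `prime-run` wrapper that starts a single program with NVIDIA render offload enabled by setting three environment variables. The rough sketch below reproduces that in Python, on the assumption that the variables match what `prime-run` sets on your system; the helper name `nvidia_offload_env` is hypothetical.

```python
import os

# Variables assumed to match what the prime-run wrapper exports
# before launching the requested program.
NVIDIA_OFFLOAD_ENV = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__VK_LAYER_NV_optimus": "NVIDIA_only",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
}


def nvidia_offload_env(base=None):
    """Return a copy of the environment that asks for rendering on the dGPU.

    Everything else (including VAAPI/VDPAU settings) is passed through as-is,
    so only the program started with this environment is offloaded.
    """
    env = dict(os.environ if base is None else base)
    env.update(NVIDIA_OFFLOAD_ENV)
    return env
```

In practice you would simply prefix the command with `prime-run`; the point of the sketch is that offloading is a per-process decision, so the desktop and video playback can stay on the intel GPU.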

thraizz · 2021-12-01 17:26:17

After your reply to me in the other post, I tried to think of some questions as another post, but noticed I couldn't even specify what I wanted to know.
But this answers it! Thank you for your time and the detailed answer.

cloverskull · 2021-12-01 18:34:27

@V1del this is an excellent write-up, and while I don't have hardware that could make use of your advice here, it is very interesting to read and I appreciate you taking the time.
