
#1 2023-05-25 14:04:38

uncharted
Member
Registered: 2023-05-25
Posts: 4

Dedicated GPU is slower than integrated GPU

The whole investigation began with my attempt to get my browser to use hardware-accelerated video decoding. After some research, I settled on these CLI flags for my Brave Beta browser.

Brave 1.53.65 (Chromium 114.0.5735.35, Official Build, beta, 64-bit), launched with:

 --ignore-gpu-blocklist --enable-features=VaapiVideoDecoder,VaapiVideoEncoder,VaapiVideoDecodeLinuxGL,VaapiIgnoreDriverChecks --disable-features=UseChromeOSDirectVideoDecoder,UseSkiaRenderer

With these flags I got hardware-accelerated decoding, but the stats in nvtop showed that only the integrated GPU was being used.
When launching the browser with the `DRI_PRIME=1` prefix, I could not get accelerated video decoding at all - although brave://gpu was misleading here, still showing a list of supported codecs in the Video Acceleration Information section.
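For anyone trying to reproduce this, one way to see what each GPU's VA-API driver actually advertises is vainfo from libva-utils. The render-node paths below are assumptions for this machine - which renderD node maps to which GPU varies, so check `ls -l /dev/dri/by-path/` first:

```shell
# List decode entrypoints per render node (vainfo is in libva-utils).
# renderD128/renderD129 assignment is an assumption; verify via
# /dev/dri/by-path/ before trusting the labels.
vainfo --display drm --device /dev/dri/renderD128 | grep VAEntrypointVLD  # iGPU?
vainfo --display drm --device /dev/dri/renderD129 | grep VAEntrypointVLD  # dGPU?
```

If a node lists no VAEntrypointVLD lines, that GPU's driver is not offering hardware decode, regardless of what brave://gpu claims.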

In the driver information section of brave://gpu, GPU0 was always shown as the active one, regardless of the DRI_PRIME=1 variable:

GPU0
VENDOR= 0x1002, DEVICE=0x1638, DRIVER_VENDOR=Mesa, DRIVER_VERSION=23.1.0 *ACTIVE*
GPU1
VENDOR= 0x1002, DEVICE=0x7340

These are the details of my GPUs:

$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: AMD (0x1002)
    Device: AMD Radeon Graphics (renoir, LLVM 15.0.7, DRM 3.52, 6.3.3-arch1-1) (0x1638)
    Version: 23.1.0
    Accelerated: yes
    Video memory: 512MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 43 MB, largest block: 43 MB
    VBO free aux. memory - total: 7491 MB, largest block: 7491 MB
    Texture free memory - total: 43 MB, largest block: 43 MB
    Texture free aux. memory - total: 7491 MB, largest block: 7491 MB
    Renderbuffer free memory - total: 43 MB, largest block: 43 MB
    Renderbuffer free aux. memory - total: 7491 MB, largest block: 7491 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 512 MB
    Total available memory: 8176 MB
    Currently available dedicated video memory: 43 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon Graphics (renoir, LLVM 15.0.7, DRM 3.52, 6.3.3-arch1-1)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 23.1.0
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.1.0
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 23.1.0
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
$ DRI_PRIME=1 glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: AMD (0x1002)
    Device: AMD Radeon RX 5500M (navi14, LLVM 15.0.7, DRM 3.52, 6.3.3-arch1-1) (0x7340)
    Version: 23.1.0
    Accelerated: yes
    Video memory: 4096MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 4076 MB, largest block: 4076 MB
    VBO free aux. memory - total: 7639 MB, largest block: 7639 MB
    Texture free memory - total: 4076 MB, largest block: 4076 MB
    Texture free aux. memory - total: 7639 MB, largest block: 7639 MB
    Renderbuffer free memory - total: 4076 MB, largest block: 4076 MB
    Renderbuffer free aux. memory - total: 7639 MB, largest block: 7639 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 4096 MB
    Total available memory: 11760 MB
    Currently available dedicated video memory: 4076 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon RX 5500M (navi14, LLVM 15.0.7, DRM 3.52, 6.3.3-arch1-1)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 23.1.0
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.1.0
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 23.1.0
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
$ xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x54 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 4 outputs: 2 associated providers: 1 name:AMD Radeon Graphics @ pci:0000:09:00.0
Provider 1: id: 0x83 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 5 outputs: 1 associated providers: 1 name:AMD Radeon RX 5500M @ pci:0000:03:00.0

$ xrandr --setprovideroffloadsink 1 0
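For reference, a quick way to confirm the offload association took effect (the grep patterns are just for brevity):

```shell
# The dGPU should now list the iGPU as an associated provider, and
# DRI_PRIME render offload should work per-process.
xrandr --listproviders
DRI_PRIME=1 glxinfo -B | grep Device   # should report the RX 5500M
```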

I benchmarked both GPUs using glmark2:

$ glmark2
=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon Graphics (renoir, LLVM 15.0.7, DRM 3.52, 6.3.3-arch1-1)
    GL_VERSION:     4.6 (Compatibility Profile) Mesa 23.1.0
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 14209 FrameTime: 0.070 ms
[build] use-vbo=true: FPS: 14772 FrameTime: 0.068 ms
[texture] texture-filter=nearest: FPS: 12228 FrameTime: 0.082 ms
[texture] texture-filter=linear: FPS: 12336 FrameTime: 0.081 ms
[texture] texture-filter=mipmap: FPS: 11861 FrameTime: 0.084 ms
[shading] shading=gouraud: FPS: 10293 FrameTime: 0.097 ms
[shading] shading=blinn-phong-inf: FPS: 10059 FrameTime: 0.099 ms
[shading] shading=phong: FPS: 10063 FrameTime: 0.099 ms
[shading] shading=cel: FPS: 9732 FrameTime: 0.103 ms
[bump] bump-render=high-poly: FPS: 6928 FrameTime: 0.144 ms
[bump] bump-render=normals: FPS: 15724 FrameTime: 0.064 ms
[bump] bump-render=height: FPS: 15793 FrameTime: 0.063 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 9289 FrameTime: 0.108 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 4449 FrameTime: 0.225 ms
[pulsar] light=false:quads=5:texture=false: FPS: 11087 FrameTime: 0.090 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 4477 FrameTime: 0.223 ms
[desktop] effect=shadow:windows=4: FPS: 8617 FrameTime: 0.116 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 2319 FrameTime: 0.431 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 3028 FrameTime: 0.330 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 3342 FrameTime: 0.299 ms
[ideas] speed=duration: FPS: 9972 FrameTime: 0.100 ms
[jellyfish] <default>: FPS: 6455 FrameTime: 0.155 ms
[terrain] <default>: FPS: 529 FrameTime: 1.892 ms
[shadow] <default>: FPS: 9016 FrameTime: 0.111 ms
[refract] <default>: FPS: 865 FrameTime: 1.157 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 12098 FrameTime: 0.083 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 11913 FrameTime: 0.084 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 12271 FrameTime: 0.081 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 11828 FrameTime: 0.085 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 11925 FrameTime: 0.084 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 11877 FrameTime: 0.084 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 11928 FrameTime: 0.084 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 11907 FrameTime: 0.084 ms
=======================================================
                                  glmark2 Score: 9489
=======================================================
$ DRI_PRIME=1 glmark2
=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon RX 5500M (navi14, LLVM 15.0.7, DRM 3.52, 6.3.3-arch1-1)
    GL_VERSION:     4.6 (Compatibility Profile) Mesa 23.1.0
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 3054 FrameTime: 0.328 ms
[build] use-vbo=true: FPS: 3554 FrameTime: 0.281 ms
[texture] texture-filter=nearest: FPS: 3541 FrameTime: 0.282 ms
[texture] texture-filter=linear: FPS: 3568 FrameTime: 0.280 ms
[texture] texture-filter=mipmap: FPS: 3538 FrameTime: 0.283 ms
[shading] shading=gouraud: FPS: 3569 FrameTime: 0.280 ms
[shading] shading=blinn-phong-inf: FPS: 3567 FrameTime: 0.280 ms
[shading] shading=phong: FPS: 3567 FrameTime: 0.280 ms
[shading] shading=cel: FPS: 3568 FrameTime: 0.280 ms
[bump] bump-render=high-poly: FPS: 3576 FrameTime: 0.280 ms
[bump] bump-render=normals: FPS: 3568 FrameTime: 0.280 ms
[bump] bump-render=height: FPS: 3539 FrameTime: 0.283 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 3507 FrameTime: 0.285 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 3568 FrameTime: 0.280 ms
[pulsar] light=false:quads=5:texture=false: FPS: 3557 FrameTime: 0.281 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 3563 FrameTime: 0.281 ms
[desktop] effect=shadow:windows=4: FPS: 3567 FrameTime: 0.280 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1923 FrameTime: 0.520 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 2731 FrameTime: 0.366 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 2614 FrameTime: 0.383 ms
[ideas] speed=duration: FPS: 3553 FrameTime: 0.282 ms
[jellyfish] <default>: FPS: 3560 FrameTime: 0.281 ms
[terrain] <default>: FPS: 1812 FrameTime: 0.552 ms
[shadow] <default>: FPS: 3553 FrameTime: 0.282 ms
[refract] <default>: FPS: 3384 FrameTime: 0.296 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 3564 FrameTime: 0.281 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 3560 FrameTime: 0.281 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 3565 FrameTime: 0.281 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 3567 FrameTime: 0.280 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 3570 FrameTime: 0.280 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 3566 FrameTime: 0.280 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 3561 FrameTime: 0.281 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 3569 FrameTime: 0.280 ms
=======================================================
                                  glmark2 Score: 3381
=======================================================


As you can see, the integrated GPU scores roughly 3x higher. I am not entirely sure whether I should take these scores at face value or whether there are more caveats here, but something seems off with the performance.
It would be great if someone could help me fix this performance issue and also get my Chromium-based browsers to use the dedicated GPU with hardware-accelerated video decoding.


#2 2023-05-25 14:22:00

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,431

Re: Dedicated GPU is slower than integrated GPU

You generally want hardware-accelerated video decoding to use the integrated GPU regardless, since the biggest win of hardware-accelerated video decoding is avoiding the round trip for displaying the video, which you can't avoid if you render on the display-less dedicated card.

You don't have a perceivable performance issue, you have a measured performance issue. And that's not because something is actually wrong, but because the glmark2 benchmark is so low-level and factually irrelevant for modern use cases that the mandatory display copy becomes immensely expensive: you lose frames during the copy. Optimus and Optimus-like systems, and the use of the dedicated card, only make sense if you actually require the added performance - when the integrated GPU is too slow to render, but not too slow to display a rendered image. You need to test actual workloads that make the integrated card struggle in order to see the real benefit of the added dedicated GPU; rendering thousands upon thousands of frames you aren't going to see is not really useful. Pick an actually taxing benchmark (or rather, don't worry about benchmarks: use the thing, run a game with and without prime-run, and use whichever card performs better on a case-by-case basis).
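A back-of-envelope check supports this: at the frame rates glmark2 reports, shipping every 800x600 RGBA frame from the dedicated GPU back to the integrated one for display needs serious bandwidth. The fps values below are taken from the 'build' results in the posted runs:

```shell
# Copy bandwidth required to move every rendered frame dGPU -> iGPU.
awk 'BEGIN {
    frame = 800 * 600 * 4                 # bytes per RGBA frame (1.92 MB)
    printf "iGPU build test: %.1f GB/s\n", frame * 14209 / 1e9
    printf "dGPU build test: %.1f GB/s\n", frame * 3554  / 1e9
}'
```

This prints about 27.3 GB/s for the iGPU's frame rate and 6.8 GB/s for the dGPU's. If the 5500M sits on a PCIe 3.0 x8 link (an assumption for this laptop), the practical ceiling is roughly 8 GB/s - which would explain why nearly every dGPU result plateaus around 3550 fps: the benchmark is copy-bound, not render-bound.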


#3 2023-05-25 15:04:24

uncharted
Member
Registered: 2023-05-25
Posts: 4

Re: Dedicated GPU is slower than integrated GPU

V1del wrote:

You generally want hardware-accelerated video decoding to use the integrated GPU regardless, since the biggest win of hardware-accelerated video decoding is avoiding the round trip for displaying the video, which you can't avoid if you render on the display-less dedicated card.

You don't have a perceivable performance issue, you have a measured performance issue. And that's not because something is actually wrong, but because the glmark2 benchmark is so low-level and factually irrelevant for modern use cases that the mandatory display copy becomes immensely expensive: you lose frames during the copy. Optimus and Optimus-like systems, and the use of the dedicated card, only make sense if you actually require the added performance - when the integrated GPU is too slow to render, but not too slow to display a rendered image. You need to test actual workloads that make the integrated card struggle in order to see the real benefit of the added dedicated GPU; rendering thousands upon thousands of frames you aren't going to see is not really useful. Pick an actually taxing benchmark (or rather, don't worry about benchmarks: use the thing, run a game with and without prime-run, and use whichever card performs better on a case-by-case basis).


I believe you are correct that I am obsessing over numbers when I will never actually need anything close to the 13000+ fps these artificial benchmarks display. To be quite honest, video decoding in the browser at 1440p 60 fps, or occasionally 4K 60 fps, is probably my heaviest GPU use on Linux. My attempts to force the dedicated GPU for video rendering on Linux are mostly motivated by what I observe on Windows 11. When I watch this 8K clip at various resolutions on the same laptop while running Windows 11 on the other partition, playback stutters above 1080p 60 fps if I force the graphics setting to the integrated GPU there - strangely, the default behaviour is to use the dedicated GPU, or I might have changed that setting a long time ago.

The north star I am chasing is a laptop that renders 4K video without stuttering, and does so without completely taxing the CPU.
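One rough way to quantify that goal while a 4K clip plays: snapshot the browser's CPU usage with and without hardware decoding. The process name `brave` is an assumption - adjust it to whatever your browser's binary is called:

```shell
# With VA-API decode working, the browser's %cpu should stay low even
# at 4K60; with software decode it will spike across several processes.
ps -C brave -o pid,%cpu,comm --sort=-%cpu | head -n 5
```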


UPDATE: I did a thorough test using the basemark benchmarking software. I tested both graphics APIs, Vulkan 1.0 and OpenGL 4.5, on both GPUs, at a custom 1920x1280 resolution with high content quality and everything else left at defaults. My assumption, based on glmark2, that the integrated GPU performs better than the dedicated one was proved wrong: the dedicated GPU does perform better than the integrated one, barring some configuration issue.

Here are the results for reference:

Vulkan, RX 5500M - C8409
Vulkan, Radeon Integrated - C2711

OpenGL, RX 5500M - C9574
OpenGL, Radeon Integrated - C2694


Last edited by uncharted (2023-05-25 19:05:18)

