
#1 2024-03-26 19:52:49

scippie
Member
Registered: 2019-06-07
Posts: 138

Compute shader on Intel GPU wrong vs. Nvidia GPU right - driver issue?

I am quite certain that I have discovered a bug in the Intel GPU driver.
I don't know much about drivers and such things; I am always glad when installing Arch succeeds, but I don't even know what the driver is called. Is it Mesa? SGI?
So I certainly don't know where to report this bug, if it actually is a real bug.

So here is what I am doing: I am writing a compute shader that draws an anti-aliased thick line onto a texture layer.
I based my compute shader code on the info at http://members.chello.at/easyfilter/bresenham.html (the last algorithm).

At first, I wrote it cleanly, the way I would normally write a compute shader myself. It failed, with very strange graphical results.
So I rewrote it, thinking I had done something wrong. It failed again, with the same result.
I then started debugging deeper and deeper, and at some point I noticed that commenting out one pixel-writing line changed the result to something almost correct (except for the pixels that line would have drawn, of course).
As a last try, I simply copied the reference code and adapted it to shader code. It is absolutely ugly, but again, it produces the same result.
Then I tried something out of the box: I ran my code with prime-run (so it runs on the Nvidia driver) to see if it was a driver issue, and... suddenly the code worked fine!
So... I am quite certain that there is a bug in the Intel GPU driver. I think it over-optimizes something it shouldn't. I even added volatile to make sure no caching is involved, although I am fairly sure the code couldn't benefit from it anyway.

Description of what goes wrong: the line I am trying to draw from (20, 20) to (250, 150) is drawn completely wrong; it looks more like it is drawn from (20, 20) to (50, 150), and it misses a lot of pixels (of the ones drawn in the last part of the code).
I marked in the code where I think something gets optimized incorrectly, but I am by no means sure about that, because I don't know what the compiler actually makes of my code.
I also tried replacing the setPixelAA function with the setPixel function (also included), which doesn't read from the image (for blending), to rule that out.

To get a second opinion, here is my code.

The GLSL compute shader:

#version 430 // Changing this to 430 core, 460, ... doesn't help

layout(local_size_x = 1) in;
layout(rgba8, binding = 0) uniform volatile image2D img; // volatile is not necessary imo, but it was something I tried to make it better

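// Write a fully opaque white pixel without reading the destination.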
void setPixel(ivec2 pos)
{
  imageStore(img, pos, vec4(1));
}

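// Blend opaque white over the destination pixel with coverage a (imageLoad + mix + imageStore).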
void setPixelAA(ivec2 pos, float a)
{
  vec4 cS = vec4(1);
  vec4 cD = imageLoad(img, pos);
  vec4 c = mix(cD, cS, a);
  imageStore(img, pos, c);
}

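// Anti-aliased thick line from p0 to p1 with width wd, adapted almost 1:1 from the last algorithm on the easyfilter page linked above.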
void drawLineW(ivec2 p0, ivec2 p1, float wd)
{
  ivec2 d = ivec2(abs(p1.x - p0.x), abs(p1.y - p0.y));
  ivec2 s = ivec2(sign(p1.x - p0.x), sign(p1.y - p0.y));
  int err = d.x - d.y, e2, x2, y2;
  float ed = (d.x + d.y == 0)? 1.0: length(vec2(d));

  int x0 = p0.x;
  int y0 = p0.y;
  int x1 = p1.x;
  int y1 = p1.y;
  wd = (wd + 1) / 2;
  for (; ; )
  {
    setPixelAA(ivec2(x0, y0), max(0.0, 1.0 - abs(err - d.x + d.y) / ed - wd + 1.0));
    e2 = err; x2 = x0;
    if (2 * e2 >= -d.x)
    {
      e2 += d.y;
      y2 = y0;
      for (; e2 < ed * wd && (p1.y != y2 || d.x > d.y); e2 += d.x)
      {
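        // (To rule out the imageLoad blend, I also tried setPixel(ivec2(x0, y2)) here instead; same wrong result.)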
        setPixelAA(ivec2(x0, y2), 1.0 - max(0.0, abs(e2) / ed - wd + 1.0)); // Commenting out this line fixes it
        y2 += s.y; // When the line above is not commented out, I think this update gets optimized badly
      }
      if (x0 == x1) break;
      e2 = err; err -= d.y; x0 += s.x;
    }
    if (2 * e2 <= d.y)
    {
      for (e2 = d.x - e2; e2 < ed * wd && (p1.x != x2 || d.x < d.y); e2 += d.y)
      {
        setPixelAA(ivec2(x2, y0), 1.0 - max(0.0, abs(e2) / ed - wd + 1.0));
        x2 += s.x;
      }
      if (y0 == y1) break;
      err += d.x; y0 += s.y;
    }
  }
}

void main()
{
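  // One work group with a single invocation (local_size_x = 1, dispatched as 1x1x1) draws one test line.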
  ivec2 p0 = ivec2(20, 20);
  ivec2 p1 = ivec2(250, 150);
  drawLineW(p0, p1, 5.0);
}

This code is terrible like this; as I said, I originally wrote nicer code, but at first I thought I was getting the rewrite wrong. That's why it is now this ugly, almost literal copy/paste.

The OpenGL host code is written in Nim, but I don't think that matters.
It simply launches one work group of the shader, which has only a single local invocation (as declared in the shader).

      glUseProgram(progGui)
      glBindImageTexture(0, texWindow, 0, GL_FALSE, 0, GL_READ_WRITE, GL_RGBA8)
      glDispatchCompute(1, 1, 1)
      glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT)

      # ... then the texture is rendered to the screen with a copy shader
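
The copy pass itself is nothing special. I haven't pasted the actual shader, but it is essentially a fullscreen blit along the lines of this sketch (vUV and uTex are just placeholder names here):

#version 430

// Fragment shader of the copy pass: sample the compute-written texture
// and write it straight to the output.
in vec2 vUV;            // interpolated texture coordinate from a fullscreen triangle/quad
out vec4 fragColor;
uniform sampler2D uTex; // texWindow bound as a regular sampled texture

void main()
{
  fragColor = texture(uTex, vUV);
}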

The texture is created correctly and rendering it to the screen works; this has all been verified because I have drawn other things onto it without any issue, and I even preload the texture with a colored background in the code that creates it.

Unless I am doing something really stupid that happens to be accepted by the Nvidia (proprietary) driver but not by the Intel driver (but all previous tests worked just fine and this is certainly not my first shader), my actual question is: can I report this somewhere? How do I do that? Where do I do that?

Update: I compiled my shader with glslang to create a SPIR-V binary (and changed my code to use it), hoping that using a single pre-compiled binary would solve it. It doesn't, so it feels like the issue occurs after that compilation stage?
Update: I just tested it on an AMD GPU and it works fine there too.
Update: I finally managed to make a Windows build, and on Windows, on the same machine, I don't have the issue, so it definitely is a bug in the Linux Intel GPU driver.

Last edited by scippie (2024-03-28 13:45:27)


#2 2024-03-28 15:26:18

scippie
Member
Registered: 2019-06-07
Posts: 138

Re: Compute shader on Intel GPU wrong vs. Nvidia GPU right - driver issue?

FYI: https://bbs.archlinux.org/viewtopic.php?id=294315

I have submitted a bug report.

