#451 2025-02-03 15:28:58

lpr1
Member
Registered: 2017-10-08
Posts: 109

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:
AMDGPU testing

Build: linux-amdgpu-testing-6.13.1.arch1-4, linux-amdgpu-testing-headers-6.13.1.arch1-4
Included patches:
- Remove GFXOFF/PG workaround from amdgpu_amdkfd
- Remove GFXOFF workarounds from gfx part
- Remove GFXOFF workaround from amdgpu_dpm
- Remove GFXOFF workarounds from sdma
- Deny GFXOFF for amdgpu_job operations
- Deny GFXOFF for amdgpu_ring operations
- Switch PG to UNGATE before ringing doorbell for AMDGPU_RING_TYPE_COMPUTE (rework of the stable patch)

Build: linux-amdgpu-testing-6.13.1.arch1-5, linux-amdgpu-testing-headers-6.13.1.arch1-5
Included patches:
- Remove GFXOFF/PG workaround from amdgpu_amdkfd
- Remove GFXOFF workarounds from gfx part
- Remove GFXOFF workaround from amdgpu_dpm
- Remove GFXOFF workarounds from sdma
- Deny GFXOFF for amdgpu_job operations
- Deny GFXOFF for amdgpu_ring operations
- Switch PG to UNGATE before ringing doorbell for AMDGPU_RING_TYPE_COMPUTE (rework of the stable patch)
- Remove 100ms delay from amdgpu_gfx_off_ctrl (from stable patches)

Build: linux-amdgpu-testing-6.13.1.arch1-6, linux-amdgpu-testing-headers-6.13.1.arch1-6
Included patches:
- Remove GFXOFF/PG workaround from amdgpu_amdkfd
- Remove GFXOFF workarounds from gfx part
- Remove GFXOFF workaround from amdgpu_dpm
- Remove GFXOFF workarounds from sdma
- Deny GFXOFF for amdgpu_ring operations only

GitHub

What to check:
- stability
If stable:
- temperature
- average package power (sudo turbostat -s PkgWatt)
- performance - for example, WebGL Aquarium fps for different numbers of fish

Kernel option to keep during testing period: fsck.mode=force

laikm from pacoandres for reporting issues

Both kernels freeze my system instantly as soon as GDM loads. I must be doing something wrong. Also, why is pahole a dependency for the headers package?

I'll try adding a new boot entry for the testing kernel and see how it goes, but amdgpu.ppfeaturemask=0xfff73fff definitely solves the issue.
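
To see which feature bits a mask like this leaves disabled, the value can be unpacked bit by bit. The following is a minimal Python sketch, assuming the mask is a 32-bit value in which a cleared bit switches a feature off; `cleared_bits` is a hypothetical helper name, and mapping bit positions to named features has to be checked against the PP_FEATURE_MASK definitions in your kernel source, which are not reproduced here.

```python
def cleared_bits(mask: int, width: int = 32) -> list[int]:
    """Return the positions of bits that are 0 in `mask` (bit 0 = least significant)."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


if __name__ == "__main__":
    mask = 0xFFF73FFF  # value of amdgpu.ppfeaturemask quoted in the post
    print(f"{mask:#010x} clears bits: {cleared_bits(mask)}")
```

For 0xfff73fff this reports bits 14, 15 and 19. Bit 15 (0x8000) should correspond to the GFXOFF feature flag in the amdgpu headers, but treat that mapping as an assumption until you have verified it for your kernel version.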

#452 2025-02-03 15:51:07

lpr1
Member
Registered: 2017-10-08
Posts: 109

All 3 test kernels from the last post freeze the system instantly.

title   AMDGPU TESTING (linux)
linux   /vmlinuz-linux-amdgpu-testing
initrd  /initramfs-linux-amdgpu-testing.img
options root=PARTUUID=x-x-x-x rw rootfstype=ext4

Nothing unusual, this error is always present for some reason:
16:39:10 archlinux kernel: amdgpu 0000:07:00.0: amdgpu: psp gfx command LOAD_TA(0x1) failed and response status is (0x7)
16:39:10 archlinux kernel: amdgpu 0000:07:00.0: amdgpu: psp gfx command INVOKE_CMD(0x3) failed and response status is (0x4)
16:39:10 archlinux kernel: amdgpu 0000:07:00.0: amdgpu: Secure display: Generic Failure.
16:39:10 archlinux kernel: amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: query securedisplay TA failed. ret 0x0
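
When comparing boots of different test kernels, it can help to pull the PSP failure lines out of a saved log and check whether the same commands and status codes appear each time. This is a small Python sketch, assuming the log lines keep the exact wording shown above; `psp_failures` is a hypothetical helper, not part of any amdgpu tooling.

```python
import re

# Matches lines such as:
# "... amdgpu: psp gfx command LOAD_TA(0x1) failed and response status is (0x7)"
PSP_FAILURE = re.compile(
    r"psp gfx command (?P<cmd>\w+)\((?P<id>0x[0-9a-fA-F]+)\) failed "
    r"and response status is \((?P<status>0x[0-9a-fA-F]+)\)"
)


def psp_failures(log_text: str) -> list[tuple[str, int, int]]:
    """Return (command name, command id, response status) for each PSP failure line."""
    return [
        (m["cmd"], int(m["id"], 16), int(m["status"], 16))
        for m in PSP_FAILURE.finditer(log_text)
    ]


if __name__ == "__main__":
    # Two of the lines posted above, used as sample input.
    sample = (
        "16:39:10 archlinux kernel: amdgpu 0000:07:00.0: amdgpu: psp gfx command "
        "LOAD_TA(0x1) failed and response status is (0x7)\n"
        "16:39:10 archlinux kernel: amdgpu 0000:07:00.0: amdgpu: psp gfx command "
        "INVOKE_CMD(0x3) failed and response status is (0x4)\n"
    )
    for cmd, cmd_id, status in psp_failures(sample):
        print(f"{cmd} (id {cmd_id:#x}) -> status {status:#x}")
```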

EDIT: Oh, I see the edit about the glibc update.
EDIT1: It freezes with glibc 2.40 at GDM load.

Last edited by lpr1 (2025-02-03 16:05:41)

#453 2025-02-03 16:15:15

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

AMDGPU stable

glibc-2.41
Build: linux-amdgpu-stable-6.13.1.arch1-3, linux-amdgpu-stable-headers-6.13.1.arch1-3
Included patches:
- Remove 100ms delay from amdgpu_gfx_off_ctrl
- Add PG to amdgpu_gfx_enforce_isolation_ring_begin_use/end_use

GitHub

#454 2025-02-03 17:00:44

lpr1
Member
Registered: 2017-10-08
Posts: 109

Ok, this one loads fine, I guess I rolled back to the wrong 2.40 version (since there are multiple). Should I test this one?

#455 2025-02-03 17:02:00

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

lpr1 wrote:

Ok, this one loads fine, I guess I rolled back to the wrong 2.40 version (since there are multiple). Should I test this one?

Yes. This is just a rebuild of the stable patches on top of Linux 6.13.1, built with the new glibc 2.41+r2+g0a7c7a3e283a-1.

Last edited by Mechanicus (2025-02-03 17:02:48)

#456 2025-02-03 17:05:08

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

lpr1 wrote:

All 3 test kernels from the last post freeze the system instantly.

EDIT: Oh, I see the edit about the glibc update.
EDIT1: It freezes with glibc 2.40 at GDM load.

Just to clarify: you installed those kernels after today's system update (that comes with new glibc and gcc), or before that?

#457 2025-02-03 17:26:01

lpr1
Member
Registered: 2017-10-08
Posts: 109

Mechanicus wrote:

Just to clarify: you installed those kernels after today's system update (that comes with new glibc and gcc), or before that?

Probably, since I had to roll back to 2.40. I don't know when they were updated because I wasn't paying attention. I rolled back to glibc-2.40-1-x86_64.pkg.tar.zst (22-Jul-2024 16:52, 10M) from the archive, which is probably the wrong version, since there are newer ones (glibc-2.40+r16+gaa533d58ff-1-x86_64.pkg.tar.zst, 03-Aug-2024 16:49, 10M). linux-amdgpu-stable-6.13.1.arch1-3-x86_64 loads fine with glibc 2.41 and I'm testing it now. So far it's good, but we'll see over time.

#458 2025-02-03 17:29:35

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Well... I asked because I briefly tested all three builds on my Raven machine, and all of them seemed stable.

Last edited by Mechanicus (2025-02-03 19:55:08)

#459 2025-02-03 17:59:16

pacoandres
Member
Registered: 2020-03-05
Posts: 39

Mechanicus wrote:

Well... I asked because I tested a bit all three build on my Raven machine. And all of them seemed stable.

I tested the testing builds 1-3 and 1-4 before and after the glibc update (I noticed there was an update after the first test), and the system froze in less than one minute with both releases and with glibc versions 2.40 and 2.41.

Now I'm testing 6.13.1-arch1-3-amdgpu-stable and I think it's good.
Temperature and power are similar to the repo kernel with patched Mesa.

All tests have passed except the recovery test. I'll try that one when I stop working.

EDIT: I forgot to post the aquarium results (I'm using a 75 Hz monitor):

  • 5,000 fish: 75 fps, 38 W, 55 °C
  • 10,000 fish: 65 fps, 46 W, 66 °C
  • 15,000 fish: 45 fps, 46 W, 68 °C
  • 20,000 fish: 35 fps, 46 W, 68 °C
  • 25,000 fish: 30 fps, 46 W, 68 °C
  • 30,000 fish: 25 fps, 46 W, 68 °C
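
To compare runs between kernels, one option is to reduce each row to frames per watt. The sketch below is only an illustration in Python, assuming the wattage figures are the average package power requested in the test checklist; `fps_per_watt` is a hypothetical helper. Note that the 5,000 fish row is capped by the 75 Hz monitor, so its ratio understates what the GPU could deliver.

```python
# (fish count, fps, watts) exactly as posted in the results list
RESULTS = [
    (5_000, 75, 38),
    (10_000, 65, 46),
    (15_000, 45, 46),
    (20_000, 35, 46),
    (25_000, 30, 46),
    (30_000, 25, 46),
]


def fps_per_watt(fps: float, watts: float) -> float:
    """Frames per second delivered per watt of reported power."""
    return fps / watts


if __name__ == "__main__":
    for fish, fps, watts in RESULTS:
        print(f"{fish:>6} fish: {fps_per_watt(fps, watts):.2f} fps/W")
```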

Last edited by pacoandres (2025-02-03 18:12:34)

#460 2025-02-03 18:14:00

Sora
Member
Registered: 2025-01-15
Posts: 1

Sorry for the rough post, but I've been following this thread because my own AMD cards and iGPU have been crashing. Here is my experience.
I have come to the conclusion that it's AMD's dreaded "reset bug". The GFXOFF patches won't work because they re-enable GFXOFF, which can still trigger the "power off/low power" portion of the code and result in the bug.

I turned off GFXOFF in my BIOS (ASRock B650E Taichi [non-Lite]) and have had 100% stability since, on the currently released Arch kernel. Uptime is a little over 9 hours, which is 8 hours more than I could get in any previous attempt, and all of my previously reproducible ways of crashing the system now fail to trigger it. I've tried to crash my 7900XTX and my Raphael iGPU (7900X3D); neither has shown any instability since I turned this feature off.

The reason I've concluded it's the "reset bug" is that GFXOFF was added but turned off in ~2018, and we had no issues.
It was recently turned on in 6.8, ~8 months ago. I had roughly 4 months of stability before the crashing started, because I had not yet updated my BIOS; once I updated the BIOS, the crashes began. I bought a new PSU and even went as far as changing power outlets, thinking the power was just too dirty/unstable. It got better but was inconsistent. After a kernel update I kept getting random power-offs (resets, not full power-offs) or black screens, which were actually the GPU blacking out. These are exactly the same symptoms I had when trying to use a Windows 10 VM (black screen / host crashing, with AMD GPUs going back through POST). I bought a WX3100 work card from eBay and had the same results: constant instability (even with the iGPU disabled).

After disabling GFXOFF in the BIOS, all of the instability is gone and my own reproducible crash triggers no longer work. I've played games, watched YouTube, played about 5 videos in mpv at the same time (while skimming through 1+ hour long videos), and used CTRL + PAGEDOWN to switch tabs rapidly in Firefox, and the system hasn't frozen, crashed, or black-screened.

Linux aki 6.12.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 18 Jan 2025 02:26:57 +0000 x86_64 GNU/Linux
13:00:23 up  9:03,  1 user,  load average: 5.31, 5.12, 5.46

Source for my own thoughts:
https://www.phoronix.com/search/GFXOFF
2018 - Default Off, Tested and caused issues
2024-01-19 - Enabled, crashing/instability.

*Edited the uptime, because I had misread the number.

Last edited by Sora (2025-02-03 18:22:04)

#461 2025-02-03 18:57:21

NuSkool
Member
Registered: 2015-03-23
Posts: 205

Mechanicus wrote:

Well... I asked because I tested a bit all three build on my Raven machine. And all of them seemed stable.

Do you have a Vega machine for testing?

I'm testing on a refurb 'HP EliteDesk 705 G4', which only costs a little over $100.00 atm.
They're readily available on Amazon for example:  https://www.amazon.com/HP-EliteDesk-Min … NB7RD?th=1

Would having a Vega-based system streamline testing at this point? Would you like one for testing, and would it be worth it at this stage?

Perhaps people could donate to get you one for free?
I'd be willing to throw in ~$25 to get you set up if 4-5 more people would step in.

Anyone else?

#462 2025-02-03 19:37:59

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

NuSkool wrote:
Mechanicus wrote:

Well... I asked because I tested a bit all three build on my Raven machine. And all of them seemed stable.

Do you have a Vega machine for testing?

Anyone else?

I have an Athlon 200GE with Vega 3. I run some tests on it while it is not in use. I also think it is better to cover a wider range of AMD chips and not focus on Vega only.
Why? Because the fix proposed for the kernel is GFX9-only, but the instability issues are not limited to the Vega family.
My goal is to eliminate the root cause, not the consequences.

Last edited by Mechanicus (2025-02-03 20:17:34)

#463 2025-02-03 19:39:03

pacoandres
Member
Registered: 2020-03-05
Posts: 39

The recovery test passed: it takes less than a second to recover and everything works. After this test I repeated it a few more times and everything works without freezes.
A more detailed description with logs is here.

Last edited by pacoandres (2025-02-03 19:40:03)

#464 2025-02-03 19:46:47

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

AMDGPU testing

glibc-2.41

Build: linux-amdgpu-testing-6.13.1.arch1-7, linux-amdgpu-testing-headers-6.13.1.arch1-7
Included patches:
- Remove 100ms delay from amdgpu_gfx_off_ctrl
- Deny GFXOFF in amdgpu_ring_commit for GFX IP (with PG workaround)

GitHub

What to check:
- stability
If stable:
- temperature
- average package power (sudo turbostat -s PkgWatt)
- performance - for example, WebGL Aquarium fps for different numbers of fish
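
For the "average package power" item, the samples printed by the turbostat command above can be averaged after the run. This is a rough Python sketch, assuming the output was saved to a text file with one numeric PkgWatt sample per line and with header lines repeated in between; the exact layout may differ between turbostat versions, and `average_pkg_watt` and the sample values are hypothetical.

```python
def average_pkg_watt(turbostat_output: str) -> float:
    """Average every line that parses as a number; skip headers and blank lines."""
    samples = []
    for line in turbostat_output.splitlines():
        try:
            samples.append(float(line.strip()))
        except ValueError:
            continue  # e.g. the "PkgWatt" column header or an empty line
    if not samples:
        raise ValueError("no numeric samples found")
    return sum(samples) / len(samples)


if __name__ == "__main__":
    # Made-up sample capture, only to show the expected input shape.
    captured = "PkgWatt\n12.50\n13.10\nPkgWatt\n12.90\n"
    print(f"average package power: {average_pkg_watt(captured):.2f} W")
```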

Kernel option to keep during testing period: fsck.mode=force

laikm from pacoandres for reporting issues

Last edited by Mechanicus (2025-02-04 13:53:40)

#465 2025-02-03 20:06:56

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

NuSkool wrote:

Perhaps people could donate to get you one for free?
I'd be willing to throw in ~$25 to get you setup if 4-5 more people would step in.

Anyone else?

This proposal is very kind of you, but Amazon doesn't ship anything to Ukraine directly. Through this "black hole" only our US managers can launder billions.

#466 2025-02-03 22:13:27

lpr1
Member
Registered: 2017-10-08
Posts: 109

Mechanicus wrote:

Well... I asked because I tested a bit all three builds on my Raven machine. And all of them seemed stable.

Got it. Yes, I got an instant freeze on all 3 kernels, and it seems unrelated to the glibc version, as @pacoandres experienced a similar thing. I also don't think this is related to the workarounds anyway, unless those patches are relatively recent ones, but I don't know. I see new builds, so what are we supposed to test now?

#467 2025-02-03 22:51:06

Horo86
Member
Registered: 2025-01-25
Posts: 2

NuSkool wrote:

I'm testing on a refurb 'HP EliteDesk 705 G4', which only costs a little over $100.00 atm.

I have exactly this same machine with a Ryzen 2400G (Vega 11). Maybe I can help in some way?
The problem is that I don't have a lot of time, since I'm away from home almost all day...

#468 2025-02-03 23:55:28

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

lpr1 wrote:

Got it, ye, I got instant freeze on all 3 kernels and it seems it's not related to the glibc version as @pacoandres experienced similar thing. I also do not think this is related to workarounds anyway, unless those patches are relatively recent ones, but I don't know. I see new builds, so what are we supposed to test now?

linux-amdgpu-testing-6.13.1.arch1-7 and -8 are aimed at finding the actual place where the problem starts.

Last edited by Mechanicus (2025-02-03 23:59:09)

#469 2025-02-04 03:30:19

NuSkool
Member
Registered: 2015-03-23
Posts: 205

I was all ready to write up a 28hr report on linux-amdgpu-stable 6.13.1.arch1-2
Unfortunately, I've been wasting all this time retesting linux-amdgpu 6.13.arch1-2!
A 'uname -rs' saved me from giving a false report...

I'll move on to testing linux-amdgpu-stable 6.13.1.arch1-2 unless I hear otherwise.
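
The mix-up above is easy to guard against by checking the running kernel before starting a long test. A minimal Python sketch, assuming the test kernel's release string ends with a recognizable suffix such as "-amdgpu-stable" (as in the uname output in this thread); `is_expected_kernel` is a hypothetical helper name.

```python
import platform


def is_expected_kernel(release: str, expected_suffix: str) -> bool:
    """True if the kernel release string ends with the suffix of the build under test."""
    return release.endswith(expected_suffix)


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    expected = "-amdgpu-stable"   # assumption: suffix of the build you mean to test
    if is_expected_kernel(running, expected):
        print(f"OK: running {running}")
    else:
        print(f"WARNING: running {running}, expected a kernel ending in {expected}")
```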

EDIT:
Preliminary report on:

pacman -Q linux-amdgpu-stable : linux-amdgpu-stable 6.13.1.arch1-2
uname -rs                     : Linux 6.13.1-arch1-2-amdgpu-stable
pacman -Q glibc               : glibc 2.40+r66+g7d4b6bcae91f-1
cat /proc/cmdline             : .... rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force

Test (Chromium)                 fps    Temp    Power
Idle desktop w/ Chromium          -    129F    2.62W
WebGL Aquarium,  5000 fish       60    154F   13.12W
WebGL Aquarium, 10000 fish       60    159F   17.39W
WebGL Aquarium, 20000 fish       40    162F   17.19W
WebGL Aquarium, 30000 fish       31    169F   17.95W
Note: Firefox: ~8-10 fps less
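
Since other results in this thread are reported in °C, the Fahrenheit readings above can be converted for comparison. A short Python sketch using the standard conversion formula; the readings are the ones posted above and `fahrenheit_to_celsius` is just an illustrative helper.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


if __name__ == "__main__":
    # Temperatures from the report above, in Fahrenheit.
    readings_f = {
        "idle": 129,
        "5000 fish": 154,
        "10000 fish": 159,
        "20000 fish": 162,
        "30000 fish": 169,
    }
    for label, temp_f in readings_f.items():
        print(f"{label}: {temp_f} F = {fahrenheit_to_celsius(temp_f):.1f} C")
```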

I'm only about an hour into this test. It seems stable enough to do some testing, but I'll add more results in about 24 hours.

Last edited by NuSkool (2025-02-04 05:05:15)

#470 2025-02-04 07:32:23

pacoandres
Member
Registered: 2020-03-05
Posts: 39

@Sora, have you tried any of the kernel or Mesa patches suggested in this thread?
The point is that not everyone can disable GFXOFF in the BIOS, because not every BIOS offers that option (mine doesn't; I'm using an Asus B450M-A). Some of the builds posted here bring stability to some of us.

#471 2025-02-04 07:36:36

pacoandres
Member
Registered: 2020-03-05
Posts: 39

I don't know if anyone is testing linux-amdgpu-testing-6.13.1.arch1-7. I'm going to do it now.

#472 2025-02-04 08:22:42

pacoandres
Member
Registered: 2020-03-05
Posts: 39

I've copied @NuSkool's preliminary results (post #469) into https://github.com/pacoandres/laikm/iss … 2633179337 so that everything is kept together.

#473 2025-02-04 10:37:21

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Thank you, @NuSkool! I just want to clarify that the stable builds are for normal use in case a testing kernel crashes. linux-amdgpu-stable-6.13.1.arch1-2 and linux-amdgpu-stable-6.13.1.arch1-3 are the same; the only difference is the glibc version they are compiled with. So you can skip -2, update your system to the latest packages, and use -3.

#474 2025-02-04 10:47:28

lpr1
Member
Registered: 2017-10-08
Posts: 109

Mechanicus wrote:

linux-amdgpu-testing-6.13.1.arch1-7 and 8 aimed to find the actual place where the problem starts.

Alright, but they/you are moving too fast: 2-3 days are required to be sure whether or not there is an issue, so I'll keep testing 1-2 for now, for at least 2 more days.

#475 2025-02-04 11:09:19

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

lpr1 wrote:

Alright, but they/you are moving too fast, 2-3 days is required to assure there's no issue or if there it is one, so I keep testing 1-2 for now, at least 2 more days.

Well... probably. This is my typical speed of problem resolution, sorry.
