#1 2019-12-13 03:35:11

faceyneck
Member
Registered: 2017-12-16
Posts: 44

[SOLVED] Errata Conspiracy? AMD R9 290X Problems

I'll try to give as much useful information as I can.

Hardware information:

rj@archlinux
------------
OS: Arch Linux x86_64
Host: All Series
Kernel: 5.4.2-zen1-1-zen
Uptime: 28 mins
Packages: 1262 (pacman)
Shell: bash 5.0.11
DE: Plasma
WM: KWin
Theme: Breeze-Dark [GTK2/3]
Icons: breeze-dark [GTK2/3]
Terminal: yakuake
CPU: Intel i7-4790K (8) @ 4.400GHz
GPU: AMD ATI Radeon R9 290X/390X
Memory: 1953MiB / 15950MiB

Main problem: the graphics card is clearly struggling, whether or not I'm gaming. Pressing Page Up or Page Down in a web browser makes the fan ramp up like crazy. When I tried to run Quake Champions, the screen went black with the fan pegged at 100% until I forced a shutdown, WITHOUT THE GRAPHICS CARD BEING PARTICULARLY HOT. I can easily touch it and leave my hand there.

General computer usage is BARELY okay, not great. Watching a YouTube video often causes the fan to ramp up. Although it hasn't had this exact same crash without playing video games, the card is clearly working very hard just to maintain the desktop.

Graphics drivers are mesa-git.

I'm not 100% sure what I should be looking for in journalctl, but these seem to be the important entries I noticed from around the time of the crashes:

WALLS of this:

Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 41189, resource id: 100663297, major code: 15 (QueryTree), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 41194, resource id: 100663297, major code: 18 (ChangeProperty), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 41339, resource id: 39848706, major code: 3 (GetWindowAttributes), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 41343, resource id: 39848707, major code: 3 (GetWindowAttributes), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 41344, resource id: 39848707, major code: 14 (GetGeometry), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 41347, resource id: 39848708, major code: 3 (GetWindowAttributes), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 41348, resource id: 39848708, major code: 14 (GetGeometry), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 41351, resource id: 39848709, major code: 3 (GetWindowAttributes), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 41352, resource id: 39848709, major code: 14 (GetGeometry), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 41355, resource id: 39848710, major code: 3 (GetWindowAttributes), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 41356, resource id: 39848710, major code: 14 (GetGeometry), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 41359, resource id: 39848711, major code: 3 (GetWindowAttributes), minor code: 0
Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 41360, resource id: 39848711, major code: 14 (GetGeometry), minor code: 0



Dec 12 18:56:55 archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=49063, emitted seq=49065
Dec 12 18:56:55 archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process QuakeChampions. pid 3201 thread QuakeChamp:cs0 pid 3218
Dec 12 18:56:55 archlinux kernel: amdgpu 0000:01:00.0: GPU reset begin!
Dec 12 18:57:11 archlinux kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Dec 12 18:57:11 archlinux kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
Dec 12 18:57:21 archlinux kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Dec 12 18:57:31 archlinux kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:45:plane-5] flip_done timed out
Dec 12 18:57:36 archlinux kernel: amdgpu: [powerplay] VI should always have 2 performance levels
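
For anyone sifting through a similar journal dump, here is a minimal sketch that picks out the amdgpu ring timeouts and ignores the kwin_x11 noise. The function name `find_ring_timeouts` is made up for illustration, and the regular expression assumes the message format shown in the lines above; other kernel versions may word it differently.

```python
import re

# Matches the "ring <name> timeout" message as it appears in the log above.
# The pattern is an assumption based on those lines only.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(journal_lines):
    """Return one dict per amdgpu ring timeout found in the given lines."""
    hits = []
    for line in journal_lines:
        if "amdgpu" not in line:
            continue  # skip kwin_x11 and other unrelated entries
        match = RING_TIMEOUT.search(line)
        if match:
            hits.append({
                "ring": match.group("ring"),
                "signaled": int(match.group("signaled")),
                "emitted": int(match.group("emitted")),
            })
    return hits


# Two lines copied from the journal output quoted in this post.
sample = [
    "Dec 12 18:55:15 archlinux kwin_x11[793]: qt.qpa.xcb: QXcbConnection: "
    "XCB error: 3 (BadWindow), sequence: 41189, resource id: 100663297, "
    "major code: 15 (QueryTree), minor code: 0",
    "Dec 12 18:56:55 archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] "
    "*ERROR* ring gfx timeout, signaled seq=49063, emitted seq=49065",
]

for hit in find_ring_timeouts(sample):
    outstanding = hit["emitted"] - hit["signaled"]
    print(f"ring {hit['ring']}: {outstanding} fence(s) emitted but not signaled")
```

To run it against a real log, the lines could come from something like `journalctl -k -b -1` (kernel messages from the previous boot) saved to a file and read with `open(...).readlines()`.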

...and lastly, this error code, which might be the inspiration for the Scary Pasta:

Dec 08 23:40:21 archlinux kernel: [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x22 (or later)
Dec 10 01:31:40 archlinux kernel: [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x22 (or later)
Dec 10 01:32:21 archlinux kernel: [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x22 (or later)
Dec 10 01:42:33 archlinux kernel: [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x22 (or later)
Dec 11 19:50:35 archlinux kernel: [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x22 (or later)
Dec 11 22:44:18 archlinux kernel: [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x22 (or later)
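
The message asks for microcode revision 0x22 or later. As a rough sketch, the running revision can be compared against that number by reading the `microcode` field that x86 kernels expose in /proc/cpuinfo. The helper names below are hypothetical, and the sample text is invented for the example rather than taken from this machine.

```python
import re


def parse_microcode_revision(cpuinfo_text):
    """Return the first 'microcode' revision in /proc/cpuinfo text as an int.

    Returns None if no such field is present.
    """
    match = re.search(
        r"^microcode\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, re.MULTILINE
    )
    return int(match.group(1), 16) if match else None


def microcode_is_current(cpuinfo_text, minimum=0x22):
    """True if the reported revision is at least `minimum` (0x22 per the log)."""
    revision = parse_microcode_revision(cpuinfo_text)
    return revision is not None and revision >= minimum


# Invented sample; on a real system use open("/proc/cpuinfo").read() instead.
sample_cpuinfo = "processor\t: 0\nmodel name\t: example CPU\nmicrocode\t: 0x19\n"

revision = parse_microcode_revision(sample_cpuinfo)
print(f"revision {revision:#x}, new enough: {microcode_is_current(sample_cpuinfo)}")
```

If the revision is still below 0x22 after setting up early loading, the update most likely isn't being applied at boot.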

What I've tried:

Set up various kernel parameters, such as disabling the radeon driver via radeon.cik_support=0, radeon.si_support=0, etc., or the inverse (force-enabling it).
Kernel parameters to make sure amdgpu is running: amdgpu.cik_support=1 etc., or the inverse to disable it.

The failures happen whether DPM is enabled or disabled via kernel parameter. The same goes for amdgpu.dc, although I don't know what that does.
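
With this many switches flipped back and forth, it can help to check what the running kernel was actually booted with. Below is a minimal sketch that parses /proc/cmdline-style text and reports which driver the two cik_support switches ask for. The R9 290X is a Hawaii chip, which falls under the CIK switches, so si_support should not matter for this card. The function names and the sample command line are made up for illustration, and the simple whitespace split assumes no quoted values containing spaces.

```python
def parse_cmdline(cmdline_text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline_text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def requested_cik_driver(params):
    """Report which driver the cik_support switches request for CIK cards."""
    radeon = params.get("radeon.cik_support")
    amdgpu = params.get("amdgpu.cik_support")
    if amdgpu == "1" and radeon == "0":
        return "amdgpu"
    if radeon == "1" and amdgpu == "0":
        return "radeon"
    return "driver default"  # switches missing or inconsistent


# Hypothetical command line; on a real system read /proc/cmdline instead.
sample = "root=/dev/mapper/vg-root rw radeon.cik_support=0 amdgpu.cik_support=1"
print(requested_cik_driver(parse_cmdline(sample)))
```

This only shows what was requested. The "Kernel driver in use" line from `lspci -k` shows what actually bound to the card; the journal lines above (`amdgpu 0000:01:00.0: GPU reset begin!`) suggest amdgpu was in use at the time of that crash.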


I had been using Debian on this exact same setup, and played many games for many hours on there. I am HOPING there's some config somewhere that I've messed up, because I don't see how Debian would somehow avoid excessive GPU usage.

Lastly, I had the same problem with Arch about a year ago, so I'm fairly certain there's something odd about how this card behaves with the standard configuration from the Installation Guide.

Thanks for reading. Sorry if there's a bunch of useless information here.

Last edited by faceyneck (2019-12-17 07:30:05)

#2 2019-12-13 11:46:35

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 15,102

Re: [SOLVED] Errata Conspiracy? AMD R9 290X Problems

Do you have early microcode loading configured?


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.

clean chroot building not flexible enough ?
Try clean chroot manager by graysky

#3 2019-12-13 17:55:33

faceyneck
Member
Registered: 2017-12-16
Posts: 44

Re: [SOLVED] Errata Conspiracy? AMD R9 290X Problems

Lone_Wolf wrote:

Do you have early microcode loading configured?

Not yet. I'll set that up now, but I need to enlarge my /boot partition, which isn't part of my LVM, so it might be a while.


I'm watching my GPU load and temps, and the card is sitting at a solid 95-96C on the desktop while I write this message. Do you think CPU microcode could be responsible for this?

EDIT: To elaborate, the temperature never drops below 95C, even at 0-6% load. Then the load shoots up to 95% or higher, the fans roar up, and it drops right back off. This happens even when I'm doing nothing on the computer at all, just with the monitors on.
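
For logging readings like these without a GUI monitor, here is a minimal sketch that reads the GPU temperature from the kernel's hwmon interface, where `temp*_input` files hold millidegrees Celsius. The `card0` name and the function names are assumptions for the example; the card index and hwmon path can differ between systems.

```python
import glob


def read_temp_celsius(path):
    """Read a hwmon temp*_input file (millidegrees Celsius) as degrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def gpu_temp_files(card="card0"):
    """List candidate temperature files for the given DRM card (assumed name)."""
    pattern = f"/sys/class/drm/{card}/device/hwmon/hwmon*/temp1_input"
    return sorted(glob.glob(pattern))


# Prints nothing if the assumed path does not exist on this system.
for path in gpu_temp_files():
    print(f"{path}: {read_temp_celsius(path):.1f} C")
```

Running this in a loop with a short sleep while the desktop is idle would show whether the temperature is really pinned near its limit regardless of load.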

I'll report back once I get the early microcode loading taken care of.

It's somehow gotten worse since last night.

Last edited by faceyneck (2019-12-13 18:00:43)

#4 2019-12-13 20:01:13

Ropid
Member
Registered: 2015-03-09
Posts: 1,069

Re: [SOLVED] Errata Conspiracy? AMD R9 290X Problems

The last person I saw here with temperature problems on a 290 card turned out to have bad thermal paste on the card, probably because of age. Things were fine after replacing the paste.

If you don't know what this is about, search for photos or video about how to disassemble the exact model of graphics card you are using.

If you decide to try to work on this, I feel the main thing to look out for is to not destroy the thermal pads for the memory and voltage regulator chips. You will want to keep the original thermal pads. They are annoying to replace because the height has to fit exactly.

#5 2019-12-14 12:13:02

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 15,102

Re: [SOLVED] Errata Conspiracy? AMD R9 290X Problems

The errata concern the processor, not the GPU, and are normally resolved by UEFI/BIOS firmware or microcode updates.
You can also check whether you have the latest firmware for your board.

Bad thermal paste is worth looking into.


#6 2019-12-15 23:22:30

faceyneck
Member
Registered: 2017-12-16
Posts: 44

Re: [SOLVED] Errata Conspiracy? AMD R9 290X Problems

Ropid wrote:

The last person I saw here with temperature problems on a 290 card turned out to have bad thermal paste on the card, probably because of age. Things were fine after replacing the paste.

If you don't know what this is about, search for photos or video about how to disassemble the exact model of graphics card you are using.

If you decide to try to work on this, I feel the main thing to look out for is to not destroy the thermal pads for the memory and voltage regulator chips. You will want to keep the original thermal pads. They are annoying to replace because the height has to fit exactly.


Thanks for the recommendation.

Yeah, I've been building computers for a while. Now I just hope that I actually have some thermal paste lying around. I'll give it a go. Thanks again.

#7 2019-12-15 23:26:16

faceyneck
Member
Registered: 2017-12-16
Posts: 44

Re: [SOLVED] Errata Conspiracy? AMD R9 290X Problems

Lone_Wolf wrote:

The errata concern the processor, not the GPU, and are normally resolved by UEFI/BIOS firmware or microcode updates.
You can also check whether you have the latest firmware for your board.

Bad thermal paste is worth looking into.


I updated the firmware and set up early microcode loading, running mkinitcpio -p for all 3 of my kernels (Zen, Linux and Linux-LTS).

I just tried playing Half-Life 2. It worked for about 5 minutes and then crashed. That's a better result than I had previously.

I'll go ahead and redo the thermal paste on the card, and report back with the outcome.

I'm pretty sure updating the microcode helped, as I'm writing this reply on the machine and the graphics card isn't getting maxed out at random intervals like it used to.

Come to think of it; the screws had gotten loose on the card, so I took it out to tighten them up. That's when this problem started.

#8 2019-12-17 07:29:05

faceyneck
Member
Registered: 2017-12-16
Posts: 44

Re: [SOLVED] Errata Conspiracy? AMD R9 290X Problems

Alright, thanks everyone.

I replaced the thermal paste, and idle temp went down over 30 DEGREES CELSIUS! lol

I'm monitoring the GPU at idle right now, and it's down to 42-45C. I'm gonna go ahead and mark this one solved.

Thanks again.

TL;DR - Replace thermal paste on your GPU.
