#1 2023-02-21 21:10:19

Hello_324
Member
Registered: 2023-02-21
Posts: 6

[SOLVED] Figuring out MCE errors

Hiyas! A few days ago my computer suddenly crashed, after which I could no longer boot properly. At the point where the screen would normally switch to the right resolution, it just goes black instead. The same thing happens with the latest version of the Arch installer. I found that using the nomodeset kernel parameter lets you get past that, but sadly doesn't fix anything. The error message I usually get is

Feb 19 19:04:30 archlinux kernel: mce: [Hardware Error]: Machine check events logged
Feb 19 19:04:30 archlinux kernel: mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 5: b6a0000001000108
Feb 19 19:04:30 archlinux kernel: mce: [Hardware Error]: TSC 0 ADDR fff80445a52ef6 SYND 4d000000 IPID 500b000000000 
Feb 19 19:04:30 archlinux kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1676829720 SOCKET 0 APIC 8 microcode a201009
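
For anyone reading along, the status word in the "Bank 5" line can be picked apart by hand. Below is a minimal sketch, assuming only the architectural flag bits 63 to 57 and the low 16-bit error code; the helper name decode_status is made up for illustration, and the vendor-specific fields in between are deliberately left alone.

```python
# Architectural flag bits in the upper part of a machine-check status word.
# Only bits 63..57 are handled here; vendor-specific fields are ignored.
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}


def decode_status(status):
    """Return (list of set flag names, low 16-bit error code). Hypothetical helper."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return flags, status & 0xFFFF


flags, code = decode_status(0xB6A0000001000108)
print(flags, hex(code))
```

For the value logged here, this reports an uncorrected error (UC) with a valid address (ADDRV) and possibly corrupt processor context (PCC). It does not say which component is at fault.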

Trying it again now after a few days also gets me

Feb 20 20:57:36 archlinux kernel: AMD-Vi: Completion-Wait loop timed out
Feb 20 20:57:36 archlinux kernel: AMD-Vi: Completion-Wait loop timed out
Feb 20 20:57:36 archlinux kernel: AMD-Vi: Completion-Wait loop timed out
Feb 20 20:57:36 archlinux kernel: pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
Feb 20 20:57:36 archlinux kernel: AMD-Vi: Extended features (0x58f77ef22294ade, 0x0): PPR X2APIC NX GT IA GA PC GA_vAPIC
Feb 20 20:57:36 archlinux kernel: AMD-Vi: Interrupt remapping enabled
Feb 20 20:57:36 archlinux kernel: AMD-Vi: X2APIC enabled
Feb 20 20:57:36 archlinux kernel: AMD-Vi: Virtual APIC enabled
Feb 20 20:57:36 archlinux kernel: PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Feb 20 20:57:36 archlinux kernel: software IO TLB: mapped [mem 0x00000000d7147000-0x00000000db147000] (64MB)
Feb 20 20:57:36 archlinux kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:04:00.0 address=0x100218650]
Feb 20 20:57:36 archlinux kernel: LVT offset 0 assigned for vector 0x400
Feb 20 20:57:36 archlinux kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:04:00.0 address=0x100218670]

Turning off the IOMMU removes those messages, but sadly doesn't fix anything.
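
To see which PCI device the IOMMU events point at, the "Event logged" lines can be filtered out of the journal text. This is a rough sketch, assuming the line format shown in the log above; the function name parse_iommu_events is hypothetical.

```python
import re

# Matches the bracketed payload of an AMD-Vi "Event logged" kernel line,
# assuming the format seen in the log excerpt above.
EVENT_RE = re.compile(
    r"AMD-Vi: Event logged \[(?P<event>\w+) "
    r"device=(?P<device>[0-9a-f:.]+) "
    r"address=(?P<address>0x[0-9a-f]+)\]"
)


def parse_iommu_events(log_text):
    """Return (event, device, address) tuples for every AMD-Vi event line."""
    return [
        (m["event"], m["device"], int(m["address"], 16))
        for m in EVENT_RE.finditer(log_text)
    ]


sample = """\
Feb 20 20:57:36 archlinux kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:04:00.0 address=0x100218650]
Feb 20 20:57:36 archlinux kernel: LVT offset 0 assigned for vector 0x400
Feb 20 20:57:36 archlinux kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:04:00.0 address=0x100218670]
"""

for event, device, address in parse_iommu_events(sample):
    print(event, device, hex(address))
```

The device address it prints (0000:04:00.0 here) can then be looked up with lspci -s 04:00.0 to find out which piece of hardware timed out.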

I tried changing GPUs from my R9 270 to an old Radeon HD 5750, which got rid of all these problems, making me think it was a GPU issue. However, I managed to get a replacement R9 270, which has the exact same issues, making me think the GPU might not have been the cause after all. I've already tried Memtest, and since the working GPU only uses 1 PCIe power cable, I've also tested both in single use and the non-working GPUs with non-essential stuff disconnected from the PSU, to no avail.
Is there a way to go about figuring out what might be the problem here? I tried the tools from https://wiki.archlinux.org/title/Machin … _exception , but sadly the errors seem to occur before rasdaemon.service gets started.

Last edited by Hello_324 (2023-03-07 12:43:23)

#2 2023-02-21 22:11:17

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,717

changing GPUs from my R9 270 to an old Radeon HD 5750, which got rid of all these problems

R9 270 is Curacao or Pitcairn, according to Wikipedia (I'm not learning geography) a Southern Islands chip (while the HD 5750 is much older)

=> Do you use it with the radeon or the amdgpu driver?
https://wiki.archlinux.org/title/AMDGPU … K)_support
And does that have an impact on the situation?

#3 2023-02-21 22:31:48

Hello_324
Member
Registered: 2023-02-21
Posts: 6

seth wrote:

=> Do you use it with the radeon or the amdgpu driver?

Yes, I already had it enabled a long time ago. The GPU used to work just fine until the crash.

#4 2023-02-21 22:46:37

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,717

And does it work w/ the radeon module?

#5 2023-02-21 22:59:20

Hello_324
Member
Registered: 2023-02-21
Posts: 6

seth wrote:

And does it work w/ the radeon module?

Sorry if I misunderstand things, but do you want me to change

options amdgpu si_support=1
options amdgpu cik_support=1
options radeon si_support=0
options radeon cik_support=0

to

options amdgpu si_support=0
options amdgpu cik_support=0
options radeon si_support=1
options radeon cik_support=1

?
The former is what I was using before (and still have on), and it used to work. Running

lspci -k | grep -A 3 -E "(VGA|3D)"

with the 5750 gives me

	Kernel driver in use: radeon
	Kernel modules: radeon, amdgpu

Also sorry I completely forgot, but I'm dualbooting on this system, and Windows 10 also crashes after the Windows logo where it would usually adjust the resolution.

#6 2023-02-21 23:43:20

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,717

Sorry if I misunderstand things, but do you want me to…

Yes.

However

I'm dualbooting on this system, and Windows 10 also crashes after the Windows logo

3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot Windows and Linux twice for voodoo reasons.

But this rather suggests the GPU is broken and/or underpowered. Did you change its PCIe slot?
Did you forget to connect a dedicated 6/8-pin power connector?

#7 2023-02-22 21:10:39

Hello_324
Member
Registered: 2023-02-21
Posts: 6

seth wrote:

Sorry if I misunderstand things, but do you want me to…

Yes.

However

I'm dualbooting on this system, and Windows 10 also crashes after the Windows logo

3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot Windows and Linux twice for voodoo reasons.

But this rather suggests the GPU is broken and/or underpowered. Did you change its PCIe slot?
Did you forget to connect a dedicated 6/8-pin power connector?

Tried it, sadly it didn't work.
I already had fast-start disabled and also tried the 2nd PCIe slot, to no avail. All power connectors are plugged in; plugging in only 1 of the 2 4pins gives no output at all. I'm going to try again with a new PSU that hopefully arrives in a few days.

#8 2023-02-25 17:55:21

Hello_324
Member
Registered: 2023-02-21
Posts: 6

With a new PSU I actually managed to get past the black screen on the Arch install CD on the first try, with the following error/warning at the start:

[drm:uvd_v1_0_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
radeon: 0000:2b:00.0: failed initializing UVD (-1).

After I then tried to boot properly, the old problems came back, even with the Arch install CD. Is there any way to find out what exactly is causing it without swapping out hardware one piece at a time? Having had the same issue with another GPU of the same model, and with the GPUs working under nomodeset, I think it might not be a GPU error either.

#9 2023-02-25 21:25:45

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,717

Windows 10 also crashes

changing GPUs from my R9 270 to an old Radeon HD 5750, which got rid of all these problems

The Windows crash means it's the HW; the GPU swap means it's directly related to the GPU (model).
PSU (power) or GPU (chip) or board (bus) - I don't really see what else could interfere.

#10 2023-02-26 00:50:45

Hello_324
Member
Registered: 2023-02-21
Posts: 6

seth wrote:

PSU (power) or GPU (chip) or board (bus) - I don't really see what else could interfere.

I thought so too, but trying both a new PSU and another GPU of the same model gave me the same errors. Maybe I got unlucky with the 2nd GPU. I guess I'll try again with a different brand once I can get my hands on one.

Edit:
Sorry, it indeed seems to have been the GPU. Trying the same GPU (with the same PSU) on a different motherboard + CPU brings about the same problem. Guess I really got unlucky with the 2nd GPU.

Last edited by Hello_324 (2023-02-26 01:45:42)

#11 2023-02-26 07:58:43

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,717

seth wrote:

And does it work w/ the radeon module?

Edit: sorry, forgot about the windows situation.

Both 270s are a decade old…

Please always remember to mark resolved threads by editing your initial post's subject - so others will know that there's no task left, but maybe a solution to find.
Thanks.

Last edited by seth (2023-02-26 08:00:47)
