#1 2024-04-28 16:22:49

TyraVex
Member
Registered: 2024-04-28
Posts: 2

System Crashes with "Hardware Error" over multi-core burst load

Hello everyone,

I am encountering a persistent and troubling issue with my system and I was hoping to get some insights or potential solutions from the community here. Here is my setup:

- **Motherboard**: ASUS TUF B450-Plus Gaming
- **CPU**: AMD Ryzen 9 5950X
- **RAM**: 4x32 GB DDR4 3200 MHz
- **OS**: Arch Linux x86_64 (tested on both standard and zen kernels)

### Problem Description:

Whenever I run tasks that load all CPU cores heavily (machine learning or encoding workflows, for instance), my system crashes instantly. The crash is accompanied by a "Hardware Error" message in the logs. Importantly, the error names a different CPU core or thread as the failure point each time it occurs.

**Error**:
```
[ 1.066295] [Hardware Error]: System Fatal error.
[ 1.066301] [Hardware Error]: CPU:28 (19:21:2) MC6_STATUS[-|UE|MiscV|-|PCC|TCC|SyndV|-|-|-]: 0xbaa0000000000118
[ 1.066312] [Hardware Error]: IPID: 0x00060b0a00000000, Syndrome: 0x00000004d000000
[ 1.066317] Hardware Error]: Floating Point Unit Ext. Error Code: 0
[ 1.066318] Hardware Error]: cache level: RESV, tx: GEN, mem-tx: RD
```
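For anyone reading the log above, the flags in the `MC6_STATUS[...]` line can be cross-checked against the raw status word. The following is a minimal sketch, not a full decoder: the function name `decode_mca_status` is made up for illustration, and the bit positions and the 6-bit extended error code field are taken from the Linux kernel's `MCI_STATUS_*` definitions for AMD systems as I understand them, so treat them as assumptions to verify against the kernel source or AMD's documentation.

```python
# Bit positions assumed from the Linux kernel's MCI_STATUS_* definitions (AMD).
STATUS_BITS = {
    "Val": 63,    # register contains a valid error
    "Over": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error (printed as "UE" in the log)
    "En": 60,     # error reporting was enabled for this error
    "MiscV": 59,  # MISC register is valid
    "AddrV": 58,  # ADDR register is valid
    "PCC": 57,    # processor context corrupt
    "TCC": 55,    # task context corrupt
    "SyndV": 53,  # syndrome register is valid
}


def decode_mca_status(status: int) -> dict:
    """Split a 64-bit MCA status word into named flags and error codes."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in STATUS_BITS.items()}
    # Assumption: on this CPU family the extended error code is bits 21:16.
    decoded["ext_error_code"] = (status >> 16) & 0x3F
    # The low 16 bits hold the generic error code.
    decoded["error_code"] = status & 0xFFFF
    return decoded


if __name__ == "__main__":
    for name, value in decode_mca_status(0xBAA0000000000118).items():
        print(f"{name}: {value}")
```

For the value in the log, this reports `UC`, `PCC`, `TCC` and `SyndV` as set, `Over` and `AddrV` as clear, and an extended error code of 0, which agrees with what the kernel printed.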

### Steps Taken So Far:

1. **Kernel Tests**: I've tried both the regular Linux kernel and the Linux Zen kernel, and the issue persists identically on both.

2. **Memory Test**: Using Memtest86+ v6.20, I checked the RAM for errors but all tests passed without any issues.

3. **Hardware Monitoring**: Temperatures and voltages are all within normal ranges before the crashes occur (the system doesn't have time to overheat anyway).

4. **Updated BIOS**: My BIOS is up to date and officially supports my CPU and RAM configuration.
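To re-test after each of the steps above, it helps to have a repeatable way of loading every core at once. Below is a rough sketch that starts one busy-looping Python process per logical CPU for a few seconds. The names `burst` and `SPIN_SNIPPET` are hypothetical, and a pure-Python loop is an assumption on my part: it is much lighter than a real encode or machine learning job, so it may not trigger the crash even on an affected system.

```python
import os
import subprocess
import sys

# Code run by each worker: spin on floating-point math until the deadline.
SPIN_SNIPPET = (
    "import sys, time\n"
    "end = time.monotonic() + float(sys.argv[1])\n"
    "x = 1.0001\n"
    "while time.monotonic() < end:\n"
    "    x = (x * x) % 1e9 + 1.0001\n"
)


def burst(seconds: float, workers=None) -> list:
    """Start `workers` busy processes at once; return their exit codes."""
    workers = workers or os.cpu_count() or 1
    procs = [
        subprocess.Popen([sys.executable, "-c", SPIN_SNIPPET, str(seconds)])
        for _ in range(workers)
    ]
    # Wait for every worker to finish its burst.
    return [p.wait() for p in procs]


if __name__ == "__main__":
    codes = burst(seconds=5.0)
    print(f"{len(codes)} workers finished, exit codes: {sorted(set(codes))}")
```

Separate processes are used instead of threads because CPython threads would not keep more than one core busy at a time.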

### Suspicions:

- Could this be related to a CPU defect, or is it more likely an issue with the motherboard settings or compatibility? I know that a 5950X is overkill for a B450 board, but it is officially supported, so I don't know what to think about it.

- Power supply instability under load? I have a 750 W PSU, but the system crashes anyway even when I limit my RTX 3090 to 100 W.

Any guidance would be greatly appreciated.
Thank you all in advance for your support and suggestions!


#2 2024-04-28 16:52:43

TyraVex
Member
Registered: 2024-04-28
Posts: 2

Re: System Crashes with "Hardware Error" over multi-core burst load

After a bit of research, it's clear that my motherboard's VRMs aren't enough for a 5950X. I guess I'll have to undervolt/underclock the CPU or change my motherboard.
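Underclocking is normally set in the BIOS, but as a quick experiment the maximum frequency can also be capped from Linux through the cpufreq sysfs files. This is only a sketch under assumptions: the helper names (`capped_khz`, `plan_caps`, `apply_caps`) are made up, the 80% figure is an arbitrary example, writing the files needs root, and whether a cap like this is enough to relieve the VRMs is untested here.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def capped_khz(max_khz: int, percent: int) -> int:
    """Return `percent` of the hardware maximum frequency, in kHz."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    return max_khz * percent // 100


def plan_caps(percent: int, root: Path = CPU_ROOT) -> dict:
    """Map each CPU's scaling_max_freq file to its proposed cap (no writes)."""
    plan = {}
    for policy in sorted(root.glob("cpu[0-9]*/cpufreq")):
        hw_max = int((policy / "cpuinfo_max_freq").read_text())
        plan[policy / "scaling_max_freq"] = capped_khz(hw_max, percent)
    return plan


def apply_caps(plan: dict) -> None:
    """Write the planned caps; requires root on a real system."""
    for path, khz in plan.items():
        path.write_text(str(khz))


if __name__ == "__main__":
    # Dry run: print what would be written for an 80% cap.
    for path, khz in plan_caps(80).items():
        print(f"{path} <- {khz}")
```

The plan and apply steps are kept separate so the values can be inspected before anything is written. The cap does not survive a reboot.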
