I have a Ryzen 5900, 64GB of ECC DRAM in a 2x32GB config, and an X570 motherboard. I've fired the parts cannon at everything, but the problem keeps recurring. Certain kernel versions seem to lead to fewer crashes, but after one crash I updated the kernel, and now I'm getting even more crashes!
I'm also running rasdaemon and ras-mc-ctl. In journalctl, I get messages like:
[Hardware Error]: CPU:0 (19:21:2) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
...
rasdaemon[764]: <...>-27606 [000] ..... 0.000918 mce_record 2024-05-28 07:31:11 -0600 Unified Memory Controller (bank=17), status= dc2040000000011b, etc.
This happened to be a "corrected" error, but it still seems to have crashed the machine. Any help would be appreciated.
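For reference, the MC17_STATUS value above can be unpacked by hand. The following is only a rough sketch: the bit positions are an assumption taken from the MCI_STATUS_* masks the Linux kernel uses for AMD SMCA parts (family 17h and later), and the names `decode_mca_status` and `format_flags` are made up for illustration. It just reproduces the flag string the kernel printed and does not interpret the error any further.

```python
# Assumed bit positions, following the Linux kernel's MCI_STATUS_* masks
# for AMD SMCA systems. Check against the kernel source / AMD PPR before
# relying on them.
FLAG_BITS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60,
    "MiscV": 59, "AddrV": 58, "PCC": 57, "TCC": 55,
    "SyndV": 53, "Deferred": 44, "Poison": 43, "Scrub": 40,
}


def decode_mca_status(status: int) -> dict:
    """Split a 64-bit MCA status value into named fields."""
    d = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    # Bits 46:45 are read together: 0 = no ECC info, 2 = corrected ECC,
    # anything else = uncorrected ECC.
    ecc = (status >> 45) & 0x3
    d["ecc"] = None if ecc == 0 else ("CECC" if ecc == 2 else "UECC")
    # Severity label as the kernel prints it: UE, deferred ("-"), or CE.
    d["severity"] = "UE" if d["UC"] else ("-" if d["Deferred"] else "CE")
    d["error_code"] = status & 0xFFFF          # low 16 bits
    d["ext_error_code"] = (status >> 16) & 0x3F  # bits 21:16 on SMCA
    return d


def format_flags(status: int) -> str:
    """Rebuild the 'Over|CE|MiscV|...' string seen in the kernel log."""
    d = decode_mca_status(status)
    fields = ["Over" if d["Over"] else "-", d["severity"]]
    fields += [n if d[n] else "-" for n in ("MiscV", "AddrV", "PCC", "TCC", "SyndV")]
    if d["ecc"]:
        fields.append(d["ecc"])
    fields += [n if d[n] else "-" for n in ("Deferred", "Poison", "Scrub")]
    return "|".join(fields)


if __name__ == "__main__":
    print(format_flags(0xDC2040000000011B))
    # prints: Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-
```

With the value from the log, the UC and PCC bits are clear and the ECC field reads as corrected, which matches the "CE" and "CECC" in the kernel's own line. The Over bit is set, which indicates that at least one further error was recorded in that bank before the status was read.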
Last edited by jamdox (2024-05-29 04:05:56)
Disabling rasdaemon and ras-mc-ctl didn't help.
Do these crashes happen under high CPU/memory load, or before/after suspend (i.e. on a power state change)?
You may want to experiment with the CPU power (higher) and memory frequency (lower) settings in the BIOS, disable power-saving states (C-states in the BIOS), or try kernel power parameters.
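Before and after changing C-state settings, it can help to check which idle states the kernel actually exposes and how often they are entered. Here is a minimal sketch that reads the standard Linux cpuidle sysfs directory; the function name `list_cpuidle_states` is made up for illustration, and it assumes the cpuidle files (`name`, `usage`, `disable`, `latency`) are present on your system, which depends on the idle driver in use.

```python
from pathlib import Path


def list_cpuidle_states(cpu: int = 0, sysfs_root: str = "/sys/devices/system/cpu"):
    """Return the idle states the kernel exposes for one CPU, in index order."""
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    # Directories are named state0, state1, ...; sort numerically.
    state_dirs = [p for p in base.glob("state*") if p.name[5:].isdigit()]
    state_dirs.sort(key=lambda p: int(p.name[5:]))

    def read(path: Path):
        # Missing files are reported as None rather than raising.
        return path.read_text().strip() if path.exists() else None

    states = []
    for sd in state_dirs:
        usage = read(sd / "usage")
        latency = read(sd / "latency")
        disable = read(sd / "disable")
        states.append({
            "index": int(sd.name[5:]),
            "name": read(sd / "name"),
            # Number of times this state has been entered since boot.
            "usage": int(usage) if usage is not None else None,
            # Exit latency in microseconds.
            "latency_us": int(latency) if latency is not None else None,
            # True only if the disable file holds a nonzero value.
            "disabled": (int(disable) != 0) if disable is not None else None,
        })
    return states


if __name__ == "__main__":
    for s in list_cpuidle_states(0):
        print(s)
```

If the deeper states show a usage count of zero after a BIOS change, the setting has probably taken effect. An empty list just means no cpuidle directory was found for that CPU.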