
#1 2010-02-08 16:26:23

timm
Member
From: Wisconsin
Registered: 2004-02-25
Posts: 417

Random reboots - hardware problems [Moved to Kernel & Hardware]

I'm running a dual-Xeon machine, and this morning it started rebooting at random.  I checked the BIOS hardware monitoring for temperature fluctuations, and everything seems stable and within range at idle.

The machine crashes with MCE errors.  I didn't catch many of them, but here are a couple:

Processor context corrupt
Some CPUs didn't answer in synchronization

Any idea where I begin looking?

Last edited by timm (2010-03-20 20:41:02)


#2 2010-02-08 17:00:23

davidm
Member
Registered: 2009-04-25
Posts: 371


timm wrote:

Any idea where I begin looking?

Memory, if you haven't checked it already.  Run memtest from a live CD to rule that out.


#3 2010-02-08 22:24:05

timm
Member
From: Wisconsin
Registered: 2004-02-25
Posts: 417


Memtest finds no problems.  I suspect it is a software issue, since it happened right after a major update.  If I try to run anything significant, it crashes.  This is a fairly new machine and has worked flawlessly up to today.

I'm now just trying to get the data off the machine so we can work tomorrow; today was a wasted day.  I don't know if I'm just too frustrated to search intelligently, but I cannot find the info I need for that.  Even trying to scp the data to another machine crashes it.

So I need to boot with the Arch live CD, somehow mount my LVM system on the existing box, and then copy everything over to a spare box until I have time to figure out what's wrong.
Anybody who can step me through this, or at least point me to the info to do it, earns my greatest appreciation.

[bump]
I decided to try rolling back to the kernel I was running before the update.  It appears more stable: I'm running some tests, and I can't get it to crash even when doing much harder things than what crashed it earlier.  Looks like I might have found a bug, although I have no idea how to report it.

So, I guess I don't need help right now.  If anyone knows how to do the above and wants to post to the thread for future searchers, feel free; for now I'll mark this solved.
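
For future searchers, here is a minimal sketch of the live CD rescue steps described above (activate LVM, mount, copy to a spare box).  It only builds and prints the commands unless told otherwise.  The volume group name, logical volume name, mount point, and destination host are all hypothetical placeholders, not values from this machine; `vgscan`, `vgchange -ay`, `mount -o ro`, and `rsync -a` are the standard tools, but check them against your own layout before running anything.

```python
import shlex
import subprocess


def rescue_plan(vg, lv, mountpoint, dest):
    """Return the commands, in order, to activate LVM, mount, and copy.

    All four arguments are placeholders the caller must supply;
    nothing here is detected automatically.
    """
    device = f"/dev/{vg}/{lv}"
    return [
        ["vgscan"],                                  # look for volume groups on the disks
        ["vgchange", "-ay", vg],                     # activate the logical volumes in that group
        ["mkdir", "-p", mountpoint],                 # create the mount point
        ["mount", "-o", "ro", device, mountpoint],   # mount read-only so nothing is modified
        ["rsync", "-a", f"{mountpoint}/", dest],     # copy the contents to the spare box
    ]


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names: replace with your own volume group, volume, and host.
    run(rescue_plan("VolGroup00", "lvolhome", "/mnt/rescue",
                    "user@sparebox:/srv/backup/"))
```

With the default `dry_run=True` this just prints the five commands so they can be typed by hand as root from the live environment; mounting read-only is a cautious choice for a box that is already misbehaving.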

Last edited by timm (2010-02-08 23:12:52)


#4 2010-03-20 20:20:25

timm
Member
From: Wisconsin
Registered: 2004-02-25
Posts: 417


This was marked solved, but perhaps it should have been marked "avoided".  I think it's a kernel problem, so I am taking the new information to that forum.
