#1 2013-02-04 20:52:50

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963

[SOLVED] How do you diagnose faulty hardware? (already tried memtest)

Here are the issues:

  • resume from suspend occasionally fails (various kernel versions)

  • compiling large applications often leads to an internal compiler error (signal 11, various kernel versions, various gcc versions)

Other than that, everything works (apparently).

After optimistically researching possible software causes (testing different kernel versions, compiling it myself, compiling gcc), I'm left with the conclusion that this is almost certainly a hardware issue. Luckily I should still be under warranty.

My question is, how can I diagnose the problem? It only appears sporadically and I don't want some rushed technician to miss it. I have run memtest86+ but it reported no errors. What else can I do? I want to be able to tell the technicians exactly where to look or, better yet, just tell them what to replace.

Thanks!

Last edited by Xyne (2013-02-04 23:39:05)


My Arch Linux Stuff • Forum Etiquette • Community Ethos - Arch is not for everyone


#2 2013-02-04 21:09:06

lagagnon
Member
From: an Island in the Pacific...
Registered: 2009-12-10
Posts: 1,087

Re: [SOLVED] How do you diagnose faulty hardware? (already tried memtest)

I would try "mprime" - it will check to see whether you have CPU calculation errors. It is in the AUR.
http://mersenne.org/
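
The idea behind this kind of test can be sketched in a few lines of Python. This is only an illustration of the principle (repeat a deterministic computation and flag any run whose result differs), not how mprime itself is implemented, and it is a far weaker stress test. The names `checksum` and `count_mismatches` are made up for this example.

```python
import hashlib


def checksum(seed: bytes, rounds: int) -> str:
    """Chain SHA-256 over its own output `rounds` times.

    The result depends only on `seed` and `rounds`, so healthy
    hardware must produce the same string every time.
    """
    digest = seed
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def count_mismatches(passes: int = 20, rounds: int = 20_000,
                     seed: bytes = b"stress") -> int:
    """Repeat the same computation and count passes that disagree.

    The first result is used as the reference. If the reference itself
    was corrupted, later passes will still disagree with it, so a
    nonzero count is suspicious either way.
    """
    reference = checksum(seed, rounds)
    return sum(1 for _ in range(passes) if checksum(seed, rounds) != reference)


if __name__ == "__main__":
    print(f"{count_mismatches()} mismatching pass(es)")
```

Any nonzero count would point to a hardware problem, since the software is deterministic. A count of zero proves little: sporadic faults may need hours of heavy load to show up, which is what the mprime torture test is for.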


Philosophy is looking for a black cat in a dark room. Metaphysics is looking for a black cat in a dark room that isn't there. Religion is looking for a black cat in a dark room that isn't there and shouting "I found it!". Science is looking for a black cat in a dark room with a flashlight.


#3 2013-02-04 21:12:43

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,597

Re: [SOLVED] How do you diagnose faulty hardware? (already tried memtest)


CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs


#4 2013-02-04 21:43:55

nomorewindows
Member
Registered: 2010-04-03
Posts: 3,362

Re: [SOLVED] How do you diagnose faulty hardware? (already tried memtest)

Something was wrong with my motherboard, and memtest86+ took 8 hours to complete a run that would usually take 30 minutes. The cache speeds it reported were much lower than would normally be expected.


I may have to CONSOLE you about your usage of ridiculously easy graphical interfaces...
Look ma, no mouse.


#5 2013-02-04 23:38:37

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963

Re: [SOLVED] How do you diagnose faulty hardware? (already tried memtest)

Thanks for the replies. mprime detected at least one error so it does indeed seem to be a hardware problem.
I'll leave memtest running overnight to see if anything else shows up.

Oh well, at least that explains a few things.



#6 2013-02-04 23:39:40

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,597

Re: [SOLVED] How do you diagnose faulty hardware? (already tried memtest)

@Xyne - Small FFTs or Blend? Having the same thing on one of my old AMD boards now, by the way... 10 years old though, so what can you do :)

Last edited by graysky (2013-02-04 23:41:26)



#7 2013-02-04 23:55:13

lagagnon
Member
From: an Island in the Pacific...
Registered: 2009-12-10
Posts: 1,087

Re: [SOLVED] How do you diagnose faulty hardware? (already tried memtest)

Xyne wrote:

Thanks for the replies. mprime detected at least one error so it does indeed seem to be a hardware problem.

I repaired one computer a few years ago that had problems similar to yours and failed mprime. I removed the CPU fan and CPU, cleaned all the small gold contacts on the CPU base with an electronic cleaner, reseated the CPU, and renewed the CPU fan heat sink compound, and that fixed the problem. However, that rarely works; usually it is just an internal CPU fault that requires replacement.



#8 2013-02-04 23:57:24

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963

Re: [SOLVED] How do you diagnose faulty hardware? (already tried memtest)

graysky wrote:

@Xyne - Small FFTs or Blend? Having the same thing on one of my old AMD boards now, by the way... 10 years old though, so what can you do :)

Blend. The error showed up in the first series of tests. I let it run for 2 more but stopped it after that without running any other tests. I'm guessing it's a CPU/motherboard issue as it shows up with CPU-intensive tasks regardless of memory usage. Memtest86 may indicate otherwise but I suppose the complexity of hardware interaction makes it impossible to fully localize the error from software. Hopefully the technicians will be able to conclusively determine the cause and replace the faulty component. I really don't want to play package pong with them.


lagagnon wrote:

I repaired one computer a few years ago that had problems similar to yours and failed mprime. I removed the CPU fan and CPU, cleaned all the small gold contacts on the CPU base with an electronic cleaner, reseated the CPU, and renewed the CPU fan heat sink compound, and that fixed the problem. However, that rarely works; usually it is just an internal CPU fault that requires replacement.

I might try that if it weren't still under warranty, but I'd rather just let them deal with it.

Last edited by Xyne (2013-02-05 00:00:38)



#9 2013-02-04 23:58:33

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,597

Re: [SOLVED] How do you diagnose faulty hardware? (already tried memtest)

@lagagnon - His problem could be related to any number of things: memory, memory controllers, etc. If mprime fails Small FFTs, it is likely down to some voltage settings on the CPU. If it's failing on Blend, memory or the memory controller is the more likely culprit.
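
As a rough sketch, that rule of thumb could be written down like this. The function name `likely_suspects` and its return values are made up for illustration; it only encodes the heuristic from this post and is not a real diagnostic.

```python
def likely_suspects(small_ffts_failed: bool, blend_failed: bool) -> list:
    """Map mprime torture-test outcomes to the suspects named in this thread.

    Heuristic only: the thread itself notes that software tests cannot
    fully localize a fault.
    """
    if small_ffts_failed:
        # Small FFTs failing points at the CPU's voltage settings.
        return ["CPU voltage settings"]
    if blend_failed:
        # Blend failing on its own points at the memory side.
        return ["memory", "memory controller"]
    # No failure observed: nothing can be concluded from these tests.
    return []


print(likely_suspects(small_ffts_failed=False, blend_failed=True))
```

For the case reported above (an error in Blend, with Small FFTs not run), the sketch returns the memory-side suspects, but running Small FFTs as well would be needed before ruling out the CPU.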

