I have a relatively new computer -- not a clunker -- that has random slowdowns quite frequently. They will last anywhere from 10 seconds to 3+ minutes. It always starts in just one application, like firefox or pidgin, then it "spreads" to other ones. Often, I will type a "cd" or "ls" command in a terminal while the hangs are happening. They don't actually do anything until the hangup is over. This has happened on a regular basis for months.
I've tried switching applications. It happens with nothing but XFCE and Firefox running, or XFCE and Starcraft. It first started under KDE. I've tried moving from firefox to chrome and back, switching terminal emulators, nothing. It also happens under Windows, however, so I believe it's a hardware problem.
I have no idea what it is though! I've tried:
Switching out my Nvidia 9200 GT with the 8200 on-board graphics
A different network card from my on-board (wired)
Different hard disks -- the Windows and Linux installs are on different disks (one is IDE, the other is SATA)
Swapping out my 2GiB RAM stick with two other RAM sticks (512MiB each) in every slot
This leaves only my PSU, CPU, and motherboard as possible culprits after help from the #archlinux channel in narrowing down the above. Where do I go from here? I haven't any idea how to check the health of my motherboard and CPU, but I see no extra slowdown when I run Folding at Home on one of my cores (it's a quad). Whenever I reboot to my BIOS and look at the voltages my PSU is putting out, everything seems to be okay.
I've had the same hardware since last Christmas when I upgraded my video card. It worked fine until sometime in July, IIRC, when this all started. Can anyone offer any help? I haven't any idea what I should do next.
Last edited by Nathan (2010-10-23 02:41:46)
Offline
Perhaps it is fragmentation since it has lasted for so long a time.
Prediction...This year will be a very odd year!
Hard work does not kill people but why risk it: Charlie Mccarthy
A man is not complete until he is married..then..he is finished.
When ALL is lost, what can be found? Even bytes get lonely for a little bit! X-ray confirms Iam spineless!
Offline
It can't be fragmentation. I have plenty of free space and it happens on both Windows and Linux. Windows was even installed early August, just after Starcraft II was released.
Offline
Never experienced that before - the fact that it spans OSes does point to hardware, but I'm at a loss to explain it to you. Are you overclocking the machine? Did you set up the BIOS yourself, or is it on auto (dangerous sometimes)?
Does the HDD light stay on during the freeze?
Could it be a bad cable or HDD?
Have you run memtest86+ to rule out memory?
How about some HDD tests via smartmontools?
CPU and memory can be stressed with linpack (see sig for the AUR package).
With my linpack package, install it, then edit /etc/linpack.conf and adjust the problem size to match the recommended values based on your physical RAM. Then run it as your regular user. Here's an example of linpack's output (you should let it run for a good 50-60 iterations); make sure that the "Residual" amount never changes throughout all the runs. If it does, your CPU is not stable and you will likely need to tweak your BIOS settings (vcore, etc.):
$ cat lin_xeon64.txt
Fri Oct 8 15:12:24 EDT 2010
Intel(R) LINPACK data
Current date/time: Fri Oct 8 15:12:24 2010
CPU frequency: 3.400 GHz
Number of CPUs: 4
Number of threads: 4
Parameters are set to:
Number of tests : 1
Number of equations to solve (problem size) : 29000
Leading dimension of array : 29000
Number of trials to run : 100
Data alignment value (in Kbytes) : 4
Maximum memory requested that can be used = 6728584096, at the size = 29000
============= Timing linear equation system solver =================
Size LDA Align. Time(s) GFlops Residual Residual(norm)
29000 29000 4 339.705 47.8681 8.235009e-10 3.484842e-02
29000 29000 4 345.663 47.0430 8.235009e-10 3.484842e-02
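The "Residual never changes" check described above can be automated. The following is only a minimal sketch, assuming the output has the seven-column row layout shown in this post; the function name `residuals_consistent` and the `SAMPLE` text are made up for illustration and are not part of the linpack package.

```python
# Hypothetical helper: checks that every result row in saved linpack
# output reports the same "Residual" value (column 6 of 7).

SAMPLE = """\
Size LDA Align. Time(s) GFlops Residual Residual(norm)
29000 29000 4 339.705 47.8681 8.235009e-10 3.484842e-02
29000 29000 4 345.663 47.0430 8.235009e-10 3.484842e-02
"""


def residuals_consistent(output: str) -> bool:
    """Return True if all result rows share one Residual value."""
    residuals = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # not a result row
        try:
            for field in fields:
                float(field)
        except ValueError:
            continue  # header or other text that happens to have 7 words
        # Compare the printed text, since the value should not change at all.
        residuals.append(fields[5])
    if not residuals:
        raise ValueError("no result rows found")
    return len(set(residuals)) == 1


if __name__ == "__main__":
    print(residuals_consistent(SAMPLE))
```

To check a real run, read the saved file (for example lin_xeon64.txt) and pass its contents to the function instead of `SAMPLE`.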
Last edited by graysky (2010-10-08 19:26:21)
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
Offline
I would first of all try using a Live Linux CD, and stressing the computer with it. That takes your hard drive out of the equation. If you still get similar slow downs you then know for sure it is a hardware fault and probably a CPU and/or motherboard problem.
Philosophy is looking for a black cat in a dark room. Metaphysics is looking for a black cat in a dark room that isn't there. Religion is looking for a black cat in a dark room that isn't there and shouting "I found it!". Science is looking for a black cat in a dark room with a flashlight.
Offline
... If you still get similar slow downs you then know for sure it is a hardware fault and probably a CPU and/or motherboard problem.
My money is on the system hanging while trying to read from a bad disk sector. You might look at some of the tools for checking drive health.
Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way
Offline
What about high temperature of the CPU or graphics card?
ezik
Offline
I was unable to run linpack -- it seems it only works on Intel CPUs. I can give some more information, however:
I don't think it's a hard drive issue at all. The problem spans both Windows and Linux, isolated on two different HDDs. One is IDE, the other is SATA.
My CPU temp never goes above 45°C. It's usually 39°C to 42°C. My GPU temp is usually around 50°C. I have three exhaust fans and a large opening letting air in, plus two more fans on my main HDD. One temperature sensor reads 21°C.
I've run long and short SMART tests on both drives. Both are clean. The drive holding Linux is about a year and a half old.
I've run memtest86+ overnight twice, both times with no errors. I also tried replacing my RAM with two 512MiB sticks I knew to work, in different slots; the problem persisted.
Offline
I endorse the idea of trying a live OS to isolate the problem from the present situation.
With such an approach, it would be advisable to use an OS without KDE or GNOME.
One such live system is CTKArchLive, which uses openbox, pcmanfm, aufs, wicd, arora and midori, and can be run entirely in RAM. It has an x86_64 mode and should be compatible with a quad core.
It is normally in French, but an add-on provided as a download enables English. It is ~550 MB for the 64-bit install. It can be installed to flash with dd.
This approach eliminates much of the software your system includes and will possibly give insight into a solution.
There are other Live OS'es capable of use as well.
Offline
A live CD will not help isolate the problem. The only hardware I have not tested is my PSU, CPU, and motherboard. It's a hardware problem. The problem is figuring out which one I should replace. I don't see how a LiveCD will test anything -- it happens in two OS's with completely different hard drives and applications.
Offline
CPU or motherboard fault then. The most expensive problem to fix unfortunately...
Offline
OK... mprime (which is prime95 for Linux) is also extremely useful in diagnosing hardware problems. I like linpack because it usually uncovers instabilities faster, but mprime works well too. Run it in torture test mode doing small FFTs. It will stop with a "rounding error" if it experiences a problem. With mprime/p95, 24-30 hours of testing is usually good. In other words, if it throws no errors in that timeframe, your hardware is said to be 'stable' by most. If it does throw an error, you have many places to look. In my experience it's a voltage issue, i.e. one or more of the motherboard's vcores is not set high enough.
Offline
Did you try to update your BIOS?
And did you also try to disable all integrated components like floppy / IDE / SATA controllers, audio, serial ports, USB ports, etc.?
.::. TigTex @ Portugal .::.
Offline
I've been running mprime for a total of about 9 hours now, and so far no errors. Also, my computer is quite usable while it's running -- no more hanging than usual. All four cores are running at 100%.
I have not tried to update my BIOS. Although the change log mentions nothing that should affect me, I'll try anyway once mprime is done.
EDIT: Something else I suppose I should mention is that I upgraded my Nvidia 9200 GT to an Nvidia 8800 GTX. It requires two 6-pin PCI-E power connectors, so I got adapters and hooked it up. It runs Starcraft II perfectly, it isn't underpowered as far as I can tell. This leads me to think it's not my PSU. Also, I have had mprime running for about 15 hours and no errors have been reported yet.
Last edited by Nathan (2010-10-11 17:45:07)
Offline
This is strange as what you said in your first post really pointed to hardware (as it occurred in both Windows and Linux) but if you've run mprime successfully then it cannot be your CPU-mobo combo. If it is still hardware then it has to be the GPU or hard drive. But then you said you tried different hard drives with same slowdown. The only thing left is GPU and mprime won't stress that. I suggest running a long DVD game/movie and having "top" run in a terminal and wait for the slowdown, then check the GPU temperature - perhaps overheating?
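Waiting for the slowdown by hand is tedious, so a small logger can record when it happens. The following is only a sketch: it times a directory listing (mirroring the "ls" that hangs in the first post) and flags slow samples. The names `time_listing`, `find_stalls` and `watch` and the 0.5-second threshold are assumptions chosen for illustration, and a cached listing may not stall even when the disk does.

```python
import os
import time


def time_listing(path="."):
    """Return how many seconds a single directory listing took."""
    start = time.monotonic()
    os.listdir(path)
    return time.monotonic() - start


def find_stalls(samples, threshold):
    """Keep the (timestamp, duration) samples slower than threshold seconds."""
    return [(stamp, duration) for stamp, duration in samples if duration > threshold]


def watch(path=".", count=5, interval=1.0, threshold=0.5):
    """Take `count` timed listings, `interval` seconds apart, and return the stalls."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), time_listing(path)))
        time.sleep(interval)
    return find_stalls(samples, threshold)
```

Running `watch(count=3600)` in a terminal for an hour and comparing the returned timestamps against temperature readings or "top" output would show whether the stalls line up with anything.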
Offline
I've tried three different graphics cards. One is a new Nvidia 9200 GT, which I just got for Christmas. Another is the Nvidia 8200, which is an onboard chipset. The most recent, which is installed right now, is an Nvidia 8800 GTX. Although it seems to happen less with the GTX, it still happens somewhat frequently. It was worst with the onboard.
I'm wondering if I should try stripping out my HDDs and installing on an old IDE disk I have just to be sure, even though SMART says both disks are fine.
mprime has been running for 21 hours with no issues so far, although I do have a large box fan pointed at the open case and a smaller one pointed straight at the CPU.
EDIT: Another thing I've been wondering is whether this could possibly be caused by my router. My router is quite crappy -- an ancient Belkin running DD-WRT, but it barely has enough RAM to do that. Everything but the essentials is turned off. Do you think this could cause it? My dad's computer, attached to the same router, is also having problems, but that computer has had the same Windows XP install since about 2002, so it's not a great benchmark. Wondering if it's worth checking out though.
The reason I say this is because I notice the hanging / slowdown mostly starts in Firefox / Chrome / Pidgin when using the network, then spreads to other components. Also, I am connected to it via a *really* long (200ft I think) CAT5e cable. Could this be a problem? Should I try to throw in a wireless card and see what happens? I didn't have problems with it when we first installed it, but that was a while ago. Come to think of it, it may be about the time that the slowdown started. Could that be the culprit? I'll probably try a wireless card tomorrow and see what happens.
Last edited by Nathan (2010-10-12 01:01:09)
Offline
Most interfaces for internet connecting devices are limited to 75 feet between source and load. If a powered hub were used, another 75 feet is provided.
I have never heard of a 200 foot run from source to load! It would cause many errors in transmission, slowdowns and propagate errors into your system.
Perhaps you should provide fuller background info in your posts in the future.
Offline
We've used the cable before with no issues. I've read in many places that CAT5e can be used in lengths of up to 90m (295 or so feet). Where have you learned that 75ft is the max? That sounds absurd.
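The lengths quoted in this exchange are easy to check with a quick unit conversion. This is just a sketch; the function names are made up for illustration, and the only constant used is the definition of the international foot (0.3048 m).

```python
METERS_PER_FOOT = 0.3048  # exact, by definition of the international foot


def feet_to_meters(feet):
    """Convert a length in feet to meters."""
    return feet * METERS_PER_FOOT


def meters_to_feet(meters):
    """Convert a length in meters to feet."""
    return meters / METERS_PER_FOOT


if __name__ == "__main__":
    # The 90 m figure quoted above, expressed in feet.
    print(round(meters_to_feet(90), 1))
    # The roughly 200 ft cable run, expressed in meters.
    print(round(feet_to_meters(200), 2))
```

By this arithmetic a 200 ft run is about 61 m, well inside the 90 m figure quoted above.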
Offline
Perhaps 200 feet is possible with a good router but you say yours is crappy!
Offline
...Another thing I've been wondering if this would be at all possible caused by my router... ... I am connected to it via a *really* long (200ft I think) CAT5e cable.
100-Base-TX over Cat5 is specified to 100m
Assuming this is eth0, check the content of the files in /sys/class/net/eth0/statistics/ that have 'error' in their name. See if the error counts look reasonable or if they are absurd compared to the tx and rx byte counts. (Files with those counts are also in the above directory.)
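Reading each counter file by hand works, but a short script can collect them in one go. This is a minimal sketch assuming the interface is eth0, as above; the function names `read_counters` and `error_counters` are made up for illustration.

```python
from pathlib import Path


def read_counters(stats_dir):
    """Read every integer counter file in a statistics directory into a dict."""
    counters = {}
    for entry in Path(stats_dir).iterdir():
        if not entry.is_file():
            continue
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except ValueError:
            continue  # skip anything that is not a plain integer
    return counters


def error_counters(counters):
    """Keep only the counters whose file name contains 'error'."""
    return {name: value for name, value in counters.items() if "error" in name}


if __name__ == "__main__":
    stats = Path("/sys/class/net/eth0/statistics")  # assumes the interface is eth0
    if stats.is_dir():
        counters = read_counters(stats)
        print("rx_bytes:", counters.get("rx_bytes"))
        print("tx_bytes:", counters.get("tx_bytes"))
        for name, value in sorted(error_counters(counters).items()):
            print(name, value)
```

A handful of errors against gigabytes of traffic would be unremarkable; error counts that grow steadily alongside the byte counts would point at the cable or the NIC.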
Edit : BTW: My money is still on the disk drive
Last edited by ewaller (2010-10-12 01:51:58)
Offline
0 errors reported in all files. I'm starting to wonder if it's the disk drive too.
EDIT: 24 hours and no errors in mprime
Last edited by Nathan (2010-10-12 02:24:31)
Offline
So, I figured it out. It was the CPU *and* the HDD Linux was running on.
I rebooted to Windows for the first time in a while and found that it bluescreened where Linux would have frozen. Instead of random freezes, it just crashed. After some googling, I found out that 3rd core issues in Phenoms are not uncommon. I limited it to 1.9 GHz and all has been fine since then -- in Windows.
It still happened in Linux, however, but not as much. I got a new HDD in from Western Digital three days after filing an RMA request. They sent it to me and I copied all my data over. Thanks Western Digital.
So now, for the first time in months, everything is fixed. Hooray!
Offline