Lengthy description ahead, but I've tried just about everything I can think of. Any insights are appreciated!
Three weeks ago I upgraded my system with an additional 1 TB hard drive and 4 GB more RAM. Around the same time, the drive holding my Arch install started to fail mechanically. Luckily I was able to move my data to the new drive and set up my system there. After this new install the system would freeze unpredictably, showing a blank screen in a single color (usually whatever color covered most of the screen at the time). Naturally I suspected the video card, so I picked up a new card and reinstalled, but the freezing persisted (although now the screen just froze as it was, with no blanket color). I ran memtest to see whether the new RAM was the issue, but it reported no errors. I then noticed that at the moment of a freeze I could hear the active drive spin down, so I checked the SMART status of the new drive. It turned out to be failing too, so I bought another drive and reinstalled again (no SMART warnings on this one).
Up to this point I could not reproduce or anticipate the freeze; it struck unpredictably. On the newest installation I set up KDE 4.6 and found I could trigger it reliably by logging in via KDM, opening Dolphin, and hovering the mouse over the drives listed in the left panel (yes, I know that sounds ridiculous, but it happened every time, even if I waited a while after logging in before opening Dolphin). I also noticed that the freeze only happens while X is running and never while I am in a virtual console (tty1-6). The problem has not shown up under Windows 7 on the same hardware, although I spend significantly less time there.
I have since switched to GNOME and can run the system for usable lengths of time (hours to a few days) before it freezes, usually while starting or closing an application. I can't ssh into the frozen system. I've tailed dmesg up to the moment of a freeze and nothing unusual shows up, and there are no errors in Xorg.log either. I've tried removing my extra PCI cards (TV tuner, wireless cards, etc.), but the problem persists. I have not had the problem while running a LiveCD. I've even tried different SATA ports, cables, and power connectors on my motherboard.
Specs:
Intel Core i7 CPU 860 @ 2.80GHz
Gigabyte P55M-UD2 motherboard
6 GB DDR3 1600 (PC3 12800) RAM
750W Corsair PSU (replaced less than a year ago)
lspci:
00:00.0 Host bridge: Intel Corporation Core Processor DMI (rev 11)
00:03.0 PCI bridge: Intel Corporation Core Processor PCI Express Root Port 1 (rev 11)
00:08.0 System peripheral: Intel Corporation Core Processor System Management Registers (rev 11)
00:08.1 System peripheral: Intel Corporation Core Processor Semaphore and Scratchpad Registers (rev 11)
00:08.2 System peripheral: Intel Corporation Core Processor System Control and Status Registers (rev 11)
00:08.3 System peripheral: Intel Corporation Core Processor Miscellaneous Registers (rev 11)
00:10.0 System peripheral: Intel Corporation Core Processor QPI Link (rev 11)
00:10.1 System peripheral: Intel Corporation Core Processor QPI Routing and Protocol Registers (rev 11)
00:1a.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB Universal Host Controller (rev 05)
00:1a.1 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB Universal Host Controller (rev 05)
00:1a.2 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB Universal Host Controller (rev 05)
00:1a.7 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1b.0 Audio device: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio (rev 05)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 05)
00:1c.4 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 (rev 05)
00:1c.5 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 6 (rev 05)
00:1d.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB Universal Host Controller (rev 05)
00:1d.1 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB Universal Host Controller (rev 05)
00:1d.2 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB Universal Host Controller (rev 05)
00:1d.3 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB Universal Host Controller (rev 05)
00:1d.7 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation 5 Series Chipset LPC Interface Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller (rev 05)
01:00.0 VGA compatible controller: nVidia Corporation G96 [GeForce 9400 GT] (rev a1)
03:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)
03:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
05:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
I'm out of ideas save for replacing my mobo. Any idea what I could be missing?
I get a similar problem and I wonder about memory. What does the memory section look like if you install and run lshw?
Memory section output of lshw:
*-memory
     description: System Memory
     physical id: 19
     slot: System board or motherboard
     size: 6GiB
   *-bank:0
        description: DIMM 1600 MHz (0.6 ns)
        physical id: 0
        slot: A0
        size: 2GiB
        width: 2244 bits
        clock: 1600MHz (0.6ns)
   *-bank:1
        description: DIMM 1600 MHz (0.6 ns)
        physical id: 1
        slot: A1
        size: 2GiB
        width: 2244 bits
        clock: 1600MHz (0.6ns)
   *-bank:2
        description: DIMM 1600 MHz (0.6 ns)
        physical id: 2
        slot: A2
        size: 2GiB
        width: 2244 bits
        clock: 1600MHz (0.6ns)
   *-bank:3
        description: DIMM [empty]
        physical id: 3
        slot: A3
I have the same crash problem (usually over ssh), and lshw also reports a very large width in bits for me. I think it should be either 32 or 64 bits if the memory is working correctly with the system. I have no idea how to fix it, though.
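For anyone who wants to compare widths across machines, here is a minimal Python sketch that pulls the per-bank width out of lshw memory output and flags anything outside an expected set. The function names are made up for illustration, the sample text is abridged from the output posted above, and the "32 or 64" expectation is only the guess from this post, not something lshw documents.

```python
import re

# Abridged from the lshw memory section posted earlier in this thread.
SAMPLE = """\
*-bank:0
description: DIMM 1600 MHz (0.6 ns)
slot: A0
size: 2GiB
width: 2244 bits
*-bank:3
description: DIMM [empty]
slot: A3
"""


def bank_widths(lshw_text):
    """Map each '*-bank:N' entry to its width in bits (None if not reported)."""
    widths = {}
    current = None
    for raw in lshw_text.splitlines():
        line = raw.strip()  # tolerate indented or flattened output
        header = re.match(r"\*-(bank:\d+)$", line)
        if header:
            current = header.group(1)
            widths[current] = None
            continue
        width = re.match(r"width:\s*(\d+)\s*bits$", line)
        if width and current is not None:
            widths[current] = int(width.group(1))
    return widths


def unexpected_widths(widths, expected=(32, 64)):
    """Return banks whose reported width is not in `expected` (an assumption)."""
    return {bank: w for bank, w in widths.items()
            if w is not None and w not in expected}
```

Calling `unexpected_widths(bank_widths(text))` on saved lshw output lists the banks worth a second look. An odd value may just be a reporting quirk, so treat it as a hint rather than proof of bad memory.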
I've been getting the same problem (and it sounds like they might have it in another thread too). I can't replicate it - it often happens when I'm away from the computer - and no logs show anything interesting; just ordinary MARKs that abruptly end. For me, lshw is giving a width of 64 bits, so perhaps a large width is not the problem. The other thread speculated on nvidia graphics drivers being the problem; I have those drivers, but it looks like you have Intel graphics, so maybe graphics drivers aren't the problem either. All I can be certain of is that it started after upgrading to kernel 2.6.37.
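Since the logs simply stop without an error, one way to at least pin down when a freeze happened is a heartbeat file with a shorter interval than the syslog MARKs. The following is a rough sketch only: the file name, interval, and function names are arbitrary choices, and it will not tell you why the machine locked up.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Append one timestamp line and force it to disk before returning."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(path, "a") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # make sure the line survives a hard freeze
    return stamp


def last_heartbeat(path):
    """Return the last recorded timestamp, or None if nothing was written."""
    try:
        with open(path) as fh:
            lines = fh.read().splitlines()
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None


def run(path, interval=5.0):
    """Write a heartbeat every `interval` seconds until interrupted."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)
```

Start `run("heartbeat.log")` in the background, and after the next forced reboot `last_heartbeat("heartbeat.log")` gives the freeze time to within the interval, which can then be lined up against whatever else was going on (wireless activity, application launches, and so on).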
I'm not sure what's up with my memory width; it's the same (2244 bits) when running from a live CD too. I've updated my BIOS, but the discrepancy remains.
I actually have an nVidia card (GeForce 9400 GT) and have always used their proprietary drivers. I don't think my video card is the problem, but I could give the other drivers a shot.
Try their beta drivers. I was having similar freeze issues with nothing in the logs, and this fixed it for me. HTH
DF
I've become convinced the problem is with kernel 2.6.37. First I tried switching to the nouveau drivers; then I tried using no video drivers at all and just working from the console. In each case I still experienced random lockups. In these cases I saw a long error trace on screen at lockup, but still no evidence in the logs, so I think it's the same thing.
This morning, I upgraded to kernel 2.6.37.5-1 and nvidia 270.30-3, and still got a lockup. I'm now trying kernel26-lts along with nvidia-lts, and haven't seen a lockup yet (I don't know why I didn't think of this earlier). So I have to conclude that the kernel itself is to blame.
Do you use your wireless interface? My money's on that. I have had trouble with mine since 2.6.37. Try running on wired for a while and see if it goes away.
Matt
"It is very difficult to educate the educated."
I do use wireless; wired, unfortunately, isn't an option. But I'll keep that in mind, thanks. Maybe I can avoid crashes just by unloading the wireless driver (rt2500usb).
EDIT: I managed to get 48 hours of uptime by only intermittently connecting to wireless. Then I got greedy and left it connected for too long and it froze up. So this must be my problem; now, how to fix it...
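To keep track of whether the wireless driver was actually loaded during a given stretch of uptime, a small check against /proc/modules can help. This is a sketch under assumptions: the helper names are invented here, and the sample lines below are hypothetical, written only to mimic the usual /proc/modules layout (module name in the first column).

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first column is the module name
    return names


def is_loaded(module, proc_modules_text):
    """True if `module` appears in the given /proc/modules text."""
    return module in loaded_modules(proc_modules_text)


# Hypothetical sample lines, for illustration only.
SAMPLE = """\
rt2500usb 20480 0 - Live 0x0000000000000000
rt2x00usb 16384 1 rt2500usb, Live 0x0000000000000000
nvidia 10000000 40 - Live 0x0000000000000000
"""
```

On a live system you would pass in `open("/proc/modules").read()` and check `is_loaded("rt2500usb", text)`. Unloading the driver itself is a job for `modprobe -r rt2500usb` as root; this sketch only reports state so that uptime can be compared with and without the module.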
Last edited by Tempel (2011-03-31 15:32:37)