Hardware:
description: Desktop Computer
product: H8QG6 (1234567890)
vendor: Supermicro
version: 1234567890
serial: 1234567890
width: 64 bits
capabilities: smbios-2.6 dmi-2.6 vsyscall64 vsyscall32
configuration: boot=normal chassis=desktop family=1234567890 sku=1234567890 uuid=48385147-3600-0030-48FE-003048FE56B2
*-core
description: Motherboard
product: H8QG6
vendor: Supermicro
physical id: 0
version: 1234567890
serial: 1234567890
slot: 1234567890
*-firmware
description: BIOS
vendor: American Megatrends Inc.
physical id: 0
version: 080016
date: 07/23/2010
size: 64KiB
capacity: 1984KiB
capabilities: isa pci pnp upgrade shadowing escd cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer int10video acpi usb ls120boot zipboot biosbootspecification
*-cpu:0
description: CPU
product: AMD Opteron(tm) Processor 6128
vendor: Hynix Semiconductor (Hyundai Electronics)
physical id: 4
bus info: cpu@0
version: D1
serial: To Be Filled By O.E.M.
slot: CPU 1
size: 2GHz
capacity: 2GHz
width: 64 bits
clock: 200MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
configuration: cores=8 enabledcores=8 threads=8
*-cache:0
description: L1 cache
physical id: 5
slot: L1-Cache
size: 1MiB
capacity: 1MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-cache:1
description: L2 cache
physical id: 6
slot: L2-Cache
size: 4MiB
capacity: 4MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-cache:2
description: L3 cache
physical id: 7
slot: L3-Cache
size: 10MiB
capacity: 10MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-cpu:1
description: CPU
product: AMD Opteron(tm) Processor 6128
vendor: Hynix Semiconductor (Hyundai Electronics)
physical id: 8
bus info: cpu@1
version: D1
serial: To Be Filled By O.E.M.
slot: CPU 2
size: 2GHz
capacity: 2GHz
width: 64 bits
clock: 200MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
configuration: cores=8 enabledcores=8 threads=8
*-cache:0
description: L1 cache
physical id: 9
slot: L1-Cache
size: 1MiB
capacity: 1MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-cache:1
description: L2 cache
physical id: a
slot: L2-Cache
size: 4MiB
capacity: 4MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-cache:2
description: L3 cache
physical id: b
slot: L3-Cache
size: 10MiB
capacity: 10MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-cpu:2
description: CPU
product: AMD Opteron(tm) Processor 6128
vendor: Hynix Semiconductor (Hyundai Electronics)
physical id: c
bus info: cpu@2
version: D1
serial: To Be Filled By O.E.M.
slot: CPU 3
size: 2GHz
capacity: 2GHz
width: 64 bits
clock: 200MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
configuration: cores=8 enabledcores=8 threads=8
*-cache:0
description: L1 cache
physical id: d
slot: L1-Cache
size: 1MiB
capacity: 1MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-cache:1
description: L2 cache
physical id: e
slot: L2-Cache
size: 4MiB
capacity: 4MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-cache:2
description: L3 cache
physical id: f
slot: L3-Cache
size: 10MiB
capacity: 10MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-cpu:3
description: CPU
product: AMD Opteron(tm) Processor 6128
vendor: Hynix Semiconductor (Hyundai Electronics)
physical id: 10
bus info: cpu@3
version: D1
serial: To Be Filled By O.E.M.
slot: CPU 4
size: 2GHz
capacity: 2GHz
width: 64 bits
clock: 200MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
configuration: cores=8 enabledcores=8 threads=8
*-cache:0
description: L1 cache
physical id: 11
slot: L1-Cache
size: 1MiB
capacity: 1MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-cache:1
description: L2 cache
physical id: 12
slot: L2-Cache
size: 4MiB
capacity: 4MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-cache:2
description: L3 cache
physical id: 13
slot: L3-Cache
size: 10MiB
capacity: 10MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
*-memory
description: System Memory
physical id: 2b
slot: System board or motherboard
size: 96GiB
*-bank:0
description: DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
product: 9905413-019.A00LF
vendor: Kingston
physical id: 0
serial: 34314341
slot: P1_DIMM1B
size: 4GiB
width: 64 bits
clock: 1333MHz (0.8ns)
(Banks 0 to 31 populated, 4GiB each. Same make and model, unregistered.)
System:
$ uname -a
Linux 3.0-ARCH #1 SMP PREEMPT Tue Aug 30 08:53:25 CEST 2011 x86_64 AMD Opteron(tm) Processor 6128 AuthenticAMD GNU/Linux
Symptoms:
The computer reboots without warning every 2-3 days, and seems to reboot every time after running computationally intensive programs (ones that use all 32 cores and most of the RAM).
Error message:
Essentially none.
/var/log/errors.log does not contain any logging around the time of the reboot.
/var/log/everything.log also does not contain any logging around the time of the reboot.
/var/log/crond.log
Nov 11 13:35:01 mdt3 crond[1421]: FILE /var/spool/cron/root USER root PID 15371 job sys-daily
Nov 11 13:35:03 mdt3 crond[15394]: mailing cron output for user root job sys-daily
Nov 11 13:35:03 mdt3 crond[15394]: unable to exec /usr/sbin/sendmail: cron output for user root job sys-daily to /dev/null
Nov 11 13:49:01 mdt3 crond[1421]: FILE /var/spool/cron/root USER root PID 16516 job sys-hourly
Nov 11 14:49:01 mdt3 crond[1421]: FILE /var/spool/cron/root USER root PID 21320 job sys-hourly
Nov 11 11:47:47 mdt3 crond[1473]: /usr/sbin/crond 4.5 dillon's cron daemon, started with loglevel info
Nov 11 15:48:02 mdt3 crond[1473]: time disparity of 240 minutes detected
Nov 11 15:49:01 mdt3 crond[1473]: FILE /var/spool/cron/root USER root PID 1653 job sys-hourly
Nov 11 16:49:01 mdt3 crond[1473]: FILE /var/spool/cron/root USER root PID 10861 job sys-hourly
/var/log/user.log is empty, confirming that no users initiated the reboot.
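Even without an error message, the crond log above brackets the reboot: the last entry from the old daemon (PID 1421) is at 14:49:01, and the next line is a fresh crond start. That start line is stamped 11:47:47 and is followed by a "time disparity of 240 minutes" message, which suggests the clock was about four hours behind at boot, so the restart probably happened around 15:47. A minimal sketch for pulling such brackets out of a log, assuming the daemon's start line contains "started with loglevel" as in the excerpt (the function name reboot_brackets is made up for illustration):

```shell
#!/bin/sh
# reboot_brackets FILE
# For every crond start line, print the last line logged before it.
# The gap between the two lines is when the machine went down.
reboot_brackets() {
    awk '
        /started with loglevel/ {
            if (prev != "") print "last entry before restart: " prev
            print "daemon restarted:          " $0
        }
        { prev = $0 }
    ' "$1"
}

# Usage (path taken from the post; adjust if your syslog setup differs):
#   reboot_brackets /var/log/crond.log
```

Comparing these windows with the times the heavy jobs were running would show whether the reboots really follow the load.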
Other signs:
CPU temperatures appear normal (BIOS indicates "low"). conky indicates 36~40. (Temperature is likely closer to 60 when running computationally intensive programs.)
However, I doubt CPU overheating is the cause, since the motherboard would BEEEEEEEP, which is hard to ignore.
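Since the idle readings say little about what happens under full load, one option is to log the sensors to disk while the heavy job runs, so that the last sample before a reboot survives. A rough sketch, assuming lm_sensors is installed and its `sensors` command works on this board (not confirmed in this thread); the function name, log path, and 5-second interval are arbitrary choices:

```shell
#!/bin/sh
# log_sensors LOGFILE [INTERVAL_SECONDS] [COUNT]
# Appends a timestamped `sensors` dump to LOGFILE every INTERVAL seconds.
# COUNT=0 (the default) means run until interrupted.
# `sync` after each sample pushes the data to disk so it survives a hard reset.
log_sensors() {
    logfile=$1
    interval=${2:-5}
    count=${3:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        {
            date '+%Y-%m-%d %H:%M:%S'
            sensors
        } >> "$logfile" 2>&1
        sync
        i=$((i + 1))
        # Skip the sleep after the final sample.
        if [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Usage: start this in a second terminal before launching the heavy job.
#   log_sensors "$HOME/sensors.log" 5
```

After the next reboot, the tail of the log file shows the temperatures, and on boards that expose them the voltages and fan speeds, from a few seconds before the machine went down.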
Summary:
Workstation with 4 CPU sockets (Opteron 6100, 8 cores each), and 96 GB of *unregistered* ECC RAM.
I don't know whether the problems are Archlinux-specific, since I do not use any other OS on this machine. My other Archlinux machines do not experience the same problem.
No error messages appear in system logging.
Only I am logged in. No logging of a user-initiated reboot.
I suspect that since the RAM is unregistered (though it has ECC), perhaps some inconsistency occurred in the RAM...
Any suggestions would be helpful.
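On the RAM suspicion: with ECC memory the kernel can count corrected and uncorrected memory errors through its EDAC subsystem, provided the matching driver is loaded (for this CPU family that would be `amd64_edac`; whether it is loaded on this machine is not shown here). A small sketch that prints the per-controller counters from sysfs. The function name edac_counts is just for illustration, and the optional argument exists only so the function can be pointed at a different directory:

```shell
#!/bin/sh
# edac_counts [BASE_DIR]
# Prints corrected (ce) and uncorrected (ue) error counts for each
# memory controller the kernel's EDAC subsystem knows about.
edac_counts() {
    base=${1:-/sys/devices/system/edac/mc}
    found=0
    for mc in "$base"/mc*; do
        [ -r "$mc/ce_count" ] || continue
        found=1
        printf '%s: corrected=%s uncorrected=%s\n' \
            "${mc##*/}" "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no EDAC memory controllers found under $base" >&2
        return 1
    fi
}

# Usage:
#   edac_counts
```

A corrected count that keeps rising, or any uncorrected errors, would point toward a DIMM or its slot rather than the OS. Counters that stay at zero do not rule out a memory problem, since a hard reset can happen before anything is recorded.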
Last edited by rWarrior (2011-11-11 22:58:15)
Is this a new setup?
Hmm... try another OS (you can boot a live CD or flash drive), run the same workload, and watch for reboots. If it fails again on another distribution, another kernel, or Windows Server, you should look into hardware, memory compatibility, or BIOS configuration rather than the OS.
Last edited by masteryod (2011-11-11 23:47:50)
This sounds like a hardware-level problem. Can you access the IPMI logs (IPMI is apparently integrated on that system board) and see if they shed any light?
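If the BMC is reachable from the running system, the event log can be read with `ipmitool`. A minimal sketch, assuming `ipmitool` is installed and the kernel IPMI drivers (`ipmi_si`, `ipmi_devintf`) are loaded so the local interface works; none of that has been confirmed for this machine, and the wrapper name dump_sel is made up:

```shell
#!/bin/sh
# dump_sel
# Prints the BMC's System Event Log with sensor names expanded.
# Usually needs root (or access to the IPMI device node) for the local interface.
dump_sel() {
    if ! command -v ipmitool >/dev/null 2>&1; then
        echo "ipmitool not found; install it first" >&2
        return 1
    fi
    ipmitool sel elist
}

# Usage:
#   dump_sel
```

Entries for voltage, power supply, or temperature events stamped shortly before a reboot would be the ones to look for.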
Try running "mprime" for a couple of hours and check that the results are valid.
You could try running memtest or something like it to check that all RAM modules are OK, although I don't know if faulty RAM could prompt a reboot. My other guess is a problem with the PSU: either it is not powerful enough or it is not working properly.
R00KIE