#1 2011-11-11 22:56:43

rWarrior
Member
Registered: 2008-12-28
Posts: 26

Random reboots on AMD Opteron 6100 system

Hardware:

    description: Desktop Computer
    product: H8QG6 (1234567890)
    vendor: Supermicro
    version: 1234567890
    serial: 1234567890
    width: 64 bits
    capabilities: smbios-2.6 dmi-2.6 vsyscall64 vsyscall32
    configuration: boot=normal chassis=desktop family=1234567890 sku=1234567890 uuid=48385147-3600-0030-48FE-003048FE56B2
  *-core
       description: Motherboard
       product: H8QG6
       vendor: Supermicro
       physical id: 0
       version: 1234567890
       serial: 1234567890
       slot: 1234567890
     *-firmware
          description: BIOS
          vendor: American Megatrends Inc.
          physical id: 0
          version: 080016
          date: 07/23/2010
          size: 64KiB
          capacity: 1984KiB
          capabilities: isa pci pnp upgrade shadowing escd cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer int10video acpi usb ls120boot zipboot biosbootspecification
     *-cpu:0
          description: CPU
          product: AMD Opteron(tm) Processor 6128
          vendor: Hynix Semiconductor (Hyundai Electronics)
          physical id: 4
          bus info: cpu@0
          version: D1
          serial: To Be Filled By O.E.M.
          slot: CPU 1
          size: 2GHz
          capacity: 2GHz
          width: 64 bits
          clock: 200MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
          configuration: cores=8 enabledcores=8 threads=8
        *-cache:0
             description: L1 cache
             physical id: 5
             slot: L1-Cache
             size: 1MiB
             capacity: 1MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
        *-cache:1
             description: L2 cache
             physical id: 6
             slot: L2-Cache
             size: 4MiB
             capacity: 4MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
        *-cache:2
             description: L3 cache
             physical id: 7
             slot: L3-Cache
             size: 10MiB
             capacity: 10MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
     *-cpu:1
          description: CPU
          product: AMD Opteron(tm) Processor 6128
          vendor: Hynix Semiconductor (Hyundai Electronics)
          physical id: 8
          bus info: cpu@1
          version: D1
          serial: To Be Filled By O.E.M.
          slot: CPU 2
          size: 2GHz
          capacity: 2GHz
          width: 64 bits
          clock: 200MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
          configuration: cores=8 enabledcores=8 threads=8
        *-cache:0
             description: L1 cache
             physical id: 9
             slot: L1-Cache
             size: 1MiB
             capacity: 1MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
        *-cache:1
             description: L2 cache
             physical id: a
             slot: L2-Cache
             size: 4MiB
             capacity: 4MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
        *-cache:2
             description: L3 cache
             physical id: b
             slot: L3-Cache
             size: 10MiB
             capacity: 10MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
     *-cpu:2
          description: CPU
          product: AMD Opteron(tm) Processor 6128
          vendor: Hynix Semiconductor (Hyundai Electronics)
          physical id: c
          bus info: cpu@2
          version: D1
          serial: To Be Filled By O.E.M.
          slot: CPU 3
          size: 2GHz
          capacity: 2GHz
          width: 64 bits
          clock: 200MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
          configuration: cores=8 enabledcores=8 threads=8
        *-cache:0
             description: L1 cache
             physical id: d
             slot: L1-Cache
             size: 1MiB
             capacity: 1MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
        *-cache:1
             description: L2 cache
             physical id: e
             slot: L2-Cache
             size: 4MiB
             capacity: 4MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
        *-cache:2
             description: L3 cache
             physical id: f
             slot: L3-Cache
             size: 10MiB
             capacity: 10MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
     *-cpu:3
          description: CPU
          product: AMD Opteron(tm) Processor 6128
          vendor: Hynix Semiconductor (Hyundai Electronics)
          physical id: 10
          bus info: cpu@3
          version: D1
          serial: To Be Filled By O.E.M.
          slot: CPU 4
          size: 2GHz
          capacity: 2GHz
          width: 64 bits
          clock: 200MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
          configuration: cores=8 enabledcores=8 threads=8
        *-cache:0
             description: L1 cache
             physical id: 11
             slot: L1-Cache
             size: 1MiB
             capacity: 1MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
        *-cache:1
             description: L2 cache
             physical id: 12
             slot: L2-Cache
             size: 4MiB
             capacity: 4MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
        *-cache:2
             description: L3 cache
             physical id: 13
             slot: L3-Cache
             size: 10MiB
             capacity: 10MiB
             clock: 1GHz (1.0ns)
             capabilities: pipeline-burst internal write-back unified
     *-memory
          description: System Memory
          physical id: 2b
          slot: System board or motherboard
          size: 96GiB
        *-bank:0
             description: DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
             product: 9905413-019.A00LF
             vendor: Kingston
             physical id: 0
             serial: 34314341
             slot: P1_DIMM1B
             size: 4GiB
             width: 64 bits
             clock: 1333MHz (0.8ns)

(Banks 0 to 31 populated, 4GiB each. Same make and model, unregistered.)


System:

$ uname -a
Linux 3.0-ARCH #1 SMP PREEMPT Tue Aug 30 08:53:25 CEST 2011 x86_64 AMD Opteron(tm) Processor 6128 AuthenticAMD GNU/Linux

Symptoms:

The computer reboots without warning every 2-3 days, and seems to reboot every time after running computationally intensive programs (ones that use all 32 cores and most of the RAM).


Error message:

Essentially none.

/var/log/errors.log does not contain any logging around the time of the reboot.
/var/log/everything.log also does not contain any logging around the time of the reboot.

/var/log/crond.log
Nov 11 13:35:01 mdt3 crond[1421]: FILE /var/spool/cron/root USER root PID 15371 job sys-daily
Nov 11 13:35:03 mdt3 crond[15394]: mailing cron output for user root job sys-daily
Nov 11 13:35:03 mdt3 crond[15394]: unable to exec /usr/sbin/sendmail: cron output for user root job sys-daily to /dev/null
Nov 11 13:49:01 mdt3 crond[1421]: FILE /var/spool/cron/root USER root PID 16516 job sys-hourly
Nov 11 14:49:01 mdt3 crond[1421]: FILE /var/spool/cron/root USER root PID 21320 job sys-hourly
Nov 11 11:47:47 mdt3 crond[1473]: /usr/sbin/crond 4.5 dillon's cron daemon, started with loglevel info
Nov 11 15:48:02 mdt3 crond[1473]: time disparity of 240 minutes detected
Nov 11 15:49:01 mdt3 crond[1473]: FILE /var/spool/cron/root USER root PID 1653 job sys-hourly
Nov 11 16:49:01 mdt3 crond[1473]: FILE /var/spool/cron/root USER root PID 10861 job sys-hourly
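
Since nothing is logged at the moment of the reboot, the crond restart is the best marker available. The excerpt above shows a fresh crond start stamped 11:47:47 directly after an entry at 14:49:01, followed by a 240-minute time disparity, so the reboot presumably happened some time after 14:49:01. Below is a minimal Python sketch that pulls such restart markers out of a syslog-style file. The function name, the regular expression and the default year are assumptions for illustration (the log lines carry no year), and the sample lines are copied from the excerpt above.

```python
import re
from datetime import datetime

# Assumed layout: "Mon DD HH:MM:SS host program[pid]: message"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+\S+:\s+(.*)$"
)

# Lines copied from the crond.log excerpt in this post.
SAMPLE_LOG = """\
Nov 11 14:49:01 mdt3 crond[1421]: FILE /var/spool/cron/root USER root PID 21320 job sys-hourly
Nov 11 11:47:47 mdt3 crond[1473]: /usr/sbin/crond 4.5 dillon's cron daemon, started with loglevel info
Nov 11 15:48:02 mdt3 crond[1473]: time disparity of 240 minutes detected
Nov 11 15:49:01 mdt3 crond[1473]: FILE /var/spool/cron/root USER root PID 1653 job sys-hourly
"""


def find_daemon_starts(lines, year=2011):
    """Return (last_seen, started) timestamp pairs for each crond start line.

    last_seen is the timestamp of the last parsable line before the start
    line (None if there is none); started is the start line's own timestamp.
    The year is an assumption because syslog-style lines do not record it.
    """
    starts = []
    last_seen = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or unrecognised lines
        when = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        if "started with loglevel" in match.group(2):
            starts.append((last_seen, when))
        last_seen = when
    return starts


if __name__ == "__main__":
    for last_seen, started in find_daemon_starts(SAMPLE_LOG.splitlines()):
        note = ""
        if last_seen is not None and started < last_seen:
            note = " (timestamp went backwards, clock was probably off at boot)"
        print(f"last entry before restart: {last_seen}, crond started: {started}{note}")
```

Running the same function over /var/log/everything.log (with a start marker suited to that file) would give the last logged sign of life before each reboot, which narrows down where to look.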

/var/log/user.log is empty, confirming that no users initiated the reboot.


Other signs:
CPU temperatures appear normal (BIOS indicates "low"). conky indicates 36~40. (Temperature is likely closer to 60 when running computationally intensive programs.)
However, I doubt CPU overheating is the cause, since the motherboard would BEEEEEEEP, which is hard to ignore.
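
The 60 figure under load is an estimate, so it may be worth recording the temperatures while the heavy job runs and reading back the last values after a reboot. Below is a rough Python sketch, assuming the sensors are exposed through the kernel's hwmon sysfs interface as /sys/class/hwmon/hwmon*/temp*_input (values in millidegrees Celsius). The function names and the log path are made up for illustration, and on older kernels the files may sit in a device/ subdirectory, which this sketch does not handle.

```python
import glob
import os
import time


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN/chip/tempM": degrees Celsius} for each temp*_input file."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        try:
            with open(os.path.join(chip_dir, "name")) as fh:
                chip = fh.read().strip()
        except OSError:
            chip = "unknown"  # no readable "name" file for this chip
        try:
            with open(path) as fh:
                millidegrees = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # sensor not readable right now, skip it
        sensor = os.path.basename(path)[: -len("_input")]
        key = f"{os.path.basename(chip_dir)}/{chip}/{sensor}"
        temps[key] = millidegrees / 1000.0
    return temps


def log_temps(logfile="temps.log", interval_s=5.0, samples=None):
    """Append one timestamped line per interval; samples=None loops forever.

    Each line is flushed and fsync'ed so the last reading has a chance of
    surviving a sudden reboot.
    """
    count = 0
    with open(logfile, "a") as out:
        while samples is None or count < samples:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            readings = " ".join(
                f"{key}={value:.1f}" for key, value in read_hwmon_temps().items()
            )
            out.write(f"{stamp} {readings}\n")
            out.flush()
            os.fsync(out.fileno())
            count += 1
            if samples is None or count < samples:
                time.sleep(interval_s)


if __name__ == "__main__":
    log_temps()
```

If the last lines of the log before a reboot show ordinary temperatures, overheating becomes less likely as the cause; if the file is empty, the sensor driver is probably not loaded.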


Summary:

Workstation with 4 CPU sockets (Opteron 6100 series, 8 cores each) and 96 GB of *unregistered* ECC RAM.
I don't know whether the problems are Arch Linux-specific, since I do not use any other OS on this machine. My other Arch Linux machines do not experience the same problem.
No error messages appear in system logging.
Only I am logged in. No logging of a user-initiated reboot.

I suspect that, since the RAM is unregistered (though it has ECC), some inconsistency may have occurred in the RAM...

Any suggestions would be helpful.

Last edited by rWarrior (2011-11-11 22:58:15)

#2 2011-11-11 23:47:37

masteryod
Member
Registered: 2010-05-19
Posts: 433

Re: Random reboots on AMD Opteron 6100 system

Is this a new setup?

Hmm... try another OS (you can boot a live CD or flash drive), run the same workload, and watch for reboots. If it fails again on another distribution, another kernel, or Windows Server, you should look into hardware, memory compatibility, and BIOS configuration rather than the OS.

Last edited by masteryod (2011-11-11 23:47:50)

#3 2011-11-12 00:41:01

chpln
Member
From: Australia
Registered: 2009-09-17
Posts: 361

Re: Random reboots on AMD Opteron 6100 system

This sounds hardware-level. Can you access the IPMI logs (IPMI is apparently integrated on that system board) and see if they shed any light?

#4 2011-11-12 01:00:09

lagagnon
Member
From: an Island in the Pacific...
Registered: 2009-12-10
Posts: 1,087

Re: Random reboots on AMD Opteron 6100 system

Try running "mprime" for a couple of hours and check that the results are valid.


Philosophy is looking for a black cat in a dark room. Metaphysics is looking for a black cat in a dark room that isn't there. Religion is looking for a black cat in a dark room that isn't there and shouting "I found it!". Science is looking for a black cat in a dark room with a flashlight.

#5 2011-11-13 10:40:57

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: Random reboots on AMD Opteron 6100 system

You could try running memtest or something like it to check that all RAM modules are OK, although I don't know if that could prompt a reboot. My other guess is a problem with the PSU: either it is not powerful enough or it is not working properly.


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K
