I have some MCE errors I'd like to investigate:
[ 0.018122] mce: [Hardware Error]: Machine check events logged
[ 0.018130] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: f600000000070f0f
[ 0.018258] mce: [Hardware Error]: TSC 0 ADDR fea10190
[ 0.018343] mce: [Hardware Error]: PROCESSOR 2:700f01 TIME 1521186153 SOCKET 0 APIC 0 microcode 7000106
I read that the way to find out what it means is by running:
/usr/sbin/mcelog --k8 --ascii < myerror
However, mcelog no longer works on Arch and has been replaced with rasdaemon.
How can I replicate this particular mcelog functionality with this new tool?
Offline
With the two executables that come with the rasdaemon package, I can't find a relevant option. I also can't find anything in the two man pages.
In the past, with 'mcelog', you would enable the service that came with the package. It would then add the explanation message to the journal whenever a machine check event happened. Perhaps 'rasdaemon' works the same way: you just enable the service that comes with the package and then wait until the next event happens?
This seems annoying, but better than nothing, I guess.
Offline
I have it enabled, but I haven't seen any relevant message.
Any idea where I should look for it?
Offline
What if you try building mcelog and see whether it can decode anything from the error? It cannot use /dev/mcelog with the Arch kernels, but can it still decode extracted errors?
git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git
cd mcelog
make
./mcelog --k8 --ascii < myerror
Offline
It does not compile correctly:
cc -c -g -Os -Wall -Wextra -Wno-missing-field-initializers -Wno-unused-parameter -Wstrict-prototypes -Wformat-security -Wmissing-declarations -Wdeclaration-after-statement -o denverton.o denverton.c
cc -c -g -Os -Wall -Wextra -Wno-missing-field-initializers -Wno-unused-parameter -Wstrict-prototypes -Wformat-security -Wmissing-declarations -Wdeclaration-after-statement -o msr.o msr.c
cc -c -g -Os -Wall -Wextra -Wno-missing-field-initializers -Wno-unused-parameter -Wstrict-prototypes -Wformat-security -Wmissing-declarations -Wdeclaration-after-statement -o bus.o bus.c
cc -c -g -Os -Wall -Wextra -Wno-missing-field-initializers -Wno-unused-parameter -Wstrict-prototypes -Wformat-security -Wmissing-declarations -Wdeclaration-after-statement -o unknown.o unknown.c
( printf "char version[] = \"" ; \
if test -e .os_version; then \
cat .os_version | tr -d '\n' ; \
elif command -v git >/dev/null; then \
if [ -d .git ] ; then \
git describe --tags HEAD | tr -d '\n'; \
else \
printf "unknown" ; \
fi ; \
else \
printf "unknown" ; \
fi ; \
printf '";\n' \
) > version.tmp
cmp version.tmp version.c || mv version.tmp version.c
cmp: version.c: No such file or directory
cc -c -g -Os -Wall -Wextra -Wno-missing-field-initializers -Wno-unused-parameter -Wstrict-prototypes -Wformat-security -Wmissing-declarations -Wdeclaration-after-statement -o version.o version.c
cc mcelog.o p4.o k8.o dmi.o tsc.o core2.o bitfield.o intel.o nehalem.o dunnington.o tulsa.o config.o memutil.o msg.o eventloop.o leaky-bucket.o memdb.o server.o trigger.o client.o cache.o sysfs.o yellow.o page.o rbtree.o sandy-bridge.o ivy-bridge.o haswell.o broadwell_de.o broadwell_epex.o skylake_xeon.o denverton.o msr.o bus.o unknown.o version.o -o mcelog
Offline
cc mcelog.o p4.o k8.o dmi.o tsc.o core2.o bitfield.o intel.o nehalem.o dunnington.o tulsa.o config.o memutil.o msg.o eventloop.o leaky-bucket.o memdb.o server.o trigger.o client.o cache.o sysfs.o yellow.o page.o rbtree.o sandy-bridge.o ivy-bridge.o haswell.o broadwell_de.o broadwell_epex.o skylake_xeon.o denverton.o msr.o bus.o unknown.o version.o -o mcelog
Unless there was an error after that last line, that matches my build here, which produced mcelog in the local directory.
Offline
Right. What should I put under "myerror"?
I am trying the numerical value, but no luck:
# ./mcelog --k8 --ascii < f600000000070f0f
-bash: f600000000070f0f: No such file or directory
Offline
cat myerror
[ 0.018122] mce: [Hardware Error]: Machine check events logged
[ 0.018130] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: f600000000070f0f
[ 0.018258] mce: [Hardware Error]: TSC 0 ADDR fea10190
[ 0.018343] mce: [Hardware Error]: PROCESSOR 2:700f01 TIME 1521186153 SOCKET 0 APIC 0 microcode 7000106
./mcelog --k8 --ascii < myerror
mcelog: Cannot open /dev/mem for DMI decoding: Permission denied
Machine check events logged
mcelog: Unknown CPU type vendor 2 family 22 model 0
Hardware event. This is not a software error.
CPU 0 0 data cache
TIME 1521186153 Fri Mar 16 07:42:33 2018
STATUS 0 MCGSTATUS 0
CPUID Vendor AMD Family 22 Model 0
(Fields were incomplete)
SOCKET 0 APIC 0 microcode 7000106
Unfortunately, not much help on my system.
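For anyone following along: "myerror" is just a text file holding the mce lines from the kernel log, which is why redirecting from the bare status value fails. A minimal sketch of how such a file could be produced is below. The function name filter_mce is a hypothetical helper for illustration, not something shipped with mcelog or rasdaemon, and it assumes the kernel prefixes these lines with "mce: [Hardware Error]" as in the log above.

```shell
# filter_mce: hypothetical helper, not part of mcelog or rasdaemon.
# Reads a kernel log on stdin and keeps only the machine-check lines.
# -F makes grep treat the pattern as a fixed string, so the square
# brackets are matched literally.
filter_mce() {
    grep -F 'mce: [Hardware Error]'
}

# Typical use (reading the kernel log may require root):
#   dmesg | filter_mce > myerror
#   ./mcelog --k8 --ascii < myerror
```

The result should look like the "cat myerror" output shown above.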
Offline
Ah, got it. Thank you for your guidance.
I ran it as root and got some results.
# ./mcelog --k8 --ascii < myerror
Machine check events logged
mcelog: Unknown CPU type vendor 2 family 22 model 0
Hardware event. This is not a software error.
CPU 0 0 data cache
TIME 1521186153 Fri Mar 16 08:42:33 2018
STATUS 0 MCGSTATUS 0
CPUID Vendor AMD Family 22 Model 0
(Fields were incomplete)
SOCKET 0 APIC 0 microcode 7000106
Not sure how reliable its indication of a hardware error is, or whether more information can be extracted from it. My processor is an AMD Kabini, and I get the same output whether I use --k8 or --generic.
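Since mcelog reports "STATUS 0" and "(Fields were incomplete)", it apparently never parsed the status word, so a little more can be read out of the raw value by hand. The sketch below splits the 64-bit value printed after "Bank 4:" into the generic flag bits (bits 63 down to 57) and the low error-code fields. It assumes the common machine-check status layout shared by AMD and Intel; the function name decode_mci_status is hypothetical, and the meaning of the model-specific bits depends on the CPU family, so it is printed but not interpreted here.

```shell
# decode_mci_status: hypothetical helper, not part of mcelog or rasdaemon.
# Argument: the 16 hex digits of the status word, without a 0x prefix.
# Assumes the generic status layout: VAL=63 OVER=62 UC=61 EN=60
# MISCV=59 ADDRV=58 PCC=57, error code in bits 15:0.
decode_mci_status() {
    hex=$1
    # Work on 32-bit halves so the value never overflows the shell's
    # signed 64-bit arithmetic.
    hi=$(printf '%s' "$hex" | cut -c1-8)
    lo=$(printf '%s' "$hex" | cut -c9-16)
    hi=$((0x$hi))
    lo=$((0x$lo))
    # Bit 63 of the full word is bit 31 of the upper half, and so on.
    printf 'VAL=%d OVER=%d UC=%d EN=%d MISCV=%d ADDRV=%d PCC=%d\n' \
        $(( (hi >> 31) & 1 )) $(( (hi >> 30) & 1 )) $(( (hi >> 29) & 1 )) \
        $(( (hi >> 28) & 1 )) $(( (hi >> 27) & 1 )) $(( (hi >> 26) & 1 )) \
        $(( (hi >> 25) & 1 ))
    # Bits 15:0 are the error code; bits 31:16 are model specific.
    printf 'error_code=0x%04x model_specific=0x%04x\n' \
        $(( lo & 0xffff )) $(( (lo >> 16) & 0xffff ))
}

decode_mci_status f600000000070f0f
# prints:
# VAL=1 OVER=1 UC=1 EN=1 MISCV=0 ADDRV=1 PCC=1
# error_code=0x0f0f model_specific=0x0007
```

Read this way, the entry is valid (VAL), uncorrected (UC), flagged as possibly having corrupted processor context (PCC), and carries a valid address (ADDRV), which matches the ADDR fea10190 line in the log. OVER means an earlier error in the same bank was overwritten. What the error code and the model-specific bits mean for Kabini would have to be looked up in AMD's documentation for that family.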
Offline