mcelog

Language: en

Version: 385666 (fedora - 01/12/10)

Section: 8 (Administrator commands)

NAME

mcelog - Print machine check log from x86-64 kernel.

SYNOPSIS

mcelog [options] [device]
mcelog [options] --ascii
mcelog [options] --drop-old-memory
mcelog [options] --reset-memory locator
mcelog [options] --dump-memory[=locator]
mcelog --version

DESCRIPTION

Linux x86-64 kernels since 2.6.4 don't print recoverable machine check errors to the kernel log anymore. Instead they are saved into a special kernel buffer accessible using /dev/mcelog. mcelog reads /dev/mcelog and prints the stored machine check records to stdout. Then the stored machine check records in the kernel buffer are deleted.

mcelog normally runs as a regular cron job to log kernel machine check events to disk. On newer kernels it can also be triggered directly using the /sys/devices/system/machinecheck/machinecheck0/trigger trigger. In addition, mcelog can be used on the command line to decode an existing machine check record in ASCII format with the --ascii option.
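A cron-driven setup can be sketched as a small wrapper that runs a command and appends whatever it prints to a log file. This is only an illustration: the helper name run_and_append, the log location /var/log/mcelog and the choice of options in the usage comment are assumptions, not something this page prescribes.

```python
import subprocess
from pathlib import Path


def run_and_append(command, log_path):
    """Run `command`, append its stdout to `log_path`, return its exit status."""
    result = subprocess.run(
        command, stdout=subprocess.PIPE, text=True, check=False
    )
    if result.stdout:
        # Append so that earlier records collected by previous runs are kept.
        with Path(log_path).open("a", encoding="utf-8") as log:
            log.write(result.stdout)
    return result.returncode


# Hypothetical cron usage (both the options and the path are assumptions):
#   run_and_append(["mcelog", "--ignorenodev", "--filter"], "/var/log/mcelog")
```

Because mcelog deletes the records from the kernel buffer once it has printed them, the wrapper appends rather than overwrites.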

When the --syslog option is specified, output is redirected to the system log. The --syslog-error option causes the normal machine checks to be logged as LOG_ERR (implies --syslog). Normally only fatal errors or high level remarks are logged with error level. Some one-line summaries of errors are also always logged to the syslog by default unless mcelog operates in --ascii mode.

When the --logfile=file option is specified, log output is appended to the specified file. With the --no-syslog option mcelog will never log anything to the syslog.

When --k8 is specified, assume the events are for an AMD Opteron, Athlon 64 or Athlon FX CPU. When --p4 is specified, assume the events are for an Intel Pentium 4 or (older) Intel Xeon. With --core2, assume the events are for an Intel Core2 CPU or Intel Xeon 3000, 3200, 5100, 5300 or 7300 series. With --intel-cpu=family,model, give the family number and model number of the Intel CPU to decode for (both can be found in /proc/cpuinfo). With --generic, all the fields are dumped without CPU specific decoding. The default is to decode for the CPU mcelog is running on or, when the kernel is new enough to output the PROCESSOR field with the MCE event, for the CPU reported by the kernel.
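As a rough illustration of where the family and model numbers come from, the sketch below builds an --intel-cpu argument from the text of /proc/cpuinfo. The function name intel_cpu_option is hypothetical, and it assumes the usual x86 "cpu family" and "model" lines and takes the first processor entry.

```python
def intel_cpu_option(cpuinfo_text):
    """Build a --intel-cpu=family,model argument from /proc/cpuinfo text.

    Uses the first "cpu family" and "model" lines found; returns None if
    either field is missing.
    """
    family = model = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        if key == "cpu family" and family is None:
            family = int(value)
        elif key == "model" and model is None:  # exact match, not "model name"
            model = int(value)
        if family is not None and model is not None:
            return f"--intel-cpu={family},{model}"
    return None


# Possible usage on the machine whose CPU should be decoded for:
#   with open("/proc/cpuinfo") as f:
#       option = intel_cpu_option(f.read())
```

When decoding a record from another machine, the numbers have to be taken from that machine's /proc/cpuinfo, not from the one mcelog runs on.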

With the --dmi option mcelog will look up the addresses reported in machine checks in the SMBIOS/DMI tables of the BIOS. This can sometimes tell you which DIMM or memory controller has developed a problem. More often the information reported by the BIOS is either subtly or obviously wrong or useless. This option requires that mcelog has read access to /dev/mem (normally requires root) and runs on the same machine in the same hardware configuration as when the machine check event happened.

When --ignorenodev is specified, mcelog will exit silently when the device cannot be opened. This is useful in virtualized environments with limited devices.

When --filter is specified mcelog will filter out known broken machine check events.

When --raw is specified mcelog will not decode, but just dump the mcelog in a raw hex format. This can be useful for automatic post processing.

When a device is specified the machine check logs are read from device instead of the default /dev/mcelog.
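Before pointing mcelog at a device node, a script might want to check that the node looks like the one listed under FILES (character device 10, minor 227). The sketch below is one way to do that on Linux; the function name looks_like_mcelog_device is hypothetical, and a passing check says nothing about whether the caller may actually read the device.

```python
import os
import stat

MCELOG_MAJOR = 10   # from the FILES section: char 10
MCELOG_MINOR = 227  # from the FILES section: minor 227


def looks_like_mcelog_device(path="/dev/mcelog"):
    """Return True if `path` is a character device numbered 10:227."""
    try:
        info = os.stat(path)
    except OSError:
        return False  # missing node, e.g. in a restricted virtual machine
    if not stat.S_ISCHR(info.st_mode):
        return False
    device = (os.major(info.st_rdev), os.minor(info.st_rdev))
    return device == (MCELOG_MAJOR, MCELOG_MINOR)
```

A False result corresponds to the situation that --ignorenodev is meant to handle quietly.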

With the --ascii option mcelog decodes a fatal machine check panic generated by the kernel ("CPU n: Machine Check Exception ...") in ASCII from stdin. This is useful to make sense of the hexadecimal numbers in there. Note that when the panic comes from a different machine than the one mcelog is running on, you might need to specify the correct architecture (--k8, --p4 or --core2) on older kernels. On newer kernels which output the PROCESSOR field this is not needed anymore.
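The following sketch shows one way a saved panic message could be handed to mcelog from a script. It assumes that mcelog --ascii reads the panic text on standard input and writes the decoded result to standard output; the helper names ascii_command and decode_panic are hypothetical.

```python
import subprocess

# Architecture options named on this page for panics from other machines.
CPU_FLAGS = ("--k8", "--p4", "--core2")


def ascii_command(cpu_flag=None):
    """Return the argv for decoding a panic, optionally forcing a CPU type."""
    argv = ["mcelog"]
    if cpu_flag is not None:
        if cpu_flag not in CPU_FLAGS:
            raise ValueError(f"unsupported CPU flag: {cpu_flag!r}")
        argv.append(cpu_flag)
    argv.append("--ascii")
    return argv


def decode_panic(panic_text, cpu_flag=None):
    """Feed saved panic text to mcelog on stdin and return its stdout."""
    result = subprocess.run(
        ascii_command(cpu_flag),
        input=panic_text,
        stdout=subprocess.PIPE,
        text=True,
        check=True,  # raise CalledProcessError if mcelog reports failure
    )
    return result.stdout
```

For a panic captured on an older kernel on an Opteron system, decode_panic(text, "--k8") would be the corresponding call.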

With the --daemon option mcelog will run in the background and continuously poll the kernel for machine checks. This gives the fastest reaction time, but the normal recommended operating mode is running as a cron job or from the kernel trigger (through /sys/devices/system/machinecheck/machinecheck0/trigger). This option implies --syslog.

--database filename specifies the memory module error database file. Default is /var/lib/memory-errors. It is only used together with DMI decoding.

--error-trigger=cmd,thresh When a memory module accumulates thresh errors in the error database, run the command cmd.

--drop-old-memory Drop old DIMMs in the memory module database that are not plugged in anymore.

--reset-memory=locator When the DIMMs have suitable unique serial numbers mcelog will automatically detect changed DIMMs. When the DIMMs don't have those the user will have to use this option when changing a DIMM to reset the error count in the error database. Locator is the memory slot identifier printed on the motherboard.

--dump-memory[=locator] Dump error database information for memory module located at locator. When no locator is specified dump all.

--version displays the version of mcelog and exits.

NOTES

The kernel prefers old messages over new. If the log buffer overflows only old ones will be kept.
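The overflow behaviour described here can be pictured with a toy bounded buffer that refuses new records once it is full and is emptied when read. This is a model of the behaviour only, not the kernel's data structure; the class name, the capacity and the dropped counter are all illustrative assumptions.

```python
class KeepOldestBuffer:
    """Toy model: once full, new records are dropped and old ones are kept."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.dropped = 0  # number of new records lost to overflow

    def add(self, record):
        """Store `record` if there is room; return False if it was dropped."""
        if len(self.records) >= self.capacity:
            self.dropped += 1
            return False
        self.records.append(record)
        return True

    def drain(self):
        """Return all stored records and empty the buffer, as a read does."""
        records, self.records = self.records, []
        return records
```

In this model, the longer the buffer goes without being drained, the more recent events are lost, which is one reason to read the log regularly.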

The exact output depends on the CPU.

SMBIOS/DMI output is unreliable and sometimes wrong. Not Linux's fault - complain to your motherboard vendor. mcelog does some sanity checks to distinguish bad BIOS from good BIOS but it is not 100% foolproof.

mcelog will report serious errors to the syslog during decoding.

FILES

/dev/mcelog (char 10, minor 227)
/var/lib/memory-errors (memory error database)

SEE ALSO

AMD x86-64 architecture programmer's manual, Volume 2, System programming
IA32 Intel Architecture Software developer's manual, Volume 3, System programming guide
Datasheet of your CPU.