CentOS Kdump

Installing kdump

yum install kexec-tools

Configuring the Memory Usage

Changing Memory Options in GRUB2

Open the /etc/default/grub configuration file as root using a plain text editor such as vim or Gedit.

In this file, locate the line beginning with GRUB_CMDLINE_LINUX. The line will look similar to the following:

GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=auto rd.lvm.lv=rhel/root rhgb quiet"

Note the highlighted crashkernel= option; this is where the reserved memory is configured.

Change the value of the crashkernel= option to the amount of memory you want to reserve. For example, to reserve 128 MB of memory, use the following:

crashkernel=128M

Minimum amount of reserved memory required for kdump: 2 GB and more should reserved 160 MB + 2 bits for every 4 KB of RAM. For a system with 1 TB of memory, 224 MB is the minimum (160 + 64 MB).

Configuring kdump on the Command Line

Configuring the Core Collector

To reduce the size of the vmcore dump file, kdump allows you to specify an external application (a core collector) to compress the data, and optionally leave out all irrelevant information. Currently, the only fully supported core collector is makedumpfile.

To enable the core collector, as root, open the /etc/kdump.conf configuration file in a text editor, remove the hash sign (“#”) from the beginning of the #core_collector makedumpfile -l --message-level 1 -d 31 line, and edit the command line options as described below.

To enable the dump file compression, add the -c parameter. For example:

core_collector makedumpfile -c

Configuring the Default Action

default reboot

Enabling the Service

systemctl enable kdump.service
systemctl start kdump.service

Testing the kdump Configuration

~]# systemctl is-active kdump
active

Then type the following commands at a shell prompt:

echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

This will force the Linux kernel to crash, and the address-YYYY-MM-DD-HH:MM:SS/vmcore file will be copied to the location you have selected in the configuration (that is, to /var/crash/ by default).

分析 Core Dump

将上面产生的 vmcore 文件放在一个具有相同内核的服务器中进行调试,不要在生产环境中调试,同时确保 /etc/yum.confexclude 不包含 kernel/etc/yum.repos.d/CentOS-Debuginfo.repobase-debuginfo 部分 enabled 为 1。

yum install crash
yum install yum-utils
wget -O /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Debug-7 http://vault.centos.org/RPM-GPG-KEY-CentOS-Debug-7
debuginfo-install kernel

注意: kernel-debuginfo 在国内无镜像站,可以提前从 http://debuginfo.centos.org/7/x86_64/ 下载 rpm 包,使用以下替代方式安装:

yum install kernel-debuginfo-3.10.0-327.el7.x86_64.rpm kernel-debuginfo-common-x86_64-3.10.0-123.el7.x86_64.rpm

使用 crash 分析 kernel crash 的原因:

crash /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux vmcore

crash 7.1.2-3.el7_2.1
Copyright (C) 2002-2014  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 40
        DATE: Mon Aug  8 15:02:49 2016
      UPTIME: 45 days, 03:42:44
LOAD AVERAGE: 0.63, 0.43, 0.25
       TASKS: 934
    NODENAME: server-64
     RELEASE: 3.10.0-327.el7.x86_64
     VERSION: #1 SMP Thu Nov 19 22:10:57 UTC 2015
     MACHINE: x86_64  (2299 Mhz)
      MEMORY: 63.8 GB
       PANIC: "SysRq : Trigger a crash"
         PID: 122956
     COMMAND: "bash"
        TASK: ffff88084e667300  [THREAD_INFO: ffff88078f2a0000]
         CPU: 14
       STATE: TASK_RUNNING (SYSRQ)

crash>

Reference

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Kernel_Crash_Dump_Guide/chap-installing-configuring-kdump.html

Loading Disqus comments...
Table of Contents