The status check provides a report of instance exceptions. This document describes the symptoms, causes, and solutions of the kernel and IO problems shown in the status check report.
Kernel failures may cause login failures or abnormal restarts.
Hung task detection is performed by a single kernel thread, khungtaskd, which monitors processes in the TASK_UNINTERRUPTIBLE state (shown as D state in process listings). If a process stays in D state for longer than the period specified by kernel.hung_task_timeout_secs (120 seconds by default), the kernel prints the stack trace of the hung task.
If kernel.hung_task_panic=1 is configured, a hung task triggers a kernel panic and the system restarts.
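To see which processes are currently in D state, you can read the state field of /proc/<pid>/stat. The following is a minimal Python sketch, assuming a Linux instance with a standard /proc file system; the function names are hypothetical. It only takes a snapshot and does not measure how long each process has been blocked, which is what khungtaskd does.

```python
import os


def parse_state(stat_text: str) -> tuple[str, str]:
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so locate it using the first '(' and last ')'.
    """
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    comm = stat_text[start + 1:end]
    state = stat_text[end + 2]  # the state letter follows ") "
    return comm, state


def d_state_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Snapshot of (pid, comm) for every process in D (uninterruptible) state."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_state(f.read())
        except OSError:
            continue  # the process exited or its stat file is unreadable
        if state == "D":
            result.append((int(entry), comm))
    return result


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(f"{pid}\t{comm}")
```

A process that shows up in D state only once is usually waiting on normal IO; one that appears in repeated snapshots over several minutes is a candidate for the hung task report.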
A soft lockup occurs when a kernel thread holds a CPU without releasing it, so other tasks get no chance to run on that CPU. Each CPU is assigned a periodic kernel watchdog thread, watchdog/x (where x is the CPU number). If this thread is not scheduled within the threshold period, a soft lockup is reported. By default, the threshold is twice the value of kernel.watchdog_thresh; for example, kernel.watchdog_thresh defaults to 10 seconds on a 3.10 kernel, which gives a 20-second threshold.
If kernel.softlockup_panic=1 is configured, a soft lockup triggers a kernel panic and the system restarts.
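The threshold arithmetic above can be checked on a running instance by reading kernel.watchdog_thresh through /proc/sys. This is a minimal Python sketch, assuming a Linux instance that exposes the sysctl at /proc/sys/kernel/watchdog_thresh and the "2 × kernel.watchdog_thresh" rule described above; the helper names are hypothetical.

```python
from pathlib import Path


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Read a sysctl such as 'kernel.watchdog_thresh' via /proc/sys."""
    # kernel.watchdog_thresh -> /proc/sys/kernel/watchdog_thresh
    return Path(root, *name.split(".")).read_text().strip()


def softlockup_threshold(watchdog_thresh: int) -> int:
    """Seconds watchdog/x may go unscheduled before a soft lockup is reported.

    Assumes the 2 x kernel.watchdog_thresh rule described in this document.
    """
    return 2 * watchdog_thresh


if __name__ == "__main__":
    thresh = int(read_sysctl("kernel.watchdog_thresh"))
    print(f"kernel.watchdog_thresh = {thresh}s")
    print(f"soft lockup threshold  = {softlockup_threshold(thresh)}s")
```

With the 3.10 default of 10 seconds, softlockup_threshold(10) returns 20.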
A kernel panic is a kernel crash that causes the instance to restart abnormally. Common causes include a hung task when kernel.hung_task_panic=1 is configured, and a soft lockup when kernel.softlockup_panic=1 is configured. Because kernel panics are difficult to troubleshoot, we recommend that you submit a ticket.
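Before submitting a ticket, you may want to record whether the instance is configured to panic on a hung task or a soft lockup. The following minimal Python sketch reads kernel.hung_task_panic and kernel.softlockup_panic from /proc/sys, assuming a Linux instance; the helper names are hypothetical, and a sysctl that the kernel does not expose is reported as None.

```python
from pathlib import Path
from typing import Dict, List, Optional

# The two sysctls that turn a hang into a kernel panic, as described above.
PANIC_SYSCTLS = {
    "kernel.hung_task_panic": "hung task",
    "kernel.softlockup_panic": "soft lockup",
}


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the sysctl value, or None if this kernel does not expose it."""
    path = Path(root, *name.split("."))
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


def panic_triggers(settings: Dict[str, Optional[str]]) -> List[str]:
    """List the hang types that are configured (=1) to cause a kernel panic."""
    return [label for name, label in PANIC_SYSCTLS.items()
            if settings.get(name) == "1"]


if __name__ == "__main__":
    current = {name: read_sysctl(name) for name in PANIC_SYSCTLS}
    for name, value in current.items():
        print(f"{name} = {value}")
    print("Kernel panic on:", ", ".join(panic_triggers(current)) or "neither")
```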
Problem: the error message “No space left on device” is displayed when you create a file, and running the df -i command shows that inode usage is 100%.
Common cause: the file system has run out of inodes.
Procedure: delete unnecessary files (each file uses one inode, so removing large numbers of small files frees the most inodes) or expand the disk.
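The inode usage that df -i reports can also be computed from os.statvfs, which is handy in monitoring scripts. This is a minimal Python sketch, assuming a Linux instance; the function name is hypothetical, and because df rounds percentages up, its IUse% column may differ slightly.

```python
import os


def inode_usage_percent(path: str = "/") -> float:
    """Approximate the IUse% column of `df -i` for the file system holding path."""
    st = os.statvfs(path)
    total = st.f_files
    if total == 0:
        return 0.0  # some file systems (e.g. btrfs) report no fixed inode count
    used = total - st.f_ffree
    return 100.0 * used / total


if __name__ == "__main__":
    for mount in ("/",):
        print(f"{mount}: {inode_usage_percent(mount):.1f}% of inodes used")
```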
Problem: the error message “No space left on device” is displayed when you create a file, and running the df -h command shows that disk space usage is 100%.
Common cause: the disk has run out of space.
Procedure: delete unnecessary files or expand the disk.
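Disk space usage can be checked the same way with shutil.disk_usage. The following minimal Python sketch mirrors the Use% column of df -h by dividing used space by used plus available space, so blocks reserved for root are excluded as df does. It assumes a Linux instance; the function names are hypothetical, and df rounds up, so small differences are expected.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Approximate the Use% column of `df -h` for the file system holding path."""
    usage = shutil.disk_usage(path)
    # df computes Use% as used / (used + available), ignoring reserved blocks.
    denom = usage.used + usage.free
    if denom == 0:
        return 0.0
    return 100.0 * usage.used / denom


def human(n: float) -> str:
    """Format a byte count with binary units, similar to `df -h`."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}P"


if __name__ == "__main__":
    usage = shutil.disk_usage("/")
    print(f"/: {human(usage.used)} used of {human(usage.total)} "
          f"({disk_usage_percent('/'):.1f}%)")
```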
Problem: files in the file system can be read, but new files cannot be created.
Common cause: the file system is damaged.
Procedure: we recommend restarting the instance directly. For details, see Restart Instances.
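To confirm this symptom before restarting, you can check whether the file system is mounted read-only and whether a file can actually be created in it. This is a minimal Python sketch, assuming a Linux instance where os.statvfs reports the ST_RDONLY mount flag; the function names are hypothetical. It only diagnoses the problem and does not repair the file system.

```python
import errno
import os
import tempfile


def is_mounted_read_only(path: str = "/") -> bool:
    """True if the file system holding path is mounted read-only (ST_RDONLY)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def can_create_file(directory: str) -> bool:
    """Try to create (and automatically delete) a temporary file in directory.

    Returns False on EROFS ("Read-only file system"); other errors, such as
    missing permissions, are re-raised so they are not mistaken for EROFS.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError as exc:
        if exc.errno == errno.EROFS:
            return False
        raise


if __name__ == "__main__":
    print("read-only mount:", is_mounted_read_only("/"))
    print("can create file in /tmp:", can_create_file("/tmp"))
```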
Problem: the instance lags and responds slowly, or stops responding to SSH or VNC logins.
Common cause: high IO drives the disk %util to 100%.
Procedure: check the IO load, then decide whether to reduce IO reads/writes or switch to a disk with higher performance.
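%util is the share of elapsed time during which the disk had at least one IO in progress. The following minimal Python sketch approximates the %util column of iostat -x by sampling the "time spent doing I/Os (ms)" field of /proc/diskstats twice. It assumes a Linux instance whose /proc/diskstats uses the standard layout (major, minor, device name, then the statistics fields); the helper names are hypothetical. On SSDs and cloud disks that serve many requests in parallel, a %util of 100% does not necessarily mean the disk is saturated, so also compare IOPS and latency against the disk's limits.

```python
import time


def parse_io_ticks(diskstats: str) -> dict[str, int]:
    """Map device name -> milliseconds spent doing I/Os.

    Each line is: major minor name, followed by the statistics fields;
    'time spent doing I/Os (ms)' is the 10th statistic, i.e. split index 12.
    """
    ticks = {}
    for line in diskstats.splitlines():
        fields = line.split()
        if len(fields) >= 13:
            ticks[fields[2]] = int(fields[12])
    return ticks


def util_percent(ticks_before: int, ticks_after: int, interval_s: float) -> float:
    """Busy time as a percentage of the sampling interval, capped at 100."""
    busy_ms = ticks_after - ticks_before
    return min(100.0, 100.0 * busy_ms / (interval_s * 1000.0))


def sample_util(interval_s: float = 1.0) -> dict[str, float]:
    """Sample /proc/diskstats twice and return %util per device."""
    with open("/proc/diskstats") as f:
        before = parse_io_ticks(f.read())
    start = time.monotonic()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = parse_io_ticks(f.read())
    elapsed = time.monotonic() - start  # measured, not assumed, interval
    return {dev: util_percent(before[dev], after[dev], elapsed)
            for dev in after if dev in before}


if __name__ == "__main__":
    for dev, util in sorted(sample_util().items()):
        print(f"{dev}\t{util:.1f}%")
```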