The event center of Cloud Monitor (CM) currently provides the following monitoring information for product events:
Event Name | Event Parameter | Event Type | Dimension | Recoverable | Description | Troubleshooting Methods |
---|---|---|---|---|---|---|
Kernel failure | GuestCoreError | Exception | CVM instance | No | An OS kernel bug or driver issue causes a fatal error in the OS kernel | 1. Check whether any kernel driver modules other than those provided with the kernel are loaded into the system; try unloading them and observe how the system runs. 2. Check released bug reports for the kernel and OS, and try upgrading the kernel. 3. kdump is enabled for CVM by default, so when a panic occurs, a system memory dump is generated in the /var/crash directory; you can analyze it with the crash tool |
OOM | GuestOom | Exception | CVM instance | No | System memory usage is overloaded | 1. Check whether the memory capacity configured for the current system meets business requirements; if more is needed, we recommend upgrading the CVM memory configuration. 2. Check system logs such as dmesg and /var/log/messages for processes killed during the OOM, verify that their memory usage is as expected, and use tools such as valgrind to check for memory leaks (see the example after this table) |
Ping failure | PingUnreachable | Exception | CVM instance | Yes | The network of the CVM instance is not pingable | 1. Check whether the CVM instance is running normally; if an exception occurs (for example, the system crashes), force restart the instance in the console to restore it. 2. If the instance is running normally, check its network configuration, including the internal network service configuration, firewall configuration, and security group configuration |
Read-only disk | DiskReadonly | Exception | CVM instance | Yes | Data cannot be written to the disk | 1. Check whether the disk is full. 2. In Linux, run the `df -i` command to check whether inodes are used up. 3. Check whether the file system is damaged |
Server restart | GuestReboot | Status change | CVM instance | Yes | The CVM instance restarts | This event is triggered when the CVM restarts. Check whether the status change is expected based on your actual scenario |
Packet loss occurs when the public network outbound bandwidth exceeds the limit | PacketDroppedByQosWanOutBandwidth | Exception | CVM instance | Yes | The public network outbound bandwidth of the CVM instance exceeds the upper limit, causing packet loss. Packet loss caused by brief bandwidth spikes is not reflected in the bandwidth view because the minimum granularity of bandwidth statistics is 10 seconds (total traffic over 10 seconds divided by 10 seconds). If the sustained bandwidth does not exceed the limit by much, this event can be ignored | Increase the upper limit of the public network bandwidth. If the maximum purchasable bandwidth has been reached, reduce the bandwidth consumption of the server through load balancing or other means |
CVM NVMe device error | NvmeError | Exception | CVM instance | No | An NVMe disk on the CVM instance fails | 1. Isolate reads/writes on the disk and unmount the corresponding directory. 2. Submit a ticket and wait for technical personnel to replace the disk. 3. After the disk is replaced, format the new disk before use |
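
The OOM and disk checks above can be run directly on the instance. Below is a minimal shell sketch (assuming a standard Linux image with `dmesg`, `df`, and `valgrind` available; `./myapp` is a hypothetical placeholder for the process under suspicion):

```bash
# List recent OOM-killer activity to see which processes were killed
dmesg -T | grep -i "out of memory"
grep -i oom /var/log/messages | tail -n 20

# Check disk and inode usage (also relevant to the read-only disk event)
df -h
df -i

# Run a suspect program under valgrind to check for memory leaks
valgrind --leak-check=full ./myapp
```
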
Event Name | Event Parameter | Event Type | Dimension | Recoverable | Description | Troubleshooting Methods |
---|---|---|---|---|---|---|
Blocked public IP | VipBlockInfo | Exception | CLB instance | Yes | The public IP of the CLB instance is blocked after the security system detects an attack | Submit a ticket to query the cause and solution |
Server port status exception | RsPortStatusChange | Exception | Real server port | Yes | An exception is found at the real server port of the public network CLB instance during a health check | Check the service status of the real server port (see the example after this table) |
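
To check the service status of a real server port, confirm on the real server that the service is listening, and that the port responds from within the VPC. A minimal shell sketch (the IP 10.0.0.10 and port 80 are hypothetical placeholders; use the port configured in your health check):

```bash
# On the real server: confirm a process is listening on the health-checked port
ss -lntp | grep ':80'

# From another host in the same VPC: confirm the port accepts connections
curl -sv --max-time 5 -o /dev/null http://10.0.0.10:80/
```
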
Event Name | Event Parameter | Event Type | Dimension | Recoverable | Description | Troubleshooting Methods |
---|---|---|---|---|---|---|
Packet loss occurs when the public network outbound bandwidth exceeds the limit | PacketDroppedByQosWanOutBandwidth | Exception | VPN Gateway instance | Yes | The public network outbound bandwidth of the VPN gateway instance exceeds the upper limit, causing packet loss. Packet loss caused by brief bandwidth spikes is not reflected in the bandwidth view because the minimum granularity of bandwidth statistics is 10 seconds (total traffic over 10 seconds divided by 10 seconds). If the sustained bandwidth does not exceed the limit by much, this event can be ignored | Increase the upper limit of the public network bandwidth |
Packet loss occurs when the number of connections exceeds the limit | PacketDroppedByQosConnectionSession | Exception | VPN Gateway instance | Yes | The number of connections to the VPN Gateway instance exceeds the limit, causing packet loss | Submit a ticket to contact us |
Event Name | Event Parameter | Event Type | Dimension | Recoverable | Description | Troubleshooting Methods |
---|---|---|---|---|---|---|
Node exception | NodeNotReady | Exception | Node | Yes | A node exception may be caused by network disconnection, a kubelet exception on the node, container OOM, and more. If the exception lasts for a long time, Kubernetes drains the containers on the node | 1. Check on the CVM monitoring page whether the node is running. 2. Log in to the CVM instance and check whether kubelet is running normally. 3. Log in to the CVM instance and check whether docker is running normally (see the example after this table) |
Node disk capacity will run out soon | NodeHasDiskPressure | Exception | Node | Yes | The disk (cbs or root) capacity used for container and image storage on the node will run out soon. NodeOutOfDisk is triggered after the capacity runs out, and new containers cannot be scheduled to this node | Clean up the disk or remove container images no longer in use |
Node disk capacity has run out | NodeOutOfDisk | Exception | Node | Yes | The disk (cbs or root) capacity used for container and image storage on the node has run out, and new containers cannot be scheduled to this node | Clean up the disk or remove container images no longer in use |
Node memory will run out soon | NodeHasInsufficientMemory | Exception | Node | Yes | Node memory utilization is high | Expand the node's capacity or schedule containers to other nodes |
Node OOM | SystemOOM | Exception | Node | No | OOM occurs on the node due to high memory utilization | Check the cause of the OOM on the node through monitoring data, syslog, dmesg, and more |
Node network unreachable | NodeNetworkUnavailable | Exception | Node | No | The network on the node is not configured properly. Normally, this problem does not occur in clusters created through the console or Tencent Cloud API | Submit a ticket to contact us |
Insufficient inodes on the node | NodeInodePressure | Exception | Node | No | New containers cannot be created due to insufficient inodes on the node | Check the remaining inodes on the node and clean up container images no longer in use to free up inode space |
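
The node checks above can be performed with kubectl and standard system tools. A minimal shell sketch (assuming kubectl access to the cluster and SSH access to the node; the node name 10.0.0.10 is a hypothetical placeholder):

```bash
# From a machine with cluster access: inspect the node's conditions
# (Ready, MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable)
kubectl describe node 10.0.0.10

# On the node itself: check whether kubelet and docker are running
systemctl status kubelet
systemctl status docker

# Check disk, inode, and memory headroom on the node
df -h
df -i
free -m
```
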
Event Name | Event Parameter | Event Type | Dimension | Recoverable | Description | Troubleshooting Methods |
---|---|---|---|---|---|---|
OOM | OutOfMemory | Exception | TencentDB for MySQL instance | Yes | Database memory usage is overloaded | Check whether the memory capacity configured for the database meets business requirements; if more is needed, we recommend upgrading the MySQL memory configuration (see the example after this table) |
Primary-secondary switch | PrimarySwitch | Exception | TencentDB for MySQL instance | No | A primary-secondary switch occurs | This event can be triggered when a physical machine fails. Check whether the instance status is normal |
Read-only instance removal | RORemoval | Exception | TencentDB for MySQL instance | Yes | A read-only instance fails or its latency exceeds the threshold | If the read-only group contains only one instance, switch the read traffic after the read-only instance is removed to avoid a single point of failure. We recommend purchasing at least two read-only instances for the group |
Instance migration caused by server failure | ServerfailureInstanceMigration | Exception | TencentDB for MySQL instance | No | A server failure results in instance migration | The migration time is subject to the maintenance window; change the window promptly if needed, and the new migration time will follow the new maintenance window |
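
To gauge whether the configured memory meets business requirements, you can inspect the instance's memory-related settings and connection pressure from any MySQL client. A minimal shell sketch (the host 10.0.0.20 and user monitor are hypothetical placeholders):

```bash
# Check the InnoDB buffer pool size and connection pressure
mysql -h 10.0.0.20 -u monitor -p -e "
  SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
  SHOW GLOBAL STATUS LIKE 'Threads_connected';
  SHOW GLOBAL STATUS LIKE 'Max_used_connections';"
```
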
Event Name | Event Parameter | Event Type | Dimension | Recoverable | Description | Troubleshooting Methods |
---|---|---|---|---|---|---|
Insufficient oplog backup | oplogInsufficient | Exception | TencentDB for MongoDB instance | No | When TencentDB for MongoDB backs up data, it cannot read the full oplog from the last backup to the current backup, which prevents rolling the database back to an arbitrary time point within the last 7 days | We recommend adjusting the oplog size or the backup frequency in the MongoDB console. If you do not need this event notification, disable it on the backup page of the MongoDB console |
The number of connections exceeds the limit | connectionOverlimit | Exception | TencentDB for MongoDB instance | Yes | The number of connections to the instance exceeds the limit | Check whether the maximum number of connections configured for the instance meets business requirements; if more connections are required, we recommend upgrading the instance configuration (see the example after this table) |
Primary-secondary switch | primarySwitch | Exception | TencentDB for MongoDB instance | Yes | A primary-secondary switch occurs | This event can be triggered when a physical machine fails. Check whether the instance status is normal |
The disk capacity has run out | instanceOutOfDisk | Exception | TencentDB for MongoDB instance | Yes | The disk capacity is full and the instance becomes read-only | Clean up the disk |
Instance rollback | instanceRollback | Exception | TencentDB for MongoDB instance | Yes | Instance data is rolled back | This event may be triggered if the primary node fails and a primary-secondary switch occurs while some data on the primary node has not yet been synced to the secondary node. Check whether the instance status is normal |
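
To see how close an instance is to its connection limit, you can query the server status from the mongo shell. A minimal sketch (the host 10.0.0.30 and user monitor are hypothetical placeholders):

```bash
# Prints { "current": ..., "available": ..., "totalCreated": ... }
mongo --host 10.0.0.30 -u monitor -p --authenticationDatabase admin \
  --eval "printjson(db.serverStatus().connections)"
```
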
Event Name | Event Parameter | Event Type | Dimension | Recoverable | Description | Troubleshooting Methods |
---|---|---|---|---|---|---|
Connection downtime | DirectConnectDown | Exception | Connection | Yes | The physical link of the connection is interrupted or has an exception | 1. Check whether the physical link is interrupted or has an exception (for example, the fiber cable is cut or a cable is unplugged). 2. Check whether the receiving port and optical/electrical modules are normal. 3. Check whether the network device port is shut down |
Dedicated tunnel downtime | DirectConnectTunnelDown | Exception | Dedicated tunnel | Yes | The physical link of the connection is interrupted or has an exception | 1. Check whether the physical link is interrupted or has an exception (for example, the fiber cable is cut or a cable is unplugged). 2. Check whether the receiving port and optical/electrical modules are normal. 3. Check whether the network device port is shut down |
Dedicated tunnel BGP session downtime | DirectConnectTunnelBGPSessionDown | Exception | Dedicated tunnel | Yes | The BGP session of the dedicated tunnel is interrupted | 1. Check whether the BGP process on the network device is normal. 2. Check whether the dedicated tunnel is normal. 3. Check whether the physical line is normal |
Alarm for exceeded number of BGP tunnel routes | DirectConnectTunnelRouteTableOverload | Exception | Dedicated tunnel | No | The number of BGP session routes in a dedicated tunnel exceeds 80% of the threshold | Check whether the routes published by the BGP session of the dedicated tunnel have reached 80% of the maximum (100 by default). For more information, see Use Limits |
Dedicated tunnel BFD detection downtime | DirectConnectTunnelBFDDown | Exception | Dedicated tunnel | Yes | BFD detection on the dedicated tunnel is interrupted | 1. Check whether the dedicated tunnel is normal. 2. Check whether the physical line is normal |