Exception Alarms

Last updated: 2021-04-09 10:42:46

    The exception alarms feature displays an overview of the exception alarms (exceptions detected by "24/7 Exception Diagnosis") generated by the database instances connected to DBbrain under your account.

    Note:

    Currently, the exception alarms feature is only supported for TencentDB for MySQL (excluding the basic single-node instance).

    Exception Alarm List

    1. Log in to the DBbrain console, select Monitoring & Alarm > Exception Alarm on the left sidebar, and select a database type at the top.

    The exception alarm list displays each alarm's database instance information, risk level, diagnosis item, duration, and available operations. In the search bar, you can search for instances by instance ID, instance name, diagnosis item, and other fields. You can also filter instances by region and time range.

    • Risk levels include notice, warning, severe, and fatal. You can filter, aggregate, and search for alarms by field. You can also click Details to view specific information about the exception and the optimization suggestion for the exception.
    • There are over 30 diagnosis items for exception diagnosis, such as slow SQL, primary-secondary switch, deadlock, uncommitted transaction, and OOM. You can filter, aggregate, and search for items by field. You can also sort them by duration.
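    The filtering, aggregation, and sorting described above can be pictured with a small local model. The sketch below is illustrative only: the Alarm record, its field names, and the helpers filter_alarms and count_by are hypothetical and are not part of the DBbrain console or API. It filters by risk level, region, and diagnosis item, sorts by duration, and counts alarms per field value.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Alarm:
    # Hypothetical record shape for illustration; not DBbrain API field names.
    instance_id: str
    region: str
    risk_level: str       # "notice", "warning", "severe", or "fatal"
    diagnosis_item: str   # e.g. "Slow SQL", "Deadlock", "OOM"
    duration_s: int       # how long the exception lasted, in seconds


def filter_alarms(
    alarms: Iterable[Alarm],
    levels: Optional[Set[str]] = None,
    region: Optional[str] = None,
    item: Optional[str] = None,
) -> List[Alarm]:
    """Keep alarms matching every given filter, longest duration first."""
    selected = [
        a for a in alarms
        if (levels is None or a.risk_level in levels)
        and (region is None or a.region == region)
        and (item is None or a.diagnosis_item == item)
    ]
    return sorted(selected, key=lambda a: a.duration_s, reverse=True)


def count_by(alarms: Iterable[Alarm], field: str) -> Dict[str, int]:
    """Aggregate: number of alarms per distinct value of `field`."""
    counts: Dict[str, int] = {}
    for a in alarms:
        key = getattr(a, field)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

    For example, filter_alarms(alarms, levels={"warning", "fatal"}) keeps only warning and fatal alarms, and count_by(alarms, "diagnosis_item") shows which diagnosis items fire most often.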

    Ignoring/Unignoring an Alarm

    You can ignore or unignore diagnosis item alarms that were not generated by health checks, which helps you narrow down the exception alarm list.

    • In the exception alarm list, locate an alarm and click Ignore in the Operation column to ignore it. This also ignores the instance's other diagnosis item alarms that share the same root cause.

    • In the exception alarm list, an ignored alarm is grayed out. Click Unignore to unignore it together with the other diagnosis item alarms that share the same root cause.
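    The root-cause grouping behind Ignore and Unignore can be sketched as follows. This is a hypothetical model, not DBbrain's implementation: it assumes each alarm carries a root_cause identifier and a from_health_check flag, and it toggles every non-health-check alarm on the same instance that shares the selected alarm's root cause.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alarm:
    # Hypothetical fields for illustration; DBbrain's real data model may differ.
    alarm_id: str
    instance_id: str
    diagnosis_item: str
    root_cause: str                  # alarms sharing a root cause are toggled together
    from_health_check: bool = False  # health-check alarms cannot be ignored
    ignored: bool = False            # ignored alarms are shown grayed out


def _set_ignored(alarms: List[Alarm], alarm_id: str, value: bool) -> List[str]:
    target = next((a for a in alarms if a.alarm_id == alarm_id), None)
    if target is None:
        raise KeyError(alarm_id)
    if target.from_health_check:
        raise ValueError("alarms generated by health checks cannot be ignored")
    changed = []
    for a in alarms:
        # Same instance and same root cause: toggle together.
        if (
            a.instance_id == target.instance_id
            and a.root_cause == target.root_cause
            and not a.from_health_check
        ):
            a.ignored = value
            changed.append(a.alarm_id)
    return changed


def ignore(alarms: List[Alarm], alarm_id: str) -> List[str]:
    """Ignore an alarm plus the instance's other alarms with the same root cause."""
    return _set_ignored(alarms, alarm_id, True)


def unignore(alarms: List[Alarm], alarm_id: str) -> List[str]:
    """Reverse ignore() for the same group of alarms."""
    return _set_ignored(alarms, alarm_id, False)
```

    Both functions return the IDs of the alarms they changed, so a caller can see the whole root-cause group that was grayed out or restored.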

    Detailed Description of Diagnosis Items

    A diagnosis item is a check performed by intelligent diagnosis. Diagnosis items fall into four categories: performance, availability, reliability, and maintainability. Each diagnosis item belongs to exactly one category.

    Diagnosis Item | Category | Description
    --- | --- | ---
    Connectivity check | Availability | Unable to connect to the database
    Slow insertion, update, or deletion | Performance | There is a thread that has been pending for a long time
    Slow SQL | Performance | There is a thread in a state such as creating a temporary table, copying to a temporary table, or sorting results
    Row lock wait | Performance | There is a transaction in a lock wait
    Uncommitted transaction | Performance | There is a thread that has been in the Sleep state for a long time
    DDL statement metadata lock wait | Performance | There is a thread running a DDL statement that is waiting for a metadata lock
    INSERT, UPDATE, and DELETE statement metadata lock wait | Performance | There is a thread running an INSERT, UPDATE, or DELETE statement that is waiting for a metadata lock
    SELECT statement metadata lock wait | Performance | There is a thread running a SELECT statement that is waiting for a metadata lock
    Deadlock | Reliability | A deadlock is detected in the monitoring data, and the deadlock information appears in the InnoDB status (SHOW ENGINE INNODB STATUS)
    Read-only lock | Performance | There is a thread waiting for the global read lock
    SQL statement metadata lock wait | Performance | There is a thread running a DDL statement that is waiting for a metadata lock
    Waiting for flush tables | Performance | There is a thread waiting for a table flush
    High number of active sessions | Performance | The number of active sessions exceeds three times the instance's CPU specification (number of CPU cores)
    High disk utilization | Reliability | The disk utilization is too high
    Memory utilization | Reliability | The memory utilization is too high
    High CPU utilization | Performance | The CPU utilization is too high
    Low hit rate of table open cache | Performance | The hit rate of the table open cache is low
    High-risk account | Maintainability | There are anonymous or password-free accounts
    Big table | Maintainability | The size of a single table exceeds 10% of the instance's disk specification
    I/O replication thread interruption | Reliability | A replication monitoring metric is exceptional and triggers diagnosis, and SHOW SLAVE STATUS reports an I/O thread exception
    SQL replication thread interruption | Reliability | A replication monitoring metric is exceptional and triggers diagnosis, and SHOW SLAVE STATUS reports an SQL thread exception
    Replication delay caused by DDL | Reliability | A replication monitoring metric is exceptional and triggers diagnosis, and there is a thread running a DDL statement that is waiting for a metadata lock
    Replication delay caused by transaction | Reliability | A replication monitoring metric is exceptional and triggers diagnosis, and there is a thread in the Sleep state that holds up a metadata lock wait
    Replication delay caused by read-only lock | Reliability | A replication monitoring metric is exceptional and triggers diagnosis, and there is a thread waiting for the global read lock
    Primary-secondary switch | Availability | The primary-secondary switch monitoring metric is exceptional
    Instance migration caused by server failure | Availability | The instance migration monitoring metric is exceptional due to a server failure
    Read-only instance removal | Availability | The read-only instance removal monitoring metric is exceptional
    Disk limit exceeded | Availability | The disk limit monitoring metric is exceptional
    Memory limit exceeded | Availability | The memory limit monitoring metric is exceptional
    OOM | Availability | The database memory is overloaded
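    Several rows in the table above reduce to simple threshold or status checks. The sketch below restates three of them as local Python functions so the arithmetic is explicit. It is not DBbrain's detection logic, and the function names are hypothetical. It assumes that "CPU specification" means the instance's CPU core count, and that a SHOW SLAVE STATUS row is already available as a dict keyed by column name. It also skips the step where DBbrain first requires an exceptional replication metric. On MySQL 8.0.22 and later, the equivalent statement is SHOW REPLICA STATUS, and its columns are named Replica_IO_Running and Replica_SQL_Running.

```python
from typing import Dict, List


def too_many_active_sessions(active_sessions: int, cpu_cores: int) -> bool:
    """High number of active sessions: more than 3x the instance's CPU cores (assumed meaning of 'CPU specification')."""
    return active_sessions > 3 * cpu_cores


def is_big_table(table_bytes: int, disk_spec_bytes: int) -> bool:
    """Big table: one table larger than 10% of the instance disk specification."""
    # 10 * t > d is the same test as t > 0.1 * d, without floating-point rounding.
    return 10 * table_bytes > disk_spec_bytes


def replication_thread_issues(slave_status: Dict[str, str]) -> List[str]:
    """Map one SHOW SLAVE STATUS row to the thread-interruption items it suggests."""
    issues = []
    # A healthy thread reports "Yes"; any other value means it is not fully running.
    if slave_status.get("Slave_IO_Running") != "Yes":
        issues.append("I/O replication thread interruption")
    if slave_status.get("Slave_SQL_Running") != "Yes":
        issues.append("SQL replication thread interruption")
    return issues
```

    For example, on a 4-core instance the active-session condition is met at 13 active sessions, not 12. On a 200 GB disk, a table must be larger than 20 GB to count as a big table.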