High CPU Utilization

Last updated: 2021-06-01 16:23:51

    Symptoms

    A TencentDB for MySQL instance encounters such issues as slow response, failure to get connections, and timeout. The cause is a high CPU utilization of more than 80%.

    You can view the CPU utilization of a TencentDB for MySQL instance on the instance monitoring page in the TencentDB for MySQL console or in the DBbrain console.

    Impact

    If MySQL's CPU utilization remains high for a prolonged time, the overall performance of the database will be severely compromised, and under extreme circumstances, instances may be hung.

    When the HA system detects such an issue, it will trigger a source-replica switch to maintain the high availability of your business. During the switch, instances are usually unavailable for less than 60 seconds. If the switch occurs during peak hours, business stability and continuity will be seriously affected.

    To protect your business from being affected by CPU resource shortage, we recommend that you optimize the application or upgrade the CPU resources for the instance with a high CPU utilization. A source-replica switch is accompanied by a disconnection lasting for just seconds; therefore, for persistent connections, your application should have a reconnection mechanism.

    Possible Causes

    MySQL's CPU resources are mainly used by system threads and user threads. Therefore, if CVMs are for exclusive use by your TencentDB for MySQL instances, you can solve most of the issues just by focusing on the two types of threads.

    User threads

    In most cases, busy user threads are caused by slow queries, heavy computation, and high QPS (queries per second).

    • Slow queries
      Querying that involves ORDER BY, GROUP BY, temp tables, joins, etc. is so inefficient that the computation of a single SQL statement takes much longer CPU time.
    • Heavy computation
      Heavy computation is caused just by huge amounts of data.
    • High QPS
      CPU time is prolonged just by a high QPS. For example, if a four-core server sustains a high QPS of 20,000 to 30,000, the total CPU time can be very long even when the CPU time of a single SQL statement is short.

    System threads

    In a production environment, system thread issues are less frequent. In general, the CPU utilizations of multiple system threads are rarely too high or close to 100% at the same time as long as the CVM has at least four available CPU cores. However, there are a few bugs that may affect the CPU utilization, as shown in the figure below:

    Solutions

    As most CPU issues are caused by busy user threads, the following sections focus on the solutions to high CPU utilization caused by user threads.

    • Slow queries: to identify and optimize slow queries, we recommend DBbrain. For more information, please see Slow queries.
    • Heavy computation: to solve the high CPU utilization issue caused by huge amounts of data, please see heavy computation.
    • High QPS: to solve the high CPU utilization issue caused by too many access requests, please see High QPS.

    Troubleshooting Procedure

    Slow queries

    Use DBbrain to identify and optimize the SQL statements which cause a high CPU utilization:

    In MySQL, slow query time (long_query_time) is set to 10 seconds by default. After a performance issue occurs, if no slow query is found, it is recommended to adjust the parameter value to 1 second and then observe whether there are slow queries in a business cycle, and if yes, optimize the slow queries accordingly. After the parameter is adjusted, if still no slow queries are found but the CPU utilization remains high, you are recommended to upgrade the CPU configuration so as to improve the overall performance of the database.

    Heavy computation

    When MySQL handles huge amounts of data, its CPU utilization can be high, even if the indexes and query execution plans work well. Moreover, such an issue can still occur at a low concurrency due to MySQL’s one-thread-per-connection feature

    Generally, there are two common solutions:

    • Enable read/write separation. Run this type of queries on a read-only replica node where the business access pressure is low.
    • Optimize your program to split a large SQL query into smaller ones.

    High QPS

    • Upgrade CPU configurations to improve the overall database performance.
    • Use read-only instances to share the load of the source instance.
    • Optimize query statements to enhance efficiency.