Tencent Cloud EMR provides a complete monitoring system for the ClickHouse cluster, which consists of cluster overview, service monitoring, and server monitoring.
The cluster overview page displays the overview information of the ClickHouse cluster, such as the cluster running status, number of servers, and ZooKeeper status.
In addition, it displays the aggregate service and server metric information by cluster, so that you can intuitively view the overall cluster running status.
Service monitoring for a ClickHouse cluster is simple, because the only services are ClickHouse and, for HA clusters, ZooKeeper. In the role list on the service monitoring page, you can view basic information about each role. ZooKeeper has only one role (Zookeeper), and ClickHouse also has only one role (ClickHouse-Server). From the node IP section, you can go to the server monitoring page of a server.
Under a specific role, you can view detailed service monitoring data for any time period within the last 30 days, at the time granularity you choose. You can also select which metrics are displayed. Click Set Metric to see all the monitoring metrics of the role, each of which can be previewed. Select the metrics you need so that they are displayed by default. Currently, up to 12 metrics can be displayed.
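To make the time period and granularity settings concrete, here is a minimal sketch of how raw samples could be reduced to one point per granularity bucket. It is an illustration only: the function names `downsample` and `select_metrics` are hypothetical, and averaging within each bucket is an assumption, since the console's actual aggregation method is not described here. Only the 30-day window and the 12-metric limit come from the text above.

```python
from collections import defaultdict
from statistics import mean

MAX_WINDOW_SECONDS = 30 * 24 * 3600  # data can be viewed for up to the last 30 days
MAX_DISPLAYED_METRICS = 12           # up to 12 metrics can be displayed


def downsample(points, granularity_s, now_s):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Samples outside the last 30 days (relative to now_s) are dropped.
    Averaging is an assumption made for this sketch.
    """
    if granularity_s <= 0:
        raise ValueError("granularity must be positive")
    buckets = defaultdict(list)
    for ts, value in points:
        if now_s - MAX_WINDOW_SECONDS <= ts <= now_s:
            # Align each sample to the start of its bucket.
            buckets[ts - ts % granularity_s].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


def select_metrics(checked):
    """Return the checked metric names, enforcing the 12-metric display limit."""
    checked = list(checked)
    if len(checked) > MAX_DISPLAYED_METRICS:
        raise ValueError("at most 12 metrics can be displayed")
    return checked
```

For example, `downsample(samples, 300, now)` would yield one averaged point per five-minute bucket.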
ClickHouse monitoring metrics fall into three groups, which come from three ClickHouse system tables.
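The passage above does not name the three tables. A plausible reading, and it is only an assumption, is that they are ClickHouse's built-in `system.metrics`, `system.events`, and `system.asynchronous_metrics` tables. The sketch below shows how such tables could be read directly through the ClickHouse HTTP interface. The endpoint `http://localhost:8123/` and the helper names `parse_tsv` and `fetch_group` are hypothetical and are not part of the EMR console.

```python
import urllib.request

# Assumed mapping of metric groups to ClickHouse system tables.
METRIC_QUERIES = {
    "system.metrics": "SELECT metric, value FROM system.metrics",
    "system.events": "SELECT event, value FROM system.events",
    "system.asynchronous_metrics":
        "SELECT metric, value FROM system.asynchronous_metrics",
}


def parse_tsv(body):
    """Parse 'name<TAB>value' lines into a {name: float} dictionary."""
    result = {}
    for line in body.splitlines():
        if not line:
            continue
        name, value = line.split("\t", 1)
        result[name] = float(value)
    return result


def fetch_group(table, base_url="http://localhost:8123/"):
    """Fetch one metric group over HTTP (endpoint is an assumed default)."""
    query = METRIC_QUERIES[table] + " FORMAT TabSeparated"
    request = urllib.request.Request(base_url, data=query.encode("utf-8"))
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_tsv(response.read().decode("utf-8"))
```

Calling `fetch_group("system.metrics")` against a reachable server would return the current values keyed by metric name, which can be handy for cross-checking a value seen in the console.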
Server monitoring is divided into a server monitoring overview page and a server monitoring details page.
The server monitoring overview page displays the server monitoring metrics of the cluster. Currently, 12 aggregate metrics across four dimensions (CPU, memory, disk, and network) are provided to show the overall server resource usage in the cluster. As with service monitoring, you can choose which aggregate metrics are displayed under Set Metric.
In addition, server monitoring provides a heat map feature that shows each server's load for a given server metric in the specified time period. Taking memory utilization as an example, the curve shows the memory utilization of the cluster. In the load distribution section, each small square represents a server, and its color represents the server's utilization range: the darker the color, the higher the memory utilization. Statistics are displayed in descending order, and the top 3 servers with the highest memory utilization are displayed by default, making it easy to spot differences between servers.
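The load distribution logic described above can be sketched as follows. This is an illustration under stated assumptions, not the console's implementation: the function names, the 20-point bucket width, and the sample IP addresses are all hypothetical. Only the descending order, the range-based coloring, and the default of three top servers come from the text.

```python
def utilization_bucket(percent, bucket_width=20):
    """Map a utilization percentage to a color bucket index.

    A higher index stands for a darker square. The bucket width
    is an assumed value chosen for this sketch.
    """
    if not 0 <= percent <= 100:
        raise ValueError("utilization must be between 0 and 100")
    last_bucket = 100 // bucket_width - 1
    # Clamp so that exactly 100% falls into the darkest bucket.
    return min(int(percent // bucket_width), last_bucket)


def load_distribution(memory_by_server, top_n=3):
    """Rank servers by memory utilization in descending order.

    Returns (squares, top_servers), where each square is a tuple
    (ip, percent, bucket) and top_servers lists the top_n busiest IPs.
    """
    ranked = sorted(memory_by_server.items(), key=lambda kv: kv[1], reverse=True)
    squares = [(ip, pct, utilization_bucket(pct)) for ip, pct in ranked]
    return squares, [ip for ip, _ in ranked[:top_n]]


# Hypothetical sample data: memory utilization in percent per server IP.
usage = {"10.0.0.1": 35.0, "10.0.0.2": 92.5, "10.0.0.3": 60.0, "10.0.0.4": 100.0}
squares, top_servers = load_distribution(usage)
```

Grouping utilization into a few ranges rather than plotting exact values is what lets a grid of squares show outliers at a glance.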
The monitoring overview page also displays the list of all nodes in the cluster. You can filter the nodes by type, search for them by IP, or sort and display them by CPU, memory, and disk utilization. You can click the IP of a node to enter its server monitoring details page.
The server monitoring details page consists of four sections: basic configuration, deployment status, load status, and server monitoring.
Together, cluster overview, service monitoring, and server monitoring form a complete monitoring system for the ClickHouse cluster, which simplifies cluster operations.