As your business grows, so does your need for underlying resources. As the number of resources increases, the efficiency of daily monitoring becomes a bottleneck in Ops. Tencent Cloud Cloud Monitoring (CM) provides a solution for customers who need to monitor resources at large scale.
With a large number of resources, it is impractical to view the metric data of every cloud resource one by one. Even if you could, you still could not compare the data globally or detect exceptions. There are two key points when monitoring a large number of resources:
Using Cloud Virtual Machine (CVM) as an example:
CVM resources are classified and managed by service and cluster in projects. The resources of different services or clusters reside in different projects.
Log in to Cloud Monitoring Console.
In the left sidebar, click Dashboard to access the dashboard management page.
In the upper-left corner of the page, click Add Monitoring Dashboard to add a dashboard.
Click Add Monitoring Chart, and in the pop-up configuration box, configure the monitoring item.
After the configuration is completed, click OK to complete the creation.
For bandwidth-related metrics, sum the data of all servers to calculate the total bandwidth used by a service or cluster.
For performance metrics such as CPU utilization, calculate the average, maximum, and minimum values of the monitoring data of all servers and display them in one chart. You can then see the average, maximum, and minimum CPU utilization of a service or cluster.
Detect abnormal data in an aggregation view. According to the overall trend of resource aggregation curves and their comparisons, you can understand the overall trends and exceptions in the performance data of resources.
For example, you can determine whether the current bandwidth is abnormal by comparing the inbound and outbound bandwidth curves as well as the overall trend of bandwidth curves. You can also determine the overall status of resources and whether abnormal resources exist by comparing the average, maximum, and minimum CPU utilization.
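The aggregation described above can be illustrated with a minimal Python sketch. It is not the console's implementation: the instance names, sample values, the `aggregate` helper, and the `GAP_THRESHOLD` value are all assumptions made for illustration. It collapses per-instance curves into sum, average, maximum, and minimum curves, then flags time points where the maximum sits far above the average.

```python
from statistics import mean

# Hypothetical sample data: one value per time point for each instance.
cpu_by_instance = {
    "ins-a": [20.0, 22.0, 21.0],
    "ins-b": [25.0, 24.0, 90.0],
    "ins-c": [18.0, 20.0, 19.0],
}
bandwidth_by_instance = {
    "ins-a": [10.0, 12.0, 11.0],
    "ins-b": [5.0, 6.0, 7.0],
    "ins-c": [1.0, 2.0, 3.0],
}

def aggregate(series_by_instance, method):
    """Collapse per-instance curves into one curve, one value per time point."""
    # zip(*...) groups the values of all instances at the same time point.
    columns = zip(*series_by_instance.values())
    return [method(column) for column in columns]

# Bandwidth-related metric: total across the service or cluster.
total_bandwidth = aggregate(bandwidth_by_instance, sum)

# Performance metric: average, maximum, and minimum curves in one chart.
avg_curve = aggregate(cpu_by_instance, mean)
max_curve = aggregate(cpu_by_instance, max)
min_curve = aggregate(cpu_by_instance, min)

# Assumed rule of thumb: a large gap between maximum and average
# suggests that at least one instance is behaving abnormally.
GAP_THRESHOLD = 30.0
suspect_points = [
    index
    for index, (highest, average) in enumerate(zip(max_curve, avg_curve))
    if highest - average > GAP_THRESHOLD
]
```

With this sample data, only the last time point is flagged, because one instance jumps to 90% while the others stay near 20%.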
Locate a specific abnormal object.
Click the curve at a specific time point to show a list of corresponding instances sorted by performance. You can change the sorting order and metrics, or switch the data displayed in the list by clicking different points in the curve.
Hover over an instance in the sorted list. The monitoring data curve corresponding to this instance is highlighted in the curve above. Compare and analyze the monitoring curve of this instance and the overall aggregated data curve to further determine the current and historical exceptions of the instance.
After confirming the specific abnormal object via the previous two steps, click the name of the abnormal object in the list to open the monitoring details page for further troubleshooting.
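The sorted list shown when you click a point on the curve can be thought of as a ranking of instances by their value at that time point. The following sketch illustrates the idea only; the `rank_instances` function and the sample data are hypothetical and do not reflect how the console computes the list.

```python
# Hypothetical sample data: CPU utilization (%) per instance over three time points.
cpu_by_instance = {
    "ins-a": [20.0, 22.0, 21.0],
    "ins-b": [25.0, 24.0, 90.0],
    "ins-c": [18.0, 20.0, 19.0],
}

def rank_instances(series_by_instance, time_index, descending=True):
    """Return (instance, value) pairs at one time point, sorted by value."""
    snapshot = [
        (name, values[time_index])
        for name, values in series_by_instance.items()
    ]
    return sorted(snapshot, key=lambda pair: pair[1], reverse=descending)

# Equivalent to clicking the last point on the curve.
ranking = rank_instances(cpu_by_instance, time_index=2)
top_instance, top_value = ranking[0]
```

Changing `time_index` corresponds to clicking a different point on the curve, and `descending=False` corresponds to reversing the sorting order.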
You have now completed the process of creating a monitoring view, viewing it, detecting an exception, and locating the exception. The chart and the sorted list allow you to intuitively view the running status of all resources, locate the specific abnormal object, and analyze the trend of an exception. This addresses both the inefficiency of large-scale resource monitoring and the difficulty of detecting exceptions.
Currently, a maximum of 12 CVM instances can be added to each chart on the dashboard. If this does not meet your needs, you can submit a ticket to raise the limit.
In addition to aggregation views, you can also use a details view to detect and locate exceptions in large-scale resources.
Details view: The curves of all instances are displayed in the same chart.
Aggregation view: The curves of all instances are computed and aggregated into one or more curves by using a custom statistical method.
Create a details view.
Creating a details view is similar to creating an aggregation view, except that you do not need to select a statistical method.
Detect exceptions in a details view.
Locate a specific abnormal object.
You can also use a details view in conjunction with a chart and a sorted list to locate a specific abnormal object. The overall process is similar to that for an aggregation view. For more information, see the "Locate a specific abnormal object" step for aggregation views above.
Currently, a maximum of 12 CVM instances can be added to each chart on the dashboard. If this cannot meet your needs, you can submit a ticket to raise the limit.
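If you plan charts for a service with more than 12 instances before the limit is raised, one possible approach is to split the instances across several charts. This is an assumption for illustration, not a procedure described by Cloud Monitoring; the `split_into_charts` helper and the instance IDs below are hypothetical.

```python
MAX_INSTANCES_PER_CHART = 12  # default per-chart limit stated in this document

def split_into_charts(instance_ids, limit=MAX_INSTANCES_PER_CHART):
    """Group instance IDs into consecutive batches of at most `limit` items."""
    return [
        instance_ids[start:start + limit]
        for start in range(0, len(instance_ids), limit)
    ]

# Hypothetical fleet of 30 instances.
fleet = [f"ins-{number:03d}" for number in range(30)]
charts = split_into_charts(fleet)
chart_sizes = [len(chart) for chart in charts]
```

Each batch would then be configured as a separate chart on the same dashboard.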