The instance monitoring feature of TencentDB for PostgreSQL was upgraded on May 11, 2021. The upgrade optimized certain legacy monitoring policies, added a large number of new metrics, and introduced 5-second monitoring granularity for all metrics. As a result, the collected data and upper limits of certain metrics differ significantly from those on legacy versions. This document describes these changes.
CPU Utilization: the upper limit is fixed at 100%. Even if the instance overuses idle resources, the metric value will not exceed 100%. Set alarm thresholds against a maximum of 100%.
Used Storage Capacity: on legacy versions, this metric counted only data files and excluded other files such as instance log files. However, actual disk usage is determined by the total size of all files on the instance.
To avoid a deviation between the actual usage and the metric value, this metric has been standardized to the sum of the storage space occupied by all files, including data files (datafile), redo log files (xlog/WAL), database running log files (pg_log), and database program files. As a result, the used storage space shown by this metric will be noticeably higher than on legacy versions.
Storage Space Utilization: like the "Used Storage Capacity" metric, this metric now reflects all files on the instance. It is calculated as follows: used storage capacity / purchased instance storage capacity * 100%.
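The effect of the standardized calculation can be illustrated with a minimal Python sketch. The category names, sizes, and the function name `storage_utilization` are hypothetical values chosen for illustration. They are not taken from a real instance or from the monitoring service's implementation.

```python
GIB = 1024 ** 3

def storage_utilization(used_bytes_by_category: dict[str, int], purchased_bytes: int) -> float:
    """Return used storage as a percentage of purchased storage.

    Follows the formula in the text:
    used storage capacity / purchased instance storage capacity * 100%.
    """
    if purchased_bytes <= 0:
        raise ValueError("purchased_bytes must be positive")
    used_bytes = sum(used_bytes_by_category.values())
    return used_bytes / purchased_bytes * 100

# Hypothetical file sizes on an instance with 100 GiB of purchased storage.
usage = {
    "datafile": 40 * GIB,  # data files
    "wal": 6 * GIB,        # redo log files (xlog/WAL)
    "pg_log": 3 * GIB,     # database running log files
    "program": 1 * GIB,    # database program files
}
purchased = 100 * GIB

# Legacy behavior: only data files are counted.
legacy_pct = storage_utilization({"datafile": usage["datafile"]}, purchased)
# Upgraded behavior: all files are counted.
upgraded_pct = storage_utilization(usage, purchased)

print(f"legacy: {legacy_pct:.1f}%, upgraded: {upgraded_pct:.1f}%")
```

With these assumed sizes, the same instance reports a higher utilization after the upgrade even though nothing on disk has changed, so alarm thresholds based on the legacy value may need to be reviewed.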
Queries Per Second (QPS): the metric name has been changed to QPS, as the original name was ambiguous. The actual collection unit is counts/sec.
Requests: on legacy versions, the unit of this metric was "counts/min". Now that second-level monitoring is supported, the metric represents the total number of requests in a collection period, so the unit has been changed to "-". In addition, the 1-minute and 5-minute granularities previously used the average value but now use the total value. The original monitoring metric is no longer used in the new release.
Read Requests: on legacy versions, the unit of this metric was "counts/min". Now that second-level monitoring is supported, the metric represents the total number of read requests in a collection period, so the unit has been changed to "-". In addition, the 1-minute and 5-minute granularities previously used the average value but now use the total value. The original monitoring metric is no longer used in the new release.
Write Requests: on legacy versions, the unit of this metric was "counts/min". Now that second-level monitoring is supported, the metric represents the total number of write requests in a collection period, so the unit has been changed to "-". In addition, the 1-minute and 5-minute granularities previously used the average value but now use the total value. The original monitoring metric is no longer used in the new release.
Other Requests: on legacy versions, the unit of this metric was "counts/min". Now that second-level monitoring is supported, the metric represents the total number of other requests in a collection period, so the unit has been changed to "-". In addition, the 1-minute and 5-minute granularities previously used the average value but now use the total value. The original monitoring metric is no longer used in the new release.
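The switch from average to total at coarser granularities can be sketched as follows. This is an illustration only: the sample counts are made up, and the `rollup` helper is a hypothetical name, not part of any monitoring API.

```python
def rollup(samples: list[int], bucket_size: int, how: str) -> list[float]:
    """Aggregate fine-grained samples into coarser buckets.

    how="total" sums each bucket; how="average" takes the mean of each bucket.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    buckets = [samples[i:i + bucket_size] for i in range(0, len(samples), bucket_size)]
    if how == "total":
        return [sum(bucket) for bucket in buckets]
    if how == "average":
        return [sum(bucket) / len(bucket) for bucket in buckets]
    raise ValueError(f"unknown aggregation: {how}")

# Hypothetical request counts for twelve consecutive 5-second collection periods,
# which together cover one minute.
five_second_counts = [40, 60, 50, 50, 45, 55, 50, 50, 30, 70, 50, 50]

one_minute_total = rollup(five_second_counts, 12, "total")      # total-based rollup
one_minute_average = rollup(five_second_counts, 12, "average")  # average-based rollup

print(one_minute_total, one_minute_average)
```

For the same underlying samples, the total-based 1-minute value is twelve times the average-based one, which is why values at the 1-minute and 5-minute granularities can look much larger after the upgrade.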
Primary/Standby XLOG Sync Difference: the name of this metric has been changed to Differences Between sent_lsn and replay_lsn. It indicates the difference in size between the log sent from the primary node to the standby node and the log replayed on the standby node. This reflects the log application speed of the standby node, as well as its performance and the network transfer speed. It is available only for primary instances, not for read-only instances.
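To make the size difference concrete, the sketch below computes the byte distance between two LSNs written in PostgreSQL's textual form (two hexadecimal numbers separated by a slash, representing the high and low 32 bits of a 64-bit position). In PostgreSQL 10 and later, the same values are exposed as the `sent_lsn` and `replay_lsn` columns of the `pg_stat_replication` view, and the built-in `pg_wal_lsn_diff` function performs the equivalent subtraction. Whether the monitoring service derives the metric in exactly this way is an assumption; the helper names and sample LSNs here are hypothetical.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000148' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lsn_diff(sent_lsn: str, replay_lsn: str) -> int:
    """Return how many bytes of log have been sent but not yet replayed."""
    return lsn_to_bytes(sent_lsn) - lsn_to_bytes(replay_lsn)

# Hypothetical values as they might appear in pg_stat_replication.
backlog_bytes = lsn_diff("0/3000148", "0/3000060")
print(backlog_bytes)
```

A value that stays near zero suggests the standby is replaying log about as fast as it is sent, while a steadily growing value suggests the standby is falling behind.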
Primary-Standby Lag: the name of this metric has been changed to WAL Flush Lag. It indicates the time elapsed between the point when the log is sent from the primary node to the standby node and the point when the standby node has received and flushed it. It is available only for primary instances on v10.x or above.