In today's era of widespread microservices and distributed architectures, a single system may expose thousands of APIs. Dynamic URLs, multi-version paths, and personalized parameters further complicate API naming. When API names have high cardinality, traditional monitoring tools see them diverge into a vast number of distinct names, which fragments monitoring data, breaks aggregate statistics, and can even trigger an avalanche effect in the monitoring system. To address this, APM provides an auto-convergence feature for API names. Through automated categorization and dynamic aggregation, it makes application management more precise, efficient, and insightful.
API Name Divergence Issue
APM aggregates the observability data reported by applications by API name (that is, the Span name) and calculates key performance metrics such as throughput, response time, and error rate. With this aggregation, users can analyze the performance of each API through features like API monitoring, for example by viewing the average error rate and P99 response time of a specific API over the last 24 hours.
API names are determined by the trace data that applications report, which makes them highly flexible. Probes with automatic instrumentation choose information such as the API URL or an internal method name as the API name, depending on the situation, and users can also set custom API names, for example through the OpenTelemetry API. When an application's API names reach high cardinality, API name divergence occurs. For example, when the URL used as the API name contains a user ID, each user ID produces a unique URL: /api/user/123, /api/user/124, /api/user/125, /api/user/126… Each URL is then treated as a distinct API, so the system ends up with an extremely large number of API names, and when statistics are aggregated by API name, each API corresponds to only one user ID. This causes several serious problems:
Statistical Failure: As the example above shows, /api/user/{userid} is scattered across numerous APIs, one per user ID, so the overall performance of this type of API cannot be seen.
Analysis Obstacle: Suppose the application also has two other important APIs, /api/product and /api/order. Because a large number of APIs starting with /api/user/ flood the monitoring data, these two critical APIs are easily overlooked.
Query Performance: Aggregation becomes slow under high cardinality, so reports load slowly or even time out. Although APM has optimized query performance at the architecture level, loading large amounts of discrete data on the frontend page still degrades the user experience.
Alarm False Positives and False Negatives: Discrete API names make it difficult to configure uniform threshold rules, so common failures (such as timeouts across all payment APIs) may be missed while sporadic exceptions trigger false alarms.
These issues not only cause inefficient use of APM but may also mask more serious system failures.
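The statistical failure described above can be reproduced in a few lines. The sketch below is illustrative only: the class CardinalityDemo, its methods, and the sample request list are hypothetical and not part of APM. It counts requests per API name, first with the raw URL as the name and then with a templated name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of APM or any probe.
final class CardinalityDemo {

    // Counts requests per API name after applying the given naming function.
    static Map<String, Long> countByApiName(List<String> urls, Function<String, String> naming) {
        return urls.stream()
                .collect(Collectors.groupingBy(naming, TreeMap::new, Collectors.counting()));
    }

    // Assumed naming rule for this demo: fold every user URL into one template.
    static String templated(String url) {
        return url.startsWith("/api/user/") ? "/api/user/{userid}" : url;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "/api/user/123", "/api/user/124", "/api/user/125", "/api/product", "/api/order");

        // Raw URLs: one group per user ID, each with a single request.
        System.out.println(countByApiName(urls, Function.identity()));
        // Templated names: all user requests land in one group.
        System.out.println(countByApiName(urls, CardinalityDemo::templated));
    }
}
```

With raw URLs, the five requests fall into five groups. With templated names they fall into three, and only then does the user API have a request count that means anything.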
API Auto-Convergence
To address the API name divergence issue, the APM system employs intelligent convergence technology to automatically categorize APIs, enabling better unified governance and aggregated analysis. This includes multiple convergence approaches:
Parameters and Anchors in URLs. When APM identifies that the API name in the metric data is a URL, it removes the query parameters and the anchor, that is, everything from ? or # onward. For example, www.example.com:80/path/to/myfile.html?key1=value1#SomewhereInTheDocument is processed as www.example.com:80/path/to/myfile.html.
Long Segments in URLs. APM splits a URL in the metric data by /. Any segment whose length reaches 40 characters is replaced with {LONG_STR}, and any segment that consists solely of digits and whose length reaches 5 characters is replaced with {LONG_NUM}. For example: /path/to/1234567890/d4806b8c-2d5b-481d-9598-827b6dd49c10 will be processed as /path/to/{LONG_NUM}/{LONG_STR}.
Static Resources. When APM identifies that the API name in the metric data has a static resource suffix, it replaces the static resource part in the API name with {STATIC_RESOURCE}.xxx, where xxx is the suffix of the static resource. Static resource suffixes include .jpg, .gif, .flv, etc. For example: /path/to/user_icon_007.jpg is processed as /path/to/{STATIC_RESOURCE}.jpg.
IP Address. When APM identifies an IP address in the API name within the metric data, it replaces the IP address with {IP}. For example, 11.146.86.42:9301/generateOrderInfo will be processed as {IP}:9301/generateOrderInfo.
Limit the Total Number of APIs per Application. In extreme cases where API name divergence persists despite the above convergence methods, APM replaces API names in the metric data so that the cardinality of API names for a single application does not exceed 1,000 within a given period. API names exceeding this limit are replaced with {EXCEEDED}. The system detects API name divergence in real time based on the reported application data. If the divergence issue is resolved, API names revert to their original content.
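To make the URL rules above concrete, here is a minimal sketch of how they might be applied to a single API name. It is not APM's implementation. The class name, the order in which the rules run, the IPv4-only pattern, the restriction of the static resource check to the last segment, and the three-entry suffix list are all assumptions made for illustration. The thresholds follow the stated values of 40 and 5 characters. Note that the 36-character UUID in the example above would not reach the stated threshold of 40 in this sketch, so the actual rule may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the four URL rules; not APM's actual implementation.
final class ApiNameConvergenceSketch {

    // Assumption: IPv4 only, matched as four dot-separated groups of 1 to 3 digits.
    private static final Pattern IPV4 =
            Pattern.compile("(?<![\\d.])(?:\\d{1,3}\\.){3}\\d{1,3}(?![\\d.])");
    // Assumption: only the three suffixes named in the text; the full list is not given.
    private static final Pattern STATIC_SUFFIX =
            Pattern.compile("\\.(?:jpg|gif|flv)$", Pattern.CASE_INSENSITIVE);
    private static final Pattern DIGITS_ONLY = Pattern.compile("\\d+");

    static String converge(String apiName) {
        // Parameters and anchors: keep only the part before the first ? or #.
        String name = apiName.split("[?#]", 2)[0];

        // IP address: replace it, leaving the port and path untouched.
        name = IPV4.matcher(name).replaceAll(Matcher.quoteReplacement("{IP}"));

        // Split by / (limit -1 keeps leading and trailing empty segments).
        String[] segments = name.split("/", -1);
        int last = segments.length - 1;
        for (int i = 0; i < segments.length; i++) {
            String segment = segments[i];
            Matcher suffix = STATIC_SUFFIX.matcher(segment);
            if (i == last && suffix.find()) {
                // Static resource: keep only the suffix of the file name.
                segments[i] = "{STATIC_RESOURCE}" + suffix.group();
            } else if (segment.length() >= 5 && DIGITS_ONLY.matcher(segment).matches()) {
                segments[i] = "{LONG_NUM}";
            } else if (segment.length() >= 40) {
                segments[i] = "{LONG_STR}";
            }
        }
        return String.join("/", segments);
    }
}
```

Under these assumptions, converge("11.146.86.42:9301/generateOrderInfo") yields {IP}:9301/generateOrderInfo, while a name such as /api/user/123 passes through unchanged because its numeric segment is shorter than 5 characters. Names like that are only bounded by the per-application limit or by fixing the naming at the source.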
APM's API auto-convergence capability only applies to metric data. This means that the API names in span data (data stored in Spans) are not affected. Through the converged API names, you can still correlate queries to the corresponding span data.
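The per-application limit can be pictured as a bounded set of admitted names. The sketch below is an assumption about how such a cap could work, not a description of APM internals. The class ApiNameLimiterSketch, its methods, and the explicit period reset are hypothetical. Only the name written to metric data is replaced; the caller keeps the original Span name.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-application limiter; APM's real mechanism is not documented here.
final class ApiNameLimiterSketch {

    private final int limit;
    private final Set<String> admitted = new HashSet<>();

    ApiNameLimiterSketch(int limit) {
        this.limit = limit;
    }

    // Returns the name to record in metric data for the current period.
    String metricName(String apiName) {
        if (admitted.contains(apiName)) {
            return apiName; // already counted toward the limit
        }
        if (admitted.size() < limit) {
            admitted.add(apiName);
            return apiName;
        }
        return "{EXCEEDED}"; // over the cap: fold into a single bucket
    }

    // Assumed to be called when a new statistics period begins.
    void startNewPeriod() {
        admitted.clear();
    }
}
```

With new ApiNameLimiterSketch(1000), the first 1,000 distinct names in a period pass through unchanged and every further distinct name is recorded as {EXCEEDED}. Once a new period starts and the application reports fewer distinct names, the original names appear again.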
SQL Convergence
In features such as database call analysis, APM aggregates statistics by SQL statement and calculates performance metrics like call frequency and response time. As with API names, SQL statements can also diverge. APM categorizes database calls into two types based on response time: calls that take less than 2 seconds are normal calls, while calls that take 2 seconds or longer are slow calls. When SQL divergence occurs, APM replaces SQL in the metric data to reduce the cardinality per application. SQL exceeding the limit is replaced with {OTHER_SLOW_SQL} for slow calls and {OTHER_SQL} for normal calls, according to the response time.
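The choice between the two placeholders follows directly from the 2-second boundary. As a minimal sketch, assuming response times are measured in milliseconds and using a hypothetical class name:

```java
// Hypothetical helper; only the 2-second boundary and the two labels come from the text.
final class SqlOverflowLabelSketch {

    static final long SLOW_CALL_THRESHOLD_MS = 2_000;

    // A call that reaches 2 seconds counts as slow.
    static boolean isSlowCall(long responseTimeMs) {
        return responseTimeMs >= SLOW_CALL_THRESHOLD_MS;
    }

    // Label used in metric data for SQL that exceeds the cardinality limit.
    static String overflowLabel(long responseTimeMs) {
        return isSlowCall(responseTimeMs) ? "{OTHER_SLOW_SQL}" : "{OTHER_SQL}";
    }
}
```

A call of exactly 2,000 ms falls into {OTHER_SLOW_SQL}, and a call of 1,999 ms falls into {OTHER_SQL}.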
Best Practices
API Name Convergence
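In scenarios where applications provide HTTP services, declare the variable parts of a URL as path parameters so that the route template, rather than each concrete URL, can serve as the API name. For example, with @PathVariable in Spring MVC:
// The {id} placeholder keeps the route template constant across employee IDs.
@GetMapping("/api/employeeswithvariable/{id}")
@ResponseBody
public String getEmployeesByIdWithVariableName(@PathVariable("id") String employeeId) {
    return "ID: " + employeeId;
}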
In scenarios where applications access databases, use parameterized SQL statements, for example, Prepared Statements in JDBC. When the OpenTelemetry API is used for manual instrumentation in application code, avoid including variable parameters in Span names to minimize the cardinality of Span names.
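As an illustration of the parameterized SQL advice, the following sketch uses a JDBC PreparedStatement. The employees table, its id and name columns, and the class name are hypothetical. The point is that the SQL text stays constant while the value is bound separately, so every call reports the same statement.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical data-access helper; the table and column names are assumptions.
final class EmployeeQuerySketch {

    // Constant SQL text: the employee ID never becomes part of the statement.
    static final String FIND_NAME_BY_ID = "SELECT name FROM employees WHERE id = ?";

    static String findNameById(Connection connection, long employeeId) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(FIND_NAME_BY_ID)) {
            statement.setLong(1, employeeId); // bind the value to the first placeholder
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next() ? rows.getString("name") : null;
            }
        }
    }
}
```

By contrast, building the statement with string concatenation ("... WHERE id = " + employeeId) produces a different SQL text for every ID, which is the divergence pattern described earlier.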