Recommendations and Specifications for Using the Tencent Cloud TCHouse-C Kernel

Last updated: 2026-05-27 10:17:33
When using Tencent Cloud TCHouse-C, observe the following specifications:

Writing Specifications

1. Batch Write: Write data to Tencent Cloud TCHouse-C in batches. Each batch must contain at least 1,000 records; the recommended batch size is 5,000 to 100,000 records. Each write operation generates one or more part storage directories at the underlying layer, and a background task automatically merges small parts into larger ones. If writes are too frequent, parts accumulate faster than they can be merged. When the merge speed cannot keep up with the insertion rate, the write fails and returns the error: `Too many parts(301). Merges are processing significantly slower than inserts.`
2. Reduce Direct Writes to Distributed Tables: To improve write and query performance, write directly to Local Tables instead of Distributed Tables whenever possible. Although writes to a Distributed Table are ultimately forwarded to Local Tables, Distributed Tables suffer from issues such as write amplification and asynchronous disk I/O consumption, resulting in poorer write performance.
3. Enforcing Data Consistency: Tencent Cloud TCHouse-C does not provide transaction guarantees for data writes, so the external data import module must make imports idempotent. For example, if an exception occurs while a batch is being imported, delete the corresponding partition data or clean up the rows already imported, and then re-import that partition or batch. Alternatively, use a deduplication engine (ReplacingMergeTree) to ensure eventual consistency.
4. Large-Scale Data Write: For large-scale data writes, split the data in advance and write it evenly across all nodes of Tencent Cloud TCHouse-C. If a specific distribution rule exists, perform hash calculations on the application side.
5. Write Data to One Partition at a Time: To avoid write performance degradation and an excessive number of directories, write each batch to a single partition. If a batch spans multiple partitions, multiple part files are generated at the underlying layer, which consumes more merge resources and complicates idempotency control.
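The batching, node-routing, and one-partition-per-batch rules above can be combined on the application side. The following is a minimal sketch, not a TCHouse-C API: `pick_node`, `plan_batches`, the node names, and the row layout are hypothetical, and CRC32 is only one possible stable hash. It groups rows by target node and partition, cuts each group into batches no larger than the recommended maximum, and holds back groups that have not yet reached 1,000 rows.

```python
from collections import defaultdict
from zlib import crc32

MIN_BATCH = 1_000     # minimum batch size stated in the text
MAX_BATCH = 100_000   # upper end of the recommended batch size


def pick_node(shard_key, nodes):
    """Route a key to a node with a stable application-side hash (assumed: CRC32)."""
    return nodes[crc32(shard_key.encode("utf-8")) % len(nodes)]


def plan_batches(rows, partition_of, shard_key_of, nodes,
                 min_batch=MIN_BATCH, max_batch=MAX_BATCH):
    """Group rows by (node, partition) and cut each group into batches.

    Returns (ready, pending). Each entry is (node, partition, rows).
    Batches below min_batch go to pending so the caller can wait for more rows.
    """
    groups = defaultdict(list)
    for row in rows:
        node = pick_node(shard_key_of(row), nodes)
        groups[(node, partition_of(row))].append(row)

    ready, pending = [], []
    for (node, partition), group in groups.items():
        for start in range(0, len(group), max_batch):
            chunk = group[start:start + max_batch]
            target = ready if len(chunk) >= min_batch else pending
            target.append((node, partition, chunk))
    return ready, pending


# Hypothetical input: 5,000 rows spread over two daily partitions.
rows = [{"id": i, "day": "2024-01-01" if i % 2 == 0 else "2024-01-02"}
        for i in range(5000)]
ready, pending = plan_batches(
    rows,
    partition_of=lambda r: r["day"],
    shard_key_of=lambda r: str(r["id"]),
    nodes=["node-1", "node-2"],
    max_batch=2000,
)
```

Each entry in `ready` would then be sent as one insert to the Local Table on its node. Because every batch maps to exactly one node and one partition, a failed batch can be retried after cleaning up only that partition.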

Query Specifications

Single table queries

1. Use indexes to accelerate fields that are frequently used for filtering and point queries.
2. Avoid `SELECT *` statements. Instead, explicitly list the fields to be queried and retrieve only the necessary ones. Tencent Cloud TCHouse-C uses a columnar storage engine at its underlying layer, so query latency is linearly related to the size and number of fields being queried.
3. When running ORDER BY queries against datasets of more than ten million records, add WHERE clauses and LIMIT to improve query efficiency.
4. A `SELECT ... FROM {tablename} FINAL` query merges data at read time (merge-on-read) so that results are deduplicated, but it slows the query down. Use it selectively.
5. Filter and prune data by partition whenever possible. Specifying partition fields reduces the number of files the underlying database needs to scan, thereby improving query performance.
6. Use delete and update mutation operations with caution. In Tencent Cloud TCHouse-C, update and delete operations are performed asynchronously. They rewrite every data part matched by the where condition, which can consume significant system resources. Furthermore, update and delete operations are executed part by part and do not guarantee atomicity across the entire execution.
7. If uniqueness requirements are not strict and the query can tolerate some error, replace exact deduplication with the approximate function `uniqCombined`, which can improve query performance by up to ten times. Otherwise, continue using the `distinct` syntax, bearing in mind that `distinct` has some impact on query performance.
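To illustrate why `FINAL` queries cost more than plain reads, the sketch below simulates the merge-on-read idea in plain Python. It is a simplified model and not the kernel's implementation: the row layout `(key, version, value)`, the helper names, and the rule of keeping the highest version per key are assumptions made for illustration.

```python
def select_plain(parts):
    """Read every row from every part; duplicates are still visible."""
    return [row for part in parts for row in part]


def select_final(parts):
    """Collapse rows at read time: keep the highest version per key."""
    latest = {}
    for key, version, value in select_plain(parts):
        if key not in latest or version >= latest[key][0]:
            latest[key] = (version, value)
    return sorted((key, ver, val) for key, (ver, val) in latest.items())


# Two unmerged parts: the second one re-imports user-1 with a newer version.
parts = [
    [("user-1", 1, "draft"), ("user-2", 1, "draft")],
    [("user-1", 2, "final")],
]

plain_rows = select_plain(parts)   # 3 rows, user-1 appears twice
final_rows = select_final(parts)   # 2 rows, one per key
```

The plain read only concatenates parts, whereas the `FINAL`-style read has to compare every row across parts before returning anything, which is the extra work the recommendation refers to.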

Multi-table association

1. To avoid Join operations and shuffle, use a flat, wide table structure instead of multi-table Joins whenever possible.
2. Limit the number of tables in a Join operation to three or fewer whenever possible.
3. If the query fields originate from a single table, consider replacing Join with IN queries. In ClickHouse, IN queries support single fields and tuples. For example: `SELECT name FROM tab_a WHERE id IN (SELECT id FROM tab_b WHERE name = 'xx')`.
4. It is preferable to replace multi-table Joins with two-table Joins and subqueries.
5. For a two-table Join, Join a large table with a small table (with the small table's data volume controlled at the million to ten-million row level), or Join a small table with another small table. Do not Join a large table with another large table. During the Join operation, place the large table on the left and the small table on the right.
6. Use column pruning and partition pruning. Apply conditional filtering before the Join operation to minimize the volume of data involved in the Join as much as possible.
7. If the right table in a Join operation is a subquery or a distributed table with a small data volume, you can use GLOBAL JOIN to avoid read amplification. Note that GLOBAL JOIN triggers data propagation between nodes, consuming some network bandwidth. If the data volume is large, it can also cause performance degradation.
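The advice to keep the small table on the right can be pictured with a generic hash join. The sketch below assumes the default hash-join behaviour of the ClickHouse kernel, where the right-hand table is loaded into an in-memory hash table and the left-hand table is streamed past it; the function and the sample tables are hypothetical.

```python
def hash_join(left_rows, right_rows, key):
    """Inner join: build a hash table from the right side, stream the left side."""
    build = {}
    for row in right_rows:          # the whole right side is held in memory
        build.setdefault(row[key], []).append(row)
    for row in left_rows:           # the left side is only iterated once
        for match in build.get(row[key], []):
            yield {**match, **row}


# Large table on the left, small (pre-filtered) table on the right.
orders = [
    {"user_id": 1, "amount": 10},
    {"user_id": 2, "amount": 5},
    {"user_id": 1, "amount": 7},
]
users = [{"user_id": 1, "name": "a"}]

joined = list(hash_join(orders, users, "user_id"))
```

Memory use in this model grows with the size of the right side only, which is why filtering before the Join and keeping the right table small matter most.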

Table Creation Specifications

1. Non-Replicated tables cannot be created in high-availability clusters, and Replicated tables cannot be created in non-high-availability clusters.
2. If you have strict requirements for eventual data consistency, use the ReplacingMergeTree or CollapsingMergeTree engine. Run optimize operations periodically, or query with `SELECT ... FROM {tablename} FINAL`, to achieve final deduplication.
3. Use partitions wherever possible, but plan their number carefully. A table should not have more than 1,000 partitions. A well-chosen partitioning scheme helps filter data during queries and can improve query performance several times over. Partitioning by day is a common practice. In Tencent Cloud TCHouse-C, data from different partitions is never merged, so an excessive number of partitions easily leads to too many parts, which in turn makes queries and restarts very slow.
4. Plan table fields as early as possible during table creation and avoid altering or deleting fields whenever possible. Altering or deleting a field rewrites all of the table's data. For large tables, this consumes significant resources and can take a long time. Furthermore, the process can easily block other DDL statements and affect the table's merge operations. If an error occurs during the process, it may lead to unpredictable data consistency issues.
5. Do not modify index columns. Modifying an index column invalidates the existing index, triggering index rebuilding. During this period, queried data may be inaccurate.
6. Limit the amount of data stored on COS and avoid write and mutation operations on cold partitions as much as possible. A single COS bucket has a bandwidth of only about 1 GB, which is far lower than the performance of multi-node local disks and cloud disks, and its network latency is relatively high. Storing excessive data on COS severely impacts query efficiency. Writing to a COS partition triggers a merge operation on that partition; such merges are inefficient and may even affect data operations on local disks.
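As a quick check of the partition-count recommendation above, the following sketch counts how many partitions a given date range produces at daily and monthly granularity. The three-year retention window and the helper name are assumptions chosen for illustration.

```python
from datetime import date, timedelta

MAX_PARTITIONS = 1000  # per-table ceiling recommended in the text


def count_partitions(start, end, granularity):
    """Count distinct partition keys for all dates in [start, end]."""
    fmt = {"day": "%Y%m%d", "month": "%Y%m"}[granularity]
    keys = set()
    current = start
    while current <= end:
        keys.add(current.strftime(fmt))
        current += timedelta(days=1)
    return len(keys)


# Hypothetical retention window: three full calendar years.
start, end = date(2022, 1, 1), date(2024, 12, 31)
daily = count_partitions(start, end, "day")      # 1096
monthly = count_partitions(start, end, "month")  # 36

daily_ok = daily <= MAX_PARTITIONS      # False: daily partitions exceed the ceiling
monthly_ok = monthly <= MAX_PARTITIONS  # True
```

Under these assumptions, daily partitioning stays within the recommendation only if retention is shorter than about 1,000 days; beyond that, a coarser key such as month keeps the count low.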

