tencent cloud

Tencent Cloud TCHouse-D

Product Introduction
Overview
Concepts
Cluster Architecture
Strengths
Scenarios
Purchase Guide
Billing Overview
Renewal Instructions
Overdue Policy
Refund Instructions
Configuration Adjustment Billing Instructions
Getting Started
Using Tencent Cloud TCHouse-D Through the Console
Using Tencent Cloud TCHouse-D Through a Client
Operation Guide
Cluster Operation
Monitoring and Alarm Configuration
Account Privilege Management
Data Management
Query Management
Modify Configurations
Node Management
Log Analysis
SQL Studio
Enabling Resource Isolation
Development Guide
Design of Data Table
Importing Data
Exporting Data
Basic Feature
Query Optimization
Ecological Expansion Feature
API Documentation
History
Introduction
API Category
Making API Requests
Cluster Operation APIs
Database and Table APIs
Cluster Information Viewing APIs
Hot-Cold Data Layering APIs
Database and Operation Audit APIs
User and Permission APIs
Resource Group Management APIs
Data Types
Error Codes
Cloud Ecosystem
Granting CAM Policies to Sub-accounts
Query Acceleration for Tencent Cloud DLC
Practical Tutorial
Basic Feature Usage
Advanced Features Usage
Resource Specification Selection and Optimization Suggestions
Naming Specifications and Limits to the Database and Data Table
Table Design and Data Import
Query Optimization
Suggested Usage to Avoid
Accessing TCHouse-D via JDBC over the Public Network
Performance Testing
TPC-H Performance Testing
SSB Performance Testing
TPC-DS Performance Testing
FAQs
Common Operational Issues
Common Errors
Contact Us
Glossary
Product Policy
Service Level Agreement
Privacy Policy
Data Processing And Security Agreement

Index, Sorted Column and Prefix Index

PDF
Focus Mode
Font Size
Last updated: 2024-06-27 10:55:08

Index

Doris supports a wide range of index structures to reduce data scans and improve query efficiency. The currently supported index types are:
Sorted Compound Key Index: Users can specify three columns at most to form a compound sort key. This index can effectively prune data to better support high concurrency reporting scenarios.
Z-order Index: It allows highly efficient range queries for any combination of fields in the data model.
Min/Max: This enables effective filtering of equivalence and range queries for numeric types.
Bloom Filter: It is highly effective in equivalence filtering and pruning of high cardinality columns.
Invert Index: It enables quick search for any field.
Unlike traditional database, Doris does not support creating indexes on any column. MPP-structured OLAP databases like Doris usually handle large amounts of data through improved concurrency.

Sort Column (Sort Key):

To improve the query performance, Doris has optimized the organization structure of data storage. Essentially, Doris stores data in an ordered data structure similar to SSTable (Sorted String Table). This structure can sort storage according to specific columns (one or multiple), also known as sort columns. In such a data structure, it is very efficient to use sorting columns as conditions to retrieve data.
In the three data models of Aggregate, Unique, and Duplicate, the underlying data is sorted and stored according to the columns specified in AGGREGATE KEY, UNIQUE KEY, and DUPLICATE KEY of the table creation statement. Rollup can specify its own sort keys, but the sort keys must be a prefix of the Rollup column order.
Note
In the column definitions of the table creation statement, the definition of sort column must appear before the definitions of other columns.
The order of sort columns is determined by the column order in the table creation statement.

Prefix Index

This is an index method based on the sorted data structure (SSTable), implementing fast data query with given prefix columns. For queries that can utilize the sorting structure, Doris uses binary search algorithm to locate the target data range. However, if there are a lot of rows in the table, direct binary search on sort columns requires loading all the data in the filter column into memory, which would consume a large amount of memory. To optimize this, Doris introduces sparse Shortkey Index (prefix index) based on the sort key. The content of the Sortkey Index is 1024 times less than the data volume (Doris combines every 1024 rows of data into a logical data block, called Data Block, and each Data Block stores a row of index in the prefix index), therefore the content will be fully cached in memory, which can effectively speed up the query in actual search.
Large number of Sort Key columns will occupy a lot of memory. For performance purposes, prefix index items are restricted as follows:
At most 3 columns can be selected as Shortkey columns.
Columns of FLOAT / DOUBLE type cannot be used.
Prefix index can only be constructed in the order of sort keys.
VARCHAR / CHAR type columns can only appear once and must be at the last position.
All columns do not exceed 36 bytes, and VARCHAR / CHAR columns are truncated according to the remaining byte number.
When the user specifies the property of the short_key in the table creation statement, for example, "short_key" = "4" specifies 4 columns as short_key, which can break through the above restrictions.

Sample code

1. The prefix index of the following table structure is user_id(8 Bytes) + age(4 Bytes) + message(prefix 20 Bytes).
ColumnName
Type
user_id
BIGINT
age
INT
message
VARCHAR(100)
max_dwell_time
DATETIME
min_dwell_time
DATETIME
2. The prefix index of the following table structure is user_name(20 Bytes). Even if it does not contain 36 bytes, it is directly truncated and does not continue backwards because it encounters VARCHAR.
ColumnName
Type
user_name
VARCHAR(20)
age
INT
message
VARCHAR(100)
max_dwell_time
DATETIME
min_dwell_time
DATETIME
When our query condition is aprefix of the prefix index, it can greatly speed up the query speed. For example, we perform the following query in the first example:
SELECT * FROM table WHERE user_id=1829239 and age=20;
The efficiency of this query will be much higher than the following query:
SELECT * FROM table WHERE age=20;
Therefore, when creating a table, selecting the column order correctly can greatly improve the query efficiency.

Best Practice

Adjusting the Prefix Index through ROLLUP

Since the column order has been specified when a table is created, a table only has one prefix index. This may not satisfy the requirements of efficiency for queries that use the column that cannot hit the prefix index as the conditions. Therefore, we can adjust the column order artificially by creating a ROLLUP.

Optimizing the Order of Sort Columns to Improve Query Performance

When the Sort Key involves multiple columns, pay attention to the order. It is recommended to place highly distinct and frequently queried columns in front.
Note the number of sort columns:
1. If a large number of columns are selected for sort columns, the sorting when importing data will increase the time of the whole importing process.
2. A small number of well-designed sort columns can also quickly locate the position of the data row. Adding more columns for sorting will not improve query.

Help and Support

Was this page helpful?

Help us improve! Rate your documentation experience in 5 mins.

Feedback