Storage Engine Architecture and Data Model

Last updated: 2026-04-17 12:21:17
This document introduces the architecture of TDStore, the storage engine of TDSQL Boundless, along with its three-level metadata model and KV encoding mechanism.
Prerequisites: Architecture Overview, which describes the overall architecture and the three core components of TDSQL Boundless.

TDStore Storage Engine Architecture

TDStore is the core storage engine of TDSQL Boundless. It is built on RocksDB and adds a data sharding module and a distributed transaction module on top of a Raft consensus protocol module. Together these form a distributed KV storage engine that provides high scalability, high availability, distributed transactions, data rebalancing, and strong consistency across multiple replicas.

System Architecture


The TDStore storage engine includes the following core components:

| Component | Function | Feature |
| --- | --- | --- |
| RocksDB | Underlying KV storage | LSM-Tree structure, high compression ratio |
| Multi-Raft | Multi-replica disaster recovery | Strong data consistency, high availability |
| Sharding Module | Data shard management | Flexible scheduling, affinity support |
| Transaction Module | Distributed transactions | 2PC offloading, negotiated commit |

Layered Description

Single-Machine KV Storage Engine (RocksDB):
- Receives KV requests forwarded from the compute layer.
- Uses the LSM-Tree structure to store data.

Logical Data Sharding:
- Supports scheduling and migration at the data shard level.
- Enables flexible scheduling of data between cluster nodes.

Distributed Transaction:
- Maintains information such as participant context and transaction status for distributed transactions that span data shards.
- Supports scheduling correlated data to the same data shard.

Multi-Replica Disaster Recovery Layer (Multi-Raft):
- Creates multiple replicas of each data shard on different TDStore nodes, organized as a Raft group.
- Synchronizes each data shard as an independent log stream.

Low-Cost Mass Storage

The TDStore storage layer stores and manages data using the LSM-Tree + SSTable structure:
Extremely High Compression Ratio: Effectively reduces storage costs for massive data.
PB-Level Support: A single instance can support PB-level storage capacity.
Multi-level Compression: Provides compression algorithms at each data layer.
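
To illustrate the LSM-Tree + SSTable idea in general terms, the following is a minimal sketch and not TDStore or RocksDB code. The class `TinyLSM` and its `memtable_limit` parameter are hypothetical names introduced here. Writes go to an in-memory table, which is flushed to an immutable sorted run once it fills up; reads check the memtable first and then the runs from newest to oldest. Compaction and compression are left out.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM-style store: one memtable plus a list of immutable sorted runs."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[bytes, bytes] = {}
        # Each run is a list of (key, value) pairs sorted by key; newest run last.
        self.sstables: List[List[Tuple[bytes, bytes]]] = []
        self.memtable_limit = memtable_limit

    def put(self, key: bytes, value: bytes) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as a new sorted run and start a fresh one."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: bytes) -> Optional[bytes]:
        # The newest data wins: memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put(b"a", b"1")
db.put(b"b", b"2")   # memtable is full, so it is flushed to the first run
db.put(b"a", b"3")   # newer value for "a" shadows the flushed one
print(db.get(b"a"), db.get(b"b"), db.get(b"z"))
```

Because every run is sorted and immutable, a real engine can compress runs block by block, which is the general reason LSM-based storage tends to achieve high compression ratios.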

Three-Level Metadata Model

Note:
Core Concept: TDSQL Boundless implements fine-grained data management and intelligent scheduling through a three-level metadata model.
Distributed architectures face three major challenges: perception gaps, constrained scheduling, and rigid rules. The three-level metadata model of TDSQL Boundless is designed to address all three.

Figure: Three-level metadata model



DataObject

Definition: A logical-level abstraction such as a table, index, partition, or auto-increment value.
Hierarchical Structure:
L0 level: Database.
L1 level: Table, belonging to a specific Database.
L2 level: Index or Partition, belonging to a specific Table.
Role:
Defines the different types of data structures.
Enables topology-aware data affinity relationships.
Records table structure, secondary indexes, and other metadata.
Example: Object ID 10010 unambiguously identifies a secondary index under the primary partition (id: 1003) of the partitioned table (id: 1001) in the database (id: 1).
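
The lookup described in the example can be sketched as a walk over parent links. This is an illustrative model only: the `DataObject` fields, the `CATALOG` dictionary, and the `lineage` function are hypothetical names, and the IDs are the ones quoted in the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataObject:
    object_id: int
    kind: str                 # "database", "table", "partition", or "index"
    parent_id: Optional[int]  # None for a database


# Hypothetical catalog that mirrors the IDs used in the example.
CATALOG: Dict[int, DataObject] = {
    obj.object_id: obj
    for obj in [
        DataObject(1, "database", None),
        DataObject(1001, "table", 1),
        DataObject(1003, "partition", 1001),
        DataObject(10010, "index", 1003),
    ]
}


def lineage(object_id: int) -> List[DataObject]:
    """Return the chain of objects from the database down to object_id."""
    chain: List[DataObject] = []
    current = CATALOG.get(object_id)
    while current is not None:
        chain.append(current)
        if current.parent_id is None:
            break
        current = CATALOG.get(current.parent_id)
    return list(reversed(chain))


path = " / ".join(f"{obj.kind}:{obj.object_id}" for obj in lineage(10010))
print(path)  # database:1 / table:1001 / partition:1003 / index:10010
```

Keeping the parent link on every object is what lets a scheduler reason about affinity, for example by noticing that two indexes share the same table.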

Replication Group

Definition: A physical storage unit based on the Raft protocol, consisting of one primary replica (Leader) and N other replicas that together ensure data consistency.
Role Types:

| Role | Description |
| --- | --- |
| Leader | Primary replica; handles all read and write requests. |
| Follower | Replica; synchronizes data and can participate in elections. |
| Learner | Synchronizes data but does not participate in elections. |
| Witness | Participates in elections but does not store data. |
Characteristics:
Corresponds to a Raft log stream.
Manages the data of multiple Regions.
Supports data affinity scheduling.
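
The role descriptions can be turned into a small model that separates who votes from who stores data. This is a sketch under assumptions: the names `Role`, `VOTERS`, `DATA_HOLDERS`, `election_quorum`, and `data_copies` are hypothetical, and the quorum rule is the standard Raft majority of voting members rather than anything stated for TDStore specifically.

```python
from enum import Enum
from typing import List


class Role(Enum):
    LEADER = "leader"
    FOLLOWER = "follower"
    LEARNER = "learner"
    WITNESS = "witness"


# Derived from the role descriptions: Learners do not vote, Witnesses hold no data.
VOTERS = {Role.LEADER, Role.FOLLOWER, Role.WITNESS}
DATA_HOLDERS = {Role.LEADER, Role.FOLLOWER, Role.LEARNER}


def election_quorum(members: List[Role]) -> int:
    """Majority of voting members (standard Raft rule, assumed here)."""
    voters = sum(1 for role in members if role in VOTERS)
    return voters // 2 + 1


def data_copies(members: List[Role]) -> int:
    """Number of members that keep a copy of the data."""
    return sum(1 for role in members if role in DATA_HOLDERS)


group = [Role.LEADER, Role.FOLLOWER, Role.WITNESS, Role.LEARNER]
print(election_quorum(group), data_copies(group))  # 2 3
```

In this hypothetical group, the Witness contributes a vote without the storage cost of a full copy, while the Learner adds a copy without affecting the quorum size.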

Region (data shard)

Definition: A contiguous key range, which is the smallest unit of physical data storage.
Capacity Standard:
Maximum 256 MB
or up to 100,000 rows of data
Rules:
A Replication Group (RG) can contain multiple Regions.
Each Region holds a portion of the actual data of one DataObject.
A single Region contains data from at most one DataObject.
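
The capacity standard and the one-object-per-Region rule can be expressed as two simple checks. This is a minimal sketch: the `Region` fields and both function names are hypothetical, 256 MB is assumed to mean 256 × 1024 × 1024 bytes, and what the system actually does when a limit is reached is not described here.

```python
from dataclasses import dataclass

MAX_REGION_BYTES = 256 * 1024 * 1024  # 256 MB, assuming binary megabytes
MAX_REGION_ROWS = 100_000


@dataclass
class Region:
    region_id: int
    data_object_id: int   # a Region holds data of exactly one DataObject
    size_bytes: int = 0
    row_count: int = 0


def exceeds_capacity(region: Region) -> bool:
    """True if the Region is over either the size limit or the row limit."""
    return region.size_bytes > MAX_REGION_BYTES or region.row_count > MAX_REGION_ROWS


def can_accept(region: Region, data_object_id: int) -> bool:
    """A row may only go into a Region that belongs to the same DataObject."""
    return region.data_object_id == data_object_id


small = Region(region_id=1, data_object_id=10010, size_bytes=64 * 1024 * 1024, row_count=20_000)
wide = Region(region_id=2, data_object_id=10010, size_bytes=10 * 1024 * 1024, row_count=150_000)
print(exceeds_capacity(small), exceeds_capacity(wide))  # False True
```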

KV Encoding and Data Space

Encoding Rules

In TDSQL Boundless, all data is encoded in Key-Value format. The encoded keys are memory-comparable (mem-comparable): comparing two encoded keys byte by byte yields the same order as comparing the original values.

Figure: KV encoding and data space


Encoding Characteristics:
The system assigns a globally increasing unique ID to each index.
All data of the same index shares the same prefix.
Encoded data is logically contiguous.
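
The characteristics above can be demonstrated with a toy memory-comparable encoding. The actual TDStore key layout is not specified in this document, so the format below is an assumption made for illustration: a 4-byte big-endian index ID followed by a signed 64-bit integer key stored big-endian with its sign bit flipped. `encode_key` and `decode_key` are hypothetical names.

```python
import struct

SIGN_BIAS = 2**63  # adding this flips the sign bit of a signed 64-bit value


def encode_key(index_id: int, value: int) -> bytes:
    """Encode (index_id, value) so that byte order matches logical order."""
    if not 0 <= index_id < 2**32:
        raise ValueError("index_id must fit in 32 unsigned bits")
    if not -SIGN_BIAS <= value < SIGN_BIAS:
        raise ValueError("value must fit in 64 signed bits")
    # ">" means big-endian with no padding, so the result is always 12 bytes.
    return struct.pack(">IQ", index_id, value + SIGN_BIAS)


def decode_key(key: bytes) -> tuple:
    index_id, biased = struct.unpack(">IQ", key)
    return index_id, biased - SIGN_BIAS


values = [1000, -5, 3, 0]
sorted_keys = sorted(encode_key(7, v) for v in values)   # plain byte comparison
decoded = [decode_key(k)[1] for k in sorted_keys]
print(decoded)  # [-5, 0, 3, 1000]
```

All keys of index 7 start with the same four bytes, so they occupy one contiguous stretch of the key space, and every one of them sorts before any key of index 8.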

Data Shards and Replication Groups

In the logical data space, each Key corresponds to a discrete point, but physically each Key-Value pair occupies storage space. As the data volume grows, a single node can no longer hold all the data, so the data is partitioned into multiple shards (Regions).

Figure: Data shards and replication groups


Key Features:
Data of the same index is spatially contiguous.
Different indexes of the same table may be distributed across different, non-contiguous Regions.
By using replication groups, associated Regions can be scheduled to the same node.
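
Routing a key to its Region can be sketched as a binary search over Region start keys. Everything in this example is hypothetical: the `REGIONS` table, the region and replication group IDs, and the `locate` function. Each Region is assumed to cover the range from its start key up to the next Region's start key, with the last Region left unbounded.

```python
import bisect
from typing import List, Tuple

# (start_key, region_id, replication_group_id), sorted by start_key.
# Regions 1 and 2 split one index; Region 3 holds another index of the same
# table and is placed in the same replication group to keep related data together.
REGIONS: List[Tuple[bytes, int, int]] = [
    (b"\x00\x00\x00\x07", 1, 100),
    (b"\x00\x00\x00\x07\x80", 2, 100),
    (b"\x00\x00\x00\x08", 3, 100),
    (b"\x00\x00\x00\x09", 4, 200),
]
START_KEYS = [start for start, _, _ in REGIONS]


def locate(key: bytes) -> Tuple[int, int]:
    """Return (region_id, replication_group_id) for the Region covering key."""
    idx = bisect.bisect_right(START_KEYS, key) - 1
    if idx < 0:
        raise KeyError("key sorts before the first Region")
    _, region_id, group_id = REGIONS[idx]
    return region_id, group_id


print(locate(b"\x00\x00\x00\x07\x10"))      # (1, 100)
print(locate(b"\x00\x00\x00\x07\x80\x01"))  # (2, 100)
print(locate(b"\x00\x00\x00\x08\xff"))      # (3, 100)
```

Because the keys are memory-comparable, the same byte comparison serves both storage ordering and routing.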
