Tencent Cloud TDMQ for RocketMQ adopts a multi-primary architecture with cross-availability zone (AZ) deployment to ensure high service availability, while using a three-replica cloud disk mechanism to guarantee data high availability. This architecture abandons the traditional primary-secondary mode, significantly reducing Ops complexity while delivering financial-grade reliability.
Cluster-Level High Availability
The core objective of cluster-level high availability is to ensure uninterrupted read and write operations for the messaging service, even if certain components fail. Tencent Cloud TDMQ for RocketMQ builds a highly available service cluster with no single point of failure and regional-level disaster recovery capability through the combination of NameServer cluster, stateless proxy, multi-primary broker, and cross-AZ deployment.
NameServer Clustering and Cross-AZ Deployment
NameServer serves as the "brain" of RocketMQ, responsible for service discovery and routing management. It must be highly available by design. In the Tencent Cloud solution:
Clustered deployment: At least two NameServer nodes are deployed to form a stateless cluster.
Cross-AZ deployment: These NameServer nodes are distributed across multiple AZs within the same region. AZs are physically isolated data centers. A failure in one AZ does not impact others.
Thus, any failure of a single NameServer node or even an entire AZ does not impact the retrieval of routing information. Producers and consumers can still retrieve the list of broker addresses from other healthy NameServer nodes.
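As a hedged illustration with the 4.x Java client, a client is pointed at multiple NameServer nodes via a semicolon-separated address list; the hostnames below are placeholders, not actual TDMQ endpoints:

```java
import org.apache.rocketmq.client.producer.DefaultMQProducer;

public class MultiNameServerExample {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("demo_producer_group");
        // Placeholder addresses: NameServer nodes in different AZs, separated by semicolons.
        // If one node (or its whole AZ) goes down, the client fetches routes from the others.
        producer.setNamesrvAddr("ns-az1.example.com:9876;ns-az2.example.com:9876;ns-az3.example.com:9876");
        producer.start();
        // ... produce messages ...
        producer.shutdown();
    }
}
```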
Multi-primary Broker Architecture
Unlike the traditional primary-secondary architecture, the multi-primary architecture eliminates hierarchical roles. All broker nodes operate as primary nodes and can handle message write requests from producers at any time.
Read-Write Load Balancing
When sending messages, the producer retrieves the list of all available primary brokers from the NameServer and writes messages to different brokers using strategies such as round-robin scheduling, naturally achieving load balancing for write traffic.
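As a minimal sketch (assuming the 4.x Java client and a hypothetical topic named DemoTopic), the default send path already rotates across the queues exposed by all primary brokers; printing the broker name of each send result makes the write-side load balancing visible:

```java
import java.nio.charset.StandardCharsets;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.SendResult;
import org.apache.rocketmq.common.message.Message;

public class RoundRobinSendExample {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("demo_producer_group");
        producer.setNamesrvAddr("ns-az1.example.com:9876;ns-az2.example.com:9876"); // placeholder addresses
        producer.start();
        for (int i = 0; i < 8; i++) {
            Message msg = new Message("DemoTopic", ("payload-" + i).getBytes(StandardCharsets.UTF_8));
            // The default queue selector round-robins across queues, which spread over all primary brokers.
            SendResult result = producer.send(msg);
            System.out.printf("sent to broker=%s queue=%d%n",
                    result.getMessageQueue().getBrokerName(),
                    result.getMessageQueue().getQueueId());
        }
        producer.shutdown();
    }
}
```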
Seamless Failover
Single node failure: If a broker node's heartbeat to the NameServer is lost due to a crash or network issue, the NameServer promptly removes the node from the routing table.
Automatic client retry: Producers and consumers periodically update the broker list from the NameServer. When they detect an unavailable broker, they automatically skip the node and seamlessly switch requests (such as message sending and pulling) to other healthy broker nodes in the cluster.
The entire process is completely transparent to business applications and requires no manual intervention, achieving failover within seconds.
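The retry behavior can also be tuned on the client side. A hedged sketch with the 4.x Java producer (the values shown are illustrative, not recommendations):

```java
import org.apache.rocketmq.client.producer.DefaultMQProducer;

public class FailoverTuningExample {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("demo_producer_group");
        producer.setNamesrvAddr("ns-az1.example.com:9876;ns-az2.example.com:9876"); // placeholder addresses
        producer.setSendMsgTimeout(3000);                   // per-attempt timeout, in milliseconds
        producer.setRetryTimesWhenSendFailed(3);            // retry a failed synchronous send up to 3 times
        producer.setRetryAnotherBrokerWhenNotStoreOK(true); // prefer a different broker on retry
        producer.start();
        // ... produce messages; failed sends are retried against other healthy brokers ...
        producer.shutdown();
    }
}
```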
Cross-AZ Deployment for Brokers
To mitigate data center-level disasters, we deploy multiple primary broker nodes across multiple AZs. A cross-AZ distribution policy is enforced based on the AZs you have selected.
Failure Scenario Simulation
Assume that a cluster is deployed across three AZs (AZ1, AZ2, and AZ3) in the Guangzhou region, with broker nodes distributed in each AZ.
Scenario 1: A Broker Server in AZ1 Fails
The NameServer detects a heartbeat timeout and removes this broker from the routing information.
New production and consumption requests are automatically routed to other brokers in AZ1 as well as brokers in AZ2 and AZ3, ensuring uninterrupted service.
Scenario 2: All of AZ1 Becomes Unavailable Due to a Power or Network Failure
All NameServer and broker nodes in AZ1 go offline.
Since healthy NameServer and broker nodes still remain in AZ2 and AZ3, clients will connect to these nodes.
The overall service capacity of the cluster will be temporarily degraded, but core message production and consumption features remain available, ensuring business continuity.
Cross-AZ Deployment for Proxies
For the 5.x product forms, we enforce cross-AZ distribution of proxies based on the AZs you have selected, just as we do for brokers. Because proxies are stateless:
No business data storage: Persistent state such as message data and consumer offsets remains stored on the brokers' cloud disks.
No long-lived session state: Proxies do not maintain critical, non-recoverable session state. Any proxy node can handle requests from any client.
At the deployment level, this is achieved through a multi-node cluster combined with a frontend load balancer (LB):
Deploy multiple proxy instances: We deploy at least three proxy instances, distributed across different AZs, similar to NameServers and brokers.
Configure a frontend LB: We configure a load balancer, such as Tencent Cloud's Cloud Load Balancer (CLB), in front of all proxy instances. This LB provides a single virtual IP (VIP) address externally.
Clients connect to the VIP: All producer and consumer clients are configured to connect solely to this unified LB VIP address, rather than connecting directly to NameServers or brokers.
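With the 5.x gRPC client, the only endpoint a client needs is the LB address. A minimal sketch, assuming the Apache rocketmq-client-java library and a placeholder VIP:

```java
import org.apache.rocketmq.client.apis.ClientConfiguration;
import org.apache.rocketmq.client.apis.ClientServiceProvider;
import org.apache.rocketmq.client.apis.producer.Producer;

public class ProxyVipExample {
    public static void main(String[] args) throws Exception {
        ClientServiceProvider provider = ClientServiceProvider.loadService();
        ClientConfiguration configuration = ClientConfiguration.newBuilder()
                // Placeholder VIP: the CLB address in front of the proxy cluster,
                // not a NameServer or broker address.
                .setEndpoints("rocketmq-vip.example.com:8081")
                .build();
        Producer producer = provider.newProducerBuilder()
                .setClientConfiguration(configuration)
                .setTopics("DemoTopic")
                .build();
        // ... produce messages; the LB spreads connections across healthy proxies ...
        producer.close();
    }
}
```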
Data High Availability
Data high availability is the lifeline of a message queue, with the core objective of ensuring that successfully written message data is never lost due to any hardware failure. Traditional RocketMQ relies on primary-secondary synchronous replication (SYNC_MASTER) to guarantee zero data loss, but this approach introduces significant performance overhead and complex primary-secondary switch procedures.
Tencent Cloud instead leverages IaaS-layer capabilities, resolving data persistence and high availability through the three-replica mechanism of Cloud Block Storage (CBS).
The "brokers + three-replica cloud disks" architecture represents a typical compute-storage separation architecture. This model transforms RocketMQ brokers into stateless compute nodes, while delegating stateful data storage to a professional, highly available distributed block storage service, thereby achieving high data availability.
What Are Three-Replica Cloud Disks?
CBS is a network block storage device provided by Tencent Cloud, offering high availability, high reliability, and low latency. One of its core features is the three-replica data mechanism.
When a RocketMQ broker writes message data (such as CommitLog or ConsumeQueue entries) to its mounted cloud disk, the underlying storage system automatically and synchronously writes three physical copies of the data to three different physical racks within the same AZ.
The write operation returns success to the upper-layer application (the RocketMQ broker) only after all three copies have been successfully written.
This process is completely transparent to the upper-layer application. The broker simply performs what appears to be a standard local disk write operation.
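The following conceptual sketch illustrates the write semantics just described; it is plain Java, not CBS internals (which are proprietary). The key point is that the write call returns to the caller only after every one of the three replicas has acknowledged the block:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual illustration only; the real CBS implementation differs.
public class ThreeReplicaWriteSketch {
    interface Replica {
        void write(byte[] block) throws Exception; // durably store one block
    }

    private final List<Replica> replicas; // three replicas on different racks in one AZ
    private final ExecutorService pool = Executors.newFixedThreadPool(3);

    ThreeReplicaWriteSketch(List<Replica> replicas) {
        this.replicas = replicas;
    }

    /** Returns only after all three replicas have acknowledged the write. */
    void write(byte[] block) throws Exception {
        List<Future<?>> acks = new ArrayList<>();
        for (Replica replica : replicas) {
            acks.add(pool.submit(() -> {
                replica.write(block);
                return null;
            }));
        }
        for (Future<?> ack : acks) {
            ack.get(); // block until this replica acknowledges; propagate failures
        }
    }
}
```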
How Do Three-Replica Cloud Disks Ensure Data High Availability?
This architecture shifts the guarantee of data reliability from the application layer (RocketMQ primary-secondary replication) down to a more reliable and efficient infrastructure layer (distributed storage).
Immune to Single-Disk/Single-Machine Failures
If the physical disk or server hosting any data replica fails, the system automatically recovers the data from the two other healthy replicas and creates a new replica in a new location, always maintaining the three-replica status. The entire process is transparent to services, with zero data loss.
Simplified Architecture, Eliminating Primary-Secondary Replication
Since data high availability is already achieved at the storage layer, we no longer need to deploy secondary nodes or configure complex primary-secondary synchronous replication. This brings significant benefits:
No replication latency: This eliminates data synchronization latency between primary and secondary nodes.
Simplified Ops: This eliminates the need to handle complex Ops scenarios such as primary-secondary switch and data replenishment.
Rapid recovery: When a broker node (virtual machine) fails, we only need to start a new broker instance and remount the original cloud disk. Since all message data remains intact on the cloud disk, the broker instance can resume service immediately, significantly reducing the recovery time objective (RTO).
Containerized Deployment
Building upon the high-availability architecture described above, if we containerize the entire TDMQ for RocketMQ cluster (including NameServer and broker) and deploy it on a container orchestration platform such as Tencent Kubernetes Engine (TKE), we can achieve standardized delivery, rapid scaling, and automatic recovery in failure scenarios.