
Hitless Migration from Virtual Cluster to Pro Cluster
Last updated: 2025-12-24 15:00:42

Scenarios

To meet user requirements in different scenarios, TDMQ for Apache Pulsar provides two product modes: Pro Clusters and Virtual Clusters.
Because of stability risks in virtual clusters, we stopped allowing the creation of new virtual clusters in 2023. Pro clusters provide enhanced capabilities and a more comprehensive console (for operations such as management, renewal, and scale-out). To provide better service, existing virtual clusters that are still in use can be migrated to pro clusters without interruption (hitless migration).
Note:
Pre-check and assessment are required for the hitless upgrade from a virtual cluster to a pro cluster. If you cannot find the related button on the Instance Details page in the console, you can contact Tencent Cloud customer service or submit a ticket.

Capability Description

The cluster migration process on the data plane is nearly transparent to users. In this sense the migration is hitless: no adjustment is required for access points, and no modification is required for the user's business code.
Migration process:
1. The system will deduct fees based on the specifications of the pro cluster. When migration starts, the deduction status is displayed as Processing on the Order page. During migration, if a rollback is triggered due to an issue, the order is automatically refunded. After migration is completed, the order status becomes Completed, and billing officially starts.
2. On the Professional Cluster list page of the console, you can view the cluster information after migration and perform subsequent operations such as management, configuration upgrade, and renewal.

Prerequisites for Hitless Migration

Hitless migration has one prerequisite: the access point addresses in use must be migratable. Some access point addresses cannot be smoothly migrated to the new cluster, either because the cluster was created early on or because network connectivity was established using special methods.
Therefore, users need to provide the access point addresses currently in use so that feasibility can be confirmed.
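
One way to gather the access point addresses in use is to collect the service URL from each client's configuration and reduce the URLs to unique scheme, host, and port combinations. The following is a minimal sketch, assuming the URLs are already available as a list of strings and that each URL names a single host. The function name and the sample addresses are hypothetical. The default ports are those of open-source Apache Pulsar and may differ from the ports of your access points.

```python
from urllib.parse import urlsplit

# Default ports of open-source Apache Pulsar, used only when a URL omits
# the port. Assumption: your access points may use different ports.
DEFAULT_PORTS = {"pulsar": 6650, "pulsar+ssl": 6651, "http": 8080, "https": 8443}


def unique_access_points(service_urls):
    """Return the sorted, de-duplicated (scheme, host, port) tuples in the URLs."""
    seen = set()
    for url in service_urls:
        parts = urlsplit(url.strip())
        if not parts.scheme or not parts.hostname:
            raise ValueError(f"not a valid service URL: {url!r}")
        port = parts.port or DEFAULT_PORTS.get(parts.scheme)
        seen.add((parts.scheme, parts.hostname.lower(), port))
    return sorted(seen, key=lambda t: (t[0], t[1], t[2] or 0))


if __name__ == "__main__":
    # Hypothetical addresses for illustration only.
    for access_point in unique_access_points([
        "pulsar://example-host.internal:6650",
        "pulsar://example-host.internal",
        "http://10.0.0.5:8080/",
    ]):
        print(access_point)
```

The resulting list can then be compared with the access point addresses shown in the console.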

Operation Entry

Go to the Virtual Cluster list page in the TDMQ for Apache Pulsar console and click the ID of the cluster you want to upgrade to open the Instance Details page. In the upper-right corner of the Basic Information section, you can see Upgrade to Professional Edition. If you cannot find this button, contact Tencent Cloud customer service or submit a ticket.

Operation Process

1. Initiate the migration. The frontend will verify the current access point addresses. If non-standard access points are detected, a pop-up window will be displayed, prompting you to modify the client access address to Standard Access Point.
2. Select the target specification. The page displays the TPS and storage size of the last 7 days. You can select a pro cluster of the corresponding specification based on the usage of the current virtual cluster.
3. Perform an access point scan. The server will scan access points to check whether the client uses non-standard access points.
4. Initiate the upgrade. The migration process starts.
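
The specification choice in step 2 can be reasoned about as follows: take the peak TPS and peak storage observed over the last 7 days, add some headroom, and choose the smallest specification that covers both. The following is a minimal sketch of that reasoning. The specification catalog, the 30% headroom, and all names are hypothetical and do not reflect actual TDMQ specifications; use the figures and options shown in the console.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    name: str
    max_tps: int
    max_storage_gb: int


# Hypothetical catalog for illustration only; not actual TDMQ specifications.
CATALOG = [
    Spec("small", 2_000, 200),
    Spec("medium", 10_000, 1_000),
    Spec("large", 50_000, 5_000),
]


def pick_spec(tps_samples, storage_gb_samples, headroom=0.3, catalog=CATALOG):
    """Return the smallest spec covering the observed peaks plus headroom, or None."""
    need_tps = max(tps_samples) * (1 + headroom)
    need_gb = max(storage_gb_samples) * (1 + headroom)
    for spec in sorted(catalog, key=lambda s: (s.max_tps, s.max_storage_gb)):
        if spec.max_tps >= need_tps and spec.max_storage_gb >= need_gb:
            return spec
    return None  # no listed spec is large enough


if __name__ == "__main__":
    # Hypothetical 7-day samples: TPS peaks and storage in GB.
    print(pick_spec([800, 1000, 950], [90, 100]))
```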

Possible Issues and Solutions

1. Message duplication
Consumption progress is synchronized at the level of individual acknowledgments (ACKs) to minimize duplicate messages during migration. However, duplication during migration cannot be entirely avoided. In normal cases, duplication lasts no more than 1 minute. We recommend that users make message processing idempotent in advance if their business requires it.
2. Out-of-order messages
Messages may arrive out of order during cluster migration, and no solution can completely prevent this. Notify message consumers in advance so that they can prepare for it.
3. Inaccurate monitoring data
During the cluster switch, monitoring data may be briefly inaccurate. Typically, it returns to normal within 1 to 2 minutes.
4. Production time jitter
During the cluster switch, temporary jitter in production time may occur. This is similar to the jitter seen during cluster upgrades. Typically, it subsides within 1 minute.
5. Abnormal message traces
During data synchronization, internal messages used to synchronize consumption progress are generated. Users may see these messages when they perform message queries. In addition, message details may be unavailable for query during migration.
6. Migration duration
The migration duration depends on the number of namespaces, production traffic, and data volume in message storage. For a namespace with 1,000 TPS and 100 GB of message storage, cluster migration is typically completed within 1 hour. In case of a large amount of data, such as 1 TB of storage, cluster migration may take approximately 2 hours.
7. Retention period of the original cluster
After migration is completed, the Tencent Cloud R&D team waits 1 to 3 days before cleaning up resources on the original physical cluster. After cleanup, rollback is no longer possible.
8. Message backlog issue
During migration, the consumption progress synchronization is performed by using user topics. Therefore, internal messages may exist in these topics. These messages are filtered out on the server during consumption and will not actually be consumed by the business. For topics without consumer subscriptions, a message backlog may occur.
9. Message replication scope
During message replication, only messages within the TTL range of the original cluster can be replicated to the new cluster due to the implementation mechanism of TDMQ for Apache Pulsar. If your message retention period is long and you need to synchronize all data within the retention period, you need to adjust the TTL configuration first.
10. Client disconnection
At the final stage of migration, an unload operation is triggered for topics. This causes clients to perform a lookup again. Typically, the client automatically obtains the address of the new cluster during this lookup.
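
The idempotent processing recommended in item 1 can be sketched as a consumer-side wrapper that remembers recently seen business keys and skips repeats. The following is a minimal sketch, assuming each message carries a unique business key chosen by the producer (for example, an order ID). The class name, the in-memory store, and the 10-minute window are illustrative assumptions, not part of TDMQ or the Pulsar client.

```python
import time
from collections import OrderedDict


class IdempotentHandler:
    """Run a handler at most once per business key within a time window."""

    def __init__(self, handler, window_seconds=600, clock=time.monotonic):
        self._handler = handler
        self._window = window_seconds
        self._clock = clock
        self._seen = OrderedDict()  # key -> time processed, oldest first

    def _evict_expired(self, now):
        # Drop keys older than the window so memory use stays bounded.
        while self._seen:
            key, ts = next(iter(self._seen.items()))
            if now - ts < self._window:
                break
            self._seen.popitem(last=False)

    def handle(self, key, payload):
        """Return True if the payload was processed, False if it was a duplicate."""
        now = self._clock()
        self._evict_expired(now)
        if key in self._seen:
            return False
        self._handler(payload)
        # Record the key only after the handler succeeds, so a failed
        # attempt can be retried when the message is redelivered.
        self._seen[key] = now
        return True


if __name__ == "__main__":
    processed = []
    dedup = IdempotentHandler(processed.append)
    dedup.handle("order-1", "create order 1")
    dedup.handle("order-1", "create order 1")  # duplicate, skipped
    print(processed)
```

The window should comfortably exceed the expected duplication period. An in-memory store only protects a single consumer process; with several consumers, a shared store (for example, a database table with a unique constraint on the business key) would typically take its place.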

Migration Principles

Technical Solution

The Geo Replication solution of TDMQ for Apache Pulsar is used, and bidirectional cross-cluster replication is enabled. This can achieve the synchronization of data and message progress, meeting cluster migration requirements.


Main Migration Process

The following figure shows the main migration process. Migration proceeds to the next step after each step is successful.

1. Initiate the migration process in the console. The platform triggers resource delivery. After the order payment is completed, a new pro cluster starts to be built.
2. Perform synchronization of cluster metadata, including namespaces, topics, subscriptions, roles, and namespace policies.
3. Enable cross-cluster data synchronization.
4. Switch clusters. The operation platform issues a tenant cluster-switching command, which triggers the unload operation for the topics of the original cluster. The unload causes clients to perform a lookup again, and the lookup returns the address of the new cluster.
5. After confirming that user data has been successfully migrated, disable cross-cluster data synchronization and clean up resources of the original cluster.
6. You can view the new cluster in the console.
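
The rule that each step must succeed before the next one starts can be illustrated with a small sequential runner. This is a conceptual sketch only and does not reflect the platform's internal implementation. The function name, the step names, and the status strings are hypothetical.

```python
def run_migration(steps):
    """Run (name, action) pairs in order and stop at the first failure.

    Each action is a callable returning True on success. Returns
    ("completed", names_done) if every step succeeds, otherwise
    ("rolled_back", names_done) listing the steps that had succeeded.
    """
    done = []
    for name, action in steps:
        if not action():
            return "rolled_back", done
        done.append(name)
    return "completed", done


if __name__ == "__main__":
    # Placeholder actions; a real step would call the corresponding operation.
    steps = [
        ("build_pro_cluster", lambda: True),
        ("sync_metadata", lambda: True),
        ("enable_replication", lambda: True),
        ("switch_clusters", lambda: True),
        ("cleanup_original", lambda: True),
    ]
    print(run_migration(steps))
```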
