Migration from MySQL/MariaDB/Percona/TDSQL-C for MySQL to Kafka

Last updated: 2023-09-05 15:04:29
    DTS allows you to sync the full and incremental data of the source database to CKafka, so that you can quickly obtain and consume business change data. This document describes how to use DTS to sync data from MySQL/MariaDB/Percona/TDSQL-C for MySQL to CKafka.
    The following deployment modes of the source database are supported:
    Self-built MySQL, third-party cloud MySQL, and TencentDB for MySQL.
    Self-built MariaDB and TencentDB for MariaDB.
    Self-built Percona.
    TDSQL-C for MySQL.

    Note

    The same task can be associated with multiple topics, but the same topic cannot be used by multiple sync tasks at the same time. Otherwise, the consumed data will be mixed from multiple tasks, and abnormal data processing errors may be reported when a task retries.
    By default, the locked sync method is used. During the full data export, a global lock (FTWRL) is placed on the source database, and write operations are blocked for a short time (at the second level) while tables are locked.
    If you rename a table during the sync process, the data delivered to the target CKafka will be matched to a topic and partition based on the new table name.
    During the full sync, a checkpoint message is inserted into the target Kafka every 100,000 data entries to identify the current data sync position.
    If full data already exists in the source database before the sync task starts, the happenAt field in the consumer demo will be displayed as 1970-01-01 08:00:00 because there is no accurate write time for that data. You can ignore this time field; once incremental data is consumed, the time field is displayed correctly.
    During the full data sync, DTS exports the source database data and imports it into the target CKafka using the utf8mb4 character set to avoid garbled characters caused by unrecognized character sets.
    We recommend that the target topic be empty and not written to during the sync task; otherwise, messages may be out of order and the task may report errors.
    There is a performance cap for a single message that DTS syncs to the target Kafka. We recommend that a single row of data in the source database not exceed 8 MB; otherwise, an error may be reported during the data sync.

    Prerequisites

    The source and target databases must meet the requirements for the sync feature and version as instructed in Databases Supported by Data Sync.
    Permissions required of the source database for the sync task account:
    GRANT RELOAD,LOCK TABLES,REPLICATION CLIENT,REPLICATION SLAVE,SHOW VIEW,PROCESS,SELECT ON *.* TO 'account'@'%' IDENTIFIED BY 'password';
    FLUSH PRIVILEGES;
    Modify the message retention time and message size limit in the target CKafka.
    We recommend setting the message retention time to 3 days. Data beyond the retention time is cleared, so you need to consume the data within the set time. The message size limit refers to the maximum size of a single message that CKafka can receive; it must be greater than the size of the largest single row of data in the source database tables so that the data can be delivered to CKafka normally.
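    The message size limit matters because oversized rows cannot be delivered. The following is a minimal sketch for roughly estimating the largest row in the tables you plan to sync; it assumes the pymysql package and example connection parameters and database names that you should replace with your own, and it approximates row size by summing column byte lengths, so treat the result only as a guide.

    # Rough pre-check: estimate the largest single-row size (in bytes) in the
    # source database so the CKafka message size limit can be set above it.
    # Full-table scans may be slow on large tables; run during off-peak hours.
    import pymysql

    SOURCE = dict(host="10.0.0.1", port=3306, user="account", password="password")
    DATABASES = ["Users"]  # example: databases selected for sync

    conn = pymysql.connect(**SOURCE)
    try:
        with conn.cursor() as cur:
            for db in DATABASES:
                cur.execute(
                    "SELECT TABLE_NAME, COLUMN_NAME FROM information_schema.COLUMNS "
                    "WHERE TABLE_SCHEMA = %s ORDER BY TABLE_NAME, ORDINAL_POSITION",
                    (db,),
                )
                tables = {}
                for table, column in cur.fetchall():
                    tables.setdefault(table, []).append(column)
                for table, columns in tables.items():
                    # Approximate each row's size by summing the byte length of every column.
                    expr = " + ".join(f"LENGTH(IFNULL(`{c}`, ''))" for c in columns)
                    cur.execute(f"SELECT MAX({expr}) FROM `{db}`.`{table}`")
                    max_bytes = cur.fetchone()[0] or 0
                    print(f"{db}.{table}: largest row ~= {max_bytes} bytes")
    finally:
        conn.close()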

    Application Restrictions

    Only basic tables, views, procedures, and functions can be synced.
    If the source MySQL database is a non-GTID database, DTS doesn't support HA switch for it. If it is switched, DTS incremental sync may be interrupted.
    Only data with the following three database engines can be synced: InnoDB, MyISAM, and TokuDB. Tables with other engines will be skipped during sync by default.
    Correlated data objects need to be synced together; otherwise, sync will fail. The correlations include tables correlated through primary/foreign keys.
    Scenarios that contain both DML and DDL statements in the same transaction are not supported and will trigger errors during task execution.
    Geometry data types are not supported and will trigger errors during task execution.
    The ALTER VIEW statement is not supported and will be skipped during sync.
    To ensure data consistency, locked sync is used for sync to Kafka by default. However, if the source database is Alibaba Cloud MySQL 5.5/5.6 or AWS MySQL 5.7.18 or earlier, locks cannot be applied, which may cause data duplication during the sync to the target CKafka. If duplicate data is a concern, you can implement deduplication logic when consuming the data.
    If the sync task is restarted while it is running, data duplication may occur on the target CKafka.
    DTS syncs data based on the smallest data unit. In the full stage, the data of a single table object is the smallest data unit; in the incremental stage, each checkpoint interval is a data unit. If a data unit has been completely synced before the task restart, it will not cause data duplication; if a data unit is still being synced when the task restarts, it must be pulled again after the restart, which leads to data duplication.
    A task is restarted after the following operations: modifying the sync objects, manually clicking the Retry button in the Operation column after a task error, changing the configuration, stopping and resuming the task, and isolating and then unisolating the task.
    If duplicate data is a concern, you can implement deduplication logic when consuming the data (see the deduplication sketch after this list).
    For the target Kafka, you can only use Pro Edition and Standard Edition (https://www.tencentcloud.com/document/product/597/41815) instances on v1.1.1 or later on Tencent Cloud. The deactivated physical-machine-based Standard Edition is not supported as a target Kafka.
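    As noted above, locked sync limitations and task restarts can both result in duplicate messages on the target CKafka. The following is a minimal consumer-side deduplication sketch; it assumes the kafka-python package, a placeholder broker address and topic name, and uses a hash of the raw payload as the deduplication key on the assumption that a re-delivered change carries an identical payload. Adapt the key (for example, table name plus primary key plus change position) to the fields of the delivery format you choose, and consider persisting processed keys or making downstream writes idempotent in production.

    # Minimal consumer-side deduplication sketch for re-delivered DTS messages.
    # Broker address and topic name below are placeholders.
    import hashlib
    from collections import OrderedDict

    from kafka import KafkaConsumer

    MAX_CACHE = 100_000  # bound the in-memory cache of recently seen keys
    seen = OrderedDict()

    def is_duplicate(payload: bytes) -> bool:
        # Hash the raw payload; identical payloads are treated as duplicates.
        key = hashlib.sha256(payload).hexdigest()
        if key in seen:
            return True
        seen[key] = None
        if len(seen) > MAX_CACHE:
            seen.popitem(last=False)  # evict the oldest key
        return False

    def process(message):
        # Replace with your own business logic.
        print(message.topic, message.partition, message.offset)

    consumer = KafkaConsumer(
        "Topic_A",                            # placeholder topic name
        bootstrap_servers=["10.0.0.2:9092"],  # placeholder broker address
        group_id="dts-dedup-demo",
    )

    for message in consumer:
        if is_duplicate(message.value):
            continue  # skip messages that were already consumed
        process(message)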

    Operation Restrictions

    During the sync, do not perform the following operations; otherwise, the sync task will fail:
    Do not modify or delete user information (including username, password, and permissions) or port numbers of the source database and target CKafka.
    Do not write binlog data in the STATEMENT format into the source database.
    Do not clear binlogs in the source database.
    Do not write data to a topic that has been selected for sync to the target end.

    Synchronizable SQL Operations

    DML: INSERT, UPDATE, DELETE
    DDL: CREATE DATABASE, DROP DATABASE, ALTER DATABASE, CREATE TABLE, ALTER TABLE, DROP TABLE, TRUNCATE TABLE, CREATE VIEW, DROP VIEW, CREATE INDEX, DROP INDEX
    Note
    CREATE TABLE table_name AS SELECT is not supported.

    Environment Requirements

    Requirements for the source database (a pre-check sketch follows this table):
    The source and target databases can be connected.
    Requirements for instance parameters:
    The server_id parameter in the source database must be set manually and cannot be 0.
    row_format for the source databases/tables cannot be set to FIXED.
    The connect_timeout variable in the source database must be greater than or equal to 10.
    Requirements for binlog parameters:
    The log_bin variable in the source database must be set to ON.
    The binlog_format variable in the source database must be set to ROW.
    The binlog_row_image variable in the source database must be set to FULL.
    On MySQL 5.6 or later, if the gtid_mode variable is not ON, an alarm will be triggered. We recommend that you enable gtid_mode.
    do_db and ignore_db must not be set.
    If the source instance is a replica database, the log_slave_updates variable must be set to ON.
    We recommend that you retain the binlogs of the source database for at least three days; otherwise, the task cannot be resumed from the checkpoint and will fail.
    Requirements for foreign key dependencies:
    Foreign key dependencies can be set to only one of the following two types: NO ACTION and RESTRICT. Other types may affect data consistency.
    During partial table sync, tables with foreign key dependencies must be synced together.
    The innodb_stats_on_metadata variable must be set to OFF.
    Requirements for the target CKafka:
    The target CKafka can be connected.
    The message size limit of the target CKafka must be greater than the size of the largest single row of data in the source database tables.
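    To quickly verify the source instance parameters listed above before configuring the task, the following is a minimal pre-check sketch. It assumes the pymysql package and example connection parameters that you should replace with your own; the expected values mirror the requirements in this table.

    # Minimal pre-check sketch for the source MySQL instance requirements above.
    # Connection parameters below are placeholders.
    import pymysql

    EXPECTED = {
        "log_bin": "ON",
        "binlog_format": "ROW",
        "binlog_row_image": "FULL",
        "gtid_mode": "ON",                 # recommended; an alarm is raised otherwise
        "innodb_stats_on_metadata": "OFF",
    }

    conn = pymysql.connect(host="10.0.0.1", port=3306, user="account", password="password")
    try:
        with conn.cursor() as cur:
            for name, expected in EXPECTED.items():
                cur.execute("SHOW GLOBAL VARIABLES LIKE %s", (name,))
                row = cur.fetchone()
                actual = row[1] if row else "<not set>"
                status = "OK" if actual.upper() == expected else "CHECK"
                print(f"[{status}] {name} = {actual} (expected {expected})")

            # server_id must be set manually and cannot be 0.
            cur.execute("SHOW GLOBAL VARIABLES LIKE 'server_id'")
            server_id = int(cur.fetchone()[1])
            print(f"[{'OK' if server_id != 0 else 'CHECK'}] server_id = {server_id}")

            # connect_timeout must be greater than or equal to 10.
            cur.execute("SHOW GLOBAL VARIABLES LIKE 'connect_timeout'")
            connect_timeout = int(cur.fetchone()[1])
            print(f"[{'OK' if connect_timeout >= 10 else 'CHECK'}] connect_timeout = {connect_timeout}")
    finally:
        conn.close()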

    Directions

    This document describes how to sync data from MySQL to CKafka. The requirements and steps for syncing data from MySQL/MariaDB/Percona/TDSQL-C for MySQL are basically the same.
    1. Log in to the data sync purchase page, select appropriate configuration items, and click Buy Now.
    Billing Modes: Monthly subscription and pay-as-you-go billing modes are supported.
    Source Instance Type: Select MySQL, which cannot be changed after purchase.
    Source Instance Region: Select the source instance region, which cannot be changed after purchase. If the source database is a self-built one, select a region nearest to it.
    Target Instance Type: Select Kafka, which cannot be changed after purchase.
    Target Instance Region: Select the target instance region, which cannot be changed after purchase.
    Specifications: Select a specification based on your business needs. The higher the specification, the higher the performance. For more information, see Billing Overview.
    2. After making the purchase, return to the data sync list and click Configure in the Operation column to enter the sync task configuration page.
    3. On the sync task configuration page, configure the source and target databases and their accounts and passwords, test the connectivity, and click Next.
    Setting Items
    Parameter
    Description
    Task Configuration
    Task Name
    DTS will automatically generate a task name, which is customizable.
    Running Mode
    Immediate execution and scheduled execution are supported.
    Source Instance Settings
    Source Instance Type
    The source database type selected during purchase, which cannot be changed.
    Source Instance Region
    The source instance region selected during purchase, which cannot be changed.
    Access Type
    Select a type based on your scenario. For the preparations for different access types, see Preparations Overview.
    Public Network: The source database can be accessed through a public IP.
    Self-Build on CVM: The source database is deployed in a CVM instance.
    Direct Connect: The source database can be interconnected with VPCs through Direct Connect.
    VPN Access: The source database can be interconnected with VPCs through VPN Connection.
    Database: The source database is a TencentDB instance.
    CCN: The source database can be interconnected with VPCs through CCN.
    Public Network
    Host Address: IP address or domain name of the source database.
    Port: Port used by the source database.
    Self-Build on CVM
    CVM Instance: The ID of the CVM instance.
    Port: Port used by the source database.
    Direct Connect
    VPC-based Direct Connect Gateway: Only VPC-based direct connect gateway is supported. Confirm the network type associated with the gateway.
    VPC: Select a VPC and subnet associated with the VPC-based Direct Connect Gateway or VPN Gateway.
    Host Address: IP address of the source database.
    Port: Port used by the source database.
    VPN Access
    VPN Gateway: Select a VPN Gateway instance.
    VPC: Select a VPC and subnet associated with the VPC-based Direct Connect Gateway or VPN Gateway.
    Host Address: IP address of the source database.
    Port: Port used by the source database.
    Database
    Instance Name: The ID of the source database instance.
    CCN
    Host Address: IP address of the source database server.
    Port: Port used by the source database.
    VPC-based CCN Instance: The name of the CCN instance.
    Accessed VPC: The accessed VPC refers to the VPC in CCN over which the subscription linkage is connected. You need to select a VPC other than the VPC to which the source database belongs. For example, if a database in Guangzhou is used as the source database, select a VPC in another region, such as a Chengdu VPC or a Shanghai VPC, as the accessed VPC.
    Subnet: Name of the subnet of the selected VPC.
    Region of Accessed VPC: The region of the source database selected during task purchase must be the same as the region of the accessed VPC; otherwise, DTS will change the former to the latter.
    Account/Password
    Account/Password: Enter the database account and password.
    Target Instance Settings
    Target Instance Type
    The target instance type selected during purchase, which cannot be changed.
    Target Instance Region
    The target instance region selected during purchase, which cannot be changed.
    Access Type
    Select a type based on your scenario. In this scenario, select CKafka instance.
    Instance ID
    Select the instance ID of the target instance.
    4. On the Set sync options and objects page, set the data initialization, policy for syncing data to Kafka, data sync, and sync object options and click Save and Go Next.
    Setting Items
    Parameter
    Description
    Data Initialization Option
    Initialization Type
    Structure initialization: Table structures in the source instance will be initialized into the target instance before the sync task runs.
    Full data initialization: Data in the source instance will be initialized into the target instance before the sync task runs. If you select Full data initialization only, you need to create the table structures in the target database in advance.
    Both options are selected by default, and you can deselect them as needed.
    Format of Data Delivered to Kafka
    Avro adopts a binary format with higher consumption efficiency, while JSON adopts an easier-to-use lightweight text format.
    Policy for Syncing Data to Kafka
    Topic Sync Policy
    Deliver to custom topic: Set the topic name for delivery by yourself. After setting, the target Kafka will automatically create the topic. The synced data is randomly delivered to different partitions under the topic. If the target Kafka fails to create the topic, the task will report an error.
    Deliver to a single topic: Select an existing topic on the target side, and then deliver based on multiple partitioning policies. Data can be delivered to a single partition of the specified topic, or delivered to different partitions by table name or by table name + primary key.
    Rules for delivering to custom topic
    If you add multiple rules, the database and table rules are matched one by one from top to bottom. If no rules are matched, data will be delivered to the topic corresponding to the last rule. If multiple rules are matched, data will be delivered to the topics corresponding to all the matched rules.
    Example 1: There are tables named "Student" and "Teacher" in a database named "Users" on database instance X. To deliver the data in the "Users" database to a topic named "Topic_A", configure the rules as follows:
    Enter Topic_A for Topic Name, ^Users$ for Database Name Match, and .* for Table Name Match.
    Enter Topic_default for Topic Name, Databases that don't match the above rules for Database Name Match, and Tables that don't match the above rules for Table Name Match.
    Example 2: There are tables named "Student" and "Teacher" in a database named "Users" on database instance X. To deliver the data in the "Student" and "Teacher" tables to the topics "Topic_A" and "Topic_default" respectively, configure the rules as follows:
    Enter Topic_A for Topic Name, ^Users$ for Database Name Match, and ^Student$ for Table Name Match.
    Enter Topic_default for Topic Name, Databases that don't match the above rules for Database Name Match, and Tables that don't match the above rules for Table Name Match.
    Rules for delivering to a single topic
    After selecting a specified topic, the system will perform partitioning based on the specified policy as follows.
    Deliver all data to partition 0: Deliver all the synced data of the source database to the first partition.
    By table name: Partition the synced data from the source database by table name. After setting, the data with the same table name will be written into the same partition.
    By table name + primary key: Partition the synced data from the source database by table name and primary key. This policy is suitable for frequently accessed data. After setting, frequently accessed data is distributed from tables to different partitions by table name and primary key, which improves concurrent consumption efficiency (see the routing sketch at the end of this step).
    Topic for DDL Storage
    (Optional) If you need to deliver the DDL operation of the source database to the specified topic separately, you can select the settings here. After setting, it will be delivered to Partition 0 of the selected topic by default; if not set, it will be delivered based on the topic rules selected above.
    Data Sync Option
    SQL Type
    The following operations are supported: INSERT, DELETE, UPDATE, and DDL. If Custom DDL is selected, you can choose different DDL sync policies as needed. For more information, see Setting SQL Filter Policy.
    Sync Object Option
    Database and Table Objects of Source Instance
    Select objects to be synced, which can be basic tables, views, procedures, and functions. The sync of advanced objects is a one-time operation: only advanced objects already in the source database before the task start can be synced, while those added to the source database after the task start will not be synced to the target database. For more information, see Syncing Advanced Object.
    Selected Object
    After selecting the sync objects on the left, click the button in the middle to see the selected objects on the right.
    Sync Online DDL Temp Table
    If you perform an online DDL operation on tables in the source database with the gh-ost or pt-osc tool, DTS can migrate the temp tables generated by online DDL changes to the target database.
    If you select gh-ost, DTS will migrate the temp tables (_table name_ghc, _table name_gho, and _table name_del) generated by the gh-ost tool to the target database.
    If you select pt-osc, DTS will migrate the temp tables (_table name_new and _table name_old) generated by the pt-osc tool to the target database.
    For more information, see Syncing Online DDL Temp Table.
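    To make the single-topic partitioning policies described above more concrete, the following is a conceptual sketch of hash-based routing. It only illustrates how records with the same table name (or the same table name and primary key) deterministically land in the same partition; the hash function and routing that DTS uses internally may differ.

    # Conceptual illustration of the single-topic partitioning policies above.
    # This is not DTS's actual algorithm; it only shows deterministic hash routing.
    import zlib

    NUM_PARTITIONS = 8  # example partition count of the target topic

    def partition_by_table(table: str) -> int:
        # "By table name": all rows of the same table go to the same partition.
        return zlib.crc32(table.encode()) % NUM_PARTITIONS

    def partition_by_table_and_pk(table: str, primary_key: str) -> int:
        # "By table name + primary key": rows of one table spread across partitions,
        # which improves concurrent consumption of frequently accessed tables.
        return zlib.crc32(f"{table}:{primary_key}".encode()) % NUM_PARTITIONS

    print(partition_by_table("Users.Student"))              # always the same partition
    print(partition_by_table_and_pk("Users.Student", "1001"))
    print(partition_by_table_and_pk("Users.Student", "1002"))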
    5. On the task verification page, complete the verification. After all check items are passed, click Start Task. If the verification fails, fix the problem as instructed in Check Item Overview and initiate the verification again.
    Failed: It indicates that a check item fails and the task is blocked. You need to fix the problem and run the verification task again.
    Alarm: It indicates that a check item doesn't completely meet the requirements, and the task can be continued, but the business will be affected. You need to assess whether to ignore the alarm or fix the problem and continue the task based on the alarm message.
    6. Return to the data sync task list, and you can see that the task has entered the Running status.
    Note
    You can click More > Stop in the Operation column to stop a sync task. Before doing so, ensure that data sync has been completed.
    7. (Optional) You can click a task name to enter the task details page and view the task initialization status and monitoring data.

    Common Issues

    During incremental sync, no data has been written to the synced objects of the source database for a long time, and the monitoring panel shows a large delay
    First confirm whether the selected sync objects in the source database have had no data written to them for a long time (while other, unsynced database/table objects have had data written). If so, you can ignore this problem, as it is an error in the monitoring metric; if not, submit a ticket for processing.
    When MySQL databases are synced to Kafka, DTS first pulls the binlog of the source database, analyzes and identifies the data related to the selected objects, and then syncs that data to the target Kafka. If no data has been written to the selected sync objects for a long time, while other, unsynced database/table objects have been written to, DTS pulls and analyzes the binlog but finds no data that needs to be synced to the target. Because the data on the target has not been updated, a delay is displayed on the monitoring panel. You can try writing data to the selected sync objects in the source database to refresh the metric and get the real value.

    Related Operations

    After the data is synced to the target Kafka, it can be consumed. We provide consumption demos so that you can quickly test the data consumption process and understand how to parse the data.
    Consuming Data with Kafka Client (Avro)
    Consuming Data with Kafka Client (JSON)
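    For a quick local test before adapting one of the demos above, the following is a minimal consumer sketch. It assumes the kafka-python package, JSON as the delivery format, and a placeholder broker address, topic name, and consumer group that you should replace with your own; refer to the official demos for the full Avro/JSON field parsing.

    # Minimal sketch of consuming DTS-delivered JSON messages from CKafka.
    # Broker address, topic, and consumer group below are placeholders.
    import json

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "Topic_A",                            # topic selected in the sync task
        bootstrap_servers=["10.0.0.2:9092"],
        group_id="dts-demo-group",
        auto_offset_reset="earliest",         # start from the earliest retained message
        enable_auto_commit=False,             # commit offsets only after processing succeeds
    )

    for message in consumer:
        record = json.loads(message.value)
        # Print the raw record; see the official demo for the field definitions
        # (for example, the happenAt field mentioned in the notes above).
        print(message.topic, message.partition, message.offset, record)
        consumer.commit()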