tencent cloud

Feedback

Merging MySQL CDC Sources

Last updated: 2023-11-08 15:58:10
    When syncing data, Flink CDC Connector will establish a database connection for each table. This significantly increases the load on the database instance when data is synced among multiple tables or across the database. For this, Stream Compute Service introduced the source merging capability.

    Overview

    Take the following job for example:
    CREATE TABLE `source_1`
    (
    `f_sequence` INT,
    `f_random` INT,
    `f_random_str` VARCHAR,
    PRIMARY KEY (`f_sequence`) NOT ENFORCED
    ) WITH (
    'connector' = 'mysql-cdc' ,
    'hostname' = 'ip1',
    'port' = '3306',
    'username' = 'xxx',
    'password' = 'xxx',
    'database-name' = 'db1',
    'table-name' = 'source_1'
    );
    
    CREATE TABLE `source_2`
    (
    `f_sequence` INT,
    `f_random` INT,
    `f_random_1` INT,
    `f_random_str` VARCHAR,
    `f_random_str_1` VARCHAR,
    PRIMARY KEY (`f_sequence`) NOT ENFORCED
    ) WITH (
    'connector' = 'mysql-cdc' ,
    'hostname' = 'ip1',
    'port' = '3306',
    'username' = 'xxx',
    'password' = 'xxx',
    'database-name' = 'db2',
    'table-name' = 'source_2'
    );
    
    
    CREATE TABLE `sink_1`
    (
    `f_sequence` INT,
    `f_random` INT,
    `f_random_str` VARCHAR,
    PRIMARY KEY (`f_sequence`) NOT ENFORCED
    ) WITH (
    'connector' = 'logger'
    );
    
    CREATE TABLE `sink_2`
    (
    `f_sequence` INT,
    `f_random` INT,
    `f_random_1` INT,
    `f_random_str` VARCHAR,
    `f_random_str_1` VARCHAR,
    PRIMARY KEY (`f_sequence`) NOT ENFORCED
    ) WITH (
    'connector' = 'logger'
    );
    
    insert into sink_1 select * from source_1;
    insert into sink_2 select * from source_2;
    To sync the two tables, which belong to the same database instance, Flink will generate two pipelines. Each CDC source will establish a connection with the MySQL database. When a job has many database connections, the load on the database will be high.
    
    After source merging is enabled, multiple MySQL CDC sources of the same database instance will be merged into one source, reducing the database load. Also, when new data is synced, binlog data only needs to be read once for multiple sources.
    

    How to enable CDC source merging

    At the beginning of a SQL job, use the SET command to enable merging of MySQL CDC sources.
    SET table.optimizer.mysql-cdc-source.merge.enabled=true;
    SET parameters
    Option
    Default Value
    Description
    table.optimizer.mysql-cdc-source.merge.enabled
    false
    Whether to enable merging of MySQL sources. If it is enabled, Stream Compute Service will automatically merge MySQL CDC sources that belong to the same database in a job into one source.
    table.optimizer.mysql-cdc-source.merge.default-group.splits
    1
    The number of sources multiple MySQL CDC sources are merged into (when merging is enabled). If there is a large number of tables, merging them all into one source may not meet performance requirements. In such cases, you can increase the value of this parameter. Stream Compute Service will group the sources as evenly as possible according to the parameter specified.
    Assume that source_1, source_2, source_3, and source_4 are MySQL CDC sources from the same database instance (source and sink definitions are omitted). You can use the following SET command to configure the source merging feature.
    SET table.optimizer.mysql-cdc-source.merge.enabled=true;
    SET table.optimizer.mysql-cdc-source.merge.default-group.splits=2;
    
    insert into sink_1 select * from source_1;
    insert into sink_2 select * from source_2;
    insert into sink_3 select * from source_3;
    insert into sink_4 select * from source_4;
    This will automatically divide the four CDC sources into two groups.
    
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support