Tencent Cloud

Prev Next

search by keyword

Recent Pages

Documentation

Download PDF

Kafka Data Import

Last updated: 2020-12-21 09:50:56

Download PDF

Overview

This article describes how to import data from Kafka into a ClickHouse cluster.

Note：
For more technical exchanges on ClickHouse, submit a ticket to us, and we will add you into the ClickHouse technical exchange group.

Kafka is a widely used open source messaging middleware. A common use case of Kafka is to collect data from services as a data bus, including service, subscription, and spending data, and then generate reports or data applications. ClickHouse has a built-in Kafka engine, making it easy to integrate ClickHouse and Kafka.

Standard process for importing data from Kafka into ClickHouse:

Create an external Kafka engine table in ClickHouse as a connector to access the Kafka data source.
Create a common table in ClickHouse (usually MergeTree family) to store the data from Kafka.
Create a materialized view in ClickHouse to listen to the data in Kafka and write the data from Kafka to a ClickHouse table.

After completing the above three steps, you can import the data from Kafka to the ClickHouse cluster.

Importing data from Kafka to ClickHouse

ClickHouse provides a Kafka engine that serves as a connector (or a data stream) to access Kafka clusters. The detailed steps are as shown below:

Step 1: Create an external Kafka engine table

CREATE TABLE source
(
  `ts` DateTime, 
  `tag` String, 
  `message` String
)
ENGINE = Kafka()
SETTINGS kafka_broker_list = '172.19.0.47:9092', 
       kafka_topic_list = 'tag',
       kafka_group_name = 'clickhouse', 
       kafka_format = 'JSONEachRow',
       kafka_skip_broken_messages = 1,
       kafka_num_consumers = 2

Parameter	Required	Description
kafka_broker_list	Yes	Enter Kafka brokers and separate each one with a comma
kafka_topic_list	Yes	Enter Kafka topics and separate each one with a comma
kafka_group_name	Yes	Enter the consumer group name
kafka_format	Yes	Kafka data format. For formats supported by ClickHouse, see Formats for Input and Output Data
kafka_skip_broken_messages	No	Enter an integer greater than or equal to 0, which indicates the number of errors to tolerate when parsing messages. Where N errors occur, background threads end. The materialized view will rearrange background threads to listen to data
kafka_num_consumers	No	Number of consumers of a single external Kafka engine table. By giving a higher value for this parameter, you can increase the throughput of consumed data, but the number of consumers should not be greater than the number of partitions in the topic
kafka_row_delimiter	No	Message delimiter
kafka_schema	No	If kafka_format requires a schema definition, this parameter determines the schema
kafka_max_block_size	No	This parameter determines the maximum block size allowed for writing Kafka data into the target table. The data will be flushed to the disk if the block size exceeds this value

Step 2: Create a target table to store Kafka data. This table is the destination where the Kafka data will be stored
This article uses MergeTree to store Kafka data:
```
CREATE TABLE target
(
  `ts` DateTime, 
  `tag` String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(ts)
ORDER BY tag
```
Step 3: Create a materialized view to capture data
This article uses the following statements to create the materialized view:
```
CREATE MATERIALIZED VIEW source_mv TO target AS
SELECT 
  ts, 
  tag
FROM source
```
After completing the above three steps, you can query the data from Kafka in the target table.

In the above process, the materialized view acts as an intermediate pipeline to write the data streams represented by Kafka engine to the target table. In fact, a data stream can be attached to multiple materialized views to import the data from Kafka to multiple target tables simultaneously. You can also detach a data stream from a table or attach it to a target table.

Contact Us

Contact our sales team or business advisors to help your business.

Technical Support

Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

7x24 Phone Support

tencent cloud

Recent Pages

Kafka Data Import

Overview

Importing data from Kafka to ClickHouse

Was this page helpful?

Was this page helpful?

tencent cloud

Sign Up

Log in

Recent Pages

Kafka Data Import

Overview

Importing data from Kafka to ClickHouse

Was this page helpful?

Was this page helpful?