Technical Principle

Last updated: 2018-02-05 17:24:57


How It Works

The architecture of Cloud Kafka is as follows:

A typical Cloud Kafka cluster is shown above. Producer may be the messages generated by web activities, service logs, or other information. Producers post the messages to Cloud Kafka's Broker cluster in push mode, and consumers consume the messages from Broker in pull mode. Consumer are divided into a number of Consumer Groups, In addition, the cluster manages the clustering configuration through Zookeeper, and conducts leader election, fault tolerance and so on.

Implementation of High Throughput

A huge amount of network data in Cloud Kafka are permanently sent to disks and disk files over the network. The performance of this process directly affects Kafka's overall throughput, especially from the following aspects:

  1. Efficient Use of Disks
  2. Sequential reading/writing data on disks improves the disk utilization.
    Write message
    The message is written to the page cache and flushed by the asynchronous thread.
    Read message
    The message is directly transferred from the page cache to the socket and then sent out.
    When the corresponding data is not found in the page cache, the disk IO is produced at this time, and the messages are loaded from the disk to the page cache. Then, they are sent directly from the socket.
  3. Zero Copy Mechanism of Broker
    Use the sendfile system call to send data directly from the page cache to the network.
  4. Reduced Network Overhead
  5. Data compression reduces the network load
  6. Batch Processing Mechanism: Producer writes data to Broker in batch, and Consumer pulls data from Broker in batch

Data Persistence

The data persistence of Cloud Kafka is mainly achieved through the following principles:

  1. Storage Distribution of Partitions in Topic
    In the file storage of Cloud Kafka, a topic has multiple different partitions, with each physically corresponding to a folder. Users store the messages and index files in these partitions. For example, if you create two topics, topic1 with 5 partitions and topic2 with 10 partitions, a total of 15 folders are generated across the cluster.

2 File Storage Method in Partition
Partition is physically composed of multiple segments of equal size. These segments are read/write sequentially and are deleted quickly upon expiration, which improves the disk utilization.

Scale Out

One Topic can be divided into multiple Partitions and distributed in one or more Brokers.
One consumer can subscribe to one or more of these Partitions.
Producer is responsible for equally assigning the messages to the Partitions.
Messages are well-organized in Partitions.

Consumer Group Design

Cloud Kafka does not delete the consumed messages.
Any consumer must belong to a group.
Consumers in the same Consumer Group do not consume the same partition at the same time.
Different Groups consume the same message at the same time, which is diversified (queue mode, publishing/subscription mode).

Multiple Copies

Why is the multi-copy design required?
To enhance the system availability and reliability.
Replica is evenly distributed throughout the cluster.
Replica's algorithm is as follows:

  1. Sort all Brokers (assuming a total of n Brokers) and Partitions to be assigned.
  2. Assign the (i) Partition to the (i mod n) Broker.
  3. Assign the (j) Replica of the (i) Partition to the ((i + j) mode n) Broker.

Leader Election Mechanism

Cloud Kafka maintains an ISR (in-sync replicas) dynamically in the Zookeeper.
All Replicas in the ISR keep up with the leader.
Only a member in the ISR may be chosen as the Leader.

There are f+1 Replicas in the ISR, and one Partition can
tolerate the failure of f Replicas under the premise that the committed messages are guaranteed not to be lost.
There are a total of 2f+1 Replicas (including Leaders and Followers), and it must be guaranteed that f+1
Replicas have copied the messages before committed. To ensure a new Leader is correctly selected, the number of failed Replicas cannot exceed f.