Data compression can reduce network I/O traffic and disk usage. This document describes the message formats that support data compression and how to configure compression based on your needs.
Currently, Kafka supports two message formats, v1 and v2 (introduced in v0.11.0.0). CKafka supports the open-source versions 0.9 and 0.10.2, and exclusive CKafka clusters support v1.1.1.
Different configurations apply to different versions, as described below:
Snappy is the officially recommended compression algorithm. The analysis behind this recommendation is as follows:
Performance of a compression algorithm is evaluated mainly based on two metrics: compression ratio and compression/decompression throughput.
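The two metrics above can be measured directly on your own data. The following is a minimal sketch rather than a benchmark. It uses the Gzip implementation built into the JDK (java.util.zip), because Snappy and LZ4 require third-party libraries. The class name, helper names, and sample payload are illustrative assumptions, and a single timed run like this is only a rough indication.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipMetricsSketch {

    /** Compresses the input with the JDK's built-in Gzip implementation. */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Compression ratio = original size / compressed size (higher is better). */
    static double compressionRatio(byte[] original, byte[] compressed) {
        return (double) original.length / compressed.length;
    }

    /** Throughput in MB/s for processing the given number of bytes in elapsedNanos. */
    static double throughputMBps(long bytes, long elapsedNanos) {
        double seconds = elapsedNanos / 1_000_000_000.0;
        return (bytes / (1024.0 * 1024.0)) / seconds;
    }

    public static void main(String[] args) {
        // Hypothetical sample payload: repetitive JSON-like records.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("{\"user\":\"demo\",\"event\":\"click\",\"seq\":").append(i).append("}\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        byte[] compressed = gzip(original);
        long elapsed = System.nanoTime() - start;

        System.out.printf("ratio=%.2f, compression throughput=%.1f MB/s%n",
                compressionRatio(original, compressed),
                throughputMBps(original.length, elapsed));
    }
}
```

Results depend heavily on the payload, so it is worth running such a measurement against messages that resemble your real traffic.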
Versions below Kafka 2.1.0 support three compression algorithms: Gzip, Snappy, and LZ4.
In actual use of Kafka, the performance metrics of the three algorithms compare as shown below:
Their physical resource usage compares as shown below:
Therefore, the recommended order of the three compression algorithms under normal circumstances is LZ4 > Gzip > Snappy.
Long-term testing in the production environment shows that the above model works well in most cases. However, in extreme cases, the LZ4 compression algorithm increases the CPU load.
Analysis shows that this is because the source data content varies by business, so the compression algorithm performs differently on different data. Therefore, we recommend that users who are sensitive to CPU metrics use the more stable Snappy compression algorithm.
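The recommendation above can be expressed as a small helper. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the supported set is limited to the three algorithms named in this document, and only the compression.type key comes from the producer configuration itself. It also trims and lowercases the value, so a stray space cannot slip into the setting.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

final class CompressionChoiceSketch {

    // The three algorithms this document lists for Kafka versions below 2.1.0.
    private static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("gzip", "snappy", "lz4"));

    /** Hypothetical helper: Snappy for CPU-sensitive workloads, otherwise LZ4. */
    static String recommend(boolean cpuSensitive) {
        return cpuSensitive ? "snappy" : "lz4";
    }

    /** Returns a copy of base with a normalized, validated compression.type. */
    static Properties withCompression(Properties base, String type) {
        String normalized = type.trim().toLowerCase(Locale.ROOT);
        if (!SUPPORTED.contains(normalized)) {
            throw new IllegalArgumentException("Unsupported compression type: " + type);
        }
        Properties copy = new Properties();
        copy.putAll(base);
        copy.put("compression.type", normalized);
        return copy;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.put("bootstrap.servers", "localhost:9092");

        // A CPU-sensitive workload gets the more stable Snappy algorithm.
        Properties props = withCompression(base, recommend(true));
        System.out.println(props.getProperty("compression.type")); // prints "snappy"
    }
}
```

Whether a given algorithm is actually accepted still depends on the cluster version, so the supported set in this sketch should be adjusted to match your cluster.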
A producer can use the following method to configure data compression:
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("acks", "all");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    // After the producer is started, all message sets it produces are compressed,
    // which can greatly reduce network bandwidth and disk usage on the Kafka broker.
    // Note that different versions have different configurations. Currently, versions 0.9
    // and below do not support compression, and versions 0.10 and above do not support
    // Gzip compression.
    // The value must not contain leading or trailing spaces.
    props.put("compression.type", "lz4");
    Producer<String, String> producer = new KafkaProducer<>(props);

In most cases, after receiving a message from the producer, the broker retains it as-is without making any modification.
compression.codec cannot be set.