Message Compression

Last updated: 2024-01-03 14:27:38

    Background

    In Pulsar, a single message larger than 5 MB cannot be sent successfully under the default settings. To send such a large message, you need to compress it in the client first.

    Processing Large Messages in Pulsar

    As the default size limit for a single message is 5 MB in Pulsar, the producer will fail to send a message exceeding this limit. You can handle this in the following two ways:
    Message chunking: The producer splits a large payload into chunks, and the consumer reassembles the chunks into the original message.
    Message compression: The message size is reduced by replacing repeated character sequences in the message data with shorter representations. Pulsar supports four compression algorithms: LZ4, ZLIB, ZSTD, and Snappy.
    We recommend that you compress large messages before sending them.
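    As a rough illustration of the compression approach, the following sketch configures a producer with the Apache Pulsar Java client. It needs the `pulsar-client` dependency and a reachable cluster to run. The service URL, token, and topic are placeholders, and the class name and payload are hypothetical. Disabling batching is an assumption made here so that the large message is compressed and sent on its own; it is not taken from the demo.

```java
import org.apache.pulsar.client.api.AuthenticationFactory;
import org.apache.pulsar.client.api.CompressionType;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

// Hypothetical class name; a sketch, not the code of tdmq-sdk-demo.
class CompressedProducerSketch {
    public static void main(String[] args) throws PulsarClientException {
        String serviceUrl = "pulsar://xxxx:6650";          // placeholder address
        String token = "<your-token>";                      // placeholder credential
        String topic = "<cluster>/<namespace>/<topic>";     // placeholder topic

        // Highly repetitive 20 MB payload, similar in spirit to the "same message body" tests.
        byte[] payload = new byte[20 * 1024 * 1024];

        try (PulsarClient client = PulsarClient.builder()
                     .serviceUrl(serviceUrl)
                     .authentication(AuthenticationFactory.token(token))
                     .build();
             Producer<byte[]> producer = client.newProducer()
                     .topic(topic)
                     // Other options: ZLIB, ZSTD, SNAPPY, NONE.
                     .compressionType(CompressionType.LZ4)
                     // Assumption: send the large message individually rather than in a batch.
                     .enableBatching(false)
                     .create()) {
            // The send still fails if the compressed payload exceeds the size limit.
            producer.send(payload);
        }
    }
}
```

    Consumers do not need a matching setting, because the compression type is carried in the message metadata and the client decompresses the payload on receipt.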

    Compression Algorithm Introduction and Comparison

    Introduction

    LZ4
    LZ4 is a lossless data compression algorithm that consumes a small amount of CPU. It features extremely fast compression/decompression speed.
    ZLIB
    ZLIB is a widely used lossless data compression library and format. It is built on DEFLATE, which combines LZ77 (a Lempel-Ziv variant) with Huffman coding. Because it effectively reduces the size of transferred data, it can improve network transfer efficiency and network capacity. Depending on the content, it can compress data to half the original size or even less.
    ZSTD
    ZSTD (Zstandard) is a lossless compression algorithm that combines LZ77-style matching with entropy coding, including Huffman coding. It is designed for real-time compression and works well across different compression scenarios, offering a high compression ratio and high compression speed at the same time.
    Snappy
    Snappy is a lossless compression algorithm based on LZ77. Its core principle is to replace repeated character strings in a data stream with shorter codes to reduce the stream size.
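    To make "lossless" concrete, here is a minimal round-trip sketch using the JDK's built-in ZLIB implementation (`java.util.zip.Deflater` and `Inflater`). It is independent of Pulsar; the class name, helper names, and sample string are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class ZlibRoundTrip {

    // Compress a byte array with the JDK's ZLIB (DEFLATE) implementation.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restore the original bytes from a ZLIB stream produced by compress().
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("Truncated or unsupported ZLIB stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Not a valid ZLIB stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Build a repetitive payload by repeating one unit string.
    static byte[] repeated(String unit, int times) {
        StringBuilder sb = new StringBuilder(unit.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = repeated("pulsar-message;", 1000);
        byte[] compressed = compress(original);
        byte[] restored = decompress(compressed);
        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("lossless:         " + Arrays.equals(original, restored));
    }
}
```

    The repeated string compresses well because every occurrence after the first can be encoded as a short back-reference, which is the LZ77 idea shared by all four algorithms.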

    Comparison

    Compression Algorithm  Compression Ratio  Compression Speed  Decompression Speed
    ZLIB 1.2.11 -1         2.743              110 MB/s           400 MB/s
    LZ4 1.8.1              2.101              750 MB/s           3,700 MB/s
    ZSTD 1.3.4 -1          2.877              470 MB/s           1,380 MB/s
    Snappy 1.1.4           2.091              530 MB/s           1,800 MB/s

    Throughput: LZ4 > Snappy > ZSTD > ZLIB
    Compression ratio: ZSTD > ZLIB > LZ4 > Snappy
    Network bandwidth usage: Snappy uses the most network bandwidth while ZSTD uses the least, in line with their compression ratios.
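    The comparison figures can be turned into a back-of-the-envelope estimate. The sketch below divides a payload size by the ratio and compression speed listed above. The 100 MB payload and the class and method names are hypothetical, and real results depend heavily on message content, so treat the output as an order-of-magnitude guide only.

```java
class CodecEstimate {
    final String name;
    final double ratio;           // original size / compressed size, from the comparison above
    final double compressMBps;    // compression speed in MB/s
    final double decompressMBps;  // decompression speed in MB/s

    CodecEstimate(String name, double ratio, double compressMBps, double decompressMBps) {
        this.name = name;
        this.ratio = ratio;
        this.compressMBps = compressMBps;
        this.decompressMBps = decompressMBps;
    }

    // Estimated size after compression, assuming the benchmark ratio holds.
    double compressedMB(double payloadMB) {
        return payloadMB / ratio;
    }

    // Estimated time to compress, assuming the benchmark speed holds.
    double compressSeconds(double payloadMB) {
        return payloadMB / compressMBps;
    }

    // Figures copied from the comparison above, in the same order.
    static CodecEstimate[] table() {
        return new CodecEstimate[] {
            new CodecEstimate("ZLIB", 2.743, 110, 400),
            new CodecEstimate("LZ4", 2.101, 750, 3700),
            new CodecEstimate("ZSTD", 2.877, 470, 1380),
            new CodecEstimate("Snappy", 2.091, 530, 1800),
        };
    }

    public static void main(String[] args) {
        double payloadMB = 100.0; // hypothetical payload size
        for (CodecEstimate c : table()) {
            System.out.printf("%-6s about %.1f MB after compression, about %.2f s to compress%n",
                    c.name, c.compressedMB(payloadMB), c.compressSeconds(payloadMB));
        }
    }
}
```

    Under these assumptions, ZSTD yields the smallest estimated output and LZ4 the shortest estimated compression time, which matches the two rankings above.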

    Compression Algorithm Test

    Test result

    Note:
    The following test results are for reference only. The actual compression effect is subject to the specific message content.
    Message Size  Message Body         Compression Algorithm      Monitored Message Size  Compression Duration  Sending Duration
    5 MB          Random message body  LZ4 (threshold: 5 MB)      9.95 MB                 31 ms                 0.049 ms
    5 MB          Random message body  ZLIB                       7.26 MB                 31 ms                 0.038 ms
    5 MB          Random message body  ZSTD                       8.20 MB                 31 ms                 0.039 ms
    5 MB          Random message body  Snappy (threshold: 5 MB)   9.70 MB                 33 ms                 0.046 ms
    6 MB          Random message body  ZLIB (threshold: 6 MB)     8.71 MB                 35 ms                 0.044 ms
    6 MB          Random message body  ZSTD (threshold: 6 MB)     9.84 MB                 35 ms                 0.046 ms
    20 MB         Same message body    LZ4                        0.16 MB                 41 ms                 0.006 ms
    20 MB         Same message body    ZLIB                       0.20 MB                 42 ms                 0.006 ms
    20 MB         Same message body    ZSTD                       0.01 MB                 42 ms                 0.003 ms
    20 MB         Same message body    Snappy                     2.47 MB                 41 ms                 0.021 ms
    40 MB         Same message body    LZ4                        0.32 MB                 123 ms                0.008 ms
    40 MB         Same message body    ZLIB                       0.39 MB                 122 ms                0.008 ms
    40 MB         Same message body    ZSTD                       0.01 MB                 124 ms                0.004 ms
    40 MB         Same message body    Snappy                     4.95 MB                 123 ms                0.036 ms
    80 MB         Same message body    LZ4                        0.63 MB                 241 ms                0.009 ms
    80 MB         Same message body    ZLIB                       0.39 MB                 244 ms                0.01 ms
    80 MB         Same message body    ZSTD                       0.01 MB                 243 ms                0.004 ms
    80 MB         Same message body    Snappy (threshold: 80 MB)  9.9 MB                  243 ms                0.056 ms
    160 MB        Same message body    LZ4                        1.26 MB                 484 ms                0.013 ms
    160 MB        Same message body    ZLIB                       1.56 MB                 479 ms                0.016 ms
    160 MB        Same message body    ZSTD                       0.03 MB                 481 ms                0.004 ms
    320 MB        Same message body    LZ4                        2.5 MB                  1,035 ms              0.03 ms
    320 MB        Same message body    ZLIB                       3.1 MB                  1,008 ms              0.027 ms
    320 MB        Same message body    ZSTD                       0.03 MB                 949 ms                0.004 ms
    585 MB        Same message body    LZ4                        4.59 MB                 1,705 ms              0.027 ms
    585 MB        Same message body    ZLIB                       5.67 MB                 1,733 ms              0.03 ms
    585 MB        Same message body    ZSTD                       0.11 MB                 1,722 ms              0.006 ms

    Summary:
    For data streams with a random message body (non-repetitive strings), all four compression algorithms show low compression ratios. When such a message is larger than 5 MB, none of the four algorithms can compress it to less than 5 MB.
    For data streams with the same message body (repetitive strings), all the compression algorithms show high compression ratios. In particular, LZ4, ZLIB, and ZSTD compressed messages of up to 320 MB to less than 5 MB. At 585 MB, LZ4 (4.59 MB) and ZSTD (0.11 MB) stayed below 5 MB, while ZLIB produced 5.67 MB.
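    The difference between random and repetitive payloads can be reproduced locally without a broker. The sketch below uses the JDK's ZLIB implementation (`java.util.zip.Deflater`) to check whether a payload would fit under a 5 MB limit after compression. The 6 MB payload size, the seed, the pattern string, and all class and method names are assumptions made for illustration; this is not the test harness that produced the results above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

class CompressibilityCheck {
    // 5 MB limit from the text above, expressed in bytes.
    static final long MAX_MESSAGE_BYTES = 5L * 1024 * 1024;

    // Count the ZLIB-compressed size without keeping the compressed bytes.
    static long compressedSize(byte[] payload) {
        Deflater deflater = new Deflater();
        deflater.setInput(payload);
        deflater.finish();
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    static boolean fitsAfterCompression(byte[] payload) {
        return compressedSize(payload) <= MAX_MESSAGE_BYTES;
    }

    // Pseudo-random bytes: almost no repeated sequences to exploit.
    static byte[] randomPayload(int size, long seed) {
        byte[] data = new byte[size];
        new Random(seed).nextBytes(data);
        return data;
    }

    // The same short pattern repeated until the payload is full.
    static byte[] repetitivePayload(int size) {
        byte[] pattern = "same-message-body;".getBytes(StandardCharsets.US_ASCII);
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = pattern[i % pattern.length];
        }
        return data;
    }

    public static void main(String[] args) {
        int size = 6 * 1024 * 1024; // hypothetical 6 MB payload, above the limit
        byte[] random = randomPayload(size, 42L);
        byte[] same = repetitivePayload(size);
        System.out.println("random:     " + compressedSize(random) + " bytes, fits: "
                + fitsAfterCompression(random));
        System.out.println("repetitive: " + compressedSize(same) + " bytes, fits: "
                + fitsAfterCompression(same));
    }
}
```

    A check of this kind can be run on representative payloads before choosing between compression and chunking, since only content with enough repetition will drop below the limit.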

    Message Compression Demo and Test

    For the demo, see tdmq-sdk-demo.

    Test

    Command and parameters used to start the producer:
    java -jar tdmq-sdk-demo-1.0-SNAPSHOT-jar-with-dependencies.jar pulsar://xxxx:6650 \
    eyJrZXlJZCI6ImRlZmF1bHRfa2V5SWQiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJzdXBlcnVzZXIifQ.dYcCfp4XrdWRKdKaWylobY-_xEExfRCi1pMvNyZXbqU \
    pulsar-78ra8ownxb7d/BigMSGSpace/BigMSGTopic subname 1 500 0 1 20480 1 0
    Command and parameters used to start the consumer:
    java -jar tdmq-sdk-demo-1.0-SNAPSHOT-jar-with-dependencies.jar pulsar://xxxx:6650 \
    eyJrZXlJZCI6ImRlZmF1bHRfa2V5SWQiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJzdXBlcnVzZXIifQ.dYcCfp4XrdWRKdKaWylobY-_xEExfRCi1pMvNyZXbqU \
    pulsar-92d7w2mjwmv9/BigMessSpace/BigMessTopic subname 1 500 1
    