Using a CRD to Configure Log Collection

Last updated: 2020-10-16 16:05:19

    Overview

    You can configure log collection in the console or by using a Custom Resource Definition (CRD). The CRD supports collecting container standard output, container files, and host files, in multiple log formats.

    Prerequisites

    You have enabled log collection in Feature Management in the TKE console. For more information, see Log Collection.

    Creating a CRD

    To create a collection configuration, you only need to define a LogConfig CRD. Whenever the LogConfig CRD changes, log-agent updates the corresponding CLS log topic and sets the bound server group. The format of the CRD is as follows:

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig                             ## Default value
    metadata:
      name: test                                ## CRD resource name, which is unique in the cluster
    spec:
      clsDetail:
        topicId: xxxxxx-xx-xx-xx-xxxxxxxx       ## CLS log topic ID. The log topic must be created in CLS in advance and not be used by other collection configurations.
        logType: minimalist_log                 ## Log collection format. json_log: JSON format. delimiter_log: separator-based format. minimalist_log: full text in a single line. multiline_log: full text in multiple lines. fullregex_log: full regex format.
        extractRule:                            ## Extraction and filtering rule
          ...
      inputDetail:
        type: container_stdout                  ## Log collection type, including container_stdout (container standard output), container_file (container file), and host_file (host file)
    
        containerStdout:                        ## Container standard output
          namespace: default                    ## The Kubernetes namespace of the container to be collected. If this parameter is not specified, it indicates all namespaces.
          allContainers: false                  ## Whether to collect the standard output of all containers in the specified namespace
          container: xxx                        ## Container name in the pod that meets the includeLabels condition. This parameter is used only when includeLabels is specified.
          includeLabels:                        ## Only pods that contain the specified labels will be collected.
            k8s-app: xxx                        ## Only the logs generated by pods with the configuration of "k8s-app=xxx" in the pod labels will be collected. This parameter cannot be specified at the same time as workloads and allContainers=true.
          workloads:                            ## Kubernetes workload to which the container pod to be collected belongs
          - namespace: prod                     ## workload namespace
            name: sample-app                    ## workload name
            kind: deployment                    ## workload type. Valid values are deployment, daemonset, statefulset, job, and cronjob.
            container: xxx                      ## Name of the container for collection. If this parameter is not specified, it indicates all containers in the workload pod.
    
        containerFile:                          ## File in the container
          namespace: default                    ## The Kubernetes namespace of the container to be collected
          container: xxx                        ## Name of the container to be collected
          includeLabels:                        ## Only pods that contain the specified labels will be collected.
            k8s-app: xxx                        ## Only the logs generated by pods with the configuration of "k8s-app=xxx" in the pod labels are collected. This parameter cannot be specified at the same time as workload.
          workload:                             ## Kubernetes workload to which the container pod to be collected belongs
            name: sample-app                    ## workload name                  
            kind: deployment                    ## workload type. Valid values are deployment, daemonset, statefulset, job, and cronjob.
          logPath: /opt/logs                    ## Log folder. Wildcards are not supported.
          filePattern: app_*.log                ## Log file name. It supports the wildcards "*" and "?". The "*" wildcard matches multiple random characters, and "?" matches a single random character.
    
        hostFile:                               ## Host file
          logPath: /opt/logs                    ## Log folder. Wildcards are supported.
          filePattern: app_*.log                ## Log file name. It supports the wildcards "*" and "?". The "*" wildcard matches multiple random characters, and "?" matches a single random character.
          customLabels:
            k1: v1
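
    The "*" and "?" wildcards described for filePattern behave much like shell-style file name wildcards. The following is a minimal sketch, not the log-agent implementation: it uses Python's fnmatch module as an approximation (fnmatch also accepts "[...]" character sets, which this page does not mention), and the file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file names in a log folder; they are not taken from this page.
candidates = ["app_1.log", "app_error.log", "web_1.log", "app_1.log.gz"]


def matching_files(pattern, names):
    """Return the names that match a filePattern-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# "*" stands for a run of characters, "?" for exactly one character.
print(matching_files("app_*.log", candidates))
print(matching_files("app_?.log", candidates))
```

    Trying a pattern against a list of real file names in this way can be a quick check before writing it into filePattern.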

    Log Input Types

    Full text in a single line

    In single-line full-text mode, each log occupies exactly one line, and CLS treats the line break \n as the end of a log. For easier structural management, the whole line is stored under the default key __CONTENT__; the log data itself is not structured, and no log fields are extracted. The log's time attribute is set to the time the log was collected. For more information, see Collecting Logs with Full Text in a Single Line.

    Assume that the raw data of a log is as follows:

    Tue Jan 22 12:08:15 CST 2019 Installed: libjpeg-turbo-static-1.2.90-6.el7.x86_64

    A sample LogConfig configuration is as follows:

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig
    spec:
      clsDetail:
        topicId: xxxxxx-xx-xx-xx-xxxxxxxx
        # Single-line log
        logType: minimalist_log

    The data collected to CLS is as follows:

    __CONTENT__:Tue Jan 22 12:08:15 CST 2019 Installed: libjpeg-turbo-static-1.2.90-6.el7.x86_64
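
    The single-line rule can be illustrated with a short Python sketch. It is not CLS code: the function name is hypothetical, and skipping empty lines is an assumption made for the example.

```python
def to_single_line_logs(raw_text):
    """Split raw text on line breaks; each non-empty line becomes one log."""
    # Assumption: empty lines are skipped rather than stored as empty logs.
    return [{"__CONTENT__": line} for line in raw_text.split("\n") if line]


raw = "Tue Jan 22 12:08:15 CST 2019 Installed: libjpeg-turbo-static-1.2.90-6.el7.x86_64\n"
for log in to_single_line_logs(raw):
    print(log)
```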

    Full text in multiple lines

    In multi-line full-text mode, a single log may span multiple lines (for example, a Java stack trace), so the line break \n cannot be used as the end mark of a log. Instead, CLS uses a "First Line Regular Expression" to tell logs apart: a line that matches the preset regular expression is treated as the beginning of a new log, and that log continues until the next matching line. As in the single-line format, the whole log is stored under the default key __CONTENT__; the log data itself is not structured, and no log fields are extracted. The log's time attribute is set to the time the log was collected. For more information, see Collecting Logs with Full Text in Multi Lines.

    Assume that the raw data of a multi-line log is as follows:

    2019-12-15 17:13:06,043 [main] ERROR com.test.logging.FooFactory:
    java.lang.NullPointerException
        at com.test.logging.FooFactory.createFoo(FooFactory.java:15)
        at com.test.logging.FooFactoryTest.test(FooFactoryTest.java:11)

    A sample LogConfig configuration is as follows:

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig
    spec:
      clsDetail:
        topicId: xxxxxx-xx-xx-xx-xxxxxxxx
        #Multi-line log
        logType: multiline_log
        extractRule:
          # Only a line that starts with a date and time is considered the beginning of a new log. Any other line is appended to the current log, joined by the line break `\n`.
          beginningRegex: \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s.+

    The data collected to CLS is as follows:

    __CONTENT__:2019-12-15 17:13:06,043 [main] ERROR com.test.logging.FooFactory:\njava.lang.NullPointerException\n    at com.test.logging.FooFactory.createFoo(FooFactory.java:15)\n    at com.test.logging.FooFactoryTest.test(FooFactoryTest.java:11)
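
    The first-line matching rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the CLS implementation: the function name is hypothetical, the regex is matched from the start of each line, and lines that appear before the first matching line are dropped.

```python
import re

# Same expression as beginningRegex in the sample configuration above.
BEGINNING_REGEX = re.compile(r"\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s.+")

RAW = (
    "2019-12-15 17:13:06,043 [main] ERROR com.test.logging.FooFactory:\n"
    "java.lang.NullPointerException\n"
    "    at com.test.logging.FooFactory.createFoo(FooFactory.java:15)\n"
    "    at com.test.logging.FooFactoryTest.test(FooFactoryTest.java:11)"
)


def group_multiline(lines):
    """Group lines into logs; a line matching the regex starts a new log."""
    logs, current = [], None
    for line in lines:
        if BEGINNING_REGEX.match(line):
            if current is not None:
                logs.append({"__CONTENT__": "\n".join(current)})
            current = [line]
        elif current is not None:
            current.append(line)
        # Assumption: lines before the first matching line are ignored.
    if current is not None:
        logs.append({"__CONTENT__": "\n".join(current)})
    return logs


for log in group_multiline(RAW.split("\n")):
    print(log)
```

    With the sample input, all four lines end up in a single log, because only the first line starts with a date and time.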

    Full RegEx

    Full RegEx is often used to process structured logs. It parses a full log by extracting multiple key-value pairs based on a regex. For more information, see Full RegEx.

    Assume that the raw data of a log is as follows:

    10.135.46.111 - - [22/Jan/2019:19:19:30 +0800] "GET /my/course/1 HTTP/1.1" 127.0.0.1 200 782 9703 "http://127.0.0.1/course/explore?filter%5Btype%5D=all&filter%5Bprice%5D=all&filter%5BcurrentLevelId%5D=all&orderBy=studentNum" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0"  0.354 0.354

    A sample LogConfig configuration is as follows:

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig
    spec:
      clsDetail:
        topicId: xxxxxx-xx-xx-xx-xxxxxxxx
        # Full regex
        logType: fullregex_log
        extractRule:
          # Regular expression, which extracts corresponding values based on the `()` capture groups
          logRegex: (\S+)[^\[]+(\[[^:]+:\d+:\d+:\d+\s\S+)\s"(\w+)\s(\S+)\s([^"]+)"\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s"([^"]+)"\s"([^"]+)"\s+(\S+)\s(\S+).*
          beginningRegex: (\S+)[^\[]+(\[[^:]+:\d+:\d+:\d+\s\S+)\s"(\w+)\s(\S+)\s([^"]+)"\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s"([^"]+)"\s"([^"]+)"\s+(\S+)\s(\S+).*
          # List of extracted keys, which are in one-to-one correspondence with the extracted values
          keys:  ['remote_addr','time_local','request_method','request_url','http_protocol','http_host','status','request_length','body_bytes_sent','http_referer','http_user_agent','request_time','upstream_response_time']

    The data collected to CLS is as follows:

    body_bytes_sent: 9703
    http_host: 127.0.0.1
    http_protocol: HTTP/1.1
    http_referer: http://127.0.0.1/course/explore?filter%5Btype%5D=all&filter%5Bprice%5D=all&filter%5BcurrentLevelId%5D=all&orderBy=studentNum
    http_user_agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0
    remote_addr: 10.135.46.111
    request_length: 782
    request_method: GET
    request_time: 0.354
    request_url: /my/course/1
    status: 200
    time_local: [22/Jan/2019:19:19:30 +0800]
    upstream_response_time: 0.354
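
    Before applying a configuration, it can help to check that the number of capture groups matches the number of keys and that the regex matches a sample line. The sketch below does this with Python's re module; it assumes that Python's regex syntax is close enough to the one CLS uses for this expression, and the function name is hypothetical.

```python
import re

# logRegex and keys copied from the sample configuration above.
LOG_REGEX = re.compile(
    r'(\S+)[^\[]+(\[[^:]+:\d+:\d+:\d+\s\S+)\s"(\w+)\s(\S+)\s([^"]+)"\s(\S+)'
    r'\s(\d+)\s(\d+)\s(\d+)\s"([^"]+)"\s"([^"]+)"\s+(\S+)\s(\S+).*'
)
KEYS = [
    "remote_addr", "time_local", "request_method", "request_url",
    "http_protocol", "http_host", "status", "request_length",
    "body_bytes_sent", "http_referer", "http_user_agent",
    "request_time", "upstream_response_time",
]

# The raw sample log from above, split across string literals for readability.
LINE = (
    '10.135.46.111 - - [22/Jan/2019:19:19:30 +0800] "GET /my/course/1 HTTP/1.1" '
    '127.0.0.1 200 782 9703 '
    '"http://127.0.0.1/course/explore?filter%5Btype%5D=all&filter%5Bprice%5D=all'
    '&filter%5BcurrentLevelId%5D=all&orderBy=studentNum" '
    '"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0"  0.354 0.354'
)


def extract_fields(line):
    """Pair each capture group with its key; return None if the line does not match."""
    match = LOG_REGEX.match(line)
    if match is None:
        return None
    return dict(zip(KEYS, match.groups()))


# One key is needed per capture group.
assert LOG_REGEX.groups == len(KEYS)
for key, value in sorted(extract_fields(LINE).items()):
    print(f"{key}: {value}")
```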

    JSON format

    For a JSON log, CLS automatically extracts the first-level keys as field names and the first-level values as field values, which structures the entire log. Each full log ends with a line break \n. For more information, see Collecting JSON Logs.

    Assume that the raw data of a JSON log is as follows:

    {"remote_ip":"10.135.46.111","time_local":"22/Jan/2019:19:19:34 +0800","body_sent":23,"responsetime":0.232,"upstreamtime":"0.232","upstreamhost":"unix:/tmp/php-cgi.sock","http_host":"127.0.0.1","method":"POST","url":"/event/dispatch","request":"POST /event/dispatch HTTP/1.1","xff":"-","referer":"http://127.0.0.1/my/course/4","agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0","response_code":"200"}

    A sample LogConfig configuration is as follows:

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig
    spec:
      clsDetail:
        topicId: xxxxxx-xx-xx-xx-xxxxxxxx
        # JSON log
        logType: json_log

    The data collected to CLS is as follows:

    agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0
    body_sent: 23
    http_host: 127.0.0.1
    method: POST
    referer: http://127.0.0.1/my/course/4
    remote_ip: 10.135.46.111
    request: POST /event/dispatch HTTP/1.1
    response_code: 200
    responsetime: 0.232
    time_local: 22/Jan/2019:19:19:34 +0800
    upstreamhost: unix:/tmp/php-cgi.sock
    upstreamtime: 0.232
    url: /event/dispatch
    xff: -
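
    First-level extraction can be sketched with Python's json module. This is only an illustration: the function name is hypothetical, the input is abridged from the sample above, and nested values (none appear in the sample) are simply left as parsed objects, which may differ from what CLS does.

```python
import json

# Abridged from the raw sample above to keep the example short.
LINE = (
    '{"remote_ip":"10.135.46.111","body_sent":23,"responsetime":0.232,'
    '"method":"POST","url":"/event/dispatch","response_code":"200"}'
)


def extract_first_level(line):
    """Parse one JSON log line and return its first-level key-value pairs."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("a JSON log is expected to be a single JSON object")
    return dict(record)


for key, value in sorted(extract_first_level(LINE).items()):
    print(f"{key}: {value}")
```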

    Separator-based format

    In this format, a log is structured based on the specified separator, and each complete log ends with a line break \n. For CLS to process separator-based logs, you must define a unique key for each separated field. For more information, see Collecting CSV Logs.

    Assume that the content of the raw log is as follows:

    10.20.20.10 ::: [Tue Jan 22 14:49:45 CST 2019 +0800] ::: GET /online/sample HTTP/1.1 ::: 127.0.0.1 ::: 200 ::: 647 ::: 35 ::: http://127.0.0.1/

    A sample LogConfig configuration is as follows:

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig
    spec:
      clsDetail:
        topicId: xxxxxx-xx-xx-xx-xxxxxxxx
        # Separator-based log
        logType: delimiter_log
        extractRule:
          # Separator
          delimiter: ':::'
          #List of extracted keys, which are in one-to-one correspondence to the separated fields
          keys: ['IP','time','request','host','status','length','bytes','referer']

    The data collected to CLS is as follows:

    IP: 10.20.20.10
    bytes: 35
    host: 127.0.0.1
    length: 647
    referer: http://127.0.0.1/
    request: GET /online/sample HTTP/1.1
    status: 200
    time: [Tue Jan 22 14:49:45 CST 2019 +0800]
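
    The separator-based rule can be sketched in a few lines of Python. This is not CLS code: the function name is hypothetical, and trimming the whitespace around each field is an assumption made so that the result lines up with the collected data shown above.

```python
LINE = (
    "10.20.20.10 ::: [Tue Jan 22 14:49:45 CST 2019 +0800] ::: "
    "GET /online/sample HTTP/1.1 ::: 127.0.0.1 ::: 200 ::: 647 ::: 35 ::: http://127.0.0.1/"
)
KEYS = ["IP", "time", "request", "host", "status", "length", "bytes", "referer"]


def split_by_delimiter(line, delimiter, keys):
    """Split a log on the delimiter and pair each field with its key."""
    # Assumption: whitespace around each field is trimmed.
    values = [part.strip() for part in line.split(delimiter)]
    if len(values) != len(keys):
        raise ValueError(f"expected {len(keys)} fields, got {len(values)}")
    return dict(zip(keys, values))


for key, value in sorted(split_by_delimiter(LINE, ":::", KEYS).items()):
    print(f"{key}: {value}")
```

    The explicit length check is a design choice for the sketch: it surfaces a mismatch between the keys list and the actual number of fields instead of silently dropping data.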

    Log Collection Types

    Container standard output

    Example 1: collect the standard output of all containers in the default namespace

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig
    spec:
      inputDetail:
        type: container_stdout
        containerStdout:
          namespace: default
          allContainers: true
      ...

    Example 2: collect the standard output of containers in pods that belong to the ingress-gateway deployment in the production namespace

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig
    spec:
      inputDetail:
        type: container_stdout
        containerStdout:
          allContainers: false
          workloads:
          - namespace: production
            name: ingress-gateway
            kind: deployment
      ...

    Example 3: collect the standard output of containers in pods whose labels contain "k8s-app=nginx" in the production namespace

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig
    spec:
      inputDetail:
        type: container_stdout
        containerStdout:
          namespace: production
          allContainers: false
          includeLabels:
            k8s-app: nginx
      ...

    Container file

    Example 1: collect the access.log file in the /data/nginx/log/ path of the nginx container in pods that belong to the ingress-gateway deployment in the production namespace

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig
    spec:
      clsDetail:
        topicId: xxxxxx-xx-xx-xx-xxxxxxxx
      inputDetail:
        type: container_file
        containerFile:
          namespace: production
          workload:
            name: ingress-gateway
            kind: deployment
          container: nginx
          logPath: /data/nginx/log
          filePattern: access.log
      ...

    Example 2: collect the access.log file in the /data/nginx/log/ path of the nginx container in pods whose labels contain "k8s-app=ingress-gateway" in the production namespace

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig
    spec:
      inputDetail:
        type: container_file
        containerFile:
          namespace: production
          includeLabels:
            k8s-app: ingress-gateway
          container: nginx
          logPath: /data/nginx/log
          filePattern: access.log
      ...

    Host file

    Example: collect all .log files in the /data/ host path

    apiVersion: cls.cloud.tencent.com/v1
    kind: LogConfig
    spec:
      inputDetail:
        type: host_file
        hostFile:
          logPath: /data
          filePattern: "*.log"
      ...

    Metadata

    For container standard output (container_stdout) and container files (container_file), the container metadata (for example, the ID of the container that generated the logs) is reported to CLS along with the raw log content. This lets users trace the log source or search by container identifiers and characteristics (such as the container name and labels) when viewing logs.

    The following table describes the metadata fields:

    Field Name        Meaning
    container_id      ID of the container to which logs belong
    container_name    Name of the container to which logs belong
    image_id          Image name of the container to which logs belong
    labels            Labels of the pod to which logs belong
    namespace         Namespace of the pod to which logs belong
    pod_uid           UID of the pod to which logs belong
    pod_name          Name of the pod to which logs belong
