Configuring Index

Last updated: 2021-10-13 14:45:20

    Index configuration is a necessary condition for searching and analyzing logs in CLS: search and analysis can be performed only after indexing is enabled. In addition, different index rules can produce different search and analysis results. This document describes how to configure an index and how it works.

    CLS supports the following index types:

    • Full-text index: splits a raw log into multiple segments (keywords) and builds an index on them, against which queries are run. For example, entering error queries logs that contain the keyword error. If you use LogListener to collect logs in single-line or multi-line full-text mode, you must enable full-text index before you can search the logs.
      Configuration: in the console, enable full-text index on the index configuration page.
    • Key-value index: builds an index of the key-value pairs in the raw log, against which searches are run. For example, entering level:error queries logs whose level field has the value error.
      Configuration: in the console, enable key-value index on the index configuration page and enter the field name (the key), such as level.
    • Metadata index: essentially a key-value index whose key name carries the __TAG__. prefix. For example, entering __TAG__.client:192.168.10.10 queries logs whose client metadata field has the value 192.168.10.10.
      Configuration: in the console, enable key-value index on the index configuration page and enter the metadata field name (the key), such as __TAG__.client.
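    The following Python sketch illustrates, in simplified form, how the shape of a single query term decides which index type it relies on: a bare keyword uses the full-text index, key:value uses the key-value index, and a key with the __TAG__. prefix uses the metadata index. The function target_index is hypothetical and is not CLS's query parser; it assumes a quoted term without a key is a full-text phrase and that the first colon separates the key from the value.

```python
TAG_PREFIX = "__TAG__."


def target_index(term: str) -> str:
    """Classify one query term by the index type it would rely on (simplified)."""
    if term.startswith('"'):
        # Assumption: a quoted term with no key name is a full-text phrase.
        return "full-text"
    key, sep, _value = term.partition(":")
    if not sep:
        # No key name given, so the full-text index is used.
        return "full-text"
    if key.startswith(TAG_PREFIX):
        # Metadata keys are identified by the __TAG__. prefix.
        return "metadata"
    return "key-value"


for term in ["error", "level:error", "__TAG__.client:192.168.10.10"]:
    print(f"{term} -> {target_index(term)}")
```

    Under this model, reserved fields such as __SOURCE__:192.168.10.10 are classified as key-value queries, consistent with the reserved fields being key-value indexed by default.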

    CLS has built-in reserved fields, including the collection source IP __SOURCE__, the collection file __FILENAME__, and the timestamp __TIMESTAMP__. These fields are indexed automatically, and the traffic generated by indexing them is not billed, so no additional fees are incurred. The built-in fields are described below:

    • __FILENAME__: the name of the file the log was collected from, which can be used to filter by file. For example, __FILENAME__:/var/log/access.log queries logs from the /var/log/access.log file.
      Fees: indexed automatically; the indexing traffic is not billed.
    • __SOURCE__: the source IP of log collection, which can be used to filter by server. For example, __SOURCE__:192.168.10.10 queries logs from the 192.168.10.10 server.
      Fees: indexed automatically; the indexing traffic is not billed.
    • __TIMESTAMP__: the log timestamp, which can be used for log analysis.
      Fees: indexed automatically; the indexing traffic is not billed.
    • __TAG__: the prefix of metadata fields, used to distinguish metadata from the raw log content. For how to carry metadata, see the description of the LogTag field in Uploading Structured Log.
      Fees: must be configured manually, and the indexing traffic is billed. For details, see the billing description.

    Full-Text Index

    Full-text index lets you search by keyword. It splits all field values in the raw log content into keywords according to the delimiter rules and then builds an index on them. A keyword query that does not specify a key name (field name) is therefore performed against the full-text index.

    The full-text index has the following configuration items:

    • Full-text delimiter: the set of characters used to split raw log content into segments. The console enters @&()='",;:<>[]{}/ \n\t\r by default.
    • Case Sensitivity: whether keywords are case-sensitive at query time. For example, if a keyword is indexed as Error and case sensitivity is enabled, a query for error does not match it.
    • Allow Chinese characters: enable this when logs contain Chinese text that needs to be searched. Chinese text is then segmented by meaning, which increases the index volume; enable it only as needed.

    For example, consider the following complete log:

    10.20.20.10;[2018-07-16 13:12:57];GET /online/sample HTTP/1.1;200
    

    If you extract key-value pairs from the raw log content as instructed in Collecting CSV Logs, the structured format uploaded to CLS will be:

    IP: 10.20.20.10
    request: GET /online/sample HTTP/1.1
    status: 200
    time: [2018-07-16 13:12:57]
    

    If the full-text delimiter rule is @&()='",;:<>[]{}/ \n\t\r (including space), all field values in the raw log will be split into the following keywords (each line denotes a keyword):

    10.20.20.10
    GET
    online
    sample
    HTTP
    1.1
    200
    2018-07-16
    13
    12
    57
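    To make the segmentation rule concrete, here is a minimal Python sketch that splits each field value at every delimiter character and drops empty pieces; run on the structured log above, it reproduces exactly the keywords listed. The function segment is hypothetical, and modeling a case-insensitive index by lowercasing keywords is an assumption, not a description of how CLS stores them.

```python
# Default full-text delimiters entered in the console (note the space).
DEFAULT_DELIMITERS = frozenset("@&()='\",;:<>[]{}/ \n\t\r")


def segment(text: str, case_sensitive: bool = False) -> list[str]:
    """Split text into keywords at every delimiter character, dropping empty pieces."""
    keywords, current = [], []
    for ch in text:
        if ch in DEFAULT_DELIMITERS:
            if current:
                keywords.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        keywords.append("".join(current))
    # Assumption: a case-insensitive index is modeled by lowercasing keywords.
    return keywords if case_sensitive else [kw.lower() for kw in keywords]


structured_log = {
    "IP": "10.20.20.10",
    "request": "GET /online/sample HTTP/1.1",
    "status": "200",
    "time": "[2018-07-16 13:12:57]",
}

full_text_keywords = []
for value in structured_log.values():
    full_text_keywords.extend(segment(value, case_sensitive=True))

print("\n".join(full_text_keywords))
```

    With the default case_sensitive=False, segment("Error") returns ["error"], so in this model a query for error matches a log containing Error; with case_sensitive=True it does not, mirroring the Case Sensitivity item.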
    

    In general, a log is hit if it contains at least one of the keywords in the query; a quoted query, as sample 3 shows, requires all of them. The following samples illustrate the rule:

    • Sample query 1. If you search for 200, the sample log is hit because it contains the keyword 200.
    • Sample query 2. If you search for /online/sample, the sample log is not returned, because / is a reserved character (the regular expression identifier) in CLS search syntax and must be escaped with \, as in \/online\/sample. Because / is also a full-text delimiter, \/online\/sample is actually executed as online OR sample, so other logs that contain the keyword online or sample may also be hit.
    • Sample query 3. If you search for "/online/sample", the sample log is returned. Double quotes are also reserved in CLS search syntax: the /, online, and sample inside them are treated as ordinary characters and do not need to be escaped. Because / is a full-text delimiter, "/online/sample" is actually executed as online AND sample with the word order preserved, so it may also hit other logs, such as one containing /online/sample/abc.
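    The hit rules in the three samples can be modeled with the Python sketch below: an escaped query is split into keywords and matches if any one of them is present (OR), while a quoted query must match all of its keywords consecutively and in order. The helper names are hypothetical. The sketch assumes a quoted phrase must fall within a single field, and it simply rejects an unescaped / outside quotes instead of running it as a regular expression.

```python
import re

DELIMITERS = frozenset("@&()='\",;:<>[]{}/ \n\t\r")


def segment(text: str) -> list[str]:
    """Split text into keywords at any delimiter character."""
    keywords, current = [], []
    for ch in text:
        if ch in DELIMITERS:
            if current:
                keywords.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        keywords.append("".join(current))
    return keywords


def contains_phrase(keywords: list[str], phrase: list[str]) -> bool:
    """True if phrase occurs in keywords as a consecutive, in-order run."""
    n = len(phrase)
    return any(keywords[i:i + n] == phrase for i in range(len(keywords) - n + 1))


def full_text_hit(log: dict[str, str], query: str) -> bool:
    fields = [segment(value) for value in log.values()]
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        # Quoted: every keyword must appear, in order (AND, word order kept).
        phrase = segment(query[1:-1])
        return any(contains_phrase(kws, phrase) for kws in fields)
    if re.search(r"(?<!\\)/", query):
        # An unescaped / starts a regular expression; not modeled here.
        raise ValueError("escape / as \\/ or quote the query")
    # Escaped: drop the backslashes, then any single keyword is enough (OR).
    terms = segment(re.sub(r"\\(.)", r"\1", query))
    indexed = {kw for kws in fields for kw in kws}
    return any(term in indexed for term in terms)


sample = {"IP": "10.20.20.10", "request": "GET /online/sample HTTP/1.1",
          "status": "200", "time": "[2018-07-16 13:12:57]"}
other = {"request": "GET /online/other HTTP/1.1"}
longer = {"request": "GET /online/sample/abc HTTP/1.1"}

print(full_text_hit(sample, "200"))                # True
print(full_text_hit(other, r"\/online\/sample"))   # True: online alone is enough
print(full_text_hit(other, '"/online/sample"'))    # False: sample is missing
print(full_text_hit(longer, '"/online/sample"'))   # True: phrase found in order
```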

    Key-Value Index

    Key-value index creates an index of key-value pairs. An index rule, covering the data type, delimiters, and statistics, can be configured for each field name (key name). A key-value query must specify the field name, using the syntax key:value, such as status:200. If no field name is specified, a full-text search is performed.

    Note:

    CLS has built-in reserved fields, including the collection source IP __SOURCE__, the collection file __FILENAME__, and the timestamp __TIMESTAMP__. Key-value indexes are configured for them by default, with empty delimiters and statistics enabled. The traffic generated by indexing these fields is not billed, so no fees are incurred.

    The key-value index has the following configuration items:

    • Data Type: the data type of the field. For example, the text type supports fuzzy queries, while the long and double types support range queries. Supported types:
      long: integer (Int 64)
      double: floating point (64-bit)
      text: string
    • Delimiter: the character set used to split the field value into keywords. Default delimiters: @&()='",;:<>[]{}/ \n\t\r
    • Allow Chinese Characters: enable this when logs contain Chinese text that needs to be searched. Chinese text is then segmented by meaning, which increases the index volume to some extent; enable it only as needed.
    • Enable Statistics: once enabled, the field can be used in statistical analysis, such as group by ${key} and sum(${key}). For more information, see Overview. Disabled by default.
    • Case Sensitivity: whether keywords are case-sensitive at query time. For example, if a keyword is indexed as Error and case sensitivity is enabled, level:error does not match it. Case-insensitive by default.
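    Why the data type matters for range queries can be shown with a small Python sketch: text values compare character by character, so "1000" sorts before "200", whereas a long field compares numerically. The FIELD_TYPES mapping (including the duration field) and both helpers are hypothetical; inclusive range bounds are an assumption, and the sketch does not reproduce CLS range-query syntax.

```python
# Hypothetical per-field data types, mirroring the console options above.
FIELD_TYPES = {"status": "long", "duration": "double", "request": "text"}

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def to_typed(field: str, raw: str):
    """Convert a raw field value according to the field's configured type."""
    data_type = FIELD_TYPES[field]
    if data_type == "long":
        value = int(raw)
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError(f"{field}={raw} does not fit in Int 64")
        return value
    if data_type == "double":
        return float(raw)
    return raw  # text: kept as a string


def in_range(field: str, raw: str, low: float, high: float) -> bool:
    """Inclusive range check; only long and double fields support it."""
    if FIELD_TYPES[field] == "text":
        raise TypeError(f"{field} is text; range queries need long or double")
    return low <= to_typed(field, raw) <= high


print("1000" < "200")                          # True: text compares as strings
print(in_range("status", "1000", 200, 299))    # False: long compares as numbers
print(in_range("duration", "0.35", 0.0, 0.5))  # True
```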

    For example, consider the following complete log:

    10.20.20.10;[2018-07-16 13:12:57];GET /online/sample HTTP/1.1;200
    

    If you extract key-value pairs from the raw log content as instructed in Collecting CSV Logs, the structured format uploaded to CLS will be:

    IP: 10.20.20.10
    request: GET /online/sample HTTP/1.1
    status: 200
    time: [2018-07-16 13:12:57]
    

    Suppose the key-value index is configured as follows:

    Field Name    Delimiters
    IP            @&()='",;:<>[]{}/ \n\t\r
    request       @&()='",;:<>[]{}/ \n\t\r
    status        @&()='",;:<>[]{}/ \n\t\r
    time          @&()='",;:<>[]{}/ \n\t\r

    • Sample query 1. If you search for IP:10.20.20.10, the sample log is returned: the delimiters do not include a period, so 10.20.20.10 is indexed as a single keyword.
    • Sample query 2. If you search for request:GET /online/sample, the sample log is not returned, because / is a reserved character (the regular expression identifier) in CLS search syntax and must be escaped, as in request:GET \/online\/sample. Because / is also a delimiter in the request index rule, request:GET \/online\/sample is actually executed as request:GET OR request:online OR request:sample, so other logs may also be hit.
    • Sample query 3. If you search for request:"GET /online/sample", the sample log is returned. Double quotes are also reserved in CLS search syntax, so the GET, /, online, and sample inside them are treated as ordinary characters and do not need to be escaped. Because / is a delimiter in the request index rule, request:"GET /online/sample" is actually executed as request:(GET AND online AND sample) with the word order preserved, so other logs, such as one whose request field is GET /online/sample/abc, may also be hit.
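    The key-value samples can be sketched in Python as well. Unlike the full-text case, each field gets its own keyword list, so a key-value query only looks at the field it names. The names build_kv_index, kv_any, and kv_phrase are hypothetical, and the sketch models the OR and phrase logic described in the samples rather than CLS's actual query engine.

```python
DELIMITERS = frozenset("@&()='\",;:<>[]{}/ \n\t\r")


def segment(text: str) -> list[str]:
    """Split text into keywords at any delimiter character."""
    keywords, current = [], []
    for ch in text:
        if ch in DELIMITERS:
            if current:
                keywords.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        keywords.append("".join(current))
    return keywords


def build_kv_index(log: dict[str, str]) -> dict[str, list[str]]:
    """Key-value index: field name -> keywords of that field's value only."""
    return {field: segment(value) for field, value in log.items()}


def kv_any(index: dict[str, list[str]], field: str, terms: list[str]) -> bool:
    """field:(t1 OR t2 ...): any term found in the named field is a hit."""
    return any(term in index.get(field, []) for term in terms)


def kv_phrase(index: dict[str, list[str]], field: str, text: str) -> bool:
    """field:"text": all keywords, consecutive and in order, in the named field."""
    phrase, kws = segment(text), index.get(field, [])
    n = len(phrase)
    return any(kws[i:i + n] == phrase for i in range(len(kws) - n + 1))


index = build_kv_index({"IP": "10.20.20.10",
                        "request": "GET /online/sample HTTP/1.1",
                        "status": "200",
                        "time": "[2018-07-16 13:12:57]"})

print(index["IP"])                                            # ['10.20.20.10']
print(kv_any(index, "request", ["GET", "online", "sample"]))  # True (sample 2)
print(kv_phrase(index, "request", "GET /online/sample"))      # True (sample 3)
print(kv_any(index, "status", ["sample"]))                    # False: wrong field
```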

    Metadata Index

    When a log is uploaded to CLS, its metadata is passed in the LogTag field (for details, see the LogTag field in Uploading Structured Log), while the raw log content is passed in the Log field. To search any data passed in LogTag, you must configure a metadata index for it. A metadata index is in fact a key-value index and works and is configured in the same way. The only difference is that metadata fields are identified by the __TAG__. prefix. For example, the client metadata field is indexed as __TAG__.client.

    For example, consider the following complete log:

    10.20.20.10;[2018-07-16 13:12:57];GET /online/sample HTTP/1.1;200
    

    If you extract key-value pairs from the raw log content as instructed in Collecting CSV Logs, and the log carries the metadata region:ap-beijing, the structured format uploaded to CLS will be:

    IP: 10.20.20.10
    request: GET /online/sample HTTP/1.1
    status: 200
    time: [2018-07-16 13:12:57]
    __TAG__.region: ap-beijing
    

    Suppose the metadata index is configured as follows:

    Field Name        Delimiters
    __TAG__.region    @&()='",;:<>[]{}/ \n\t\r

    Sample query: if you enter __TAG__.region:"ap-beijing", the sample log can be returned.
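    Because a metadata index is a key-value index whose keys carry the __TAG__. prefix, the sample can be modeled by merging the LogTag entries into the log record under prefixed keys. The sketch below is a simplified, hypothetical model (merge_log_tags is not a CLS API); it also shows that ap-beijing stays a single keyword because - is not one of the delimiters.

```python
DELIMITERS = frozenset("@&()='\",;:<>[]{}/ \n\t\r")
TAG_PREFIX = "__TAG__."


def segment(text: str) -> list[str]:
    """Split text into keywords at any delimiter character."""
    keywords, current = [], []
    for ch in text:
        if ch in DELIMITERS:
            if current:
                keywords.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        keywords.append("".join(current))
    return keywords


def merge_log_tags(log_fields: dict[str, str], log_tags: dict[str, str]) -> dict[str, str]:
    """Add LogTag metadata under __TAG__.-prefixed keys, apart from Log fields."""
    record = dict(log_fields)
    for key, value in log_tags.items():
        record[TAG_PREFIX + key] = value
    return record


record = merge_log_tags(
    {"IP": "10.20.20.10", "request": "GET /online/sample HTTP/1.1",
     "status": "200", "time": "[2018-07-16 13:12:57]"},
    {"region": "ap-beijing"},
)

print(record["__TAG__.region"])           # ap-beijing
print(segment(record["__TAG__.region"]))  # ['ap-beijing']: '-' is not a delimiter
print("region" in record)                 # False: only the prefixed key exists
```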

    Notes

    1. Collected log data cannot be searched while indexing is disabled.
    2. Log search becomes available one minute after indexing is enabled.
    3. The data retention period is the same as that configured for the logset.
    4. Changes to the index rule apply only to data written after the change; indexes already built for existing data are not updated.
    5. Queries and analysis are performed based on the index rule, so the same query and analysis statements may return different results after the index rule is changed, because the new rule changes the range of data available for queries and analysis.
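    Notes 4 and 5 can be illustrated with a toy model in Python: keywords are computed once, when a log is written, using the rule in force at that moment, so changing the rule later affects only new data and the same search can return different results. ToyIndex is hypothetical and says nothing about how CLS actually stores indexes; adding a period to the delimiters is just an example rule change.

```python
from dataclasses import dataclass, field

DEFAULT_DELIMITERS = frozenset("@&()='\",;:<>[]{}/ \n\t\r")


def segment(text: str, delimiters: frozenset) -> list[str]:
    """Split text into keywords at any character in the given delimiter set."""
    keywords, current = [], []
    for ch in text:
        if ch in delimiters:
            if current:
                keywords.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        keywords.append("".join(current))
    return keywords


@dataclass
class ToyIndex:
    delimiters: frozenset
    entries: list = field(default_factory=list)  # (log_id, keywords) pairs

    def write(self, log_id: int, text: str) -> None:
        # Keywords are fixed at write time with the rule currently in force.
        self.entries.append((log_id, segment(text, self.delimiters)))

    def change_rule(self, delimiters: frozenset) -> None:
        # Only later writes use the new rule; existing entries are not rebuilt.
        self.delimiters = delimiters

    def search(self, keyword: str) -> list[int]:
        return [log_id for log_id, kws in self.entries if keyword in kws]


index = ToyIndex(DEFAULT_DELIMITERS)
index.write(1, "10.20.20.10")
index.change_rule(DEFAULT_DELIMITERS | {"."})  # example change: split on periods too
index.write(2, "10.20.20.11")

print(index.search("10.20.20.10"))  # [1]: the old log keeps its old keyword
print(index.search("10"))           # [2]: only the new log was split on periods
```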