Configuring Indexes

Last updated: 2021-12-30 12:50:49

    Index configuration is a necessary condition for using CLS for log search and analysis; that is, search and analysis can be performed only after indexing is enabled. In addition, different index rules can lead to different search and analysis results. This document describes how to configure an index and how it works.

    CLS supports the following index types:

    Type | Description | Configuration Method
    Full-text index | A raw log is split into multiple segments, and indexes are created based on the segments. You can query logs based on segments. For example, entering error queries logs that contain the keyword error. If you use LogListener to collect logs in single-line or multi-line full-text mode, you must enable full-text indexing before you can search for logs. | Console: enable full-text indexing on the index configuration page.
    Key-value index | Indexes are created based on the key-value pairs of raw logs, and logs are searched by key-value. For example, entering level:error queries logs with a level field whose value is error. | Console: on the index configuration page, enable key-value indexing and enter the corresponding field name (i.e., the key), such as level.
    Metadata index | A metadata index is essentially a key-value index, but its key name carries the prefix __TAG__ followed by a period. For example, entering __TAG__.client:192.168.10.10 queries logs whose client metadata field has the value 192.168.10.10. | Console: on the index configuration page, enable key-value indexing and enter the corresponding metadata field name (i.e., the key), such as __TAG__.client.

    CLS has built-in reserved fields, including __FILENAME__ (collection file name), __SOURCE__ (collection source IP), and __TIMESTAMP__ (timestamp).

    Built-in Field | Description | Fees
    __FILENAME__ | Filename for log collection, which can be used to filter by file. For example, you can use __FILENAME__:"/var/log/access.log" to query logs from the /var/log/access.log file. | An index is created automatically, and the traffic generated by indexing is not billed, so no fees are incurred.
    __SOURCE__ | Source IP for log collection, which can be used to filter by server. For example, you can use __SOURCE__:192.168.10.10 to query logs from the 192.168.10.10 server. | An index is created automatically, and the traffic generated by indexing is not billed, so no fees are incurred.
    __TIMESTAMP__ | Log timestamp, which can be used for log analysis. | An index is created automatically, and the traffic generated by indexing is not billed, so no fees are incurred.
    __CONTENT__ | If the extraction mode in the collection configuration is set to single-line full text or multi-line full text, the original log is stored under this field. You cannot configure a key-value index for this field. If search is required, you only need to enable full-text indexing. | An index needs to be configured manually, and the traffic generated by indexing is billed. For more information on the fees, please see the billing description.
    __TAG__ | Prefix of metadata fields, which is used to distinguish them from raw log contents. For more information on how to carry metadata, please see the description of the LogTag field in Uploading Structured Log. | An index needs to be configured manually, and the traffic generated by indexing is billed. For more information on the fees, please see the billing description.

    Full-Text Index

    A full-text index allows you to search by keyword. It splits all field values in the raw log content into keywords according to the delimiter rules and then creates an index on those keywords. Therefore, if no key name (i.e., the field name) is specified in a keyword query, the search is performed against the full-text index.

    Configuration Item | Description
    Full-Text Delimiter | A set of characters that split the raw log content into segments. The delimiters @&()='",;:<>[]{}/ \n\t\r\ are entered by default in the console.
    Case Sensitivity | Specifies whether a keyword after segmentation is case-sensitive during queries. For example, if the keyword after segmentation is Error and case sensitivity is enabled, querying error does not hit it.
    Allow Chinese Characters | Enable this feature when logs contain Chinese characters that need to be searched. If it is disabled, you cannot query a log by using a Chinese keyword contained in its original text; the query succeeds only if you use the exact original log text. If it is enabled, you can query the log by using a Chinese keyword contained in the original text. When enabled, Chinese log text is segmented according to Chinese semantics, which increases the index volume to some extent, so use this feature only as needed.

    For example, a complete log is as shown below:

    10.20.20.10;[2018-07-16 13:12:57];GET /online/sample HTTP/1.1;200
    

    If you extract key-value pairs from the raw log content as instructed in Collecting CSV Logs, the structured format uploaded to CLS will be:

    IP: 10.20.20.10
    request: GET /online/sample HTTP/1.1
    status: 200
    time: [2018-07-16 13:12:57]
    

    If the full-text delimiter rule is @&()='",;:<>[]{}/ \n\t\r (including space), all field values in the raw log will be split into the following keywords (each line denotes a keyword):

    10.20.20.10
    GET
    online
    sample
    HTTP
    1.1
    200
    2018-07-16
    13
    12
    57
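
The splitting above can be reproduced with a minimal Python sketch. It is only an illustration of delimiter-based segmentation, not the CLS implementation; the function name `split_keywords` and the `log` dictionary are hypothetical names introduced for this example.

```python
import re

# Delimiter set from the example above (the space character is included).
DELIMITERS = "@&()='\",;:<>[]{}/ \n\t\r"

def split_keywords(value, delimiters=DELIMITERS):
    """Split a field value into keywords on any run of delimiter characters."""
    pattern = "[" + re.escape(delimiters) + "]+"
    # Drop the empty strings produced by leading or trailing delimiters.
    return [token for token in re.split(pattern, value) if token]

# Structured form of the sample log (hypothetical variable for illustration).
log = {
    "IP": "10.20.20.10",
    "request": "GET /online/sample HTTP/1.1",
    "status": "200",
    "time": "[2018-07-16 13:12:57]",
}

# A full-text index covers the values of all fields.
keywords = [kw for value in log.values() for kw in split_keywords(value)]
print(keywords)
```

Because neither the period nor the hyphen is in the delimiter set, 10.20.20.10, 1.1, and 2018-07-16 each stay whole, while the colons split the time of day into 13, 12, and 57.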
    

    A log is hit when it contains at least one of the queried keywords. The following samples illustrate this rule:

    • Sample query 1. If you enter 200 for query, the sample log will be hit because it contains the keyword 200.
    • Sample query 2. If you enter /online/sample for query, the sample log cannot be returned because / is a reserved character (the regular expression identifier) in CLS search syntax and must be escaped with \, giving \/online\/sample. Because / is also included in the full-text index delimiter rule, the escaped query \/online\/sample is actually executed as online OR sample, so other logs that contain the keyword online or sample may also be hit.
    • Sample query 3. If you enter "/online/sample" for query, the sample log can be returned. Double quotation marks are also reserved characters in CLS search syntax: the /, online, and sample they enclose are regarded as ordinary characters and don't need to be escaped. Because / is included in the full-text index delimiter rule, the query "/online/sample" is actually executed as online AND sample with the word order unchanged; it may therefore also hit other logs, such as one containing /online/sample/abc.
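
The difference between sample queries 2 and 3 can be modeled with a short Python sketch. This is a simplified model under stated assumptions, not the CLS query engine: the helper names `tokens`, `term_hit`, and `phrase_hit` are hypothetical, the unquoted query is passed in with its \ escapes already removed, and "word order unchanged" is modeled as the keywords appearing adjacent and in order.

```python
import re

DELIMITERS = "@&()='\",;:<>[]{}/ \n\t\r"
_SPLIT = re.compile("[" + re.escape(DELIMITERS) + "]+")

def tokens(text):
    """Segment text into keywords using the full-text delimiter rule."""
    return [t for t in _SPLIT.split(text) if t]

def term_hit(query, log_text):
    """Unquoted query: hit if the log contains ANY keyword of the query (OR)."""
    log_keywords = set(tokens(log_text))
    return any(t in log_keywords for t in tokens(query))

def phrase_hit(query, log_text):
    """Quoted query: all keywords must appear, adjacent and in the same order."""
    wanted, present = tokens(query), tokens(log_text)
    if not wanted:
        return False
    return any(present[i:i + len(wanted)] == wanted
               for i in range(len(present) - len(wanted) + 1))

sample = "10.20.20.10;[2018-07-16 13:12:57];GET /online/sample HTTP/1.1;200"

print(term_hit("200", sample))                                            # sample query 1
print(term_hit("/online/sample", "GET /offline/sample HTTP/1.1"))         # OR logic hits another log
print(phrase_hit("/online/sample", sample))                               # sample query 3
print(phrase_hit("/online/sample", "GET /online/sample/abc HTTP/1.1"))    # longer path is also hit
```

In this model the unquoted form is the broader of the two: it matches any log sharing a single keyword, while the quoted form still matches longer paths that begin with the same segments.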

    Key-Value Index

    Indexes are created based on key-value pairs. An index rule can be configured for each field name (key name), such as the data type, delimiter, and statistics. To perform a key-value query, you must specify the field name, and the query syntax format is key:value, for example, status:200. If no field name is specified, a full-text search will be performed.

    Note:

    CLS has built-in reserved fields, including the collection source IP __SOURCE__, the collection file __FILENAME__, and the timestamp __TIMESTAMP__. By default, a key-value index is configured, the key-value index delimiters are empty, and statistics collection is enabled. The traffic generated by indexing such fields is not billed, so no fees will be incurred.

    Configuration Item | Description | Remarks
    Data Type | Data type of the field. For example, the text type supports fuzzy query, while the long and double types support range query. | long: integer (Int64); double: floating point (64-bit); text: string
    Delimiter | A set of characters used to segment field values into keywords. | Default delimiters: @&()='",;:<>[]{}/ \n\t\r\
    Allow Chinese Characters | Enable this feature when fields contain Chinese characters that need to be searched. If it is disabled, you cannot query a log by using a Chinese keyword contained in the original field text; the query succeeds only if you use the exact original text. If it is enabled, you can query the log by using a Chinese keyword contained in the original text. When enabled, Chinese text is segmented according to Chinese semantics, which increases the index volume to some extent, so use this feature only as needed. |
    Enable Statistics | After it is enabled, statistical analysis can be performed on the field, such as group by ${key} and sum(${key}). For more information, please see Overview. | Disabled by default
    Case Sensitivity | Specifies whether a keyword after segmentation is case-sensitive during queries. For example, if the keyword after segmentation is Error and case sensitivity is enabled, querying level:error does not hit it. | Case-insensitive by default

    For example, a complete log is as shown below:

    10.20.20.10;[2018-07-16 13:12:57];GET /online/sample HTTP/1.1;200
    

    If you extract key-value pairs from the raw log content as instructed in Collecting CSV Logs, the structured format uploaded to CLS will be:

    IP: 10.20.20.10
    request: GET /online/sample HTTP/1.1
    status: 200
    time: [2018-07-16 13:12:57]
    

    If the rules for key-value indexing are as follows:

    Key-Value Index Field Name | Delimiters
    IP | @&()='",;:<>[]{}/ \n\t\r
    request | @&()='",;:<>[]{}/ \n\t\r
    status | @&()='",;:<>[]{}/ \n\t\r
    time | @&()='",;:<>[]{}/ \n\t\r
    • Sample query 1. If you enter IP:10.20.20.10, the sample log can be returned since the delimiters don't include a period and 10.20.20.10 is regarded as a keyword.
    • Sample query 2. If you enter request:GET /online/sample, the sample log cannot be returned because / is a reserved character (the regular expression identifier) in CLS search syntax and must be escaped, giving request:GET \/online\/sample. Because / is also included in the delimiter rule of the request index, the escaped query is actually executed as request:GET OR request:online OR request:sample, so other logs may also be hit.
    • Sample query 3. If you enter request:"GET /online/sample", the sample log can be returned. Double quotation marks are also reserved characters in CLS search syntax: the GET, /, online, and sample they enclose are regarded as ordinary characters and don't need to be escaped. However, because / is included in the delimiter rule of the request index, the query is actually executed as request:(GET AND online AND sample) with the word order unchanged, so other logs, such as one whose request field is GET /online/sample/abc, may also be hit.
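
The per-field behavior can be sketched in Python as follows. This is an illustrative model only, not the CLS API or implementation: `split_field`, `build_index`, `field_hit`, and the `rules` mapping are hypothetical names, and a `key:keyword` query is modeled as a simple membership test on that field's keywords.

```python
import re

DEFAULT_DELIMITERS = "@&()='\",;:<>[]{}/ \n\t\r"

def split_field(value, delimiters):
    """Split one field value into keywords; with no delimiters the value is one keyword."""
    if not delimiters:
        return [value]
    pattern = "[" + re.escape(delimiters) + "]+"
    return [t for t in re.split(pattern, value) if t]

def build_index(log, rules):
    """Index only the fields that have a rule; `rules` maps field name -> delimiters."""
    return {key: split_field(log[key], delims)
            for key, delims in rules.items() if key in log}

def field_hit(index, key, keyword):
    """Model of a `key:keyword` query against the per-field keywords."""
    return keyword in index.get(key, [])

log = {
    "IP": "10.20.20.10",
    "request": "GET /online/sample HTTP/1.1",
    "status": "200",
    "time": "[2018-07-16 13:12:57]",
}
rules = {key: DEFAULT_DELIMITERS for key in log}
index = build_index(log, rules)

print(field_hit(index, "IP", "10.20.20.10"))       # period is not a delimiter
print(field_hit(index, "request", "online"))       # / and space split the request
print(field_hit(index, "status", "online"))        # keyword belongs to another field
print(split_field("10.20.20.10", DEFAULT_DELIMITERS + "."))  # what adding "." would do
```

The last line shows why the choice of delimiters matters: if the period were added to the delimiter set, the IP address would be indexed as four separate keywords, and IP:10.20.20.10 would no longer match a single keyword.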

    Metadata Index

    When a log is uploaded to CLS, its metadata is passed through the LogTag field (for more information, please see the LogTag field in Uploading Structured Log), while the raw log content is passed through the Log field. A metadata index needs to be configured for all data passed via LogTag. A metadata index is essentially a key-value index and adopts the same indexing rules and configuration methods as key-value indexes. The only difference is that a metadata field is identified by the prefix __TAG__ followed by a period. For example, the client metadata field is indexed as __TAG__.client.

    For example, a complete log is as shown below:

    10.20.20.10;[2018-07-16 13:12:57];GET /online/sample HTTP/1.1;200
    

    If you extract key-value pairs from the raw log content as instructed in Collecting CSV Logs, and the log carries the metadata region:ap-beijing, the structured format uploaded to CLS will be:

    IP: 10.20.20.10
    request: GET /online/sample HTTP/1.1
    status: 200
    time: [2018-07-16 13:12:57]
    __TAG__.region: ap-beijing
    

    If the rules for metadata indexing are as follows:

    Key-Value Index Field Name | Delimiters
    __TAG__.region | @&()='",;:<>[]{}/ \n\t\r

    Sample query: if you enter __TAG__.region:"ap-beijing", the sample log can be returned.
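
The way metadata ends up next to the log fields can be pictured with a small Python sketch. It is a hypothetical illustration of the naming convention only: `merge_metadata`, `fields`, and `log_tags` are names invented for this example and do not correspond to the CLS upload API.

```python
TAG_PREFIX = "__TAG__."

def merge_metadata(fields, log_tags):
    """Return one record in which every metadata key carries the __TAG__. prefix."""
    record = dict(fields)  # copy so the caller's dictionary is not modified
    for name, value in log_tags.items():
        record[TAG_PREFIX + name] = value
    return record

fields = {
    "IP": "10.20.20.10",
    "request": "GET /online/sample HTTP/1.1",
    "status": "200",
    "time": "[2018-07-16 13:12:57]",
}
log_tags = {"region": "ap-beijing"}

record = merge_metadata(fields, log_tags)
print(record["__TAG__.region"])
```

Because the hyphen is not among the delimiters configured for __TAG__.region, ap-beijing is kept as a single keyword.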

    Notes

    1. Collected log data cannot be searched when indexing is disabled.
    2. It takes 1 minute for the log search feature to become available after indexing is enabled.
    3. The data storage time is the same as that in the logset.
    4. Any changes made to the index rule apply only to new data that is written after the changes, and the indexes created for old data will not be updated.
    5. Queries and analysis are only performed based on one index rule. Therefore, different results may be obtained for the same query and analysis statements if the index rule is changed, as the new index rule changes the data range for queries and analysis.

    Directions

    1. Log in to the CLS console.

    2. On the left sidebar, click Log Topic to go to the log topic list page.

    3. Click the desired log topic ID/name to go to the log topic management page.

    4. Click the Index Configuration tab and click Edit to go to the index configuration page.

    5. Modify the index configuration as needed and click OK to save it.
      When modifying the index configuration, you can also click Auto Configure to have the system automatically get a collected log sample and parse its fields into key-value indexes. You can then fine-tune the automatically generated configuration to quickly arrive at the final index configuration.