tencent cloud

Feedback

Key-Value Extraction Functions

Last updated: 2022-04-19 16:22:14

    Overview

    The figure below shows the common use cases of key-value extraction functions. After key-value extraction, logs are processed into structured data, which can be used for SQL analysis.

    Function ext_sep()

    Function definition

    This function is used to extract field value content based on a separator (single character).

    Syntax description

    ext_sep("Source field name", "Target field 1,Target field 2,Target field...", sep="Separator", quote="Non-segmentation part"", restrict=False, mode="overwrite")   
    

    Parameter description

    Parameter Description Parameter Type Required Default Value Value Range
    field Field to extract string Yes - Name of an existing field in the user log
    output A single field name or multiple new field names concatenated with commas string Yes - -
    sep Separator string No , Any single character
    quote Characters that enclose the value string No - -
    restrict Handling mode when the number of extracted values is inconsistent with the number of target fields entered by the user:
    True: ignore the extraction function and do not perform any extraction processing.
    False: try to match the first few fields
    bool No False -
    mode Write mode of the new field string No overwrite -

    Example

    • Example 1. Extract values from logs by using a comma as the separator
      Raw log:
      {"content": "hello Go,hello Java,hello python"} 
      

    Processing rule:

    // Use a comma as the separator to divide the `content` field into three parts, corresponding to the `f1`, `f2`, and `f3` fields separately.
    ext_sep("content", "f1, f2, f3", sep=",", quote="", restrict=False, mode="overwrite")
    // Delete the `content` field.
    fields_drop("content")
    

    Processing result:

    {"f1":"hello Go","f2":"hello Java","f3":"hello python"}
    
    • Example 2. Process the content string as a whole by using quote
      Raw log:
      {"content": " Go,%hello ,Java%,python"}  
      

    Processing rule:

    ext_sep("content", "f1, f2",  quote="%", restrict=False)  
    

    Processing result:

    // Though `%hello ,Java%` does contain a comma, it does not participate in separator extraction as a whole.  
    {"content":" Go,%hello ,Java%,python","f1":" Go","f2":"hello ,Java"}
    
    • Example 3: restrict=True indicates the number of divided values is different from the target fields, the function is not executed.
      Raw log:
      {"content": "1,2,3"}   
      

    Processing rule:

    ext_sep("content", "f1, f2", restrict=True)
    

    Processing result:

    {"content":"1,2,3"}  
    

    Function ext_sepstr()

    Function definition

    This function is used to extract field value content based on multiple characters (string).

    Syntax description

    ext_sepstr("Source field name","Target field 1,Target field 2,Target field...", sep="abc", restrict=False, mode="overwrite")  
    

    Parameter description

    Parameter Description Parameter Type Required Default Value Value Range
    field Field to extract string Yes - Name of an existing field in the user log
    output A single field name or multiple new field names concatenated with commas string Yes - -
    sep Separator (string) string No , -
    restrict Handling mode when the number of extracted values is inconsistent with the number of target fields entered by the user:
    True: ignore the extraction function and do not perform any extraction processing.
    False: try to match the first few fields
    bool No False -
    mode Write mode of the new field string No overwrite -

    Example

    Raw log:

    {"message":"1##2##3"}  
    

    Processing rule:

    // Use "##" as the separator to extract key-values. 
    ext_sepstr("message", "f1,f2,f3,f4", sep="##")  
    

    Processing result:

    // If the number of target fields is greater than the number of divided values, `""` is returned for the excessive fields.
    {"f1":"1","f2":"2","message":"1##2##3","f3":"3","f4":""}
    

    Function ext_json()

    Function definition

    This function is used to extract field values from JSON data.

    Syntax description

    ext_json("Source field name",prefix="",suffix="",format="full",exclude_node="JSON nodes not to expand")   
    

    Parameter description

    Parameter Description Parameter Type Required Default Value Value Range
    field Field to extract string Yes - -
    prefix Prefix of the new field string No - -
    suffix Suffix of the new field string No - -
    format full: The field name format is in full path format (parent + sep + prefix + key + suffix).
    simple: non-full path format (prefix + key + suffix)
    string No simple -
    sep Concatenation character, used to concatenate node names string No # -
    depth Depth to which the function expands the source field, beyond which nodes will not be expanded any more number No 100 1-500
    expand_array Whether to expand an array node bool No False -
    include_node Allowlist of node names that match the specified regular expression string No - -
    exclude_node Blocklist of node names that match the specified regular expression string No - -
    include_path Allowlist of node paths that match the specified regular expression string No - -
    exclude_path Allowlist of node paths that match the specified regular expression string No - -

    Example

    • Example 1. Extract the key-values of all nodes and construct new fields based on the extracted values. The example log is multi-level nesting, but the extraction does not distinguish hierarchy.
      Raw log:
      {
      "data": "{  \"k1\": 100, \"k2\": {  \"k3\": 200,  \"k4\": { \"k5\": 300}}}"
      }  
      

    Processing rule:

    ext_json("data")  
    

    Processing result:

    {"data":"{  \"k1\": 100, \"k2\": {  \"k3\": 200,  \"k4\": { \"k5\": 300}}}","k1":"100","k3":"200","k5":"300"}
    
    • Example 2. Perform extraction excluding sub_field1
      Raw log:
      {"content": "{\"sub_field1\":1,\"sub_field2\":\"2\"}"}  
      

    Processing rule:

    // `exclude_node=subfield1` indicates not to extract the node.  
    ext_json("content", format="full", exclude_node="sub_field1")  
    

    Processing result:

    {"sub_field2":"2","content":"{\"sub_field1\":1,\"sub_field2\":\"2\"}"}  
    
    • Example 3. Add prefix to subnodes
      Raw log:
      {"content": "{\"sub_field1\":{\"sub_sub_field3\":1},\"sub_field2\":\"2\"}"}  
      

    Processing rule 1:

    // When `sub_field2` is extracted, the prefix `udf\_` is automatically added to it, making it `udf\_\_sub\_field2`.
    ext_json("content", prefix="udf_", format="simple")  
    

    Processing result 1:

    {"content":"{\"sub_field1\":{\"sub_sub_field3\":1},\"sub_field2\":\"2\"}","udf_sub_field2":"2","udf_sub_sub_field3":"1"}  
    

    Processing rule 2:

    // `format=full` indicates to retain the hierarchy of the extracted field name. When `sub_field2` is extracted, the name of its parent node is automatically to it, making it `#content#__sub_field2`.  
    ext_json("content", prefix="__", format="full")  
    

    Processing result 2:

    {"#content#__sub_field2":"2","#content#sub_field1#__sub_sub_field3":"1","content":"{\"sub_field1\":{\"sub_sub_field3\":1},\"sub_field2\":\"2\"}"}
    

    Function ext_json_jmes()

    Function definition

    This function is used to extract field values from JSON data.

    Syntax description

    ext_json_jmes("Source field name", jmes= "JSON extraction expression", output="Target field", ignore_null=True, mode="overwrite")   
    

    Parameter description

    Parameter Description Parameter Type Required Default Value Value Range
    field Field to extract string Yes - -
    jmes JMES expression. For more information, see JMESPath. string Yes - -
    output Output field name. Only a single field is supported. string Yes - -
    ignore_null Whether to ignore a node whose value is null. The default value is True, ignoring fields whose value is null. Otherwise, an empty string is returned. bool No True -
    mode Write mode of the new field. Default value: overwrite string No overwrite -

    Example

    • Example 1. Extract only one node from multi-layer JSON data
      Raw log:
      {"content": "{\"a\":{\"b\":{\"c\":{\"d\":\"value\"}}}}"}  
      

    Processing rule:

    // `jmes="a.b.c.d"` means to extract the value of `a.b.c.d`.
    ext_json_jmes("content", jmes="a.b.c.d", output="target")
    

    Processing result:

    {"content":"{\"a\":{\"b\":{\"c\":{\"d\":\"value\"}}}}","target":"value"}   
    
    • Example 2
      Raw log:
      {"content": "{\"a\":{\"b\":{\"c\":{\"d\":\"value\"}}}}"}
      

    Processing rule:

    // `jmes="a.b.c.d"` means to extract the value of `a.b.c`.   
    ext_json_jmes("content", jmes="a.b.c", output="target") 
    

    Processing result:

    {"content":"{\"a\":{\"b\":{\"c\":{\"d\":\"value\"}}}}","target":"{\"d\":\"value\"}"}    
    

    Function ext_regex()

    Function definition

    This function is used to extract the value of a field by using a regular expression.

    Syntax description

    ext_regex("Source field name", regex="Regular expression", output="Target field 1,Target field 2,Target field.......", mode="overwrite") 
    

    Parameter description

    Parameter Description Parameter Type Required Default Value Value Range
    field Field to extract string Yes - -
    regex Regular expression. If the expression contains a special character, escaping is required. Otherwise, syntax error is reported. string Yes - -
    output A single field name or multiple new field names concatenated with commas string No - -
    mode Write mode of the new field. Default value: overwrite string No overwrite -

    Example

    • Example 1. Match digits
      Raw log:
      {"content": "1234abcd5678"}  
      

    Processing rule:

    ext_regex("content", regex="\d+", output="target1,target2")  
    

    Processing result:

    {"target2":"5678","content":"1234abcd5678","target1":"1234"}  
    
    • Example 2. The regular expression contains named capturing group, and some field values are automatically filled
      Raw log:
      {"content": "1234abcd"}  
      

    Processing rule:

    ext_regex("content", regex="(?<target1>\d+)(.*)", output="target2")
    

    Processing result:

    {"target2":"abcd","content":"1234abcd","target1":"1234"}  
    

    Function ext_kv()

    Function definition

    This function is used to extract key-value pairs by using two levels of separators.

    Syntax description

    ext_kv("Source field name", pair_sep=r"\s", kv_sep="=", prefix="", suffix="", mode="fill-auto")  
    

    Parameter description

    Parameter Description Parameter Type Required Default Value Value Range
    field Field to extract string Yes - -
    pair_sep Level-1 separator, separating multiple key-value pairs string Yes - -
    kv_sep Level-2 separator, separating keys and values string Yes - -
    prefix Prefix of the new field string No - -
    suffix Suffix of the new field string No - -
    mode Write mode of the new field. Default value: overwrite string No - -

    Example

    The raw log contains two levels of separators: "|" and "=".

    Raw log:

    {"content": "a=1|b=2|c=3"}  
    

    Processing rule:

    ext_kv("content", pair_sep="|", kv_sep="=")  
    

    Processing result:

    {"a":"1","b":"2","c":"3","content":"a=1|b=2|c=3"}
    
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support