The figure below shows the common use cases of key-value extraction functions. After key-value extraction, logs are processed into structured data, which can be used for SQL analysis.
The `ext_sep` function extracts field values by splitting on a separator, which must be a single character.
ext_sep("Source field name", "Target field 1,Target field 2,Target field...", sep="Separator", quote="Non-segmentation part", restrict=False, mode="overwrite")
Parameter | Description | Parameter Type | Required | Default Value | Value Range |
---|---|---|---|---|---|
field | Field to extract | string | Yes | - | Name of an existing field in the user log |
output | A single field name or multiple new field names concatenated with commas | string | Yes | - | - |
sep | Separator | string | No | , | Any single character |
quote | Character that encloses a part of the value that must not be split | string | No | - | - |
restrict | Handling mode when the number of extracted values differs from the number of target fields. True: skip extraction entirely. False: try to match the first few fields | bool | No | False | - |
mode | Write mode of the new field | string | No | overwrite | - |
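As a rough mental model of how `sep`, `output`, and `restrict` interact, the following Python sketch emulates `ext_sep` on a log represented as a dict. It is a hypothetical illustration, not the CLS implementation: the name `ext_sep_sketch` is invented, `quote` handling is omitted, `mode` is ignored (values are always overwritten), and spaces around target field names are assumed to be trimmed, as the examples suggest.

```python
def ext_sep_sketch(log: dict, field: str, output: str,
                   sep: str = ",", restrict: bool = False) -> dict:
    """Hypothetical emulation of ext_sep (no quote support, always overwrites)."""
    if len(sep) != 1:
        raise ValueError("sep must be a single character")
    names = [name.strip() for name in output.split(",")]
    values = log[field].split(sep)
    result = dict(log)  # never mutate the input log
    if restrict and len(values) != len(names):
        return result  # counts differ: skip extraction entirely
    # restrict=False: pair target fields with values from the left; extras are dropped.
    for name, value in zip(names, values):
        result[name] = value
    return result


print(ext_sep_sketch({"content": "1,2,3"}, "content", "f1, f2"))
# {'content': '1,2,3', 'f1': '1', 'f2': '2'}
print(ext_sep_sketch({"content": "1,2,3"}, "content", "f1, f2", restrict=True))
# {'content': '1,2,3'}
```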
Raw log:
{"content": "hello Go,hello Java,hello python"}
Processing rule:
// Use a comma as the separator to divide the `content` field into three parts, written to the `f1`, `f2`, and `f3` fields respectively.
ext_sep("content", "f1, f2, f3", sep=",", quote="", restrict=False, mode="overwrite")
// Delete the `content` field.
fields_drop("content")
Processing result:
{"f1":"hello Go","f2":"hello Java","f3":"hello python"}
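The second step of this rule only removes the source field. In the same hypothetical Python model (a log is a dict; `fields_drop_sketch` is an invented name, and exact-name matching is an assumption), `fields_drop` amounts to copying the log without the listed keys:

```python
def fields_drop_sketch(log: dict, *names: str) -> dict:
    """Hypothetical emulation of fields_drop: return a copy without the named fields."""
    return {key: value for key, value in log.items() if key not in names}


extracted = {
    "content": "hello Go,hello Java,hello python",
    "f1": "hello Go",
    "f2": "hello Java",
    "f3": "hello python",
}
print(fields_drop_sketch(extracted, "content"))
# {'f1': 'hello Go', 'f2': 'hello Java', 'f3': 'hello python'}
```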
Example: Treat part of the `content` string as a whole by using `quote`
Raw log:
{"content": " Go,%hello ,Java%,python"}
Processing rule:
ext_sep("content", "f1, f2", quote="%", restrict=False)
Processing result:
// Although `%hello ,Java%` contains a comma, it is treated as a whole and is not split by the separator.
{"content":" Go,%hello ,Java%,python","f1":" Go","f2":"hello ,Java"}
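Quote handling can be pictured as a small state machine: a separator only splits the value when it appears outside a pair of `quote` characters. The Python sketch below is a hypothetical model that reproduces this example (`split_with_quote` is an invented name). It assumes the enclosing quote characters are removed from the result, as `f2` shows, and does not model escaping or unbalanced quotes, whose behavior is not specified here.

```python
def split_with_quote(text: str, sep: str = ",", quote: str = "") -> list[str]:
    """Split text on sep, ignoring separators between a pair of quote characters.

    Hypothetical sketch: the quote characters themselves are dropped from the values.
    """
    parts: list[str] = []
    current: list[str] = []
    inside = False
    for ch in text:
        if quote and ch == quote:
            inside = not inside  # enter or leave a quoted section
        elif ch == sep and not inside:
            parts.append("".join(current))  # separator outside quotes: close a value
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


print(split_with_quote(" Go,%hello ,Java%,python", sep=",", quote="%"))
# [' Go', 'hello ,Java', 'python']
```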
Example: When `restrict=True`, the function is not executed if the number of divided values differs from the number of target fields
Raw log:
{"content": "1,2,3"}
Processing rule:
ext_sep("content", "f1, f2", restrict=True)
Processing result:
{"content":"1,2,3"}
The `ext_sepstr` function extracts field values by splitting on a multi-character string separator.
ext_sepstr("Source field name","Target field 1,Target field 2,Target field...", sep="abc", restrict=False, mode="overwrite")
Parameter | Description | Parameter Type | Required | Default Value | Value Range |
---|---|---|---|---|---|
field | Field to extract | string | Yes | - | Name of an existing field in the user log |
output | A single field name or multiple new field names concatenated with commas | string | Yes | - | - |
sep | Separator (string) | string | No | , | - |
restrict | Handling mode when the number of extracted values differs from the number of target fields. True: skip extraction entirely. False: try to match the first few fields | bool | No | False | - |
mode | Write mode of the new field | string | No | overwrite | - |
Raw log:
{"message":"1##2##3"}
Processing rule:
// Use "##" as the separator to extract key-values.
ext_sepstr("message", "f1,f2,f3,f4", sep="##")
Processing result:
// If there are more target fields than divided values, the extra fields are set to an empty string `""`.
{"f1":"1","f2":"2","message":"1##2##3","f3":"3","f4":""}
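The padding behavior shown in this result can be modeled with Python's `str.split`, which treats a multi-character separator as a literal string. `ext_sepstr_sketch` below is a hypothetical name; padding missing values with `""` follows the comment above, while `mode` is ignored and values are always overwritten.

```python
def ext_sepstr_sketch(log: dict, field: str, output: str, sep: str = ",",
                      restrict: bool = False) -> dict:
    """Hypothetical emulation of ext_sepstr with a multi-character separator."""
    names = [name.strip() for name in output.split(",")]
    values = log[field].split(sep)  # str.split treats sep as a literal string
    result = dict(log)
    if restrict and len(values) != len(names):
        return result  # counts differ: skip extraction entirely
    for index, name in enumerate(names):
        # Fields beyond the number of divided values receive "".
        result[name] = values[index] if index < len(values) else ""
    return result


print(ext_sepstr_sketch({"message": "1##2##3"}, "message", "f1,f2,f3,f4", sep="##"))
# {'message': '1##2##3', 'f1': '1', 'f2': '2', 'f3': '3', 'f4': ''}
```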
The `ext_json` function extracts field values from JSON data.
ext_json("Source field name",prefix="",suffix="",format="full",exclude_node="JSON nodes not to expand")
Parameter | Description | Parameter Type | Required | Default Value | Value Range |
---|---|---|---|---|---|
field | Field to extract | string | Yes | - | - |
prefix | Prefix of the new field | string | No | - | - |
suffix | Suffix of the new field | string | No | - | - |
format | Field name format. full: full path format (parent + sep + prefix + key + suffix). simple: non-full path format (prefix + key + suffix) | string | No | simple | - |
sep | Concatenation character, used to concatenate node names | string | No | # | - |
depth | Depth to which the function expands the source field, beyond which nodes will not be expanded any more | number | No | 100 | 1-500 |
expand_array | Whether to expand an array node | bool | No | False | - |
include_node | Allowlist of node names that match the specified regular expression | string | No | - | - |
exclude_node | Blocklist of node names that match the specified regular expression | string | No | - | - |
include_path | Allowlist of node paths that match the specified regular expression | string | No | - | - |
exclude_path | Blocklist of node paths that match the specified regular expression | string | No | - | - |
Raw log:
{
"data": "{ \"k1\": 100, \"k2\": { \"k3\": 200, \"k4\": { \"k5\": 300}}}"
}
Processing rule:
ext_json("data")
Processing result:
{"data":"{ \"k1\": 100, \"k2\": { \"k3\": 200, \"k4\": { \"k5\": 300}}}","k1":"100","k3":"200","k5":"300"}
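The default `simple` format can be modeled as a recursive walk that writes every leaf node under `prefix + key + suffix`. The sketch below is a hypothetical Python emulation (`ext_json_simple_sketch` is an invented name). It assumes non-string leaves are stored as their JSON text (so `100` becomes `"100"`), treats arrays as leaves (matching `expand_array=False`), applies `exclude_node` as a full-match regular expression, and stops expanding objects at `depth`. `format="full"`, the `include_*` filters, and the path filters are not modeled.

```python
import json
import re
from typing import Optional


def ext_json_simple_sketch(log: dict, field: str, prefix: str = "", suffix: str = "",
                           depth: int = 100, exclude_node: Optional[str] = None) -> dict:
    """Hypothetical emulation of ext_json with format="simple" (not the CLS code)."""
    result = dict(log)

    def walk(node: dict, level: int) -> None:
        for key, value in node.items():
            if exclude_node and re.fullmatch(exclude_node, key):
                continue  # blocklisted node: skip it and everything under it
            if isinstance(value, dict) and level < depth:
                walk(value, level + 1)  # keep expanding nested objects
            else:
                # Leaves (and objects at the depth limit) are stored as strings.
                text = value if isinstance(value, str) else json.dumps(value)
                result[prefix + key + suffix] = text

    walk(json.loads(log[field]), 1)
    return result


raw = {"data": '{ "k1": 100, "k2": { "k3": 200, "k4": { "k5": 300}}}'}
out = ext_json_simple_sketch(raw, "data")
print({key: value for key, value in out.items() if key != "data"})
# {'k1': '100', 'k3': '200', 'k5': '300'}
```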
Example: Do not extract the `sub_field1` node
Raw log:
{"content": "{\"sub_field1\":1,\"sub_field2\":\"2\"}"}
Processing rule:
// `exclude_node="sub_field1"` excludes the `sub_field1` node from extraction.
ext_json("content", format="full", exclude_node="sub_field1")
Processing result:
{"sub_field2":"2","content":"{\"sub_field1\":1,\"sub_field2\":\"2\"}"}
Example: Add a `prefix` to subnodes
Raw log:
{"content": "{\"sub_field1\":{\"sub_sub_field3\":1},\"sub_field2\":\"2\"}"}
Processing rule 1:
// When `sub_field2` is extracted, the prefix `udf_` is automatically added to it, making it `udf_sub_field2`.
ext_json("content", prefix="udf_", format="simple")
Processing result 1:
{"content":"{\"sub_field1\":{\"sub_sub_field3\":1},\"sub_field2\":\"2\"}","udf_sub_field2":"2","udf_sub_sub_field3":"1"}
Processing rule 2:
// `format="full"` retains the hierarchy in the extracted field name. When `sub_field2` is extracted, the name of its parent node is automatically added to it, making it `#content#__sub_field2`.
ext_json("content", prefix="__", format="full")
Processing result 2:
{"#content#__sub_field2":"2","#content#sub_field1#__sub_sub_field3":"1","content":"{\"sub_field1\":{\"sub_sub_field3\":1},\"sub_field2\":\"2\"}"}
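Processing result 2 shows how `full` names are assembled: each ancestor, starting with the source field, is preceded by `sep`, and the leaf name is appended after one more `sep` as `prefix + key + suffix`. The helper below is a hypothetical Python sketch inferred from these two sample outputs only (`json_field_name` is an invented name); it makes the naming rule concrete but is not a statement of the service's exact behavior.

```python
def json_field_name(parents: list[str], key: str, prefix: str = "", suffix: str = "",
                    fmt: str = "simple", sep: str = "#") -> str:
    """Hypothetical helper that builds an extracted field name.

    parents starts with the source field name, followed by any enclosing nodes.
    """
    if fmt == "simple":
        return prefix + key + suffix
    # full: every ancestor is preceded by sep, then sep + prefix + key + suffix.
    return "".join(sep + parent for parent in parents) + sep + prefix + key + suffix


print(json_field_name(["content"], "sub_field2", prefix="__", fmt="full"))
# #content#__sub_field2
print(json_field_name(["content", "sub_field1"], "sub_sub_field3", prefix="__", fmt="full"))
# #content#sub_field1#__sub_sub_field3
```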
The `ext_json_jmes` function extracts a field value from JSON data by using a JMESPath expression.
ext_json_jmes("Source field name", jmes="JSON extraction expression", output="Target field", ignore_null=True, mode="overwrite")
Parameter | Description | Parameter Type | Required | Default Value | Value Range |
---|---|---|---|---|---|
field | Field to extract | string | Yes | - | - |
jmes | JMES expression. For more information, see JMESPath. | string | Yes | - | - |
output | Output field name. Only a single field is supported. | string | Yes | - | - |
ignore_null | Whether to ignore a node whose value is null. If True (default), fields whose value is null are ignored; otherwise, an empty string is returned. | bool | No | True | - |
mode | Write mode of the new field | string | No | overwrite | - |
Raw log:
{"content": "{\"a\":{\"b\":{\"c\":{\"d\":\"value\"}}}}"}
Processing rule:
// `jmes="a.b.c.d"` means to extract the value of `a.b.c.d`.
ext_json_jmes("content", jmes="a.b.c.d", output="target")
Processing result:
{"content":"{\"a\":{\"b\":{\"c\":{\"d\":\"value\"}}}}","target":"value"}
Raw log:
{"content": "{\"a\":{\"b\":{\"c\":{\"d\":\"value\"}}}}"}
Processing rule:
// `jmes="a.b.c"` means to extract the value of `a.b.c`.
ext_json_jmes("content", jmes="a.b.c", output="target")
Processing result:
{"content":"{\"a\":{\"b\":{\"c\":{\"d\":\"value\"}}}}","target":"{\"d\":\"value\"}"}
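For simple dotted paths such as `a.b.c.d`, the lookup reduces to walking nested objects one key at a time. The sketch below is a hypothetical Python emulation that supports only that subset of JMESPath (no filters, projections, or functions). The name `ext_json_jmes_sketch` is invented, writing object results as compact JSON is inferred from the second result above, and the `ignore_null` branch follows the parameter table.

```python
import json


def ext_json_jmes_sketch(log: dict, field: str, jmes: str, output: str,
                         ignore_null: bool = True) -> dict:
    """Hypothetical emulation of ext_json_jmes for plain dotted paths only."""
    node = json.loads(log[field])
    for part in jmes.split("."):
        # Missing keys (or non-object parents) resolve to null.
        node = node.get(part) if isinstance(node, dict) else None
    result = dict(log)
    if node is None:
        if not ignore_null:
            result[output] = ""  # null becomes an empty string
        return result
    # Strings are written as-is; objects, arrays, and numbers as compact JSON text.
    result[output] = node if isinstance(node, str) else json.dumps(node, separators=(",", ":"))
    return result


raw = {"content": '{"a":{"b":{"c":{"d":"value"}}}}'}
print(ext_json_jmes_sketch(raw, "content", "a.b.c.d", "target")["target"])  # value
print(ext_json_jmes_sketch(raw, "content", "a.b.c", "target")["target"])    # {"d":"value"}
```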
The `ext_regex` function extracts field values by using a regular expression.
ext_regex("Source field name", regex="Regular expression", output="Target field 1,Target field 2,Target field...", mode="overwrite")
Parameter | Description | Parameter Type | Required | Default Value | Value Range |
---|---|---|---|---|---|
field | Field to extract | string | Yes | - | - |
regex | Regular expression. Special characters in the expression must be escaped; otherwise, a syntax error is reported. | string | Yes | - | - |
output | A single field name or multiple new field names concatenated with commas | string | No | - | - |
mode | Write mode of the new field. Default value: overwrite | string | No | overwrite | - |
Raw log:
{"content": "1234abcd5678"}
Processing rule:
ext_regex("content", regex="\d+", output="target1,target2")
Processing result:
{"target2":"5678","content":"1234abcd5678","target1":"1234"}
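In this example the pattern has no capturing groups and matches twice, and the two matches fill `target1` and `target2` in order. A hypothetical Python sketch of that behavior (`ext_regex_findall_sketch` is an invented name) uses `re.findall`. How the service handles more or fewer matches than target fields is not documented here, so the sketch simply pairs them from the left.

```python
import re


def ext_regex_findall_sketch(log: dict, field: str, regex: str, output: str) -> dict:
    """Hypothetical emulation: assign successive matches to the output fields.

    Assumes regex has no capturing groups, so re.findall returns whole matches.
    """
    names = [name.strip() for name in output.split(",")]
    matches = re.findall(regex, log[field])
    result = dict(log)
    for name, value in zip(names, matches):
        result[name] = value
    return result


print(ext_regex_findall_sketch({"content": "1234abcd5678"}, "content", r"\d+", "target1,target2"))
# {'content': '1234abcd5678', 'target1': '1234', 'target2': '5678'}
```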
Raw log:
{"content": "1234abcd"}
Processing rule:
ext_regex("content", regex="(?<target1>\d+)(.*)", output="target2")
Processing result:
{"target2":"abcd","content":"1234abcd","target1":"1234"}
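Here the named group `target1` produces its own field, and the remaining unnamed group goes to the field listed in `output`. The hypothetical Python sketch below reproduces that split under the assumption that unnamed groups are assigned to `output` fields in order; `ext_regex_groups_sketch` is an invented name. Python's `re` module writes named groups as `(?P<name>...)`, so the pattern from the example is rewritten accordingly.

```python
import re


def ext_regex_groups_sketch(log: dict, field: str, regex: str, output: str = "") -> dict:
    """Hypothetical emulation: named groups become fields; unnamed groups fill output."""
    match = re.search(regex, log[field])
    result = dict(log)
    if match is None:
        return result  # no match: leave the log unchanged (assumption)
    result.update(match.groupdict())  # named groups, e.g. target1
    named_indexes = set(match.re.groupindex.values())
    unnamed = [match.group(i) for i in range(1, match.re.groups + 1)
               if i not in named_indexes]
    names = [name.strip() for name in output.split(",") if name.strip()]
    for name, value in zip(names, unnamed):
        result[name] = value
    return result


print(ext_regex_groups_sketch({"content": "1234abcd"}, "content",
                              r"(?P<target1>\d+)(.*)", "target2"))
# {'content': '1234abcd', 'target1': '1234', 'target2': 'abcd'}
```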
The `ext_kv` function extracts key-value pairs by using two levels of separators.
ext_kv("Source field name", pair_sep=r"\s", kv_sep="=", prefix="", suffix="", mode="fill-auto")
Parameter | Description | Parameter Type | Required | Default Value | Value Range |
---|---|---|---|---|---|
field | Field to extract | string | Yes | - | - |
pair_sep | Level-1 separator, separating multiple key-value pairs | string | Yes | - | - |
kv_sep | Level-2 separator, separating keys and values | string | Yes | - | - |
prefix | Prefix of the new field | string | No | - | - |
suffix | Suffix of the new field | string | No | - | - |
mode | Write mode of the new field. Default value: overwrite | string | No | - | - |
The raw log contains two levels of separators: "|" and "=".
Raw log:
{"content": "a=1|b=2|c=3"}
Processing rule:
ext_kv("content", pair_sep="|", kv_sep="=")
Processing result:
{"a":"1","b":"2","c":"3","content":"a=1|b=2|c=3"}
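The two-level split can be sketched in Python: first divide the field on `pair_sep`, then divide each pair once on `kv_sep`. This is a hypothetical emulation (`ext_kv_sketch` is an invented name). It treats both separators as literal strings, whereas the default `pair_sep=r"\s"` in the syntax above suggests the real function accepts a regular expression; splitting each pair on the first `kv_sep` only and skipping fragments without a `kv_sep` are also assumptions.

```python
def ext_kv_sketch(log: dict, field: str, pair_sep: str, kv_sep: str,
                  prefix: str = "", suffix: str = "") -> dict:
    """Hypothetical emulation of ext_kv using literal (non-regex) separators."""
    result = dict(log)
    for pair in log[field].split(pair_sep):         # level 1: split into pairs
        key, found, value = pair.partition(kv_sep)  # level 2: split key from value
        if not found or not key:
            continue  # skip fragments without a key-value separator (assumption)
        result[prefix + key + suffix] = value
    return result


print(ext_kv_sketch({"content": "a=1|b=2|c=3"}, "content", pair_sep="|", kv_sep="="))
# {'content': 'a=1|b=2|c=3', 'a': '1', 'b': '2', 'c': '3'}
```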