Logs contain a large volume of text. When processing text, you can use regular expression functions to flexibly extract keywords, mask fields, or check whether the text contains specified characters. See the table below.
For examples of regular expressions commonly used in log scenarios, visit Online Test of Regular Expressions.
Purpose | Raw Log | Regular Expression | Extraction Result |
---|---|---|---|
Extract content in braces. | `[2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}}` | `\{[^\}]+\}` | `{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}` |
Extract content in brackets. | `[2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}}` | `\[\S+\]` | `[328495eb-b562-478f-9d5d-3bf7e][INFO]` |
Extract the time. | `[2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}}` | `\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}` | `2021-11-24 11:11:08,232` |
Extract uppercase letters of a specific length. | `[2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}}` | `[A-Z]{4}` | `INFO` |
Extract lowercase letters of a specific length. | `[2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}}` | `[a-z]{6}` | `versio`, `passwo`, `timest`, `interf`, `create` |
Extract letters and digits separated by a colon. | `[2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}}` | `([a-z]{3}):([0-9]{4})` | `com:8080` |
The regex_match function matches data against a regular expression in full or partial match mode and returns whether the match succeeds.
regex_match(Field value, regex="", full=True)
Parameter | Description | Parameter Type | Required | Default Value | Value Range |
---|---|---|---|---|---|
data | Field value | string | Yes | - | - |
regex | Regular expression | string | Yes | - | - |
full | Whether to enable full match. For full match, the entire value must fully match the regular expression. For partial match, only part of the value needs to match the regular expression. | bool | No | True | - |
Check whether the value `192.168.0.1` of the field `IP` fully matches the regular expression `192\.168.*` (full=True). Because the entire value matches, the regex_match function returns True.
Raw log:
{"IP":"192.168.0.1", "status": "500"}
Processing rule:
// Check whether the regular expression "192\.168.*" fully matches the value `192.168.0.1` of the field `IP` and, if it does, set the new field `matched` to True.
t_if(regex_match(v("IP"), regex="192\.168.*", full=True), fields_set("matched", True))
Processing result:
{"IP":"192.168.0.1","matched":"TRUE","status":"500"}
Check whether the value `192.168.0.1` of the field `IP` partially matches the regular expression `192` (full=False). Because part of the value matches, the regex_match function returns True.
Raw log:
{"IP":"192.168.0.1", "status": "500"}
Processing rule:
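// Check whether the regular expression "192" partially matches the value `192.168.0.1` of the field `IP` and, if it does, set the new field `matched` to True.
t_if(regex_match(v("IP"), regex="192", full=False), fields_set("matched", True))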
Processing result:
{"IP":"192.168.0.1","matched":"TRUE","status":"500"}
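The difference between the two modes can be sketched in Python, assuming that full match corresponds to `re.fullmatch` and partial match to `re.search`. The `regex_match` function below is a hypothetical stand-in written for illustration, not the log service's implementation, and it stores a Python boolean rather than the `"TRUE"` string shown in the processing results.

```python
import re


def regex_match(data: str, regex: str, full: bool = True) -> bool:
    """Hypothetical Python stand-in for regex_match (not the service's code).

    full=True:  the entire value must match (re.fullmatch).
    full=False: a match anywhere in the value is enough (re.search).
    """
    if full:
        return re.fullmatch(regex, data) is not None
    return re.search(regex, data) is not None


log = {"IP": "192.168.0.1", "status": "500"}

# Same logic as the processing rules: set "matched" only when the check succeeds.
if regex_match(log["IP"], r"192\.168.*", full=True):
    log["matched"] = True

print(log)
print(regex_match(log["IP"], "192", full=False))  # True: "192" occurs in the value
print(regex_match(log["IP"], "192", full=True))   # False: "192" is not the whole value
```

The last line shows why the mode matters: the same expression `192` succeeds as a partial match but fails as a full match.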
The regex_select function matches data against a regular expression and returns the requested part of the match result (partial match). You can specify which match to return and which capture group of that match to return. If no data is matched, an empty string is returned.
regex_select(Field value, regex="", index=1, group=1)
Parameter | Description | Parameter Type | Required | Default Value | Value Range |
---|---|---|---|---|---|
data | Field value | string | Yes | - | - |
regex | Regular expression | string | Yes | - | - |
index | Sequence number of the match to return, counting from 0 for the first match | number | No | First | - |
group | Sequence number of the capture group to return. 0 returns the entire match and 1 returns the first group. | number | No | First | - |
Capture different content from a field value based on a regular expression.
Raw log:
{"data":"hello123,world456", "status": "500"}
Processing rule:
fields_set("match_result", regex_select(v("data"), regex="[a-z]+(\d+)", index=0, group=0))
fields_set("match_result1", regex_select(v("data"), regex="[a-z]+(\d+)", index=1, group=0))
fields_set("match_result2", regex_select(v("data"), regex="([a-z]+)(\d+)", index=0, group=0))
fields_set("match_result3", regex_select(v("data"), regex="([a-z]+)(\d+)", index=0, group=1))
Processing result:
{"match_result2":"hello123","match_result1":"world456","data":"hello123,world456","match_result3":"hello","match_result":"hello123","status":"500"}
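The selection logic implied by the examples above (index counted from 0, group 0 meaning the entire match) can be sketched in Python. The `regex_select` function here is a hypothetical stand-in; returning an empty string for a missing match or group is modeled on the documented behavior for unmatched data, and treating a nonexistent group the same way is an assumption.

```python
import re


def regex_select(data: str, regex: str, index: int, group: int) -> str:
    """Hypothetical Python stand-in for regex_select.

    index: zero-based position of the match among all non-overlapping matches.
    group: 0 selects the whole match, 1 the first capture group, and so on.
    Returns "" when the requested match or group does not exist.
    """
    matches = list(re.finditer(regex, data))
    if not 0 <= index < len(matches):
        return ""
    match = matches[index]
    if not 0 <= group <= match.re.groups:
        return ""
    # A group that did not take part in the match yields None; map it to "".
    return match.group(group) or ""


data = "hello123,world456"
print(regex_select(data, r"[a-z]+(\d+)", index=0, group=0))    # hello123
print(regex_select(data, r"[a-z]+(\d+)", index=1, group=0))    # world456
print(regex_select(data, r"([a-z]+)(\d+)", index=0, group=1))  # hello
print(regex_select(data, r"([a-z]+)(\d+)", index=0, group=2))  # 123
```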
The regex_split function splits a string at each match of a regular expression and returns a JSON array of the resulting substrings (partial match).
regex_split(Field value, regex="", limit=100)
Parameter | Description | Parameter Type | Required | Default Value | Value Range |
---|---|---|---|---|---|
data | Field value | string | Yes | - | - |
regex | Regular expression | string | Yes | - | - |
limit | Maximum length of the returned array. If splitting would exceed this length, the excess part is added to the array as a single element. | number | No | 100 | - |
Raw log:
{"data":"hello123world456", "status": "500"}
Processing rule:
fields_set("split_result", regex_split(v("data"), regex="\d+"))
Processing result:
{"data":"hello123world456","split_result":"[\"hello\",\"world\"]","status":"500"}
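A Python sketch of the splitting behavior, built on `re.split` and `json.dumps`, is shown below. Two points are assumptions rather than documented facts: that `limit` works like Python's `maxsplit` (the unsplit remainder becomes the last element), and that empty strings, such as the one `re.split` produces after the trailing `456`, are dropped to match the documented result. `regex_split` here is a hypothetical stand-in.

```python
import json
import re


def regex_split(data: str, regex: str, limit: int = 100) -> str:
    """Hypothetical Python stand-in for regex_split; returns a JSON array string.

    Assumptions (not confirmed by this page):
    - at most `limit` elements are produced, and the unsplit remainder
      becomes the last element once the limit is reached;
    - empty strings, such as the one after a trailing delimiter, are dropped.
    """
    if limit <= 1:
        # maxsplit=0 means "no limit" in Python, so handle this case explicitly.
        parts = [data]
    else:
        parts = re.split(regex, data, maxsplit=limit - 1)
    # Compact separators produce the same layout as the documented output.
    return json.dumps([p for p in parts if p], separators=(",", ":"))


print(regex_split("hello123world456", r"\d+"))  # ["hello","world"]
print(regex_split("a1b2c3d", r"\d", limit=2))   # ["a","b2c3d"]
```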
The regex_replace function matches data against a regular expression and replaces the matched data (partial match).
regex_replace(Field value, regex="", replace="", count=0)
Parameter | Description | Parameter Type | Required | Default Value | Value Range |
---|---|---|---|---|---|
data | Field value | string | Yes | - | - |
regex | Regular expression | string | Yes | - | - |
replace | Replacement string for the matched data. Capture groups can be referenced as `$1`, `$2`, and so on. | string | Yes | - | - |
count | Number of replacements. The default value 0 replaces all matches. | number | No | 0 | - |
Raw log:
{"data":"hello123world456", "status": "500"}
Processing rule:
fields_set("replace_result", regex_replace(v("data"), regex="\d+", replace="", count=0))
Processing result:
{"replace_result":"helloworld","data":"hello123world456","status":"500"}
Raw log:
{"Id": "dev@12345","Ip": "11.111.137.225","phonenumber": "13912345678"}
Processing rule:
// Mask the `Id` field: replace each run of three digits with ***, then replace the first two characters with **. The result is `**v@***45`.
fields_set("Id",regex_replace(v("Id"),regex="\d{3}", replace="***",count=0))
fields_set("Id",regex_replace(v("Id"),regex="\S{2}", replace="**",count=1))
// Mask the `phonenumber` field by replacing the middle 4 digits with ****. The result is `139****5678`.
fields_set("phonenumber",regex_replace(v("phonenumber"),regex="(\d{0,3})\d{4}(\d{4})", replace="$1****$2"))
// Mask the `Ip` field by replacing the second octet with ***. The result is `11.***.137.225`.
fields_set("Ip",regex_replace(v("Ip"),regex="(\d+\.)\d+(\.\d+\.\d+)", replace="$1***$2",count=0))
Processing result:
{"Id":"**v@***45","Ip":"11.***.137.225","phonenumber":"139****5678"}
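The replacement and masking rules can be reproduced in Python with `re.sub`, whose `count=0` also means "replace all matches". The sketch assumes that the `$1`, `$2` group references used above behave like Python's `\g<1>`, `\g<2>`, and rewrites them accordingly; `regex_replace` is a hypothetical stand-in, and backslashes in `replace` would be interpreted by Python's template syntax, which the real function may not do.

```python
import re


def regex_replace(data: str, regex: str, replace: str, count: int = 0) -> str:
    """Hypothetical Python stand-in for regex_replace.

    count=0 replaces every match; count=n replaces only the first n matches,
    which is also how Python's re.sub treats its count argument.
    """
    # Rewrite $1, $2, ... into Python's \g<1>, \g<2>, ... group references.
    python_repl = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(regex, python_repl, data, count=count)


print(regex_replace("hello123world456", r"\d+", ""))  # helloworld

log = {"Id": "dev@12345", "Ip": "11.111.137.225", "phonenumber": "13912345678"}
log["Id"] = regex_replace(log["Id"], r"\d{3}", "***")          # dev@***45
log["Id"] = regex_replace(log["Id"], r"\S{2}", "**", count=1)  # **v@***45
log["phonenumber"] = regex_replace(log["phonenumber"], r"(\d{0,3})\d{4}(\d{4})", "$1****$2")
log["Ip"] = regex_replace(log["Ip"], r"(\d+\.)\d+(\.\d+\.\d+)", "$1***$2")
print(log)
```

The two `Id` steps show why the order of rules matters: the second rule masks whatever the first rule left in the first two positions.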
The regex_findall function matches data against a regular expression and returns a JSON array of all matched data (partial match).
regex_findall(Field value, regex="")
Parameter | Description | Parameter Type | Required | Default Value | Value Range |
---|---|---|---|---|---|
data | Field value | string | Yes | - | - |
regex | Regular expression | string | Yes | - | - |
Raw log:
{"data":"hello123world456", "status": "500"}
Processing rule:
fields_set("result", regex_findall(v("data"), regex="\d+"))
Processing result:
{"result":"[\"123\",\"456\"]","data":"hello123world456","status":"500"}
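An equivalent Python sketch collects every non-overlapping match and serializes the list as compact JSON. It uses `re.finditer` with `group(0)` rather than `re.findall`, because `re.findall` returns capture-group contents when the pattern has groups; how the log service treats capture groups is not described here, so returning whole matches is an assumption. `regex_findall` is a hypothetical stand-in.

```python
import json
import re


def regex_findall(data: str, regex: str) -> str:
    """Hypothetical Python stand-in for regex_findall; returns a JSON array string.

    It collects whole matches (group 0). How the real function treats capture
    groups is not described on this page, so that choice is an assumption.
    """
    matches = [m.group(0) for m in re.finditer(regex, data)]
    return json.dumps(matches, separators=(",", ":"))


print(regex_findall("hello123world456", r"\d+"))  # ["123","456"]
```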