Tencent Cloud Documentation: Cloud Log Service

Regular Expression Processing Functions

Last updated: 2026-05-28 11:39:03

Introduction

Logs contain a large amount of text. During text processing, regular expression functions can flexibly extract keywords, perform desensitization, or determine whether specified characters are present.
For examples of common regular expressions in log scenarios, see Online Regular Expression Tester.
All examples in the following table are applied to this original log text:
[2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}}

| Purpose | Regular Expression | Extraction Result |
| --- | --- | --- |
| Extract content in braces (greedy: from the first { to the last }). | \\{.+\\} | {"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}} |
| Extract content in brackets. | \\[\\S+\\] | [328495eb-b562-478f-9d5d-3bf7e][INFO] |
| Extract the time. | \\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3} | 2021-11-24 11:11:08,232 |
| Extract uppercase letters of a specific length. | [A-Z]{4} | INFO |
| Extract lowercase letters of a specific length (all matches). | [a-z]{6} | versio, passwo, timest, interf, create |
| Extract letters and digits separated by a colon. | ([a-z]{3}):([0-9]{4}) | com:8080 |
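The patterns above can be tried locally with Python's standard `re` module. This is a minimal sketch that assumes CLS evaluates patterns the way Python regular expressions do (not confirmed here). The patterns are written as Python raw strings, so `\\d` in a processing rule corresponds to `\d` below. The sketch also shows why a negated class such as `[^\}]+` cannot capture nested JSON: it stops at the first closing brace.

```python
import re

# Sample log line used by every row of the table.
LOG = (
    "[2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' "
    "http://abc.com:8080/pc/api -d '{\"version\": \"1.0\", \"user\": \"CGW\", "
    "\"password\": \"123\", \"timestamp\": 1637723468, \"interface\": {\"Name\": "
    "\"ListDetail\", \"para\": {\"owner\": \"1253\", \"limit\": [10, 14], "
    "\"orderField\": \"createTime\"}}}"
)

PATTERNS = {
    "braces_greedy": r"\{.+\}",         # first "{" through last "}" (whole nested JSON)
    "braces_negated": r"\{[^\}]+\}",    # stops at the first "}", so nested JSON is cut short
    "brackets": r"\[\S+\]",
    "time": r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}",
    "upper4": r"[A-Z]{4}",
    "lower6": r"[a-z]{6}",
    "host_port": r"([a-z]{3}):([0-9]{4})",
}


def first_match(pattern: str, text: str) -> str:
    """Return the first whole match (group 0), or "" when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""


for name, pattern in PATTERNS.items():
    print(f"{name:15} {first_match(pattern, LOG)}")

# Every non-overlapping match, as listed in the [a-z]{6} row.
print(re.findall(PATTERNS["lower6"], LOG))
```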

regex_match Function

Function Definition

Matches data against a regular expression and returns whether the match succeeds. Both full matching and partial matching are supported.
Note:
In some scenarios, you can use the str_exist function instead for higher processing efficiency.

Syntax Description

regex_match(data, regex="", full=True)

Parameter Description

| Parameter | Description | Type | Required | Default | Value Range |
| --- | --- | --- | --- | --- | --- |
| data | Field value | string | Yes | - | - |
| regex | Regular expression | string | Yes | - | - |
| full | True: full match; regex_match returns True only if the regex matches the entire data. False: partial match; regex_match returns True if the regex matches any part of the data. | bool | No | True | True/False |

Example

Example 1: Check whether the regular expression "\\d{3}.\\d{3}.\\d.\\d" fully matches 192.168.0.1 (regex_match returns True), and whether "\\d{3}.\\d{3}" fully matches it (regex_match returns False). Original log:
{"IP":"192.168.0.1", "status": "500"}
Processing rules:
//Check whether the regular expression "\\d{3}.\\d{3}.\\d.\\d" fully matches the value 192.168.0.1 of the field IP and save the result to the new field matched.
t_if_else(regex_match(v("IP"), regex="\\d{3}.\\d{3}.\\d.\\d", full=True), fields_set("matched", True),fields_set("matched", False))
//Check whether the regular expression "\\d{3}.\\d{3}" fully matches the value 192.168.0.1 of the field IP and save the result to the new field matched2.
t_if_else(regex_match(v("IP"), regex="\\d{3}.\\d{3}", full=True), fields_set("matched2", True),fields_set("matched2", False))
Processing result:
{"IP":"192.168.0.1","matched":"true","matched2":"false","status":"500"}
Example 2: Check whether the regular expression "\\d{3}.\\d{3}" partially matches the value 192.168.0.1 of the field IP; regex_match returns True. Original log:
{"IP":"192.168.0.1", "status": "500"}
Processing rules:
t_if(regex_match(v("IP"), regex="\\d{3}.\\d{3}", full=False), fields_set("matched", True))
Processing result:
{"IP":"192.168.0.1","matched":"true","status":"500"}
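To reason about full versus partial matching locally, the following sketch models regex_match with Python's `re`. The mapping of full=True to `re.fullmatch` and full=False to `re.search` is an assumption inferred from the examples, and `regex_match_sketch` is a hypothetical name, not a CLS function. The last two calls show that an unescaped `.` matches any character, so `\.` is needed to require a literal dot.

```python
import re


def regex_match_sketch(data: str, regex: str, full: bool = True) -> bool:
    """Hypothetical model of regex_match.

    Assumption: full=True behaves like re.fullmatch (the whole value must match),
    full=False behaves like re.search (any substring may match).
    """
    if full:
        return re.fullmatch(regex, data) is not None
    return re.search(regex, data) is not None


ip = "192.168.0.1"
print(regex_match_sketch(ip, r"\d{3}.\d{3}.\d.\d", full=True))   # True  (Example 1, matched)
print(regex_match_sketch(ip, r"\d{3}.\d{3}", full=True))         # False (Example 1, matched2)
print(regex_match_sketch(ip, r"\d{3}.\d{3}", full=False))        # True  (Example 2)

# "." matches any character; escape it to require a literal dot.
print(regex_match_sketch("192a168b0c1", r"\d{3}.\d{3}.\d.\d"))      # True
print(regex_match_sketch("192a168b0c1", r"\d{3}\.\d{3}\.\d\.\d"))   # False
```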

regex_select Function

Function Definition

Matches data against a regular expression (partial match) and returns the requested part of the result. You can specify which match to return (index) and which group within that match to return (group). If nothing is matched, an empty string is returned.

Syntax Description

regex_select(data, regex="", index=1, group=1)

Parameter Description

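| Parameter | Description | Type | Required | Default | Value Range |
| --- | --- | --- | --- | --- | --- |
| data | Field value | string | Yes | - | - |
| regex | Regular expression | string | Yes | - | - |
| index | Sequence number of the match to return, counting from 0 (0 is the first match). | number | No | The first match | - |
| group | Sequence number of the group to return within that match: 0 returns the entire match, 1 returns the first capture group, and so on. | number | No | The first group | - |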

Example

Capture different content from a field value based on a regular expression.
Raw logs:
{"data":"hello123,world456", "status": "500"}
Processing rules:
fields_set("match_result", regex_select(v("data"), regex="[a-z]+(\\d+)",index=0, group=0))
fields_set("match_result1", regex_select(v("data"), regex="[a-z]+(\\d+)", index=1, group=0))
fields_set("match_result2", regex_select(v("data"), regex="([a-z]+)(\\d+)",index=0, group=0))
fields_set("match_result3", regex_select(v("data"), regex="([a-z]+)(\\d+)",index=0, group=1))
Processing result:
{"match_result2":"hello123","match_result1":"world456","data":"hello123,world456","match_result3":"hello","match_result":"hello123","status":"500"}
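The index and group semantics in the example can be modeled with `re.finditer`. In this hypothetical `regex_select_sketch`, both numbers count from 0 and group 0 is the whole match, which is what the example results imply; index and group have no defaults in the sketch, and it assumes CLS follows Python regex semantics.

```python
import re


def regex_select_sketch(data: str, regex: str, index: int, group: int) -> str:
    """Hypothetical model of regex_select.

    index: zero-based position of the match among all non-overlapping matches.
    group: 0 for the whole match, n for the n-th capture group.
    Returns "" when the requested match or group does not exist.
    """
    matches = list(re.finditer(regex, data))
    if index < 0 or index >= len(matches):
        return ""
    m = matches[index]
    if group < 0 or group > m.re.groups:
        return ""
    value = m.group(group)
    return value if value is not None else ""  # unmatched optional group -> ""


data = "hello123,world456"
print(regex_select_sketch(data, r"[a-z]+(\d+)", index=0, group=0))     # hello123
print(regex_select_sketch(data, r"[a-z]+(\d+)", index=1, group=0))     # world456
print(regex_select_sketch(data, r"([a-z]+)(\d+)", index=0, group=1))   # hello
print(regex_select_sketch(data, r"([a-z]+)(\d+)", index=0, group=2))   # 123
```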

regex_split Function

Function Definition

Splits data by a regular expression (partial match) and returns the result as a JSON array string.

Syntax Description

regex_split(data, regex="", limit=100)

Parameter Description

| Parameter | Description | Type | Required | Default | Value Range |
| --- | --- | --- | --- | --- | --- |
| data | Field value | string | Yes | - | - |
| regex | Regular expression | string | Yes | - | - |
| limit | Maximum length of the resulting array. When this length would be exceeded, the remaining unsplit part is added to the array as a single final element. | number | No | 100 | - |

Example

Raw logs:
{"data":"hello123world456", "status": "500"}
Processing rules:
fields_set("split_result", regex_split(v("data"), regex="\\d+"))
Processing result:
{"data":"hello123world456","split_result":"[\"hello\",\"world\"]","status":"500"}
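A local model of regex_split can be built on `re.split`. In this hypothetical `regex_split_sketch`, `limit` is mapped to Python's `maxsplit` so the unsplit remainder becomes the last element, and empty strings are dropped because plain `re.split(r"\d+", "hello123world456")` returns a trailing `""` that the documented output does not show. Both behaviors are assumptions, not confirmed CLS semantics.

```python
import json
import re


def regex_split_sketch(data: str, regex: str, limit: int = 100) -> str:
    """Hypothetical model of regex_split that returns a JSON array string."""
    if limit <= 1:
        parts = [data]  # no room to split; keep the whole value as one element
    else:
        # At most limit - 1 splits, so the rest stays together as the last element.
        parts = re.split(regex, data, maxsplit=limit - 1)
    # Drop empty strings (e.g. the trailing "" produced when data ends with a match).
    parts = [p for p in parts if p != ""]
    # Compact separators to mirror the array string shown in the example.
    return json.dumps(parts, separators=(",", ":"))


print(regex_split_sketch("hello123world456", r"\d+"))   # ["hello","world"]
print(regex_split_sketch("a1b2c3d", r"\d", limit=2))     # ["a","b2c3d"]
```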

regex_replace Function

Function Definition

Match and replace based on a regular expression (partial match). Mainly used in desensitization scenarios.

Syntax Description

regex_replace(data, regex="", replace="", count=0)

Parameter Description

| Parameter | Description | Type | Required | Default | Value Range |
| --- | --- | --- | --- | --- | --- |
| data | Field value | string | Yes | - | - |
| regex | Regular expression | string | Yes | - | - |
| replace | Target string that replaces each match. It can reference capture groups with $1, $2, and so on, as in Example 2. | string | Yes | - | - |
| count | Maximum number of replacements. 0 replaces all matches. | number | No | 0 | - |

Example

Example 1: Replace field values based on a regular expression. Raw logs:
{"data":"hello123world456", "status": "500"}
Processing rules:
fields_set("replace_result", regex_replace(v("data"), regex="\\d+", replace="", count=0))
Processing result:
{"replace_result":"helloworld","data":"hello123world456","status":"500"}
Example 2: Desensitize user IDs, phone numbers, and IP addresses. Raw log:
{"Id": "dev@12345","Ip": "11.111.137.225","phonenumber": "13912345678"}
Processing rules:
//Mask the Id field: the first rule produces dev@***45, and the second masks the first two characters, giving **v@***45.
fields_set("Id",regex_replace(v("Id"),regex="\\d{3}", replace="***",count=0))
fields_set("Id",regex_replace(v("Id"),regex="\\S{2}", replace="**",count=1))
//Mask the phonenumber field by replacing the middle four digits with ****. The result is 139****5678.
fields_set("phonenumber",regex_replace(v("phonenumber"),regex="(\\d{0,3})\\d{4}(\\d{4})", replace="$1****$2"))
//Mask the Ip field by replacing the second octet with ***. The result is 11.***.137.225.
fields_set("Ip",regex_replace(v("Ip"),regex="(\\d+\\.)\\d+(\\.\\d+\\.\\d+)", replace="$1***$2",count=0))
Processing result:
{"Id":"**v@***45","Ip":"11.***.137.225","phonenumber":"139****5678"}
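The desensitization rules in Example 2 can be replayed with `re.sub`. This hypothetical `regex_replace_sketch` treats `count=0` as "replace all", matching the parameter table and Python's own convention. The examples write group references as `$1`; Python expects `\g<1>`, so the sketch rewrites them first. That translation, and the assumption that CLS follows Python regex semantics, are not confirmed by the source.

```python
import re


def regex_replace_sketch(data: str, regex: str, replace: str, count: int = 0) -> str:
    """Hypothetical model of regex_replace.

    count=0 replaces every match; a positive count replaces at most that many.
    $1, $2, ... in `replace` are rewritten to Python's \\g<1>, \\g<2>, ... first.
    """
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(regex, py_replace, data, count=count)


record = {"Id": "dev@12345", "Ip": "11.111.137.225", "phonenumber": "13912345678"}
record["Id"] = regex_replace_sketch(record["Id"], r"\d{3}", "***")            # dev@***45
record["Id"] = regex_replace_sketch(record["Id"], r"\S{2}", "**", count=1)    # **v@***45
record["phonenumber"] = regex_replace_sketch(
    record["phonenumber"], r"(\d{0,3})\d{4}(\d{4})", "$1****$2")             # 139****5678
record["Ip"] = regex_replace_sketch(
    record["Ip"], r"(\d+\.)\d+(\.\d+\.\d+)", "$1***$2")                     # 11.***.137.225
print(record)
```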

regex_findall Function

Function Definition

Matches data against a regular expression (partial match) and returns all matches as a JSON array string.

Syntax Description

regex_findall(data, regex="")

Parameter Description

| Parameter | Description | Type | Required | Default | Value Range |
| --- | --- | --- | --- | --- | --- |
| data | Field value | string | Yes | - | - |
| regex | Regular expression | string | Yes | - | - |

Example

Raw logs:
{"data":"hello123world456", "status": "500"}
Processing rules:
fields_set("result", regex_findall(v("data"), regex="\\d+"))
Processing result:
{"result":"[\"123\",\"456\"]","data":"hello123world456","status":"500"}
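A local model of regex_findall, using the whole match for each array element. Python's `re.findall` returns capture groups instead when the pattern has any, so this hypothetical sketch uses `re.finditer` to always keep group 0; whether CLS does the same for patterns with groups is not stated above. The last line shows how the array string is escaped once it is stored as a field value in a JSON log.

```python
import json
import re


def regex_findall_sketch(data: str, regex: str) -> str:
    """Hypothetical model of regex_findall: every whole match, as a JSON array string."""
    matches = [m.group(0) for m in re.finditer(regex, data)]
    return json.dumps(matches, separators=(",", ":"))


result = regex_findall_sketch("hello123world456", r"\d+")
print(result)                          # ["123","456"]
# Stored as a string field, the inner quotes are escaped once:
print(json.dumps({"result": result}))  # {"result": "[\"123\",\"456\"]"}
```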

sensitive_detection Function

Function Definition

Detects sensitive information such as ID card numbers, bank card numbers, and other sensitive data.
Note:
Use the desensitization masking option `replace_items` with caution, as desensitized data cannot be recovered.

Syntax Description

sensitive_detection(scope="", sample_ratio=1, discover_items="", replace_items="")

Parameter Description

| Parameter | Description | Type | Required | Default | Value Range |
| --- | --- | --- | --- | --- | --- |
| scope | Name of the field to detect. ALL_FIELDS searches all fields in the log. | string | Yes | ALL_FIELDS | - |
| sample_ratio | Sampling ratio. 1 means all logs are checked; 0.5 means 50% of logs are sampled for detection. | number | Yes | - | - |
| discover_items | Items to detect, separated by commas | string | Yes | - | CHINA_PHONE_NUM,EMAIL,CHINA_IDCARD,ADDR,DEBIT_CARD,CREDIT_CARD,CHINA_PASSPORT,MAC_ADDR,IP,DOMAIN,LOCATION,VIN,PLATE_NUMBER,NAME,PASSWORD,TOKEN |
| replace_items | Items to mask, separated by commas | string | Yes | - | CHINA_PHONE_NUM,EMAIL,CHINA_IDCARD,ADDR,DEBIT_CARD,CREDIT_CARD,CHINA_PASSPORT,MAC_ADDR,IP,VIN,PASSWORD,TOKEN,NAME |

Example

Raw logs:
{
"sensitive_field1": "CLS log 13912345678 my car JTJHT00W274025559 www.tencent.com CLS data processing",
"sensitive_field2": "etl@tenctent.com ",
"NON_sensitive_field": "hello world"
}
Processing rules:
/*scope=ALL_FIELDS means searching all fields in the entire log.
sample_ratio=1 means performing sensitive information detection on all logs.
Detect phone numbers and email addresses.
Desensitize (mask) phone information. Use the `replace_items` option with caution, as masked data cannot be recovered.*/
sensitive_detection(scope="ALL_FIELDS", sample_ratio=1, discover_items="CHINA_PHONE_NUM,EMAIL",replace_items="CHINA_PHONE_NUM")
Processing result:
{
"NON_sensitive_field":"hello world",
"SENSITIVE_FLAGS":"CHINA_PHONE_NUM,EMAIL",//Two types of sensitive information, phone numbers and email addresses, were detected.
//Phone numbers were masked.
"sensitive_field1": "CLS log 139****5678 my car JTJHT00W274025559 www.tencent.com CLS data processing",
"sensitive_field2":"etl@tenctent.com "
}

Description of Sensitive Detection Items:

CHINA_PHONE_NUM: Chinese mobile phone number
Example: 13123456789
Regular expression: (1\\d{2})(\\d{4})(\\d{4})
Masked result: 131****6789

EMAIL: Email address
Example: abcd@nio.com
Regular expression: ([A-Za-z0-9._%+-]+)(@[A-Za-z0-9.-]+\\.[A-Za-z]{2,})
Masked result: ***@nio.com

CHINA_IDCARD: Chinese ID card number
Example: 420101199004135043
Regular expression: (1[1-5]|2[1-3]|3[1-7]|4[1-6]|5[0-4]|6[1-5]|[7-9]1)\\d{4}((18|19|20)\\d{2}((0[1-9])|(1[0-2]))((0[1-9]|1\\d|2[0-8])|(1[0-2](29|30)))|(1[013-9]|2[0-35-9])31)\\d{3}[0-9Xx]
Masked result: 420101****5043

ADDR: Chinese address
Example: 43 Haidian North Third Ring West Road, Beijing
Regular expression (for matching Chinese addresses): ((.{1,6}?(province|city|autonomous region|autonomous prefecture|county|district|town|township))){1,3}((.{1,6}(road|street|lane|avenue|village|hamlet|group|residential quarter|building|number|square))){1,3}((.{1,6}(building number|unit|floor|room|household|number|apartment))|(\\d+-\\d+-\\d+)){0,3}
Masked result: masked entirely as ****

DEBIT_CARD: Debit card number
Example: 6225092716776464882
Regular expression: (62\\d{5,11})(\\d{6})
Masked result: 6225092716776****

CREDIT_CARD: Credit card number
Example: 4539138994741478
Regular expression: ([1-9]\\d{3}[\\s-]?\\d{4}[\\s-]?\\d{4}[\\s-]?)(\\d{4})
Masked result: 453913899474***

CHINA_PASSPORT: Chinese passport number
Example: G86067430
Regular expression: ((1[45]\\d{7})|([P|p|S|s]\\d{7})|([S|s|G|g|E|e]\\d{8})|([Gg|Tt|Ss|Ll|Qq|Dd|Aa|Ff]\\d{8})|([H|h|M|m]\\d{8,10}))
Masked result: masked entirely as ****

MAC_ADDR: MAC address
Example: 06-06-06-aa-bb-cc
Regular expression: ([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})
Masked result: 06-06-06-aa-****cc

IP: IP address
Example: 120.32.23.137
Regular expression: ((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))
Masked result: 120.32.23.***

DOMAIN: Domain name
Example: www.abc123.com
Regular expression: ((\\w|(\\w[\\w-]{0,86}\\w))\\.(\\w|(\\w[\\w-]{0,73}\\w))\\.((\\w{2,12}\\.\\w{2,12})|(\\w{2,25})))|((\\w|(\\w[\\w-]{0,162}\\w))\\.((\\w{2,12}\\.\\w{2,12})|(\\w{2,25})))
Masked result: masking not supported

LOCATION: Latitude and longitude
Example: 31.886551,120.443934
Regular expression: [\\-\\+]?0(\\.\\d{4,10})|([1-9](\\d)?)(\\.\\d{4,10})|1[0-7]\\d{1}(\\.\\d{4,10})|180\\.0{1,10}
Masked result: masking not supported

VIN: Vehicle identification number
Example: LJ1EEAUU8J7700492
Regular expression: ([A-HJ-NPR-Z\\d]{10})([A-HJ-NPR-Z\\d]{7})
Masked result: LJ1EEAUU8J****

PLATE_NUMBER: License plate number
Example: 京N5J980
Regular expression (for matching Chinese license plate numbers): [京津沪渝冀豫云辽黑湘皖鲁新苏浙赣鄂桂甘晋蒙陕吉闽贵粤青藏川宁琼使领][A-Z]{1}[A-HJ-NP-Z0-9]{4}[A-HJ-NP-Z0-9挂学警港澳]
Masked result: masking not supported

NAME: Name
Example: a log field named Name
Field names: ["real_name","family_name","last_name","姓名","名字","用户名","收件人","recv_person","receive_person"]
Masked result: masked entirely as ****

PASSWORD: Password
Example: a log field named password
Field names: ["password","passwd","secret","pass","密码","凭证"]
Masked result: masked entirely as ****

TOKEN: Token
Example: a log field named token
Field names: ["token","account_key","api_key","authorization_code"]
Masked result: masked entirely as ****
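The detection and masking steps can be sketched for two items using the CHINA_PHONE_NUM and EMAIL patterns listed above, written as Python raw strings. Everything else here is an assumption: the function and dictionary names are hypothetical, masking keeps the groups left visible in the masked-result entries, sampling is modeled with `random.random()`, and scope is fixed to all fields. Like the listed phone pattern, this one has no digit boundaries, so it can also fire inside longer numbers such as card numbers.

```python
import random
import re

# Patterns from the CHINA_PHONE_NUM and EMAIL entries.
DETECTORS = {
    "CHINA_PHONE_NUM": re.compile(r"(1\d{2})(\d{4})(\d{4})"),
    "EMAIL": re.compile(r"([A-Za-z0-9._%+-]+)(@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"),
}

# Hypothetical maskers: keep the outer groups, replace the rest with asterisks.
MASKERS = {
    "CHINA_PHONE_NUM": lambda m: m.group(1) + "****" + m.group(3),
    "EMAIL": lambda m: "***" + m.group(2),
}


def sensitive_detection_sketch(log, discover_items, replace_items,
                               sample_ratio=1.0, rng=random.random):
    """Hypothetical model of sensitive_detection with scope=ALL_FIELDS."""
    if rng() >= sample_ratio:
        return dict(log)  # this log was not sampled; leave it unchanged
    result = dict(log)
    flags = []
    for item in discover_items:
        pattern = DETECTORS[item]
        for key, value in log.items():
            if not isinstance(value, str) or not pattern.search(value):
                continue
            if item not in flags:
                flags.append(item)
            if item in replace_items:
                # Mask every occurrence in the (possibly already masked) field.
                result[key] = pattern.sub(MASKERS[item], result[key])
    if flags:
        result["SENSITIVE_FLAGS"] = ",".join(flags)
    return result


log = {
    "sensitive_field1": "CLS log 13912345678 my car JTJHT00W274025559 www.tencent.com CLS data processing",
    "sensitive_field2": "etl@tenctent.com ",
    "NON_sensitive_field": "hello world",
}
print(sensitive_detection_sketch(log, ["CHINA_PHONE_NUM", "EMAIL"], ["CHINA_PHONE_NUM"]))
```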
