Regular Expression Processing Functions

Last updated: 2024-01-20 17:44:35

    Overview

    Logs contain a large volume of text. When processing text, you can use regular expression functions to flexibly extract keywords, mask fields, or determine whether the text contains specified characters. See the figure below.
    
    
    
    For examples of regular expressions commonly used in log scenarios, visit Online Test of Regular Expressions.
    The following examples all use the same raw log:
    [2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}}

    Purpose: Extract content in braces.
    Regular expression: \{[^\}]+\}
    Extraction result:
    {"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}

    Purpose: Extract content in brackets.
    Regular expression: \[\S+\]
    Extraction result:
    [328495eb-b562-478f-9d5d-3bf7e]
    [INFO]

    Purpose: Extract time.
    Regular expression: \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}
    Extraction result:
    2021-11-24 11:11:08,232

    Purpose: Extract uppercase characters of a specific length.
    Regular expression: [A-Z]{4}
    Extraction result:
    INFO

    Purpose: Extract lowercase characters of a specific length.
    Regular expression: [a-z]{6}
    Extraction result:
    versio
    passwo
    timest
    interf
    create

    Purpose: Extract letters and digits.
    Regular expression: ([a-z]{3}):([0-9]{4})
    Extraction result:
    com:8080
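
    The patterns above can be tried out locally before they are used in a processing rule. The following is a minimal sketch that uses Python's standard re module as a stand-in. This is an assumption made for illustration: it is not the CLS processing engine, and regular expression engines can differ in details. The helper name extract_all is hypothetical, and the patterns are written with single backslashes in Python raw strings.

```python
import re

# Sample raw log from the examples above.
LOG = (
    "[2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] "
    "curl -H 'Host: ' http://abc.com:8080/pc/api -d '"
    '{"version": "1.0", "user": "CGW", "password": "123", '
    '"timestamp": 1637723468, "interface": {"Name": "ListDetail", '
    '"para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}}'
)


def extract_all(pattern: str) -> list[str]:
    """Return the full text of every non-overlapping match in LOG."""
    return [m.group(0) for m in re.finditer(pattern, LOG)]


timestamps = extract_all(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
uppercase = extract_all(r"[A-Z]{4}")
lowercase = extract_all(r"[a-z]{6}")
host_port = extract_all(r"([a-z]{3}):([0-9]{4})")
braces = extract_all(r"\{[^\}]+\}")

print(timestamps)  # ['2021-11-24 11:11:08,232']
print(uppercase)   # ['INFO']
print(lowercase)   # ['versio', 'passwo', 'timest', 'interf', 'create']
print(host_port)   # ['com:8080']
print(braces[0])   # text from the first "{" up to the first "}"
```

    Because [^\}]+ cannot cross a closing brace, the braces pattern stops at the first "}" and does not capture the whole nested JSON object.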

    Function regex_match

    Function definition

    This function is used to match data in full or partial match mode based on a regular expression and return whether the match is successful.
    Syntax description
    regex_match(Field value, regex="", full=True)

    Parameter description

    Parameter | Description | Parameter Type | Required | Default Value | Value Range
    data | Field value | string | Yes | - | -
    regex | Regular expression | string | Yes | - | -
    full | Whether to enable full match. For full match, the entire value must fully match the regular expression. For partial match, only part of the value needs to match the regular expression. | bool | No | True | -

    Sample

    Example 1. Check whether the regular expression "192\\.168.*" fully matches the value 192.168.0.1 of the field IP (full=True). Because the entire value matches the expression, regex_match returns True.
    Raw log:
    {"IP":"192.168.0.1", "status": "500"}
    Processing rule:
    // Check whether the regular expression "192\\.168.*" fully matches the value `192.168.0.1` of the field `IP` and save the result to the new field `matched`.
    t_if(regex_match(v("IP"), regex="192\\.168.*", full=True), fields_set("matched", True))
    Processing result:
    {"IP":"192.168.0.1","matched":"TRUE","status":"500"}
    Example 2. Check whether the regular expression "192" partially matches the value 192.168.0.1 of the field IP (full=False). Because part of the value matches the expression, regex_match returns True.
    Raw log:
    {"IP":"192.168.0.1", "status": "500"}
    Processing rule:
    t_if(regex_match(v("IP"), regex="192", full=False), fields_set("matched", True))
    Processing result:
    {"IP":"192.168.0.1","matched":"TRUE","status":"500"}
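
    The difference between full and partial match can be reproduced outside CLS. The following is a rough Python analogue, not the CLS implementation. It assumes that full match corresponds to re.fullmatch and that partial match corresponds to re.search, which finds a match anywhere in the value. The Python function below only borrows the name regex_match for readability.

```python
import re


def regex_match(data: str, regex: str, full: bool = True) -> bool:
    """Approximate regex_match with Python's re module (an assumption)."""
    if full:
        # Full match: the entire value must match the expression.
        return re.fullmatch(regex, data) is not None
    # Partial match: any part of the value may match the expression.
    return re.search(regex, data) is not None


print(regex_match("192.168.0.1", r"192\.168.*", full=True))  # True
print(regex_match("192.168.0.1", r"192", full=False))        # True
print(regex_match("192.168.0.1", r"192", full=True))         # False
```

    The last call shows why the full parameter matters: "192" matches only the beginning of the value, so it succeeds as a partial match but fails as a full match.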

    Function regex_select

    Function definition

    This function is used to match data based on a regular expression (partial match) and return the selected part of the match result. You can specify which match to return (index) and which capture group within that match to return (group). If no data is matched, an empty string is returned.

    Syntax description

    regex_select(Field value, regex="", index=1, group=1)

    Parameter description

    Parameter | Description | Parameter Type | Required | Default Value | Value Range
    data | Field value | string | Yes | - | -
    regex | Regular expression | string | Yes | - | -
    index | Sequence number of the matched expression in the match result | number | No | First | -
    group | Sequence number of the matched group in the match result | number | No | First | -

    Sample

    Capture different content from a field value based on a regular expression.
    Raw log:
    {"data":"hello123,world456", "status": "500"}
    Processing rule:
    fields_set("match_result", regex_select(v("data"), regex="[a-z]+(\\d+)",index=0, group=0))
    fields_set("match_result1", regex_select(v("data"), regex="[a-z]+(\\d+)", index=1, group=0))
    fields_set("match_result2", regex_select(v("data"), regex="([a-z]+)(\\d+)",index=0, group=0))
    fields_set("match_result3", regex_select(v("data"), regex="([a-z]+)(\\d+)",index=0, group=1))
    Processing result:
    {"match_result2":"hello123","match_result1":"world456","data":"hello123,world456","match_result3":"hello","match_result":"hello123","status":"500"}
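
    In the sample above, index=0 selects the first match and index=1 the second, while group=0 selects the entire match and group=1 the first capture group. The following Python sketch reproduces that behavior with re.finditer. It is an approximation built on Python's re module, not the CLS implementation, and both arguments are passed explicitly so that the sketch makes no claim about the defaults.

```python
import re


def regex_select(data: str, regex: str, index: int, group: int) -> str:
    """Approximate regex_select: pick match number `index`, then group `group`."""
    matches = list(re.finditer(regex, data))
    if index < 0 or index >= len(matches):
        return ""  # no such match: return an empty string
    match = matches[index]
    if group < 0 or group > match.re.groups:
        return ""  # no such capture group
    # A group that did not participate in the match yields None.
    return match.group(group) or ""


data = "hello123,world456"
print(regex_select(data, r"[a-z]+(\d+)", index=0, group=0))    # hello123
print(regex_select(data, r"[a-z]+(\d+)", index=1, group=0))    # world456
print(regex_select(data, r"([a-z]+)(\d+)", index=0, group=0))  # hello123
print(regex_select(data, r"([a-z]+)(\d+)", index=0, group=1))  # hello
```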

    Function regex_split

    Function definition

    This function is used to split a string at each match of a regular expression (partial match) and return a JSON array of the resulting substrings.

    Syntax description

    regex_split(Field value, regex="", limit=100)

    Parameter description

    Parameter | Description | Parameter Type | Required | Default Value | Value Range
    data | Field value | string | Yes | - | -
    regex | Regular expression | string | Yes | - | -
    limit | Maximum array length for splitting. Once this length is reached, the remaining part is no longer split and is added to the array as a single element. | number | No | 100 | -

    Sample

    Raw log:
    {"data":"hello123world456", "status": "500"}
    Processing rule:
    fields_set("split_result", regex_split(v("data"), regex="\\d+"))
    Processing result:
    {"data":"hello123world456","split_result":"[\"hello\",\"world\"]","status":"500"}
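
    The effect of the limit parameter is easier to see with a small experiment. The following Python sketch approximates regex_split with re.split. It rests on two assumptions that the source does not confirm: limit is mapped to maxsplit=limit-1 so that the unsplit remainder becomes the last element, and trailing empty strings are dropped to mirror the sample result above (Python itself would return a trailing empty string for this input).

```python
import json
import re


def regex_split(data: str, regex: str, limit: int = 100) -> str:
    """Approximate regex_split and return the parts as a JSON array string."""
    if limit <= 1:
        # In Python, maxsplit=0 means "no limit", so handle this case directly.
        parts = [data]
    else:
        # At most limit-1 splits, which gives at most `limit` elements.
        parts = re.split(regex, data, maxsplit=limit - 1)
    # Assumption: drop trailing empty strings to mirror the documented sample.
    while parts and parts[-1] == "":
        parts.pop()
    return json.dumps(parts, separators=(",", ":"))


print(regex_split("hello123world456", r"\d+"))  # ["hello","world"]
print(regex_split("a1b2c3d", r"\d", limit=2))   # ["a","b2c3d"]
```

    In the second call, splitting stops after the first separator, so the remainder "b2c3d" is returned as one element.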

    Function regex_replace

    Function definition

    This function is used to match data based on a regular expression and replace the matched data (partial match).

    Syntax description

    regex_replace(Field value, regex="", replace="", count=0)

    Parameter description

    Parameter | Description | Parameter Type | Required | Default Value | Value Range
    data | Field value | string | Yes | - | -
    regex | Regular expression | string | Yes | - | -
    replace | Target string, which is used to replace the matched result | string | Yes | - | -
    count | Number of replacements. The default value is 0, indicating that all matches are replaced. | number | No | 0 | -

    Sample

    Example 1. Replace a field value based on a regular expression.
    Raw log:
    {"data":"hello123world456", "status": "500"}
    Processing rule:
    fields_set("replace_result", regex_replace(v("data"), regex="\\d+", replace="", count=0))
    Processing result:
    {"replace_result":"helloworld","data":"hello123world456","status":"500"}
    Example 2. Mask the user ID, phone number, and IP address.
    Raw log:
    {"Id": "dev@12345","Ip": "11.111.137.225","phonenumber": "13912345678"}
    Processing rule:
    // Mask the `Id` field in two steps. The first rule yields `dev@***45`, and the second rule yields `**v@***45`.
    fields_set("Id",regex_replace(v("Id"),regex="\\d{3}", replace="***",count=0))
    fields_set("Id",regex_replace(v("Id"),regex="\\S{2}", replace="**",count=1))
    // Mask the `phonenumber` field by replacing the middle 4 digits with ****. The result is `139****5678`.
    fields_set("phonenumber",regex_replace(v("phonenumber"),regex="(\\d{0,3})\\d{4}(\\d{4})", replace="$1****$2"))
    // Mask the `Ip` field by replacing the second octet with ***. The result is `11.***.137.225`.
    fields_set("Ip",regex_replace(v("Ip"),regex="(\\d+\\.)\\d+(\\.\\d+\\.\\d+)", replace="$1***$2",count=0))
    Processing result:
    {"Id":"**v@***45","Ip":"11.***.137.225","phonenumber":"139****5678"}
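
    The masking rules above can be checked step by step with a local approximation. The following Python sketch is built on re.sub and is not the CLS implementation. It assumes that count=0 means "replace all matches" (as stated in the parameter description) and translates the `$1`-style group references into Python's `\g<1>` form, because Python does not accept the `$` syntax. Replacement strings that contain backslashes are not handled.

```python
import re


def regex_replace(data: str, regex: str, replace: str, count: int = 0) -> str:
    """Approximate regex_replace; count=0 replaces every match."""
    # Translate "$1" into "\g<1>" so that Python understands the group reference.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(regex, template, data, count=count)


# Example 1: remove all digits.
print(regex_replace("hello123world456", r"\d+", ""))  # helloworld

# Example 2: mask the Id in two steps.
step1 = regex_replace("dev@12345", r"\d{3}", "***")
step2 = regex_replace(step1, r"\S{2}", "**", count=1)
print(step1)  # dev@***45
print(step2)  # **v@***45

# Mask the middle four digits of the phone number.
print(regex_replace("13912345678", r"(\d{0,3})\d{4}(\d{4})", "$1****$2"))  # 139****5678

# Mask the second octet of the IP address.
print(regex_replace("11.111.137.225", r"(\d+\.)\d+(\.\d+\.\d+)", "$1***$2"))  # 11.***.137.225
```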

    Function regex_findall

    Function definition

    This function is used to match data based on a regular expression and return a JSON array of the matched data (partial match).

    Syntax description

    regex_findall(Field value, regex="")

    Parameter description

    Parameter | Description | Parameter Type | Required | Default Value | Value Range
    data | Field value | string | Yes | - | -
    regex | Regular expression | string | Yes | - | -

    Sample

    Raw log:
    {"data":"hello123world456", "status": "500"}
    Processing rule:
    fields_set("result", regex_findall(v("data"), regex="\\d+"))
    Processing result:
    {"result":"[\"123\",\"456\"]","data":"hello123world456","status":"500"}
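
    For comparison, the sample above can be reproduced with a short Python sketch. It approximates regex_findall with re.finditer and is not the CLS implementation. It assumes that the function returns the full text of each match even when the expression contains capture groups; the source does not state how groups are handled.

```python
import json
import re


def regex_findall(data: str, regex: str) -> str:
    """Approximate regex_findall and return all matches as a JSON array string."""
    # group(0) is the entire match, regardless of any capture groups.
    matches = [m.group(0) for m in re.finditer(regex, data)]
    return json.dumps(matches, separators=(",", ":"))


print(regex_findall("hello123world456", r"\d+"))  # ["123","456"]
print(regex_findall("hello", r"\d+"))             # []
```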
    