tencent cloud

Feedback

Regular Expression Processing Functions

Last updated: 2022-04-19 16:22:15

    Overview

    Logs contain a large volume of text. When processing text, you can use regular expression functions to flexibly extract keywords, mask fields, or determine whether the text contains specified characters. See the figure below.

    For examples of regular expressions commonly used in log scenarios, visit Online Test of Regular Expressions.

    Purpose Raw Log Regular Expression Extraction Result
    Extract content in braces. [2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}} \{[^\}]+\} {"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 10], "orderField": "createTime"}
    Extract content in brackets. [2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}} \[\S+\] [328495eb-b562-478f-9d5d-3bf7e]
    [INFO]
    Extract time. [2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}} \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} 2021-11-08 11:11:08,232
    Extract uppercase characters of a specific length. [2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}} [A-Z]{4} INFO
    Extract lowercase characters of a specific length. [2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 15], "orderField": "createTime"}}} [a-z]{6} versio
    passwo
    timest
    interf
    create
    Extract letters and digits. [2021-11-24 11:11:08,232][328495eb-b562-478f-9d5d-3bf7e][INFO] curl -H 'Host: ' http://abc.com:8080/pc/api -d '{"version": "1.0", "user": "CGW", "password": "123", "timestamp": 1637723468, "interface": {"Name": "ListDetail", "para": {"owner": "1253", "limit": [10, 14], "orderField": "createTime"}}} ([a-z]{3}):([0-9]{4}) com:8080

    Function regex_match

    Function definition

    This function is used to match data in full or partial match mode based on a regular expression and return whether the match is successful.

    Syntax description
    regex_match(Field value, regex="", full=True)
    

    Parameter description

    Parameter Description Parameter Type Required Default Value Value Range
    data Field value string Yes - -
    regex Regular expression string Yes - -
    full Whether to enable full match. For full match, the entire value must fully match the regular expression. For partial match, only part of the value needs to match the regular expression. bool No True -

    Sample

    • Example 1. Check whether the regular expression "192.168.*" fully matches the value 192.168.0.1 of the field IP (full=True). The regex_match function returns True for the case of full match.
      Raw log:
      {"IP":"192.168.0.1", "status": "500"}
      

    Processing rule:

    // Check whether the regular expression "192\.168.*" fully matches the value `192.168.0.1` of the field `IP` and save the result to the new field `matched`.
    t_if(regex_match(v("IP"), regex="192\.168.*", full=True), fields_set("matched", True))
    

    Processing result:

    {"IP":"192.168.0.1","matched":"TRUE","status":"500"}
    
    • Example 2. Check whether the regular expression "192*" partially matches the value 192.168.0.1 of the field IP (full=False). The regex_match function returns True for the case of partial match.
      Raw log:
      {"IP":"192.168.0.1", "status": "500"}
      

    Processing rule:

    t_if(regex_match(v("ip"), regex="192", full=False), fields_set("matched", True))
    

    Processing result:

    {"IP":"192.168.0.1","matched":"TRUE","status":"500"}
    

    Function regex_select

    Function definition

    This function is used to match data based on a regular expression and returns the corresponding partial match result. You can specify the sequence number of the matched expression and the sequence number of the group to return (partial match + sequence number of the specified matched group). If no data is matched, an empty string is returned.

    Syntax description

    regex_select(Field value, regex="", index=1, group=1)
    

    Parameter description

    Parameter Description Parameter Type Required Default Value Value Range
    data Field value string Yes - -
    regex Regular expression string Yes - -
    index Sequence number of the matched expression in the match result number No First -
    group Sequence number of the matched group in the match result number No First -

    Sample

    Capture different content from a field value based on a regular expression.

    Raw log:

    {"data":"hello123,world456", "status": "500"}
    

    Processing rule:

    fields_set("match_result", regex_select(v("data"), regex="[a-z]+(\d+)",index=0, group=0))
    fields_set("match_result1", regex_select(v("data"), regex="[a-z]+(\d+)", index=1, group=0))
    fields_set("match_result2", regex_select(v("data"), regex="([a-z]+)(\d+)",index=0, group=0))
    fields_set("match_result3", regex_select(v("data"), regex="([a-z]+)(\d+)",index=0, group=1))
    

    Processing result:

    {"match_result2":"hello123","match_result1":"world456","data":"hello123,world456","match_result3":"hello","match_result":"hello123","status":"500"}
    

    Function regex_split

    Function definition

    This function is used to split a string and return a JSON array of the split strings (partial match).

    Syntax description

    regex_split(Field value, regex=\"\", limit=100)
    

    Parameter description

    Parameter Description Parameter Type Required Default Value Value Range
    data Field value string Yes - -
    regex Regular expression string Yes - -
    limit Maximum array length for splitting. When this length is exceeded, the excessive part will be split, constructed as an element, and added to the array. number No 100 -

    Sample

    Raw log:

    {"data":"hello123world456", "status": "500"}
    

    Processing rule:

    fields_set("split_result", regex_split(v("data"), regex="\d+"))
    

    Processing result:

    {"data":"hello123world456","split_result":"[\"hello\",\"world\"]","status":"500"}
    

    Function regex_replace

    Function definition

    This function is used to match data based on a regular expression and replace the matched data (partial match).

    Syntax description

    regex_replace(Field value, regex="", replace="", count=0)
    

    Parameter description

    Parameter Description Parameter Type Required Default Value Value Range
    data Field value string Yes - -
    regex Regular expression string Yes - -
    replace Target string, which is used to replace the matched result string Yes - -
    count Replacement count. The default value is 0, indicating complete replacement. number No 0 -

    Sample

    • Example 1. Replaces a field value based on a regular expression
      Raw log:
      {"data":"hello123world456", "status": "500"}
      

    Processing rule:

    fields_set("replace_result", regex_replace(v("data"), regex="\d+", replace="", count=0))
    

    Processing result:

    {"replace_result":"helloworld","data":"hello123world456","status":"500"}
    
    • Example 2. Mask the user ID, phone number, and IP address
      Raw log:
      {"Id": "dev@12345","Ip": "11.111.137.225","phonenumber": "13912345678"}
      

    Processing rule:

    // Mask the `Id` field. The result is `dev@***45`.
    fields_set("Id",regex_replace(v("Id"),regex="\d{3}", replace="***",count=0))
    fields_set("Id",regex_replace(v("Id"),regex="\S{2}", replace="**",count=1))
    // Mask the `phonenumber` field by replacing the middle 4 digits with ****. The result is `139****5678`.
    fields_set("phonenumber",regex_replace(v("phonenumber"),regex="(\d{0,3})\d{4}(\d{4})", replace="$1****$2"))
    // Mask the `Ip` field by replacing the octet with ***. The result is `11.***137.225`.
    fields_set("Ip",regex_replace(v("Ip"),regex="(\d+\.)\d+(\.\d+\.\d+)", replace="$1***$2",count=0))
    

    Processing result:

    {"Id":"**v@***45","Ip":"11.***.137.225","phonenumber":"139****5678"}
    

    Function regex_findall

    Function definition

    This function is used to match data based on a regular expression and return a JSON array of the matched data (partial match).

    Syntax description

    regex_findall(Field value, regex="")
    

    Parameter description

    Parameter Description Parameter Type Required Default Value Value Range
    data Field value string Yes - -
    regex Regular expression string Yes - -

    Sample

    Raw log:

    {"data":"hello123world456", "status": "500"}
    

    Processing rule:

    fields_set("result", regex_findall(v("data"), regex="\d+"))
    

    Processing result:

    {"result":"[\"123\",\"456\"]","data":"hello123world456","status":"500"}
    
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support