Synonym Configuration

Last updated: 2020-08-12 09:44:21

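    Tencent Cloud Elasticsearch Service supports two ways to configure synonyms: uploading a synonym file, or referencing synonyms directly in the index settings.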

    Method 1: Upload a synonym file

    Notes

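    • Uploading a synonym file triggers a rolling restart of the cluster.

    • A newly uploaded or changed synonym file does not take effect on existing indices; those indices must be rebuilt. For example, suppose the existing index myindex uses the synonym file synonym.txt. If the file's content is changed and the file is uploaded again, myindex does not dynamically load the updated synonyms. You need to reindex the existing index; otherwise the updated synonym file applies only to newly created indices.

    • A synonym file must contain one synonym expression per line (Solr rules and WordNet rules are supported). The file must be UTF-8 encoded and have the .txt extension. For example: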

        快乐水,可乐 => 可乐,快乐水
        elasticsearch,es => es
    • A single synonym file can be at most 10 MB, and at most 10 files can be uploaded in total.
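
    The constraints above can be checked locally before uploading. The following is a minimal sketch, not part of the service: the function names are hypothetical, the 10 MB limit is assumed to mean 10 × 1024 × 1024 bytes, and only Solr-style lines (with or without =>) are checked, not WordNet rules.

```python
import os

MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: the 10 MB limit is 10 MiB
MAX_FILES = 10


def check_synonym_file(path):
    """Return a list of problems found in one synonym file (empty if none)."""
    problems = []
    if not path.endswith(".txt"):
        problems.append("extension must be .txt")
    if os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file is larger than 10 MB")
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are skipped in this sketch
        if stripped.count("=>") > 1:
            problems.append(f"line {number}: more than one '=>'")
            continue
        # Every comma-separated term on either side of '=>' must be non-empty.
        sides = stripped.split("=>")
        if any(not term.strip() for side in sides for term in side.split(",")):
            problems.append(f"line {number}: empty term")
    return problems


def check_upload(paths):
    """Check a whole upload batch; return a list of problem strings."""
    problems = []
    if len(paths) > MAX_FILES:
        problems.append(f"too many files: {len(paths)} > {MAX_FILES}")
    for path in paths:
        problems.extend(f"{path}: {problem}" for problem in check_synonym_file(path))
    return problems
```

    Calling check_upload(["synonym.txt"]) returns an empty list when the file passes every check in this sketch. The service may apply further validation that is not modeled here.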

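    Procedure

    1. Log in to the Tencent Cloud Elasticsearch Service console.

    2. On the cluster list page, click the cluster ID to open the cluster details page.

    3. Click "Advanced Configuration" to open the "Synonym Configuration" page.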

    4. Click "Update Dictionary", then upload the synonym file on the synonym update page.

    5. After the upload completes, click "Save".

    Using the synonym file

    The following example configures synonyms with a token filter and uses synonym.txt as the test file. The file contains the single line elasticsearch,es => es
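
    To make the rule concrete, here is a simplified sketch of what a Solr-style line such as elasticsearch,es => es does to a token stream. It is an illustration only, not how the synonym filter is implemented: the function names are hypothetical, only single-word terms are handled, and multiple outputs are emitted one after another rather than stacked at the same position.

```python
def parse_rule(line):
    """Parse one Solr-style rule into {input_token: [output_tokens]}."""
    if "=>" in line:
        # Explicit mapping: every term on the left becomes the terms on the right.
        left, right = line.split("=>", 1)
        inputs = [term.strip() for term in left.split(",") if term.strip()]
        outputs = [term.strip() for term in right.split(",") if term.strip()]
    else:
        # Equivalent synonyms: every term maps to the whole group.
        inputs = [term.strip() for term in line.split(",") if term.strip()]
        outputs = inputs
    return {token: list(outputs) for token in inputs}


def apply_rules(tokens, mapping):
    """Replace each token that has a mapping; leave other tokens unchanged."""
    result = []
    for token in tokens:
        result.extend(mapping.get(token, [token]))
    return result


mapping = parse_rule("elasticsearch,es => es")
print(apply_rules(["tencet", "elasticsearch", "service"], mapping))
# ['tencet', 'es', 'service']
```

    Because both elasticsearch and es are rewritten to es, a document containing either word ends up with the same indexed term, which is what the steps below verify on a real cluster.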

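    1. Log in to the Kibana console of the cluster to which the synonym file was uploaded. For the login steps, see "Accessing a Cluster through Kibana".

    2. Click Dev Tools in the left navigation bar.

    3. Run the following command in Console to create an index: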

       PUT /my_index
       {
         "settings": {
           "index": {
             "analysis": {
               "analyzer": {
                 "my_ik": {
                   "type": "custom",
                   "tokenizer": "ik_smart",
                   "filter": [
                     "my_synonym"
                   ]
                 }
               },
               "filter": {
                 "my_synonym": {
                   "type": "synonym",
                   "synonyms_path": "analysis/synonym.txt"
                 }
               }
             }
           }
         },
         "mappings": {
           "_doc": {
             "properties": {
               "content": {
                 "type": "text",
                 "analyzer": "my_ik",
                 "search_analyzer": "my_ik"
               }
             }
           }
         }
       }
    4. Run the following command to verify the synonym configuration:

       GET /my_index/_analyze
       {
         "analyzer": "my_ik",
         "text":"tencet elasticsearch service"
       }

      If the command succeeds, it returns the following result:

       {
         "tokens": [
           {
             "token": "tencet",
             "start_offset": 0,
             "end_offset": 6,
             "type": "ENGLISH",
             "position": 0
           },
           {
             "token": "es",
             "start_offset": 7,
             "end_offset": 20,
             "type": "SYNONYM",
             "position": 1
           },
           {
             "token": "service",
             "start_offset": 21,
             "end_offset": 28,
             "type": "ENGLISH",
             "position": 2
           }
         ]
       }

      In the output, the token es has the type SYNONYM, which shows that elasticsearch was rewritten by the synonym rule.

    5. Run the following commands to add some documents:

       POST /my_index/_doc/1
       {
         "content": "tencet elasticsearch service"
       }
      
       POST /my_index/_doc/2
       {
         "content": "hello es"
       }
    6. Run the following command to search using the synonym:

       GET my_index/_search
       {
         "query" : { "match" : { "content" : "es" }},
         "highlight" : {
           "pre_tags" : ["<tag1>", "<tag2>"],
           "post_tags" : ["</tag1>", "</tag2>"],
           "fields" : {"content": {}}
         }
       }

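      If the command succeeds, it returns the following result. Both documents match, including the one that contains elasticsearch rather than es: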

       {
         "took": 4,
         "timed_out": false,
         "_shards": {
           "total": 5,
           "successful": 5,
           "skipped": 0,
           "failed": 0
         },
         "hits": {
           "total": 2,
           "max_score": 0.25811607,
           "hits": [
             {
               "_index": "my_index",
               "_type": "_doc",
               "_id": "2",
               "_score": 0.25811607,
               "_source": {
                 "content": "hello es"
               },
               "highlight": {
                 "content": [
                   "hello <tag1>es</tag1>"
                 ]
               }
             },
             {
               "_index": "my_index",
               "_type": "_doc",
               "_id": "1",
               "_score": 0.25316024,
               "_source": {
                 "content": "tencet elasticsearch service"
               },
               "highlight": {
                 "content": [
                   "tencet <tag1>elasticsearch</tag1> service"
                 ]
               }
             }
           ]
         }
       }
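
    As the notes for this method state, an existing index does not pick up a synonym file that is changed and uploaded again; it has to be reindexed. One possible way to do that is sketched below: create a new index with the same settings, so that it loads the updated file, and then copy the documents with the Elasticsearch _reindex API. The helper name reindex_plan and the target index name my_index_v2 are hypothetical, and the script only prints the two Console requests rather than sending them.

```python
import json


def reindex_plan(source_index, dest_index, index_body):
    """Return the two requests as (method, path, body) tuples."""
    create_index = ("PUT", f"/{dest_index}", index_body)
    copy_documents = (
        "POST",
        "/_reindex",
        {"source": {"index": source_index}, "dest": {"index": dest_index}},
    )
    return [create_index, copy_documents]


# Same analysis settings and mappings as the index created in step 3 above.
index_body = {
    "settings": {"index": {"analysis": {
        "analyzer": {"my_ik": {"type": "custom", "tokenizer": "ik_smart",
                               "filter": ["my_synonym"]}},
        "filter": {"my_synonym": {"type": "synonym",
                                  "synonyms_path": "analysis/synonym.txt"}},
    }}},
    "mappings": {"_doc": {"properties": {
        "content": {"type": "text", "analyzer": "my_ik",
                    "search_analyzer": "my_ik"},
    }}},
}

# Print the requests in the form used by the Kibana Console.
for method, path, body in reindex_plan("my_index", "my_index_v2", index_body):
    print(f"{method} {path}")
    print(json.dumps(body, indent=2))
```

    After the copy finishes, searches would need to target my_index_v2 to see the updated synonyms.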

    Method 2: Reference synonyms directly

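    1. Log in to the Kibana console of the cluster. For the login steps, see "Accessing a Cluster through Kibana".

    2. Click Dev Tools in the left navigation bar.

    3. Run the following command in Console to create an index: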

       PUT /my_index
       {
         "settings": {
           "index": {
             "analysis": {
               "analyzer": {
                 "my_ik": {
                   "type": "custom",
                   "tokenizer": "ik_smart",
                   "filter": [
                     "my_synonym"
                   ]
                 }
               },
               "filter": {
                 "my_synonym": {
                   "type": "synonym",
                   "synonyms": [
                     "elasticsearch,es => es"
                   ]
                 }
               }
             }
           }
         },
         "mappings": {
           "_doc": {
             "properties": {
               "content": {
                 "type": "text",
                 "analyzer": "my_ik",
                 "search_analyzer": "my_ik"
               }
             }
           }
         }
       }

      The difference from the synonym-file method is that the filter definition lists the synonyms inline instead of pointing to a synonym file: "synonyms": ["elasticsearch,es => es"]

    4. Run the following command to verify the synonym configuration:

       GET /my_index/_analyze
       {
         "analyzer": "my_ik",
         "text":"tencet elasticsearch service"
       }

      If the command succeeds, it returns the following result:

       {
         "tokens": [
           {
             "token": "tencet",
             "start_offset": 0,
             "end_offset": 6,
             "type": "ENGLISH",
             "position": 0
           },
           {
             "token": "es",
             "start_offset": 7,
             "end_offset": 20,
             "type": "SYNONYM",
             "position": 1
           },
           {
             "token": "service",
             "start_offset": 21,
             "end_offset": 28,
             "type": "ENGLISH",
             "position": 2
           }
         ]
       }

      In the output, the token es has the type SYNONYM, which shows that elasticsearch was rewritten by the synonym rule.

    5. Run the following commands to add some documents:

       POST /my_index/_doc/1
       {
         "content": "tencet elasticsearch service"
       }
      
       POST /my_index/_doc/2
       {
         "content": "hello es"
       }
    6. Run the following command to search using the synonym:

       GET my_index/_search
       {
         "query" : { "match" : { "content" : "es" }},
         "highlight" : {
           "pre_tags" : ["<tag1>", "<tag2>"],
           "post_tags" : ["</tag1>", "</tag2>"],
           "fields" : {"content": {}}
         }
       }

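      If the command succeeds, it returns the following result. Both documents match, including the one that contains elasticsearch rather than es: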

       {
         "took": 4,
         "timed_out": false,
         "_shards": {
           "total": 5,
           "successful": 5,
           "skipped": 0,
           "failed": 0
         },
         "hits": {
           "total": 2,
           "max_score": 0.25811607,
           "hits": [
             {
               "_index": "my_index",
               "_type": "_doc",
               "_id": "2",
               "_score": 0.25811607,
               "_source": {
                 "content": "hello es"
               },
               "highlight": {
                 "content": [
                   "hello <tag1>es</tag1>"
                 ]
               }
             },
             {
               "_index": "my_index",
               "_type": "_doc",
               "_id": "1",
               "_score": 0.25316024,
               "_source": {
                 "content": "tencet elasticsearch service"
               },
               "highlight": {
                 "content": [
                   "tencet <tag1>elasticsearch</tag1> service"
                 ]
               }
             }
           ]
         }
       }
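
    The two methods differ only in the filter definition: Method 1 sets synonyms_path, and Method 2 sets synonyms. The sketch below builds either form so the difference is easy to see side by side. The helper synonym_filter is hypothetical and exists only for illustration; requiring exactly one of the two arguments is a choice made here to mirror the two methods in this document.

```python
import json


def synonym_filter(synonyms_path=None, synonyms=None):
    """Build a synonym filter definition from a file path or an inline list."""
    if (synonyms_path is None) == (synonyms is None):
        raise ValueError("pass exactly one of synonyms_path or synonyms")
    definition = {"type": "synonym"}
    if synonyms_path is not None:
        definition["synonyms_path"] = synonyms_path  # Method 1: uploaded file
    else:
        definition["synonyms"] = list(synonyms)  # Method 2: inline rules
    return definition


print(json.dumps(synonym_filter(synonyms_path="analysis/synonym.txt")))
# {"type": "synonym", "synonyms_path": "analysis/synonym.txt"}
print(json.dumps(synonym_filter(synonyms=["elasticsearch,es => es"])))
# {"type": "synonym", "synonyms": ["elasticsearch,es => es"]}
```

    Either result can be placed under settings.index.analysis.filter.my_synonym in the index creation request.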