
Sunday 9 February 2014

ElasticSearch Highlighting


ElasticSearch Highlighting (Postings Highlighter)
ElasticSearch Highlighting (Force Highlighter Type)
ElasticSearch Highlighting (Fast vector highlighter)
ElasticSearch Highlighting (Highlighted Fragments)

Postings Highlighter

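If index_options is set to offsets in the mapping, Elasticsearch uses the postings highlighter instead of the plain highlighter.
The postings highlighter:
        1. Is faster, because it does not need to re-analyze the text being highlighted.
        2. Needs less disk space than term_vectors (which the fast vector highlighter requires).
        3. Breaks the text into sentences and highlights them.
        4. Treats the document as the whole corpus and scores individual sentences as if they were documents in that corpus (using BM25).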
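To make points 3 and 4 concrete, here is a toy Python sketch of the idea: split one field value into sentences, score each sentence with BM25 as if it were a document, and wrap the query terms of the best sentences in <em> tags. This is a simplified illustration under stated assumptions, not Elasticsearch's implementation: the tokenizer only lowercases words (Elasticsearch would use the field's analyzer, e.g. english), the sentence splitter is naive, and the names highlight, bm25_scores, K1 and B are hypothetical.

```python
import math
import re
from collections import Counter

# Toy illustration only (not Elasticsearch code): each sentence of one
# field value is treated as a "document" and scored with BM25.
K1 = 1.2   # BM25 term-frequency saturation (a common default)
B = 0.75   # BM25 length normalisation (a common default)


def tokenize(text):
    # Lowercase word tokens; Elasticsearch would use the field's analyzer.
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def bm25_scores(sentences, terms):
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n or 1.0
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in terms:
            if tf[term] == 0:
                continue
            df = sum(1 for d in docs if term in d)  # sentences containing term
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            norm = tf[term] + K1 * (1 - B + B * len(doc) / avg_len)
            score += idf * tf[term] * (K1 + 1) / norm
        scores.append(score)
    return scores


def highlight(text, query, number_of_fragments=1):
    """Return the best-scoring sentences with query terms wrapped in <em>."""
    terms = set(tokenize(query))
    sentences = split_sentences(text)
    if not sentences or not terms:
        return []
    scores = bm25_scores(sentences, terms)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    def wrap(match):
        word = match.group(0)
        return f"<em>{word}</em>" if word.lower() in terms else word

    # Keep only sentences that actually matched (score > 0).
    return [re.sub(r"\w+", wrap, sentences[i])
            for i in ranked[:number_of_fragments] if scores[i] > 0]
```

For example, with the text "The lumia is a phone. Nokia makes the lumia and the lumia sells well. Samsung makes phones." and the query "lumia", the second sentence wins because it contains the term twice, so highlight returns it as the single fragment.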

Here is an example mapping that enables the postings highlighter on the name field:

PUT method:

http://localhost:9200/products/

{
    "mappings": {
        "mobile": {
            "properties": {
                "name": {
                    "type":          "string",
                    "analyzer":      "english",
                    "index_options": "offsets"
                }
            }
        }
    }
}
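If you prefer scripting the request instead of using a REST client, the sketch below builds the same index body in Python and sends it with the standard library's urllib. It assumes a node at http://localhost:9200 as in this post and targets the Elasticsearch 1.x API used here (the "string" type and the "mobile" mapping type); the function names are hypothetical, and create_index is only defined, not called, since it needs a running node.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, as in the post


def build_products_index_body():
    """Index body for PUT /products/ with offsets stored for `name`."""
    return {
        "mappings": {
            "mobile": {
                "properties": {
                    "name": {
                        "type": "string",
                        "analyzer": "english",
                        # offsets in the postings enable the postings highlighter
                        "index_options": "offsets",
                    }
                }
            }
        }
    }


def create_index(index="products", base_url=ES_URL):
    """Send the PUT request and return the decoded JSON reply."""
    data = json.dumps(build_products_index_body()).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/{index}/",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```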

POST method:

http://localhost:9200/products/mobile/_bulk

{"index":{"_id": 1}}
{"name": "nokia lumia 510"}
{"index":{"_id": 2}}
{"name": "nokia lumia 520"}
{"index":{"_id": 3}}
{"name": "nokia lumia 625"}
{"index":{"_id": 4}}
{"name": "samsung galaxy core"}
{"index":{"_id": 5}}
{"name": "samsung galaxy s2"}
{"index":{"_id": 6}}
{"name": "samsung galaxy s4"}
{"index":{"_id": 7}}
{"name": "samsung galaxy note"}
{"index":{"_id": 8}}
{"name": "micromax canvas"}
{"index":{"_id": 9}}
{"name": "micromax bolt"}
{"index":{"_id": 10}}
{"name": "moto g"}
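The bulk body above is newline-delimited JSON: an action line ({"index": ...}) followed by the document source, one pair per document, and the whole body must end with a newline. A minimal Python sketch that generates such a body from a list of names (the helper name build_bulk_body is hypothetical):

```python
import json


def build_bulk_body(names, start_id=1):
    """Build a _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc_id, name in enumerate(names, start=start_id):
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps({"name": name}))               # document source
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string can be sent as the body of the POST to http://localhost:9200/products/mobile/_bulk.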

POST method:

http://localhost:9200/products/mobile/_search/?pretty=true

{
    "query": {
        "match": { "name": "lumia"}
    },
    "highlight": {
        "fields": {
            "name": {}
        }
    }
}
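The search body pairs a match query with a highlight section listing the fields to highlight; an empty object ({}) means default settings, which wrap matches in <em> tags. The sketch below builds that body in Python and optionally sets the highlight-level pre_tags and post_tags options to use other markup. The function name build_highlight_search is hypothetical.

```python
def build_highlight_search(field, text, pre_tags=None, post_tags=None):
    """Build a match query on `field` with highlighting enabled for it."""
    highlight = {"fields": {field: {}}}  # {} = default highlight settings
    if pre_tags is not None:
        highlight["pre_tags"] = list(pre_tags)
    if post_tags is not None:
        highlight["post_tags"] = list(post_tags)
    return {
        "query": {"match": {field: text}},
        "highlight": highlight,
    }
```

Calling build_highlight_search("name", "lumia") produces exactly the request body shown above.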

Results:
{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.9722309,
    "hits": [
      {
        "_index": "products",
        "_type": "mobile",
        "_id": "1",
        "_score": 0.9722309,
        "_source": {
          "name": "nokia lumia 510"
        },
        "highlight": {
          "name": [
            "nokia <em>lumia</em> 510"
          ]
        }
      },
      {
        "_index": "products",
        "_type": "mobile",
        "_id": "2",
        "_score": 0.9722309,
        "_source": {
          "name": "nokia lumia 520"
        },
        "highlight": {
          "name": [
            "nokia <em>lumia</em> 520"
          ]
        }
      },
      {
        "_index": "products",
        "_type": "mobile",
        "_id": "3",
        "_score": 0.9722309,
        "_source": {
          "name": "nokia lumia 625"
        },
        "highlight": {
          "name": [
            "nokia <em>lumia</em> 625"
          ]
        }
      }
    ]
  }
}
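Each hit carries its fragments under "highlight", keyed by the full field name, with a list of fragments as the value. A small Python sketch for pulling them out of a decoded response (the name extract_highlights is hypothetical). Note that a dotted name such as "file.content" is a single literal key in the response, not a nested path.

```python
import json


def extract_highlights(response, field):
    """Map each hit's _id to its highlight fragments for `field`."""
    if isinstance(response, str):  # accept the raw JSON text as well
        response = json.loads(response)
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        # `field` is the literal key under "highlight", e.g. "name".
        fragments = hit.get("highlight", {}).get(field)
        if fragments:  # skip hits without highlights for this field
            result[hit["_id"]] = fragments
    return result
```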

NOTE: I am using the Firefox REST client to run these examples.

1 comment:

  1. Hi Ashwin, thank you for your blog post on the Elasticsearch "highlight" feature. I'm having trouble retrieving values from the result below.

    [{"_index":"anil56002","_type":"client1","_id":"AVZ-V8tfNrrX44oGO5Pm","_score":0.5005603,"fields":{"name":["Client1_CRL-A-835-838-2014_1001.41.pdf"]},"highlight":{"file.content":["He was electrocuted while attending to \n\nworks in the Grape Garden of A1 at \n\nHyderabad."]}}

    I'm unable to fetch the highlighted text (highlight.file.content).
