
Monday 10 February 2014

Fast vector highlighter

If a field's mapping sets term_vector to with_positions_offsets, Elasticsearch uses the fast vector highlighter for that field instead of the plain highlighter.

The fast vector highlighter:
            1. Is faster, especially for large fields (> 1MB).
            2. Can be customized with boundary_chars, boundary_max_scan, and fragment_offset.
            3. Requires setting term_vector to with_positions_offsets, which increases the size of the index.
            4. Can combine matches from multiple fields into one result.
            5. Has a phrase_limit parameter that prevents it from analyzing too many phrases and consuming too much memory.
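To make the options in the list more concrete, here is a minimal sketch in Python that builds a highlight section setting them explicitly. The helper name build_fvh_highlight and the values passed to it are my own illustrations, not part of Elasticsearch; check the option names and sensible values against the documentation for your version before relying on them.

```python
import json

def build_fvh_highlight(field, boundary_chars=".,!? \t\n", boundary_max_scan=20,
                        phrase_limit=256, fragment_offset=None):
    """Hypothetical helper: build a highlight section that asks for the
    fast vector highlighter ("fvh") on a single field."""
    options = {
        "type": "fvh",                           # request the fast vector highlighter
        "boundary_chars": boundary_chars,        # characters treated as fragment boundaries
        "boundary_max_scan": boundary_max_scan,  # how far to scan for a boundary character
        "phrase_limit": phrase_limit,            # cap on the number of phrases analyzed
    }
    if fragment_offset is not None:
        # Only sent when the caller asks for it (illustrative choice).
        options["fragment_offset"] = fragment_offset
    return {"fields": {field: options}}

# Same query as the search request in this post, with explicit highlighter options.
search_body = {
    "query": {"query_string": {"query": "Elasticsearch"}},
    "highlight": build_fvh_highlight("message", boundary_max_scan=30),
}

print(json.dumps(search_body, indent=4))
```

The printed JSON can be pasted into a REST client as the body of a search request. The field must already be mapped with term_vector set to with_positions_offsets for this highlighter to work.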

Here is a complete example:

PUT Method:

http://localhost:9200/contents/

{
    "mappings": {
        "message": {
            "properties": {
                "message": {
                    "type":        "string",
                    "analyzer":    "english",
                    "term_vector": "with_positions_offsets"
                }
            }
        }
    }
}

http://localhost:9200/contents/message/_bulk

{"index":{"_id": 1}}
{"message": "By combining the massively popular Elasticsearch, Logstash and Kibana we have created an end-to-end stack that delivers actionable insights in real-time from almost any type of structured and unstructured data source. Built and supported by the engineers behind each of these open source products, the Elasticsearch ELK stack makes searching and analyzing data easier than ever before."}

{"index":{"_id": 2}}
{"message": "Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine. Architected from the ground up for use in distributed environments where reliability and scalability are must haves, Elasticsearch gives you the ability to move easily beyond simple full-text search.Through its robust set of APIs and query DSLs, plus clients for the most popular programming languages, Elasticsearch delivers on the near limitless promises of search technology."}

{"index":{"_id": 3}}
{"message": "Logstash helps you take logs and other time based event data from any system and store it in a single place for additional transformation and processing.Logstash will scrub your logs and parse all data sources into an easy to read JSON format.The most popular open source logging solution in the market today, Logstash lets users get up and running in just minutes. "}
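The bulk body is newline-delimited JSON: each document takes one action line followed by one source line, and the body has to end with a newline. If you would rather generate it than type it by hand, a minimal sketch in Python could look like this. The helper name build_bulk_body and the two short sample messages are placeholders of mine, not the documents indexed above.

```python
import json

def build_bulk_body(docs):
    """Hypothetical helper: turn (id, source) pairs into a bulk request body."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # document line
    # The bulk API expects the final line to end with a newline as well.
    return "\n".join(lines) + "\n"

# Placeholder documents, shortened for readability.
docs = [
    (1, {"message": "first sample message"}),
    (2, {"message": "second sample message"}),
]

bulk_body = build_bulk_body(docs)
print(bulk_body, end="")
```

The resulting string is what gets sent to the _bulk URL shown above.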

POST Method:

http://localhost:9200/contents/message/_search/?pretty=true

{
    "query": {
        "query_string": {
            "query": "Elasticsearch"
        }
    },
    "highlight": {
        "fields": {
            "message": {}
        }
    }
}

Results:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.21650635,
    "hits": [
      {
        "_index": "contents",
        "_type": "message",
        "_id": "2",
        "_score": 0.21650635,
        "_source": {
          "message": "Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine. Architected from the ground up for use in distributed environments where reliability and scalability are must haves, Elasticsearch gives you the ability to move easily beyond simple full-text search.Through its robust set of APIs and query DSLs, plus clients for the most popular programming languages, Elasticsearch delivers on the near limitless promises of search technology."
        },
        "highlight": {
          "message": [
            "<em>Elasticsearch</em> is a flexible and powerful open source, distributed, real-time search and analytics",
            " and scalability are must haves, <em>Elasticsearch</em> gives you the ability to move easily beyond simple full-text",
            " languages, <em>Elasticsearch</em> delivers on the near limitless promises of search technology."
          ]
        }
      },
      {
        "_index": "contents",
        "_type": "message",
        "_id": "1",
        "_score": 0.17677669,
        "_source": {
          "message": "By combining the massively popular Elasticsearch, Logstash and Kibana we have created an end-to-end stack that delivers actionable insights in real-time from almost any type of structured and unstructured data source. Built and supported by the engineers behind each of these open source products, the Elasticsearch ELK stack makes searching and analyzing data easier than ever before."
        },
        "highlight": {
          "message": [
            "By combining the massively popular <em>Elasticsearch</em>, Logstash and Kibana we have created an end-to-end",
            ", the <em>Elasticsearch</em> ELK stack makes searching and analyzing data easier than ever before."
          ]
        }
      }
    ]
  }
}
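Each hit carries its fragments under highlight, keyed by field name. As a rough sketch of how a client might pull them out, assuming the response has already been parsed into a Python dict, something like the following would do. The function name collect_fragments is hypothetical, and sample_response is a trimmed copy of the results above that keeps only document 1.

```python
def collect_fragments(response, field):
    """Hypothetical helper: map each hit's _id to its highlighted fragments."""
    fragments = {}
    for hit in response.get("hits", {}).get("hits", []):
        # Hits without a highlight entry for the field get an empty list.
        fragments[hit["_id"]] = hit.get("highlight", {}).get(field, [])
    return fragments

# Trimmed copy of the response shown above (document 1 only).
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "highlight": {
                    "message": [
                        "By combining the massively popular <em>Elasticsearch</em>, Logstash and Kibana we have created an end-to-end",
                        ", the <em>Elasticsearch</em> ELK stack makes searching and analyzing data easier than ever before.",
                    ]
                },
            }
        ]
    }
}

for doc_id, frags in collect_fragments(sample_response, "message").items():
    for frag in frags:
        print(doc_id, frag)
```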

NOTE: I ran this example with the Firefox RESTClient add-on.
