Elasticsearch _routing for Parent-Child Documents

Posted by Hariharan Vadivelu on

Parent-child and nested documents are among the most powerful features of ES, and a key reason it ranks above Solr. Unfortunately, there isn't enough documentation on design guidelines and usage. In this post I explore the importance of defining "_routing" when defining multi-level child documents. When defining a parent-child relationship, it is important to consider the following.

1. For a search to return the desired results, parent and child documents must reside in the same shard. This is really the secret sauce of ES parent-child: you simply can't have a parent and its children in shards that may end up on different nodes, as it won't perform well.

2. Elasticsearch automatically routes a child to the same shard as its parent, based on the parent's ID, only when there is a single level of parent-child relationship.

If you create a relationship with a multi-level hierarchy (parent, child, grandchild, and so on), it is important that you define the routing field. Elasticsearch does not automatically ensure that more than one level of the hierarchy is stored in the same shard. To solve this, we need to use the special field "_routing".
Its value should match the unique identifier of the top-level parent document.
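
To see why the third level can drift to another shard, here is a toy sketch of hash-based shard selection. It assumes a shard is picked as hash(routing value) modulo the shard count, uses cksum as a stand-in hash (it is not the hash function Elasticsearch uses), and assumes 5 shards. EI01Y and 0001 are the IDs from the test query later in this post; V1 is a hypothetical grandchild ID.

```shell
#!/bin/sh
# Toy stand-in for shard selection: hash the routing value, then take it
# modulo the number of shards. cksum is only an illustration; it is NOT
# the hash function Elasticsearch uses.
shard_for() {
    routing_value=$1
    num_shards=$2
    crc=$(printf '%s' "$routing_value" | cksum | cut -d ' ' -f 1)
    echo $(( crc % num_shards ))
}

NUM_SHARDS=5   # assumed shard count

# Default behaviour: each document is routed by its direct parent's ID.
echo "product    EI01Y, routed by its own id    -> shard $(shard_for EI01Y "$NUM_SHARDS")"
echo "attributes 0001,  routed by parent EI01Y  -> shard $(shard_for EI01Y "$NUM_SHARDS")"
echo "attrvalues V1,    routed by parent 0001   -> shard $(shard_for 0001 "$NUM_SHARDS")"

# Explicit routing set to the top-level ID: every level hashes the same value.
echo "attrvalues V1,    routed by EI01Y         -> shard $(shard_for EI01Y "$NUM_SHARDS")"
```

The first two lines always agree because they hash the same value. The third hashes a different value (the child's ID), so it only lands on the same shard by luck. The last line agrees again because the routing value is pinned to the top-level ID.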

3. Another cool feature of ES parent-child is that you can load a child independently of its parent; a child document can be loaded whether or not its parent exists yet.

With this configuration, you are always guaranteed that every child and sub-child document of a root parent document will reside within the same shard.

4. To test that your related documents are ending up in the same shard, run the index status query after you load your first set of documents with a related parent:
 curl -XGET 'http://localhost:9200/parentchild/_status?pretty=1'

You should see all the documents loading into one particular shard. You can validate this by checking num_docs before and after loading the documents.

  docs: {
      num_docs: 2,
      max_doc: 2,
      deleted_docs: 0
  }

Here are a bad and a good example of the search issues noticed when _routing is not defined.

1. Here is an example of the incorrect setup (without _routing defined).

Now run the test query. It filters the immediate parent records (attributes) by the leaf-level child (attrvalues), then applies that result to filter the top-level parent documents (product).

curl -XPOST 'http://localhost:9200/parentchild/_search?pretty=true' -d '{
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "must": [{"term": {"catentry_id" :"EI01Y" }}]
                }
            },
            "filter": {
                "has_child": {
                    "type": "attributes",
                    "query": {
                        "filtered": {
                            "query": {
                                "bool": {
                                    "must": [{"term": {"attribute_id": "0001"}}]
                                }
                            },
                            "filter": {
                                "has_child": {
                                    "type": "attrvalues",
                                    "query": {
                                        "term": {
                                           "stringvalue": "test"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}'


2. Here is an example of the correct setup (with _routing defined).
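
A minimal sketch of what routed indexing could look like, not the exact commands used for this post. It assumes the `parent` and `routing` URL parameters of the index API, the same Elasticsearch generation as the queries above, and that the product's document ID equals its catentry_id. The IDs A1 and V1 are hypothetical. The script only prints the curl commands (a dry run), so it can be inspected before anything is sent to a cluster.

```shell
#!/bin/sh
# Dry-run sketch: build index commands where every level below the root
# carries routing=<top-level product id>. IDs A1 and V1 are hypothetical.
ES_URL='http://localhost:9200'
INDEX='parentchild'

# Build the index URL; parent and routing are optional (pass "" to skip).
index_url() {
    doc_type=$1; doc_id=$2; parent_id=$3; routing=$4
    url="$ES_URL/$INDEX/$doc_type/$doc_id"
    sep='?'
    if [ -n "$parent_id" ]; then url="$url${sep}parent=$parent_id"; sep='&'; fi
    if [ -n "$routing" ]; then url="$url${sep}routing=$routing"; fi
    echo "$url"
}

# Print the curl command instead of running it.
index_doc() {
    echo "curl -XPUT '$(index_url "$1" "$2" "$3" "$4")' -d '$5'"
}

# Top-level parent: no parent, no explicit routing.
index_doc product EI01Y "" "" '{"catentry_id": "EI01Y"}'
# Child: parent is the product; routing is the product id.
index_doc attributes A1 EI01Y EI01Y '{"attribute_id": "0001"}'
# Grandchild: parent is the child, but routing is still the product id.
index_doc attrvalues V1 A1 EI01Y '{"stringvalue": "test"}'
```

The point to notice is the grandchild line: its parent is A1, but its routing value is still the product ID, so all three levels share one routing value.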




10 comments:

  1. Hi Hari,

    I am using parent child mapping in ES. Is there any query available which will fetch both matching parent and child type?

    Rishav

    1. Unfortunately, the answer is that it cannot be done in a single query.

    2. Thanks Hari for your reply. We are developing an Elasticsearch application on top of normalized RDBMS data, and the customer is looking for a parent-child display in the output. Any suggestions from your side on implementing this would be highly appreciated.

      Rishav

  2. Hey Hari,
    Great post. I was wondering do you have any example of setting up parent/child mapping using jdbc-rivers?
    Thanks
