Parent-child and nested documents are among the most powerful features of Elasticsearch (ES), and a key reason it ranks above Solr. Unfortunately, there isn't much documentation on design guidelines and usage. In this blog I explore the importance of defining "_routing" when setting up multi-level child documents. When defining a parent-child relationship, it is important to consider the following.
1. For searches to return the desired results, parent and child documents must reside in the same shard. This is really the secret sauce of ES parent-child: you simply can't have a parent and its children in different shards, which can end up on different nodes, as it won't perform well.
2. Only when you have a single level of parent-child relationship does Elasticsearch automatically route the child to the same shard as its parent, using the parent's id as the routing value.
If you create a relationship with a multi-level hierarchy, such as Parent, Child, Grandchild and so on, it is important that you define the routing yourself. Elasticsearch does not automatically ensure that more than one level of the hierarchy is stored in the same shard. To solve this, we need to use the special field "_routing".
The value of this field should match the unique identifier of the top-level parent document.
With this configuration, you are guaranteed that every child and sub-child document of a root parent document will reside in the same shard.
3. Another cool feature of ES parent-child is that you can load a child independently of its parent. The parent does not have to exist when the child documents are loaded.
4. One way to test that your related documents end up in the same shard is to run the index status query after you load your first set of documents with a related parent:
curl -XGET 'http://localhost:9200/parentchild/_status?pretty=1'
You should see all the documents land in one particular shard. This can be validated by comparing num_docs before and after loading the documents.
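To see why the routing value matters for a three-level hierarchy, here is a toy simulation in Python. It assumes a shard is chosen as a hash of the routing value modulo the number of primary shards. The crc32 hash, the five-shard count, and the document ids are stand-ins chosen for illustration, not what Elasticsearch uses internally.

```python
import zlib
from collections import Counter

NUM_SHARDS = 5  # assumed number of primary shards


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Toy shard selection: hash(routing) % number of primary shards.

    crc32 is only a deterministic stand-in for the real hash function.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


# Hypothetical three-level hierarchy: product -> attributes -> attrvalues.
DOCS = [
    {"id": "EI01Y", "parent": None, "root": "EI01Y"},    # product (root)
    {"id": "0001", "parent": "EI01Y", "root": "EI01Y"},  # attributes (child)
    {"id": "v1", "parent": "0001", "root": "EI01Y"},     # attrvalues (grandchild)
]


def default_routing(doc):
    """Without explicit routing: route by the immediate parent's id, else own id."""
    return doc["parent"] or doc["id"]


def root_routing(doc):
    """With explicit routing: always route by the top-level parent's id."""
    return doc["root"]


def docs_per_shard(docs, routing_fn):
    """Count how many documents each shard would receive."""
    return Counter(shard_for(routing_fn(d)) for d in docs)


print("default routing:", dict(docs_per_shard(DOCS, default_routing)))
print("root routing:   ", dict(docs_per_shard(DOCS, root_routing)))
```

With root routing every document hashes the same value, so a single shard receives all three documents and only its document count grows. With the default rule the grandchild is routed by its immediate parent's id ("0001" here), so it can land on a different shard than the product.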
Here are a bad and a good example, showing the issues noticed with search when _routing is not defined.
1. Here is an example of the incorrect setup, without _routing defined.
Now run the test query. It first selects the immediate parents (attributes) that have a matching leaf-level child (attrvalues), then applies that result to filter the top-level parent documents (product).
curl -XPOST 'http://localhost:9200/parentchild/_search?pretty=true' -d '{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [{"term": {"catentry_id": "EI01Y"}}]
        }
      },
      "filter": {
        "has_child": {
          "type": "attributes",
          "query": {
            "filtered": {
              "query": {
                "bool": {
                  "must": [{"term": {"attribute_id": "0001"}}]
                }
              },
              "filter": {
                "has_child": {
                  "type": "attrvalues",
                  "query": {
                    "term": {
                      "stringvalue": "test"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}'
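The two-level nesting in this query follows one repeating pattern: a term match wrapped in a "filtered" query whose filter is a has_child. As a rough sketch, the same request body can be assembled in Python. The helper names `term` and `filtered_has_child` are made up for this illustration; the field names and values are the ones from the query above.

```python
import json


def term(field, value):
    """Build a single term query clause."""
    return {"term": {field: value}}


def filtered_has_child(must_term, child_type, child_query):
    """Wrap a term match in a 'filtered' query whose filter is has_child."""
    return {
        "filtered": {
            "query": {"bool": {"must": [must_term]}},
            "filter": {
                "has_child": {"type": child_type, "query": child_query}
            },
        }
    }


# Innermost level: attributes that have a matching attrvalues child.
inner = filtered_has_child(
    term("attribute_id", "0001"), "attrvalues", term("stringvalue", "test")
)

# Outer level: top-level documents that have a matching attributes child.
body = {
    "query": filtered_has_child(term("catentry_id", "EI01Y"), "attributes", inner)
}

print(json.dumps(body, indent=2))
```

Building the body this way makes it easier to add another level of hierarchy without miscounting braces.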
2. Here is an example of the correct setup, with _routing defined.
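As a hedged illustration of routed indexing (not the original example from this post), the sketch below builds index request URLs in which every level of the hierarchy carries the same routing value: the id of the top-level document. It assumes the index API's `parent` and `routing` query parameters. The document ids ("EI01Y", "0001", "v1") and the helper `index_url` are hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://localhost:9200/parentchild"
ROOT_ID = "EI01Y"  # hypothetical id of the top-level product document


def index_url(doc_type, doc_id, parent=None, routing=None):
    """Build the URL for an index request with optional parent and routing."""
    params = {}
    if parent is not None:
        params["parent"] = parent
    if routing is not None:
        params["routing"] = routing
    url = "{}/{}/{}".format(BASE, doc_type, doc_id)
    return url + ("?" + urlencode(params) if params else "")


# Every level routes on the root id, so all three documents share a shard.
requests_to_send = [
    index_url("product", ROOT_ID, routing=ROOT_ID),
    index_url("attributes", "0001", parent=ROOT_ID, routing=ROOT_ID),
    index_url("attrvalues", "v1", parent="0001", routing=ROOT_ID),
]

for url in requests_to_send:
    print("PUT", url)
```

The grandchild still names its immediate parent ("0001") in `parent`, but its `routing` is the root id rather than the parent's id. The same routing value would also need to be supplied when fetching or updating these documents by id.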
Hi Hari,
I am using parent-child mapping in ES. Is there any query available which will fetch both the matching parent and child types?
Rishav
Unfortunately, the answer is that it cannot be done in a single query.
Thanks Hari for your reply. We are developing an Elasticsearch application on top of normalized RDBMS data, and the customer is looking for a parent-child display in the output. Any suggestion from your side on implementing this would be highly appreciated.
Rishav
Hey Hari,
Great post. I was wondering, do you have any example of setting up parent/child mapping using jdbc-rivers?
Thanks