Apache Solr for relational JOIN:
- Run a one-off script that performs the join against the Solr index, replacing author_id with author_name, or adding author_name as an extra field if you need both. Updating 10,000,000+ documents will take a while, but it is definitely achievable.
- After that, index new documents with both fields, doing the join one document at a time as each one is indexed.
- In general, Solr works with fully denormalized data.
- If you need more than just the authors' names, you can still denormalize the data, repeating each document once for every related author document.
- Another option is the join capability introduced in Solr 4.
- For that, you would first need at least a pipeline that indexes author information into Solr in real time.
- Alternatively, use Elasticsearch to handle the complex queries that would otherwise require JOIN-like operations in Solr.
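
The two main approaches above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names `denormalize` and `join_query` are hypothetical, the author lookup is assumed to fit in an in-memory dict, and reading from or writing back to Solr is left out entirely. The field names author_id and author_name come from the list above; the query string follows Solr's join query parser syntax and assumes author documents keyed by an `id` field.

```python
from typing import Dict, Iterable, Iterator


def denormalize(docs: Iterable[Dict[str, str]],
                authors: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield copies of docs with author_name added next to author_id.

    `authors` maps author_id -> author_name (assumed to be loaded
    beforehand from the relational source).
    """
    for doc in docs:
        enriched = dict(doc)  # copy, so the input documents stay untouched
        name = authors.get(doc.get("author_id"))
        if name is not None:
            enriched["author_name"] = name
        yield enriched


def join_query(from_field: str, to_field: str, inner_query: str) -> str:
    """Build a Solr join query string.

    Solr runs inner_query, collects the from_field values of the matches,
    and returns the documents whose to_field holds one of those values.
    """
    return "{!join from=%s to=%s}%s" % (from_field, to_field, inner_query)
```

With denormalization, each enriched document would be re-submitted to Solr, so searches on author_name need no join at query time. With the join approach, something like `join_query("id", "author_id", "author_name:Doe")` would be passed as the `q` parameter, which only works if author documents are also present in the index.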