Using Apache Solr for relational JOINs:

  • Run a one-off script that performs the join against the existing Solr index, replacing author_id with author_name, or adding author_name alongside author_id if you need both. Updating 10,000,000+ documents will take a while, but it is definitely achievable.
  • After that, index new documents with both fields, performing the join at index time, one document at a time.
  • In general, Solr works with fully denormalized data.
  • If you need more than just the authors' names, you can still denormalize the data, duplicating each document once for every related author document.
  • Another option is the JOIN capability added in Solr 4 (the join query parser).
  • You would first need, at a minimum, a pipeline through which author information is indexed into Solr in real time.
  • Alternatively, use Elasticsearch to handle complex queries involving JOIN-like operations more simply.
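
The one-off backfill and the per-document index-time join from the first two bullets could look roughly like the Python sketch below. It is a hypothetical illustration, not a prescribed tool: it assumes book documents carry an `author_id` field, that author names are available as an id-to-name mapping (for example, exported from the relational database), and it keeps `author_id` while adding `author_name`. The helper names `atomic_updates`, `batched`, and `index_new_book` are made up for this example. The payloads use Solr's JSON atomic-update `set` syntax and would be posted in batches to the collection's `/update` handler; the HTTP call is omitted, and atomic updates assume the schema meets Solr's requirements (update log enabled, other fields stored or docValues).

```python
from typing import Dict, Iterable, Iterator, List


def atomic_updates(docs: Iterable[dict], author_names: Dict[str, str]) -> Iterator[dict]:
    """Backfill: yield one Solr atomic update per document whose author is known."""
    for doc in docs:
        name = author_names.get(doc.get("author_id"))
        if name is None:
            continue  # unknown author: leave the document unchanged
        # "set" replaces the field value without resending the whole document
        yield {"id": doc["id"], "author_name": {"set": name}}


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group updates so millions of documents are sent in manageable chunks."""
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def index_new_book(book: dict, author_names: Dict[str, str]) -> dict:
    """Index-time join for one new document: add author_name before indexing."""
    enriched = dict(book)
    if book.get("author_id") in author_names:
        enriched["author_name"] = author_names[book["author_id"]]
    return enriched


if __name__ == "__main__":
    authors = {"a1": "Jane Doe", "a2": "John Roe"}
    books = [{"id": "b1", "author_id": "a1"}, {"id": "b2", "author_id": "a2"}]
    for batch in batched(atomic_updates(books, authors), size=1000):
        print(batch)  # in a real run: POST this batch to /update
```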
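
For data that goes beyond author names, the "duplicate the document per related author" approach can be sketched as follows. This is a hypothetical example: the `author_ids`, `author_country`, and `book_id` fields and the composite id scheme `<book id>_<author id>` are assumptions chosen for illustration, not a Solr convention. Keeping the original `book_id` on every copy lets a query fold the duplicates back together, for instance with Solr's result grouping or collapsing on that field.

```python
from typing import Dict, List


def flatten(book: dict, authors: Dict[str, dict]) -> List[dict]:
    """Emit one denormalized Solr document per (book, author) pair."""
    docs = []
    for author_id in book.get("author_ids", []):
        author = authors.get(author_id)
        if author is None:
            continue  # skip dangling author references
        # copy the book's own fields, minus the list of foreign keys
        doc = {k: v for k, v in book.items() if k != "author_ids"}
        doc["id"] = f"{book['id']}_{author_id}"  # unique key per pair
        doc["book_id"] = book["id"]  # original id, for grouping/collapsing
        doc["author_id"] = author_id
        # the author's fields are repeated in every copy
        doc["author_name"] = author["name"]
        doc["author_country"] = author["country"]
        docs.append(doc)
    return docs


if __name__ == "__main__":
    authors = {"a1": {"name": "Jane Doe", "country": "US"},
               "a2": {"name": "Li Wei", "country": "CN"}}
    book = {"id": "b1", "title": "Search at Scale", "author_ids": ["a1", "a2"]}
    for d in flatten(book, authors):
        print(d)
```

The trade-off is index size: a book with five authors becomes five documents, and an author change touches every copy.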
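
The Solr 4 join mentioned above is exposed as a query parser: `{!join from=F to=T}inner` runs `inner`, collects the values of field `F` from the matching documents, and returns the documents whose field `T` contains any of those values, roughly like an SQL `IN` subquery. The Python sketch below only builds the query string and mimics those semantics on in-memory dicts; the field names (`id`, `author_id`, `type`, `name`) and the `simulate_join` helper are illustrative assumptions, and a real request would send the string as the `q` parameter to Solr's `/select` handler.

```python
from typing import Callable, List
from urllib.parse import urlencode


def join_query(from_field: str, to_field: str, inner: str) -> str:
    """Build a Solr join query string: {!join from=... to=...}<inner query>."""
    return f"{{!join from={from_field} to={to_field}}}{inner}"


def simulate_join(docs: List[dict], from_field: str, to_field: str,
                  inner: Callable[[dict], bool]) -> List[dict]:
    """Mimic the join on plain dicts (single-valued fields only)."""
    # values of from_field on documents matched by the inner query
    keys = {d[from_field] for d in docs if inner(d) and from_field in d}
    # documents whose to_field holds one of those values
    return [d for d in docs if d.get(to_field) in keys]


if __name__ == "__main__":
    # books (to=author_id) whose author document (from=id) is named Jane Doe
    q = join_query("id", "author_id", 'type:author AND name:"Jane Doe"')
    print(q)
    print(urlencode({"q": q, "fl": "id,title"}))
```

Here author and book documents live in the same collection and are told apart by a `type` field; the join parser also accepts a `fromIndex` parameter for joining against another core, with restrictions that depend on the Solr version and deployment.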
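
One way to picture the real-time pipeline is a small component that remembers which books reference which author and, when an author record changes, produces Solr atomic updates for the affected books. This is a hypothetical sketch: it assumes some change feed (a database trigger, a queue consumer, or similar) calls `on_book_indexed` and `on_author_changed`, and the class, method, and field names are invented for illustration. Sending the updates to Solr is left out.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


class AuthorSync:
    """Keep denormalized author_name values current in Solr (illustrative)."""

    def __init__(self) -> None:
        self.books_by_author: DefaultDict[str, Set[str]] = defaultdict(set)
        self.author_by_book: Dict[str, str] = {}

    def on_book_indexed(self, book_id: str, author_id: str) -> None:
        """Record (or move) the book -> author relationship."""
        old = self.author_by_book.get(book_id)
        if old is not None and old != author_id:
            self.books_by_author[old].discard(book_id)  # book changed author
        self.author_by_book[book_id] = author_id
        self.books_by_author[author_id].add(book_id)

    def on_author_changed(self, author_id: str, name: str) -> List[dict]:
        """Build one atomic "set" update per book referencing this author."""
        return [
            {"id": book_id, "author_name": {"set": name}}
            for book_id in sorted(self.books_by_author.get(author_id, ()))
        ]


if __name__ == "__main__":
    sync = AuthorSync()
    sync.on_book_indexed("b1", "a1")
    sync.on_book_indexed("b2", "a1")
    print(sync.on_author_changed("a1", "Jane Q. Doe"))
```

In practice the reverse map could instead be answered by a Solr query on `author_id`; keeping it in the pipeline simply avoids a search round trip per change.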
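
If Elasticsearch is the route taken, its `join` field type and `has_parent` query are one way to model authors and books as parent and child documents. The sketch below only builds the request bodies as Python dicts; it assumes Elasticsearch 6 or later, and the field name `author_book`, the helper functions, and the relation names are hypothetical. Child documents must be indexed with routing set to the parent's id so that parent and children share a shard.

```python
# Index mapping: one join field declaring the author (parent) -> book (child) relation.
MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "title": {"type": "text"},
            "author_book": {"type": "join", "relations": {"author": "book"}},
        }
    }
}


def author_doc(name: str) -> dict:
    """Body for a parent (author) document."""
    return {"name": name, "author_book": "author"}


def book_doc(title: str, author_id: str) -> dict:
    """Body for a child (book) document; index it with routing=<author_id>."""
    return {"title": title, "author_book": {"name": "book", "parent": author_id}}


def books_by_author_query(name: str) -> dict:
    """has_parent returns child (book) documents whose parent author matches."""
    return {
        "query": {
            "has_parent": {
                "parent_type": "author",
                "query": {"match": {"name": name}},
            }
        }
    }


if __name__ == "__main__":
    print(books_by_author_query("Jane Doe"))
```

Parent/child joins in Elasticsearch carry a query-time cost of their own, so denormalizing remains worth considering there as well.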
