solr index hdfs files

What is best practice indexing hdfs data into solr using hive? (Wikitechy Interview Questions, Tue, 13 Jul 2021) https://www.wikitechy.com/interview-questions/hive/what-is-best-practice-indexing-hdfs-data-into-solr-using-hive/

Best practice for indexing HDFS data into Solr using Hive

The best approach depends on your requirements, especially how often your data is updated, its volume, and your architecture. Common options include:

Run a MapReduce job that indexes the data using SolrJ.
Create a Lucene index with a MapReduce job and copy it to the appropriate shards.
Use the HBase Indexer to populate Solr.
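
To illustrate the first option, here is a minimal sketch of the batching logic each indexing task might run. SolrJ is a Java client, so this Python version swaps in Solr's JSON update handler over HTTP instead. The URL, the collection name (mycollection), the field list, and the tab delimiter are all assumptions made for illustration, not values from the original answer.

```python
import json
import urllib.request
from itertools import islice

# Hypothetical values: adjust to your own cluster and schema.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"
FIELDS = ["id", "title", "body"]  # assumed column order of the exported rows


def rows_to_docs(lines, fields=FIELDS, delimiter="\t"):
    """Turn delimited text rows into Solr documents (plain dicts)."""
    for line in lines:
        values = line.rstrip("\n").split(delimiter)
        if len(values) != len(fields):
            continue  # skip malformed rows rather than fail the whole run
        yield dict(zip(fields, values))


def batches(docs, size):
    """Group documents into lists of at most `size` items."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_update_request(payload, url=SOLR_UPDATE_URL):
    """Build (but do not send) a JSON POST request for the update handler."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_lines(lines, batch_size=1000):
    """Send all rows to Solr in batches, then commit once at the end."""
    for batch in batches(rows_to_docs(lines), batch_size):
        urllib.request.urlopen(build_update_request(batch)).close()
    # A single explicit commit after the last batch.
    urllib.request.urlopen(build_update_request({"commit": {}})).close()
```

Sending documents in batches and committing once at the end is usually cheaper than committing per document. The right batch size depends on document size and cluster capacity.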

Properly Size Index:

Understanding what to index typically requires deep business domain expertise on the data.
This yields a better indexing plan and increases search accuracy.
Not all data will be indexed, but an organization with new data may need to classify all of it until it understands what value the data brings to the business.
This implies that data may need to be re-indexed, so it is good practice to store the raw data somewhere low-cost, often in HDFS or in cloud object storage.
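
The following sketch shows why keeping the raw data matters. The set of indexed fields can change as the business value of the data becomes clearer, and the documents can then be rebuilt from the raw records. The record layout and field names here are hypothetical.

```python
def project(record, indexed_fields):
    """Keep only the fields chosen for indexing; the id is always kept."""
    keep = {"id", *indexed_fields}
    return {k: v for k, v in record.items() if k in keep}


def build_docs(raw_records, indexed_fields):
    """Build index documents from raw records under a given indexing plan."""
    return [project(r, indexed_fields) for r in raw_records]


# Hypothetical raw records as they might be stored in HDFS or object storage.
raw = [
    {"id": "1", "title": "Q3 report", "body": "revenue grew", "debug_blob": "..."},
    {"id": "2", "title": "Memo", "body": "office move", "debug_blob": "..."},
]

# First plan: index titles only.
first_pass = build_docs(raw, ["title"])

# Later plan: the body turns out to be valuable, so re-index from raw data.
second_pass = build_docs(raw, ["title", "body"])
```

Because the raw records are untouched, changing the plan only means re-running the build with a different field list.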
