Sqoop Metastore:

  • Sqoop metastore is an internally implemented database that can hold the definition of a given job.
  • It includes details like which database,which table,which approach,what query to extract from,what target to write to and so on.
  • Consider an array of SQL databases that have the same structure, but are partitioned by for performance reasons.
  • Thus we have the same sqoop job running against multiple database instances at the same time, all unified at a single table in Hadoop.
  • In order to get parallelization, we generate a job that manages the Sqoop sub-workflows. This results in 500 distinct sqoop jobs.
  • Metastore is used to handle these complexities.
overview of sqoop metastore

Categorized in:

Tagged in:

, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,