Sqoop Metastore



What is Sqoop metastore?

  • Sqoop metastore is used to store Sqoop job information in a central place.
  • The Sqoop metastore makes it easier for Sqoop users and developers to collaborate. For example, user A can create a job that loads some specific data, and any other user can then access the same job from any node in the cluster and run it again.

Purpose of Sqoop metastore:

  • The metastore tool configures Sqoop to host a shared metadata repository. Multiple users and/or remote users can define and execute saved jobs (created with sqoop job) defined in this metastore.
  • Clients must be configured to connect to the metastore in sqoop-site.xml or with the --meta-connect argument.
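As a rough sketch of that workflow, the commands below save a job in a shared metastore and then list and run it, as another user on another node might. The host name hdw1.hdp.local and port 16000 are the example values used later on this page; the job name, source database URL and table are hypothetical placeholders.

```shell
#!/bin/sh
# Metastore connect string. Host and port are assumptions taken from the
# example later on this page.
META_URL="jdbc:hsqldb:hsql://hdw1.hdp.local:16000/sqoop"

# Hypothetical job name, used for illustration only.
JOB_NAME="load_orders"

if command -v sqoop >/dev/null 2>&1; then
  # User A: save an import job in the shared metastore.
  # The source database URL and table are placeholders.
  sqoop job --meta-connect "$META_URL" --create "$JOB_NAME" \
    -- import --connect jdbc:mysql://db.example.com/sales --table orders

  # Any other user, from any node: list the saved jobs and run one.
  sqoop job --meta-connect "$META_URL" --list
  sqoop job --meta-connect "$META_URL" --exec "$JOB_NAME"
else
  echo "sqoop not found on PATH; nothing was run"
fi
```

If sqoop.metastore.client.autoconnect.url is set in sqoop-site.xml, the --meta-connect argument can be left out.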

Syntax

$ sqoop metastore (generic-args) (metastore-args)
$ sqoop-metastore (generic-args) (metastore-args)
  • The Hadoop generic arguments must precede any metastore arguments, but the metastore arguments can be entered in any order with respect to one another.

How to set up Sqoop metastore

Environment

Product             Version
Pivotal HD / HDP    All supported versions
OS                  All supported versions

Purpose of setting up a Sqoop metastore:

  • This section shows how to set up a Sqoop metastore.
  • A shared metastore is especially convenient when Sqoop jobs are run from Oozie workflows.

Procedure

At a high level, the steps are:

  • Choose a server to host the Sqoop metastore. It is best to choose a master or administrative server.
  • Set up the Sqoop metastore.
  • Update the service configuration so that clients access the metastore automatically.
  • Start the Sqoop metastore.

Step 1: Choose the right server

  • It is strongly recommended to choose a master or administrative server. Slave nodes are not recommended because they are expected to be under heavy load and to fail at some point. Colocating the Sqoop metastore with the Ambari server is acceptable.

Step 2: Set up Sqoop metastore

  • First decide which user will run the metastore. It is recommended to run the metastore as the sqoop user; running it as root is strongly discouraged. Once you have decided, create the user and its home directory (if needed), and a folder to store the database (DB) files.
  • Next, configure the metastore in sqoop-site.xml. Set sqoop.metastore.server.location to the path of the database files, for example: /home/sqoop/meta-store/shared.db
  • Also set sqoop.metastore.server.port; the default, 16000, can be left as is.
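The host preparation described in the first bullet of this step could look roughly like the sketch below. It only prints the commands unless DRY_RUN is set to 0; the folder /home/sqoop/meta-store is an assumption derived from the example location above, and the run helper is a hypothetical convenience function.

```shell
#!/bin/sh
# Dry-run sketch: prints the preparation commands instead of executing them.
# Set DRY_RUN=0 and run as root to actually apply them.
DRY_RUN="${DRY_RUN:-1}"

METASTORE_USER="sqoop"                  # user recommended in the text
METASTORE_DIR="/home/sqoop/meta-store"  # assumed folder for the DB files

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the user with a home directory (omit this line if the user exists).
run useradd -m "$METASTORE_USER"
# Create the folder that will hold the database files and hand it to that user.
run mkdir -p "$METASTORE_DIR"
run chown "$METASTORE_USER" "$METASTORE_DIR"
```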

For the clients, set the following properties:

  • sqoop.metastore.client.autoconnect.url
  • sqoop.metastore.client.autoconnect.username
  • sqoop.metastore.client.autoconnect.password
  • The auto-connect URL is a connect string for an HSQL DB with the following format:
  jdbc:hsqldb:hsql://<hostname_fqdn>:<port>/sqoop
  • Here hostname_fqdn is the fully qualified host name of the server chosen in step 1, and port is the value of sqoop.metastore.server.port set above (16000 by default). For example:
jdbc:hsqldb:hsql://hdw1.hdp.local:16000/sqoop
  • The username and password can be left at their defaults.
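To make the properties from this step concrete, the sketch below prints the property elements that would go inside the configuration element of sqoop-site.xml on the metastore host. The host name, port and location are the example values from the text; the function names are hypothetical. The username and password properties are omitted because the defaults are kept.

```shell
#!/bin/sh
# Example values from the text; adjust to your cluster.
META_HOST="hdw1.hdp.local"                        # host chosen in step 1
META_PORT="16000"                                 # default port
META_LOCATION="/home/sqoop/meta-store/shared.db"  # example DB location

# Build the JDBC connect string used by clients.
metastore_url() {
  echo "jdbc:hsqldb:hsql://$1:$2/sqoop"
}

# Print the <property> elements for sqoop-site.xml on the metastore host.
print_properties() {
  cat <<EOF
<property>
  <name>sqoop.metastore.server.location</name>
  <value>$META_LOCATION</value>
</property>
<property>
  <name>sqoop.metastore.server.port</name>
  <value>$META_PORT</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>$(metastore_url "$META_HOST" "$META_PORT")</value>
</property>
EOF
}

print_properties
```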

Step 3: Update service configuration

  • It is not possible to use Ambari to configure these settings; the configuration files have to be edited by hand.

Log on to another node in the cluster and update the properties for client access:

  • sqoop.metastore.client.autoconnect.url
  • sqoop.metastore.client.autoconnect.username
  • sqoop.metastore.client.autoconnect.password
  • Do not set the server properties here. The properties sqoop.metastore.server.location and sqoop.metastore.server.port should be set only on the node running the Sqoop metastore.
  • Copy this new sqoop-site.xml file to all other nodes except the Sqoop metastore server.
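One possible way to script the copy is sketched below. It only prints the scp commands so they can be reviewed first. The node names and the path /etc/sqoop/conf/sqoop-site.xml are assumptions for illustration; check where your distribution keeps the Sqoop configuration.

```shell
#!/bin/sh
# Dry-run sketch: prints one scp command per target node.
# Host names and the configuration path are assumptions for illustration.
CONF_FILE="/etc/sqoop/conf/sqoop-site.xml"
METASTORE_HOST="hdw1.hdp.local"
NODES="hdw1.hdp.local hdw2.hdp.local hdw3.hdp.local"

copy_commands() {
  for node in $NODES; do
    # Skip the metastore server: it keeps its own file with the server properties.
    if [ "$node" = "$METASTORE_HOST" ]; then
      continue
    fi
    echo "scp $CONF_FILE $node:$CONF_FILE"
  done
}

copy_commands
```

Piping the output to sh would run the copies, provided the current user has SSH access and write permission on each node.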

Step 4: Start Sqoop metastore

  • Now we can run sudo -u sqoop sqoop-metastore to test that the server comes up successfully. Once the server comes up, it stays attached to the terminal and runs as a foreground process.
  • This is undesirable for a server process, so the next step is to start the server and leave it running in the background. There are many valid ways to do this; the one we recommend is the following:
  • Log on as the user who will run the metastore: su - sqoop
  • Change into the metastore folder.
  • Start the server process in the background, redirecting stdout and stderr to a file: nohup sqoop-metastore &>> shared.db.out &
  • To shut down the metastore gracefully at any point, run sqoop-metastore --shutdown as the user running the process.
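The start and stop commands above could be wrapped in a small script along these lines. This is a sketch under stated assumptions rather than an official service script: the folder /home/sqoop/meta-store and the function names are hypothetical, and the script is meant to be run as the sqoop user. It uses >> file 2>&1, which has the same effect as &>> but also works in shells other than bash.

```shell
#!/bin/sh
# Assumed metastore folder; adjust to your installation.
METASTORE_DIR="${METASTORE_DIR:-/home/sqoop/meta-store}"
LOG_FILE="shared.db.out"

start_metastore() {
  cd "$METASTORE_DIR" || return 1
  # Append stdout and stderr to the log file and leave the server in the background.
  nohup sqoop-metastore >> "$LOG_FILE" 2>&1 &
  echo "sqoop-metastore started with PID $!"
}

stop_metastore() {
  # Graceful shutdown; run as the same user that started the process.
  sqoop-metastore --shutdown
}

case "${1:-}" in
  start) start_metastore ;;
  stop)  stop_metastore ;;
  *)     echo "usage: $0 {start|stop}" ;;
esac
```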
