sqoop - Creating Hive Partitioned Tables Using Sqoop - apache sqoop - sqoop tutorial - sqoop hadoop



How to create Hive partitioned tables using Sqoop?

  • Sqoop is used to bring data from RDBMS, but there is a limitation of sqoop is that data which is stored in HDFS is stored in one folder.
  • If there is a partitioned table needs to be created in Hive for further queries, then the users need to create Hive script to distribute data to the appropriate partitions.
  • Hence there is no direct option of creating partition tables which is done based on Hive which is done directly from sqoop.
  • We can use sqoop features of putting the output in a specific directory to simulate a partitioned table structure which is done in HDFS.
  • If any partitioned table have a HDFS structure where each partition is <table name>/<partition column name=value> hence, we can use the following sqoop structure to select appropriate data which is done for each partition and we need to move it to the correct HDFS structure.

Syntax:

sqoop --table <table1> --where <where clause for pt=0>--target-dir /home/user1/table1/pt=0 
sqoop --table <table1> --where <where clause for pt=1> --target-dir /home/user1/table1/pt=1
Click "Copy code" button to copy into clipboard - By wikitechy - sqoop tutorial - team
  • Now, we need an external HIVE table which can be created which is pointing to /home/user1/table1 directory with which is done with partition column as pt.
CREATE EXTERNAL TABLE <hive_table_name>
--Column definitions---
PARTITIONED BY (pt string)
LOCATION '/home/user1/table1'
Click "Copy code" button to copy into clipboard - By wikitechy - sqoop tutorial - team

Example

sqoop to hive partition

Step 1

  • create table in mysql with 4 fields (id,name,age,sex)
CREATE TABLE mon2 ( id int, name varchar(43) , age int, sex varchar(334))
Click "Copy code" button to copy into clipboard - By wikitechy - sqoop tutorial - team

Step 2

  • Insert data into mysql table using csv abc.csv
001,Anu,23,f
002,Preethi,32,f
003,Vincent,43,m
004,Barbie,23,f
005,Sandy,32,m
006,Arnold,43,m
mysql> source location_of_your_csv/abc.csv
Click "Copy code" button to copy into clipboard - By wikitechy - sqoop tutorial - team

Advantages

  • It is said to be Independent of the source table partition structure. Hence the Source table may or may not be partitioned.
  • It can be extended to use cases where the Hive partitioning is done based on the multiple columns.
  • Hive table partitioning scheme which is given can be different from the source table partitioning scheme.
  • Multiple sqoop are required the Hive table creation script which can be combined in one script to allow creation of any Hive partitioned table which is done from RDBMS.
  • It will allow us to get data to HDFS in a structure which is appropriate for creating partitioned HIVE table.

Related Searches to Creating Hive Partitioned Tables Using Sqoop