Difference between Pig and Sqoop in Hadoop
Pig: Apache Pig is an analytics tool used to analyze data stored in HDFS.
Sqoop: Apache Sqoop is a tool for importing structured data from an RDBMS into HDFS and for exporting data from HDFS back to an RDBMS.
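A sketch of that bidirectional flow with standard sqoop commands (the connection URL, credentials, table names, and directories below are hypothetical):

```shell
# Import a table from an RDBMS into HDFS (hypothetical MySQL host/db/table).
sqoop import \
  --connect jdbc:mysql://dbhost/corp \
  --username analyst -P \
  --table employees \
  --target-dir /data/employees

# Export processed results from HDFS back into an RDBMS table.
sqoop export \
  --connect jdbc:mysql://dbhost/corp \
  --username analyst -P \
  --table employee_summary \
  --export-dir /data/employee_summary
```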
Pig: Pig itself does not move data between external systems and HDFS; it analyzes data that is already stored there.
Sqoop: Sqoop integrates HDFS with external SQL data sources (relational databases and, via JDBC, data warehouses). It can import data from SQL databases (for example into Hive) but not from NoSQL databases, and it works bidirectionally, since the same tool also exports data back out.
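For example, a SQL table can be pulled straight into a Hive table in one step (database and table names hypothetical):

```shell
# Import from a SQL database directly into Hive.
sqoop import \
  --connect jdbc:mysql://dbhost/corp \
  --username analyst -P \
  --table orders \
  --hive-import \
  --hive-table corp.orders
```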
Pig: Pig is used for purposes such as ETL data pipelines and research on raw data.
Sqoop: Important Sqoop control arguments for importing RDBMS data are --append, --columns, and --where.
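The three control arguments can be combined in a single import, as in this sketch (connection details and names hypothetical):

```shell
# Import restricted to chosen columns and rows, appending to an
# existing target directory instead of failing if it exists.
sqoop import \
  --connect jdbc:mysql://dbhost/corp \
  --username analyst -P \
  --table employees \
  --columns "id,name,salary" \
  --where "salary > 50000" \
  --append \
  --target-dir /data/employees
```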
Pig: Pig has no metastore of its own; the table definitions in question live in the Hive metastore, which Pig can access through HCatalog. Spark SQL can also interact with the Hive metastore, so tables defined there can be queried from Spark.
Sqoop: The Sqoop metastore hosts a shared metadata repository; multiple users, including remote users, can define saved jobs in the metastore and execute them.
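A sketch of the saved-job workflow (metastore host and job name hypothetical; 16000 is the default Sqoop metastore port):

```shell
# Define a saved job in a shared Sqoop metastore.
sqoop job \
  --meta-connect jdbc:hsqldb:hsql://metahost:16000/sqoop \
  --create daily_employees \
  -- import \
  --connect jdbc:mysql://dbhost/corp \
  --table employees \
  --target-dir /data/employees

# Any user with access to the metastore can list and run the job.
sqoop job --meta-connect jdbc:hsqldb:hsql://metahost:16000/sqoop --list
sqoop job --meta-connect jdbc:hsqldb:hsql://metahost:16000/sqoop --exec daily_employees
```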
Pig: The scalar data types in Pig are int, long, float, double, chararray, and bytearray; the complex data types are map, tuple, and bag.
Sqoop: Sqoop basically converts CHAR(x), VARCHAR(x), and NUMERIC(x,y) to string (with length 32767), and it converts DATETIME to BIGINT.
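A minimal Pig Latin sketch (file and field names hypothetical) showing the scalar types and a map in a schema, used in a simple ETL-style pipeline:

```shell
pig -x local <<'EOF'
-- Load raw data with an explicit schema using scalar types
-- (int, long, double, chararray) and the complex map type.
raw = LOAD 'input/events.csv' USING PigStorage(',')
      AS (id:int, ts:long, score:double, name:chararray, props:map[]);

-- Transform step: keep only high-scoring rows.
good = FILTER raw BY score > 0.5;

-- GROUP produces a bag of tuples per key, i.e. the other complex types.
by_name = GROUP good BY name;

STORE by_name INTO 'output/by_name';
EOF
```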