What is the difference between Pig and Sqoop in Hadoop?
Difference between Pig and Sqoop in Hadoop
| Pig | Sqoop |
|---|---|
| Apache Pig is an analytics tool used to analyze data stored in HDFS. | Apache Sqoop is a tool for importing structured data from an RDBMS into HDFS and for exporting data from HDFS back to an RDBMS. |
| Pig works on data that is already in HDFS (or in Hive tables); it does not connect to external SQL or NoSQL databases itself. | Sqoop integrates HDFS with external data sources such as SQL databases, NoSQL databases, and data warehouses; it is bi-directional, so the same tool handles both import and export. |
| Pig is typically used for ETL data pipelines and for research on raw data. | Important Sqoop control arguments for importing RDBMS data are `--append`, `--columns`, and `--where`. |
| Pig does not have a metastore of its own; through HCatalog it can use the Hive metastore, which stores table metadata (the same metastore that allows Spark SQL to query those tables). | The Sqoop metastore is a tool for hosting a shared metadata repository; multiple local and remote users can define and execute saved jobs stored in it. |
| The scalar data types in Pig are int, long, float, double, chararray, and bytearray; the complex data types are map, tuple, and bag. | Sqoop converts CHAR(x), VARCHAR(x), and NUMERIC(x,y) columns to String (with length 32767), and converts DATETIME to BIGINT. |
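The ETL use of Pig mentioned above can be sketched with a minimal script, run here through the `pig` command line. This is an illustrative example only: the input path, field names, and filter condition are assumptions, not part of the original answer.

```shell
# Hypothetical Pig ETL step: load raw CSV data from HDFS,
# keep only high-value orders, and store the result back to HDFS.
# Paths and schema are placeholders.
pig -e "
  raw  = LOAD '/data/raw/orders' USING PigStorage(',')
         AS (order_id:int, customer_id:int, amount:double);
  big  = FILTER raw BY amount > 100.0;
  STORE big INTO '/data/out/big_orders' USING PigStorage(',');
"
```

Because Pig only reads what is already in HDFS, a pipeline like this is usually fed by a prior Sqoop import.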
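The Sqoop control arguments named in the table can be seen together in a single import command. This is a sketch, not a definitive invocation: the JDBC URL, credentials, table, column names, and directories are all placeholders you would replace with your own.

```shell
# Illustrative Sqoop import (all connection details are placeholders):
#   --columns  limits the import to the named columns
#   --where    pushes a row filter down to the RDBMS
#   --append   adds the new rows to an existing HDFS dataset
#              instead of failing because the target directory exists
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --columns "order_id,customer_id,amount" \
  --where "order_date >= '2024-01-01'" \
  --append \
  --target-dir /data/raw/orders

# The bi-directional counterpart: export HDFS data back into an RDBMS table.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/out/orders_summary
```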
