Difference between Hive and Pig

Hive Pig
Hive is generally used by data analysts. Pig is generally used by researchers and programmers.
Hive is used for structured data. Pig is used for semi-structured data.
Hive has a declarative, SQL-like language (HiveQL). Pig has a procedural data flow language (Pig Latin).
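The declarative versus data flow contrast can be sketched in Python. This is only an analogy, not real Hive or Pig: the standard-library sqlite3 module stands in for HiveQL, a step-by-step pipeline stands in for Pig Latin, and the visits data and function names are made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (user, page, seconds_spent)
visits = [
    ("alice", "home", 30),
    ("bob", "home", 10),
    ("alice", "cart", 45),
    ("bob", "cart", 5),
    ("carol", "home", 20),
]


def declarative_totals(rows):
    """State WHAT is wanted in a single SQL query (the HiveQL style)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (user TEXT, page TEXT, seconds INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT user, SUM(seconds) FROM visits "
        "WHERE seconds >= 10 GROUP BY user ORDER BY user"
    ).fetchall()
    conn.close()
    return result


def dataflow_totals(rows):
    """Spell out HOW, one named step at a time (the Pig Latin style)."""
    # Step 1: keep only visits of at least 10 seconds (like FILTER ... BY).
    filtered = [r for r in rows if r[2] >= 10]
    # Step 2: collect the durations per user (like GROUP ... BY).
    grouped = defaultdict(list)
    for user, _page, seconds in filtered:
        grouped[user].append(seconds)
    # Step 3: sum each group (like FOREACH ... GENERATE SUM).
    totals = [(user, sum(secs)) for user, secs in grouped.items()]
    # Step 4: sort by user (like ORDER ... BY).
    return sorted(totals)
```

Both functions return the same totals per user. The difference is that the first leaves the execution plan to the engine, while the second fixes the order of the steps in the script.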
Hive provides a Thrift-based server that lets remote clients send queries directly to the Hive server, which executes them. This feature is not available with Pig.
Hive directly leverages SQL expertise and can therefore be learnt easily. Pig Latin resembles SQL in places but differs from it considerably, so it takes some time and effort to master.
Early versions of Hive did not support Avro; later releases added it through the AvroSerDe. Pig supports Avro through AvroStorage.
Hive operates on the server side of a cluster. Pig operates on the client side of a cluster.
Hive is mainly used for creating reports. Pig is mainly used for programming.
Hive is helpful for ETL. Pig is a great ETL (Extract, Transform and Load) tool for big data because of its powerful transformation and processing capabilities.
Hive uses a variant of the SQL DDL language: tables are defined beforehand and the schema details are stored in a separate database (the metastore). In Pig there is no dedicated metadata database; schemas and data types are defined in the script itself.
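A rough Python sketch of the two schema approaches follows. It is an analogy only: the CATALOG dictionary is a hypothetical stand-in for a metadata database, and load_as, read_table and the sample data are invented names, not Hive or Pig APIs.

```python
import csv
import io

# Hypothetical raw file contents: id, name, score
RAW = "1,alice,30.5\n2,bob,12.0\n"


def load_as(raw, schema):
    """Pig-like: the script supplies field names and types at load time."""
    rows = csv.reader(io.StringIO(raw))
    return [
        {name: cast(value) for (name, cast), value in zip(schema, row)}
        for row in rows
    ]


# Hive-like: the schema is registered once, ahead of time, in a catalog
# that lives outside any individual query.
CATALOG = {"users": [("id", int), ("name", str), ("score", float)]}


def read_table(catalog, table, raw):
    """Look the schema up by table name instead of restating it."""
    return load_as(raw, catalog[table])
```

With the catalog, every query against "users" shares one definition. With the inline approach, each script carries its own schema and can read the same file differently.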
Hive supports partitions, so a query can process only a subset of the data, for example by date or by alphabetical range. Pig has no notion of partitions, though a similar result can be achieved with filters.
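Why partitions and filters are not equivalent in cost can be sketched in Python. This is a simplified model under stated assumptions, not how either tool is implemented: the records, the dictionary-based layout and the function names are all hypothetical, and each read function also reports how many rows it had to touch.

```python
from collections import defaultdict

# Hypothetical log records: (date, message)
records = [
    ("2024-01-01", "start"),
    ("2024-01-01", "login"),
    ("2024-01-02", "purchase"),
    ("2024-01-03", "logout"),
]


def build_partitions(rows):
    """Group rows by date once, up front, as a partitioned table would."""
    partitions = defaultdict(list)
    for date, message in rows:
        partitions[date].append(message)
    return partitions


def read_partition(partitions, date):
    """Partitioned read: only the rows stored under the requested date are touched."""
    rows = partitions.get(date, [])
    return list(rows), len(rows)


def read_with_filter(rows, date):
    """Filter-based read: every row is scanned and non-matching rows are dropped."""
    scanned = 0
    matches = []
    for row_date, message in rows:
        scanned += 1
        if row_date == date:
            matches.append(message)
    return matches, scanned
```

Both reads return the same messages for a given date, but the filter version scans all four records while the partitioned version touches only the matching ones.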
Pig provides users with sample data for each scenario and each step through its ILLUSTRATE operator. This feature is not available in Hive.
