Difference between Pig, Hive and HBase

Pig Hive Hbase
It is used for semi structured data. Hive is query engine HBase is a data storage particularly for unstructured data.
Pig Hadoop Component is generally
used by Researchers and Programmers.
Apache Hive is mainly used for
batch processing i.e. OLAP
and creating reports.
HBase is extensively used for transactional processing where in the response time of the query is not highly interactive i.e. OLTP
Pig Hadoop Component operates on
the client side of any cluster.
Hive Hadoop Component operates on
the server side of any cluster.
Operations in HBase are run in real-time on the database
Avro supported for Pig. Hive does not support Avro. The client which is reading/writing the data has to deal with the avro schemas, after HBase delivered the raw data to it.
Pig Hadoop is a great ETL tool
for big data because of its powerful
transformation and processing
capabilities.
Hive Hadoop Component is helpful
for ETL.
Hbase Component is helpful for ETL.
Pig are high-level languages that
compile to MapReduce.
Hive is also a high-level languages
that compile to MapReduce.
HBase allows Hadoop to support the transactions on key value pairs.
Pig is also SQL-like but varies to
a great extent and thus it will
take some time efforts to master Pig.
Hive directly leverages SQL expertise
and thus can be learnt easily.
HBase allows you to do quick random versus scan all of data sequentially, do insert/update/delete from middle, and not just add/append.

Categorized in:

Tagged in:

, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,