What is the difference between Pig, Hive and HBase?
Difference between Pig, Hive and HBase
| Pig | Hive | HBase |
|---|---|---|
| Pig is used for semi-structured data. | Hive is a query engine. | HBase is a data store, particularly suited to unstructured data. |
| Pig is generally used by researchers and programmers. | Hive is mainly used for batch processing, i.e. OLAP, and for creating reports. | HBase is extensively used for transactional processing where queries need fast, interactive response times, i.e. OLTP-style workloads. |
| Pig operates on the client side of a cluster. | Hive operates on the server side of a cluster. | Operations in HBase run in real time on the database. |
| Pig supports Avro (via AvroStorage). | Hive supports Avro through its Avro SerDe. | HBase stores raw bytes, so the client reading or writing the data has to deal with the Avro schemas after HBase delivers the raw data to it. |
| Pig is a strong ETL tool for big data because of its powerful transformation and processing capabilities. | Hive is helpful for ETL. | HBase is helpful for ETL. |
| Pig Latin is a high-level language that compiles to MapReduce. | HiveQL is also a high-level language that compiles to MapReduce. | HBase allows Hadoop to support transactions on key-value pairs. |
| Pig Latin is also SQL-like but differs from SQL to a great extent, so it takes some time and effort to master. | Hive directly leverages SQL expertise and can therefore be learnt easily. | HBase allows quick random reads instead of scanning all of the data sequentially, and supports insert/update/delete in the middle of the data, not just append. |
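
The last row contrasts keyed random access with sequential scanning. A minimal Python sketch of that idea follows. It is a toy in-memory model, not the HBase client API: `ToySortedStore`, `batch_find`, and the `user#...` keys are all hypothetical names made up for illustration.

```python
import bisect


class ToySortedStore:
    """Toy in-memory stand-in for a sorted key-value store (NOT the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> value

    def put(self, key, value):
        # Insert or update in place; a new key may land in the middle of the key order.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        # Random read: direct lookup by key, without touching other rows.
        return self._rows.get(key)

    def delete(self, key):
        # Remove a row from anywhere in the key order.
        if key in self._rows:
            del self._rows[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def scan(self, start, stop):
        # Range scan over keys in [start, stop), returned in sorted order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def batch_find(records, key):
    # Batch-style access: read records one by one until the key turns up.
    for k, v in records:
        if k == key:
            return v
    return None


store = ToySortedStore()
store.put("user#001", "alice")
store.put("user#003", "carol")
store.put("user#002", "bob")        # inserted between two existing keys
store.put("user#003", "caroline")   # updated in place
store.delete("user#001")            # deleted from the front, not just appended
```

Here `get` reaches a single row directly, while `batch_find` has to walk the whole input, which is roughly the trade-off between an HBase-style lookup and a Pig or Hive batch job over files.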
