What is the Relationship Between Hadoop, HBase and Hive?

Hadoop HBase And Hive

Hadoop is an open-source software stack that runs on a cluster of machines.
It provides distributed storage and distributed processing for very large data sets.

It has the following two core components:

The Hadoop Distributed File System (HDFS) is a Java-based distributed file system that stores big data across multiple nodes in a Hadoop cluster.
When you install Hadoop, you get HDFS as the underlying storage system for large data sets in a distributed environment.
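To make the idea of storing one file across multiple nodes more concrete, here is a minimal Python sketch. It is a toy model, not HDFS code: the function name `plan_blocks` and the round-robin placement are assumptions made for illustration, and real HDFS uses a rack-aware placement policy. The 128 MB block size and replication factor of 3 are the usual defaults in recent Hadoop versions, but both are configurable.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, a common HDFS default (configurable)
REPLICATION = 3                 # common default replication factor (configurable)


def plan_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file of file_size bytes into blocks and assign replicas to nodes.

    Hypothetical helper: placement is simple round-robin, which is NOT
    how real HDFS chooses DataNodes.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size / block_size)
    plan = []
    for block_id in range(num_blocks):
        # Each block is copied to `replication` distinct nodes.
        replicas = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        plan.append((block_id, replicas))
    return plan


if __name__ == "__main__":
    # A 300 MB file on a hypothetical four-node cluster.
    for block_id, replicas in plan_blocks(300 * 1024 * 1024, ["n1", "n2", "n3", "n4"]):
        print(block_id, replicas)
```

Because every block lives on several nodes, the loss of a single machine does not lose the file.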

MapReduce is a programming framework for writing applications that process massive amounts of structured and unstructured data in parallel across a cluster of thousands of machines, in a reliable, fault-tolerant manner.
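The MapReduce model itself can be sketched in a few lines of plain Python. The example below runs the map, shuffle and reduce phases of a word count in a single process. It does not use the Hadoop API, and the function names are chosen only for this illustration; on a real cluster each phase would run in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases one after another, in one process."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores big data", "Hive queries big data"]))
```

The map and reduce functions never share state, which is what lets a framework such as Hadoop spread them over many nodes and rerun a failed task safely.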

Apache HBase is the Hadoop database: a distributed, scalable, column-oriented big data store.
HBase is built on top of HDFS, which means that the data you store in HBase is ultimately stored in HDFS.
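The column-oriented model can be pictured as a nested map: a row key points to cells, each cell is identified by a column family and a column qualifier, and every cell keeps timestamped versions of its value. The following Python class is only a toy model of that logical layout. `ToyTable` and its methods are hypothetical names, not the HBase client API, and nothing is written to HDFS here.

```python
class ToyTable:
    """In-memory toy model of an HBase-style table (illustration only)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        # row key -> (family, qualifier) -> {timestamp: value}
        self.rows = {}

    def put(self, row_key, family, qualifier, value, timestamp):
        """Store one version of a cell."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cells = self.rows.setdefault(row_key, {})
        versions = cells.setdefault((family, qualifier), {})
        versions[timestamp] = value

    def get(self, row_key, family, qualifier):
        """Return the newest version of a cell."""
        versions = self.rows[row_key][(family, qualifier)]
        return versions[max(versions)]


if __name__ == "__main__":
    table = ToyTable(["info"])
    table.put("user1", "info", "city", "Chennai", timestamp=1)
    table.put("user1", "info", "city", "Mumbai", timestamp=2)
    print(table.get("user1", "info", "city"))
```

Qualifiers can differ from row to row within a family, so rows in the same table do not need to share the same set of columns.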

Hive is an important tool in the Hadoop ecosystem. It provides an SQL-like language (HiveQL) for querying data in HDFS, in other file systems that integrate with Hadoop, such as MapR-FS and Amazon’s S3, and in databases such as HBase (the Hadoop database) and Cassandra.
Like HBase, Hive stores its data in HDFS, but it also uses MapReduce: it compiles queries into MapReduce jobs and runs them on the cluster. It was one of the first abstraction engines built on top of MapReduce. Hive also needs a metastore (a JDBC-compliant RDBMS) to store its metadata.
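To see what turning a query into a MapReduce job means, consider the query shown in the comment below. The Python sketch works out the same answer with a map step and a reduce step. It is a conceptual illustration only: the table, the rows and the function names are made up for this example, and Hive's real query planner is far more involved.

```python
from collections import defaultdict

# The kind of query a user would write in Hive (shown only as a comment):
#   SELECT dept, SUM(salary) FROM employees GROUP BY dept;

# Hypothetical sample rows standing in for a table stored in HDFS.
EMPLOYEES = [
    {"name": "asha", "dept": "eng", "salary": 120},
    {"name": "ravi", "dept": "eng", "salary": 100},
    {"name": "mina", "dept": "ops", "salary": 90},
]


def map_row(row):
    """Map: the GROUP BY column becomes the key, the summed column the value."""
    return row["dept"], row["salary"]


def run_group_by_sum(rows):
    """Shuffle the mapped pairs by key, then reduce each group with SUM."""
    grouped = defaultdict(list)
    for row in rows:
        key, value = map_row(row)
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


if __name__ == "__main__":
    print(run_group_by_sum(EMPLOYEES))
```

The user writes only the query, and the engine generates and runs the equivalent job on the cluster.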
