Apache Hive - Hive Introduction

Big Data is a collection of large datasets characterized by huge volume, high velocity, and a wide variety of data.
It is difficult to process Big Data using traditional data management systems.
Hadoop is a framework introduced by the Apache Software Foundation to solve Big Data management and processing challenges.

What is Hadoop?

Hadoop is an open-source framework that stores and processes Big Data in a distributed environment.
It contains two core modules: MapReduce and the Hadoop Distributed File System (HDFS).

What is MapReduce: It is a parallel programming model for processing and generating large datasets with a parallel, distributed algorithm on a cluster.

What is HDFS: The Hadoop Distributed File System is the part of the Hadoop framework used to store the datasets that Hadoop processes. It provides a fault-tolerant file system that runs on commodity hardware.
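
The MapReduce model described above can be illustrated with a small word count. This is only a single-process sketch in plain Python, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hive runs on hadoop", "hadoop stores data in hdfs"]))
```

Each map call only sees one line and each reduce call only sees one key, which is what allows the real framework to spread the work across a cluster.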

The Hadoop ecosystem includes sub-projects (tools) such as Sqoop, Pig, and Hive that support the core Hadoop modules.

What is Sqoop: It is used to import and export data between HDFS and relational databases (RDBMS).
What is Pig: It is a procedural language platform used to develop scripts for MapReduce operations.
What is Hive: It is a data warehouse platform used to develop SQL-type scripts for MapReduce operations.

What is Hive?

Hive is a data warehouse infrastructure tool built on top of Hadoop to process structured data. It summarizes Big Data and makes querying and analyzing large amounts of data easy.
Hive was initially developed by Facebook. Later, the Apache Software Foundation took it up and developed it further as the open-source project Apache Hive. It is now used by many different companies.

Hive is not

A relational database
A design for OnLine Transaction Processing (OLTP)
A language for real-time queries and row-level updates

Features of Hive

It stores the schema in a database and the processed data in HDFS (Hadoop Distributed File System).
It is designed for OLAP (Online Analytical Processing).
It provides an SQL-type language for querying data, called HiveQL or HQL (Hive Query Language).
It is familiar, fast, scalable, and extensible.
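
The first feature, keeping the schema apart from the data files, can be sketched as follows. This is a toy illustration under stated assumptions, not Hive's actual storage code: METASTORE, RAW_FILE, and read_table are hypothetical names, an in-memory dictionary stands in for the schema database, and a tab-delimited string stands in for a file in HDFS.

```python
import csv
import io

# Hypothetical stand-in for the schema database: column names, types, delimiter.
METASTORE = {
    "employees": {
        "columns": [("id", int), ("name", str), ("salary", float)],
        "delimiter": "\t",
    }
}

# Hypothetical stand-in for a raw data file; it carries no schema of its own.
RAW_FILE = "1\tAsha\t52000.0\n2\tRavi\t61000.5\n"


def read_table(table_name, raw_text):
    """Apply the stored schema to the raw text when the table is read."""
    schema = METASTORE[table_name]
    reader = csv.reader(io.StringIO(raw_text), delimiter=schema["delimiter"])
    rows = []
    for fields in reader:
        row = {name: cast(value) for (name, cast), value in zip(schema["columns"], fields)}
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in read_table("employees", RAW_FILE):
        print(row)
```

The data file stays plain text; only the schema entry decides how each field is named and typed when the table is queried.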

Architecture of Hive

The following component diagram depicts the architecture of Hive:

[Figure: Hive architecture]

This component diagram contains different units. The following table describes each unit:

Unit Name	Operation
User Interface	Hive is a data warehouse infrastructure software that enables interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HDInsight (on Windows Server).
Metastore	Hive uses a database server to store the schema or metadata of tables, databases, columns in a table, their data types, and the HDFS mapping.
HiveQL Process Engine	HiveQL is similar to SQL and queries the schema information in the Metastore. It is one of the replacements for the traditional MapReduce programming approach: instead of writing a MapReduce program in Java, we can write a query for a MapReduce job and process it.
Execution Engine	The Hive Execution Engine is the link between the HiveQL Process Engine and MapReduce. It processes the query and generates the same results as the equivalent MapReduce job. It uses the flavor of MapReduce.
HDFS or HBase	The Hadoop Distributed File System and HBase are the data storage techniques used to store data in the file system.
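
To make the HiveQL Process Engine row more concrete, the sketch below shows how a grouping query could correspond to a map step and a reduce step. It is a simplified illustration in plain Python, not the plan Hive actually generates: the table contents, the mapper and reducer functions, and run_group_by_avg are all hypothetical, and the query text is kept only as a string for reference.

```python
from itertools import groupby
from operator import itemgetter

# The query being imitated (not executed here).
QUERY = "SELECT dept, AVG(salary) FROM employees GROUP BY dept"

# Hypothetical table contents.
EMPLOYEES = [
    {"name": "Asha", "dept": "eng", "salary": 100.0},
    {"name": "Ravi", "dept": "eng", "salary": 80.0},
    {"name": "Mei", "dept": "ops", "salary": 60.0},
]


def mapper(row):
    """Map: emit the GROUP BY column as key and the aggregated column as value."""
    return row["dept"], row["salary"]


def reducer(dept, salaries):
    """Reduce: compute the average of all values that share one key."""
    salaries = list(salaries)
    return dept, sum(salaries) / len(salaries)


def run_group_by_avg(rows):
    """Sort the mapped pairs by key (the shuffle), then reduce each group."""
    mapped = sorted((mapper(row) for row in rows), key=itemgetter(0))
    return [
        reducer(dept, (salary for _, salary in group))
        for dept, group in groupby(mapped, key=itemgetter(0))
    ]


if __name__ == "__main__":
    print(QUERY)
    print(run_group_by_avg(EMPLOYEES))
```

The point of the comparison is the amount of code: the one-line query expresses what otherwise needs an explicit mapper, shuffle, and reducer.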

Working of Hive

The following diagram depicts the workflow between Hive and Hadoop.

[Figure: Hive workflow]

The following table describes how Hive interacts with the Hadoop framework:

Step No.	Operation
1	Execute Query: The Hive interface, such as the command line or Web UI, sends the query to the driver (any database driver such as JDBC, ODBC, etc.) to execute.
2	Get Plan: The driver takes the help of the query compiler, which parses the query to check the syntax and build the query plan (the requirements of the query).
3	Get Metadata: The compiler sends a metadata request to the Metastore (any database).
4	Send Metadata: The Metastore sends the metadata as a response to the compiler.
5	Send Plan: The compiler checks the requirements and sends the plan back to the driver. At this point, the parsing and compiling of the query is complete.
6	Execute Plan: The driver sends the execution plan to the execution engine.
7	Execute Job: Internally, the execution job is a MapReduce job. The execution engine sends the job to the JobTracker, which runs on the NameNode, and the JobTracker assigns the job to TaskTrackers, which run on the DataNodes. Here, the query executes as a MapReduce job.
7.1	Metadata Ops: Meanwhile, during execution, the execution engine can perform metadata operations with the Metastore.
8	Fetch Result: The execution engine receives the results from the DataNodes.
9	Send Results: The execution engine sends the resulting values to the driver.
10	Send Results: The driver sends the results to the Hive interfaces.
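
The steps in the table can be traced with a small simulation. Everything here is a hypothetical toy, not Hive's real classes or API: the Metastore, Compiler, ExecutionEngine, and Driver classes, the build_demo_driver helper, and the tiny "SELECT column FROM table" grammar exist only to show which component talks to which.

```python
class Metastore:
    """Answers metadata requests from the compiler (steps 3 and 4)."""

    def __init__(self, tables):
        self.tables = tables

    def get_metadata(self, table):
        return self.tables[table]


class Compiler:
    """Checks the syntax and builds a plan (steps 2 to 5)."""

    def __init__(self, metastore):
        self.metastore = metastore

    def compile(self, query):
        tokens = query.strip().rstrip(";").split()
        # Toy syntax check: only "SELECT <column> FROM <table>" is accepted.
        if len(tokens) != 4 or tokens[0].upper() != "SELECT" or tokens[2].upper() != "FROM":
            raise ValueError("unsupported query: " + query)
        column, table = tokens[1], tokens[3]
        metadata = self.metastore.get_metadata(table)
        if column not in metadata["columns"]:
            raise ValueError("unknown column: " + column)
        return {"table": table, "column_index": metadata["columns"].index(column)}


class ExecutionEngine:
    """Runs the plan against the stored rows (steps 6 to 9)."""

    def __init__(self, storage):
        self.storage = storage

    def execute(self, plan):
        return [row[plan["column_index"]] for row in self.storage[plan["table"]]]


class Driver:
    """Receives the query from the interface and returns results (steps 1 and 10)."""

    def __init__(self, compiler, engine):
        self.compiler = compiler
        self.engine = engine

    def run(self, query):
        plan = self.compiler.compile(query)
        return self.engine.execute(plan)


def build_demo_driver():
    """Wire the toy components together with made-up sample data."""
    metastore = Metastore({"employees": {"columns": ["id", "name"]}})
    storage = {"employees": [(1, "Asha"), (2, "Ravi")]}
    return Driver(Compiler(metastore), ExecutionEngine(storage))


if __name__ == "__main__":
    print(build_demo_driver().run("SELECT name FROM employees;"))
```

In this sketch the execution engine reads rows from a dictionary; in the workflow above, that step is a MapReduce job running on the cluster.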

ACID Properties

Atomicity:
     – Partition loads are atomic through directory renames in HDFS.
Consistency:
     – Ensured by HDFS. All nodes see the same partitions at all times.
     – Immutable data = no update or delete consistency issues.
Isolation:
     – Read committed, with an exception for partition deletes.
     – Partitions can be deleted during queries. New partitions will not be seen by jobs started before the partition was added.
Durability:
     – Data is durable in HDFS before the partition is exposed to Hive.
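
The atomicity and durability points rest on one pattern: write the data somewhere readers do not look, then make it visible with a single rename. The sketch below shows that pattern on a local file system as an analogy. It does not use HDFS or any Hive API, and load_partition, visible_partitions, the ".staging_" prefix, and the file name part-00000 are illustrative assumptions.

```python
import os
import tempfile


def load_partition(table_dir, partition_name, rows):
    """Write rows to a hidden staging directory, then publish with one rename."""
    staging = tempfile.mkdtemp(prefix=".staging_", dir=table_dir)
    with open(os.path.join(staging, "part-00000"), "w") as handle:
        for row in rows:
            handle.write(row + "\n")
    final_path = os.path.join(table_dir, partition_name)
    # The data is fully written before this single step makes it visible.
    os.rename(staging, final_path)
    return final_path


def visible_partitions(table_dir):
    """List the partitions a reader would see; hidden staging dirs are skipped."""
    return sorted(name for name in os.listdir(table_dir) if not name.startswith("."))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        print(visible_partitions(table_dir))
        load_partition(table_dir, "dt=2024-01-01", ["1\tAsha", "2\tRavi"])
        print(visible_partitions(table_dir))
```

A reader listing the table directory sees either no partition or the complete partition, never a half-written one, because the rename is the only step that changes what is visible.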
