Apache Hive - Hive Introduction



What is Big Data?

  • Big Data refers to very large collections of datasets characterized by huge volume, high velocity, and a wide variety of data.
  • It is difficult to process Big Data using traditional data management systems.
  • Hadoop is a framework from the Apache Software Foundation that was introduced to solve Big Data storage and processing challenges.
  • Data warehouse projects such as Hive are built on top of Hadoop to make that processing easier.

What is Hadoop?

  • Hadoop is an open-source framework for storing and processing Big Data in a distributed environment.
  • Two of its core modules are MapReduce and the Hadoop Distributed File System (HDFS).
  • What is MapReduce: It is a parallel programming model for processing and generating large datasets with a parallel, distributed algorithm on a cluster.
  • What is HDFS: The Hadoop Distributed File System is the part of the Hadoop framework used to store the datasets. It provides a fault-tolerant file system that runs on commodity hardware.
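
The MapReduce model described above can be illustrated with a small, single-process sketch. It is plain Python, not the Hadoop API: the function names map_phase, shuffle, and reduce_phase are invented for this example, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hive runs on hadoop", "hadoop stores data in hdfs"]))
```

Each line is mapped independently and each key is reduced independently, which is what allows a framework to spread both phases across a cluster.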

The Hadoop ecosystem contains sub-projects (tools) such as Sqoop, Pig, and Hive that support the core Hadoop modules.

  • What is Sqoop: It is used to import and export data between HDFS and relational databases (RDBMS).
  • What is Pig: It is a procedural language platform used to develop scripts for MapReduce operations.
  • What is Hive: It is a data warehouse platform used to write SQL-like scripts that run as MapReduce operations.

What is Hive?

  • Hive is a data warehouse infrastructure tool built on top of Hadoop to process structured data. It summarizes Big Data and makes querying and analyzing large amounts of data easy.
  • Hive was initially developed by Facebook. The Apache Software Foundation later took it up and developed it further as the open-source project Apache Hive. It is now used by many different companies.

Hive is not

  • A relational database
  • A design for OnLine Transaction Processing (OLTP)
  • A language for real-time queries and row-level updates

Features of Hive

  • It stores the schema in a database and the processed data in HDFS (Hadoop Distributed File System).
  • It is designed for OLAP (Online Analytical Processing).
  • It provides an SQL-like language for querying data, called HiveQL or HQL (Hive Query Language).
  • It is familiar, fast, scalable, and extensible.

Architecture of Hive

  • The following component diagram depicts the architecture of Hive:
[Figure: Hive architecture]

This component diagram contains different units. The following table describes each unit:

Unit Name | Operation
User Interface | Hive is data warehouse infrastructure software that mediates between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HDInsight (on Windows Server).
Meta Store | Hive uses a database server to store the schema or metadata of tables, databases, columns in a table, their data types, and the HDFS mapping.
HiveQL Process Engine | HiveQL is similar to SQL and queries against the schema information in the Metastore. It replaces the traditional approach of writing a MapReduce program: instead of writing the program in Java, we write a query for the MapReduce job and process it.
Execution Engine | The Hive Execution Engine links the HiveQL Process Engine and MapReduce. It processes the query and generates the same results a MapReduce job would produce, following the MapReduce model.
HDFS or HBase | The Hadoop Distributed File System or HBase is the storage layer that holds the data.
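
To make the split between the Meta Store and the storage layer concrete, here is a minimal sketch in Python. The METASTORE dictionary, the read_table helper, the employees table, and its path are all assumptions made for illustration; Hive itself keeps this metadata in a database and reads the data from HDFS or HBase.

```python
# Hypothetical in-memory stand-in for the Meta Store:
# table name -> column schema, storage location, and field delimiter.
METASTORE = {
    "employees": {
        "columns": [("id", int), ("name", str), ("salary", float)],
        "location": "/user/hive/warehouse/employees",  # illustrative path only
        "delimiter": ",",
    }
}


def read_table(table, raw_lines):
    """Apply the stored schema to raw text rows (a stand-in for files in HDFS)."""
    meta = METASTORE[table]
    rows = []
    for line in raw_lines:
        fields = line.rstrip("\n").split(meta["delimiter"])
        # Pair each field with its column definition and cast it to the declared type.
        rows.append({name: cast(value) for (name, cast), value in zip(meta["columns"], fields)})
    return rows


if __name__ == "__main__":
    print(read_table("employees", ["1,Asha,52000.5", "2,Ravi,48000"]))
```

The raw lines carry no type information of their own. The schema is looked up separately and applied when the data is read, which is the role the table above assigns to the Meta Store.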

Working of Hive

The following diagram depicts the workflow between Hive and Hadoop.

[Figure: Hive workflow]

The following table describes how Hive interacts with the Hadoop framework:

Step No. | Operation | Description
1 | Execute Query | A Hive interface, such as the command line or Web UI, sends the query to the Driver (any database driver such as JDBC, ODBC, etc.) to execute.
2 | Get Plan | The driver asks the query compiler to parse the query, checking its syntax and working out the query plan or the requirements of the query.
3 | Get Metadata | The compiler sends a metadata request to the Metastore (any database).
4 | Send Metadata | The Metastore sends the metadata back to the compiler as a response.
5 | Send Plan | The compiler checks the requirements and sends the plan back to the driver. At this point, parsing and compiling of the query are complete.
6 | Execute Plan | The driver sends the execution plan to the execution engine.
7 | Execute Job | Internally, the execution job is a MapReduce job. The execution engine sends the job to the JobTracker, which is in the Name node, and the JobTracker assigns the job to a TaskTracker, which is in a Data node. Here, the query runs as a MapReduce job.
7.1 | Metadata Ops | During execution, the execution engine can also perform metadata operations with the Metastore.
8 | Fetch Result | The execution engine receives the results from the Data nodes.
9 | Send Results | The execution engine sends those results to the driver.
10 | Send Results | The driver sends the results to the Hive interfaces.
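
The steps above can be sketched as a toy simulation in Python. Every class and method name here is hypothetical, the only query form it understands is SELECT COUNT(*) FROM <table>, and a Python dictionary stands in for the Data nodes, so treat it as a picture of the message flow rather than how Hive is implemented.

```python
class Metastore:
    """Stand-in for the Metastore database: table name -> metadata."""

    def __init__(self, tables):
        self._tables = tables

    def get_metadata(self, table):
        # Steps 3 and 4: the compiler requests metadata and receives it.
        return self._tables[table]


class Compiler:
    def __init__(self, metastore):
        self._metastore = metastore

    def get_plan(self, query):
        # Step 2: parse the query and check its syntax (toy grammar only).
        words = query.strip().rstrip(";").split()
        expected = ["SELECT", "COUNT(*)", "FROM"]
        if len(words) != 4 or [w.upper() for w in words[:3]] != expected:
            raise ValueError("unsupported query: " + query)
        metadata = self._metastore.get_metadata(words[3])
        # Step 5: return the finished plan to the driver.
        return {"operation": "count", "location": metadata["location"]}


class ExecutionEngine:
    def __init__(self, storage):
        self._storage = storage  # dict standing in for data on the Data nodes

    def execute(self, plan):
        # Steps 6 to 8: run the job and fetch the result.
        rows = self._storage[plan["location"]]
        return [len(rows)]


class Driver:
    def __init__(self, compiler, engine):
        self._compiler = compiler
        self._engine = engine

    def run(self, query):
        # Step 1: an interface hands the query to the driver.
        plan = self._compiler.get_plan(query)
        # Steps 9 and 10: results flow back through the driver to the interface.
        return self._engine.execute(plan)


def build_demo_driver():
    """Wire the pieces together with made-up metadata and rows."""
    metastore = Metastore({"logs": {"location": "/warehouse/logs"}})
    storage = {"/warehouse/logs": ["row1", "row2", "row3"]}
    return Driver(Compiler(metastore), ExecutionEngine(storage))


if __name__ == "__main__":
    print(build_demo_driver().run("SELECT COUNT(*) FROM logs;"))
```

Note that the compiler never touches the data and the execution engine never parses the query; that separation is the main point of the workflow table.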


ACID Properties

  • Atomicity:
            – Partition loads are atomic through directory renames in HDFS.
  • Consistency:
            – Ensured by HDFS. All nodes see the same partitions at all times.
            – Immutable data means there are no update or delete consistency issues.
  • Isolation:
            – Read committed, with an exception for partition deletes.
            – Partitions can be deleted during queries. New partitions are not seen by jobs started before the partition was added.
  • Durability:
            – Data is durable in HDFS before the partition is exposed to Hive.
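
The atomicity point above, loading a partition through a directory rename, can be tried out on a local filesystem. The sketch below is a stand-in built on assumptions: it uses Python's os.rename on local disk rather than HDFS, and load_partition, visible_partitions, and the hidden .staging- prefix are names invented for this example.

```python
import os
import tempfile


def load_partition(table_dir, partition, rows):
    """Write rows into a hidden staging directory, then publish it with one rename."""
    staging = tempfile.mkdtemp(prefix=".staging-", dir=table_dir)
    with open(os.path.join(staging, "part-00000"), "w") as handle:
        for row in rows:
            handle.write(row + "\n")
    final = os.path.join(table_dir, partition)
    # The rename is the publish step: readers listing table_dir see either
    # no partition at all or the fully written one, never a half-written one.
    os.rename(staging, final)
    return final


def visible_partitions(table_dir):
    """List partitions a reader would see, skipping hidden staging directories."""
    return sorted(name for name in os.listdir(table_dir) if not name.startswith("."))


if __name__ == "__main__":
    table_dir = tempfile.mkdtemp()
    load_partition(table_dir, "dt=2024-01-01", ["a,1", "b,2"])
    print(visible_partitions(table_dir))
```

All of the slow work (writing the files) happens where readers do not look, and the only step readers can observe is a single rename.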
