Difference between Apache Hive and Apache Spark

Apache Hive vs. Apache Spark
Apache Hive is a distributed query engine built on top of the Hadoop ecosystem. Hive was introduced by Facebook, which later open-sourced it.
Apache Spark is a cluster computing framework that can run on Hadoop and handles many different kinds of data.
Hive queries are written in HQL (Hive Query Language, also called HiveQL), a SQL-like language.
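To give a feel for the SQL-like syntax, here is a small sketch of the kind of query HQL accepts. Hive itself is not run here: Python's built-in sqlite3 module stands in as the SQL engine, and the `employees` table, its columns, and its rows are hypothetical. The query uses only basic SELECT / GROUP BY syntax, which HiveQL also supports.

```python
import sqlite3

# Hypothetical sample data; sqlite3 is only a stand-in for a Hive table.
ROWS = [("alice", "eng"), ("bob", "eng"), ("carol", "sales")]

# A basic aggregate query; the same statement is also valid HiveQL.
QUERY = "SELECT dept, COUNT(*) AS n FROM employees GROUP BY dept ORDER BY dept"


def run_query(rows, query):
    """Load the rows into an in-memory table and run the query against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
        conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_query(ROWS, QUERY))
```

In a real Hive deployment the statement would be submitted through a Hive client rather than sqlite3, and the table would live in the Hadoop cluster.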
Spark offers a rich set of tools for handling data and, most importantly, is often reported to be 10-20x faster than Hadoop’s MapReduce, depending on the workload.
Hive queries are internally converted into MapReduce jobs, which are handled entirely by the Hadoop framework, reducing the workload on developers.
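The conversion of a query into a MapReduce job can be pictured with a toy example. The sketch below hand-writes the map, shuffle, and reduce phases for a query like `SELECT dept, COUNT(*) FROM employees GROUP BY dept`. It is a conceptual illustration in plain Python, not Hive's actual query planner; the table, the rows, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of an `employees` table: (name, dept).
ROWS = [("alice", "eng"), ("bob", "eng"), ("carol", "sales")]


def map_phase(rows):
    # Emit one (key, 1) pair per row, keyed by the GROUP BY column.
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs):
    # Collect all values that share a key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Sum the values for each key, which yields COUNT(*) per group.
    return {key: sum(values) for key, values in grouped.items()}


def count_by_dept(rows):
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_dept(ROWS))
```

With Hive, the developer writes only the query; the phases above are the kind of work that is generated and scheduled on their behalf.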
Spark attains this speed through its in-memory primitives: data is cached in memory (RAM), and computations run against that cached copy instead of re-reading it from disk.
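The benefit of keeping data in memory can be shown with a toy model. The `Dataset` class below is hypothetical and is not Spark's API; its `cache()` method only loosely mirrors the idea behind caching in Spark. It counts how often the (simulated) slow source is read, so the difference between cached and uncached reuse becomes visible.

```python
class Dataset:
    """Toy dataset that counts how often its source is read."""

    def __init__(self, loader):
        self._loader = loader   # function simulating a slow read from disk
        self._cached = None     # in-memory copy, filled on first use
        self._use_cache = False
        self.loads = 0          # number of times the source was read

    def cache(self):
        # Ask for the data to be kept in memory after the first read.
        self._use_cache = True
        return self

    def _rows(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.loads += 1
        rows = list(self._loader())
        if self._use_cache:
            self._cached = rows
        return rows

    def total(self):
        return sum(self._rows())

    def maximum(self):
        return max(self._rows())


def load_numbers():
    # Stand-in for reading a file from distributed storage.
    return range(1, 6)


if __name__ == "__main__":
    uncached = Dataset(load_numbers)
    uncached.total(), uncached.maximum()
    cached = Dataset(load_numbers).cache()
    cached.total(), cached.maximum()
    print(uncached.loads, cached.loads)
```

Two computations on the uncached dataset read the source twice, while the cached dataset reads it once and serves the second computation from memory.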
Hive provides tools that enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs.
Hive can access files stored either directly in Apache HDFS or in other data storage systems such as Apache HBase.
Spark SQL is a Spark module for structured data processing, with in-memory processing at its core.
