What is the difference between Spark and Hadoop MapReduce?

Answer: Apache Spark is an open-source distributed cluster-computing…

What is Apache Spark

  • Apache Spark is an open-source distributed cluster-computing framework.
  • Spark is a data processing engine developed to provide faster and easier-to-use analytics than Hadoop MapReduce.
  • Spark was originally developed at the University of California, Berkeley's AMPLab before it was donated to the Apache Software Foundation.

What is Apache Hadoop

  • Apache Hadoop is an open-source framework written in Java that allows us to store and process Big Data in a distributed environment, across clusters of computers, using simple programming constructs.
  • To do this, Hadoop uses a programming model called MapReduce, which splits a job into small tasks and distributes them across the machines in the cluster.
  • Hadoop also has its own file system, the Hadoop Distributed File System (HDFS), which is modeled on the Google File System (GFS).
  • HDFS is designed to run on low-cost commodity hardware.
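
The MapReduce idea described above (split the work, process the parts independently, then combine the results) can be sketched in plain Python. This is only a single-process illustration and does not use the Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and each string in the input list stands in for an input split that a real cluster would hand to a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently; on a cluster this step runs in parallel.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["spark and hadoop", "hadoop stores data", "spark processes data"]
    print(word_count(chunks))
```

Because the map step never looks at more than one chunk, the chunks can be processed on separate machines, and only the grouped intermediate pairs need to be moved between them.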

Hadoop Vs Spark

  • Memory: Spark keeps data in memory using RDDs (Resilient Distributed Datasets). MapReduce does not make full use of the memory available in the Hadoop cluster.
  • Disk usage: Spark caches data in memory, which keeps latency low. MapReduce is disk-oriented.
  • Processing: Spark supports real-time processing through Spark Streaming. MapReduce supports only batch processing.
  • Installation: Spark is not bound to Hadoop. MapReduce is bound to Hadoop.
  • Storage: Leverages existing HDFS.
  • Speed: Spark is 10 to 100 times faster. MapReduce is fast.
  • Resource management: Spark can run standalone. MapReduce uses YARN.
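
The memory and disk-usage points in the comparison can be illustrated with a toy sketch in plain Python. It uses neither Spark nor Hadoop: the function names (run_disk_oriented, run_in_memory, square, add_one) are hypothetical, and the JSON files merely stand in for the intermediate output that a disk-oriented engine writes between stages. The assumption is a simple multi-stage job in which each stage consumes the previous stage's output.

```python
import json
import os
import tempfile


def square(values):
    return [v * v for v in values]


def add_one(values):
    return [v + 1 for v in values]


def run_disk_oriented(data, stages, workdir):
    """Write every stage's output to a file and read it back for the next stage."""
    path = os.path.join(workdir, "stage_0.json")
    with open(path, "w") as f:
        json.dump(data, f)
    for i, stage in enumerate(stages, start=1):
        with open(path) as f:
            current = json.load(f)  # read the previous stage's output from disk
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:
            json.dump(stage(current), f)  # persist this stage's output to disk
    with open(path) as f:
        return json.load(f)


def run_in_memory(data, stages):
    """Keep the intermediate result in memory between stages."""
    current = data
    for stage in stages:
        current = stage(current)
    return current


if __name__ == "__main__":
    stages = [square, add_one]
    with tempfile.TemporaryDirectory() as workdir:
        print(run_disk_oriented([1, 2, 3], stages, workdir))
    print(run_in_memory([1, 2, 3], stages))
```

Both functions compute the same result; the difference is that the first pays for a disk write and a disk read at every stage boundary, which is the cost that in-memory caching is meant to avoid.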