locality optimization in compiler design

Source: Wikitechy Interview Questions, https://www.wikitechy.com/interview-questions/big-data/why-do-we-need-data-locality-in-hadoop/ (published Mon, 12 Jul 2021)

Why do we need Data Locality in Hadoop?

Datasets in HDFS are stored as blocks on the DataNodes of the Hadoop cluster.
During the execution of a MapReduce job, each Mapper processes one of these blocks (an input split).
If the data does not reside on the node where the Mapper is running, it has to be copied over the network from the DataNode that holds it to the node running the Mapper.

[Figure: Datasets in HDFS - Data Locality in Hadoop]

Now, if a MapReduce job has more than 100 Mappers and each Mapper tries to copy its data from another DataNode in the cluster at the same time, the result is serious network congestion, which hurts the performance of the whole system.
Moving the computation close to the data is therefore an effective and inexpensive solution, and this is what is technically termed data locality in Hadoop. It helps to increase the overall throughput of the system.
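To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Python. It is not part of Hadoop: the function name remote_read_mb is hypothetical, and the 128 MB block size is an assumption (it is the default in recent Hadoop versions, but your cluster may use a different value). It only estimates how much data crosses the network when a given fraction of Mappers run on the node that already holds their block.

```python
def remote_read_mb(num_mappers, block_mb, local_fraction):
    """Estimate MB copied over the network for one wave of mappers.

    num_mappers    -- number of mappers, one block each (assumption)
    block_mb       -- block size in MB (128 is assumed in the examples)
    local_fraction -- share of mappers that read their block locally (0.0 to 1.0)
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0.0 and 1.0")
    local_mappers = round(num_mappers * local_fraction)
    remote_mappers = num_mappers - local_mappers
    # Only the mappers without a local copy pull their block over the network.
    return remote_mappers * block_mb


if __name__ == "__main__":
    print(remote_read_mb(100, 128, 0.0))  # no locality at all: 12800 MB
    print(remote_read_mb(100, 128, 0.9))  # 90% data local: 1280 MB
```

Under these assumptions, 100 Mappers with no locality move about 12.5 GB across the network, while 90% locality cuts that to roughly a tenth.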

Types of data locality

Data Local
- In this type, the data and the mapper reside on the same node. This is the closest proximity of data and the most preferred scenario.

Rack Local
- In this scenario, the mapper and the data reside on the same rack but on different DataNodes.

Different Rack
- In this scenario, the mapper and the data reside on different racks, so the data has to travel between racks. This is the least preferred scenario.
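The three types above can be sketched as a small classification function. This is an illustrative sketch only, not Hadoop's scheduler code: the names Locality, classify, pick_node and the rack_of mapping are hypothetical, and it assumes we already know which nodes hold a copy of the block and which rack each node belongs to.

```python
from enum import IntEnum


class Locality(IntEnum):
    # Lower value = closer to the data = more preferred.
    DATA_LOCAL = 0
    RACK_LOCAL = 1
    DIFFERENT_RACK = 2


def classify(mapper_node, replica_nodes, rack_of):
    """Return the locality type for running a mapper on mapper_node.

    replica_nodes -- set of nodes that hold a copy of the block (assumed known)
    rack_of       -- dict mapping node name to rack name (assumed known)
    """
    if mapper_node in replica_nodes:
        return Locality.DATA_LOCAL
    mapper_rack = rack_of[mapper_node]
    if any(rack_of[node] == mapper_rack for node in replica_nodes):
        return Locality.RACK_LOCAL
    return Locality.DIFFERENT_RACK


def pick_node(free_nodes, replica_nodes, rack_of):
    """Pick the free node with the best locality (first one wins on ties)."""
    return min(free_nodes, key=lambda node: classify(node, replica_nodes, rack_of))


if __name__ == "__main__":
    rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r3"}
    replicas = {"n1", "n3"}
    print(classify("n1", replicas, rack_of).name)  # DATA_LOCAL
    print(classify("n2", replicas, rack_of).name)  # RACK_LOCAL
    print(classify("n4", replicas, rack_of).name)  # DIFFERENT_RACK
    print(pick_node(["n4", "n2"], replicas, rack_of))  # n2
```

Ordering the types as integers keeps the preference explicit: when no data-local node is free, the sketch falls back to a rack-local node before it considers a node on a different rack.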
