{"id":287,"date":"2021-07-12T18:21:39","date_gmt":"2021-07-12T18:21:39","guid":{"rendered":"https:\/\/www.wikitechy.com\/interview-questions\/?p=287"},"modified":"2021-09-22T05:53:05","modified_gmt":"2021-09-22T05:53:05","slug":"why-do-we-need-data-locality-in-hadoop","status":"publish","type":"post","link":"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/","title":{"rendered":"Why do we need Data Locality in Hadoop ?"},"content":{"rendered":"<div class=\"TextHeading\">\n<div class=\"hddn\">\n<h2 id=\"why-do-we-need-data-locality-in-hadoop\" class=\"color-pink\" style=\"text-align: justify;\">Why do we need Data Locality in Hadoop ?<\/h2>\n<\/div>\n<\/div>\n<div class=\"ImageContent\" style=\"text-align: justify;\">\n<div class=\"hddn\"><img decoding=\"async\" class=\"img-responsive center-block aligncenter\" src=\"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/why-we-need-data-locality-in-hadoop.png\" alt=\" Data Locality in Hadoop \" \/><\/div>\n<\/div>\n<div class=\"Content\" style=\"text-align: justify;\">\n<div class=\"hddn\">\n<ul>\n<li>Datasets in\u00a0<a href=\"https:\/\/www.wikitechy.com\/tutorials\/sqoop\/sqoop-vs-hdfs\" target=\"_blank\" rel=\"noopener\">HDFS<\/a>\u00a0are stored as blocks on the DataNodes of the Hadoop cluster.<\/li>\n<li>During the execution of a\u00a0<a href=\"https:\/\/www.wikitechy.com\/tutorials\/hive\/hive-mapreduce-hadoop-mapreduce\" target=\"_blank\" rel=\"noopener\">MapReduce<\/a>\u00a0job, each Mapper processes an input split, which typically corresponds to one block.<\/li>\n<li>If the data does not reside on the node where the Mapper is running, it must be copied over the\u00a0<a href=\"https:\/\/www.wikitechy.com\/errors-and-fixes\/sql\/cluster-network-name-showing-netbios-status-as-the-system-cannot-find-the-file-specified\" target=\"_blank\" rel=\"noopener\">network<\/a>\u00a0from the DataNode that holds it to the node running the Mapper.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<div 
class=\"ImageContent\" style=\"text-align: justify;\">\n<div class=\"hddn\"><img decoding=\"async\" class=\"img-responsive center-block aligncenter\" src=\"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/data-locality-in-hadoop.gif\" alt=\"Datasets in HDFS - Data Locality in Hadoop\" \/><\/div>\n<\/div>\n<div class=\"Content\" style=\"text-align: justify;\">\n<div class=\"hddn\">\n<ul>\n<li>If a MapReduce job runs more than 100 Mappers and each one tries to copy data from another DataNode in the cluster at the same time, the result is serious network congestion, which degrades the performance of the whole system.<\/li>\n<li>Hence, keeping the computation close to the data is an effective and inexpensive solution, known as\u00a0<a href=\"https:\/\/www.wikitechy.com\/interview-questions\/hadoop\/what-are-the-features-of-hadoop\/\" target=\"_blank\" rel=\"noopener\">Data locality in Hadoop<\/a>. It increases the overall throughput of the system.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<div class=\"ImageContent\" style=\"text-align: justify;\">\n<div class=\"hddn\"><img decoding=\"async\" class=\"img-responsive center-block aligncenter\" src=\"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/mapreduce-job-data-locality.gif\" alt=\" \" \/><\/div>\n<\/div>\n<div class=\"TextHeading\" style=\"text-align: justify;\">\n<div class=\"hddn\">\n<h2 id=\"types-of-data-locality\" class=\"color-green\">Types of data locality<\/h2>\n<\/div>\n<\/div>\n<div class=\"Content\" style=\"text-align: justify;\">\n<div class=\"hddn\">\n<ul>\n<li><b>Data Local<\/b>\n<ul>\n<li>In this type, the data and the Mapper reside on the same node. This is the closest proximity of data and the most preferred scenario.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<div class=\"Content\" style=\"text-align: justify;\">\n<div class=\"hddn\">\n<ul>\n<li><b>Rack Local<\/b>\n<ul>\n<li>In this scenario, the Mapper and the data reside on the same rack but on different DataNodes.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<div class=\"Content\" style=\"text-align: justify;\">\n<div class=\"hddn\">\n<ul>\n<li><b>Different Rack<\/b>\n<ul>\n<li>In this scenario, the Mapper and the data reside on different racks.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<div class=\"ImageContent\">\n<div class=\"hddn\"><img decoding=\"async\" class=\"img-responsive center-block aligncenter\" src=\"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/types-of-data-locality.jpg\" alt=\"Types of data locality\" \/><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Answer : Datasets in HDFS store as blocks in DataNodes&#8230;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"passster_activate_protection":false,"passster_protect_child_pages":"","passster_protection_type":"password","passster_password":"","passster_activate_overwrite_defaults":"","passster_headline":"","passster_instruction":"","passster_placeholder":"","passster_button":"","passster_id":"","passster_activate_misc_settings":"","passster_redirect_url":"","passster_hide":"no","passster_area_shortcode":"","gtb_hide_title":false,"gtb_wrap_title":false,"gtb_class_title":"","gtb_remove_headerfooter":false,"footnotes":""},"categories":[1065],"tags":[1467,195,1354,1119,360,1482,1168,203,199,214,1120,1466,1474,1473,1479,1464,1453,1462,1463,1471,1469,1478,205,1350,485,222,484,1481,1468,1169,1483,1316,1480,1375,1348,1465,1165,196,1457,1460,1472,212,1456,1459,286,1461,1477,970,366,288,1470,367,206,975,200,974,197,280,364,1029,1049,1455,1476,1454,1458,1475,1313,1167,968,216,285,1158,1121],"class_list":["post-287","post","type-post","status-publish","format-standard","hentry","category-big-data","tag-3-data-
locality","tag-accenture-interview-questions-and-answers","tag-apache-hadoop","tag-att-interview-questions-and-answers","tag-atos-interview-questions-and-answers","tag-azure-hadoop","tag-big-data-hadoop","tag-capgemini-interview-questions-and-answers","tag-casting-networks-india-pvt-limited-interview-questions-and-answers","tag-cgi-group-inc-interview-questions-and-answers","tag-collabera-technologiesinterview-questions-and-answers","tag-data-flow-in-mapreduces","tag-data-locality","tag-data-locality-c","tag-data-locality-definition","tag-data-locality-in-cloud-computing","tag-data-locality-in-hadoop","tag-data-locality-in-spark","tag-data-locality-in-yarn","tag-data-locality-nutanix","tag-data-locality-optimization-in-hadoop","tag-data-localization-in-hadoop","tag-dell-international-services-india-pvt-ltd-interview-questions-and-answers","tag-distributed-file-system","tag-ernst-young-interview-questions-and-answers","tag-flipkart-interview-questions-and-answers","tag-genpact-interview-questions-and-answers","tag-hadoop-cluster","tag-hadoop-data-partitioning","tag-hadoop-database","tag-hadoop-distributed-file-system","tag-hadoop-ecosystem","tag-hadoop-file-system","tag-hadoop-framework","tag-hadoop-mapreduce","tag-hadoop-optimization-techniques","tag-hdfs-architecture","tag-ibm-interview-questions-and-answers","tag-importance-of-data-locality","tag-improving-data-processing-performance-with-hadoop-data-locality","tag-in-the-local-disk-of-the-name-node-the-files-which-are-stored-persistently-are","tag-indecomm-global-services-interview-questions-and-answers","tag-introduction-to-data-locality-in-hadoop-mapreduce","tag-job-scheduling-for-optimizing-data-locality-in-hadoop-clusters","tag-lt-infotech-interview-questions-and-answers","tag-locality-optimization-in-compiler-design","tag-mapreduce-data-locality","tag-mindtree-interview-questions-and-answers","tag-netapp-interview-questions-and-answers","tag-r-systems-interview-questions-and-answers","tag-rack-awareness-in-h
adoop","tag-rbs-india-development-centre-pvt-ltd-interview-questions-and-answers","tag-sap-labs-india-pvt-ltd-interview-questions-and-answers","tag-tata-consultancy-service-interview-questions-and-answers","tag-tech-mahindra-interview-questions-and-answers","tag-trigent-software-interview-questions-and-answers","tag-unitedhealth-group-interview-questions-and-answers","tag-virtusa-consulting-services-pvt-ltd-interview-questions-and-answers","tag-wells-fargo-interview-questions-and-answers","tag-what-is-big-data-and-hadoop","tag-what-is-big-data-hadoop","tag-what-is-data-locality","tag-what-is-data-locality-in-hadoop","tag-what-is-data-locality-in-hadoopwhat-does-the-term-data-locality-mean-in-hadoop","tag-what-is-data-locality-optimization-in-hadoop","tag-what-is-data-localization-in-hadoop","tag-what-is-hadoop","tag-what-is-hadoop-used-for","tag-wipro-infotech-interview-questions-and-answers","tag-wipro-interview-questions-and-answers","tag-xoriant-solutions-pvt-ltd-interview-questions-and-answers","tag-yarn-hadoop","tag-zs-associates-interview-questions-and-answers"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Why do we need Data Locality in Hadoop ? - Big Data<\/title>\n<meta name=\"description\" content=\"Why do we need Data Locality in Hadoop ? - Big Data - Datasets in HDFS store as blocks in Data Nodes the Hadoop cluster.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Why do we need Data Locality in Hadoop ? - Big Data\" \/>\n<meta property=\"og:description\" content=\"Why do we need Data Locality in Hadoop ? 
- Big Data - Datasets in HDFS store as blocks in Data Nodes the Hadoop cluster.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/\" \/>\n<meta property=\"og:site_name\" content=\"Wikitechy\" \/>\n<meta property=\"article:published_time\" content=\"2021-07-12T18:21:39+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-09-22T05:53:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/why-we-need-data-locality-in-hadoop.png\" \/>\n<meta name=\"author\" content=\"Editor\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Editor\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/\",\"url\":\"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/\",\"name\":\"Why do we need Data Locality in Hadoop ? 
- Big Data\",\"isPartOf\":{\"@id\":\"https:\/\/www.wikitechy.com\/interview-questions\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/why-we-need-data-locality-in-hadoop.png\",\"datePublished\":\"2021-07-12T18:21:39+00:00\",\"dateModified\":\"2021-09-22T05:53:05+00:00\",\"author\":{\"@id\":\"https:\/\/www.wikitechy.com\/interview-questions\/#\/schema\/person\/4d5a581fb5470d1560324bddc5e8b757\"},\"description\":\"Why do we need Data Locality in Hadoop ? - Big Data - Datasets in HDFS store as blocks in Data Nodes the Hadoop cluster.\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/#primaryimage\",\"url\":\"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/why-we-need-data-locality-in-hadoop.png\",\"contentUrl\":\"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/why-we-need-data-locality-in-hadoop.png\"},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.wikitechy.com\/interview-questions\/#website\",\"url\":\"https:\/\/www.wikitechy.com\/interview-questions\/\",\"name\":\"Wikitechy\",\"description\":\"Interview Questions\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.wikitechy.com\/interview-questions\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.wikitechy.com\/interview-questions\/#\/schema\/person\/4d5a581fb5470d1560324bddc5e8b757\",\"name\":\"Editor\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.wikitechy.com\/interview-questions\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e9531079fe7e07841b7b156c04d65e5f39d4adfd18b6ffe3edfff8ca5aab85b5?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e9531079fe7e07841b7b156c04d65e5f39d4adfd18b6ffe3edfff8ca5aab85b5?s=96&d=mm&r=g\",\"caption\":\"Editor\"},\"url\":\"https:\/\/www.wikitechy.com\/interview-questions\/author\/editor\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Why do we need Data Locality in Hadoop ? - Big Data","description":"Why do we need Data Locality in Hadoop ? - Big Data - Datasets in HDFS store as blocks in Data Nodes the Hadoop cluster.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/","og_locale":"en_US","og_type":"article","og_title":"Why do we need Data Locality in Hadoop ? - Big Data","og_description":"Why do we need Data Locality in Hadoop ? - Big Data - Datasets in HDFS store as blocks in Data Nodes the Hadoop cluster.","og_url":"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/","og_site_name":"Wikitechy","article_published_time":"2021-07-12T18:21:39+00:00","article_modified_time":"2021-09-22T05:53:05+00:00","og_image":[{"url":"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/why-we-need-data-locality-in-hadoop.png"}],"author":"Editor","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Editor","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/","url":"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/","name":"Why do we need Data Locality in Hadoop ? - Big Data","isPartOf":{"@id":"https:\/\/www.wikitechy.com\/interview-questions\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/#primaryimage"},"image":{"@id":"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/#primaryimage"},"thumbnailUrl":"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/why-we-need-data-locality-in-hadoop.png","datePublished":"2021-07-12T18:21:39+00:00","dateModified":"2021-09-22T05:53:05+00:00","author":{"@id":"https:\/\/www.wikitechy.com\/interview-questions\/#\/schema\/person\/4d5a581fb5470d1560324bddc5e8b757"},"description":"Why do we need Data Locality in Hadoop ? 
- Big Data - Datasets in HDFS store as blocks in Data Nodes the Hadoop cluster.","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.wikitechy.com\/interview-questions\/big-data\/why-do-we-need-data-locality-in-hadoop\/#primaryimage","url":"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/why-we-need-data-locality-in-hadoop.png","contentUrl":"https:\/\/cdn.wikitechy.com\/interview-questions\/hadoop\/why-we-need-data-locality-in-hadoop.png"},{"@type":"WebSite","@id":"https:\/\/www.wikitechy.com\/interview-questions\/#website","url":"https:\/\/www.wikitechy.com\/interview-questions\/","name":"Wikitechy","description":"Interview Questions","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.wikitechy.com\/interview-questions\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.wikitechy.com\/interview-questions\/#\/schema\/person\/4d5a581fb5470d1560324bddc5e8b757","name":"Editor","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.wikitechy.com\/interview-questions\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/e9531079fe7e07841b7b156c04d65e5f39d4adfd18b6ffe3edfff8ca5aab85b5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e9531079fe7e07841b7b156c04d65e5f39d4adfd18b6ffe3edfff8ca5aab85b5?s=96&d=mm&r=g","caption":"Editor"},"url":"https:\/\/www.wikitechy.com\/interview-questions\/author\/editor\/"}]}},"_links":{"self":[{"href":"https:\/\/www.wikitechy.com\/interview-questions\/wp-json\/wp\/v2\/posts\/287","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wikitechy.com\/interview-questions\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wikitechy.com\/interview-questions\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wikitechy.com\/interview-questions\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wikitechy.com\/interview-questions\/wp-json\/wp\/v2\/comments?post=287"}],"version-history":[{"count":5,"href":"https:\/\/www.wikitechy.com\/interview-questions\/wp-json\/wp\/v2\/posts\/287\/revisions"}],"predecessor-version":[{"id":3858,"href":"https:\/\/www.wikitechy.com\/interview-questions\/wp-json\/wp\/v2\/posts\/287\/revisions\/3858"}],"wp:attachment":[{"href":"https:\/\/www.wikitechy.com\/interview-questions\/wp-json\/wp\/v2\/media?parent=287"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wikitechy.com\/interview-questions\/wp-json\/wp\/v2\/categories?post=287"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wikitechy.com\/interview-questions\/wp-json\/wp\/v2\/tags?post=287"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templa
ted":true}]}}