Top Hadoop Interview Questions With Answers Part 1
1) What is Hadoop Map Reduce?
- The Hadoop Map Reduce framework is used to process large data sets in parallel across a Hadoop cluster.
- Data analysis uses a two-step map and reduce process.
2) How does Hadoop Map Reduce work?
- Taking word count as an example: during the map phase, Map Reduce counts the words in each document, while in the reduce phase it aggregates those counts across the entire collection of documents.
- During the map phase, the input data is divided into splits that are analysed by map tasks running in parallel across the Hadoop cluster.
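The word count flow described above can be sketched in plain Python. This is only an illustration of the two phases, not Hadoop's API: the function names (`map_phase`, `reduce_phase`, `word_count`) are made up for this example, and the splits run one after another instead of in parallel.

```python
from collections import defaultdict

def map_phase(split):
    """Map task: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def reduce_phase(word, counts):
    """Reduce task: sum all counts collected for one word."""
    return word, sum(counts)

def word_count(splits):
    # In a real cluster each split goes to its own map task;
    # here the splits are simply processed in sequence.
    mapped = [pair for split in splits for pair in map_phase(split)]
    # Group the values by key so that each reduce call sees one word.
    grouped = defaultdict(list)
    for word, count in mapped:
        grouped[word].append(count)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())

if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "the fox"]))
```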
3) What is shuffling in Map Reduce?
The process by which the system sorts the map outputs and transfers them to the reducers as inputs is known as the shuffle.
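A rough sketch of the shuffle, assuming map outputs are lists of (key, value) pairs and that keys are routed to reducers by a hash of the key. The `partition` and `shuffle` names are hypothetical, and `zlib.crc32` merely stands in for whatever partitioner a real job would use.

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    # A stable hash, so the same key always lands on the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(map_outputs, num_reducers):
    """Route each (key, value) pair to a reducer, then sort and group by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:  # one list of pairs per map task
        for key, value in output:
            buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer receives its keys in sorted order, with the values grouped.
    return [sorted(bucket.items()) for bucket in buckets]

if __name__ == "__main__":
    outputs = [[("b", 1), ("a", 1)], [("a", 1), ("c", 1)]]
    for reducer_id, reducer_input in enumerate(shuffle(outputs, 2)):
        print(reducer_id, reducer_input)
```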
4) What is Distributed Cache in the Map Reduce framework?
Distributed Cache is an important feature provided by the Map Reduce framework. It is used when you want to share files across all nodes in a Hadoop cluster. The files can be executable jar files or simple properties files.
5) What is Name Node in Hadoop?
- Name Node in Hadoop is the node where Hadoop stores all the file location information in HDFS (Hadoop Distributed File System).
- In other words, Name Node is the centrepiece of an HDFS file system.
- It keeps a record of all the files in the file system, and tracks where the file data is kept across the machines in the cluster.
6) What is Job Tracker in Hadoop? What actions does Hadoop follow?
In Hadoop, Job Tracker is used for submitting and tracking Map Reduce jobs. Job Tracker runs in its own JVM process.
Hadoop performs the following actions:
- Client applications submit jobs to the Job Tracker
- Job Tracker communicates with the Name Node to determine the data location
- Job Tracker locates Task Tracker nodes with available slots at or near the data
- It submits the work to the chosen Task Tracker nodes.
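The node selection step above can be sketched as follows. This is a simplified illustration and not Hadoop's scheduler: `choose_task_tracker` and its arguments are hypothetical, and it only distinguishes "holds the data" from "does not", ignoring any finer notion of nearness.

```python
def choose_task_tracker(block_hosts, trackers):
    """Pick a tracker with a free slot, preferring one that holds the data.

    block_hosts: set of host names storing the input block
                 (the information the Name Node would supply).
    trackers:    dict mapping host name -> number of free task slots.
    """
    free = [host for host, slots in trackers.items() if slots > 0]
    if not free:
        return None  # no capacity right now, so the task has to wait
    local = [host for host in free if host in block_hosts]
    # Prefer a tracker on a node that stores the data; otherwise use any free one.
    return sorted(local)[0] if local else sorted(free)[0]

if __name__ == "__main__":
    slots = {"node1": 2, "node2": 0, "node3": 1}
    print(choose_task_tracker({"node2", "node3"}, slots))
```

In the usage above, `node2` holds the data but has no free slot, so the sketch falls through to `node3`, which both holds the data and has capacity.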
7) What is a heartbeat in HDFS?
A heartbeat is a signal sent periodically from a data node to the Name Node, and from a Task Tracker to the Job Tracker. If the Name Node or Job Tracker stops receiving the signal, it assumes that there is a problem with the data node or Task Tracker.
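A minimal sketch of heartbeat-based failure detection, seen from the side that receives the signals (the Name Node or Job Tracker). The `HeartbeatMonitor` class and the timeout value are assumptions made for illustration, and timestamps are passed in explicitly so that the example stays deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each worker node."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}

    def record(self, node, now):
        # Called whenever a heartbeat arrives from a node.
        self.last_seen[node] = now

    def dead_nodes(self, now):
        # A node is treated as failed once it has been silent
        # for longer than the timeout.
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout)

if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=10)
    monitor.record("datanode1", now=0)
    monitor.record("datanode2", now=0)
    monitor.record("datanode1", now=8)  # datanode2 has gone quiet
    print(monitor.dead_nodes(now=12))
```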
8) What are combiners, and when should you use a combiner in a Map Reduce job?
- Combiners are used to increase the efficiency of a Map Reduce program.
- A combiner reduces the amount of data that needs to be transferred across to the reducers.
- If the operation performed is commutative and associative, you can use your reducer code as a combiner.
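The effect of a combiner can be illustrated with a small word count sketch. The helper names (`map_task`, `combine`, `reduce_all`) are hypothetical and this is not Hadoop code. Summation is commutative and associative, so pre-aggregating on the map side does not change the final result.

```python
from collections import Counter

def map_task(split):
    """Emit a (word, 1) pair for every word in the split."""
    return [(word, 1) for word in split.split()]

def combine(pairs):
    """Combiner: pre-aggregate one map task's output before it leaves the node."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

def reduce_all(pair_lists):
    """Reducer side: sum the counts for each word across all map outputs."""
    totals = Counter()
    for pairs in pair_lists:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)

if __name__ == "__main__":
    splits = ["a b a a", "b b a"]
    raw = [map_task(s) for s in splits]
    combined = [combine(p) for p in raw]
    print(sum(len(p) for p in raw), "records sent without a combiner")
    print(sum(len(p) for p in combined), "records sent with a combiner")
    print(reduce_all(raw) == reduce_all(combined))
```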
9) What happens when a data node fails?
When a data node fails:
- Job Tracker and Name Node detect the failure
- All tasks that were running on the failed node are re-scheduled on other nodes
- Name Node replicates the user's data to another node
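The re-replication step can be sketched roughly as below. This is a simplified illustration and not how the Name Node is implemented: `handle_datanode_failure` and the block map layout are hypothetical, and the target replica count is passed in as a parameter.

```python
def handle_datanode_failure(block_map, failed, live_nodes, replication=3):
    """Drop the failed node from every block and copy blocks back up to target.

    block_map:  dict mapping block id -> set of nodes holding a replica.
    live_nodes: nodes that are still healthy.
    """
    for holders in block_map.values():
        holders.discard(failed)
        # Candidate targets are live nodes that do not already hold the block.
        candidates = sorted(node for node in live_nodes
                            if node not in holders and node != failed)
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(0))
    return block_map

if __name__ == "__main__":
    blocks = {"blk_1": {"n1", "n2", "n3"}, "blk_2": {"n2", "n3", "n4"}}
    handle_datanode_failure(blocks, "n1", ["n2", "n3", "n4", "n5"])
    print({block: sorted(nodes) for block, nodes in blocks.items()})
```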
10) What is Speculative Execution?
In Hadoop, during Speculative Execution a certain number of duplicate tasks are launched.
Using Speculative Execution, multiple copies of the same map or reduce task can be executed on different slave nodes.
In simple words, if a particular node is taking a long time to complete a task, Hadoop will create a duplicate task on another node.
The copy that finishes the task first is retained, and the copies that do not finish first are killed.
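The "first to finish wins" rule can be sketched with a small simulation. It assumes we already know how long each duplicate attempt would take, which a real cluster does not. The `run_with_speculation` name and its input format are hypothetical.

```python
def run_with_speculation(attempts):
    """Decide which duplicate attempt is kept.

    attempts: dict mapping node name -> seconds that node needs for the task.
    Returns (winner, killed): the node whose attempt finishes first and the
    nodes whose duplicate attempts are killed.
    """
    # The shortest duration finishes first; ties are broken by node name.
    winner = min(attempts, key=lambda node: (attempts[node], node))
    killed = sorted(node for node in attempts if node != winner)
    return winner, killed

if __name__ == "__main__":
    print(run_with_speculation({"slow-node": 120, "backup-node": 30}))
```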