Dassault Systemes Hadoop Interview Questions Answers

Dassault Systemes Most Frequently Asked Latest Hadoop Interview Questions Answers

How Is Mapper Instantiated In A Running Job?

The Mapper itself is instantiated in the running job, and will be passed a MapContext object which it can use to configure itself.

What Are The Methods In The Mapper Interface?

The Mapper contains the run() method, which calls its setup() method once, then calls map() once for each input record, and finally calls its cleanup() method. All of these methods can be overridden in your code.
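
For illustration, a minimal Mapper subclass that overrides all three lifecycle methods might look like this (the class name and output key are invented for the example):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LineLengthMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private final Text word = new Text("line.length");

  @Override
  protected void setup(Context context) {
    // Called once per task, before any map() calls; read configuration here.
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Called once for each input record in the split.
    context.write(word, new IntWritable(value.getLength()));
  }

  @Override
  protected void cleanup(Context context) {
    // Called once per task, after the last map() call.
  }
}
```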

What Happens If You Don't Override The Mapper Methods And Keep Them As They Are?

If you do not override any methods (leaving even map as-is), it will act as the identity function, emitting each input record as a separate output.
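
In other words, registering the base Mapper class directly gives you an identity map phase; a driver fragment sketching this, assuming job is an org.apache.hadoop.mapreduce.Job and the input and map output types line up:

```java
// Using the framework's base Mapper as-is yields identity behavior:
// every (key, value) read from the input is written straight to the output.
job.setMapperClass(Mapper.class);
```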

What Is The Use Of The Context Object?

The Context object allows the mapper to interact with the rest of the Hadoop system. It includes configuration data for the job, as well as interfaces which allow it to emit output.

How Can You Add Arbitrary Key-Value Pairs In Your Mapper?

You can set arbitrary (key, value) pairs of configuration data in your Job, e.g. with Job.getConfiguration().set("myKey", "myVal"), and then retrieve this data in your mapper with Context.getConfiguration().get("myKey"). This kind of functionality is typically done in the Mapper's setup() method.
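
A minimal sketch tying the two sides together (the key name and mapper class are invented for the example):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Driver side (before job submission):
//   job.getConfiguration().set("myKey", "myVal");

public class ConfigAwareMapper
    extends Mapper<LongWritable, Text, Text, Text> {

  private String myVal;

  @Override
  protected void setup(Context context) {
    // Retrieve the value placed in the configuration by the driver;
    // the second argument is a fallback if the key was never set.
    myVal = context.getConfiguration().get("myKey", "defaultVal");
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(new Text(myVal), value);
  }
}
```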

How Does Mapper's Run() Method Work?

The Mapper.run() method calls map(KeyInType, ValInType, Context) once for each key/value pair in the InputSplit for that task.
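
The default implementation is essentially equivalent to the following simplified sketch (recent Hadoop versions wrap the loop in try/finally so cleanup() always runs):

```java
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    // Pull each key/value pair from the InputSplit and hand it to map().
    while (context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
  } finally {
    cleanup(context);
  }
}
```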

Which Object Can Be Used To Get The Progress Of A Particular Job?

The Context object.
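
From inside a running task, the same Context can also report status and liveness back to the framework; a fragment from inside a Mapper's map() method (the status message is illustrative):

```java
@Override
protected void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
  // ... long-running per-record work ...
  context.setStatus("processing record at offset " + key.get());
  context.progress();  // reassure the framework this task is still alive
}
```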

What Is The Next Step After The Mapper Or MapTask?

The outputs of the Mapper are sorted and then partitioned, one partition per Reducer. The number of partitions therefore equals the number of reducers for the job.

How Can We Control Which Keys Go To A Specific Reducer?

Users can control which keys (and hence records) go to which Reducer by implementing a custom Partitioner.
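
A minimal custom Partitioner sketch (the class name and routing rule are invented for the example):

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {

  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;
    }
    // Deterministic routing: all keys sharing the same first character
    // always land on the same reducer.
    return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
  }
}

// Registered in the driver with:
//   job.setPartitionerClass(FirstLetterPartitioner.class);
```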

What Is The Use Of Combiner?

It is an optional component or class, and can be specified via Job.setCombinerClass(ClassName), to perform local aggregation of the intermediate outputs, which helps to cut down the amount of data transferred from the Mapper to the Reducer.
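
For example, in the classic WordCount program that ships with Hadoop, the reducer class doubles as the combiner because integer addition is commutative and associative:

```java
// Driver fragment in the style of the classic WordCount example:
// the same class serves as both combiner and reducer because
// summing is commutative and associative.
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
```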

How Many Maps Are There In A Particular Job?

The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.

Generally it is around 10-100 maps per node. Task setup takes a while, so it is best if the maps take at least a minute to execute.

For example, if you expect 10TB of input data and have a block size of 128MB, you'll end up with roughly 82,000 maps. To influence the number of maps you can use the mapreduce.job.maps parameter (which only provides a hint to the framework). Ultimately, the number of map tasks is controlled by the number of splits returned by the InputFormat.getSplits() method (which you can override).
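
The arithmetic behind that figure: 10 TB = 10 × 1024 × 1024 MB = 10,485,760 MB, and 10,485,760 MB ÷ 128 MB per block = 81,920 blocks, i.e. roughly 82,000 map tasks.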

What Is The Reducer Used For?

Reducer reduces a set of intermediate values which share a key to a (usually smaller) set of values.

The number of reduces for the job is set by the user via Job.setNumReduceTasks(int).
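
A minimal sum-style Reducer sketch, with the reduce count set in the driver (the class name is invented for the example):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // All values sharing this key arrive together, grouped by key.
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum));
  }
}

// Driver side:
//   job.setNumReduceTasks(10);  // ten reduce tasks, hence ten output partitions
```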

What Is The JobTracker And What Does It Do In A Hadoop Cluster?

JobTracker is a daemon service which submits and tracks MapReduce tasks in the Hadoop cluster. It runs in its own JVM process, usually on a separate machine, and each slave node is configured with the JobTracker's location. The JobTracker is a single point of failure for the Hadoop MapReduce service: if it goes down, all running jobs are halted.

JobTracker in Hadoop performs the following actions:

Client applications submit jobs to the JobTracker.
The JobTracker talks to the NameNode to determine the location of the data.
The JobTracker locates TaskTracker nodes with available slots at or near the data.
The JobTracker submits the work to the chosen TaskTracker nodes.
A TaskTracker notifies the JobTracker when a task fails. The JobTracker decides what to do then: it may resubmit the task elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable.
When the work is completed, the JobTracker updates its status.
The TaskTracker nodes are monitored; if they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker.
Client applications can poll the JobTracker for information.

What Are Combiners? When Should I Use A Combiner In My Mapreduce Job?

Combiners are used to increase the efficiency of a MapReduce program. They are used to aggregate intermediate map output locally on individual mapper nodes. Combiners can help you reduce the amount of data that needs to be transferred across to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative. The execution of the combiner is not guaranteed: Hadoop may or may not execute it, and if required it may execute it more than once. Therefore your MapReduce jobs should not depend on the combiner's execution.

What Are The Writable & WritableComparable Interfaces?

org.apache.hadoop.io.Writable is a Java interface. Any key or value type in the Hadoop Map-Reduce framework implements this interface. Implementations typically implement a static read(DataInput) method which constructs a new instance, calls readFields(DataInput) and returns the instance.
org.apache.hadoop.io.WritableComparable is a Java interface. Any type which is to be used as a key in the Hadoop Map-Reduce framework should implement this interface. WritableComparable objects can be compared to each other using Comparators.
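
A minimal custom WritableComparable sketch (the class and field names are invented for the example):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class LongPair implements WritableComparable<LongPair> {

  private long first;
  private long second;

  public LongPair() {}  // Hadoop needs a no-arg constructor for reflection

  public LongPair(long first, long second) {
    this.first = first;
    this.second = second;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(first);
    out.writeLong(second);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    first = in.readLong();
    second = in.readLong();
  }

  // The static read(DataInput) factory the Writable docs describe.
  public static LongPair read(DataInput in) throws IOException {
    LongPair p = new LongPair();
    p.readFields(in);
    return p;
  }

  @Override
  public int compareTo(LongPair o) {
    int c = Long.compare(first, o.first);
    return c != 0 ? c : Long.compare(second, o.second);
  }

  @Override
  public int hashCode() {
    // Keys must hash consistently so HashPartitioner groups them correctly.
    return (int) (first * 163 + second);
  }

  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof LongPair)) return false;
    LongPair o = (LongPair) obj;
    return first == o.first && second == o.second;
  }
}
```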

How Is A Task Scheduled By The JobTracker?

The TaskTrackers send out heartbeat messages to the JobTracker, usually every few seconds, to reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first looks for an empty slot on the same server that hosts the DataNode containing the data, and if not, it looks for an empty slot on a machine in the same rack.

How Many Instances Of TaskTracker Run On A Hadoop Cluster?

One TaskTracker daemon process runs on each slave node in the Hadoop cluster.

What Are The Two Main Parts Of The Hadoop Framework?

Hadoop consists of two main parts.

the Hadoop Distributed File System (HDFS), a distributed file system with high throughput, and
Hadoop MapReduce, a software framework for processing large data sets.

Can Reducers Talk With Each Other?

No, Reducers run in isolation.

Where The Mapper's Intermediate Data Will Be Stored?

The mapper output (intermediate data) is stored on the local file system (not HDFS) of each individual mapper node. This is typically a temporary directory location which can be set up in the configuration by the Hadoop administrator. The intermediate data is cleaned up after the Hadoop job completes.

