MapReduce Types and Hadoop Cluster: Questions and Answers for Exam Preparation

Question 1. The output of the _______ is not sorted in the MapReduce framework for Hadoop.

Mapper
Cascader
Scalding
None of the mentioned

Explanation:-

Answer: Option D. -> None of the mentioned

The blank refers to the Reducer, which is not among the options. The output of the reduce task is typically written to the FileSystem, and the output of the Reducer is not sorted.

Question 2. Which of the following phases occur simultaneously?

Shuffle and Sort
Reduce and Sort
Shuffle and Map
All of the mentioned

Explanation:-

Answer: Option A. -> Shuffle and Sort

The shuffle and sort phases occur simultaneously: while map outputs are being fetched, they are merged.
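
The idea of merging map outputs while they are still being fetched can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Hadoop's actual merge code: `shuffle_and_sort` and the sample runs are hypothetical, and each fetched run is assumed to arrive already sorted by key.

```python
import heapq


def shuffle_and_sort(fetched_runs):
    """Merge each sorted run into the running result as soon as it arrives."""
    merged = []
    for run in fetched_runs:  # one run per map task, already sorted by key
        # Merging two sorted sequences keeps the result sorted by key.
        merged = list(heapq.merge(merged, run, key=lambda kv: kv[0]))
    return merged


# Hypothetical sorted outputs fetched from three map tasks.
fetched = [
    [("apple", 1), ("cherry", 1)],
    [("apple", 1), ("banana", 1)],
    [("banana", 1), ("date", 1)],
]
merged = shuffle_and_sort(fetched)
```

By the time the last run has been fetched, the reducer input is already in key order, so no separate sorting pass over the whole data set is needed in this sketch.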

Question 3. Which is the most popular NoSQL database for a scalable big data store with Hadoop?

HBase
MongoDB
Cassandra
None of the mentioned

Explanation:-

Answer: Option A. -> HBase

HBase is the Hadoop database: a distributed, scalable Big Data store that lets you host very large tables (billions of rows multiplied by millions of columns) on clusters built with commodity hardware.

Question 4. HDFS and NoSQL file systems focus almost exclusively on adding nodes to:

Scale out
Scale up
Both Scale out and up
None of the mentioned

Explanation:-

Answer: Option A. -> Scale out

HDFS and NoSQL file systems focus almost exclusively on adding nodes to increase performance (scale out), but even they require node configuration with elements of scale up.

Question 5. The ________ option allows you to copy jars locally to the current working directory of tasks and automatically unjar the files.

archives
files
task
None of the mentioned

Explanation:-

Answer: Option A. -> archives

The -archives option is a generic option, as is -files.

Question 6. ______________ class allows the Map/Reduce framework to partition the map outputs based on certain key fields, not the whole keys.

KeyFieldPartitioner
KeyFieldBasedPartitioner
KeyFieldBased
None of the mentioned

Explanation:-

Answer: Option B. -> KeyFieldBasedPartitioner

The primary key is used for partitioning, and the combination of the primary and secondary keys is used for sorting.
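
The split between partitioning and sorting can be sketched as follows. This is a plain-Python illustration, not the KeyFieldBasedPartitioner class itself: `partition_by_key_fields`, the "." separator, the CRC32 hash, and the sample keys are all assumptions made for the example.

```python
import zlib


def partition_by_key_fields(key, num_reducers, num_fields=1, sep="."):
    """Pick a reducer from the first num_fields fields of the key only."""
    prefix = sep.join(key.split(sep)[:num_fields])
    # CRC32 stands in for the real hash function (an assumption of this sketch).
    return zlib.crc32(prefix.encode("utf-8")) % num_reducers


# Hypothetical two-field keys: primary field, then secondary field.
keys = ["11.12", "11.14", "11.11", "12.10", "12.02"]

partitions = {}
for key in keys:
    partitions.setdefault(partition_by_key_fields(key, 3), []).append(key)

# Within each partition, sort on the whole key (primary plus secondary field).
for bucket in partitions.values():
    bucket.sort()
```

Keys that share the primary field "11" always land in the same partition, while their order inside that partition is decided by the full key.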

Question 7. An ___________ is responsible for creating the input splits, and dividing them into records.

TextOutputFormat
TextInputFormat
OutputInputFormat
InputFormat

Explanation:-

Answer: Option D. -> InputFormat

As a MapReduce application writer, you don't need to deal with InputSplits directly, as they are created by an InputFormat.
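
What an InputFormat does on the application writer's behalf can be sketched as follows. This is a simplified plain-Python model of a line-oriented format, not Hadoop code: `get_splits`, `read_records`, the split size, and the sample data are hypothetical. It assumes the common convention that a split other than the first skips its leading partial line, and that every split finishes the last line it started.

```python
def get_splits(length, split_size):
    """Divide a file of the given byte length into (offset, length) splits."""
    return [(off, min(split_size, length - off)) for off in range(0, length, split_size)]


def read_records(data, start, length):
    """Yield (byte offset, line) records belonging to one split."""
    end = start + length
    pos = start
    if start != 0:
        # Skip the first (possibly partial) line; the previous split reads it.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        yield pos, data[pos:line_end].decode("utf-8")
        pos = line_end + 1


data = b"alpha\nbeta\ngamma\ndelta\n"
splits = get_splits(len(data), 8)
records = [rec for start, length in splits for rec in read_records(data, start, length)]
```

Even though the split boundaries fall in the middle of lines, each line is produced exactly once, keyed by its byte offset in the file.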

Question 8. __________ class allows you to specify the InputFormat and Mapper to use on a per-path basis.

MultipleOutputs
MultipleInputs
SingleInputs
None of the mentioned

Explanation:-

Answer: Option B. -> MultipleInputs

The inputs to a single job may come in different formats: one might be tab-separated plain text, the other a binary sequence file. Even if they are in the same format, they may have different representations, and therefore need to be parsed differently.
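
The per-path idea can be sketched as follows. This is a plain-Python illustration, not the MultipleInputs API: the paths, the two text representations, and the names `map_tab_separated`, `map_key_value`, and `run_map_phase` are hypothetical.

```python
def map_tab_separated(line):
    """Mapper for lines such as 'S1<TAB>21'."""
    station, temperature = line.split("\t")
    yield station, int(temperature)


def map_key_value(line):
    """Mapper for lines such as 'station=S1;temp=25'."""
    fields = dict(part.split("=") for part in line.split(";"))
    yield fields["station"], int(fields["temp"])


# Each input path is bound to the mapper that knows how to parse it.
MAPPER_FOR_PATH = {
    "/data/source_a": map_tab_separated,
    "/data/source_b": map_key_value,
}


def run_map_phase(lines_by_path):
    """Apply the mapper registered for each path to that path's lines."""
    output = []
    for path, lines in lines_by_path.items():
        mapper = MAPPER_FOR_PATH[path]
        for line in lines:
            output.extend(mapper(line))
    return output


sample = {
    "/data/source_a": ["S1\t21", "S2\t18"],
    "/data/source_b": ["station=S1;temp=25"],
}
pairs = run_map_phase(sample)
```

Both mappers emit the same (station, temperature) pair type, so a single reducer could process their combined output.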

Question 9. ______________ is another implementation of the MapRunnable interface that runs mappers concurrently in a configurable number of threads.

MultithreadedRunner
MultithreadedMap
MultithreadedMapRunner
SinglethreadedMapRunner

Explanation:-

Answer: Option C. -> MultithreadedMapRunner

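A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs, which it passes to the map function. The default MapRunner makes these calls sequentially, whereas MultithreadedMapRunner makes them from several threads.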
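
Passing records to the map function from several threads can be sketched as follows. This is a plain-Python illustration, not Hadoop's MultithreadedMapRunner: `run_mappers_concurrently`, `word_count_map`, the thread count, and the sample records are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count_map(offset, line):
    """Hypothetical map function: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1


def run_mappers_concurrently(records, map_fn, num_threads=4):
    """Call map_fn on each (key, value) record using a pool of threads."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in input order, whichever thread finishes first.
        results = pool.map(lambda kv: list(map_fn(*kv)), records)
        return [pair for pairs in results for pair in pairs]


records = [(0, "a b"), (4, "b c")]
output = run_mappers_concurrently(records, word_count_map, num_threads=2)
```

Running map calls in threads mainly helps when each call waits on something external, such as a remote lookup; the map function must then be safe to call concurrently.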

Question 10. __________ is a variant of SequenceFileInputFormat that converts the sequence file's keys and values to Text objects.

SequenceFile
SequenceFileAsTextInputFormat
SequenceAsTextInputFormat
All of the mentioned

Explanation:-

Answer: Option B. -> SequenceFileAsTextInputFormat

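SequenceFileAsTextInputFormat performs the conversion by calling toString() on the keys and values, which makes sequence files suitable input for Streaming. With multiple reducers, records will be allocated evenly across reduce tasks, with all records that share the same key being processed by the same reduce task.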
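
The allocation of records to reduce tasks can be sketched as follows. This is a plain-Python illustration of hashing on the whole key, not Hadoop's partitioner: `reducer_for`, `allocate`, the CRC32 hash, and the sample records are assumptions made for the example.

```python
import zlib
from collections import defaultdict


def reducer_for(key, num_reducers):
    """Map a key to a reduce task index; equal keys always get the same index."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def allocate(records, num_reducers):
    """Group (key, value) records by the reduce task that will process them."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[reducer_for(key, num_reducers)].append((key, value))
    return dict(buckets)


records = [("apple", 1), ("banana", 1), ("apple", 2), ("cherry", 1), ("banana", 3)]
buckets = allocate(records, 2)
```

How evenly the records spread depends on the hash function and on how the keys are distributed; a few very frequent keys can still overload one reduce task.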