HDFS (Engineering > Computer Science and Engineering > Hadoop) Questions and Answers

Question 1. InputFormat class calls the ________ function and computes splits for each file and then sends them to the jobtracker.
  1.    puts
  2.    gets
  3.    getSplits
  4.    All of the mentioned
Explanation:-
Answer: Option C. -> getSplits


The jobtracker then uses the splits' storage locations to schedule map tasks to process them on the tasktrackers.
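For reference, a minimal sketch of the split computation using the newer org.apache.hadoop.mapreduce API (the input path is hypothetical); the printed storage locations are what the scheduler uses for data-local task placement:

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitsSketch {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration());
            // Hypothetical input path; getSplits() computes one or more splits per file.
            FileInputFormat.addInputPath(job, new Path("/user/hadoop/input"));
            List<InputSplit> splits = new TextInputFormat().getSplits(job);
            for (InputSplit split : splits) {
                // The hosts holding each split's blocks guide map-task scheduling.
                System.out.println(split.getLength() + " bytes on "
                        + String.join(",", split.getLocations()));
            }
        }
    }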



Question 2. _________ identifies filesystem pathnames which work as usual with regular expressions.
  1.    -archiveName
  2.    source
  3.    destination
  4.    None of the mentioned
Explanation:-
Answer: Option B. -> source


source identifies the filesystem pathnames to be archived, which work as usual with regular expressions; destination identifies the directory that will contain the archive.



Question 3. On a tasktracker, the map task passes the split to the createRecordReader() method on InputFormat to obtain a _________ for that split.
  1.    InputReader
  2.    RecordReader
  3.    OutputReader
  4.    None of the mentioned
Explanation:-
Answer: Option B. -> RecordReader


The RecordReader loads data from its source and converts it into key-value pairs suitable for reading by the mapper.
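To make the hand-off concrete, here is a minimal sketch (new org.apache.hadoop.mapreduce API) of how a map task drives the RecordReader obtained from createRecordReader(); the class and method names below are illustrative, not the framework's internal code:

    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    public class SplitConsumerSketch {
        // Iterate every key-value pair the RecordReader extracts from one split.
        static <K, V> void consume(InputFormat<K, V> inputFormat, InputSplit split,
                                   TaskAttemptContext context) throws Exception {
            RecordReader<K, V> reader = inputFormat.createRecordReader(split, context);
            reader.initialize(split, context);
            try {
                while (reader.nextKeyValue()) {
                    K key = reader.getCurrentKey();
                    V value = reader.getCurrentValue();
                    // ...each pair would be passed to the mapper's map() method...
                }
            } finally {
                reader.close();
            }
        }
    }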



Question 4. _________ is a pluggable Map/Reduce scheduler for Hadoop which provides a way to share large clusters.
  1.    Flow Scheduler
  2.    Data Scheduler
  3.    Capacity Scheduler
  4.    None of the mentioned
Explanation:-
Answer: Option C. -> Capacity Scheduler


The Capacity Scheduler supports multiple queues; a job is submitted to one of those queues.
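As an illustration, a minimal capacity-scheduler.xml sketch (YARN-era property names; the queue names and percentages are hypothetical) defining two queues that share the cluster:

    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>prod,dev</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.prod.capacity</name>
        <value>60</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.dev.capacity</name>
        <value>40</value>
      </property>
    </configuration>

A job then selects its queue through the mapreduce.job.queuename property.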



Question 5. Which of the following parameters identifies the destination directory that would contain the archive?
  1.    -archiveName
  2.    source
  3.    destination
  4.    None of the mentioned
Explanation:-
Answer: Option C. -> destination


-archiveName is the name of the archive to be created.
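For example, the documented invocation is hadoop archive -archiveName <name> -p <parent> <src>* <dest>; with hypothetical paths:

    hadoop archive -archiveName data.har -p /user/hadoop dir1 dir2 /user/archives

Here data.har is the -archiveName, dir1 and dir2 (relative to the -p parent) are the sources, and /user/archives is the destination directory that will contain the archive.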



Question 6. Which of the following scenarios may not be a good fit for HDFS?
  1.    HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
  2.    HDFS is suitable for storing data related to applications requiring low latency data access
  3.    HDFS is suitable for storing data related to applications requiring low latency data access
  4.    None of the mentioned
Explanation:-
Answer: Option A. -> HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file


HDFS is optimized for high-throughput batch access rather than low-latency access, and it does not support multiple concurrent writers to the same file. It is, however, a good fit for archive data, since it is cheaper: HDFS stores data on low-cost commodity hardware while ensuring a high degree of fault tolerance.



Question 7. The need for data replication can arise in various scenarios, such as:
  1.    Replication Factor is changed
  2.    DataNode goes down
  3.    Data Blocks get corrupted
  4.    All of the mentioned
Explanation:-
Answer: Option D. -> All of the mentioned


Data is replicated across different DataNodes to ensure a high degree of fault-tolerance.
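For instance, changing the replication factor of existing files triggers re-replication; with a hypothetical path (hdfs dfs -setrep is a standard command, and -w waits until replication completes):

    hdfs dfs -setrep -w 2 /user/hadoop/data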



Question 8. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
  1.    DataNode
  2.    NameNode
  3.    Data block
  4.    Replication
Explanation:-
Answer: Option A. -> DataNode


A DataNode stores data in the Hadoop file system (HDFS). A functional filesystem has more than one DataNode, with data replicated across them.
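The live DataNodes and how much block data each one holds can be inspected with a standard admin command:

    hdfs dfsadmin -report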



Question 9. A Reducer takes as input the grouped output of a:
  1.    Mapper
  2.    Reducer
  3.    Writable
  4.    Readable
Explanation:-
Answer: Option A. -> Mapper


In the shuffle phase, the framework fetches, for each Reducer, the relevant partition of the output of all the Mappers via HTTP.
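Which partition is "relevant" to a given Reducer is decided on the map side by the Partitioner; the following sketch mirrors the logic of Hadoop's default HashPartitioner:

    import org.apache.hadoop.mapreduce.Partitioner;

    // Keys with the same hash bucket land in the same partition, so each
    // Reducer fetches exactly one bucket from every Mapper during the shuffle.
    public class SketchHashPartitioner<K, V> extends Partitioner<K, V> {
        @Override
        public int getPartition(K key, V value, int numReduceTasks) {
            // Mask the sign bit so the modulus is never negative.
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }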



Question 10. Interface ____________ reduces a set of intermediate values which share a key to a smaller set of values.
  1.    Mapper
  2.    Reducer
  3.    Writable
  4.    Readable
Explanation:-
Answer: Option B. -> Reducer


Reducer implementations can access the JobConf for the job.
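A minimal sketch of a Reducer implementation in the old org.apache.hadoop.mapred API (where the JobConf is available via configure()); the class name and the summing logic are illustrative:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class SumReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void configure(JobConf job) {
            // Reducer implementations can read the job's configuration here.
        }

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get(); // reduce many values to one per key
            }
            output.collect(key, new IntWritable(sum));
        }
    }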