HDFS (Engineering > Computer Science and Engineering > Hadoop) Questions and Answers

Question 1. InputFormat class calls the ________ function and computes splits for each file and then sends them to the jobtracker.
  1.    puts
  2.    gets
  3.    getSplits
  4.    All of the mentioned
Explanation:-
Answer: Option C. -> getSplits


The jobtracker then uses the splits' storage locations to schedule map tasks to process them on the tasktrackers.
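For reference, a minimal sketch of the split computation using the newer org.apache.hadoop.mapreduce API (the input path is hypothetical); the printed storage locations are what the scheduler uses for data-local task placement:

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitsSketch {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration());
            // Hypothetical input path; getSplits() computes one or more splits per file.
            FileInputFormat.addInputPath(job, new Path("/user/hadoop/input"));
            List<InputSplit> splits = new TextInputFormat().getSplits(job);
            for (InputSplit split : splits) {
                // The hosts holding each split's blocks guide map-task scheduling.
                System.out.println(split.getLength() + " bytes on "
                        + String.join(",", split.getLocations()));
            }
        }
    }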



Question 2. _________ identifies filesystem pathnames which work as usual with regular expressions.
  1.    -archiveName
  2.    source
  3.    destination
  4.    None of the mentioned
Explanation:-
Answer: Option B. -> source


source identifies the filesystem pathnames to be archived, which work as usual with regular expressions; destination identifies the directory that will contain the archive.



Question 3. On a tasktracker, the map task passes the split to the createRecordReader() method on InputFormat to obtain a _________ for that split.
  1.    InputReader
  2.    RecordReader
  3.    OutputReader
  4.    None of the mentioned
Explanation:-
Answer: Option B. -> RecordReader


The RecordReader loads data from its source and converts it into key-value pairs suitable for reading by the mapper.
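To make the hand-off concrete, here is a minimal sketch (new org.apache.hadoop.mapreduce API) of how a map task drives the RecordReader obtained from createRecordReader(); the class and method names below are illustrative, not the framework's internal code:

    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    public class SplitConsumerSketch {
        // Iterate every key-value pair the RecordReader extracts from one split.
        static <K, V> void consume(InputFormat<K, V> inputFormat, InputSplit split,
                                   TaskAttemptContext context) throws Exception {
            RecordReader<K, V> reader = inputFormat.createRecordReader(split, context);
            reader.initialize(split, context);
            try {
                while (reader.nextKeyValue()) {
                    K key = reader.getCurrentKey();
                    V value = reader.getCurrentValue();
                    // ...each pair would be passed to the mapper's map() method...
                }
            } finally {
                reader.close();
            }
        }
    }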



Question 4. _________ is a pluggable Map/Reduce scheduler for Hadoop which provides a way to share large clusters.
  1.    Flow Scheduler
  2.    Data Scheduler
  3.    Capacity Scheduler
  4.    None of the mentioned
Explanation:-
Answer: Option C. -> Capacity Scheduler


The Capacity Scheduler supports multiple queues; a job is submitted to one of those queues.
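As an illustration, a minimal capacity-scheduler.xml sketch (YARN-era property names; the queue names and percentages are hypothetical) defining two queues that share the cluster:

    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>prod,dev</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.prod.capacity</name>
        <value>60</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.dev.capacity</name>
        <value>40</value>
      </property>
    </configuration>

A job then selects its queue through the mapreduce.job.queuename property.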



Question 5. Which of the following parameters identifies the destination directory that would contain the archive?
  1.    -archiveName
  2.    source
  3.    destination
  4.    None of the mentioned
Explanation:-
Answer: Option C. -> destination


-archiveName is the name of the archive to be created.
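For example, the documented invocation is hadoop archive -archiveName <name> -p <parent> <src>* <dest>; with hypothetical paths:

    hadoop archive -archiveName data.har -p /user/hadoop dir1 dir2 /user/archives

Here data.har is the -archiveName, dir1 and dir2 (relative to the -p parent) are the sources, and /user/archives is the destination directory that will contain the archive.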



Question 6. Which of the following scenarios may not be a good fit for HDFS?
  1.    HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
  2.    HDFS is suitable for storing data related to applications requiring low latency data access
  3.    HDFS is suitable for storing data related to applications requiring low latency data access
  4.    None of the mentioned
Explanation:-
Answer: Option A. -> HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file


HDFS is optimized for high-throughput batch access rather than low-latency access, and it does not support multiple concurrent writers to the same file. It is, however, a good fit for archive data, since it is cheaper: HDFS stores data on low-cost commodity hardware while ensuring a high degree of fault tolerance.



Question 7. The need for data replication can arise in various scenarios, such as:
  1.    Replication Factor is changed
  2.    DataNode goes down
  3.    Data Blocks get corrupted
  4.    All of the mentioned
Explanation:-
Answer: Option D. -> All of the mentioned


Data is replicated across different DataNodes to ensure a high degree of fault-tolerance.
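For instance, changing the replication factor of existing files triggers re-replication; with a hypothetical path (hdfs dfs -setrep is a standard command, and -w waits until replication completes):

    hdfs dfs -setrep -w 2 /user/hadoop/data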



Question 8. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
  1.    DataNode
  2.    NameNode
  3.    Data block
  4.    Replication
Explanation:-
Answer: Option A. -> DataNode


A DataNode stores data in the Hadoop file system (HDFS). A functional filesystem has more than one DataNode, with data replicated across them.
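The live DataNodes and how much block data each one holds can be inspected with a standard admin command:

    hdfs dfsadmin -report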



Question 9. A Reducer takes as input the grouped output of a:
  1.    Mapper
  2.    Reducer
  3.    Writable
  4.    Readable
Explanation:-
Answer: Option A. -> Mapper


In the shuffle phase, the framework fetches, for each Reducer, the relevant partition of the output of all the Mappers via HTTP.
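Which partition is "relevant" to a given Reducer is decided on the map side by the Partitioner; the following sketch mirrors the logic of Hadoop's default HashPartitioner:

    import org.apache.hadoop.mapreduce.Partitioner;

    // Keys with the same hash bucket land in the same partition, so each
    // Reducer fetches exactly one bucket from every Mapper during the shuffle.
    public class SketchHashPartitioner<K, V> extends Partitioner<K, V> {
        @Override
        public int getPartition(K key, V value, int numReduceTasks) {
            // Mask the sign bit so the modulus is never negative.
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }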



Question 10. Interface ____________ reduces a set of intermediate values which share a key to a smaller set of values.
  1.    Mapper
  2.    Reducer
  3.    Writable
  4.    Readable
Explanation:-
Answer: Option B. -> Reducer


Reducer implementations can access the JobConf for the job.
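A minimal sketch of a Reducer implementation in the old org.apache.hadoop.mapred API (where the JobConf is available via configure()); the class name and the summing logic are illustrative:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class SumReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void configure(JobConf job) {
            // Reducer implementations can read the job's configuration here.
        }

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get(); // reduce many values to one per key
            }
            output.collect(key, new IntWritable(sum));
        }
    }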