Pig (Hadoop) Questions and Answers
Question 1. Which of the following functions is used to read data in Pig?
WRITE
READ
LOAD
None of the mentioned
Explanation:-
Answer: Option C. -> LOAD
LOAD reads data from the file system into a relation; PigStorage is the default load function.
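For illustration, a minimal load statement (the file name 'students.txt' and its schema are assumptions, not from the original):

```pig
-- PigStorage is the default load function (tab-delimited by default),
-- so these two statements are equivalent.
A = LOAD 'students.txt' AS (name:chararray, age:int, gpa:float);
B = LOAD 'students.txt' USING PigStorage('\t') AS (name:chararray, age:int, gpa:float);
DUMP A;
```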
Question 2. You can run Pig in interactive mode using the ______ shell.
Grunt
FS
HDFS
None of the mentioned
Explanation:-
Answer: Option A. -> Grunt
Invoke the Grunt shell using the pig command, then enter your Pig Latin statements and Pig commands interactively at the command line.
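A sketch of an interactive Grunt session (the input file 'passwd' is an assumption for illustration):

```pig
$ pig -x local        -- start Grunt in local mode; omit -x local for MapReduce mode
grunt> A = LOAD 'passwd' USING PigStorage(':') AS (user:chararray);
grunt> DUMP A;
grunt> quit;
```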
Question 3. The ________ class mimics the behavior of the Main class but gives users a statistics object back.
PigRun
PigRunner
RunnerPig
None of the mentioned
Explanation:-
Answer: Option B. -> PigRunner
Optionally, you can call the API with an implementation of a progress listener, which is invoked by the Pig runtime during execution.
Question 4. __________ is a framework for collecting and storing script-level statistics for Pig Latin.
Pig Stats
PStatistics
Pig Statistics
None of the mentioned
Explanation:-
Answer: Option C. -> Pig Statistics
The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file.
Question 5. ___________ returns a list of HDFS files to ship to the distributed cache.
relativeToAbsolutePath()
setUdfContextSignature()
getCacheFiles()
getShipFiles()
Explanation:-
Answer: Option D. -> getShipFiles()
The default implementation provided in LoadFunc handles this for FileSystem locations.
Question 6. The loader should use ______ method to communicate the load information to the underlying InputFormat.
relativeToAbsolutePath()
setUdfContextSignature()
getCacheFiles()
setLocation()
Explanation:-
Answer: Option D. -> setLocation()
The setLocation() method is called by Pig to communicate the load location to the loader.
Question 7. Which of the following files contains user-defined functions (UDFs)?
script2-local.pig
pig.jar
tutorial.jar
excite.log.bz2
Explanation:-
Answer: Option C. -> tutorial.jar
tutorial.jar from the Pig tutorial contains the user-defined functions as compiled Java classes, along with other supporting Java classes.
Question 8. Which of the following commands can be used for debugging?
exec
execute
error
throw
Explanation:-
Answer: Option A. -> exec
With the exec command, store statements do not trigger execution immediately; instead, the entire script is parsed before execution starts, which makes exec useful for debugging.
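For example, from the Grunt shell (the script name is hypothetical):

```pig
grunt> exec myscript.pig   -- batch-runs the script; its aliases are not visible in the shell afterwards
grunt> run myscript.pig    -- by contrast, run executes line by line and keeps the script's aliases
```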
Question 9. Which of the following scripts is used to find scripts that have failed jobs?
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'STATUS' as status,
    j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'JOBNAME' as script_name, j#'JOBID' as job;
c = filter b by status != 'SUCCESS';
dump c;
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;
None of the mentioned
Explanation:-
Answer: Option A.
The script loads the Hadoop job history with HadoopJobHistoryLoader, extracts each job's STATUS from the job map, and keeps only the records whose status is not 'SUCCESS', that is, the failed jobs.
Question 10. Which of the following scripts is used to find scripts that use only the default parallelism?
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'STATUS' as status,
    j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'JOBNAME' as script_name, j#'JOBID' as job;
c = filter b by status != 'SUCCESS';
dump c;
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;
None of the mentioned
Explanation:-
Answer: Option B.
The script computes the maximum reducer count per script and keeps those where max_reduces == 1, Pig's default parallelism of a single reducer. The first map in the schema (j) contains the job-related entries.