Pig (Hadoop) Questions and Answers
Question 1. Which of the following functions is used to read data in Pig?
WRITE
READ
LOAD
None of the mentioned
Explanation:-
Answer: Option C. -> LOAD
LOAD reads data from the file system into a relation; PigStorage is the default load function.
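For illustration, a minimal load statement (the file name 'students.txt' and its schema are assumptions, not from the original):

```pig
-- PigStorage is the default load function (tab-delimited by default),
-- so these two statements are equivalent.
A = LOAD 'students.txt' AS (name:chararray, age:int, gpa:float);
B = LOAD 'students.txt' USING PigStorage('\t') AS (name:chararray, age:int, gpa:float);
DUMP A;
```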
Question 2. You can run Pig in interactive mode using the ______ shell.
Grunt
FS
HDFS
None of the mentioned
Explanation:-
Answer: Option A. -> Grunt
Invoke the Grunt shell using the pig command, then enter your Pig Latin statements and Pig commands interactively at the command line.
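A sketch of an interactive Grunt session (the input file 'passwd' is an assumption for illustration):

```pig
$ pig -x local        -- start Grunt in local mode; omit -x local for MapReduce mode
grunt> A = LOAD 'passwd' USING PigStorage(':') AS (user:chararray);
grunt> DUMP A;
grunt> quit;
```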
Question 3. The ________ class mimics the behavior of the Main class but gives users a statistics object back.
PigRun
PigRunner
RunnerPig
None of the mentioned
Explanation:-
Answer: Option B. -> PigRunner
Optionally, you can call the API with an implementation of a progress listener, which is invoked by the Pig runtime during execution.
Question 4. __________ is a framework for collecting and storing script-level statistics for Pig Latin.
Pig Stats
PStatistics
Pig Statistics
None of the mentioned
Explanation:-
Answer: Option C. -> Pig Statistics
The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file.
Question 5. ___________ returns a list of HDFS files to ship to the distributed cache.
relativeToAbsolutePath()
setUdfContextSignature()
getCacheFiles()
getShipFiles()
Explanation:-
Answer: Option D. -> getShipFiles()
The default implementation provided in LoadFunc handles this for FileSystem locations.
Question 6. The loader should use ______ method to communicate the load information to the underlying InputFormat.
relativeToAbsolutePath()
setUdfContextSignature()
getCacheFiles()
setLocation()
Explanation:-
Answer: Option D. -> setLocation()
The setLocation() method is called by Pig to communicate the load location to the loader.
Question 7. Which of the following files contains user-defined functions (UDFs)?
script2-local.pig
pig.jar
tutorial.jar
excite.log.bz2
Explanation:-
Answer: Option C. -> tutorial.jar
tutorial.jar from the Pig tutorial contains the user-defined functions as compiled Java classes, along with other supporting Java classes.
Question 8. Which of the following commands can be used for debugging?
exec
execute
error
throw
Explanation:-
Answer: Option A. -> exec
With the exec command, store statements do not trigger execution immediately; instead, the entire script is parsed before execution starts, which makes exec useful for debugging.
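For example, from the Grunt shell (the script name is hypothetical):

```pig
grunt> exec myscript.pig   -- batch-runs the script; its aliases are not visible in the shell afterwards
grunt> run myscript.pig    -- by contrast, run executes line by line and keeps the script's aliases
```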
Question 9. Which of the following scripts is used to find scripts that have failed jobs?
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'STATUS' as status,
    j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'JOBNAME' as script_name, j#'JOBID' as job;
c = filter b by status != 'SUCCESS';
dump c;
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;
None of the mentioned
Explanation:-
Answer: Option A.
The script loads the Hadoop job history with HadoopJobHistoryLoader, extracts each job's STATUS from the job map, and keeps only the records whose status is not 'SUCCESS', that is, the failed jobs.
Question 10. Which of the following scripts is used to find scripts that use only the default parallelism?
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'STATUS' as status,
    j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'JOBNAME' as script_name, j#'JOBID' as job;
c = filter b by status != 'SUCCESS';
dump c;
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;
a = load '/mapred/history/done' using HadoopJobHistoryLoader()
    as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
    j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;
None of the mentioned
Explanation:-
Answer: Option B.
The script computes the maximum reducer count per script and keeps those where max_reduces == 1, Pig's default parallelism of a single reducer. The first map in the schema (j) contains the job-related entries.