pig tutorial - apache pig tutorial - Apache Pig - Execution - pig latin - apache pig - pig hadoop



How to execute Apache Pig?

  • We can run Apache Pig in two modes:
    • Local mode
    • MapReduce mode

Local Mode

  • In local mode, Pig is installed and run on a single machine, using the local host and the local file system.
  • Local mode is generally used for testing.
  • Pig runs in a single JVM and accesses the local file system, so the paths of the data being loaded are always resolved against the local file system.
  • Local mode is best suited to smaller data sets.
  • We select local mode with the -x flag: the command pig -x local starts Pig in local mode.

Example:

/* local mode */
$ pig -x local ...
/* local mode */
$ java -cp pig.jar org.apache.pig.Main -x local ...

MapReduce Mode

  • MapReduce mode is used to load and process data stored in the Hadoop Distributed File System (HDFS) with Apache Pig.
  • In this mode, executing Pig Latin statements invokes MapReduce jobs in the back end, which perform the requested operations on the data in HDFS.
  • To run Pig in MapReduce mode, we need access to a properly set up Hadoop cluster and an HDFS installation.
  • MapReduce mode is the default mode; we can, but do not need to, specify it with the -x flag (pig -x mapreduce).
  • Pig translates the submitted queries into MapReduce jobs and runs them on the Hadoop cluster.
  • Pig Latin statements such as LOAD and STORE read data from HDFS and write the output back to it.

Example:

/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ...
/* mapreduce mode */
$ java -cp pig.jar org.apache.pig.Main ...
or
$ java -cp pig.jar org.apache.pig.Main -x mapreduce ...

Apache Pig Execution Mechanisms

  • Apache Pig scripts can be executed in three ways:
    • interactive mode
    • batch mode
    • embedded mode

Interactive Mode

  • We run Apache Pig in interactive mode using the Grunt shell.
  • In interactive mode, we enter Pig Latin statements at the prompt and view the output with the Dump operator.

Example:

grunt> A = load 'passwd' using PigStorage(':'); 
grunt> B = foreach A generate $0 as id; 
grunt> dump B; 

Batch Mode

  • We can run Apache Pig in batch mode by writing the Pig Latin script in a single file with the .pig extension.

Example:

/* id.pig */

A = load 'passwd' using PigStorage(':');  -- load the passwd file 
B = foreach A generate $0 as id;  -- extract the user IDs 
store B into 'id.out';  -- write the results to a file named id.out
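
To try the id.pig script above in local mode, a shell session along the following lines could be used. This is only a sketch under assumptions: the pig_demo directory, the two sample input lines, and the check that skips the run when pig is not on the PATH are illustrative and are not part of the original example.

```shell
#!/bin/sh
# Hypothetical scratch directory for the demo (not from the tutorial).
mkdir -p pig_demo

# A tiny colon-delimited input file standing in for a real passwd file.
printf 'root:x:0:0\nalice:x:1000:1000\n' > pig_demo/passwd

# Write the batch script. The quoted 'EOF' stops the shell from expanding $0.
cat > pig_demo/id.pig <<'EOF'
A = load 'passwd' using PigStorage(':');  -- load the passwd file
B = foreach A generate $0 as id;          -- extract the user IDs
store B into 'id.out';                    -- write the results to id.out
EOF

# Run the script in local mode, but only if Pig is installed.
if command -v pig >/dev/null 2>&1; then
    (
        cd pig_demo || exit 1
        rm -rf id.out          # STORE fails if the output location already exists
        pig -x local id.pig
    )
else
    echo "pig not found on PATH; skipping run"
fi
```

The script is run from inside pig_demo so that the relative paths 'passwd' and 'id.out' resolve against that directory in the local file system.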

Invoking the Grunt Shell

  • We can invoke the Grunt shell in the desired mode (local or MapReduce) using the -x option, as shown below.

Local mode:

$ ./pig -x local

MapReduce mode:

$ ./pig -x mapreduce

  • Either command gives us the Grunt shell prompt, as shown below.
grunt>
  • We can exit the Grunt shell by pressing Ctrl+D.
  • After invoking the Grunt shell, we can run Pig Latin statements by entering them directly at the prompt.
grunt> customers = LOAD 'customers.txt' USING PigStorage(',');

Executing Apache Pig in Batch Mode

  • We can write an entire Pig Latin script in a file and execute it by passing the file name to the pig command, with the -x option selecting the execution mode.

Sample-script.pig

student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
   PigStorage(',') as (id:int,name:chararray,city:chararray);
Dump student;
  • Execute the script in the above file as shown below:

Local mode:

$ pig -x local Sample-script.pig

MapReduce mode:

$ pig -x mapreduce Sample-script.pig
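
When the same script has to be run in either mode, the choice of the -x value can be wrapped in a small shell function. The following is a minimal sketch, not part of Pig itself: build_pig_command is a hypothetical helper name, and it only prints the command line instead of running it.

```shell
#!/bin/sh
# build_pig_command is a hypothetical helper (not part of Pig): it prints the
# pig command line for a given execution mode and script file.
build_pig_command() {
    mode="${1:-mapreduce}"   # fall back to MapReduce, Pig's default mode
    script="$2"
    case "$mode" in
        local|mapreduce)
            # Only the two modes covered in this section are accepted.
            echo "pig -x $mode $script"
            ;;
        *)
            echo "unsupported mode: $mode (expected local or mapreduce)" >&2
            return 1
            ;;
    esac
}

# Print the two invocations for the sample script.
build_pig_command local Sample-script.pig
build_pig_command mapreduce Sample-script.pig
```

Printing the command rather than executing it keeps the sketch usable on a machine without Pig; running the printed command is left to the caller.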
