Apache Pig - Execution




How to execute Apache Pig?

  • We can run Apache Pig in two execution modes:
    • Local mode
    • MapReduce mode

Local Mode

  • In local mode, all files are installed and run from the local host and the local file system.
  • Local mode is generally used for testing.
  • To run Pig in local mode, we need access to a single machine on which all files are installed and run using the local host and file system.
  • We specify local mode with the -x flag (pig -x local).
  • In local mode, Pig runs in a single JVM and accesses the local file system.
  • Local mode is best suited to smaller data sets.
  • Passing -x local on the command line starts Pig in local execution mode.
  • In local mode, Pig always resolves the paths of loaded data against the local file system.
  • The command $ pig -x local selects local mode as the execution mode.

Example:

/* local mode */
$ pig -x local ...
/* local mode */
$ java -cp pig.jar org.apache.pig.Main -x local ...

MapReduce Mode

  • MapReduce mode is used when Apache Pig loads or processes data that exists in the Hadoop Distributed File System (HDFS).
  • In MapReduce mode, whenever we execute Pig Latin statements to process the data, a MapReduce job is invoked in the back end to perform the operation on the data that exists in HDFS.
  • To run Pig in MapReduce mode, we need access to a Hadoop cluster and an HDFS installation.
  • MapReduce mode is the default mode; it can also be specified explicitly with the -x flag (pig -x mapreduce).
  • This mode assumes a properly configured Hadoop cluster and Hadoop installation.
  • If no mode is given, Pig runs in MapReduce mode.
  • Pig translates the submitted queries into MapReduce jobs and runs them on the Hadoop cluster.
  • Pig Latin statements such as LOAD and STORE are used to read data from HDFS and to write output in MapReduce mode.

Example:

/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ...
/* mapreduce mode */
$ java -cp pig.jar org.apache.pig.Main ...
or
$ java -cp pig.jar org.apache.pig.Main -x mapreduce ...
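
As a rough sketch of how the MapReduce mode commands above might be used end to end, the shell snippet below writes a small Pig Latin script and, only if both the hdfs and pig launchers are found on the PATH, copies an input file into HDFS and runs the script on the cluster. The directory pig_mr_demo, the HDFS path pig_data and the sample records are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical working directory and sample data, for illustration only.
mkdir -p pig_mr_demo
printf '1,Asha,Chennai\n2,Ravi,Pune\n' > pig_mr_demo/student.txt

# Pig Latin script; in MapReduce mode the relative path pig_data/student.txt
# is resolved against the user's home directory in HDFS.
cat > pig_mr_demo/student.pig <<'EOF'
student = LOAD 'pig_data/student.txt' USING PigStorage(',')
    AS (id:int, name:chararray, city:chararray);
DUMP student;
EOF

if command -v hdfs >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
    # Copy the input into HDFS, then run the script on the cluster.
    hdfs dfs -mkdir -p pig_data
    hdfs dfs -put -f pig_mr_demo/student.txt pig_data/
    pig -x mapreduce pig_mr_demo/student.pig
else
    echo "hdfs or pig not found on PATH; script written but not executed"
fi
```

The guard around the last step keeps the sketch harmless on a machine without Hadoop: the files are still written, but nothing is submitted to a cluster.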

Apache Pig Execution Mechanisms

  • Apache Pig scripts can be executed in three ways:
    • interactive mode
    • batch mode
    • embedded mode

Interactive Mode

  • We run Apache Pig in interactive mode by using the Grunt shell.
  • In interactive mode, we can enter Pig Latin statements and view the output by using the Dump operator.

Example:

grunt> A = load 'passwd' using PigStorage(':'); 
grunt> B = foreach A generate $0 as id; 
grunt> dump B; 

Batch Mode

  • We can run Apache Pig in batch mode by writing the Pig Latin statements in a single file with the .pig extension.

Example:

/* id.pig */

A = load 'passwd' using PigStorage(':');  -- load the passwd file 
B = foreach A generate $0 as id;  -- extract the user IDs 
store B into 'id.out';  -- write the results to a file named id.out
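
To try the id.pig script without a cluster, one possible approach is sketched below: it creates a small passwd-style input file, writes the script, and runs it in local mode if a pig launcher is available. The directory pig_batch_demo and the two sample records are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical demo directory and passwd-style sample data.
mkdir -p pig_batch_demo
printf 'alice:x:1001:1001\nbob:x:1002:1002\n' > pig_batch_demo/passwd

# The id.pig script from the example above.
cat > pig_batch_demo/id.pig <<'EOF'
A = load 'passwd' using PigStorage(':');  -- load the passwd file
B = foreach A generate $0 as id;          -- extract the first field
store B into 'id.out';                    -- write the results to id.out
EOF

if command -v pig >/dev/null 2>&1; then
    # Run from inside the directory so the relative paths resolve locally.
    # STORE fails if id.out already exists, so remove any earlier output.
    (
        cd pig_batch_demo || exit 1
        rm -rf id.out
        pig -x local id.pig && cat id.out/part-*
    ) || echo "pig run failed"
else
    echo "pig not found on PATH; script written but not executed"
fi
```

In local mode the STORE target is a directory on the local file system, which is why the sketch reads the part files inside id.out rather than a single file.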

Invoking the Grunt Shell

  • We can invoke the Grunt shell in the desired mode (local or MapReduce) by using the -x option, as shown below.

Local mode

$ ./pig -x local

MapReduce mode

$ ./pig -x mapreduce

  • Both the local mode and the MapReduce mode commands give the Grunt shell prompt, as shown below:
grunt>
  • We can exit the Grunt shell by pressing Ctrl + D.
  • After invoking the Grunt shell, we can run Pig Latin statements by entering them at the prompt.
grunt> customers = LOAD 'customers.txt' USING PigStorage(',');
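
For a quick check without opening the shell, the pig launcher also accepts statements on the command line through its -e (execute) option. The sketch below assumes a small comma-separated customers.txt; the directory pig_grunt_demo and the sample records are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo directory and comma-separated sample data.
mkdir -p pig_grunt_demo
printf '1,Asha,Chennai\n2,Ravi,Pune\n' > pig_grunt_demo/customers.txt

if command -v pig >/dev/null 2>&1; then
    # Run the same kind of statements one would type at the grunt> prompt.
    (
        cd pig_grunt_demo || exit 1
        pig -x local -e "customers = LOAD 'customers.txt' USING PigStorage(','); DUMP customers;"
    ) || echo "pig run failed"
else
    echo "pig not found on PATH; sample data written but nothing executed"
fi
```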

Executing Apache Pig in Batch Mode

  • We can write an entire Pig Latin script in a file and execute it by passing the file name to the pig command, together with the -x option to choose the mode.

Sample-script.pig

student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
   PigStorage(',') as (id:int,name:chararray,city:chararray);
Dump student;
  • We can execute the script in the above file as shown below:

Local mode

$ pig -x local Sample-script.pig

MapReduce mode

$ pig -x mapreduce Sample-script.pig
