


What Is the Grunt Shell in Apache Pig?

  • Grunt is the interactive shell of Apache Pig.
  • The Grunt shell is mainly used to write and run Pig Latin statements. In addition, we can invoke shell and file system commands from it using sh and fs.
  • The Grunt shell also provides a number of useful shell and utility commands.
The Grunt Shell: an interactive shell for writing and executing Pig Latin and for accessing HDFS
  • Shell commands
    • fs
      • Invokes any FsShell command from within a Pig script or the Grunt shell.
        • fs -mkdir /tmp
        • fs -copyFromLocal file-x file-y
        • fs -ls file-y
    • sh
      • Invokes any sh shell command from within a Pig script or the Grunt shell.
        • sh ls
        • sh pwd
  • Utility commands
    • clear
    • help
    • history
    • quit
    • set
    • kill
    • exec
      • Run a Pig script.
      • exec [-param param_name = param_value] [-param_file file_name] [script]
      • Use the exec command to run a Pig script with no interaction between the script and the Grunt shell (batch mode).
      • Aliases defined in the script are not available to the shell.
    • run
      • Run a Pig script.
      • run [-param param_name = param_value] [-param_file file_name] script
      • Use the run command to run a Pig script that can interact with the Grunt shell (interactive mode): aliases defined in the script remain available to the shell.

    Shell Commands

    • The Grunt shell of Apache Pig is used to write Pig Latin scripts.
    • Two commands let us invoke external commands from the Grunt shell: sh (operating-system shell commands) and fs (file system commands).

    sh Command

    • Using the sh command, we can invoke any shell command from the Grunt shell.
    • However, sh cannot execute commands that are part of the shell environment itself, such as cd.

    Syntax

    grunt> sh shell command parameters
    

    Sample Code:

    grunt> sh ls
    pig 
    pig_1444799121955.log 
    pig.cmd 
    pig.py
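    Why can't sh run a command such as cd? A plausible explanation, assumed here rather than taken from the Pig documentation, is that each sh command runs in a separate child process, so any change to that process's environment disappears when it exits. This Python sketch reproduces the same effect with subprocess: the child shell changes directory, but the parent process does not.

```python
import os
import subprocess

# Remember where the parent (this Python process) is running
before = os.getcwd()

# Run "cd /" in a child shell, then print that child's working directory
result = subprocess.run(
    ["sh", "-c", "cd / && pwd"], capture_output=True, text=True, check=True
)

print(result.stdout.strip())   # the child shell moved to /
print(os.getcwd() == before)   # the parent process did not move
```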

    fs Command

    • We can invoke any FsShell commands from the Grunt shell by using the fs command.
    • The fs command extends the set of supported file system commands and the capabilities supported for existing commands.

    Syntax

    grunt> fs file_system_command parameters
    

    Sample Code:

    grunt> fs -ls
    Found 3 items
    drwxrwxrwx - Hadoop supergroup 0 2015-09-08 14:13 Hbase
    drwxr-xr-x - Hadoop supergroup 0 2015-09-09 14:52 seqgen_data
    drwxr-xr-x - Hadoop supergroup 0 2015-09-08 11:30 twitter_data
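    To make the listing easier to read, here is a minimal Python sketch that splits each fs -ls line into its columns. It assumes the usual Hadoop FsShell column order: permissions, replication count (shown as "-" for directories), owner, group, size in bytes, modification date and time, and path. LsEntry and parse_ls_line are hypothetical names, not part of Pig or Hadoop.

```python
from typing import NamedTuple


class LsEntry(NamedTuple):
    permissions: str
    replication: str  # "-" for directories
    owner: str
    group: str
    size: int         # bytes
    date: str
    time: str
    path: str

    @property
    def is_dir(self) -> bool:
        # Directories carry a leading "d" in the permission string
        return self.permissions.startswith("d")


def parse_ls_line(line: str) -> LsEntry:
    """Split one fs -ls output line into its eight columns (assumed layout)."""
    # maxsplit=7 keeps any spaces inside the final path column intact
    perms, repl, owner, group, size, date, time, path = line.split(maxsplit=7)
    return LsEntry(perms, repl, owner, group, int(size), date, time, path)


listing = """\
drwxrwxrwx - Hadoop supergroup 0 2015-09-08 14:13 Hbase
drwxr-xr-x - Hadoop supergroup 0 2015-09-09 14:52 seqgen_data
drwxr-xr-x - Hadoop supergroup 0 2015-09-08 11:30 twitter_data"""

for entry in map(parse_ls_line, listing.splitlines()):
    print(entry.path, entry.is_dir, entry.date)
```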

    Utility Commands

    • The Grunt shell also provides a set of utility commands.
    • These include clear, help, history, quit, set, exec, kill, and run, which are used to control Pig from the Grunt shell.

    Clear Command

    • The clear command clears the screen of the Grunt shell.

    Syntax

    grunt> clear
    

    Help Command

    • The help command gives us a list of Pig commands and Pig properties.

    Usage

    • Running help prints the list of Pig commands:
    grunt> help
    Commands: <pig latin statement>; - See the PigLatin manual for details:
    http://hadoop.apache.org/pig
    File system commands:fs <fs arguments> - Equivalent to Hadoop dfs  command:
    http://hadoop.apache.org/common/docs/current/hdfs_shell.html	 
    Diagnostic Commands:describe <alias>[::<alias] - Show the schema for the alias.
    Inner aliases can be described as A::B.
        explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml] 
           [-param <param_name>=<param_value>]
           [-param_file <file_name>] [<alias>] - 
           Show the execution plan to compute the alias or for entire script.
           -script - Explain the entire script.
           -out - Store the output into directory rather than print to stdout.
           -brief - Don't expand nested plans (presenting a smaller graph for overview).
           -dot - Generate the output in .dot format. Default is text format.
           -xml - Generate the output in .xml format. Default is text format.
           -param <param_name - See parameter substitution for details.
           -param_file <file_name> - See parameter substitution for details.
           alias - Alias to explain.
        dump <alias> - Compute the alias and writes the results to stdout.
    Utility Commands: exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
           Execute the script with access to grunt environment including aliases.
           -param <param_name - See parameter substitution for details.
           -param_file <file_name> - See parameter substitution for details.
           script - Script to be executed.
        run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
           Execute the script with access to grunt environment.
           -param <param_name - See parameter substitution for details.
           -param_file <file_name> - See parameter substitution for details.
           script - Script to be executed.
        sh  <shell command> - Invoke a shell command.
        kill <job_id> - Kill the hadoop job specified by the hadoop job id.
        set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
           The following keys are supported:
           default_parallel - Script-level reduce parallelism. Basic input size heuristics used 
           by default.
           debug - Set debug on or off. Default is off.
           job.name - Single-quoted name for jobs. Default is PigLatin:<script name>
           job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high.
           Default is normal.
           stream.skippath - String that contains the path. This is used by streaming.
           any hadoop property.
        help - Display this message.
        history [-n] - Display the list statements in cache.
           -n Hide line numbers.
        quit - Quit the grunt shell. 
    
    

    History Command

    • The history command displays the list of statements executed since the Grunt shell was invoked.

    Usage

    • Suppose we have executed the following three statements since opening the Grunt shell:
    grunt> customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');
    grunt> orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
    grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
    
    • The history command then produces the following output:
    grunt> history
    customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(','); 
    orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
    student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
    

    set Command

    • The set command is used to show and assign values to the keys used in Pig.

    Usage

    • We can assign values to the following keys by using the set command:
      • default_parallel: sets the number of reducers for a MapReduce job; pass any whole number as the value.
      • debug: turns Pig's debugging feature on or off; pass on or off as the value.
      • job.name: sets the name of the job; pass a string as the value.
      • job.priority: sets the priority of the job; pass one of the following values:
        • very_low
        • low
        • normal
        • high
        • very_high
      • stream.skippath: for streaming, sets the path from which data is not to be transferred; pass the desired path as a string.
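    The keys above take different kinds of values. The following Python sketch builds Grunt set statements and rejects values the key descriptions do not allow. build_set_statement and quote are hypothetical helpers, and the quoting is an assumption modelled on the Pig documentation's examples such as set job.name 'my job' (string values single-quoted, the reducer count left bare).

```python
# Allowed job.priority values, as listed for the set command
PRIORITIES = ("very_low", "low", "normal", "high", "very_high")


def quote(value: str) -> str:
    """Wrap a string in single quotes; embedded quotes are rejected, not escaped."""
    if "'" in value:
        raise ValueError("value must not contain a single quote")
    return f"'{value}'"


def build_set_statement(key: str, value) -> str:
    """Return a Grunt set statement for one of the supported keys."""
    if key == "default_parallel":
        # Number of reducers: a positive whole number (bool is excluded explicitly)
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError("default_parallel expects a positive whole number")
        return f"set default_parallel {value}"
    if key == "debug":
        if value not in ("on", "off"):
            raise ValueError("debug expects 'on' or 'off'")
        return f"set debug {quote(value)}"
    if key == "job.name":
        return f"set job.name {quote(str(value))}"
    if key == "job.priority":
        if value not in PRIORITIES:
            raise ValueError(f"job.priority must be one of {PRIORITIES}")
        return f"set job.priority {quote(value)}"
    if key == "stream.skippath":
        return f"set stream.skippath {quote(str(value))}"
    raise KeyError(f"unsupported key: {key}")


for key, value in [("default_parallel", 10), ("debug", "on"),
                   ("job.name", "student load"), ("job.priority", "high")]:
    print(build_set_statement(key, value))
```

    Validating before sending the statement keeps a typo such as an unknown priority from reaching the shell at all.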

    quit Command

    • We can quit from the Grunt shell by using the quit command.

    Syntax:

    grunt> quit
    

    exec Command

    • We can execute Pig scripts from the Grunt shell by using the exec command.

    Syntax

    grunt> exec [-param param_name = param_value] [-param_file file_name] [script]
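    The -param and -param_file options feed Pig's parameter substitution: each $name reference in the script is replaced by the supplied value before the script runs. The Python sketch below models that step under simplifying assumptions: a parameter file holds one name = value pair per line with # comments, and -param values take precedence over -param_file values. It ignores Pig features such as %declare, %default, and command-valued parameters, and load_param_file and substitute are hypothetical names.

```python
import re

# $name references; positional fields such as $0 start with a digit and are left alone
PARAM_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def load_param_file(text: str) -> dict:
    """Parse name = value lines, skipping blank lines and # comments (assumed format)."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params


def substitute(script: str, file_params: dict, cli_params: dict) -> str:
    """Replace every $name; -param values override -param_file values."""
    params = {**file_params, **cli_params}

    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"undefined parameter: {name}")
        return params[name]

    return PARAM_REF.sub(lookup, script)


script = "student = LOAD '$input' USING PigStorage(',') AS (id:int, name:chararray, city:chararray);"
param_file = """
# hypothetical parameter file for the student example
input = hdfs://localhost:9000/pig_data/student.txt
"""
print(substitute(script, load_param_file(param_file), {}))
```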
    

    Example

    student.txt

    001,Suresh,Hyderabad
    002,Panitha,Malaysia
    003,Pratyush,Singapore
    
    • Here is a sample script for the exec command, saved as sample_script.pig:

    sample_script.pig

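    -- Load the comma-separated student file and apply a schema
    student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',')
       as (id:int,name:chararray,city:chararray);
    -- Print every tuple of the relation to the console
    Dump student;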
    

    Syntax:

    grunt> exec /sample_script.pig
    

    Output:

    (1,Suresh,Hyderabad)
    (2,Panitha,Malaysia)
    (3,Pratyush,Singapore)
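    Note that 001 is printed as 1: the AS clause declares id as int, so the text field is cast to an integer. The Python sketch below mimics what PigStorage(',') plus that schema do to each line. It is a simplified model, not Pig's loader: load_with_schema is a hypothetical name, a field that fails its cast becomes None (standing in for a Pig null), and missing trailing fields are padded with None.

```python
from typing import List, Optional, Sequence, Tuple

# Map the Pig types used in this example to Python converters
PIG_TYPES = {"int": int, "chararray": str}


def load_with_schema(lines: Sequence[str],
                     schema: Sequence[Tuple[str, str]],
                     delimiter: str = ",") -> List[tuple]:
    """Split each line on the delimiter and cast fields to the declared types."""
    rows = []
    for line in lines:
        fields = line.split(delimiter)
        row = []
        for i, (_name, pig_type) in enumerate(schema):
            raw: Optional[str] = fields[i] if i < len(fields) else None
            if raw is None:
                row.append(None)  # missing field: pad with None
                continue
            try:
                row.append(PIG_TYPES[pig_type](raw))  # "001" -> 1 for int
            except ValueError:
                row.append(None)  # failed cast: model it as a null
        rows.append(tuple(row))
    return rows


student_txt = ["001,Suresh,Hyderabad", "002,Panitha,Malaysia", "003,Pratyush,Singapore"]
schema = [("id", "int"), ("name", "chararray"), ("city", "chararray")]
for row in load_with_schema(student_txt, schema):
    print(row)
```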
    

    kill Command

    • We can kill a MapReduce job from the Grunt shell by using the kill command.

    Syntax:

    grunt> kill JobId
    

    Example:

    grunt> kill Id_0055
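    The JobId passed to kill is a Hadoop job ID, which Hadoop usually prints in the form job_<cluster timestamp>_<sequence number>; Id_0055 does not follow that form and reads as a placeholder. As a hedged sketch, a wrapper script could check that shape before building the command. build_kill_command, the regular expression, and the ID job_1444799121955_0055 used below are all hypothetical, based on that common format rather than on Pig itself.

```python
import re

# Common Hadoop job ID shape: job_<cluster start timestamp>_<sequence>
JOB_ID = re.compile(r"job_\d+_\d+")


def build_kill_command(job_id: str) -> str:
    """Return a Grunt kill command, refusing strings that don't look like job IDs."""
    if not JOB_ID.fullmatch(job_id):
        raise ValueError(f"not a Hadoop job ID: {job_id!r}")
    return f"kill {job_id}"


print(build_kill_command("job_1444799121955_0055"))  # hypothetical job ID
```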
    

    run Command

    • We can run a Pig script from the Grunt shell by using the run command.

    Syntax

    grunt> run [-param param_name = param_value] [-param_file file_name] script
    

    Example

    student.txt

    004,vanitha,Delhi
    005,priya,Mumbai
    006,supriya,Banglaore
    
    • Assume that we have a script file named sample_script.pig in the local file system with the following content:

    sample_script.pig

    student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
       PigStorage(',') as (id:int,name:chararray,city:chararray);
    

    Syntax:

    grunt> run /sample_script.pig
    

    Output:

    grunt> Dump student;
    (4,vanitha,Delhi)
    (5,priya,Mumbai)
    (6,supriya,Banglaore)
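    The difference between exec and run comes down to where the script's aliases live. The toy Python model below treats the Grunt session as a dictionary of aliases: run executes the script against that dictionary, so aliases such as student remain available to later statements, while exec, following the batch-mode description of exec earlier on this page, uses a fresh scope that is discarded afterwards. GruntSession is a hypothetical class; its "parser" only records alias names, and Pig's real implementation is different.

```python
from typing import Dict, List


class GruntSession:
    """Toy model of alias visibility in Grunt (not how Pig is implemented)."""

    def __init__(self) -> None:
        self.aliases: Dict[str, str] = {}

    @staticmethod
    def _execute(statements: List[str], scope: Dict[str, str]) -> None:
        # Record "alias = expression" statements in the given scope
        for stmt in statements:
            alias, sep, expr = stmt.partition("=")
            if sep:
                scope[alias.strip()] = expr.strip().rstrip(";")

    def run(self, statements: List[str]) -> None:
        # run: the script shares the session's aliases (interactive mode)
        self._execute(statements, self.aliases)

    def exec(self, statements: List[str]) -> None:
        # exec: the script gets its own scope, discarded afterwards (batch mode)
        self._execute(statements, {})


script = ["student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');"]

session = GruntSession()
session.exec(script)
print("student" in session.aliases)  # False: exec kept the alias to itself
session.run(script)
print("student" in session.aliases)  # True: run left the alias in the session
```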
    
    
