Apache Pig tutorial - Apache Pig Grunt Shell




What is Grunt Shell in Apache Pig ?

  • Grunt is the interactive shell of Apache Pig.
  • It is mainly used to write and run Pig Latin statements. In addition, shell and file system commands can be invoked from it using sh and fs.
  • The Grunt shell also provides a set of utility commands.
The Grunt Shell: An interactive shell to write and execute Pig-Latin and to access HDFS
  • Shell commands
    • fs
      • Invokes any FsShell command from within a Pig script or the Grunt shell.
        • fs -mkdir /tmp
        • fs -copyFromLocal file-x file-y
        • fs -ls file-y
    • sh
      • Invokes any sh shell command from within a Pig script or the Grunt shell.
        • ls
        • pwd
  • Utility commands
    • clear
    • help
    • history
    • kill
    • exec
      • Runs a Pig script.
      • exec [-param param_name = param_value] [-param_file file_name] [script]
      • Use the exec command to run a Pig script with no interaction between the script and the Grunt shell (batch mode).
      • Aliases defined in the script are not available to the shell.
    • run
      • Runs a Pig script.
      • run [-param param_name = param_value] [-param_file file_name] script
      • Interactive mode: aliases defined in the script are available to the shell.

    Shell Commands

    • The Grunt shell of Apache Pig is used to write Pig Latin scripts.
    • Two commands, sh and fs, let us invoke shell and file system commands from within it.

    sh Command

    • The sh command invokes a shell command from the Grunt shell.
    • Only real programs can be run this way. Commands that are part of the shell environment itself (built-ins such as cd) cannot be executed with sh.

    Syntax

    grunt> sh shell_command parameters
    

    Sample Code:

    grunt> sh ls
    pig 
    pig_1444799121955.log 
    pig.cmd 
    pig.py

    fs Command

    • The fs command invokes any FsShell (Hadoop file system shell) command from the Grunt shell.
    • The fs command extends the set of supported file system commands and the capabilities supported for existing commands.

    Syntax

    grunt> fs file_system_command parameters
    

    Sample Code:

    grunt> fs -ls
    Found 3 items
    drwxrwxrwx - Hadoop supergroup 0 2015-09-08 14:13 Hbase
    drwxr-xr-x - Hadoop supergroup 0 2015-09-09 14:52 seqgen_data
    drwxr-xr-x - Hadoop supergroup 0 2015-09-08 11:30 twitter_data
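
    A common use of fs is staging a local file into HDFS before loading it. The session below is a minimal sketch. It assumes a local file named student.txt in the directory where Grunt was started, and it uses /pig_data as the HDFS target because the LOAD statements in this tutorial read from that directory. Adjust both names to your environment.

    ```pig
    -- create the target directory in HDFS (assumed path)
    grunt> fs -mkdir /pig_data
    -- copy the local file (assumed to exist) into HDFS
    grunt> fs -copyFromLocal student.txt /pig_data/student.txt
    -- confirm that the file arrived and inspect its content
    grunt> fs -ls /pig_data
    grunt> fs -cat /pig_data/student.txt
    ```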

    Utility Commands

    • The Grunt shell provides a set of utility commands to control Pig.
    • These are clear, help, history, quit, set, exec, kill, and run.

    Clear Command

    • The clear command clears the screen of the Grunt shell.

    Syntax

    grunt> clear
    

    Help Command

    • The help command prints a list of Pig commands and Pig properties.

    Usage

    • Running help prints the list of Pig commands shown below:
    grunt> help
    Commands:
    <pig latin statement>; - See the PigLatin manual for details:
    http://hadoop.apache.org/pig
    File system commands:
        fs <fs arguments> - Equivalent to Hadoop dfs command:
        http://hadoop.apache.org/common/docs/current/hdfs_shell.html
    Diagnostic Commands:
        describe <alias>[::<alias] - Show the schema for the alias.
        Inner aliases can be described as A::B.
        explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml]
           [-param <param_name>=<param_value>]
           [-param_file <file_name>] [<alias>] -
           Show the execution plan to compute the alias or for entire script.
           -script - Explain the entire script.
           -out - Store the output into directory rather than print to stdout.
           -brief - Don't expand nested plans (presenting a smaller graph for overview).
           -dot - Generate the output in .dot format. Default is text format.
           -xml - Generate the output in .xml format. Default is text format.
           -param <param_name - See parameter substitution for details.
           -param_file <file_name> - See parameter substitution for details.
           alias - Alias to explain.
        dump <alias> - Compute the alias and writes the results to stdout.
    Utility Commands:
        exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
           Execute the script with access to grunt environment including aliases.
           -param <param_name - See parameter substitution for details.
           -param_file <file_name> - See parameter substitution for details.
           script - Script to be executed.
        run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
           Execute the script with access to grunt environment.
           -param <param_name - See parameter substitution for details.
           -param_file <file_name> - See parameter substitution for details.
           script - Script to be executed.
        sh  <shell command> - Invoke a shell command.
        kill <job_id> - Kill the hadoop job specified by the hadoop job id.
        set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
           The following keys are supported:
           default_parallel - Script-level reduce parallelism. Basic input size heuristics used
           by default.
           debug - Set debug on or off. Default is off.
           job.name - Single-quoted name for jobs. Default is PigLatin:<script name>
           job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high.
           Default is normal
           stream.skippath - String that contains the path. This is used by streaming.
           any hadoop property.
        help - Display this message.
        history [-n] - Display the list statements in cache.
           -n Hide line numbers.
        quit - Quit the grunt shell.
    
    

    History Command

    • The history command displays the list of statements executed since the Grunt shell was invoked.

    Usage

    • Suppose we have executed the following three statements since opening the Grunt shell.
    grunt> customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');
    grunt> orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
    grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
    
    • The history command then produces the following output:
    grunt> history
    customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(','); 
    orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
    student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
    

    set Command

    • The set command shows or assigns values to the keys used in Pig.

    Usage

    • The following keys can be set with the set command:

    Key               Description and values
    default_parallel  Number of reducers for a MapReduce job. Pass any whole number.
    debug             Turns the debugging feature in Pig on or off. Pass on or off.
    job.name          Name of the job. Pass a string value.
    job.priority      Priority of the job. Pass one of: very_low, low, normal, high, very_high.
    stream.skippath   For streaming, the path from which data is not to be transferred. Pass the path as a string.
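
    As a brief illustration of the keys above, a Grunt session might set them as follows. The values (10 reducers, the job name student_report) are made up for this sketch and are not recommendations.

    ```pig
    -- use 10 reducers for subsequent jobs (illustrative value)
    grunt> set default_parallel 10
    -- turn debugging on
    grunt> set debug 'on'
    -- job names are given as single-quoted strings (hypothetical name)
    grunt> set job.name 'student_report'
    -- one of very_low, low, normal, high, very_high
    grunt> set job.priority high
    ```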

    quit Command

    • The quit command exits the Grunt shell.

    Syntax:

    grunt> quit
    

    exec Command

    • The exec command executes a Pig script from the Grunt shell in batch mode. Aliases defined in the script are not available in the shell afterwards.

    Syntax

    grunt> exec [-param param_name = param_value] [-param_file file_name] [script]
    

    Example

    student.txt

    001,Suresh,Hyderabad
    002,Panitha,Malaysia
    003,Pratyush,Singapore
    
    • Here is a sample script for the exec command, saved as sample_script.pig.

    sample_script.pig

    student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',')
       as (id:int,name:chararray,city:chararray);
    Dump student;
    

    Syntax:

    grunt> exec /sample_script.pig
    

    Output:

    (1,Suresh,Hyderabad)
    (2,Panitha,Malaysia)
    (3,Pratyush,Singapore)
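
    The -param option shown in the syntax lets the same script run against different inputs. The following is a minimal sketch under stated assumptions: the file name param_script.pig and the parameter name input are hypothetical and chosen only for illustration. Pig replaces $input in the script text with the value passed on the exec line.

    ```pig
    -- param_script.pig (hypothetical file name)
    -- $input is replaced by the value supplied with -param
    student = LOAD '$input' USING PigStorage(',')
       as (id:int,name:chararray,city:chararray);
    Dump student;
    ```

    It could then be invoked from Grunt like this:

    ```pig
    grunt> exec -param input=hdfs://localhost:9000/pig_data/student.txt /param_script.pig
    ```

    When several parameters are needed, they can be placed in a file, one name = value pair per line, and passed with -param_file instead.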
    

    kill Command

    • We can kill a MapReduce job from the Grunt shell by using the kill command.

    Syntax:

    grunt> kill JobId
    

    Example:

    grunt> kill Id_0055
    

    run Command

    • The run command runs a Pig script from the Grunt shell. Unlike exec, aliases defined in the script remain available in the shell afterwards.

    Syntax

    grunt> run [-param param_name = param_value] [-param_file file_name] script
    

    Example

    student.txt

    004,vanitha,Delhi
    005,priya,Mumbai
    006,supriya,Banglaore
    
    • Assume that a script file named sample_script.pig exists in the local file system with the following content.

    sample_script.pig

    student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
       PigStorage(',') as (id:int,name:chararray,city:chararray);
    

    Usage:

    grunt> run /sample_script.pig
    

    Output:

    grunt> Dump student;
    (4,vanitha,Delhi)
    (5,priya,Mumbai)
    (6,supriya,Banglaore)
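
    The difference between exec and run can be checked from the shell itself. The session below is a sketch that assumes a fresh Grunt session and the sample_script.pig shown above, which defines the alias student. After exec the alias is unknown to the shell. After run it can be described and dumped.

    ```pig
    -- batch mode: the alias defined in the script does not reach the shell
    grunt> exec /sample_script.pig
    grunt> describe student;
    -- this fails, because student is not defined in the shell

    -- interactive mode: the alias is now available
    grunt> run /sample_script.pig
    grunt> describe student;
    student: {id: int,name: chararray,city: chararray}
    ```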
    
    
