Apache Pig - Running Scripts



What is a script?

  • A script is a text file that contains one or more Pig Latin statements or commands.
  • When we run the script, the statements in the script file run, just as if we typed them at the Grunt shell.
  • Typically, we write a script to save a command sequence that we use frequently or to share a command sequence with others.
  • Scripts can be as simple as a one-line command or as complex as an application program.
  • Pig Latin is a high-level data-flow language that lets us analyze large datasets without writing MapReduce programs by hand.
  • It supports operators for loading, filtering, grouping, joining, sorting, and storing data, and much more.

How to Run Scripts in Apache Pig?

  • To run a script, pass the path and name of the script file to the pig command.
  • The path can be relative to the current directory or absolute.
  • The .pig file name extension is a convention rather than a requirement. In Pig Latin, keywords are not case sensitive, but the names of relations, fields, and functions are.

Comments in Pig Script:

  • When writing a script in a file, we can include comments in it as shown below.

Single-line comments:

  • Single-line comments start with “ -- ”.
-- we can write single line comments like this.

Multi-line comments:

  • The multi-line comments start with “ /* ”, and end with “ */ ”.
/* These are the multi-line comments 
  In the pig script */ 

Executing Pig Script in Batch mode:

  • To execute Apache Pig statements in batch mode, follow the steps given below.

Step 1:

  • Write all the required Pig Latin statements and commands in a single file.
  • Save the file with the .pig extension.

Step 2:

  • Execute the Apache Pig script.
  • You can execute the Pig script from the shell (Linux) as shown below.
Local mode:
$ pig -x local Example_script.pig
MapReduce mode:
$ pig -x mapreduce Example_script.pig
  • We can also execute it from the Grunt shell using the exec command, as shown below.
grunt> exec /Example_script.pig
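
The two steps above can be sketched as a short shell session. This is only an illustration: the input file sample_input.txt and the relation name lines are hypothetical and do not come from the tutorial, and the pig command is invoked only if it is found on the PATH.

```shell
# Step 1: write the Pig Latin statements into a single .pig file.
cat > Example_script.pig <<'EOF'
-- Load a comma-separated file (sample_input.txt is a hypothetical name).
lines = LOAD 'sample_input.txt' USING PigStorage(',');
/* DUMP prints the tuples of the relation
   to the console. */
DUMP lines;
EOF

# Step 2: run the script in local mode, but only if Pig and the input exist.
if command -v pig >/dev/null 2>&1 && [ -f sample_input.txt ]; then
  pig -x local Example_script.pig
else
  echo "pig or sample_input.txt not found; script saved as Example_script.pig"
fi
```

Quoting the heredoc delimiter ('EOF') keeps the shell from expanding anything inside the Pig Latin text, so the file is written exactly as typed.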

Executing a Pig Script from HDFS:

  • We can also execute a Pig script that resides in the HDFS.
  • Assume there is a Pig script named Example_script.pig in the HDFS directory /pig_data/. You can execute it as shown below.
$ pig -x mapreduce hdfs://localhost:9000/pig_data/Example_script.pig 
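
If the script is still on the local disk, it has to be copied into HDFS first. A minimal sketch, assuming the default file system of the cluster is hdfs://localhost:9000 (as in the command above) and that a local copy of Example_script.pig exists; the variable names are illustrative only.

```shell
# Values taken from the example in the text; adjust them for your cluster.
HDFS_DIR="/pig_data"
SCRIPT="Example_script.pig"
SCRIPT_URI="hdfs://localhost:9000${HDFS_DIR}/${SCRIPT}"

if command -v hdfs >/dev/null 2>&1 && command -v pig >/dev/null 2>&1 && [ -f "$SCRIPT" ]; then
  hdfs dfs -mkdir -p "$HDFS_DIR"            # create the target directory if it is missing
  hdfs dfs -put -f "$SCRIPT" "$HDFS_DIR/"   # copy the local script into HDFS, overwriting any old copy
  pig -x mapreduce "$SCRIPT_URI"            # run the script directly from HDFS
else
  echo "Would run: pig -x mapreduce $SCRIPT_URI"
fi
```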

Example:

  • Ensure that we have a file named wikitechy_emp_details.txt in HDFS with the following content.

wikitechy_emp_details.txt

111,Anu,Shankar,23,9876543210,Chennai
112,Barvathi,Nambiayar,24,9876543211,Chennai
113,Kajal,Nayak,24,9876543212,Trivendram
114,Preethi,Antony,21,9876543213,Pune
115,Raj,Gopal,21,9876543214,Hyderabad
116,Yashika,Kannan,22,9876543215,Delhi
117,siddu,Narayanan,22,9876543216,Kolkata
118,Timple,Mohanthy,23,9876543217,Bhuwaneshwar
  • We have a sample script named Example_script.pig in the same HDFS directory.
  • This script performs operations and transformations on the employee relation, as described in the following points.
  • The first statement of the script loads the data in the file named wikitechy_emp_details.txt as a relation named employee.
  • The second statement of the script arranges the tuples of the relation in descending order, based on age, and stores the result as employee_order.
  • The third statement of the script stores the first 4 tuples of employee_order as employee_limit.
  • Finally, the fourth statement dumps the content of the relation employee_limit.
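
The script itself is not reproduced here, so the following is a reconstruction sketched from the four statements described above, not the original file. The field names and types are assumptions (the text names only age), and the HDFS URI follows the one used elsewhere in this tutorial. The shell heredoc simply writes the Pig Latin into a local file.

```shell
cat > Example_script.pig <<'EOF'
-- Field names and types are assumptions; only "age" is named in the text.
employee = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        age:int, phone:chararray, city:chararray);

-- Sort the tuples by age, highest first.
employee_order = ORDER employee BY age DESC;

-- Keep the first 4 tuples of the sorted relation.
employee_limit = LIMIT employee_order 4;

-- Print the result to the console.
DUMP employee_limit;
EOF
```

The phone number is declared as chararray rather than int because a ten-digit value such as 9876543210 does not fit in a 32-bit integer.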

Execution:

  • Execute Example_script.pig as shown below.
$ ./pig -x mapreduce hdfs://localhost:9000/pig_data/Example_script.pig

Output:

  • Apache Pig executes the script and displays the employee details, as in the following output.
112,Barvathi,Nambiayar,24,9876543211,Chennai
113,Kajal,Nayak,24,9876543212,Trivendram
111,Anu,Shankar,23,9876543210,Chennai
118,Timple,Mohanthy,23,9876543217,Bhuwaneshwar
2015-10-19 10:31:27,446 [main] INFO  org.apache.pig.Main - Pig script completed in 12minutes, 32 seconds and 751 milliseconds (752751 ms)
