Apache Pig - Running Scripts
What is a script?
- A script is a text file that contains one or more Pig Latin statements or commands.
- When we run the script, the commands and expressions in the script file run, just as if we typed them at the command line.
- Typically, we write a script to save a command sequence that we use frequently or to share a command sequence with others.
- Scripts can be as simple as a one-line command or as complex as an application program.
- Pig Latin is a data-flow language aimed at people who process data rather than write general-purpose programs.
- It provides relational operators for loading, filtering, grouping, joining, ordering, and storing data; it has no built-in loops or conditional branching between statements.
How to Run Scripts in Apache Pig?
- To run a script, pass the path and name of the script file to the pig command.
- The path can be relative to the current directory or absolute.
- The .pig file name extension is a convention rather than a requirement. Pig Latin keywords are not case sensitive, but the names of relations, fields, and functions are.
Comments in Pig Script:
- When writing a script in a file, you can include comments in it as described below.
Single-line comments:
- Single-line comments start with “--”.
Multi-line comments:
- Multi-line comments start with “/*” and end with “*/”.
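As a small illustration of both comment styles, the sketch below writes a commented Pig Latin script to a file from the shell. The file name comments_demo.pig, the input file emp.txt, and the schema are hypothetical and chosen only for this example.

```shell
# Write a small Pig Latin script that uses both comment styles.
# comments_demo.pig, emp.txt and the schema are hypothetical.
cat > comments_demo.pig <<'EOF'
/* Multi-line comment:
   load a comma-separated file and print it. */
employee = LOAD 'emp.txt' USING PigStorage(',')
           AS (id:int, name:chararray, age:int);  -- single-line comment
DUMP employee;
EOF

# Show the script that was written.
cat comments_demo.pig
```

Pig ignores everything from “--” to the end of the line, so a single-line comment can follow a statement on the same line.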
Executing Pig Script in Batch mode:
- When executing Apache Pig statements in batch mode, follow the steps given below.
- Write all the required Pig Latin statements in a single file.
- We can write all the Pig Latin statements and commands in a single file and save it as a .pig file.
- Execute the Apache Pig script.
- You can execute the Pig script from the shell (Linux) as shown below.
|Local mode|MapReduce mode|
|$ pig -x local Example_script.pig|$ pig -x mapreduce Example_script.pig|
- We can also execute it from the Grunt shell by using the exec command followed by the path of the script.
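The steps above, together with the Grunt variant, can be sketched as follows. The statements in the script, the input file emp.txt, its two sample rows, and its schema are hypothetical placeholders. The sketch assumes that the -e option hands the quoted command to Grunt; the same exec line can instead be typed at the grunt> prompt. The run is skipped when pig is not on the PATH.

```shell
# Hypothetical sample input so the script has something to load.
printf '1,Alice,30\n2,Bob,25\n' > emp.txt

# Step 1: save the Pig Latin statements in a .pig file.
# The statements and the schema are hypothetical placeholders.
cat > Example_script.pig <<'EOF'
employee = LOAD 'emp.txt' USING PigStorage(',')
           AS (id:int, name:chararray, age:int);
DUMP employee;
EOF

# Step 2: run the script through Grunt's exec command.
# Interactively this would be typed as:  grunt> exec Example_script.pig
if command -v pig >/dev/null 2>&1; then
    pig -x local -e 'exec Example_script.pig'
else
    echo "pig not found on PATH; skipping run"
fi
```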
Executing a Pig Script from HDFS:
- We can also execute a Pig script that resides in the HDFS.
- Assume there is a Pig script with the name Example_script.pig in the HDFS directory named /pig_data/. You can execute it by passing the full HDFS URI of the script to the pig command.
- Ensure that a file named wikitechy_emp_details.txt, holding the employee records, exists in the same HDFS directory.
- Assume the sample script Example_script.pig is stored in that directory as well.
- The script contains statements that perform operations and transformations on the employee relation, as described below.
- The first statement of the script will load the data in the file named wikitechy_emp_details.txt as a relation named employee.
- The second statement of the script will arrange the tuples of the relation in descending order, based on age, and store it as employee_order.
- The third statement of the script will store the first 4 tuples of employee_order as employee_limit.
- Finally, the fourth statement will dump the content of the relation employee_limit.
- Execute Example_script.pig from HDFS.
- Apache Pig runs the script and prints the four tuples of employee_limit, that is, the employee records with the highest ages.
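The four statements described above can be sketched as follows. The NameNode address hdfs://localhost:9000, the field list (id, name, age, city), and the comma delimiter are assumptions rather than details from this tutorial; adjust them to match your cluster and data. The upload and run steps are skipped when the hdfs or pig commands are not available.

```shell
# Assumed NameNode address; replace with your cluster's value.
HDFS_URI="hdfs://localhost:9000"

# The four statements described in the text. The schema and the
# comma delimiter are assumptions made for illustration.
cat > Example_script.pig <<EOF
employee = LOAD '${HDFS_URI}/pig_data/wikitechy_emp_details.txt'
           USING PigStorage(',')
           AS (id:int, name:chararray, age:int, city:chararray);
employee_order = ORDER employee BY age DESC;
employee_limit = LIMIT employee_order 4;
DUMP employee_limit;
EOF

if command -v hdfs >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
    # Copy the script next to the data file, then run it straight from HDFS.
    hdfs dfs -put -f Example_script.pig /pig_data/
    pig -x mapreduce "${HDFS_URI}/pig_data/Example_script.pig"
else
    echo "hdfs or pig not found on PATH; skipping upload and run"
fi
```

Keeping the NameNode address in one shell variable means the same value is used both inside the LOAD statement and in the URI passed to pig.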