Apache Pig Cross Operator
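What is the CROSS operator?
The CROSS operator computes the cross product of two or more relations. This chapter shows, with an example, how to use the CROSS operator in Pig Latin.
- Cartesian product of two or more relations: every tuple of one relation is paired with every tuple of the others, so relations with m and n tuples produce m times n tuples.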
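The syntax of the CROSS operator is a comma-separated list of two or more relation names following the CROSS keyword, with the result assigned to a new relation.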
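As a rough template of that syntax (the names result, relation1, relation2 and relation3 are placeholders chosen for illustration, not relations defined in this tutorial):

```pig
-- General form: assign the cross product of two relations to a new relation.
-- relation1 and relation2 stand for relations that already exist in the session.
result = CROSS relation1, relation2;

-- More than two relations can be listed, separated by commas.
result3 = CROSS relation1, relation2, relation3;
```

Because the output grows multiplicatively with the input sizes, CROSS is usually applied only to small relations.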
Assume that we have two files, customers.txt and orders.txt, in the /pig_data/ directory of HDFS.
We have loaded these two files into Pig as the relations customers and orders.
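The load step might look like the sketch below. The field names, types and the comma delimiter are assumptions made for illustration, since the file contents are not reproduced here. Adjust the schemas to match the actual files.

```pig
-- Assumed line layout of customers.txt: id,name,age,address,salary
customers = LOAD '/pig_data/customers.txt' USING PigStorage(',')
    AS (id:int, name:chararray, age:int, address:chararray, salary:int);

-- Assumed line layout of orders.txt: oid,order_date,customer_id,amount
orders = LOAD '/pig_data/orders.txt' USING PigStorage(',')
    AS (oid:int, order_date:chararray, customer_id:int, amount:int);
```

The paths are written without a scheme, so in MapReduce mode they resolve against the cluster's default file system.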
Now compute the cross product of these two relations with the CROSS operator and store the result in the relation cross_data.
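A minimal sketch of this statement, assuming the relations customers and orders are already defined in the current Grunt session:

```pig
-- Pair every tuple of customers with every tuple of orders.
cross_data = CROSS customers, orders;

-- Optional: print the schema of the result to see how the fields
-- of both input relations are combined.
DESCRIBE cross_data;
```

Pig evaluates lazily, so no job runs until the result is requested by an operator such as DUMP or STORE.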
Verify the relation cross_data using the DUMP operator.
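For example, assuming cross_data was defined in the same session (the alias preview is a name introduced here for illustration):

```pig
-- Print every tuple of the cross product to the console.
DUMP cross_data;

-- For larger inputs, previewing a handful of tuples is usually enough.
preview = LIMIT cross_data 5;
DUMP preview;
```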
It will display the contents of the relation cross_data: one tuple for every pairing of a customers tuple with an orders tuple, with the customers fields followed by the orders fields.
Step 1 - Change the directory to /usr/local/pig/bin
Step 2 - Start the Grunt shell in MapReduce mode.
Step 3 - Create a customers.txt file.
Step 4 - Add the customer records to the customers.txt file, one record per line.
Step 5 - Create an orders.txt file.
Step 6 - Add the order records to the orders.txt file, one record per line.
Step 7 - Copy customers.txt and orders.txt from the local file system to HDFS. In my case, the customers.txt and orders.txt files are stored in the /home/hduser/Desktop/PIG/ directory.
Step 8 - Load customers data.
Step 9 - Load orders data.
Step 10 - Compute the cross product of the two relations.
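Steps 7 to 10 could be carried out in a single Grunt session along the lines of the sketch below. It assumes that the /pig_data/ directory already exists in HDFS and that both files are comma-delimited. The field names and types are hypothetical, so replace them with the real layout of your files.

```pig
-- Step 7: copy the local files into HDFS.
-- (fs runs Hadoop file system shell commands from inside Grunt.)
fs -put /home/hduser/Desktop/PIG/customers.txt /pig_data/
fs -put /home/hduser/Desktop/PIG/orders.txt /pig_data/

-- Step 8: load the customers data (schema is an assumption).
customers = LOAD '/pig_data/customers.txt' USING PigStorage(',')
    AS (id:int, name:chararray, age:int, address:chararray, salary:int);

-- Step 9: load the orders data (schema is an assumption).
orders = LOAD '/pig_data/orders.txt' USING PigStorage(',')
    AS (oid:int, order_date:chararray, customer_id:int, amount:int);

-- Step 10: compute the cross product and inspect it.
cross_data = CROSS customers, orders;
DUMP cross_data;
```

The fs commands are used here so that the whole sequence can be typed at the grunt> prompt. Running the equivalent Hadoop commands from a regular terminal before starting Pig would work as well.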