Apache Pig - SUBTRACT() Function
What is the SUBTRACT() Function in Apache Pig?
- The SUBTRACT() function in Apache Pig subtracts one bag from another.
- The SUBTRACT() function takes two bags as inputs and returns a bag containing the tuples of the first bag that are not present in the second bag. For example, subtracting the bag {(2)} from the bag {(1),(2),(3)} yields {(1),(3)}.
- The SUBTRACT() function returns a bag difference, not an arithmetic difference. To subtract two numbers, use the minus (-) operator instead.
Assume that we have two files, wikitechy_employee_sales.txt and wikitechy_employee_bonus.txt, in the HDFS directory /pig_data/.
We have loaded these files into Pig as the relations employee_sales and employee_bonus, respectively.
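The loading step might look like the sketch below. The file paths and relation names come from the text, and sno is the key used later for grouping. The comma delimiter and every other field name and type are assumptions made for illustration, so adjust them to match the actual files.

```pig
-- Paths and relation names are taken from the text.
-- ASSUMPTION: comma-delimited files; every field except sno is hypothetical.
employee_sales = LOAD '/pig_data/wikitechy_employee_sales.txt'
    USING PigStorage(',')
    AS (sno:int, name:chararray, age:int, salary:int, dept:chararray);

employee_bonus = LOAD '/pig_data/wikitechy_employee_bonus.txt'
    USING PigStorage(',')
    AS (sno:int, name:chararray, age:int, salary:int, dept:chararray);
```

Giving both relations the same schema matters here, because SUBTRACT() compares whole tuples: a tuple is removed from the first bag only if an identical tuple exists in the second.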
- Group the tuples of the employee_sales and employee_bonus relations by the key sno, using the COGROUP operator, and store the result in the relation cogroup_data.
- Verify the relation cogroup_data using the DUMP operator.
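The two steps above can be sketched as follows, assuming employee_sales and employee_bonus have already been loaded with a field named sno, as described earlier.

```pig
-- Group both relations by sno; employee_sales and employee_bonus
-- are the relations loaded earlier in the tutorial.
cogroup_data = COGROUP employee_sales BY sno, employee_bonus BY sno;

-- Print the grouped relation to the console.
DUMP cogroup_data;
```

Each tuple printed by DUMP has three fields: the group key (the sno value), a bag of the matching employee_sales tuples, and a bag of the matching employee_bonus tuples. These two bags are the inputs that SUBTRACT() works on.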
Subtracting One Relation from the Other
- Subtract the tuples of the employee_bonus relation from the employee_sales relation.
- In the same way, subtract the employee_sales relation from the employee_bonus relation.
- Verify the contents of the sub_data relation using the DUMP operator.
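A minimal sketch of these steps is shown below. It assumes cogroup_data was built by cogrouping employee_sales and employee_bonus on sno, so each of its tuples carries one bag per relation. The alias sub_data comes from the text; reusing it for both directions is an assumption.

```pig
-- Per sno group: tuples of employee_sales that are not in employee_bonus.
sub_data = FOREACH cogroup_data GENERATE SUBTRACT(employee_sales, employee_bonus);
DUMP sub_data;

-- Reverse direction: tuples of employee_bonus that are not in employee_sales.
-- Reassigning the alias sub_data replaces the earlier definition.
sub_data = FOREACH cogroup_data GENERATE SUBTRACT(employee_bonus, employee_sales);
DUMP sub_data;
```

Each DUMP prints one tuple per group, containing a single bag. The bag is empty, shown as ({}), when every tuple in the first bag of that group also appears in the second.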