Apache Pig - SUBTRACT() Function



What is the SUBTRACT() Function in Apache Pig?

    • The SUBTRACT() function in Apache Pig subtracts one bag from another.
    • It takes two bags as input and returns a bag containing the tuples of the first bag that do not appear in the second bag.
    • The result is the difference between two bags, not the difference between two numbers.

    Syntax

      grunt> SUBTRACT(expression, expression)
      

      Example

        Assume that we have two files, wikitechy_employee_sales.txt and wikitechy_employee_bonus.txt, in the HDFS directory /pig_data/, with the contents shown below:

        wikitechy_employee_sales.txt
        1,RobinHood,22,25000,sales 
        2,BOB,23,30000,sales 
        3,Saya,23,25000,sales 
        4,Sarah,25,40000,sales 
        5,Joseph,23,45000,sales 
        6,Vanitha,22,35000,sales

        wikitechy_employee_bonus.txt
        1,RobinHood,22,25000,sales 
        2,Abirami,23,20000,admin 
        3,Saya,23,25000,sales 
        4,Preethi,25,50000,admin 
        5,Joseph,23,45000,sales 
        6,Sruti,30,30000,admin
        

        We have loaded the files into Pig as the relations employee_sales and employee_bonus, respectively.

        employee_sales

          grunt> employee_sales = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_sales.txt' USING PigStorage(',')
             as (sno:int, name:chararray, age:int, salary:int, dept:chararray);
          

          employee_bonus

            grunt> employee_bonus = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_bonus.txt' USING PigStorage(',')
               as (sno:int, name:chararray, age:int, salary:int, dept:chararray);	
            
            • Group the tuples of the relations employee_sales and employee_bonus by the key sno, using the COGROUP operator:
            grunt> cogroup_data = COGROUP employee_sales by sno, employee_bonus by sno;
            
            • Verify the relation cogroup_data using the DUMP operator:
            grunt> Dump cogroup_data;
            (1,{(1,RobinHood,22,25000,sales)},{(1,RobinHood,22,25000,sales)}) 
            (2,{(2,BOB,23,30000,sales)},{(2,Abirami,23,20000,admin)})
            (3,{(3,Saya,23,25000,sales)},{(3,Saya,23,25000,sales)}) 
            (4,{(4,Sarah,25,40000,sales)},{(4,Preethi,25,50000,admin)}) 
            (5,{(5,Joseph,23,45000,sales)},{(5,Joseph,23,45000,sales)}) 
            (6,{(6,Vanitha,22,35000,sales)},{(6,Sruti,30,30000,admin)})
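
            The SUBTRACT() function expects both of its arguments to be bags. After COGROUP, each row carries one bag per input relation, named after that relation. As a quick check before subtracting, the schema can be inspected with the DESCRIBE operator. This is a minimal sketch that assumes the relations loaded above:

            ```pig
            -- Print the schema of the cogrouped relation. The fields should be
            -- the group key followed by two bags, employee_sales and employee_bonus.
            DESCRIBE cogroup_data;
            ```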
            

            Subtracting One Relation from the Other

              • Now subtract the tuples of the employee_bonus relation from the employee_sales relation:
              grunt> sub_data = FOREACH cogroup_data GENERATE SUBTRACT(employee_sales, employee_bonus);
              

              Verification

                grunt> Dump sub_data;
                
                ({})
                ({(2,BOB,23,30000,sales)})
                ({})
                ({(4,Sarah,25,40000,sales)})
                ({})
                ({(6,Vanitha,22,35000,sales)})
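
                The empty bags in this output come from the groups where both relations hold identical tuples. If only the remaining tuples are of interest, one possible follow-up is to flatten the result; according to the Pig documentation, flattening an empty bag discards that row. A minimal sketch, in which the relation name only_in_sales is a hypothetical name introduced here:

                ```pig
                -- Subtract per group, then un-nest the resulting bag into plain tuples.
                -- Rows whose bag is empty produce no output when flattened.
                only_in_sales = FOREACH cogroup_data GENERATE FLATTEN(SUBTRACT(employee_sales, employee_bonus));
                DUMP only_in_sales;
                ```

                With the sample data, this should leave only the tuples for BOB, Sarah and Vanitha.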
                
                • In the same way, subtract the employee_sales relation from the employee_bonus relation:
                grunt> sub_data = FOREACH cogroup_data GENERATE SUBTRACT(employee_bonus, employee_sales);
                
                • Verify the contents of the sub_data relation using the DUMP operator:
                grunt> Dump sub_data;
                ({}) 
                ({(2,Abirami,23,20000,admin)})
                ({})
                ({(4,Preethi,25,50000,admin)})
                ({})
                ({(6,Sruti,30,30000,admin)})
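
                Both directions can be computed in a single pass over cogroup_data, which keeps the key next to each difference. The following is a sketch only; the names diff_data, only_in_sales, only_in_bonus and changed are hypothetical names introduced for illustration:

                ```pig
                -- Compute both differences per key in one FOREACH.
                diff_data = FOREACH cogroup_data GENERATE
                    group AS sno,
                    SUBTRACT(employee_sales, employee_bonus) AS only_in_sales,
                    SUBTRACT(employee_bonus, employee_sales) AS only_in_bonus;

                -- Keep only the keys where the two relations disagree.
                changed = FILTER diff_data BY (NOT IsEmpty(only_in_sales)) OR (NOT IsEmpty(only_in_bonus));
                DUMP changed;
                ```

                With the sample data, changed should contain the rows for the keys 2, 4 and 6.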
                
