Apache Pig - PluckTuple() Function



What is the PluckTuple() Function in Apache Pig?

    • PluckTuple() is a built-in Apache Pig function that selects columns from a tuple by name, using either a string prefix or a regex pattern.
    • It is typically used after an operation such as JOIN, whose output contains the columns of both input schemas, to separate the columns that came from each side.
    • You specify a string prefix (or a regex pattern), and PluckTuple() keeps only the columns of the relation whose names begin with that prefix or match that pattern.
    • Passing the optional flag 'false' inverts the filter: PluckTuple() then keeps the columns that do not begin with the prefix or match the pattern.
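A minimal sketch of the prefix form and the regex form side by side, modelled on the example in the Apache Pig function reference. The input paths 'a' and 'b' and every alias in it (a, b, c, pluckA, pluckX, from_a, x_cols) are hypothetical placeholders. It also assumes your Pig release's PluckTuple() accepts regex patterns, as the current function reference describes; the pattern is written to match whole column names.

```pig
-- Hypothetical inputs 'a' and 'b' that share the field names x and y.
a = LOAD 'a' AS (x:int, y:int);
b = LOAD 'b' AS (x:int, y:int);

-- JOIN disambiguates the columns as a::x, a::y, b::x, b::y.
c = JOIN a BY x, b BY x;

-- Prefix form: keeps the columns whose names start with 'a::' (a::x, a::y).
DEFINE pluckA PluckTuple('a::');
from_a = FOREACH c GENERATE FLATTEN(pluckA(*));

-- Regex form (assumes regex support): keeps the columns whose names
-- match '.*::x', i.e. the x column from both sides (a::x, b::x).
DEFINE pluckX PluckTuple('.*::x');
x_cols = FOREACH c GENERATE FLATTEN(pluckX(*));

-- Inspect which columns survived in each case.
DESCRIBE from_a;
DESCRIBE x_cols;
```

Each DESCRIBE should list only the plucked columns, which makes it a quick way to check a prefix or pattern before running the full script.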

    Syntax

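      DEFINE pluck PluckTuple(expression1)
      DEFINE pluck PluckTuple(expression1,expression3)
      pluck(expression2)

      • expression1 is the string prefix or regex pattern to pluck by.
      • expression2 is the set of fields to apply the pluck to, usually * (all fields).
      • expression3 is an optional flag, 'true' or 'false', that says whether to keep the matching columns or the non-matching ones; without it, the matching columns are kept.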
      

      Example

        • Assume that we have two files, wikitechy_employee_sales.txt and wikitechy_employee_bonus.txt, in the HDFS directory /pig_data/.
        wikitechy_employee_sales.txt
        1,Joseph,22,25000,sales 
        2,BOB,23,30000,sales 
        3,Saya,23,25000,sales 
        4,Sarah,25,40000,sales 
        5,John,23,45000,sales 
        6,Vanitha,22,35000,sales
        

          wikitechy_employee_bonus.txt
          1,Joseph,22,25000,sales 
          2,Jaya,23,20000,admin 
          3,Saya,23,25000,sales 
          4,Preethi,25,50000,admin 
          5,John,23,45000,sales
          6,Sruti,30,30000,admin
          
          • Load these files into Pig as the relations employee_sales and employee_bonus:

          employee_sales

            grunt> employee_sales = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_sales.txt' USING PigStorage(',')
            as (sno:int, name:chararray, age:int, salary:int, dept:chararray);
            

            employee_bonus

              grunt> employee_bonus = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_bonus.txt' USING PigStorage(',')
              as (sno:int, name:chararray, age:int, salary:int, dept:chararray);
              

              Join the two relations on sno by using the JOIN operator:

              grunt> join_data = join employee_sales by sno, employee_bonus by sno;
              
              • Verify the relation join_data by using the Dump operator:
              grunt> Dump join_data;
              (1,Joseph,22,25000,sales,1,Joseph,22,25000,sales)
              (2,BOB,23,30000,sales,2,Jaya,23,20000,admin)
              (3,Saya,23,25000,sales,3,Saya,23,25000,sales)
              (4,Sarah,25,40000,sales,4,Preethi,25,50000,admin) 
              (5,John,23,45000,sales,5,John,23,45000,sales) 
              (6,Vanitha,22,35000,sales,6,Sruti,30,30000,admin)
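PluckTuple() selects columns by name, so it helps to see the names JOIN gives them. Describing join_data in the same grunt session should list every column qualified with its source relation, employee_sales:: or employee_bonus::, and these qualified names are what a PluckTuple() prefix is matched against. The output below is wrapped for readability; Pig prints the schema on a single line.

```pig
grunt> Describe join_data;
join_data: {employee_sales::sno: int,employee_sales::name: chararray,employee_sales::age: int,
   employee_sales::salary: int,employee_sales::dept: chararray,employee_bonus::sno: int,
   employee_bonus::name: chararray,employee_bonus::age: int,employee_bonus::salary: int,
   employee_bonus::dept: chararray}
```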
              

              Using PluckTuple() Function

                • Define a pluck function whose prefix matches the columns to keep. Because JOIN names the columns employee_sales::sno, employee_bonus::sno, and so on, the prefix 'employee_sales::' selects the columns that came from employee_sales.
                grunt> DEFINE pluck PluckTuple('employee_sales::');
                
                • Apply pluck to all the columns (*) of join_data and flatten the resulting tuple:
                grunt> data = foreach join_data generate FLATTEN(pluck(*));
                
                • Describe the relation data by using the Describe operator. Only the five employee_sales columns remain; PluckTuple() returns them in a tuple named plucked, so FLATTEN prefixes each field with plucked:: as shown below:
                grunt> Describe data;
                data: {plucked::employee_sales::sno: int, plucked::employee_sales::name: chararray,
                   plucked::employee_sales::age: int, plucked::employee_sales::salary: int,
                   plucked::employee_sales::dept: chararray}
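The optional 'false' flag inverts the selection. A minimal sketch, assuming the join_data relation from the JOIN step is still defined in the session: it drops every column that begins with employee_sales:: and therefore keeps the employee_bonus columns. The aliases pluckBonus and bonus_data are illustrative names, not part of the original walkthrough.

```pig
-- Assumes join_data from the JOIN step above.
-- 'false' keeps the columns that do NOT start with 'employee_sales::'.
DEFINE pluckBonus PluckTuple('employee_sales::', 'false');

-- Apply the inverted pluck to every column and flatten the result.
bonus_data = FOREACH join_data GENERATE FLATTEN(pluckBonus(*));

-- The schema should list only the five employee_bonus fields.
DESCRIBE bonus_data;

-- Each output row should hold only the bonus-file half of a joined row.
DUMP bonus_data;
```

Defining two pluck functions with the same prefix, one with 'false', is a simple way to split a joined relation back into its two sides.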
                
