Apache Pig - PluckTuple() Function

What is PluckTuple() Function in Apache Pig ?

    • PluckTuple() is an Apache Pig function that plucks columns out of a relation by a string prefix or a regex pattern.
    • It is typically used after operations such as join, to separate the columns that came from each of the two schemas.
    • We define a string prefix, and the function keeps only the columns in the relation whose names begin with that prefix (or match the regex pattern).
    • We can pass the flag 'false' as a second argument to keep instead the columns that do not match the prefix or pattern.


      DEFINE pluck PluckTuple(expression1) 
      DEFINE pluck PluckTuple(expression1,expression3) 


        • Assume that we have two files, wikitechy_employee_sales.txt and wikitechy_employee_bonus.txt, in the HDFS directory /pig_data/.


          • We load these files into Pig as the relations employee_sales and employee_bonus:


            grunt> employee_sales = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_sales.txt' USING PigStorage(',')
            as (sno:int, name:chararray, age:int, salary:int, dept:chararray);


              grunt> employee_bonus = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_bonus.txt' USING PigStorage(',')
              as (sno:int, name:chararray, age:int, salary:int, dept:chararray);

              We join these two relations on the sno column by using the join operator:

              grunt> join_data = join employee_sales by sno, employee_bonus by sno;
              • We can verify the relation join_data by using the Dump operator:
              grunt> Dump join_data;
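
              Before plucking, it can help to look at how the join names its columns. The following is a minimal sketch, assuming the two relations were loaded with the schemas shown earlier. The schema in the comment is what Pig's relation::field naming rule for joins would be expected to produce, not captured output.

```pig
-- After a join, Pig disambiguates same-named fields by prefixing each one
-- with the alias of the relation it came from (alias::field).
DESCRIBE join_data;
-- Expected shape of the schema (illustrative, not captured output):
-- join_data: {employee_sales::sno: int, employee_sales::name: chararray, ...,
--             employee_bonus::sno: int, employee_bonus::name: chararray, ...}
```

              These relation-name prefixes are the strings that PluckTuple() filters on.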

              Using PluckTuple() Function

                • First, we define the prefix by which to pick out columns, using the PluckTuple() function. Because the join prefixes each column with the name of its source relation, the prefix to use here is 'employee_sales::'.
                grunt> DEFINE pluck PluckTuple('employee_sales::');
                • Next, we filter the columns of the join_data relation:
                grunt> data = foreach join_data generate FLATTEN(pluck(*));
                • Finally, we inspect the schema of the relation data by using the Describe operator:
                grunt> Describe data;
                data: {plucked::employee_sales::sno: int, plucked::employee_sales::name: chararray, plucked::employee_sales::age: int,
                   plucked::employee_sales::salary: int, plucked::employee_sales::dept: chararray}
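
                The 'false' flag described at the start can be sketched in the same way. The names pluck_negative and bonus_only are illustrative and not part of the original example. The schema in the comment follows the pattern shown in the Pig documentation and is not captured output.

```pig
-- Passing 'false' as the second argument inverts the match: keep the
-- columns whose names do NOT begin with the given prefix.
DEFINE pluck_negative PluckTuple('employee_sales::', 'false');

-- Apply the negative pluck to every column of the joined relation.
bonus_only = FOREACH join_data GENERATE FLATTEN(pluck_negative(*));

DESCRIBE bonus_only;
-- Expected: only the employee_bonus:: columns remain, for example
-- bonus_only: {plucked::employee_bonus::sno: int, ...}
```

                Using one DEFINE per side of the join keeps each projection readable without listing every column by hand.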
