Apache Pig - Filter Operator



What is the Filter Operator in Apache Pig?

    • The Filter operator is a simple but powerful operation provided by Apache Pig.
    • Filtering early keeps only the desired records out of a large data set, so the business logic that follows runs in parallel on a much smaller input. This is considerably faster than running the business logic on the full volume of data.
    • The operator selects the required tuples from a relation based on a condition; records that do not satisfy the condition are removed.
  • FILTER instruction:
    • Generates a new relation by filtering the data of an existing relation.

    Syntax

      grunt> Relation2_name = FILTER Relation1_name BY (condition);
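The condition can be any boolean expression over the fields of the relation. As a rough sketch of the common forms (the relation name records and the fields age and city are placeholders chosen for illustration, not names from this tutorial's data set):

```pig
-- Assumes a relation named records with an int field age
-- and a chararray field city (both hypothetical).

-- Comparison operators: ==, !=, <, >, <=, >=
adults = FILTER records BY age >= 18;

-- Conditions can be combined with AND, OR and NOT
adults_in_pune = FILTER records BY (age >= 18) AND (city == 'Pune');
not_in_pune = FILTER records BY NOT (city == 'Pune');

-- matches tests a chararray against a Java regular expression;
-- the pattern has to match the whole string
c_cities = FILTER records BY city matches 'C.*';

-- Null checks
has_city = FILTER records BY city IS NOT NULL;
```

Each statement produces a new relation and leaves records itself unchanged.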
      

      Example:

        wikitechy_student_details.txt (stored in HDFS under /pig_data/):

          001,Suresh,Reddy,21,9848022337,Hyderabad
          002,harish,Battacharya,22,9848022338,Kolkata
          003,Fathima,Khanna,23,9848022339,Delhi
          004,Preethi,Agarwal,21,9848022330,Pune
          005,Vanitha,Mohanthy,24,9848022336,Bhuwaneshwar
          006,Sruti,Mishra,25,9848022335,Chennai
          007,Kamal,Nayak,26,9848022334,trivendram
          008,Barath,Nambiayar,27,9848022333,Chennai
          
          • Load the file into Pig under the relation name wikitechy_student_details:
          grunt> wikitechy_student_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_student_details.txt' USING PigStorage(',')
             AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
          
          • Next, use the Filter operator to get the details of the students who belong to the city Chennai:
          grunt> filter_data = FILTER wikitechy_student_details BY city == 'Chennai';
          

          Verification:

            grunt> Dump filter_data;
            

            Output:

              (6,Sruti,Mishra,25,9848022335,Chennai)
              (8,Barath,Nambiayar,27,9848022333,Chennai)
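The same pattern extends to the other fields of wikitechy_student_details. The statements below are a sketch of a few variations. The relation names on the left (older_students, chennai_over_25, chennai_any_case) are made up for illustration, and the records named in the comments are read off the sample file shown earlier.

```pig
-- Students aged 25 or older: records 006, 007 and 008 in the sample file
older_students = FILTER wikitechy_student_details BY age >= 25;

-- Students from Chennai who are older than 25: record 008 only
chennai_over_25 = FILTER wikitechy_student_details BY (city == 'Chennai') AND (age > 25);

-- Case-insensitive match that also ignores stray whitespace around the
-- city value, using the built-in LOWER and TRIM functions
chennai_any_case = FILTER wikitechy_student_details BY LOWER(TRIM(city)) == 'chennai';

DUMP chennai_over_25;
```

With the sample data, the DUMP should print only the tuple for Barath Nambiayar, since Sruti Mishra is exactly 25 and the condition requires an age above 25.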
              
