pig tutorial - apache pig tutorial - Apache Pig Cogroup Operator - pig latin - apache pig - pig hadoop



What is COGROUP operator in Apache Pig ?

  • The COGROUP operator works in much the same way as the GROUP operator.
  • The two operators are functionally the same; by convention, GROUP is used with one relation, while COGROUP is used in statements involving two or more relations.
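
The contrast between the two operators can be sketched as follows. This is a minimal illustration only: the relations a and b, the files a.txt and b.txt, and their two-field schemas are hypothetical and are not part of this tutorial's data.

```pig
-- Hypothetical input files and schemas, for illustration only.
a = LOAD 'a.txt' USING PigStorage(',') AS (id:int, age:int);
b = LOAD 'b.txt' USING PigStorage(',') AS (id:int, age:int);

-- GROUP on one relation: each output tuple is (group, {tuples of a}).
a_by_age = GROUP a BY age;

-- COGROUP on two relations: each output tuple is
-- (group, {tuples of a}, {tuples of b}).
ab_by_age = COGROUP a BY age, b BY age;
```

In both cases the first field of every output tuple is named group and holds the key value; COGROUP simply adds one more bag for each additional relation.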

Grouping Two Relations using Cogroup

  • Ensure that the two files student_details.txt and wikitechy_employee_details.txt exist in the HDFS directory /pig_data/, with the contents given below.

student_details.txt

111,Anu,Shankar,23,9876543210,Chennai
112,Barvathi,Nambiayar,24,9876543211,Chennai
113,Kajal,Nayak,24,9876543212,Trivendram
114,Preethi,Antony,21,9876543213,Pune
115,Raj,Gopal,21,9876543214,Hyderabad
116,Yashika,Kannan,22,9876543215,Delhi
117,siddu,Narayanan,22,9876543216,Kolkata
118,Timple,Mohanthy,23,9876543217,Bhuwaneshwar

wikitechy_employee_details.txt

111,Robert,22,newyork 
112,Bastin,23,Kolkata 
113,Martin,23,Tokyo 
114,Sangavi,25,London 
115,David,23,Bhuwaneshwar 
116,Arnold,22,Chennai
  • Load these files into Pig with the relation names student_details and wikitechy_employee_details respectively, as given below.
grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray); 
  
grunt> wikitechy_employee_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_details.txt' USING PigStorage(',')
   as (id:int, name:chararray, age:int, city:chararray);
  • Now group the records/tuples of the relations student_details and wikitechy_employee_details by the key age, as given below.
grunt> cogroup_data = COGROUP student_details BY age, wikitechy_employee_details BY age;
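
Before dumping the data, it can help to check the shape of the result. As a small sketch, assuming the statement above ran without errors in the same Grunt session, the DESCRIBE operator prints the schema of cogroup_data:

```pig
grunt> DESCRIBE cogroup_data;
```

The schema should list a field named group (the age key) followed by one bag per input relation, each bag named after the relation it came from.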

Verification

  • Now verify the relation cogroup_data using the DUMP operator as given below.
grunt> DUMP cogroup_data;

Output

  • The output displays the contents of the relation cogroup_data, as given below. Each tuple holds the age followed by one bag per relation; the order of tuples inside a bag may vary.
(21,{(114,Preethi,Antony,21,9876543213,Pune),(115,Raj,Gopal,21,9876543214,Hyderabad)},{})
(22,{(116,Yashika,Kannan,22,9876543215,Delhi),(117,siddu,Narayanan,22,9876543216,Kolkata)},{(111,Robert,22,newyork),(116,Arnold,22,Chennai)})
(23,{(111,Anu,Shankar,23,9876543210,Chennai),(118,Timple,Mohanthy,23,9876543217,Bhuwaneshwar)},{(112,Bastin,23,Kolkata),(113,Martin,23,Tokyo),(115,David,23,Bhuwaneshwar)})
(24,{(112,Barvathi,Nambiayar,24,9876543211,Chennai),(113,Kajal,Nayak,24,9876543212,Trivendram)},{})
(25,{},{(114,Sangavi,25,London)})
  • The cogroup operator groups the tuples from each relation according to age, so each group corresponds to one particular age value.

Example

  • Consider the first tuple of the result, which is grouped by age 21. It contains two bags:
    • The first bag holds all the tuples from the first relation (student_details in this case) having age 21.
    • The second bag holds all the tuples from the second relation (wikitechy_employee_details in this case) having age 21.
    • If a relation has no tuples with the age value 21, its bag is empty. No employee in the sample data is 21, so the second bag is empty for this group.
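
The empty-bag behaviour described above can be put to use with Pig's built-in IsEmpty function. The following is a minimal sketch, assuming both relations have been loaded as shown earlier; the aliases by_age, students_only, and student_only_ages are hypothetical names chosen for this illustration.

```pig
-- Co-group both relations by age (restated so the sketch stands on its own).
grunt> by_age = COGROUP student_details BY age, wikitechy_employee_details BY age;

-- Keep only the groups whose employee bag is empty.
grunt> students_only = FILTER by_age BY IsEmpty(wikitechy_employee_details);

-- Project just the age key of those groups.
grunt> student_only_ages = FOREACH students_only GENERATE group AS age;
grunt> DUMP student_only_ages;
```

With the sample data, this should keep the ages 21 and 24, since no employee has either age.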