Apache Pig Tutorial - Apache Pig Cogroup Operator

What is the COGROUP operator in Apache Pig?

  • The COGROUP operator works in much the same way as the GROUP operator.
  • The only difference between the two is that the GROUP operator is normally used with a single relation, while the COGROUP operator is used in statements involving two or more relations.
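As a rough sketch of that difference, the statements below group one relation with GROUP and two relations with COGROUP. They assume the relations student_details and wikitechy_employee_details used in the rest of this tutorial are already loaded and both have an age field; the alias students_by_age is a hypothetical name chosen for illustration.

```pig
-- GROUP works on a single relation: each output tuple is (group, bag).
students_by_age = GROUP student_details BY age;

-- COGROUP works on two or more relations: each output tuple is
-- (group, bag for the first relation, bag for the second relation).
cogroup_data = COGROUP student_details BY age, wikitechy_employee_details BY age;
```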

Grouping Two Relations using Cogroup

  • Ensure that the two files student_details.txt and wikitechy_employee_details.txt are present in the HDFS directory /pig_data/.




  • Load these files into Pig under the relation names student_details and wikitechy_employee_details respectively, as given below.
grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
grunt> wikitechy_employee_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_details.txt' USING PigStorage(',')
   AS (id:int, name:chararray, age:int, city:chararray);
  • Now group the records/tuples of the relations student_details and wikitechy_employee_details by the key age, as given below.
grunt> cogroup_data = COGROUP student_details BY age, wikitechy_employee_details BY age;


  • Now verify the relation cogroup_data using the DUMP operator as given below.
grunt> DUMP cogroup_data;


  • The output displays the contents of the relation cogroup_data, as given below.
(21,{(114,Preethi,Antony,21,9876543213,Pune),(115,Raj,Gopal,21,9876543214,Hyderabad)})
(25,{},{(114,Sangavi,25,London)})
  • The COGROUP operator groups the tuples from each relation according to age, so that each group represents one particular age value.
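To see how the grouped relation is laid out, its schema can be inspected with the DESCRIBE operator. This is a minimal sketch that assumes cogroup_data was built from student_details and wikitechy_employee_details as above; the reported schema should list a field named group holding the age, followed by one bag per input relation, each named after its relation.

```pig
-- Print the schema of the cogrouped relation:
-- a group key plus one bag of tuples for every input relation.
DESCRIBE cogroup_data;
```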


  • Consider the first tuple of the result, which is grouped by age 21. It contains two bags:
    • The first bag holds all the tuples from the first relation (student_details in this case) having age 21.
    • The second bag holds all the tuples from the second relation (wikitechy_employee_details in this case) having age 21.
    • If a relation has no tuples with a given age value, its bag for that group is empty, as with the first bag in the tuple for age 25.
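The two bags can then be processed like any other bag. As a hedged sketch, the statements below count the tuples in each bag per age and then keep only the ages that occur in both relations, using the built-in COUNT and IsEmpty functions. The aliases age_counts and ages_in_both are hypothetical names chosen for illustration, and cogroup_data is assumed to have been built from student_details and wikitechy_employee_details as described above.

```pig
-- One row per age with the size of each bag.
-- Note: COUNT skips tuples whose first field (id here) is null.
age_counts = FOREACH cogroup_data GENERATE
    group AS age,
    COUNT(student_details) AS student_count,
    COUNT(wikitechy_employee_details) AS employee_count;

-- Keep only the groups where neither bag is empty,
-- i.e. ages present in both relations.
ages_in_both = FILTER cogroup_data
    BY NOT IsEmpty(student_details) AND NOT IsEmpty(wikitechy_employee_details);

DUMP age_counts;
DUMP ages_in_both;
```

Filtering on IsEmpty after a COGROUP is one way to get inner-join-like grouping while still keeping the tuples of each relation in separate bags.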