[Solved-2 Solutions] Pig Latin: bag to tuple after GROUP BY?

  • A bag is a collection of tuples.
  • A tuple is an ordered set of fields.
  • A field is a piece of data.
  • A Pig relation is a bag of tuples. A Pig relation is similar to a table in a relational database, where the tuples in the bag correspond to the rows in a table.
  • Unlike a relational table, however, Pig relations don't require that every tuple contain the same number of fields or that the fields in the same position (column) have the same type.

What is GROUP BY?

  • The GROUP operator groups the data in one or more relations. It collects all tuples that share the same key into a bag, producing one output tuple per key.


  • The syntax of the GROUP operator:
grunt> Group_data = GROUP Relation_name BY age;
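A short sketch of GROUP on the schema used in this article. The file name data.csv is a hypothetical placeholder. DESCRIBE prints the schema of the grouped relation, which shows that each group's rows arrive as a bag:

```pig
-- Hypothetical input file; the schema matches the example in this article.
A = LOAD 'data.csv' USING PigStorage(',') AS (t0:chararray, t1:int, t2:int);
-- One output tuple per distinct t0 value.
B = GROUP A BY t0;
-- B has two fields: "group" (the t0 key) and a bag named A
-- that holds every tuple of A with that key.
DESCRIBE B;
DUMP B;
```

Because the grouped rows are collected in a bag, the second component of each grouped tuple is a bag, not a tuple. That is the starting point of the question that follows.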


We have data with the schema (t0: chararray, t1: int, t2: int).


We need to produce results like the following, grouped by t0 and ordered by t1 within each group:

(A, ((1,2),(2,3),(3,2)))
(B, ((1,2),(2,2),(4,2)))

Note that the second component must contain only tuples, not bags.

Solution 1:

  • We can use the code below to get a tuple after GROUP BY:
grunt> a_input = Load '/home/training/pig/Join/order_temp.csv' Using PigStorage(',') as (t0:chararray,t1:int,t2:int);
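One way to continue from a_input, sketched under the assumption that your Pig release includes the built-in BagToTuple and TOTUPLE functions (check the built-in function list for your version). The aliases b_grouped, c_result, ordered and pairs are illustrative. BagToTuple removes only one level of nesting, so each (t1, t2) pair is first wrapped in its own tuple; otherwise the pairs would be flattened into one long tuple such as (1,2,2,3,3,2):

```pig
-- Continues from a_input loaded above.
b_grouped = GROUP a_input BY t0;
c_result = FOREACH b_grouped {
    -- Sort the rows of each group by t1.
    ordered = ORDER a_input BY t1;
    -- Wrap each (t1, t2) pair in a tuple: the bag becomes {((1,2)),((2,3)),...}.
    pairs = FOREACH ordered GENERATE TOTUPLE(t1, t2);
    -- BagToTuple strips one level of nesting, leaving a tuple of pair tuples.
    GENERATE group AS t0, BagToTuple(pairs) AS vals;
}
DUMP c_result;
```

This should yield the nested-tuple shape shown in the expected results. Keep in mind that the resulting tuple has no fixed length, so its fields cannot be described by a declared schema.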

Solution 2:

We can use the code below. It groups by t0 and sorts each group by t1, but keeps the second component as a bag:

-- A: (t0: chararray,t1: int,t2: int)
B = GROUP A BY t0 ;
C = FOREACH B {
    -- Project out the first column of A.
    projected = FOREACH A GENERATE t1, t2 ;
    -- Now you can order the projection.
    ordered = ORDER projected BY t1 ;
    GENERATE group AS t0, ordered AS vals ;
}
-- C: (t0: chararray, vals: {(t1: int, t2: int)}), with vals sorted by t1
DUMP C ;
  • Solution 2 deliberately keeps vals as a bag: tuples should only be used when we know the exact number and position of the fields in the tuple.
  • Otherwise the schema cannot be declared, which makes the fields very difficult to access. The entire tuple is treated as a bytearray, so we would have to locate and cast every field manually.
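To illustrate the point about schemas, here is a sketch that assumes the nested FOREACH of Solution 2 is assigned to a relation named C (a hypothetical alias) with schema (t0: chararray, vals: {(t1: int, t2: int)}). Because vals keeps a declared schema, its fields can be referenced by name and passed to built-in functions without any casting:

```pig
-- Assumes C: {t0: chararray, vals: {(t1: int, t2: int)}} from Solution 2.
-- Project a field out of the bag by name and aggregate it per group.
totals = FOREACH C GENERATE t0, SUM(vals.t2) AS t2_total, MAX(vals.t1) AS max_t1;
-- FLATTEN turns each (t1, t2) element back into its own row; the fields stay typed.
flat_rows = FOREACH C GENERATE t0, FLATTEN(vals);
-- After FLATTEN the bag's fields are addressed as vals::t1 and vals::t2.
large_t2 = FILTER flat_rows BY vals::t2 > 2;
DUMP totals;
DUMP large_t2;
```

With a variable-length tuple instead of a bag, none of these named references would be available.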