[Solved-2 Solutions] Pivot table with Apache Pig ?



Problem:

This is related to the question in Pivot table with Apache Pig.

Input data as

Id    Name     Value 
1     Column1  Row11 
1     Column2  Row12 
1     Column3  Row13 
2     Column1  Row21 
2     Column2  Row22 
2     Column3  Row23 

and need to pivot and get the output as

Id    Column1 Column2 Column3 
1      Row11    Row12   Row13 
2      Row21    Row22   Row23 

How to do it in Pig ?

Solution 1:

It can be done in 2 ways:

1. Write a UDF which returns a bag of tuples. It will be the most flexible solution, but requires Java code.
2. Write a rigid script like this below:

inpt = load '/pig_fun/input/pivot.txt' as (Id, Column1, Column2, Column3);
bagged = foreach inpt generate Id, TOBAG(TOTUPLE('Column1', Column1), TOTUPLE('Column2', Column2), TOTUPLE('Column3', Column3)) as toPivot;
pivoted_1 = foreach bagged generate Id, FLATTEN(toPivot) as t_value;
pivoted = foreach pivoted_1 generate Id, FLATTEN(t_value);
dump pivoted;

Running this script gives following results:

(1,Column1,11)
(1,Column2,12)
(1,Column3,13)
(2,Column1,21)
(2,Column2,22)
(2,Column3,23)
(3,Column1,31)
(3,Column2,32)
(3,Column3,33)

Solution 2:

  • We can remove col3 from id 1 to show how to handle optional data
  • Id Name Value 1 Column1 Row11 1 Column2 Row12 2 Column1 Row21 2 Column2 Row22 2 Column3 Row23

Here is the code using pigscript

data1       = load 'data.txt' using PigStorage() as (id:int, key:chararray, value:chararray);
grped       = group data1 by id;
pvt         = foreach grped {
    col1        = filter data1 by key =='Column1';
    col2        =filter data1  by key =='Column2';
    col3        =filter data1  by key =='Column3';
    generate flatten(group) as id,
        flatten(col1.value) as col1, 
        flatten(col2.value) as col2, 
        flatten((IsEmpty(col3.value) ? {('NULL')} : col3.value)) as col3; --HANDLE NULL
};
dump pvt;

Results:

(1,Row11,Row12,NULL)
(2,Row21,Row22,Row23)

Related Searches to Pivot table with Apache Pig