[Solved-2 Solutions] Pig: Pivoting & Sum 3 relations ?



Problem:

We have 3 different relations as mentioned below & we can get the output using UDFs but looking for implementation in PIG.

Proc:

FN1,10
FN2,20
FN3,23
FN4,25
FN5,15
FN7,40
FN10,56

Rej:

FN1,12
FN2,13
FN3,33
FN6,60
FN8,23
FN9,44
FN10,4

AllFN:

FN1
FN2
FN3
FN4
FN5
FN6
FN7
FN8
FN9
FN10

Output required is:

FN1,10,12,22
FN2,20,13,33
FN3,23,33,56
FN4,25,0,25
FN5,15,0,15
FN6,0,60,60
FN7,40,0,40
FN8,0,23,23
FN9,0,44,44
FN10,56,4,60

Solution 1:

The below code shows pivoting and summing of relations

A = LOAD 'test.txt' using PigStorage(',');
B = LOAD 'test2.txt' using PigStorage(',');
C = LOAD 'test3.txt' using PigStorage(',');
D = COGROUP A by $0, B by $0;
E = COGROUP C by $0, D by $0;
F = FOREACH E generate $0, FLATTEN(D.A), FLATTEN(D.B);
G = FOREACH F generate $0, $1.$1, $2.$1;
H = FOREACH G generate $0, FLATTEN((IsEmpty($1)?null:$1)), FLATTEN((IsEmpty($2)?null:$2));
I = foreach H generate $0, ($1 is null?0:$1),($2 is null?0:$2),($1 is null?0:$1)+($2 is null?$0:$2);
dump I;

Output for the above code

A = LOAD 'test.txt' using PigStorage(',');
B = LOAD 'test2.txt' using PigStorage(',');
C = LOAD 'test3.txt' using PigStorage(',');
D = COGROUP A by $0, B by $0;
E = COGROUP C by $0, D by $0;
F = FOREACH E generate $0, FLATTEN(D.A), FLATTEN(D.B);
G = FOREACH F generate $0, $1.$1, $2.$1;
H = FOREACH G generate $0, FLATTEN((IsEmpty($1)?null:$1)), FLATTEN((IsEmpty($2)?null:$2));
I = foreach H generate $0, ($1 is null?0:$1),($2 is null?0:$2),($1 is null?0:$1)+($2 is null?$0:$2);
dump I;(FN1,10,12,22)
(FN2,20,13,33)
(FN3,23,33,56)
(FN4,25,0,)
(FN5,15,0,)
(FN6,0,60,60)
(FN7,40,0,)

Solution 2:

  • The way to do it without UDF is to group on Id and than in nested foreach select rows for each of the column names, then join them in the generate.
inpt = load '~/rows_to_cols.txt' as (Id : chararray, Name : chararray, Value: chararray);
grp = group inpt by Id;
maps = foreach grp {
    col1 = filter inpt by Name == 'Column1';
    col2 = filter inpt by Name == 'Column2';
    col3 = filter inpt by Name == 'Column3';
    generate flatten(group) as Id, flatten(col1.Value) as Column1, flatten(col2.Value)  as Column2, flatten(col3.Value)  as Column3;
};


Output:

(1,Row11,Row12,Row13)
(2,Row21,Row22,Row23)

Related Searches to Pig: Pivoting & Sum 3 relations