[Solved-1 Solution] How to handle a recursive hierarchy in Pig?



What is a recursive function?

  • A recursive function is a function that either calls itself or takes part in a potential cycle of function calls. The definition therefore covers two types of recursion. When a function calls itself directly, the recursion is called immediate recursion; when it is reached again through a chain of other functions, the recursion is indirect.
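
The two kinds of recursion in the definition above can be sketched in a few lines of Python. This is only an illustration: the function names `factorial`, `is_even`, and `is_odd` are chosen for the example and are not part of the Pig solution, and the inputs are assumed to be non-negative integers.

```python
def factorial(n):
    """Immediate recursion: factorial calls itself directly."""
    if n <= 1:              # base case stops the recursion
        return 1
    return n * factorial(n - 1)


def is_even(n):
    """Indirect recursion: is_even calls is_odd, which calls is_even again."""
    if n == 0:
        return True
    return is_odd(n - 1)


def is_odd(n):
    if n == 0:
        return False
    return is_even(n - 1)


print(factorial(5))   # 120
print(is_even(10))    # True
```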

Problem:

  • The model below stores unbalanced tree data in tabular format, for example:
parent,child
a,b
b,c
c,d
c,f

  • How can this hierarchy be flattened so that each row contains the entire path from a leaf node to the root node, as in:
leaf node, root node, intermediate nodes
d,a,d:c:b
f,a,f:c:b
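
To make the target layout concrete, here is a small plain-Python sketch (not Pig) that walks from each leaf up to the root with a recursive function. It assumes every node has at most one parent; the names `PARENT_OF`, `climb`, and `flatten` are hypothetical helpers introduced for this illustration.

```python
# Edges from the sample table, stored as child -> parent.
# Assumption: each node has at most one parent (the data is a tree).
PARENT_OF = {"b": "a", "c": "b", "d": "c", "f": "c"}


def climb(node):
    """Return the nodes from `node` up to the root, starting with `node`."""
    parent = PARENT_OF.get(node)
    if parent is None:                # base case: a root has no parent
        return [node]
    return [node] + climb(parent)     # immediate recursion, one level up


def flatten(leaf):
    """Format one row as 'leaf,root,path', where the path excludes the root."""
    chain = climb(leaf)
    return f"{leaf},{chain[-1]},{':'.join(chain[:-1])}"


# Leaves are nodes that appear as a child but never as a parent.
leaves = sorted(set(PARENT_OF) - set(PARENT_OF.values()))
rows = [flatten(leaf) for leaf in leaves]
print(rows)
```

Recursion like this is convenient on a single machine, but Pig Latin has no recursive queries, which is why the solution that follows unrolls the climb into repeated joins.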

How can this problem be solved using Hive, Pig, or MapReduce?

Solution 1:

Here is sample code that handles a recursive hierarchy in Pig.

Join function:

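-- Join each row with its parent's own parent, climbing one level per call
DEFINE join_hierarchy (leftA, source, result) RETURNS next_level {
    joined = JOIN $leftA BY parent LEFT OUTER, $source BY child;
    -- No match means the current parent is a root, so the path is complete
    tmp_filtered = FILTER joined BY $source::parent IS NULL;
    part = FOREACH tmp_filtered GENERATE $leftA::child AS child, $leftA::path AS path;
    $result = UNION part, $result;
    -- Rows that matched still have an ancestor above them
    part_remaining = FILTER joined BY $source::parent IS NOT NULL;
    -- Prepend the newly found ancestor, so the path reads root first (for example 'a:b:c:')
    $next_level = FOREACH part_remaining GENERATE $leftA::child AS child, $source::parent AS parent, CONCAT(CONCAT($source::parent, ':'), $leftA::path) AS path;
};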

Why join?

  • A join simply brings together two data sets. Pig supports inner joins as well as left, right, and full outer joins. Beyond these simple joins, Pig also supports specialized joins such as replicated, skewed, and merge joins.

Using a join is one way to handle this hierarchy.
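
The reason a left outer join fits this problem is that unmatched rows survive with a null on the right side, and those are exactly the rows whose parent is a root. The following plain-Python sketch models that behaviour on the sample data. It is not Pig: `left_join_on_parent` is a hypothetical helper, `None` stands in for Pig's null, and each child is assumed to have a single parent.

```python
rows = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "f")]  # (parent, child)


def left_join_on_parent(left, right):
    """Left outer join on left.parent == right.child.

    Returns (parent, child, grandparent); grandparent is None when
    the parent has no row of its own on the right side.
    """
    parent_of = {child: parent for parent, child in right}
    return [(parent, child, parent_of.get(parent)) for parent, child in left]


joined = left_join_on_parent(rows, rows)

# Rows with no match: their parent is a root, so the climb is finished.
at_root = [r for r in joined if r[2] is None]
# Rows with a match: there is at least one more level above the parent.
still_climbing = [r for r in joined if r[2] is not None]

print(at_root)          # [('a', 'b', None)]
print(still_climbing)   # [('b', 'c', 'a'), ('c', 'd', 'b'), ('c', 'f', 'b')]
```

An inner join would silently drop the `at_root` rows, so the finished paths could never be collected.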

Load dataset:

-- The dataset's field delimiter is ','.
source = LOAD '*****' USING PigStorage(',') AS (parent:chararray, child:chararray);
-- Create an additional column for the path
leftA = FOREACH source GENERATE child, parent, CONCAT(parent, ':') AS path;

-- Seed the result relation with a single placeholder row of empty strings
result = LIMIT leftA 1;
result = FOREACH result GENERATE '' AS child, '' AS path;
-- Flatten the hierarchy to 4 levels. Add one call below per level of hierarchy depth.

leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
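
To see why the number of calls has to track the depth of the tree, the repeated joins can be modelled in plain Python. This is a sketch under stated assumptions rather than a translation of the macro: the Python `join_hierarchy` below is a stand-in that mutates a list instead of returning a relation, each child is assumed to have one parent, and the path is built leaf first so that it matches the `d,a,d:c:b` layout from the problem statement.

```python
source = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "f")]  # (parent, child)


def join_hierarchy(left, source, result):
    """One pass: retire rows whose parent is a root, move the rest up a level."""
    parent_of = {child: parent for parent, child in source}
    remaining = []
    for child, parent, path in left:
        grandparent = parent_of.get(parent)   # None models a null join match
        if grandparent is None:
            result.append((child, parent, path))          # path is complete
        else:
            remaining.append((child, grandparent, f"{path}:{parent}"))
    return remaining


# Start with one row per edge; the path initially holds only the child.
left = [(child, parent, child) for parent, child in source]
result = []
passes = 0
while left:                                   # Pig has no loop, so it unrolls this
    left = join_hierarchy(left, source, result)
    passes += 1

print(passes)   # 3
for row in result:
    print(",".join(row))
```

In this model the sample data is exhausted after three passes, and any further pass has nothing left to move, so calling the step more often than strictly needed is harmless. The result also contains rows for the inner nodes `b` and `c`; keeping only leaves would need an extra filter on nodes that never appear as a parent.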
