[Solved-1 Solution] Why is “Flatten” not a UDF in Pig?



Problem:

Why is “Flatten” not a UDF in Pig?

Solution 1:

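  • The FLATTEN operator looks like a UDF syntactically, but it is actually an operator that changes the structure of tuples and bags in a way that a UDF cannot. An eval UDF returns a single value for each input it is called on, so it cannot add fields to the output tuple or turn one input tuple into several output tuples. FLATTEN does both: it un-nests tuples as well as bags.
  • The idea is the same in both cases, but the operation and the result are different for each type of structure.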
  • For tuples, FLATTEN substitutes the fields of a tuple in place of the tuple. For example, consider a relation that has a tuple of the form (a, (b, c)). The expression GENERATE $0, FLATTEN($1) turns that tuple into (a, b, c).
  • For bags, the situation is more complicated, because un-nesting a bag creates new tuples.
  • If we have a relation made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE FLATTEN($0), we end up with two tuples, (b,c) and (d,e). When the bag sits alongside other fields, removing a level of nesting causes a cross product between the bag's tuples and those fields.
  • For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. If we apply the expression GENERATE $0, FLATTEN($1) to this tuple, we get two new tuples: (a, b, c) and (a, d, e).
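
The behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch, not how Pig implements FLATTEN: it assumes a Pig tuple is represented as a Python tuple and a bag as a Python list of tuples, and the function name generate and its flatten parameter are hypothetical names chosen for this example.

```python
from itertools import product


def generate(row, flatten=()):
    """Model GENERATE over one input tuple.

    `flatten` holds the positions of the fields wrapped in FLATTEN.
    Assumption for this sketch: tuple = Python tuple, bag = list of tuples.
    """
    alternatives = []
    for position, field in enumerate(row):
        if position in flatten and isinstance(field, tuple):
            # Flattened tuple: its fields replace the tuple itself.
            alternatives.append([field])
        elif position in flatten and isinstance(field, list):
            # Flattened bag: each tuple in the bag yields its own output row.
            alternatives.append([tuple(t) for t in field])
        else:
            # Field not flattened: it is kept as a single field.
            alternatives.append([(field,)])
    # The cross product of the alternatives gives the output rows;
    # each combination is concatenated into one flat tuple.
    return [sum(combo, ()) for combo in product(*alternatives)]


# (a, (b, c)) with GENERATE $0, FLATTEN($1)
print(generate(("a", ("b", "c")), flatten={1}))
# ({(b,c),(d,e)}) with GENERATE FLATTEN($0)
print(generate(([("b", "c"), ("d", "e")],), flatten={0}))
# (a, {(b,c),(d,e)}) with GENERATE $0, FLATTEN($1)
print(generate(("a", [("b", "c"), ("d", "e")]), flatten={1}))
```

The function returns a list of rows whose length and width depend on the data. An ordinary function that maps one input to one value cannot do this, which is why FLATTEN has to be an operator rather than a UDF.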
