[Solved - 2 Solutions] How to expand an array with Apache Pig?



What is an array

  • An array is a data structure that contains a group of elements. Typically these elements are all of the same data type, such as integer or string.
  • Arrays are commonly used in computer programs to organize data so that a related set of values can be easily sorted or searched.

Problem:

How to expand an array with Apache Pig?

Solution 1:

  • If this transformation is needed right away, the easiest way is probably to write a UDF in Python or Java.
  • However, most of the time it is better to keep the same number of columns in each record (e.g. keep the array as a bag or tuple and don't "flatten" it within a record).
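
To make the column-count concern concrete, here is a minimal Python sketch of the core logic such a UDF might have. The name `expand_array` and the `width` and `fill` parameters are hypothetical and not part of the original solution. The function pads or truncates each array to a fixed width so that every record ends up with the same number of columns. Registering it with Pig would also need an output schema declaration, which is left out here.

```python
def expand_array(values, width, fill=None):
    """Return a tuple of exactly `width` items taken from `values`.

    Shorter inputs are padded with `fill`; longer inputs are truncated.
    (Hypothetical helper, for illustration only.)
    """
    items = list(values or [])[:width]
    items.extend([fill] * (width - len(items)))
    return tuple(items)


# Two records whose arrays have different lengths.
records = [("a", [1, 2, 3]), ("b", [4])]

# Expand each array into three columns next to the record key.
expanded = [(key,) + expand_array(arr, 3) for key, arr in records]
print(expanded)
```

Padding keeps the schema fixed, which is why it is usually safer than letting each record grow to the length of its own array.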

Solution 2:

  • Check out this Python UDF we wrote for doing that (hopefully soon to be part of Python PiggyBank). We can use bags and then flatten them to get the results we want.
  • For example, assuming the data set is called blah, we should be able to register the function and then apply it.

Using FLATTEN is one way to achieve this:

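-- bagToTuple turns the bag item3 into one tuple; FLATTEN then expands it into columns item4..item9
flattened_blah = FOREACH blah GENERATE item1, item2, FLATTEN(bagToTuple(item3)) AS (item4, item5, item6, item7, item8, item9);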
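
The referenced UDF itself is not reproduced here. As a rough stand-in, the sketch below shows in plain Python what a bag-to-tuple conversion followed by a flatten does to a single record, assuming Pig hands a bag to a Python UDF as a list of tuples. The names `bag_to_tuple` and `flatten_record` are illustrative only and are not the original UDF.

```python
def bag_to_tuple(bag):
    """Concatenate the fields of every tuple in the bag into one flat tuple."""
    flat = []
    for item in bag or []:
        flat.extend(item)
    return tuple(flat)


def flatten_record(record, index):
    """Splice the tuple at position `index` into the record,
    mimicking what FLATTEN does to a tuple-valued field."""
    return record[:index] + tuple(record[index]) + record[index + 1:]


# One record shaped like (item1, item2, item3), where item3 is a bag.
row = ("x", "y", [(1,), (2,), (3,)])

# Step 1: replace the bag with a single tuple.
as_tuple = row[:2] + (bag_to_tuple(row[2]),)

# Step 2: flatten that tuple into top-level columns.
print(flatten_record(as_tuple, 2))
```

Converting the bag to a tuple first matters because flattening a bag directly would produce one output row per element rather than extra columns in the same row.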
