Usage of FOREACH operation in Pig scripts

  • The FOREACH operator applies a set of expressions to every record of a relation and generates the specified columns as output.

Syntax:

grunt> Relation_name2 = FOREACH Relation_name1 GENERATE (required data);

Pig FOREACH can be used in two ways:

  1. Simple FOREACH…GENERATE
  2. Nested FOREACH {…GENERATE }

Simple FOREACH…GENERATE:

  • The simple form fits in a single Pig statement and generates only the columns that we need.
  • Relation_Name = FOREACH <Previous Relation Name> GENERATE column1, column2, …;
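
As a minimal sketch of the simple form, the statements below project two columns and compute a derived one. The file name students.txt, the relation names, and the field names are hypothetical and chosen only for illustration; the input is assumed to be tab-separated, which is what the default loader expects.

```pig
-- Hypothetical input file and schema, for illustration only.
students = LOAD 'students.txt' AS (name:chararray, age:int, gpa:double);

-- Keep name and gpa as they are, and derive a new column from age.
summary = FOREACH students GENERATE name, age + 1 AS age_next_year, gpa;

DUMP summary;
```

Each expression after GENERATE becomes one field of the output record, so summary has three fields per input record.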

Nested FOREACH {…GENERATE }

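  • Inside the nested block, operators such as DISTINCT, FILTER, LIMIT, ORDER and SAMPLE can be applied to an inner relation before GENERATE (the exact set allowed depends on the Pig version).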

Syntax:

Relation_Name = FOREACH Nested_Relation_Name {Inner Relation = Nested_Operation; GENERATE expression [, expression …] };
    • FOREACH takes a set of expressions and applies them to every record in the data pipeline, hence the name FOREACH.
    • From these expressions it generates new records to send down the pipeline to the next operator. For those familiar with database terminology, it is Pig's projection operator.
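
The nested form can be sketched as follows, assuming a hypothetical file visits.txt with the fields user, url and time (all names here are illustrative, not taken from the article). The relation is grouped first, because the nested operators work on the bag that each group holds.

```pig
-- Hypothetical input file and schema, for illustration only.
visits = LOAD 'visits.txt' AS (user:chararray, url:chararray, time:long);

-- Grouping produces one record per user, holding a bag of that user's visits.
by_user = GROUP visits BY user;

result = FOREACH by_user {
    -- Inner relations are computed once per group.
    urls      = visits.url;
    uniq_urls = DISTINCT urls;
    recent    = ORDER visits BY time DESC;
    top3      = LIMIT recent 3;
    GENERATE group AS user, COUNT(uniq_urls) AS n_urls, top3;
};

DUMP result;
```

The statements inside the braces only prepare inner relations; the final GENERATE decides what each output record contains.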

For example, the following code loads an entire record and then keeps only the user and id fields of each record, dropping all the others:

    A = load 'input' as (user:chararray, id:long, address:chararray, phone:chararray, preferences:map[]);
    B = foreach A generate user, id;