[Solved-2 Solutions] Generating an id/counter for FOREACH in Pig Latin?



What is FOREACH?

  • The FOREACH operator iterates over every tuple of a relation and, together with GENERATE, produces a new relation whose fields are computed from the input columns.

Syntax

  • Here is the syntax of the FOREACH operator:
grunt> Relation_name2 = FOREACH Relation_name1 GENERATE (required data);
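As a concrete illustration, the sketch below assumes a hypothetical comma-separated file `students.csv` holding one name and one score per line. FOREACH ... GENERATE keeps the name and derives a new column from the score; the file name and field names are placeholders, not part of the original question.

```pig
-- Hypothetical input file: students.csv, one "name,score" record per line
A = LOAD 'students.csv' USING PigStorage(',') AS (name:chararray, score:int);

-- For every tuple of A, emit the name plus a column computed from score
B = FOREACH A GENERATE name, score * 2 AS double_score;

DUMP B;
```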

Problem:

  • Suppose you want a unique identifier (a line number or counter) to be generated and appended to each record as FOREACH iterates through the records. Is there a way to accomplish this without writing a UDF?

How can 'a_unique_id' in the statement below be generated?

B = FOREACH A GENERATE a_unique_id, field1, ...;

Solution 1:

  • If you are using Pig 0.11 or later, the RANK operator is exactly what you are looking for. It prepends a sequential number to each tuple of a relation, so no UDF is needed.

Here is an example:

DUMP A;
(foo,19)
(foo,19)
(foo,7)
(bar,90)
(etc.,0)

B = RANK A;

DUMP B;
(1,foo,19)
(2,foo,19)
(3,foo,7)
(4,bar,90)
(5,etc.,0)
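To get exactly the `B = FOREACH A GENERATE a_unique_id, field1, ...` shape from the question, you can follow RANK with a FOREACH that renames the prepended rank column. This is a sketch under two assumptions: that A has the hypothetical schema `(field1:chararray, field2:int)`, and that RANK names its new field `rank_A` (the prefix `rank_` followed by the input alias).

```pig
-- Assumed schema for A; adjust the field names and types to your data
A = LOAD 'input' AS (field1:chararray, field2:int);

-- RANK prepends a sequential number; the new field is named rank_A
B = RANK A;

-- Rename the rank column to the identifier the question asks for
C = FOREACH B GENERATE rank_A AS a_unique_id, field1, field2;

DUMP C;
```

Note that plain `RANK A` is what yields a distinct number per tuple. If you write `RANK A BY field2` instead, tuples with equal `field2` values receive the same rank, so the result is no longer a unique id.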

Solution 2:

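  • If you need a true UUID rather than a sequential counter, or you are on a Pig version older than 0.11 where RANK is unavailable, note that there is no built-in UUID function in the main Pig distribution or in Piggybank. In that case, your only option is to write a UDF.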
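Once such a UDF exists, calling it from FOREACH might look like the sketch below. The jar name `myudfs.jar`, the class `com.example.pig.UuidGen` and the schema of A are hypothetical placeholders; the UDF itself would typically be a Java class extending Pig's `EvalFunc` that returns a fresh identifier from its `exec` method.

```pig
-- Hypothetical jar containing your own UDF implementation
REGISTER myudfs.jar;

-- Short alias for the (hypothetical) UDF class
DEFINE UuidGen com.example.pig.UuidGen();

-- Assumed schema for A
A = LOAD 'input' AS (field1:chararray, field2:int);

-- Call the UDF once per record to attach an identifier
B = FOREACH A GENERATE UuidGen() AS a_unique_id, field1, field2;

DUMP B;
```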
