[Solved - 2 Solutions] How to use the Apache Pig RANK function?



What is the RANK function?

  • The RANK operator, available since Pig 0.11.0, returns the input relation with a rank value prepended to every tuple.

Syntax :

ranked = RANK input [BY col [ASC|DESC] [, col [ASC|DESC] ...] [DENSE]];
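
The three common forms of this syntax can be sketched as follows. This is a minimal illustration, not a definitive script: the file name 'ratings.csv' and the alias names are assumptions made up for the example.

```pig
-- Hypothetical input file 'ratings.csv' with lines such as: a,5
A = LOAD 'ratings.csv' USING PigStorage(',') AS (id:chararray, rating:int);

-- No BY clause: a sequential number is prepended to each tuple.
numbered = RANK A;

-- BY clause: tuples that tie on rating share a rank, leaving gaps after ties.
ranked = RANK A BY rating DESC;

-- DENSE: tied tuples still share a rank, but no gaps follow them.
dense_ranked = RANK A BY rating DESC DENSE;

DUMP dense_ranked;
```

DENSE only has an effect together with BY, since ties can only occur when ranking on a field.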

Problem :

How do we use the Apache Pig RANK function?

Solution 1:

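  • We can group the data by id, sort each group by rating, and then use the DataFu UDF Enumerate to append an index to each tuple of the bag. The index, which starts at 1 here, is the rank of the tuple within its group.
register datafu-1.1.0.jar;
-- '1' makes the appended index start at 1 instead of 0
define Enumerate datafu.pig.bags.Enumerate('1');

data = load 'data' using PigStorage(',') as (id:chararray, rating:int);
data = group data by id;
data = foreach data {
  -- highest rating first within each id
  sorted = order data by rating DESC;
  generate group, sorted;
}
-- append the index as the last field of each tuple in the sorted bag
data = foreach data generate flatten(Enumerate(sorted));
dump data;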

Solution 2:

  • We can use the built-in RANK operator directly. Here A is the relation that holds the id and rating fields:
B = rank A by rating DESC;
dump B;
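
To use the rank in later steps, it helps to know that Pig names the new first field rank_ followed by the alias of the ranked relation (rank_A here). The sketch below assumes a hypothetical input file 'ratings.csv' and a made-up output field name, position.

```pig
-- Hypothetical input 'ratings.csv' containing the three lines: a,5  b,5  c,3
A = LOAD 'ratings.csv' USING PigStorage(',') AS (id:chararray, rating:int);

B = RANK A BY rating DESC;

-- The prepended field is called rank_A. Ties on rating share a rank,
-- so ids a and b both get rank 1 and id c gets rank 3.
C = FOREACH B GENERATE rank_A AS position, id, rating;

DUMP C;
```

Unlike Solution 1, this ranks the whole relation at once rather than within each id group.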
