[Solved-1 Solution] Pig - Get Max Count



What is MAX()?

  • The Pig Latin MAX() function calculates the highest value of a column (numeric values or chararrays) in a single-column bag. While calculating the maximum, MAX() ignores NULL values. Pig function names are case-sensitive, so it must be written in upper case.
  • To get the global maximum value, we first perform a GROUP ALL operation and then calculate the maximum using the MAX() function.
  • To get the maximum value within each group, we first group the relation using the GROUP ... BY operator and then apply MAX() to each group.
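The two cases above can be sketched as follows. This is only an illustration: the file name students.txt, its tab-separated layout, and the fields name, age, and gpa are hypothetical and do not come from the weather data used later.

```pig
-- Hypothetical input: a tab-separated file with (name, age, gpa)
students = LOAD 'students.txt' USING PigStorage('\t')
           AS (name:chararray, age:int, gpa:double);

-- Global maximum: put every tuple into a single group, then apply MAX to the gpa bag
all_students = GROUP students ALL;
global_max   = FOREACH all_students GENERATE MAX(students.gpa) AS max_gpa;

-- Per-group maximum: group by a key, then apply MAX inside each group
by_age      = GROUP students BY age;
max_per_age = FOREACH by_age GENERATE group AS age, MAX(students.gpa) AS max_gpa;

DUMP global_max;
DUMP max_per_age;
```

In both cases MAX() is applied to a single-column bag (students.gpa) produced by the GROUP step, which is why the grouping has to come first.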

Syntax

  • Given below is the syntax of the MAX() function.
grunt> MAX(expression)

What is the GROUP operator?

  • The GROUP operator is used to group the data in one or more relations. It collects the data having the same key.

Syntax

  • Given below is the syntax of the group operator.
grunt> Group_data = GROUP Relation_name BY age;
  • To get the maximum count, we can count the rows in each group and then take the largest count.

Problem:

Sample Data

DATE      WindDirection

1/1/2000  SW
1/2/2000  SW
1/3/2000  SW
1/4/2000  NW
1/5/2000  NW

Every date is unique, but the wind direction is not, so we are trying to get the COUNT of the most COMMON wind direction.

Query:

weather_data = FOREACH Weather GENERATE $16 AS Date, $9 AS w_direction;
e = FOREACH weather_data 
            {
                unique_winds = DISTINCT weather_data.w_direction;
                GENERATE unique_winds, COUNT(unique_winds);
            }
dump e;

The logic is to find the DISTINCT wind directions (there are about 7), then group by wind direction and apply COUNT.

However, this query returns the total number of wind directions rather than the count of the most common one.

Solution 1:

We have to GROUP BY wind direction and get the count for each group, then order the counts in descending order and take the topmost row.

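-- Keep only the wind direction column
wd = FOREACH Weather GENERATE $9 AS w_direction;
-- Group rows by direction and count the rows in each group
gwd = GROUP wd BY w_direction;
cwd = FOREACH gwd GENERATE group AS w_direction, COUNT(wd) AS cnt;
-- Sort by count, highest first, and keep the top row
owd = ORDER cwd BY cnt DESC;
mwd = LIMIT owd 1;
DUMP mwd;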
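If only the count itself is needed, and not the direction it belongs to, the GROUP ALL plus MAX() pattern described at the top can be applied to the per-direction counts. The following is a minimal sketch of that alternative, not part of the original solution. It assumes the same Weather relation with the wind direction in field $9, and the alias names (dirs, by_dir, dir_counts, all_counts, max_count) are chosen here for illustration.

```pig
-- Assumes the Weather relation used above, with wind direction in field $9
dirs       = FOREACH Weather GENERATE $9 AS w_direction;
by_dir     = GROUP dirs BY w_direction;
-- One (direction, count) tuple per wind direction
dir_counts = FOREACH by_dir GENERATE group AS w_direction, COUNT(dirs) AS cnt;

-- Collapse all the counts into one group and take the largest
all_counts = GROUP dir_counts ALL;
max_count  = FOREACH all_counts GENERATE MAX(dir_counts.cnt) AS max_cnt;
DUMP max_count;
```

On the five sample rows shown above, SW occurs 3 times and NW 2 times, so the maximum count would be 3. Unlike the ORDER and LIMIT approach, this variant does not report which direction has that count.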
