[Solved-1 Solution] Hadoop Pig count number?



What is COUNT()?

  • The COUNT() function in Pig Latin returns the number of tuples in a bag. When counting, COUNT() skips any tuple whose first field is NULL. Because it operates on a bag, it is normally applied after a GROUP statement: GROUP ... ALL for a global count, or GROUP ... BY for per-group counts.

Syntax

  • The syntax of the COUNT() function is given below, where expression is an expression that evaluates to a bag.
COUNT(expression)
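As a minimal sketch of how the syntax is typically used, the script below counts the rows of a relation with GROUP ... ALL. The file name people.txt, its two fields, and the aliases are hypothetical and chosen only for illustration. COUNT_STAR is shown next to COUNT because it is the built-in that also counts tuples whose first field is NULL.

```pig
-- Hypothetical tab-separated input: name, age. Some names are assumed to be NULL.
people = LOAD 'people.txt' AS (name:chararray, age:int);

-- COUNT works on a bag, so collect every tuple into a single group first.
everyone = GROUP people ALL;

-- COUNT skips tuples whose first field (name) is NULL.
-- COUNT_STAR counts every tuple, including those with a NULL first field.
totals = FOREACH everyone GENERATE
             COUNT(people)      AS non_null_rows,
             COUNT_STAR(people) AS all_rows;

DUMP totals;
```

If the two numbers differ, the difference is the number of tuples with a NULL first field.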

Problem:

What is an effective way to count how often each value (for example, true or false) occurs in a field in Pig?

Solution 1:

  • Two things. Firstly, count should actually be COUNT. Function names in Pig are case-sensitive, and this built-in function is defined in all caps.
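  • Secondly, COUNT returns the number of tuples in a bag; it does not count occurrences of a particular value. Therefore, we should group the records by the true/false field first, then apply COUNT to each group:
-- The alias is named flags because boolean is also the name of a Pig data type.
flags = FOREACH records GENERATE $3 AS trueORfalse ;
groups = GROUP flags BY trueORfalse ;
counts = FOREACH groups GENERATE group AS trueORfalse, COUNT(flags) ;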

Output:

(true,2)
(false,1)
  • If we want the counts of true and false in their own relations, we can FILTER the output of counts. However, it would probably be better to SPLIT the projected relation, then do two separate counts:
-- The alias is named flags because boolean is also the name of a Pig data type.
flags = FOREACH records GENERATE $3 AS trueORfalse ;
SPLIT flags INTO alltrue IF trueORfalse == 'true',
                 allfalse IF trueORfalse == 'false' ;

-- GROUP ... ALL puts every tuple of a relation into one bag so COUNT can run on it.
tcount = FOREACH (GROUP alltrue ALL) GENERATE COUNT(alltrue) ;
fcount = FOREACH (GROUP allfalse ALL) GENERATE COUNT(allfalse) ;
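
For comparison, the FILTER alternative mentioned above could look roughly like the sketch below. The input file records.csv, its schema, and the alias and field names (flags, flag, n) are assumptions made for illustration. The only thing taken from the original is that the fourth field holds the string 'true' or 'false'.

```pig
-- Hypothetical comma-separated input whose fourth field is 'true' or 'false'.
records = LOAD 'records.csv' USING PigStorage(',')
          AS (id:int, name:chararray, city:chararray, flag:chararray);

-- Group by the flag value and count the tuples in each group.
flags  = FOREACH records GENERATE flag AS trueORfalse;
groups = GROUP flags BY trueORfalse;
counts = FOREACH groups GENERATE group AS trueORfalse, COUNT(flags) AS n;

-- Pull each count into its own relation by filtering the grouped result.
tcount = FILTER counts BY trueORfalse == 'true';
fcount = FILTER counts BY trueORfalse == 'false';

DUMP tcount;
DUMP fcount;
```

In this variant each result tuple still carries the true/false label next to its count, whereas the SPLIT version produces relations that contain only the count.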
