[Solved-3 Solutions] Could not infer COUNT function in pig ?



What is Count() ?

  • The COUNT() function of Pig Latin is used to get the number of elements in a bag. While counting the number of tuples in a bag, the COUNT() function ignores (will not count) the tuples having a NULL value in the FIRST FIELD
  • We trying to write a pig latin script to pull the count of a dataset that we have filtered.

Syntax

  • Given below is the syntax of the COUNT() function
grunt> COUNT(expression)

Problem :

We are trying to write a pig latin script to pull the count of a dataset that we have filtered.

Here is the Script:

/* scans by title */

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
scancount       = FOREACH productscans GENERATE COUNT($0);
DUMP scancount;
  • For some reason, we get the error:

Could not infer the matching function for org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an explicit cast.

Solution 1:

  • Group by all to bring everything into one bag, then count the item
scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
grouped         = GROUP productscans ALL;
count           = FOREACH grouped GENERATE COUNT(productscans);
dump count;
  • COUNT requires a preceding GROUP ALL statement for global counts and a GROUP BY statement for group counts.

Solution 2:

  • We can use this below code to count function in pig
/* scans by title */

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
scancount       = FOREACH productscans GENERATE COUNT(productscans);
DUMP scancount;

Solution 3:

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
grouped         = GROUP productscans ALL;
count           = FOREACH grouped GENERATE COUNT(productscans);
DUMP scancount;

Alternative way to solve:

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
grouped         = GROUP productscans ALL;
count           = FOREACH grouped GENERATE COUNT($1);
DUMP scancount;

Related Searches to Could not infer COUNT function in pig