[Solved-1 Solution] Pig programming: how to use SPLIT on GROUP BY with HAVING COUNT(*)?



What is GROUP BY?

  • The GROUP operator (written GROUP ... BY) groups the data in one or more relations. It collects together the tuples that have the same key.
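The following plain-Python model shows what GROUP A BY item produces. It is not Pig code. The sample rows and the helper group_by are hypothetical, and Python's dictionary order stands in for Pig's unspecified output order. Each output tuple pairs a key (Pig's group field) with a bag of all the input tuples that share that key.

```python
from collections import defaultdict

# Hypothetical sample relation A: (item_sl, item, type, manufacturer, price)
A = [
    (1, "pen", "stationery", "acme", 10),
    (2, "pen", "stationery", "zeta", 12),
    (3, "ink", "stationery", "acme", 5),
]

def group_by(relation, key_index):
    """Model of Pig's GROUP A BY field: one (group, bag) tuple per distinct key."""
    bags = defaultdict(list)
    for t in relation:
        bags[t[key_index]].append(t)
    # Each key is emitted once, together with the bag of every tuple that has it
    return [(key, bag) for key, bag in bags.items()]

B = group_by(A, 1)  # group on the "item" field
for group, bag in B:
    print(group, len(bag))  # prints "pen 2", then "ink 1"
```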

What is COUNT?

  • The COUNT() function in Pig Latin returns the number of elements in a bag. When counting the tuples in a bag, COUNT() ignores (does not count) any tuple whose FIRST field is NULL.
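The NULL rule can be modelled in plain Python as shown below. The helpers pig_count and pig_count_star are hypothetical stand-ins for Pig's COUNT and its built-in COUNT_STAR, which, unlike COUNT, also counts tuples whose first field is NULL. The sample bag is made up. A NULL in any field other than the first does not stop a tuple from being counted.

```python
def pig_count(bag):
    """Model of Pig COUNT: skip tuples whose FIRST field is None (NULL)."""
    return sum(1 for t in bag if len(t) > 0 and t[0] is not None)

def pig_count_star(bag):
    """Model of Pig COUNT_STAR: count every tuple, NULLs included."""
    return len(bag)

# Hypothetical bag: NULL in the first field, NULL in the second field, no NULLs
bag = [(None, "pen"), (2, None), (3, "ink")]
print(pig_count(bag))       # 2  (only the first tuple is skipped)
print(pig_count_star(bag))  # 3
```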

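Problem:

How can SPLIT be used together with the GROUP BY operator in Pig to keep only the groups whose count exceeds a threshold, like SQL's GROUP BY ... HAVING COUNT(*)?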

Solution 1:

We can group by item, get the count of each group, filter on that count, and then join the surviving items back to the original relation to recover the full rows:

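-- Load the tab-separated input (replace 'location_of_file' with the real path)
A = LOAD 'location_of_file' USING PigStorage('\t') AS (item_sl:int, item:chararray, type:chararray, manufacturer:chararray, price:int);
-- Group the rows by item and count each group (the HAVING COUNT(*) part)
B = GROUP A BY item;
C = FOREACH B GENERATE group, COUNT(A.item) AS Total;
-- Keep only the items that occur more than 3 times
D = FILTER C BY Total > 3;
-- Join back to A on the item name, then keep only A's five fields
E = JOIN A BY item, D BY $0;
F = FOREACH E GENERATE $0..$4;
DUMP F;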
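The script can be traced step by step with the plain-Python model below. It is a sketch under stated assumptions, not Pig itself. The sample rows and the helper names group_counts, split_on_total, and join_back are hypothetical, and Pig does not guarantee the row order that this model preserves. Because the question asks about SPLIT, the model also shows the partition that Pig's SPLIT operator would produce if a statement like SPLIT C INTO D IF Total > 3, R IF Total <= 3; replaced the FILTER. D is the same relation either way.

```python
from collections import defaultdict

# Hypothetical sample rows for A: (item_sl, item, type, manufacturer, price)
A = [
    (1, "pen", "stationery", "acme", 10),
    (2, "pen", "stationery", "zeta", 12),
    (3, "pen", "stationery", "acme", 11),
    (4, "pen", "stationery", "kilo", 9),
    (5, "ink", "stationery", "acme", 5),
    (6, "ink", "stationery", "zeta", 6),
]

def group_counts(rows):
    """Models B = GROUP A BY item; C = FOREACH B GENERATE group, COUNT(A.item) AS Total;"""
    bags = defaultdict(list)
    for row in rows:
        bags[row[1]].append(row)
    # COUNT skips tuples whose item is NULL, so a NULL group always gets Total = 0
    return [(item, sum(1 for r in bag if r[1] is not None)) for item, bag in bags.items()]

def split_on_total(C, threshold):
    """Models SPLIT C INTO D IF Total > threshold, R IF Total <= threshold;"""
    D = [(item, total) for item, total in C if total > threshold]
    R = [(item, total) for item, total in C if total <= threshold]
    return D, R

def join_back(rows, D):
    """Models E = JOIN A BY item, D BY $0; F = FOREACH E GENERATE $0..$4;"""
    keep = {item for item, _ in D if item is not None}  # inner join: NULL keys never match
    return [row for row in rows if row[1] in keep]

C = group_counts(A)
D, R = split_on_total(C, 3)
F = join_back(A, D)
print(D)       # [('pen', 4)]
print(R)       # [('ink', 2)]
print(len(F))  # 4
```

With the threshold at 3, only pen survives, so F holds the four pen rows unchanged and ink ends up in R.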
