Apache Pig Tutorial - Apache Pig COUNT_STAR() Function



What is the COUNT_STAR() function in Apache Pig?

  • The COUNT_STAR() function in Apache Pig is similar to the COUNT() function.
  • Unlike COUNT(), the COUNT_STAR() function includes NULL values when counting the elements.
  • COUNT_STAR() computes the number of elements in a bag.
  • COUNT_STAR() requires a preceding GROUP ALL statement for global counts or a GROUP BY statement for group counts.
  • COUNT() skips a tuple whose first field is NULL, whereas COUNT_STAR() counts every tuple in the bag.
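
The difference between the two functions can be sketched as follows. This is a minimal illustration, not part of the original example: the file name values.txt and the relation names A, B and C are hypothetical, and the file is assumed to hold one value per line with a few empty lines, which PigStorage loads as NULL.

```pig
-- Hypothetical input file: one value per line, some lines left empty.
A = LOAD 'values.txt' AS (v:chararray);

-- GROUP ALL collects every tuple of A into a single bag.
B = GROUP A ALL;

-- COUNT skips tuples whose first field is NULL; COUNT_STAR counts them too.
C = FOREACH B GENERATE COUNT(A) AS non_null, COUNT_STAR(A) AS all_rows;

DUMP C;
```

Under these assumptions, all_rows should exceed non_null by the number of empty lines in the file.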

Syntax

COUNT_STAR(expression)

Example

wikitechy_employee_details.txt

001,Anushka,Reddy,21,9848022337,Hyderabad,89 
002,Arvin,Battacharya,22,9848022338,Kolkata,78 
003,Arun,Khanna,22,9848022339,Delhi,90 
004,Preethi,Agarwal,21,9848022330,Pune,93 
005,Sruti,Mohanthy,23,9848022336,Bhuwaneshwar,75 
006,Amit,Mishra,23,9848022335,Chennai,87 
007,Komala,Nayak,24,9848022334,trivendram,83 
008,Bharath,Nambiayar,24,9848022333,Chennai,72
  • Load the file into Pig as the relation wikitechy_employee_details, as shown below:
grunt> wikitechy_employee_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_details.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray, gpa:int);

Calculating the Number of Tuples

  • Group the relation wikitechy_employee_details with the GROUP ALL operator and store the result in the relation employee_group_all:
grunt> employee_group_all = GROUP wikitechy_employee_details ALL;
  • Dumping employee_group_all shows the grouped relation:
grunt> DUMP employee_group_all;

(all,{(8,Bharath,Nambiayar,24,9848022333,Chennai,72),
(7,Komala,Nayak,24,9848022334,trivendram,83),
(6,Amit,Mishra,23,9848022335,Chennai,87),
(5,Sruti,Mohanthy,23,9848022336,Bhuwaneshwar,75),
(4,Preethi,Agarwal,21,9848022330,Pune,93),
(3,Arun,Khanna,22,9848022339,Delhi,90),
(2,Arvin,Battacharya,22,9848022338,Kolkata,78),
(1,Anushka,Reddy,21,9848022337,Hyderabad,89)})
  • Calculate the number of tuples (records) in the relation:
grunt> employee_count = FOREACH employee_group_all GENERATE COUNT_STAR(wikitechy_employee_details.gpa);

Verification

grunt> Dump employee_count;

Output

(8)
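
For group counts, the same relation can be grouped by a field instead of GROUP ALL. The following is a sketch under the assumption that wikitechy_employee_details has been loaded as shown earlier; the relation names employee_group_age and employee_count_by_age and the alias total are introduced here only for illustration.

```pig
-- Group the employees by age; each group holds a bag of matching tuples.
employee_group_age = GROUP wikitechy_employee_details BY age;

-- For each age, count every tuple in the group's bag, NULLs included.
employee_count_by_age = FOREACH employee_group_age
    GENERATE group AS age, COUNT_STAR(wikitechy_employee_details) AS total;

DUMP employee_count_by_age;
```

With the sample file above, each of the ages 21, 22, 23 and 24 appears in two records, so every group should report a count of 2.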
