Apache Pig COUNT() Function



What is the COUNT() Function in Apache Pig?

  • COUNT() is a built-in Apache Pig function that returns the number of elements in a bag.
  • COUNT() ignores tuples whose first field is NULL when counting the tuples in a bag. To include those tuples, use COUNT_STAR().
  • COUNT() requires a preceding GROUP ALL statement for a global count, or a GROUP BY statement for per-group counts.
  • COUNT() takes a bag as its argument and returns the count as a long.
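
The NULL handling described above can be sketched with a small example. The file null_demo.txt and the relation names below are hypothetical and not part of this tutorial's data. The file is assumed to hold three comma-separated rows, the second of which has an empty (NULL) id.

```pig
-- Hypothetical input file null_demo.txt, assumed contents:
--   1,a
--   ,b
--   3,c
null_demo = LOAD 'null_demo.txt' USING PigStorage(',') AS (id:int, name:chararray);

-- GROUP ALL collects every tuple into a single bag named null_demo.
null_demo_all = GROUP null_demo ALL;

-- COUNT skips tuples whose first field (id) is NULL; COUNT_STAR counts every tuple.
null_demo_counts = FOREACH null_demo_all GENERATE
    COUNT(null_demo)      AS without_null_ids,
    COUNT_STAR(null_demo) AS all_rows;

DUMP null_demo_counts;
```

Under these assumptions the first count should be 2 and the second 3, because only COUNT() skips the row with the empty id.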

Pig Operations - Aggregation

  • Built-in functions, e.g. AVG, COUNT, COUNT_STAR, MAX, MIN, SUM
  • Custom user-defined functions (UDFs) can also be implemented
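
As a minimal sketch of the built-in aggregation functions listed above, the statements below compute several aggregates over one column. They assume the employee file and schema used in the example on this page. The relation name employee_stats and the field aliases are chosen here only for illustration.

```pig
-- Load the sample file with the schema used in this tutorial's example.
wikitechy_employee_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, age:int,
        phone:chararray, city:chararray, gpa:int);

-- Aggregates operate on bags, so gather all tuples into one bag first.
employee_group_all = GROUP wikitechy_employee_details ALL;

-- Apply several built-in aggregates to the gpa column of that bag.
employee_stats = FOREACH employee_group_all GENERATE
    COUNT(wikitechy_employee_details.gpa) AS gpa_count,
    MIN(wikitechy_employee_details.gpa)   AS gpa_min,
    MAX(wikitechy_employee_details.gpa)   AS gpa_max,
    AVG(wikitechy_employee_details.gpa)   AS gpa_avg,
    SUM(wikitechy_employee_details.gpa)   AS gpa_sum;

DUMP employee_stats;
```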

    Syntax

    COUNT(expression)

    • Here expression must evaluate to a bag, typically the bag produced by a GROUP statement.
    

    Example

    wikitechy_employee_details.txt

    001,Hansika,Reddy,21,9848022337,Hyderabad,89
    002,Aysha,Battacharya,22,9848022338,Kolkata,78 
    003,Swaminathan,Khanna,22,9848022339,Delhi,90 
    004,Preethi,Agarwal,21,9848022330,Pune,93 
    005,Sruti,Mohanthy,23,9848022336,Bhuwaneshwar,75 
    006,Karishma,Mishra,23,9848022335,Chennai,87 
    007,Kamala,Nayak,24,9848022334,trivendram,83 
    008,Krish,Nambiayar,24,9848022333,Chennai,72
    
    • The file is loaded into Pig as the relation wikitechy_employee_details, as shown below:
    grunt> wikitechy_employee_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_details.txt' USING PigStorage(',')
       as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray, gpa:int);
    

    Calculating the Number of Tuples

    • The built-in COUNT() function can be used to calculate the number of tuples in a relation.
    • First, group the relation wikitechy_employee_details with the GROUP ALL operator and store the result in the relation employee_group_all, as shown below:
    grunt> employee_group_all = Group wikitechy_employee_details All;
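
    • To see what GROUP ALL produced before counting, the grouped relation can be inspected. A small sketch using the standard DESCRIBE and DUMP operators, entered at the grunt> prompt:

```pig
-- Print the schema: a field named group plus a bag named after the grouped relation.
DESCRIBE employee_group_all;

-- Print the single tuple: the key 'all' followed by a bag holding every employee tuple.
DUMP employee_group_all;
```

    • That bag is the value COUNT() receives in the counting step.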
    
    • This produces a relation with a single tuple whose bag holds every tuple of wikitechy_employee_details.
    • The number of tuples in that bag can then be calculated as shown below:
    grunt> employee_count = foreach employee_group_all  Generate COUNT(wikitechy_employee_details.gpa);
    

    Verification

    grunt> Dump employee_count;
    

    Output

    (8)
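
    • COUNT() can also be applied per group instead of to the whole relation. The sketch below assumes the relation wikitechy_employee_details from the steps above. The names employee_group_age and employee_count_by_age are introduced here only for illustration.

```pig
-- One group per distinct age; each group carries a bag of that age's tuples.
employee_group_age = GROUP wikitechy_employee_details BY age;

-- Emit the age (the group key) and the number of tuples in its bag.
employee_count_by_age = FOREACH employee_group_age GENERATE
    group AS age,
    COUNT(wikitechy_employee_details) AS employees;

DUMP employee_count_by_age;
```

    • In the sample file each of the ages 21, 22, 23 and 24 appears twice, so every group should report a count of 2.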
    
