Apache Pig SUM() Function

What is the SUM() Function in Apache Pig?

  • The SUM() function in Apache Pig returns the total of the numeric values in a single-column bag.
  • SUM() ignores NULL values while computing the total.
  • SUM() requires a preceding GROUP ALL statement for global sums and a GROUP BY statement for group sums.
  • SUM() adds up all the numbers in the given column; its argument is an expression that evaluates to a bag of numeric values, such as relation.field after grouping.


Syntax

grunt> SUM(expression)



  • Assume the file has been loaded into Pig as the relation employee_data, as shown below:
grunt> employee_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee.txt' USING PigStorage(',')
   as (id:int, name:chararray, workdate:chararray, daily_typing_pages:int);
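Because SUM() skips NULL values, it can be worth checking how many rows ended up with a NULL in daily_typing_pages after loading (for example, when a field is empty or cannot be read as an int). The following is a minimal sketch, not part of the original walkthrough; the relation names null_pages, null_group and null_count are hypothetical:

```pig
-- Keep only the rows whose daily_typing_pages is NULL
grunt> null_pages = Filter employee_data by daily_typing_pages is null;
-- Put those rows into a single group so they can be counted
grunt> null_group = Group null_pages all;
-- COUNT_STAR counts every tuple in the bag, including ones with NULL fields
grunt> null_count = foreach null_group Generate COUNT_STAR(null_pages);
grunt> Dump null_count;
```

If no row has a NULL value, the filtered relation is empty and the Dump prints nothing.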

Calculating the Sum of All Daily Typing Pages

  • Group the relation employee_data with the Group All operator and store the result in the relation employee_group, as shown below:
grunt> employee_group = Group employee_data all;
  • This produces the relation used to calculate the sum of all daily typing pages, which can be inspected as shown below:
grunt> Dump employee_group;
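To see why SUM() is later applied to employee_data.daily_typing_pages, it can help to print the schema of the grouped relation. This is a small sketch using Pig's Describe operator; the exact schema text may vary by Pig version, but it should show a group field and a bag named employee_data that holds the original tuples:

```pig
-- Print the schema of the grouped relation
grunt> Describe employee_group;
```

Projecting employee_data.daily_typing_pages from that bag yields the single-column bag that SUM() expects.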
  • Now calculate the global sum of the pages typed daily:
grunt> employee_workpages_sum = foreach employee_group Generate (employee_data.name, employee_data.daily_typing_pages), SUM(employee_data.daily_typing_pages);


grunt> Dump employee_workpages_sum;


(({(Sarah),(Sarah),(Phil),(John),(John),(Ramesh),(Joseph)},
{(350),(300),(220),(100),(170),(220),(250)}),1610)
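The walkthrough above covers only the global sum. For a group sum, the same SUM() call can follow a Group ... by statement instead of Group All. The following is a minimal sketch, assuming the same employee_data relation; the names employee_by_name, pages_per_employee and total_pages are hypothetical and do not come from the original example:

```pig
-- Group rows by employee name instead of placing all rows in one group
grunt> employee_by_name = Group employee_data by name;
-- 'group' holds the name; SUM() runs over each name's bag of page counts
grunt> pages_per_employee = foreach employee_by_name Generate group as name, SUM(employee_data.daily_typing_pages) as total_pages;
grunt> Dump pages_per_employee;
```

Each output tuple should contain one employee name and the total pages typed by that employee.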
