Apache Pig SUM() Function



What is the SUM() Function in Apache Pig?

  • The SUM() function in Apache Pig computes the total of the numeric values in a single-column bag.
  • SUM() ignores NULL values while computing the total.
  • SUM() requires a preceding GROUP ALL statement for global sums and a GROUP BY statement for group sums.
  • SUM() operates on a bag of numeric values (for example int, long, float, or double); unlike a spreadsheet SUM, it does not take individual values, cell references, or ranges.

Syntax

grunt> SUM(expression)
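
In practice SUM() is not issued on its own at the prompt; it appears inside a FOREACH ... GENERATE that follows a GROUP. A minimal sketch of that pattern, assuming a hypothetical relation named records with a numeric field named amount (neither name comes from this tutorial):

grunt> -- records and amount are hypothetical names used only for illustration
grunt> grouped = GROUP records ALL;
grunt> total = FOREACH grouped GENERATE SUM(records.amount);

The expression passed to SUM() is the bag projection records.amount, which is the single-column bag described above.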

Example

wikitechy_employee.txt

1,Joseph,2007-01-24,250  
2,Ramesh,2007-05-27,220  
3,John,2007-05-06,170  
3,John,2007-04-06,100 
4,Phil,2007-04-06,220 
5,Sarah,2007-06-06,300
5,Sarah,2007-02-06,350
  • The file is loaded into Pig as the relation employee_data:
grunt> employee_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee.txt' USING PigStorage(',')
   AS (id:int, name:chararray, workdate:chararray, daily_typing_pages:int);

Calculating the Sum of All Daily Typing Pages

  • Group the relation employee_data with the GROUP ALL operator and store the result in the relation employee_group:
grunt> employee_group = GROUP employee_data ALL;
  • Dumping employee_group shows a single group, keyed by all, whose bag holds every record:
grunt> Dump employee_group;
(all,{(5,Sarah,2007-02-06,350),
(5,Sarah,2007-06-06,300),
(4,Phil,2007-04-06,220),
(3,John,2007-04-06,100),
(3,John,2007-05-06,170),
(2,Ramesh,2007-05-27,220),
(1,Joseph,2007-01-24,250)})
  • Now calculate the global sum of the pages typed daily:
grunt> employee_workpages_sum = FOREACH employee_group GENERATE (employee_data.name, employee_data.daily_typing_pages), SUM(employee_data.daily_typing_pages);

Verification

grunt> Dump employee_workpages_sum;

Output

(({(Sarah),(Sarah),(Phil),(John),(John),(Ramesh),(Joseph)},
{(350),(300),(220),(100),(170),(220),(250)}),1610)
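
The GROUP BY form mentioned at the top of this page can be sketched on the same data. The relation names employee_by_name and pages_per_employee and the alias total_pages are illustrative choices, not part of the original example:

grunt> -- one group per employee name (relation names here are illustrative)
grunt> employee_by_name = GROUP employee_data BY name;
grunt> -- total daily_typing_pages within each group
grunt> pages_per_employee = FOREACH employee_by_name GENERATE group AS name, SUM(employee_data.daily_typing_pages) AS total_pages;
grunt> Dump pages_per_employee;

With the seven records above, this should give one tuple per name (the order of the tuples may vary):

(John,270)
(Joseph,250)
(Phil,220)
(Ramesh,220)
(Sarah,650)

The per-name totals add up to the global sum of 1610 computed with GROUP ALL.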
