What are Eval Functions in Apache Pig?

  • Eval functions extend the Java class org.apache.pig.EvalFunc, which is parameterized with the return type of the UDF (User Defined Function). In the example below the return type is a Java String.
  • An eval function takes one record and returns one result. It is invoked for every record that passes through the execution pipeline.
  • The input to the function is a tuple containing the fields that the Pig script passes to the UDF.
  • The tuple holds the input parameters in the order in which they are passed to the function in the Pig script.
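
The "one record in, one result out" model described above can be sketched in plain Java. This is only an illustration under stated assumptions: EvalSketch and EvalPipelineSketch are hypothetical names, a List<Object> stands in for a Pig tuple, and nothing here uses the real Pig classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an eval function: one input record in, one result out.
// This is NOT org.apache.pig.EvalFunc; a List<Object> plays the role of a Pig tuple.
interface EvalSketch<T> {
    T exec(List<Object> input);
}

class EvalPipelineSketch {
    // Calls the function once per record, mimicking how an eval function is
    // invoked for every record that flows through the pipeline.
    static <T> List<T> applyToEach(List<List<Object>> records, EvalSketch<T> fn) {
        List<T> results = new ArrayList<>();
        for (List<Object> record : records) {
            results.add(fn.exec(record));
        }
        return results;
    }

    public static void main(String[] args) {
        List<List<Object>> records = Arrays.asList(
            Arrays.<Object>asList("alice", 30),
            Arrays.<Object>asList("bob", 25));
        // The function reads field 0 of each record and upper-cases it.
        EvalSketch<String> upper = in -> ((String) in.get(0)).toUpperCase();
        System.out.println(applyToEach(records, upper)); // prints [ALICE, BOB]
    }
}
```

Each record produces exactly one result, so the output list has the same length as the input list.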
The example below shows an eval function that converts a string to upper case.


package myudfs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Eval UDF that converts its first input field to upper case.
public class UPPER extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        // Return null for a missing or empty input tuple.
        if (input == null || input.size() == 0)
            return null;
        try {
            String str = (String) input.get(0);
            return str.toUpperCase();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}

The table below lists the built-in eval functions and their descriptions.

Function | Syntax | Description
AVG | AVG(expression) | Computes the average of the numeric values in a single-column bag.
CONCAT | CONCAT(expression, expression) | Concatenates two expressions of identical type.
COUNT | COUNT(expression) | Computes the number of elements in a bag; it ignores nulls.
COUNT_STAR | COUNT_STAR(expression) | Computes the number of elements in a bag; it includes nulls.
DIFF | DIFF(expression, expression) | Compares two fields in a tuple; any tuples that are in one bag but not the other are returned in a bag.
IsEmpty | IsEmpty(expression) | Checks if a bag or map is empty.
MAX | MAX(expression) | Computes the maximum of the numeric values or chararrays in a single-column bag.
MIN | MIN(expression) | Computes the minimum of the numeric values or chararrays in a single-column bag.
SIZE | SIZE(expression) | Computes the number of elements based on any Pig data type. SIZE includes null values in the size computation.
SUM | SUM(expression) | Computes the sum of the numeric values in a single-column bag.
TOKENIZE | TOKENIZE(expression [, 'field_delimiter']) | Splits a string and outputs a bag of words.
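
The difference between COUNT and COUNT_STAR in the table comes down to how null elements are treated. A minimal plain-Java sketch of that difference follows. It is not Pig's implementation: CountSketch and its methods are hypothetical helpers, and a List<Integer> stands in for a single-column bag.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the null handling described in the table.
// A List<Integer> stands in for a single-column bag; these helpers are
// hypothetical and are not Pig's implementation.
class CountSketch {
    // Like COUNT: null elements are skipped.
    static long count(List<Integer> bag) {
        long n = 0;
        for (Integer v : bag) {
            if (v != null) {
                n++;
            }
        }
        return n;
    }

    // Like COUNT_STAR: every element is counted, null or not.
    static long countStar(List<Integer> bag) {
        return bag.size();
    }

    public static void main(String[] args) {
        List<Integer> bag = Arrays.asList(4, null, 7, null, 1);
        System.out.println(count(bag));     // prints 3
        System.out.println(countStar(bag)); // prints 5
    }
}
```

On a bag without nulls the two helpers return the same number, which matches the table: the functions differ only in whether nulls are counted.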

