[Solved-2 Solutions] Pig: Hadoop jobs fail

What is Hadoop?

Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment.

We have a Pig script that queries data from a CSV file.

The script has been tested locally with small and large .csv files.

On a small cluster, the script starts processing and then fails after completing about 40% of the job.

The error is:

Failed to read data from "path to file"

A general answer is to change the error levels in the configuration files by adding the two lines below. They use log4j properties syntax, so they belong in a log4j.properties file rather than in mapred-site.xml, which is an XML file:

log4j.logger.org.apache.hadoop=error,A
log4j.logger.org.apache.pig=error,A
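The two logger lines refer to an appender named A, which has to be defined somewhere in the same log4j configuration. The following is a minimal sketch, assuming log4j 1.x properties syntax and Pig's -log4jconf command-line option; the file name pig-log4j.properties and the script name myscript.pig are placeholders, not names from the original problem.

```shell
#!/bin/sh
# Hypothetical file name for the log4j configuration.
LOG4J_CONF="./pig-log4j.properties"

# Write a minimal log4j 1.x configuration.
cat > "$LOG4J_CONF" <<'EOF'
# Only report errors from Hadoop and Pig, sending them to appender A.
log4j.logger.org.apache.hadoop=error,A
log4j.logger.org.apache.pig=error,A

# A is a console appender with a simple pattern layout.
log4j.appender.A=org.apache.log4j.ConsoleAppender
log4j.appender.A.layout=org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
EOF

# Run the script with this configuration if Pig and the script are present.
# myscript.pig is a placeholder for the real script.
if command -v pig >/dev/null 2>&1 && [ -f myscript.pig ]; then
    pig -log4jconf "$LOG4J_CONF" myscript.pig
else
    echo "pig or myscript.pig not found; wrote $LOG4J_CONF only"
fi
```

Raising the level to error reduces log output; it does not by itself remove the cause of the failure.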

The failure can also be a kind of OutOfMemory exception.

To change the memory available to Hadoop, edit the hadoop-env.sh file:

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"
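The line above shows a 128 MB client heap, so raising the limit means changing the -Xmx value. A minimal sketch follows; the 1024 MB figure is only an illustrative assumption and should be sized to the machine that runs the client commands.

```shell
# Assumed value: 1024 MB is an illustration, not a recommendation.
CLIENT_HEAP_MB=1024

# Prepend the heap flag while preserving any options that are already set.
export HADOOP_CLIENT_OPTS="-Xmx${CLIENT_HEAP_MB}m ${HADOOP_CLIENT_OPTS:-}"

echo "HADOOP_CLIENT_OPTS=${HADOOP_CLIENT_OPTS}"
```

Because the existing options are appended after the new flag, an -Xmx value already present in HADOOP_CLIENT_OPTS would typically take precedence, so it is worth checking the printed result.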

For Apache Pig, the header of the pig bash script documents this variable:

# PIG_HEAPSIZE The maximum amount of heap to use, in MB.
# Default is 1000.

So we can raise the heap with export. The value is a plain number of megabytes, without a unit suffix:

$ export PIG_HEAPSIZE=4096
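Since the header comment describes PIG_HEAPSIZE as an amount in MB, a value with a unit suffix is likely to produce an invalid JVM option. The sketch below guards against that; set_pig_heap is a hypothetical helper written for illustration and is not part of Pig.

```shell
# Hypothetical helper: accept a heap size in whole megabytes and export it.
set_pig_heap() {
    case "$1" in
        ''|*[!0-9]*)
            # Empty input or anything other than digits is rejected.
            echo "PIG_HEAPSIZE must be a plain number of MB, got: '$1'" >&2
            return 1
            ;;
    esac
    export PIG_HEAPSIZE="$1"
}

# A plain number is accepted and exported.
set_pig_heap 4096 && echo "PIG_HEAPSIZE=${PIG_HEAPSIZE}"

# A value with a unit suffix is rejected and leaves the variable unchanged.
set_pig_heap 4096MB || echo "rejected 4096MB"
```

After the variable is exported, running pig from the same shell picks up the new heap size.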
