Fine-tuning Pig for local execution?
Pig has two execution modes or exectypes:
- Local Mode - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local). Note that local mode does not support parallel mapper execution with Hadoop 0.20.x and 1.0.0. This is because the LocalJobRunner of these Hadoop versions is not thread-safe.
- Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode; you can, but don't need to, specify it using the -x flag (pig OR pig -x mapreduce).
Are there any recommendations for fine-tuning Pig for local execution?
Pig's documentation makes it clear that local mode is intended to run single-threaded, and that it takes different code paths for certain functions that would otherwise use a distributed sort. As a result, optimizing Pig's local mode seems like the wrong solution to the problem presented.
Have you considered running a local, "pseudo-distributed" cluster instead of investing in a full cluster setup?
You can follow Hadoop's instructions for pseudo-distributed operation, then point Pig at localhost. This gives the desired result, at the expense of a two-step startup and teardown.
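The startup and teardown around a Pig run could be wrapped roughly as follows. This is only a sketch under several assumptions: Hadoop 0.20.x/1.x is already configured for pseudo-distributed operation (with HDFS formatted), its `start-all.sh` and `stop-all.sh` scripts live under `$HADOOP_HOME/bin`, and Pig finds the cluster configuration through `PIG_CLASSPATH`. The function names, the fallback install path, and the `DRY_RUN` switch are hypothetical helpers, not part of Pig or Hadoop.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pig_on_pseudo_cluster SCRIPT
# Start the local Hadoop daemons, run SCRIPT in mapreduce mode, then stop the daemons.
pig_on_pseudo_cluster() {
    script="$1"
    hadoop_home="${HADOOP_HOME:-/usr/local/hadoop}"   # assumed install location

    # Assumption: Pig reads the cluster settings from the Hadoop conf directory.
    export PIG_CLASSPATH="$hadoop_home/conf"

    run "$hadoop_home/bin/start-all.sh" || return 1
    run pig -x mapreduce "$script"
    status=$?
    # Stop the daemons even if the Pig run failed, then report Pig's exit status.
    run "$hadoop_home/bin/stop-all.sh"
    return "$status"
}
```

With the functions loaded, `DRY_RUN=1 pig_on_pseudo_cluster wordcount.pig` prints the three commands without running them, which is a cheap way to check the paths before starting any daemons (`wordcount.pig` is a placeholder script name).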
You'll also want to raise the default number of mappers and reducers so that they use all the cores available on your machine, by setting
mapred.tasktracker.reduce.tasks.maximum in your local copy of
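
As a rough illustration of the slot settings, the shell function below prints `<property>` entries sized to the machine's core count. It is a sketch, not the answer's exact configuration: `slot_properties` is a hypothetical helper, `mapred.tasktracker.map.tasks.maximum` is the Hadoop 0.20.x/1.x mapper counterpart of the reducer property named above, and giving both the full core count is an assumption you may want to adjust.

```shell
#!/bin/sh
# slot_properties [CORES]
# Print Hadoop 0.20.x/1.x task-slot properties as XML <property> elements.
# CORES defaults to the number of online CPUs reported by getconf.
slot_properties() {
    cores="${1:-$(getconf _NPROCESSORS_ONLN)}"
    for kind in map reduce; do
        printf '<property>\n'
        printf '  <name>mapred.tasktracker.%s.tasks.maximum</name>\n' "$kind"
        printf '  <value>%s</value>\n' "$cores"
        printf '</property>\n'
    done
}
```

Calling `slot_properties` with no argument uses the detected core count. The printed elements are meant to be pasted inside the `<configuration>` element of your Hadoop configuration, and the daemons need a restart before the new limits take effect.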