[Solved-1 Solution] Configuring Pig with Hadoop



Problem :

How do we configure Pig to work with Hadoop?

Solution:

Follow the steps below to configure Pig:

  • 1. Extract the tar file
> tar -xzf pig-x.y.z.tar.gz
  • 2. Add the Pig bin directory to the PATH variable
> export PIG_INSTALL=/home/pig/pig-x.y.z
> export PATH=$PATH:$PIG_INSTALL/bin
  • 3. We also need to set the JAVA_HOME variable so that it points to the Java installation directory.
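
The environment setup from the steps above can be sketched in one place as follows. The JAVA_HOME path is an assumption (it differs between systems), and /home/pig/pig-x.y.z is the placeholder install path used above; adjust both to match your machine.

```shell
# Assumed Java location; keeps an existing JAVA_HOME if one is already set.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"

# Placeholder install path from the steps above.
export PIG_INSTALL=/home/pig/pig-x.y.z
export PATH="$PATH:$PIG_INSTALL/bin"

# Report whether the pig launcher is now reachable on PATH.
if command -v pig >/dev/null 2>&1; then
    pig_status="found"
else
    pig_status="missing"
fi
echo "JAVA_HOME=$JAVA_HOME pig=$pig_status"
```

Placing these export lines in a shell startup file such as ~/.bashrc makes them persist across sessions.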

Pig can run in two modes:

  • 1. Local mode.
    • In this mode the Hadoop cluster is not used at all. All processes run in a single JVM, and files are read from the local filesystem. To run Pig in local mode, use the command:
pig -x local 
  • 2. MapReduce mode.
    • In this mode Pig converts scripts to MapReduce jobs and runs them on a Hadoop cluster. It is the default mode.
    • The cluster can be local or remote. Pig uses the HADOOP_MAPRED_HOME environment variable to find the Hadoop installation on the local machine.
    • If we want to connect to a remote cluster, it is necessary to specify the cluster parameters in the pig.properties file.
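
As a rough sketch of how the two modes are selected, the helper below only builds and prints the command line for a given mode; it does not launch Pig. The function name pig_command and the script name wordcount.pig are hypothetical and used purely for illustration.

```shell
# Print the pig command line for a given execution mode and script.
# pig_command is a hypothetical helper, not part of Pig.
pig_command() {
    mode="$1"
    script="$2"
    case "$mode" in
        local|mapreduce)
            echo "pig -x $mode $script"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}

pig_command local wordcount.pig       # single JVM, local filesystem
pig_command mapreduce wordcount.pig   # default mode, runs on the cluster
```

Since MapReduce mode is the default, running pig with no -x option has the same effect as -x mapreduce.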

Example for MRv1
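
A minimal sketch of such a configuration, assuming the standard MRv1 property names fs.default.name and mapred.job.tracker and the placeholder addresses namenode_address and jobtracker_address. The conf_dir location is hypothetical; pig.properties is typically found in the conf directory of the Pig installation.

```shell
# Hypothetical location of the Pig configuration directory; adjust as needed.
conf_dir="./conf"
mkdir -p "$conf_dir"

# Append the remote cluster addresses (placeholders) to pig.properties.
cat >> "$conf_dir/pig.properties" <<'EOF'
fs.default.name=hdfs://namenode_address:8020/
mapred.job.tracker=jobtracker_address:8021
EOF
```
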

We can also specify the remote cluster address on the command line:

pig -fs namenode_address:8020 -jt jobtracker_address:8021
  • Hence, we can install Pig on any machine and connect to a remote cluster. Pig includes the Hadoop client, so there is no need to install Hadoop to use Pig.
