[Solved-3 Solutions] A way to export the results from Pig to a database?

What is Pig?

Apache Pig is a high-level data flow platform for executing MapReduce programs on Hadoop. Its language is Pig Latin.
Pig scripts are converted internally into MapReduce jobs and executed on data stored in HDFS. Every task that can be achieved using Pig can also be achieved by writing MapReduce code in Java.
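To make the data-flow idea concrete, here is a minimal sketch of a Pig Latin script. The input path, field names, and output path are hypothetical and only serve as an illustration.

```pig
-- Hypothetical input: tab-separated lines of (user, bytes) stored in HDFS
logs = LOAD '/data/access_log' USING PigStorage('\t') AS (user:chararray, bytes:long);

-- Group rows by user; grouping is a step that needs a reduce phase
by_user = GROUP logs BY user;

-- Sum the bytes for each user
totals = FOREACH by_user GENERATE group AS user, SUM(logs.bytes) AS total_bytes;

-- Write the result back to HDFS (hypothetical output path)
STORE totals INTO '/data/bytes_per_user';
```

Each statement names a relation and describes one step of the flow; Pig compiles the whole script into MapReduce jobs when it reaches the STORE.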

Why not more reducers?

There are two simple reasons to avoid using too many mappers or reducers. First, it can hurt your MapReduce job's performance.
It is a myth that the more mappers and reducers we use, the faster our jobs will run. In fact, each MapReduce task carries a certain overhead, and the communication and data movement between mappers and reducers take resources and time.
Thus, tuning your job so that the workload is evenly distributed across reducers, with as little skew as possible, is much more effective than blindly increasing the number of mappers or reducers.

Second, the number of MapReduce tasks that can run simultaneously on each machine is limited.
Given these facts, using more mappers or reducers than you actually need will slow your job down rather than speed it up.
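In Pig, the number of reducers can be set explicitly rather than left to grow. The sketch below shows the two usual knobs; the relation name, path, fields, and the values 4 and 2 are assumptions chosen only for illustration.

```pig
-- Default number of reducers for reduce-side operators in this script
SET default_parallel 4;

-- Hypothetical relation and field names
events = LOAD '/data/events' AS (key:chararray, value:long);

-- PARALLEL overrides the default for this one operator
grouped = GROUP events BY key PARALLEL 2;
```

Both settings affect only the reduce side; the number of mappers is driven by the input splits.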

Problem:

Is there a way to export the results from Pig directly to a database such as MySQL?

Solution 1:

The main concern is that every reducer will insert into the database at around the same time.
With that in mind, we can write a custom store function (a Pig StoreFunc) that uses JDBC to insert the data directly into the database.

Solution 2:

data = LOAD '...' AS (...);
...
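-- STORE needs a location string and a USING clause; the JDBC URL must start with jdbc:
STORE data INTO 'unused' USING org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://host/db', 'INSERT ...');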

Solution 3:

Sqoop is a good option, but it is difficult to set up (IMHO), like many Hadoop-related projects.
Pig's DBStorage works fine.

Register the PiggyBank and MySQL driver:

-- Register Piggy bank
REGISTER /opt/cmr/pig/pig-0.10.0/lib/piggybank.jar;

-- Register MySQL driver
REGISTER /opt/cmr/mysql/drivers/mysql-connector-java-5.1.15-bin.jar;

Here is a sample solution:

-- Store a relation into a SQL table
STORE relation INTO 'unused' USING org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://<mysqlserver>/<database>', '<login>', '<password>', 'REPLACE INTO <table> (<column1>, <column2>) VALUES (?, ?)');
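Putting the pieces together, an end-to-end script might look like the sketch below. The jar paths, input file, host, database, table, column names, and credentials are all hypothetical placeholders. The point it illustrates is that the stored relation should carry exactly as many fields as the insert statement has ? placeholders, in the same order.

```pig
-- Hypothetical jar locations; adjust to your installation
REGISTER /path/to/piggybank.jar;
REGISTER /path/to/mysql-connector-java.jar;

-- Hypothetical input with three fields
raw = LOAD '/data/users' USING PigStorage(',') AS (id:int, name:chararray, email:chararray);

-- Keep exactly the two fields that feed the two ? placeholders, in order
users = FOREACH raw GENERATE id, name;

-- The location string after INTO is not used for the database write;
-- host, database, table, and credentials below are placeholders
STORE users INTO 'unused' USING org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://dbhost/mydb', 'dbuser', 'dbpassword', 'REPLACE INTO users (id, name) VALUES (?, ?)');
```

Projecting with FOREACH before the STORE keeps the mapping between Pig fields and table columns explicit.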
