[Solved-1 Solution] Storing data to SequenceFile from Apache Pig?

Sequence File:

A SequenceFile is a flat file consisting of binary key/value pairs. It is used extensively in MapReduce as an input/output format. Internally, the temporary outputs of map tasks are also stored as SequenceFiles.

Problem:

Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:

REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
log = LOAD '/data/logs' USING SequenceFileLoader AS (...);

Is there also a library that allows writing to Hadoop sequence files from Pig?

Solution 1:

This is possible now, although it will become a fair bit easier once Pig 0.7 comes out, since that release includes a complete redesign of the Load/Store interfaces.
The "Hadoop expansion pack" that Twitter open-sourced on GitHub includes code for generating Load and Store funcs based on Google Protocol Buffers. These build on Input/Output formats for Protocol Buffers; the equivalent Input/Output formats for sequence files already exist in Hadoop.
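As a rough sketch of what writing a sequence file from Pig can look like: later releases of Elephant Bird (Twitter's Hadoop library on GitHub) ship a SequenceFileStorage store function, which postdates the answer above. The jar paths, input path, output path, and schema below are hypothetical, and the class names and the '-c' converter option should be checked against the Elephant Bird version you actually install.

```
-- Hypothetical jar locations; adjust to your installation.
REGISTER /path/to/elephant-bird-core.jar;
REGISTER /path/to/elephant-bird-pig.jar;
REGISTER /path/to/elephant-bird-hadoop-compat.jar;

-- Hypothetical tab-separated input with two text columns.
pairs = LOAD '/data/pairs.tsv' USING PigStorage('\t') AS (key: chararray, value: chararray);

-- The first '-c' argument names the converter for the key, the second for the value.
-- TextConverter writes each field as an org.apache.hadoop.io.Text.
STORE pairs INTO '/data/pairs_seq' USING com.twitter.elephantbird.pig.store.SequenceFileStorage(
    '-c com.twitter.elephantbird.pig.util.TextConverter',
    '-c com.twitter.elephantbird.pig.util.TextConverter'
);
```

The relation being stored is assumed to have exactly two fields, since a sequence file record is a single key/value pair.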
