Storing data to SequenceFile from Apache Pig?



Sequence File:

  • A SequenceFile is a flat file consisting of binary key/value pairs. It is used extensively in MapReduce as an input/output format. It is also worth noting that, internally, the temporary outputs of maps are stored as SequenceFiles.

Problem:

Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:

REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
log = LOAD '/data/logs' USING SequenceFileLoader AS (...);

Is there also a library that allows writing to Hadoop sequence files from Pig?

Solution 1:

  • This is possible now, although it will become a fair bit easier once Pig 0.7 comes out, as it includes a complete redesign of the Load/Store interfaces.
  • The "Hadoop expansion pack" that Twitter open-sourced on GitHub includes code for generating Load and Store funcs based on Google Protocol Buffers. These build on Input/Output formats for Protocol Buffers, and equivalent Input/Output formats already exist for sequence files.
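
As a rough illustration of what a store to a sequence file might look like, the sketch below assumes that the Twitter project mentioned above is elephant-bird, and that a later release of it (running on a Pig version with the redesigned Load/Store interfaces) provides a SequenceFileStorage store func with per-type converters. The jar paths, input path, output path, and schema are all hypothetical, and the class names should be checked against the version actually installed.

```pig
-- Hypothetical jar locations; adjust to your installation.
REGISTER /path/to/elephant-bird-core.jar;
REGISTER /path/to/elephant-bird-hadoop-compat.jar;
REGISTER /path/to/elephant-bird-pig.jar;

-- Hypothetical input: tab-separated (id, message) records.
pairs = LOAD '/data/input' AS (id: int, message: chararray);

-- Assumed elephant-bird store func: the first '-c' argument names the
-- converter for the key, the second the converter for the value.
STORE pairs INTO '/data/output-seq'
    USING com.twitter.elephantbird.pig.store.SequenceFileStorage(
        '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
        '-c com.twitter.elephantbird.pig.util.TextConverter'
    );
```

Each two-field tuple would then be written as one key/value pair, with the first field as the key and the second as the value.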
