Apache Pig - Handling Compression




How to Handle Compression in Apache Pig?

  • PigStorage and TextLoader support gzip and bzip compression for both read (load) and write (store).
  • BinStorage does not support compression.
  • Pig chooses the codec from the file extension: gzip-compressed input/output files need a .gz extension, and bzip-compressed files need a .bz or .bz2 extension.
  • Gzipped files cannot be split across multiple maps; this means that the number of maps created is equal to the number of part files in the input location.
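
Because each gzipped part file is handled by exactly one map, one way to get more parallelism is to write the input as several smaller .gz part files before uploading it. The following is a minimal Python sketch of that idea, not part of Pig itself; the helper name write_gzip_parts, the part-NNNNN.gz naming, and the sample rows are assumptions made for illustration.

```python
import gzip
import os
import tempfile


def write_gzip_parts(lines, out_dir, num_parts):
    """Write `lines` into at most `num_parts` gzip files inside `out_dir`.

    Each record stays whole within a single part file, so every file can be
    read independently by one map. (Hypothetical helper, for illustration.)
    """
    os.makedirs(out_dir, exist_ok=True)
    # Ceiling division: lines per part, never less than one.
    chunk = max(1, -(-len(lines) // num_parts))
    paths = []
    for start in range(0, len(lines), chunk):
        # The .gz extension is what lets Pig recognise the file as gzip.
        path = os.path.join(out_dir, f"part-{len(paths):05d}.gz")
        with gzip.open(path, "wt", encoding="utf-8") as handle:
            handle.writelines(line + "\n" for line in lines[start:start + chunk])
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Made-up comma-separated sample rows.
    rows = [f"{i},employee_{i},dept_{i % 3}" for i in range(10)]
    with tempfile.TemporaryDirectory() as tmp:
        parts = write_gzip_parts(rows, tmp, 4)
        for part in parts:
            print(os.path.basename(part))
```

After the resulting directory is uploaded to HDFS (for example with hdfs dfs -put), loading that directory should give one map per part file, in line with the rule stated above.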

Example

  • Ensure that we have a gzip-compressed file named wikitechy_emp.txt.gz in the HDFS directory /pig_data/.
  • Next, load the compressed file into Pig as shown below.
Using PigStorage:

grunt> data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt.gz' USING PigStorage(',');

Using TextLoader:

grunt> data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt.gz' USING TextLoader();
  • Similarly, you can store data in compressed form as shown below; the .bz extension on the output path selects bzip compression.
Using PigStorage:
grunt> STORE data INTO 'hdfs://localhost:9000/pig_Output/data.bz' USING PigStorage(',');
