Apache Pig - BinStorage




What is BinStorage() in Apache Pig?

  • The BinStorage() function loads and stores data in Pig using a machine-readable (binary) format.
  • Pig generally uses BinStorage to load and store the temporary data generated between multiple MapReduce jobs.
  • BinStorage supports multiple locations (files, directories, globs) as input.
  • BinStorage does NOT support compression.
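
As a rough illustration of the multiple-location support noted above, the sketch below reads BinStorage output from more than one place in a single LOAD. The paths /pig_Output/run1 and /pig_Output/run2 are hypothetical and stand in for directories that were written earlier with BinStorage(). The lines are written as a script; each statement can also be typed at the grunt> prompt.

```pig
-- Hypothetical paths: assumes both directories were written earlier with BinStorage().
-- A glob picks up every part file inside one output directory.
parts = LOAD 'hdfs://localhost:9000/pig_Output/run1/part-*' USING BinStorage();

-- A comma-separated list reads several directories into one relation.
combined = LOAD 'hdfs://localhost:9000/pig_Output/run1,hdfs://localhost:9000/pig_Output/run2' USING BinStorage();
```

Pointing LOAD at a directory rather than at individual part files has the same effect as the glob here, since Pig reads every part file in the directory.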

Syntax

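grunt> relation_name = LOAD 'input_path' USING BinStorage();
grunt> STORE relation_name INTO 'output_path' USING BinStorage();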

Example

  • Ensure that we have a file named wikitechy_employee_data.txt in the HDFS directory /pig_data/ as shown below.

wikitechy_employee_data.txt

111,Anu_Shankar,23,Chennai
112,Barvathi_Nambiayar,24,Chennai
113,Kajal_Nayak,24,Trivendram
114,Preethi_Antony,21,Pune
115,Raj_Gopal,21,Hyderabad
116,Yashika_Kannan,22,Delhi
117,siddu_Narayanan,22,Kolkata
118,Timple_Mohanthy,23,Bhuwaneshwar
  • Now load this data into a Pig relation, as shown below.
grunt> employee_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_data.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, age:int, city:chararray);
  • You can store this relation in the HDFS directory /pig_Output/mydata/ using the BinStorage() function.
grunt> STORE employee_details INTO 'hdfs://localhost:9000/pig_Output/mydata' USING BinStorage();
  • After executing the above statement, the relation is stored in the given HDFS directory.
  • You can see it using the HDFS ls command as given below.
$ hdfs dfs -ls hdfs://localhost:9000/pig_Output/mydata/
  
Found 2 items
-rw-r--r--   1 Hadoop supergroup          0 2017-10-26 16:58 hdfs://localhost:9000/pig_Output/mydata/_SUCCESS
-rw-r--r--   1 Hadoop supergroup        372 2017-10-26 16:58 hdfs://localhost:9000/pig_Output/mydata/part-m-00000
  • Now load the data back from the file part-m-00000.
grunt> result = LOAD 'hdfs://localhost:9000/pig_Output/mydata/part-m-00000' USING BinStorage();
  • Verify the contents of the relation as shown below.
grunt> Dump result;
(111,Anu_Shankar,23,Chennai)
(112,Barvathi_Nambiayar,24,Chennai)
(113,Kajal_Nayak,24,Trivendram)
(114,Preethi_Antony,21,Pune)
(115,Raj_Gopal,21,Hyderabad)
(116,Yashika_Kannan,22,Delhi)
(117,siddu_Narayanan,22,Kolkata)
(118,Timple_Mohanthy,23,Bhuwaneshwar)
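
Because the LOAD above has no AS clause, Pig treats the schema of result as unknown. A minimal sketch of checking this, and of declaring field names when loading, follows. The field names and types are assumed to mirror the original PigStorage load, and the way BinStorage data is cast to declared types has varied between Pig releases, so verify the behaviour on your version.

```pig
-- With no AS clause, DESCRIBE reports that the schema of the relation is unknown.
result = LOAD 'hdfs://localhost:9000/pig_Output/mydata/part-m-00000' USING BinStorage();
DESCRIBE result;

-- Assumption: the stored fields match the schema used in the original PigStorage load.
result_typed = LOAD 'hdfs://localhost:9000/pig_Output/mydata/part-m-00000' USING BinStorage()
    AS (id:int, firstname:chararray, age:int, city:chararray);
DESCRIBE result_typed;
```

Declaring the schema on load lets later statements refer to fields by name (for example, age) instead of by position ($2).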
