Apache Pig - BinStorage



What is BinStorage() in Apache Pig?

  • The BinStorage() function loads and stores data in Pig using a machine-readable format.
  • Pig generally uses BinStorage to load and store the temporary data generated between multiple MapReduce jobs.
  • BinStorage works with data that is represented on disk in machine-readable format. BinStorage does NOT support compression.
  • BinStorage supports multiple locations (files, directories, globs) as input.
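
The multiple-locations point can be sketched as follows. This is only an illustration: the directories run1 and run2 are hypothetical and are assumed to already hold BinStorage output. A glob picks up the part files of both directories in one LOAD, and a comma-separated list of paths is an alternative way to name the same inputs.

```pig
-- Hypothetical paths: run1 and run2 are assumed to contain BinStorage output.
-- A glob matches the part files of both directories in a single LOAD.
grunt> both_runs = LOAD 'hdfs://localhost:9000/pig_Output/run*/part-*' USING BinStorage();

-- The same inputs written as a comma-separated list of directories.
grunt> both_runs_list = LOAD 'hdfs://localhost:9000/pig_Output/run1,hdfs://localhost:9000/pig_Output/run2' USING BinStorage();
```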

Syntax

grunt> BinStorage();

Example

  • Ensure that we have a file named wikitechy_employee_data.txt in the HDFS directory /pig_data/ as shown below.

wikitechy_employee_data.txt

111,Anu_Shankar,23,Chennai
112,Barvathi_Nambiayar,24,Chennai
113,Kajal_Nayak,24,Trivendram
114,Preethi_Antony,21,Pune
115,Raj_Gopal,21,Hyderabad
116,Yashika_Kannan,22,Delhi
117,siddu_Narayanan,22,Kolkata
118,Timple_Mohanthy,23,Bhuwaneshwar
  • Now load this data into a Pig relation as given below.
grunt> employee_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_data.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, age:int, city:chararray);
  • You can store this relation in the HDFS directory /pig_Output/mydata using the BinStorage() function.
grunt> STORE employee_details INTO 'hdfs://localhost:9000/pig_Output/mydata' USING BinStorage();
  • After executing the above statement, the relation is stored in the given HDFS directory.
  • You can see it using the HDFS ls command as given below.
$ hdfs dfs -ls hdfs://localhost:9000/pig_Output/mydata/

Found 2 items
-rw-r--r--   1 Hadoop supergroup          0 2017-10-26 16:58 hdfs://localhost:9000/pig_Output/mydata/_SUCCESS
-rw-r--r--   1 Hadoop supergroup        372 2017-10-26 16:58 hdfs://localhost:9000/pig_Output/mydata/part-m-00000
  • Now load the data back from the file part-m-00000.
grunt> result = LOAD 'hdfs://localhost:9000/pig_Output/mydata/part-m-00000' USING BinStorage();
  • Now verify the contents of the relation as given below.
grunt> Dump result; 
(111,Anu_Shankar,23,Chennai)
(112,Barvathi_Nambiayar,24,Chennai)
(113,Kajal_Nayak,24,Trivendram)
(114,Preethi_Antony,21,Pune)
(115,Raj_Gopal,21,Hyderabad)
(116,Yashika_Kannan,22,Delhi)
(117,siddu_Narayanan,22,Kolkata)
(118,Timple_Mohanthy,23,Bhuwaneshwar)
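
The LOAD that creates result declares no schema, so its fields may have no names. A minimal sketch, assuming result is the relation loaded above and that its fields keep the order id, name, age, city, uses positional references to pick fields and name them. The relation name id_and_city is introduced here only for illustration.

```pig
-- $0 is the first field (id) and $3 is the fourth field (city).
-- The AS aliases give the projected fields names for later statements.
grunt> id_and_city = FOREACH result GENERATE $0 AS id, $3 AS city;
grunt> DUMP id_and_city;
```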
