What is BinStorage() in Apache Pig?
- The BinStorage() function loads and stores data in Pig using a machine-readable (binary) format.
- BinStorage() is generally used to store the temporary data generated between MapReduce jobs.
- BinStorage works with data that is represented on disk in machine-readable format. BinStorage does NOT support compression.
- BinStorage supports multiple locations (files, directories, globs) as input.
- Ensure that a file named wikitechy_employee_data.txt exists in the HDFS directory /pig_data/.
- Load this data into a Pig relation.
- Store this relation in HDFS under the directory /pig_data/ using the BinStorage() function.
- After the STORE statement executes, the relation is saved in the given HDFS directory.
- You can confirm this by listing the directory with the HDFS ls command.
- Next, load the data back from the output file part-m-00000.
- Finally, verify the contents of the relation.
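
The steps above can be sketched in Pig Latin roughly as follows. This is an illustration under stated assumptions, not the tutorial's original listing: the comma delimiter, the relation names, and the output subdirectory binstorage_output are hypothetical, and the statements are meant to be entered in order at the Grunt prompt.

```pig
-- Sketch of the walkthrough. The comma delimiter, the relation names and
-- the output directory "binstorage_output" are assumptions for illustration.

-- 1. Confirm the input file exists in HDFS (fs runs an HDFS shell command).
fs -ls /pig_data/

-- 2. Load the text file into a relation. No schema is declared, so
--    fields are addressed by position ($0, $1, ...).
employee_details = LOAD '/pig_data/wikitechy_employee_data.txt'
                   USING PigStorage(',');

-- 3. Store the relation in binary form. STORE needs an output path that
--    does not exist yet, hence the hypothetical subdirectory.
STORE employee_details INTO '/pig_data/binstorage_output'
      USING BinStorage();

-- 4. List the output; a map-only job typically writes part-m-00000.
fs -ls /pig_data/binstorage_output

-- 5. Load the binary data back and print it to verify the round trip.
employee_bin = LOAD '/pig_data/binstorage_output/part-m-00000'
               USING BinStorage();
DUMP employee_bin;
```

Because BinStorage accepts directories and globs as input, step 5 could instead point at the whole output directory or at a pattern such as part-m-* if the job wrote more than one part file.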