Apache Pig - PigStorage()




What is PigStorage() in Apache Pig?

  • The PigStorage() function loads and stores data as structured text files.
  • It takes one parameter: the delimiter that separates the fields of each tuple.
  • If no delimiter is given, it defaults to the tab character ('\t').

Syntax

grunt> PigStorage(field_delimiter)
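
As a minimal sketch of the default delimiter, the two LOAD statements below should behave the same way. The file path is hypothetical and is assumed to point to a tab-separated file.

```pig
-- Hypothetical path: assumes a tab-separated text file exists at this location.
-- With no argument, PigStorage splits each line on the tab character.
tab_data = LOAD '/pig_data/sample_tab_separated.txt' USING PigStorage();

-- The same load with the delimiter written out explicitly.
tab_data_explicit = LOAD '/pig_data/sample_tab_separated.txt' USING PigStorage('\t');
```

Writing the delimiter explicitly makes a script easier to read, even when the default would work.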

Example

  • Suppose we have a file named wikitechy_employee_data.txt in the HDFS directory /pig_data/ with the following content.
111,Anu,Shankar,9876543210,Chennai
112,Barvathi,Nambiayar,9876543211,Chennai
113,Kajal,Nayak,9876543212,Trivendram
114,Preethi,Antony,9876543213,Pune
115,Raj,Gopal,9876543214,Hyderabad
116,Yashika,Kannan,9876543215,Delhi
117,siddu,Narayanan,9876543216,Kolkata
118,Timple,Mohanthy,9876543217,Bhuwaneshwar
  • We can load the data using the PigStorage function as given below.
grunt> employee = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_data.txt' USING PigStorage(',')
   as ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
  • The example above uses a comma (',') as the delimiter because the values of each record in the file are separated by commas.
  • Similarly, we can use the PigStorage() function to store the data in an HDFS directory as given below.
grunt> STORE employee INTO 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',');
  • This stores the data in the given directory. You can verify the stored data as described in the Verification section.
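
As a variation on the STORE statement above, the delimiter used for output does not have to match the one used for loading. The sketch below assumes the employee relation is already loaded. The pig_Output_pipe directory name is hypothetical.

```pig
-- Assumes the employee relation from the LOAD statement above.
-- pig_Output_pipe is a hypothetical directory name; it must not already exist,
-- because Pig refuses to write into an existing output directory.
STORE employee INTO 'hdfs://localhost:9000/pig_Output_pipe/' USING PigStorage('|');
```

Each stored record would then have its fields separated by '|' instead of ','.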

Verification

  • First, list the files in the pig_Output directory using the ls command as given below.
$ hdfs dfs -ls 'hdfs://localhost:9000/pig_Output/'
Found 2 items
-rw-r--r-- 1 Hadoop supergroup 0 2017-10-05 13:03 hdfs://localhost:9000/pig_Output/_SUCCESS
-rw-r--r-- 1 Hadoop supergroup 224 2017-10-05 13:03 hdfs://localhost:9000/pig_Output/part-m-00000
  • We can see that the STORE statement created two files: an empty _SUCCESS marker and part-m-00000, which holds the data.
  • Then, using the cat command, display the contents of the file named part-m-00000 as given below.
$ hdfs dfs -cat 'hdfs://localhost:9000/pig_Output/part-m-00000'
111,Anu,Shankar,9876543210,Chennai
112,Barvathi,Nambiayar,9876543211,Chennai
113,Kajal,Nayak,9876543212,Trivendram
114,Preethi,Antony,9876543213,Pune
115,Raj,Gopal,9876543214,Hyderabad
116,Yashika,Kannan,9876543215,Delhi
117,siddu,Narayanan,9876543216,Kolkata
118,Timple,Mohanthy,9876543217,Bhuwaneshwar
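
The stored output can also be checked from within Pig by loading it back. This is a sketch that assumes the pig_Output directory written above. The relation name reloaded is chosen here only for illustration.

```pig
-- Load every data file in the output directory; files whose names start
-- with an underscore, such as _SUCCESS, are skipped by the input format.
reloaded = LOAD 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Print the tuples to the console for a quick visual check.
DUMP reloaded;
```

The delimiter passed to PigStorage() here has to match the one used in the STORE statement, otherwise each line would be read as a single field.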
