Apache Pig - PigStorage()



What is PigStorage() in Apache Pig?

  • The PigStorage() function loads and stores data as structured text files.
  • It takes one parameter: the delimiter that separates the fields of each tuple.
  • If no delimiter is given, it defaults to the tab character (‘\t’).

Syntax

grunt> PigStorage(field_delimiter)
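
The syntax above appears inside the USING clause of a LOAD or STORE statement. As a minimal sketch of the default versus an explicit delimiter, the two statements below read a tab-separated file and a pipe-separated file. The file names sample_tab.txt and sample_pipe.txt and the relation names are hypothetical, chosen only for illustration.

```pig
-- Hypothetical input files; only the delimiter handling is the point here.
-- No argument: fields are split on the default tab character ('\t').
tab_data = LOAD 'sample_tab.txt' USING PigStorage();

-- Explicit argument: fields are split on the pipe character instead.
pipe_data = LOAD 'sample_pipe.txt' USING PigStorage('|');
```

Omitting the USING clause altogether has the same effect as the first statement, because PigStorage is Pig's default load function.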

Example

  • Suppose we have a file named wikitechy_employee_data.txt in the HDFS directory /pig_data/ with the following content.
111,Anu,Shankar,9876543210,Chennai
112,Barvathi,Nambiayar,9876543211,Chennai
113,Kajal,Nayak,9876543212,Trivendram
114,Preethi,Antony,9876543213,Pune
115,Raj,Gopal,9876543214,Hyderabad
116,Yashika,Kannan,9876543215,Delhi
117,siddu,Narayanan,9876543216,Kolkata
118,Timple,Mohanthy,9876543217,Bhuwaneshwar
  • We can load the data using the PigStorage function as given below.
grunt> employee = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_data.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
  • The example above uses the comma (‘,’) as the delimiter, so the fields of each record are split on commas.
  • Similarly, we can use the PigStorage() function to store the data in an HDFS directory, as given below.
grunt> STORE employee INTO 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',');
  • This stores the data in the given directory. The Verification section below shows how to check the result.
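
The delimiter used for storing does not have to match the one used for loading. As a sketch, assuming the employee relation from the LOAD statement above and a hypothetical output directory pig_Output_tab, the statement below writes the same records as tab-separated text.

```pig
-- Assumes the 'employee' relation from the LOAD statement above.
-- 'pig_Output_tab' is a hypothetical directory; it must not already exist,
-- because STORE fails when the output location is already present.
STORE employee INTO 'hdfs://localhost:9000/pig_Output_tab/' USING PigStorage('\t');
```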

Verification

  • First, list the files in the directory pig_Output using the ls command, as given below.
$ hdfs dfs -ls 'hdfs://localhost:9000/pig_Output/'
Found 2 items
-rw-r--r--   1 Hadoop supergroup          0 2017-10-05 13:03 hdfs://localhost:9000/pig_Output/_SUCCESS
-rw-r--r--   1 Hadoop supergroup        224 2017-10-05 13:03 hdfs://localhost:9000/pig_Output/part-m-00000
  • We can see that two files were created after executing the STORE statement.
  • Then, using the cat command, display the contents of the file named part-m-00000, as given below.
$ hdfs dfs -cat 'hdfs://localhost:9000/pig_Output/part-m-00000'
111,Anu,Shankar,9876543210,Chennai
112,Barvathi,Nambiayar,9876543211,Chennai
113,Kajal,Nayak,9876543212,Trivendram
114,Preethi,Antony,9876543213,Pune
115,Raj,Gopal,9876543214,Hyderabad
116,Yashika,Kannan,9876543215,Delhi
117,siddu,Narayanan,9876543216,Kolkata
118,Timple,Mohanthy,9876543217,Bhuwaneshwar
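
The stored output can also be checked from the Grunt shell rather than with hdfs commands. As a sketch, assuming the output directory from the STORE statement above, the statements below load the part files back with the same delimiter and print them. The relation name stored_employee is hypothetical.

```pig
-- Loading a directory reads every part file inside it.
stored_employee = LOAD 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- DUMP prints each record as a tuple, e.g. (111,Anu,Shankar,9876543210,Chennai).
DUMP stored_employee;
```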
