pig tutorial - apache pig tutorial - Apache Pig Storing Data - pig latin - apache pig - pig hadoop

What is data storing?

You can store loaded data in the file system using the STORE operator.
A data store is a repository for persistently storing and managing collections of data. The term covers not only repositories such as databases but also simpler kinds of store, such as plain files and email archives.
In this broad sense, any database or file whose bytes have been persisted can be called a data store.

STORE instruction:

Writes a relation to a file system, usually HDFS or the local file system.
Useful for inspecting intermediate results while debugging.

Syntax of the Store statement

STORE Relation_name INTO 'required_directory_path' [USING function];
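To make the parts of this syntax concrete, here is a minimal Python sketch that assembles a STORE statement as a string, as you might when generating Pig scripts from another program. build_store_statement is a hypothetical helper, not part of Pig. It checks the alias against Pig's naming rule (a letter followed by letters, digits or underscores) and strips surrounding whitespace from the path, on the assumption that anything between the quotes would otherwise become part of the path.

```python
import re

# Pig aliases start with a letter and continue with letters, digits or underscores.
_ALIAS = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def build_store_statement(relation, path, function=None):
    """Return the text of: STORE relation INTO 'path' [USING function];

    Hypothetical helper for illustration only; it is not a Pig API.
    """
    if not _ALIAS.fullmatch(relation):
        raise ValueError(f"not a valid Pig alias: {relation!r}")
    # Stray spaces inside the quotes would end up in the path, so trim them.
    path = path.strip()
    if "'" in path:
        raise ValueError("path must not contain a single quote")
    statement = f"STORE {relation} INTO '{path}'"
    if function is not None:
        # The USING clause is optional; PigStorage is used when it is omitted.
        statement += f" USING {function}"
    return statement + ";"


print(build_store_statement("student", " hdfs://localhost:9000/pig_Output/ ", "PigStorage(',')"))
```

The USING clause is left optional to mirror the square brackets in the syntax line.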

Example:

Assume that we have a file named student_data.txt in the HDFS directory /pig_data/ with the following content.

001, Aadhira,Arushi  ,9848022337, Delhi
002, Mahi,Champa,9848022338, Chennai
003, Avantika,charu,9848022339, Pune
004, Samaira,Hansa,9848022330, Kolkata
005, Abhinav,Akaash,9848022336,Bhuwaneshwar
006, Amarjeet,Aksat,9848022335, Hyderabad

Also assume that we have loaded it into a relation named student using the LOAD operator, as shown below.

grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' 
   USING PigStorage(',')
   as ( id:int, firstname:chararray, lastname:chararray, phone:chararray, 
   city:chararray );

Now, let us store the relation in the HDFS directory “/pig_Output/” as shown below.

grunt> STORE student INTO 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',');

Output:

After executing the STORE statement, you will get output similar to the following.
A directory with the specified name is created, and the data is stored in it.

2015-10-05 13:05:05,429 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-10-05 13:05:05,429 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
   
HadoopVersion    PigVersion    UserId    StartedAt             FinishedAt             Features 
2.6.0            0.15.0        Hadoop    2015-10-0 13:03:03    2015-10-05 13:05:05    UNKNOWN  
Success!  
Job Stats (time in seconds): 
JobId          Maps    Reduces    MaxMapTime    MinMapTime    AvgMapTime    MedianMapTime    
job_14459_06    1        0           n/a           n/a           n/a           n/a
MaxReduceTime    MinReduceTime    AvgReduceTime    MedianReducetime    Alias    Feature   
     0                 0                0                0             student  MAP_ONLY 
 
Input(s): Successfully read 0 records from: "hdfs://localhost:9000/pig_data/student_data.txt"  
Output(s): Successfully stored 0 records in: "hdfs://localhost:9000/pig_Output"  
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0 
Total bags proactively spilled: 0
Total records proactively spilled: 0
  
Job DAG: job_1443519499159_0006
  
2015-10-05 13:06:06,192 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

Verification:

Verify the stored data as shown below.

Step 1:

First, list the files in the directory named pig_Output using the ls command, as shown below.

$ hdfs dfs -ls 'hdfs://localhost:9000/pig_Output/'
Found 2 items
-rw-r--r--   1 Hadoop supergroup          0 2015-10-05 13:03 hdfs://localhost:9000/pig_Output/_SUCCESS
-rw-r--r--   1 Hadoop supergroup        224 2015-10-05 13:03 hdfs://localhost:9000/pig_Output/part-m-00000

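You can observe that the STORE statement created two files: _SUCCESS, an empty marker file indicating that the job completed successfully, and part-m-00000, which holds the stored records (the "m" shows that it was written by a map task, since this is a map-only job).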
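Downstream programs often repeat this check before reading the results. The following is a minimal Python sketch, assuming the output directory has already been copied to the local file system (for example with hdfs dfs -get). read_pig_output is a hypothetical helper: it refuses to read a directory without the _SUCCESS marker and then concatenates every part-* file in name order.

```python
import os
import tempfile


def read_pig_output(directory):
    """Return the lines of all part-* files in a local copy of a Pig output directory.

    Hypothetical helper for illustration. Raises RuntimeError when the
    _SUCCESS marker is missing, because the job may have failed or be unfinished.
    """
    if not os.path.exists(os.path.join(directory, "_SUCCESS")):
        raise RuntimeError(f"no _SUCCESS marker in {directory}")
    lines = []
    # part-m-00000, part-m-00001, ... (part-r-* files come from reduce tasks)
    for name in sorted(os.listdir(directory)):
        if name.startswith("part-"):
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                lines.extend(f.read().splitlines())
    return lines


# Usage: build a tiny stand-in for pig_Output in a temporary directory.
with tempfile.TemporaryDirectory() as out:
    open(os.path.join(out, "_SUCCESS"), "w").close()  # empty marker file
    with open(os.path.join(out, "part-m-00000"), "w", encoding="utf-8") as f:
        f.write("001, Aadhira,Arushi  ,9848022337, Delhi\n")
    print(read_pig_output(out))
```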

Step 2:

Using the cat command, display the contents of the file named part-m-00000, as shown below.

$ hdfs dfs -cat 'hdfs://localhost:9000/pig_Output/part-m-00000' 
001, Aadhira,Arushi  ,9848022337, Delhi
002, Mahi,Champa,9848022338, Chennai
003, Avantika,charu,9848022339, Pune
004, Samaira,Hansa,9848022330, Kolkata
005, Abhinav,Akaash,9848022336,Bhuwaneshwar
006, Amarjeet,Aksat,9848022335, Hyderabad

Using PigStorage:

Delimiter
Limitations

PigStorage is a built-in Pig function and one of the most common functions used to load and store data in Pig scripts.
It can parse text data that uses an arbitrary delimiter, or write data out in a delimited format.

Delimiter:

If no argument is provided, PigStorage assumes tab-delimited data.
If a delimiter argument is provided, it must be a single-byte character: any literal (e.g. 'a', '|') or known escape character (e.g. '\t', '\r') is a valid delimiter.
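The delimiter rules can be modelled in a few lines of Python. This is a simplified sketch, not PigStorage's actual code: split_pigstorage_line is a hypothetical function that defaults to a tab, rejects any delimiter that does not encode to exactly one byte, and splits on every occurrence of the delimiter with no quoting or escaping.

```python
def split_pigstorage_line(line, delimiter="\t"):
    """Split one text line into raw fields, in the spirit of PigStorage.

    Hypothetical model for illustration; PigStorage itself is Java code in Pig.
    """
    # The delimiter must be a single-byte character, e.g. ',', '|' or '\t'.
    if len(delimiter.encode("utf-8")) != 1:
        raise ValueError(f"delimiter must be a single-byte character: {delimiter!r}")
    # No quoting or escaping: every occurrence of the delimiter starts a new field.
    return line.rstrip("\n").split(delimiter)


print(split_pigstorage_line("001\tAadhira\tDelhi"))       # default: tab
print(split_pigstorage_line("001, Aadhira, Delhi", ","))  # leading spaces are kept
```

Note that surrounding spaces are not trimmed in this model, which is why values such as " Aadhira" in the sample file keep their leading space.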

Example:

data = LOAD 's3n://input-bucket/input-folder' USING PigStorage(' ')
            AS (field0:chararray, field1:int);

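PigStorage does not infer field names or types from the text itself, so the schema is declared in the AS clause; without one, fields can only be referenced by position ($0, $1, ...) and are treated as bytearray.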
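What the AS clause contributes can be sketched as a cast step applied to the raw string fields. The Python below is a simplified model under stated assumptions: PIG_CASTS and apply_schema are hypothetical names, only a few Pig types are mapped, and a value that cannot be converted becomes None, standing in for the null that Pig generally produces (with a warning) when a cast fails. Pig's own conversion rules may differ in details; Python's int(), for example, tolerates surrounding whitespace.

```python
# Hypothetical mapping from a few Pig type names to Python conversions.
PIG_CASTS = {"chararray": str, "int": int, "long": int, "double": float}


def apply_schema(fields, schema):
    """Cast raw string fields using (name, type) pairs, as an AS clause declares them."""
    record = {}
    for index, (name, pig_type) in enumerate(schema):
        # Fields missing from the line become None; extra fields are ignored here.
        raw = fields[index] if index < len(fields) else None
        if raw is None:
            record[name] = None
            continue
        try:
            record[name] = PIG_CASTS[pig_type](raw)
        except ValueError:
            record[name] = None  # failed cast -> null in this sketch
    return record


schema = [("id", "int"), ("firstname", "chararray"), ("city", "chararray")]
print(apply_schema(["001", " Aadhira", " Delhi"], schema))
```

Declaring id as int means "001" is held as the integer 1, so the leading zeros are not part of the loaded value.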
To store data using PigStorage, the same delimiter rules apply:

STORE data INTO 's3n://output-bucket/output-folder' USING PigStorage('\t');
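On the storing side, the same rules run in reverse: each tuple is written as one line, with its fields joined by the delimiter. The Python sketch below is a simplified model, not Pig's implementation; format_pigstorage_record is a hypothetical name, and writing nulls as empty fields and other values via their string form are assumptions about the output format.

```python
def format_pigstorage_record(values, delimiter="\t"):
    """Render one tuple as a delimited text line, in the spirit of PigStorage.

    Hypothetical model for illustration only.
    """
    if len(delimiter.encode("utf-8")) != 1:
        raise ValueError(f"delimiter must be a single-byte character: {delimiter!r}")
    # Assumption: nulls become empty fields, other values use their string form.
    return delimiter.join("" if v is None else str(v) for v in values) + "\n"


# An int id of 1 is written back as "1", not "001".
print(format_pigstorage_record([1, " Aadhira", "Arushi  ", "9848022337", " Delhi"], ","), end="")
```

Values are written without any quoting, so a field that itself contains the delimiter produces a line that cannot be split back into the original fields.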

Limitations:

PigStorage is an extremely simple loader that does not handle special cases such as embedded delimiters or escaped control characters; it will split on every instance of the delimiter regardless of context.
For this reason, when loading a CSV file it is recommended to use CSVExcelStorage <http://help.mortardata.com/integrations/amazon_s3/csv> rather than PigStorage with a comma delimiter.
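The embedded-delimiter problem is easy to reproduce. The sketch below uses Python's standard csv module purely as a stand-in for a quote-aware loader such as CSVExcelStorage; it is not CSVExcelStorage itself, and the sample line is made up for illustration.

```python
import csv
import io

# A made-up CSV line whose second field contains a quoted, embedded comma.
line = '006,"Amarjeet, Aksat",Hyderabad'

# PigStorage-style: split on every comma, including the one inside the quotes.
naive_fields = line.split(",")

# Quote-aware CSV parsing keeps the quoted field together.
csv_fields = next(csv.reader(io.StringIO(line)))

print(naive_fields)  # four fields, with the quotes left in the data
print(csv_fields)    # three fields
```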
