Apache Pig - Foreach Operator
What is the FOREACH Operator in Apache Pig?
- FOREACH gives us a simple way to apply column-based transformations to the data in a relation.
- The FOREACH operator evaluates the expressions listed after GENERATE for each tuple of the input relation and returns all the results as a new relation.
- The GENERATE expressions can pass existing columns through unchanged (projection) or compute new values from the column data that is available.
Pig Operations - Projection

- Generate new relations by projecting data of a relation


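- When we execute a FOREACH statement on its own, it seems that nothing happens.
- Unlike the earlier LOAD, DUMP, and ILLUSTRATE statements, it prints no tracing output: Pig only records the definition of the new relation and evaluates it when a later statement such as DUMP asks for its contents.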
- Operate on data in bags inside a relation and then project
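
The last point, operating on bags inside a relation before projecting, could look roughly like the sketch below. It assumes the student file, path, and schema from the example later in this tutorial; the relation names by_city and city_counts and the field name students are made up for illustration.

```pig
-- Load the student file (same path and schema as the example further down).
wikitechy_student_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);

-- GROUP yields one tuple per city: (group, {bag of matching student tuples}).
by_city = GROUP wikitechy_student_details BY city;

-- FOREACH operates on each bag (here with COUNT) and projects the result.
city_counts = FOREACH by_city GENERATE group AS city, COUNT(wikitechy_student_details) AS students;

-- Nothing has run so far; DUMP triggers execution and prints the tuples.
DUMP city_counts;
```

The statements can be typed one by one at the grunt> prompt or saved as a script. Inside the FOREACH, the bag carries the name of the relation that was grouped, which is why COUNT is applied to wikitechy_student_details.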

Syntax:
grunt> Relation_name2 = FOREACH Relation_name1 GENERATE (required data);
Example:
wikitechy_student_details.txt
001,Suresh,Reddy,21,9848022337,Hyderabad
002,Arvin,Battacharya,22,9848022338,Kolkata
003,Aanchal,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Vanitha,Mohanthy,23,9848022336,Bhuwaneshwar
006,Sruti,Mishra,23,9848022335,Chennai
007,Kamal,Nayak,24,9848022334,trivendram
008,Barath,Nambiayar,24,9848022333,Chennai
- We have loaded this file into Pig under the relation name wikitechy_student_details, as shown below:
grunt> wikitechy_student_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_student_details.txt' USING PigStorage(',')
as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
- We need to get the id, age, and city values of each student from the relation wikitechy_student_details and store them in another relation named foreach_data, as shown below:
grunt> foreach_data = FOREACH wikitechy_student_details GENERATE id,age,city;
Verification:
grunt> Dump foreach_data;
Output:
(1,21,Hyderabad)
(2,22,Kolkata)
(3,22,Delhi)
(4,21,Pune)
(5,23,Bhuwaneshwar)
(6,23,Chennai)
(7,24,trivendram)
(8,24,Chennai)
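
GENERATE is not limited to bare column names. As a hedged sketch, the statements below compute new fields from the same student data; the relation name student_summary and the field names fullname and age_next_year are hypothetical, and CONCAT is used in its two-argument form.

```pig
-- Load the student file with the same path and schema as in the example above.
wikitechy_student_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);

-- Each GENERATE item is an expression; AS gives the resulting field a name.
student_summary = FOREACH wikitechy_student_details GENERATE
    id,
    CONCAT(firstname, CONCAT(' ', lastname)) AS fullname,  -- built from two columns
    age + 1 AS age_next_year;                              -- arithmetic on a column

-- DUMP triggers execution and prints one tuple per student.
DUMP student_summary;
```

Naming computed fields with AS keeps them addressable by name in later statements instead of by position ($1, $2).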
FOREACH ... GENERATE
