Apache Pig Distinct Operator

The DISTINCT operator removes redundant (duplicate) tuples from a relation, keeping one copy of each.
It compares entire records, not individual fields, so two tuples count as duplicates only when every field matches.
Pig Latin has no SELECT statement; DISTINCT is applied directly to a relation, much as SELECT DISTINCT removes duplicates from a result set in SQL.
DISTINCT can also be combined with an aggregate function, typically COUNT(), by applying it to a bag inside a nested FOREACH block.

Pig Operations - Deduplication

works only on entire records, not on individual fields

forces a reduce phase, but optimizes by using the combiner

DISTINCT instruction:

Only preserves unique tuples
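
Because DISTINCT compares whole tuples, getting the unique values of a single field means projecting that field first and then applying DISTINCT to the projection. The following is a minimal sketch of that pattern, assuming the student file, path, and schema from the example further down this page; the relation names cities and distinct_cities are illustrative and not part of the original tutorial.

```pig
-- Load the sample file (same path and schema as the example on this page).
wikitechy_student_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_student_details.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Keep only the field to deduplicate on (illustrative relation name).
cities = FOREACH wikitechy_student_details GENERATE city;

-- DISTINCT now compares one-field tuples, so each city appears once.
distinct_cities = DISTINCT cities;

DUMP distinct_cities;
```

The statements can be entered one at a time at the grunt prompt or saved as a script. DISTINCT does not guarantee any particular output order; add an ORDER BY if the order matters.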

Syntax

grunt> Relation_name2 = DISTINCT Relation_name1;

Example:

wikitechy_student_details.txt

001,Sabrina,Reddy,9848022337,Hyderabad
002,Arvin,Battacharya,9848022338,Kolkata
002,Arvin,Battacharya,9848022338,Kolkata
003,Arun,Khanna,9848022339,Delhi
003,Arun,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Sruti,Mohanthy,9848022336,Bhuwaneshwar
006,Vanitha,Mishra,9848022335,Chennai
006,Vanitha,Mishra,9848022335,Chennai

We have loaded this file into Pig as the relation wikitechy_student_details, as shown below:

grunt> wikitechy_student_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_student_details.txt' USING PigStorage(',') 
   as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

Now remove the redundant tuples from the relation wikitechy_student_details using the DISTINCT operator, and store the result in another relation named distinct_data, as shown below:

grunt> distinct_data = DISTINCT wikitechy_student_details;

Verification

grunt> Dump distinct_data;

Output:

(1,Sabrina,Reddy,9848022337,Hyderabad)
(2,Arvin,Battacharya,9848022338,Kolkata)
(3,Arun,Khanna,9848022339,Delhi)
(4,Preethi,Agarwal,9848022330,Pune)
(5,Sruti,Mohanthy,9848022336,Bhuwaneshwar)
(6,Vanitha,Mishra,9848022335,Chennai)
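
DISTINCT is often paired with COUNT. One way to do this in Pig is to apply DISTINCT to a bag inside a nested FOREACH block. The sketch below assumes the wikitechy_student_details relation loaded above; the names all_rows, cities, unique_cities, and city_counts are illustrative and not part of the original tutorial.

```pig
-- Group every tuple into a single bag so COUNT can run over the whole relation.
all_rows = GROUP wikitechy_student_details ALL;

city_counts = FOREACH all_rows {
    -- Project the city field out of the grouped bag.
    cities = wikitechy_student_details.city;
    -- DISTINCT inside the nested block deduplicates that bag.
    unique_cities = DISTINCT cities;
    GENERATE COUNT(wikitechy_student_details) AS total_rows,
             COUNT(unique_cities) AS unique_city_count;
};

DUMP city_counts;
```

With the nine-row sample file above, and assuming the city values carry no stray whitespace, this should report 9 total rows and 6 unique cities. The same block works per group if GROUP ... ALL is replaced with GROUP ... BY some field.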
