pig tutorial - apache pig tutorial - Pig Latin Word Count ? - pig latin - apache pig - pig hadoop

Word Count in Pig Latin

Assume the input file contains the following two lines:

This is a hadoop class
hadoop is a bigdata technology

We want to count how many times each word occurs and produce output like this:

(a,2)
(is,2)
(This,1)
(class,1)
(hadoop,2)
(bigdata,1)
(technology,1)

Let's go through the steps to generate this output using Pig Latin.

1. Load the data from HDFS

The LOAD statement reads the data into a relation. The AS keyword declares the column names and types; because the file has no columns of its own, we declare a single chararray column named line.

lines = LOAD '/path/to/file/' AS (line:chararray);

Note that input is a reserved keyword in Pig Latin and cannot be used as a relation name, so the relation is called lines here.

2. Convert the Sentence into words.

Each row holds a whole sentence, so we split it into words with the TOKENIZE function:

TOKENIZE(line)

(or)
If the words are separated by a specific delimiter, such as a space, pass it as the second argument:

TOKENIZE(line, ' ')

TOKENIZE returns one bag of words per input line, so the output will look like this:

({(This),(is),(a),(hadoop),(class)})
({(hadoop),(is),(a),(bigdata),(technology)})

What we need instead is one row per word, like below:

(This)
(is)
(a)
(hadoop)
(class)
(hadoop)
(is)
(a)
(bigdata)
(technology)

3. Convert Column into Rows

To turn each bag into multiple rows, Pig provides the FLATTEN operator.
Applied to a bag, FLATTEN un-nests it so that every tuple in the bag becomes a separate row. Relation names are case sensitive, so the name used here (words) must match the name used in the later steps.

words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line, ' ')) AS word;

Output:

(This)
(is)
(a)
(hadoop)
(class)
(hadoop)
(is)
(a)
(bigdata)
(technology)
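
To make the effect of TOKENIZE and FLATTEN concrete, here is a plain-Python sketch of what steps 2 and 3 compute. It is an illustration only, not Pig code, and it does not reproduce how Pig executes the job. The helper names tokenize and flatten are hypothetical, the sample lines are the two sentences implied by the TOKENIZE output above, and splitting on a single space is an assumption.

```python
# Illustration only: plain Python that mimics steps 2 and 3, not Pig itself.
# The sample lines are the sentences implied by the TOKENIZE output above.
lines = [
    "This is a hadoop class",
    "hadoop is a bigdata technology",
]


def tokenize(line, delimiter=" "):
    """Return one 'bag' per line: a list of single-field tuples."""
    return [(token,) for token in line.split(delimiter) if token]


def flatten(bags):
    """Un-nest the bags so that every tuple becomes its own row."""
    return [row for bag in bags for row in bag]


bags = [tokenize(line) for line in lines]  # one bag per input line
words = flatten(bags)                      # one row per word

for row in words:
    print(row)
```

The list bags corresponds to the nested output of step 2, and words corresponds to the flattened rows shown above.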

4. Apply GROUP BY

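To count the occurrences of each word, we first have to group identical words together. Each row of the grouped relation has two fields: group, which holds the word, and a bag named words, which holds every row containing that word.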

Grouped = GROUP words BY word;
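
The shape of the grouped relation can be sketched in plain Python. This is an illustration only, not Pig code; the helper name group_by_word is hypothetical, and the input rows are the ten flattened words from step 3.

```python
from collections import defaultdict

# Illustration only: mimics the result of GROUP words BY word in plain Python.
words = [
    ("This",), ("is",), ("a",), ("hadoop",), ("class",),
    ("hadoop",), ("is",), ("a",), ("bigdata",), ("technology",),
]


def group_by_word(rows):
    """Map each distinct word to the bag (list) of rows that carry it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return dict(groups)


grouped = group_by_word(words)

for key, bag in grouped.items():
    print(key, bag)
```

Each dictionary key plays the role of the group field, and each list plays the role of the bag that COUNT is applied to in the next step.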

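5. Generate word count

For each group, we output the word (the group field) and the number of tuples in its bag, using COUNT:

wordcount = FOREACH Grouped GENERATE group, COUNT(words);

We can print the word count on the console using DUMP:

DUMP wordcount;

Output: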

(a,2)
(is,2)
(This,1)
(class,1)
(hadoop,2)
(bigdata,1)
(technology,1)
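
As a cross-check on these counts, the whole pipeline can be sketched in a few lines of plain Python. This is an illustration only, not a replacement for the Pig script; the function name word_count is hypothetical, the sample lines are the two sentences implied by the TOKENIZE output in step 2, and splitting on a single space is an assumption.

```python
from collections import Counter

# Illustration only: the same word count computed in plain Python.
lines = [
    "This is a hadoop class",
    "hadoop is a bigdata technology",
]


def word_count(lines, delimiter=" "):
    """Split every line into words and count how often each word occurs."""
    counts = Counter()
    for line in lines:
        counts.update(token for token in line.split(delimiter) if token)
    return counts


wordcount = word_count(lines)

for word, total in wordcount.items():
    print((word, total))
```

Like the Pig script, this sketch treats words as case sensitive, so "This" and "this" would be counted separately.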
