Hadoop Pig: how to return the top 5 rows of each group?
What is GROUP BY?
The GROUP operator groups the data in one or more relations. It collects the tuples that share the same key into a bag, producing one output tuple per key.
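As an illustration, here is a minimal sketch assuming a hypothetical comma-separated file `cities.csv` with the fields `state`, `city` and `population` (these names are not from the original question). Grouping by state turns each state into a single tuple whose key is the field `group` and whose bag holds every city row for that state.

```pig
-- Hypothetical input: one line per city, fields separated by commas
cities = LOAD 'cities.csv' USING PigStorage(',')
         AS (state:chararray, city:chararray, population:long);

-- One output tuple per state: (group, {bag of all cities tuples for that state})
by_state = GROUP cities BY state;

-- Show the schema of the grouped relation, then print its contents
DESCRIBE by_state;
DUMP by_state;
```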
What is ORDER BY?
- The ORDER BY operator sorts the contents of a relation based on one or more fields.
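A minimal sketch, again assuming a hypothetical `cities.csv` file with `state`, `city` and `population` fields. Used at the top level like this, ORDER BY sorts the whole relation; it does not limit anything per group, which is why the per-group solution has to nest it inside a FOREACH.

```pig
-- Hypothetical input schema
cities = LOAD 'cities.csv' USING PigStorage(',')
         AS (state:chararray, city:chararray, population:long);

-- Sort by state ascending, then by population descending within each state
sorted_cities = ORDER cities BY state ASC, population DESC;

DUMP sorted_cities;
```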
Suppose we want to return the top 5 rows of each group. We have a table of state names and their cities, grouped by state name, and we want only the top 5 cities of each state rather than all of them. How can we do this in Pig?
- First, GROUP the relation BY state.
- Then, inside a nested FOREACH over the groups, ORDER each group's bag by city size and LIMIT it to 5. This sorts the cities in each group by size and keeps only the top 5.
The code below returns the top 5 rows of each group.
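A minimal sketch of the approach, assuming a hypothetical `cities.csv` file with `state`, `city` and `population` fields and using `population` as the measure of city size. The input path, delimiter and field names are illustrative only; adjust them to your data.

```pig
-- Hypothetical input schema
cities = LOAD 'cities.csv' USING PigStorage(',')
         AS (state:chararray, city:chararray, population:long);

-- Step 1: group all cities by state
by_state = GROUP cities BY state;

-- Step 2: nested FOREACH; sort each state's bag of cities, keep the first 5
top5 = FOREACH by_state {
    by_size = ORDER cities BY population DESC;
    first5  = LIMIT by_size 5;
    -- FLATTEN turns the 5-tuple bag back into ordinary rows
    GENERATE FLATTEN(first5);
};

DUMP top5;
```

The ORDER and LIMIT run on each group's bag separately, so every state keeps its own 5 largest cities. If two cities tie at the cut-off, which one LIMIT keeps is not guaranteed. Pig also provides a built-in `TOP` function that can replace the nested ORDER/LIMIT pair; check the built-in function reference for your Pig version before relying on it.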