What is the TOP() function in Apache Pig?
- The TOP() function of Pig Latin is used to get the top N tuples of a bag.
- As inputs, this function takes the number of tuples you need, the index of the column whose values are compared, and the relation (bag), in the order TOP(topN, column, relation).
- It returns a bag containing the N tuples with the largest values in that column, with all of their columns.
- Assume that we have a file named wikitechy_emp_details.txt in the HDFS directory /pig_data/ with the following content.
- Also assume that this file has been loaded into Pig with the relation name emp_data, as given below.
- Group the relation emp_data by age and store the result in the relation emp_group.
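The behavior of TOP() described in the bullets above can be modeled in a few lines of Python. This is only an illustrative sketch, not Pig's implementation. It assumes that a bag is a list of tuples and that `column` is a zero-based index. The function name `top` and the sample rows are hypothetical. Ties on the compared column are broken arbitrarily. The descending order of the returned list is a property of this model, not something to rely on from Pig's TOP().

```python
import heapq
from operator import itemgetter


def top(top_n, column, bag):
    """Hypothetical model of Pig's TOP(topN, column, relation): return the
    top_n tuples of `bag` with the largest values at zero-based index `column`."""
    if top_n <= 0:
        return []
    # nlargest keeps only top_n candidates in a heap instead of sorting the
    # whole bag; the resulting list is in descending order of the key.
    return heapq.nlargest(top_n, bag, key=itemgetter(column))


# Hypothetical bag of (id, name, age) tuples, not the tutorial's data file.
bag = [(1, "Asha", 21), (4, "Ravi", 22), (2, "Meena", 21), (3, "John", 23)]
print(top(2, 0, bag))  # → [(4, 'Ravi', 22), (3, 'John', 23)]
```

A bounded heap (`heapq.nlargest`) keeps only N candidates in memory while scanning the bag. This is the usual way to compute a top-N without a full sort.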
Now verify the relation emp_group using the Dump operator as given below.
Now you can get the two records with the highest id in each group, as given below.
- In this instance, we are retrieving the top 2 tuples of each group, that is, the ones with the greatest id.
- Because the tuples are ranked by id, we pass the index of the id column as the second parameter of the TOP() function.
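The grouping and per-group TOP() steps above can be sketched end to end in Python. The rows are hypothetical stand-ins with an assumed (id, name, age) layout, because the file's actual content is not reproduced here. `group_by` and `top` are illustrative helpers. The comments show roughly which Pig Latin statement each step corresponds to.

```python
import heapq
from collections import defaultdict
from operator import itemgetter

# Hypothetical (id, name, age) rows standing in for emp_data; the real
# contents of wikitechy_emp_details.txt are not reproduced here.
emp_data = [
    (1, "Asha", 21), (2, "Ravi", 22), (3, "Meena", 21),
    (4, "John", 22), (5, "Priya", 21), (6, "Karan", 23),
]

ID, AGE = 0, 2  # assumed zero-based column indexes in each tuple


def group_by(relation, column):
    """Model of GROUP relation BY column: map each key to a bag of tuples."""
    groups = defaultdict(list)
    for row in relation:
        groups[row[column]].append(row)
    return dict(groups)


def top(top_n, column, bag):
    """Model of TOP(topN, column, bag): top_n tuples with the largest key."""
    return heapq.nlargest(top_n, bag, key=itemgetter(column))


# Roughly: emp_group = GROUP emp_data BY age;
emp_group = group_by(emp_data, AGE)

# Roughly: data_top = FOREACH emp_group { top = TOP(2, <id index>, emp_data); GENERATE top; }
data_top = {age: top(2, ID, bag) for age, bag in emp_group.items()}

for age in sorted(data_top):
    print(age, data_top[age])
# 21 [(5, 'Priya', 21), (3, 'Meena', 21)]
# 22 [(4, 'John', 22), (2, 'Ravi', 22)]
# 23 [(6, 'Karan', 23)]
```

A group with fewer than N tuples, such as age 23 here, simply returns all of its tuples.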
You can verify the contents of the data_top relation using the Dump operator as given below.