apache hive - HiveQL Select Order By - hive tutorial - hadoop hive - hadoop hive - hiveql



What is Select-Order By in HiveQL ?

  • The familiar ORDER BY clause of a SELECT statement sorts the result set based on the values from one or more columns.
  • The ORDER BY clause is used to retrieve the details based on one column and sort the result set by ascending or descending order.
  • For distributed queries, this is a relatively expensive operation, because the entire result set must be produced and transferred to one node before the sorting can happen. This can require more memory capacity than a query without ORDER BY.
apache hive related article tags - hive tutorial - hadoop hive - hadoop hive - hiveql - hive hadoop - learnhive - hive sql

Syntax of Order By:

  • The ORDER BY syntax in Hive QL is similar to the syntax of ORDER BY in SQL language.
colOrder: ( ASC | DESC )
colNullOrder: (NULLS FIRST | NULLS LAST)      -- (Note: Available in Hive 2.1.0 and later)
orderBy: ORDER BY colName colOrder? colNullOrder? (',' colName colOrder? colNullOrder?)*
query: SELECT expression (',' expression)* FROM src orderBy
Clicking "Copy Code" button will copy the code into the clipboard - memory. Please paste(Ctrl+V) it in your destination. The code will get pasted. Happy coding from Wikitechy hive tutorial team
  • There are some limitations in the "order by" clause.
  • In the strict mode (i.e., hive.mapred.mode=strict), the order by clause has to be followed by a "limit" clause.
  • The limit clause is not necessary if you set hive.mapred.mode to nonstrict.
  • The reason is that in order to impose total order of all results, there has to be one reducer to sort the final output.
  • If the number of rows in the output is too large, the single reducer could take a very long time to finish.
  • Note that columns are specified by name, not by position number. However in Hive 0.11.0 and later, columns can be specified by position when configured as follows:
    • For Hive 0.11.0 through 2.1.x, set hive.groupby.orderby.position.alias to true (the default is false).
    • For Hive 2.2.0 and later, hive.orderby.position.alias is true by default.
learn hive - hive tutorial - hive sql datatypes -  hive programs -  hive examples

learn hive - hive tutorial - hive sql datatypes - hive programs - hive examples

The default sorting order is ascending (ASC):

  • In Hive 2.1.0 and later, specifying the null sorting order for each of the columns in the "order by" clause is supported. The default null sorting order for ASC order is NULLS FIRST, while the default null sorting order for DESC order is NULLS LAST.
  • In Hive 3.0.0 and later, order by without limit in subqueries and views will be removed by the optimizer. To disable it, set hive.remove.orderby.in.subquery to false.

Example:

  • Let us take an example for SELECT...ORDER BY clause.
  • Assume employee table as given below, with the fields named Id, Name, Salary, Designation, and Dept.
  • Generate a query to retrieve the employee details in order by using Department name.
ID Name Salary Designation Dept
1201 Aarthi 45000 Technical manager TP
1202 Boomi 45000 Proofreader PR
1203 Dharsanya 40000 Technical writer TP
1204 Harikka 40000 Hr Admin HR
1205 Mirunalini 30000 Op Admin Admin
  • The following query retrieves the employee details using the above scenario:
hive> SELECT Id, Name, Dept FROM employee ORDER BY DEPT;
Clicking "Copy Code" button will copy the code into the clipboard - memory. Please paste(Ctrl+V) it in your destination. The code will get pasted. Happy coding from Wikitechy hive tutorial team
  • On successful execution of the query, you get to see the following response:
ID Name Salary Designation Dept
1205 Mirunalini 30000 Op Admin Admin
1204 Harikka 40000 Hr Admin HR
1202 Boomi 45000 Proofreader PR
1201 Aarthi 45000 Technical manager TP
1203 Dharsanya 40000 Technical writer TP
apache hive related article tags - hive tutorial - hadoop hive - hadoop hive - hiveql - hive hadoop - learnhive - hive sql

JDBC Program:

Here is the JDBC program to apply Order By clause for the given example.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class WikitechyOrderBy{
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
   
   public static void main(String[] args) throws SQLException {
   
      // Register driver and create driver instance
      Class.forName(driverName);
      
      // get connection
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");
      
      // create statement 
      Statement stmt = con.createStatement();
      
      // execute statement
      Resultset res = stmt.executeQuery("SELECT * FROM employee ORDER BY DEPT;");
      System.out.println(" ID \t Name \t Salary \t Designation \t Dept ");
      
      while (res.next()) {
         System.out.println(res.getInt(1) + " " + res.getString(2) + " " + res.getDouble(3) + " " + res.getString(4) + " " + res.getString(5));
      }
      
      con.close();
   }
}
Clicking "Copy Code" button will copy the code into the clipboard - memory. Please paste(Ctrl+V) it in your destination. The code will get pasted. Happy coding from Wikitechy hive tutorial team
  • Save the program in a file named WikitechyOrderBy.java. Use the following commands to compile and execute this program.
$ javac WikitechyOrderBy.java
$ java WikitechyOrderBy
Clicking "Copy Code" button will copy the code into the clipboard - memory. Please paste(Ctrl+V) it in your destination. The code will get pasted. Happy coding from Wikitechy hive tutorial team

Output:

ID Name Salary Designation Dept
1205 Mirunalini 30000 Op Admin Admin
1204 Harikka 40000 Hr Admin HR
1202 Boomi 45000 Proofreader PR
1201 Aarthi 45000 Technical manager TP
1203 Dharsanya 40000 Technical writer TP
1204 Harikka 40000 Hr Admin HR

Wikitechy Apache Hive tutorials provides you the base of all the following topics . Enjoy learning on big data , hadoop , data analytics , big data analytics , mapreduce , hadoop tutorial , what is hadoop , big data hadoop , apache hadoop , apache hive , hadoop wiki , hadoop jobs , hadoop training , hive tutorial , hadoop big data , hadoop architecture , hadoop certification , hadoop ecosystem , hadoop fs , apache pig , hadoop cluster , cloudera hadoop , hadoop download , hadoop mapreduce , hadoop workflow , hive data types , hadoop hive , pig hadoop , hadoop administration , hadoop installation , hive hadoop , learn hadoop , hadoop for dummies , hadoop commands , hive definition , hiveql , learnhive , hive sql , hive database , hive date functions , hive query , apache hive tutorial , hive apache , hive wiki , what is a hive , hive big data , programming hive , what is hive in hadoop , hive documentation , how does hive work

Related Searches to HiveQL Select Order By