
What is a tool for tuning Hive queries?

July 13, 2021


Tool for tuning Hive queries

1. Enable Compression in Hive

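By compressing data at various phases (for example, the final output and the intermediate data passed between MapReduce jobs), we reduce the amount of data written to disk and transferred over the network, which improves the performance of Hive queries.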
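A minimal sketch of the corresponding session settings, typed in the Hive shell before running a query. The Snappy codec is an assumption (it needs the native Snappy libraries on the cluster); any codec installed on the cluster can be used instead. The mapreduce.* names are the Hadoop 2 property names.

```sql
-- Compress intermediate data passed between the MapReduce jobs of one query
SET hive.exec.compress.intermediate=true;
-- Compress the final output written by the query
SET hive.exec.compress.output=true;
-- Codec for the final output (Snappy is an assumption; use any installed codec)
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- Compress map output before it is shuffled to the reducers
SET mapreduce.map.output.compress=true;
SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
```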

2. Optimize Joins

We can improve the performance of joins by enabling automatic conversion to map joins and by enabling skew-join optimization.

Auto Map Join
Skew Joins
Enable Bucketed Map Joins

Auto Map Join:

- Auto map join is a useful feature when joining a big table with a small table.
- If we enable this feature, the small table is cached on each node and joined with the big table in the map phase, so the join needs no reduce phase.
- Enabling auto map join provides two advantages:
  - First, loading the small table into the cache saves read time on each data node.
  - Second, it avoids skew in the join, because the join is already done in the map phase for each block of data rather than by reducers that may receive uneven shares of the keys.
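For example, the following session settings turn the feature on. The table names big_orders and dim_country are hypothetical, and the size threshold shown is the usual default of about 25 MB; it is only a starting point and should be tuned to the memory available on the cluster.

```sql
-- Let Hive convert eligible joins into map joins automatically
SET hive.auto.convert.join=true;
-- Tables smaller than this many bytes are treated as the "small" side
SET hive.mapjoin.smalltable.filesize=25000000;

-- Hypothetical tables: big_orders (large) joined with dim_country (small lookup)
SELECT o.order_id, c.country_name
FROM big_orders o
JOIN dim_country c ON o.country_code = c.country_code;
```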

Skew joins:

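- We enable skew-join optimization by setting the hive.optimize.skewjoin property to true, either with a SET command in the Hive shell or in the hive-site.xml file.
- With this enabled, Hive processes join keys that have an unusually large number of rows separately, in a follow-up map join, so that a single reducer does not become a bottleneck.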
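A minimal sketch of the same setting typed in the Hive shell. hive.skewjoin.key sets how many rows a join key must have before Hive treats it as skewed; 100000 is the usual default and is shown only as a starting point.

```sql
-- Enable the runtime skew-join optimization for this session
SET hive.optimize.skewjoin=true;
-- A join key with more rows than this is treated as skewed
SET hive.skewjoin.key=100000;
```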

Enable Bucketed Map Joins

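- When the tables being joined are bucketed on the join column, a bucketed map join can be used to improve performance: each mapper loads only the matching buckets of the smaller table instead of the whole table.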
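A sketch of a bucketed map join, assuming two hypothetical tables bucketed on the same join column customer_id. The bucket counts of the two tables should be equal or multiples of each other. On Hive 1.x, hive.enforce.bucketing must also be set before loading data so that inserts respect the bucket definition; later releases always enforce bucketing and no longer recognize that property, so it is left commented out here.

```sql
-- Hypothetical tables, both bucketed on the join column customer_id
CREATE TABLE customers_bucketed (customer_id BIGINT, name STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

CREATE TABLE orders_bucketed (order_id BIGINT, customer_id BIGINT, amount DOUBLE)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Hive 1.x only, before inserting data:
-- SET hive.enforce.bucketing=true;

-- The join must run as a map join for the bucketed variant to apply
SET hive.auto.convert.join=true;
SET hive.optimize.bucketmapjoin=true;

SELECT o.order_id, c.name
FROM orders_bucketed o
JOIN customers_bucketed c ON o.customer_id = c.customer_id;
```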

3. Enable Parallel Execution

Hive converts a query into one or more stages, such as a MapReduce stage, a sampling stage, a merge stage or a limit stage.
By default, Hive executes these stages one at a time.
However, a particular job may consist of stages that do not depend on each other; these could be executed in parallel, possibly allowing the overall job to complete more quickly.
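A minimal sketch of enabling parallel stage execution. The two aggregations in the hypothetical query below read different tables (click_log and order_log) and do not depend on each other, so Hive can run their stages at the same time before the final join; whether it actually does so depends on the plan Hive generates. The thread count of 8 is the usual default.

```sql
-- Run independent stages of a query in parallel
SET hive.exec.parallel=true;
-- Maximum number of stages that may run at the same time
SET hive.exec.parallel.thread.number=8;

-- Hypothetical query: the two subqueries are independent of each other
SELECT a.dt, a.clicks, b.orders
FROM (SELECT dt, COUNT(*) AS clicks FROM click_log GROUP BY dt) a
JOIN (SELECT dt, COUNT(*) AS orders FROM order_log GROUP BY dt) b
  ON a.dt = b.dt;
```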

4. Single Reducer for Multi GROUP BY

With this optimization, Hive uses a single reducer for multiple GROUP BY operations: it combines the GROUP BY operations of a query into a single MapReduce job when they share the same grouping keys.
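The optimization applies to multi-insert queries, where one FROM clause feeds several INSERT ... SELECT ... GROUP BY branches. In this sketch the tables sales, sales_count_by_region and sales_total_by_region are hypothetical; because both branches group by the same key, Hive can evaluate them in a single MapReduce job. The property is usually on by default, so the SET line mainly documents the intent.

```sql
-- Combine GROUP BY branches that share grouping keys into one MapReduce job
SET hive.multigroupby.singlereducer=true;

-- One scan of sales feeds two aggregations on the same key
FROM sales s
INSERT OVERWRITE TABLE sales_count_by_region
  SELECT s.region, COUNT(*)
  GROUP BY s.region
INSERT OVERWRITE TABLE sales_total_by_region
  SELECT s.region, SUM(s.amount)
  GROUP BY s.region;
```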

5. Enable Vectorization

Vectorization was first introduced in Hive 0.13.
It speeds up operations such as scans, aggregations, filters and joins by processing rows in batches of 1024 at a time instead of one row at a time.
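A minimal sketch of turning vectorization on for a session. The table page_views_orc is hypothetical; it is stored as ORC because early Hive releases vectorize only ORC-backed tables. Reduce-side vectorization depends on the execution engine and Hive version, so that second setting may have no effect on older clusters.

```sql
-- Process rows in batches of 1024 instead of one at a time
SET hive.vectorized.execution.enabled=true;
-- Also vectorize the reduce side where the execution engine supports it
SET hive.vectorized.execution.reduce.enabled=true;

-- Hypothetical ORC table
CREATE TABLE page_views_orc (user_id BIGINT, url STRING, view_time TIMESTAMP)
STORED AS ORC;

-- Scans, filters and aggregations like these can then run vectorized
SELECT url, COUNT(*) AS views
FROM page_views_orc
WHERE user_id > 0
GROUP BY url;
```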

6. Enable Cost Based Optimization

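Cost-based optimization (CBO) lets Hive choose a plan based on the estimated cost of a query, which can lead to different decisions about how to order joins, which type of join to perform, and the degree of parallelism. CBO relies on table and column statistics to estimate these costs.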
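Because CBO depends on statistics, a minimal setup enables the optimizer and then collects statistics for the tables involved. The table name big_orders is hypothetical and assumed to be unpartitioned; partitioned tables need a PARTITION clause in the ANALYZE statements on some Hive versions.

```sql
-- Turn on the cost-based optimizer
SET hive.cbo.enable=true;
-- Let the planner use column statistics from the metastore
SET hive.stats.fetch.column.stats=true;

-- Gather the table-level and column-level statistics CBO relies on
ANALYZE TABLE big_orders COMPUTE STATISTICS;
ANALYZE TABLE big_orders COMPUTE STATISTICS FOR COLUMNS;
```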
