Apache Hive - Hive Performance - Hive Optimizations




Hive Performance - Hive Optimizations

Architecting Hive Data :

[Figure: Architecting Hive Data]


Column Pruning :

  • As the name suggests, column pruning discards columns that the query does not need, so only the relevant columns are read.
  • Example: SELECT a,b FROM T WHERE e < 10;
    • T contains 5 columns (a,b,c,d,e)
    • Columns c and d are discarded; a and b are returned, and e is needed for the filter
  • Enabled by default:
    • hive.optimize.cp = true
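
One way to check column pruning is to look at the query plan. The sketch below assumes a table T with five integer columns (the column types are an assumption; the example above gives only the names) and uses EXPLAIN, which prints the plan without running the query.

```sql
-- Hypothetical definition of T. Only the column names come from the example above
CREATE TABLE T (a INT, b INT, c INT, d INT, e INT);

-- Name only the columns the query needs. SELECT * would carry all five columns along
SELECT a, b FROM T WHERE e < 10;

-- Print the plan without running the query, then check which columns the
-- Select Operator lists
EXPLAIN SELECT a, b FROM T WHERE e < 10;
```

With a columnar format such as ORC, pruned columns do not have to be read from disk at all. With row-oriented text files the whole row is still read, and the unused columns are dropped right after the scan.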

Predicate Pushdown :

  • Move predicates as close to the table scan as possible, so that rows are filtered out early.
  • Enabled by default:
    • hive.optimize.ppd = true
  • Predicates are moved across joins. In both queries below, the filter T1.c1 < 10 can be applied to T1 before the join:
          SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
          SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2) WHERE T1.c1 < 10
  • Special rules for outer joins:
    • Left outer join: predicates on the left side aliases are pushed
    • Right outer join: predicates on the right side aliases are pushed
    • Full outer join: none of the predicates are pushed
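
The pushdown rules above can be explored with EXPLAIN. This is a minimal sketch, assuming two single-column tables T1 and T2 (the column types are an assumption). In each plan, look at where the Filter Operator sits relative to the TableScan and Join operators.

```sql
-- Hypothetical tables. Only the names T1.c1 and T2.c2 come from the queries above
CREATE TABLE T1 (c1 INT);
CREATE TABLE T2 (c2 INT);

SET hive.optimize.ppd=true;

-- Inner join: the filter on T1 is expected to appear directly after the scan of T1
EXPLAIN
SELECT * FROM T1 JOIN T2 ON (T1.c1 = T2.c2)
WHERE T1.c1 < 10;

-- Left outer join with a predicate on the left side alias: eligible for pushdown
EXPLAIN
SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1 = T2.c2)
WHERE T1.c1 < 10;

-- Left outer join with a predicate on the right side alias in WHERE:
-- evaluated after the join, so it also removes the NULL-padded rows
EXPLAIN
SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1 = T2.c2)
WHERE T2.c2 < 10;
```

Running the same statements after SET hive.optimize.ppd=false gives a plan to compare against.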

Partition Pruning :

  • Reduce the list of partitions to be scanned.
  • Currently works on the parse tree.
  • Example with a predicate on prtn, the partition column:
          SELECT * FROM
          (SELECT c1, COUNT(1) FROM T GROUP BY c1) subq
          WHERE subq.prtn = 100;

  • A second example, where the partition predicate applies to a joined subquery:
          SELECT * FROM T1 JOIN
          (SELECT * FROM T2) subq ON (T1.c1=subq.c2)
          WHERE subq.prtn = 100;
  • hive.mapred.mode = nonstrict allows queries that scan every partition.
  • In strict mode, a query that scans a complete partitioned table fails.
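
A minimal sketch of partition pruning and strict mode follows. The table name T_part and its data columns are hypothetical; prtn is used as the partition column, as in the queries above. Exact behavior of the strict-mode check may differ between Hive versions.

```sql
-- Hypothetical partitioned table. Each value of prtn gets its own directory
CREATE TABLE T_part (c1 INT, c2 INT) PARTITIONED BY (prtn INT);

-- The filter on the partition column limits the scan to the prtn=100 partition
SELECT c1, COUNT(1) FROM T_part WHERE prtn = 100 GROUP BY c1;

-- List the tables and partitions the query would read, without running it
EXPLAIN DEPENDENCY
SELECT c1, COUNT(1) FROM T_part WHERE prtn = 100 GROUP BY c1;

-- In strict mode, a query on a partitioned table with no partition filter is rejected
SET hive.mapred.mode=strict;
SELECT c1, COUNT(1) FROM T_part GROUP BY c1;

-- Switch back to allow full scans again
SET hive.mapred.mode=nonstrict;
```

Placing the partition predicate directly on the base table, as in the first SELECT, keeps the intent clear and does not rely on the optimizer moving it through a subquery.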

Tips to Make Hive Fast on Hadoop - Data Layout Considerations for Fast Hive :

  • Skipping data:
    • Divide data among different files which can be pruned out: partitions, buckets and skews.
    • Skip records during scans using small embedded indexes. This is automatic when you use the ORCFile format.
    • Sort data ahead of time. This simplifies joins and makes skipping more effective.
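
The layout ideas above can be combined in one table definition. This is a sketch only: the table name sales, its columns, and the bucket count of 32 are hypothetical choices for illustration, and skewed tables are not shown.

```sql
-- Hypothetical table. One directory per sale_date (partition pruning),
-- rows hashed on customer_id into 32 files and sorted within each file,
-- stored as ORC so that lightweight min/max indexes are written with the data
CREATE TABLE sales (
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (sale_date STRING)
CLUSTERED BY (customer_id) SORTED BY (customer_id ASC) INTO 32 BUCKETS
STORED AS ORC;

-- Allow the reader to use the ORC indexes to skip data that cannot match
SET hive.optimize.index.filter=true;

-- sale_date prunes partitions, and the customer_id filter lets the ORC reader
-- skip blocks whose min/max range excludes 42
SELECT SUM(amount)
FROM sales
WHERE sale_date = '2015-01-01'
  AND customer_id = 42;
```

Sorting on the column that is filtered most often keeps the min/max ranges of each block narrow, which is what makes skipping effective.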
    [Figure: Hive performance]

    [Figure: Hive with no compression]

    [Figure: column sorting to facilitate skipping]

