What is a map join and a bucket join in Hive?

Answer: Both are Hive techniques for performing a join on the mapper side, which avoids the shuffle and reduce phase of a regular join.

Map join:

  • Map join is a lesser-known feature of Hive.
  • It loads a small table into memory so that a (very fast) join can be performed entirely within the mappers, without a reduce phase.
  • The hint /*+ MAPJOIN(aliasname) */ directs Hive to load aliasname (a table or an alias used in the query) into memory.
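The idea behind a map join can be sketched as a small Python simulation. This is not Hive's actual implementation: the small table is loaded into an in-memory hash table, and each simulated mapper streams its share of the large table and probes that hash table, so no reduce step is needed. The table contents and the function names (build_hash_table, map_task, map_join) are hypothetical.

```python
# Hypothetical data: the "small" table fits in memory; the large one is streamed.
small_table = [(1, "books"), (2, "music"), (3, "games")]  # (category_id, name)
large_table = [(101, 1), (102, 3), (103, 2), (104, 1), (105, 4)]  # (order_id, category_id)


def build_hash_table(rows):
    """Load the small table into an in-memory dict keyed on the join column."""
    table = {}
    for key, value in rows:
        table.setdefault(key, []).append(value)
    return table


def map_task(split, hash_table):
    """One simulated mapper: stream its split of the large table and probe the hash table."""
    for order_id, category_id in split:
        for name in hash_table.get(category_id, []):
            yield (order_id, category_id, name)


def map_join(large_rows, small_rows, num_mappers=2):
    # Every mapper uses the same full copy of the small table, so matching rows
    # never have to be shuffled to a reducer.
    hash_table = build_hash_table(small_rows)
    # Assign rows of the large table to mappers round-robin.
    splits = [large_rows[i::num_mappers] for i in range(num_mappers)]
    output = []
    for split in splits:
        output.extend(map_task(split, hash_table))
    return output


result = map_join(large_table, small_table)
print(sorted(result))
```

Order 105 has no matching category and is dropped, as in an inner join. The whole small table must fit in each mapper's memory, which is why this approach suits only small tables.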

Bucket join:

  • A bucket map join is used when the tables are large and every table in the join is bucketed on the join columns. In this type of join, the number of buckets in one table must be a multiple of the number of buckets in the other.
  • For example, if one table has 2 buckets, the other table must have 2 buckets or a multiple of 2 (4, 6, 8, and so on).
  • If this condition is satisfied, the join can be done on the mapper side only; otherwise a normal (reduce-side) join is performed.
  • Only the required buckets are fetched on the mapper side, not the complete table.
  • That is, each mapper receives only the matching buckets of the smaller tables, not the whole tables.
  • This improves query efficiency, because each mapper holds only a fraction of the smaller tables in memory.
  • In a bucket map join, the data does not need to be sorted.
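Why the multiple-of-buckets condition lets each mapper load only matching buckets can be shown with a minimal Python sketch. It assumes non-negative integer join keys bucketed by key % num_buckets, a simplified stand-in for Hive's bucketing hash. The data and the names bucketize and bucket_map_join are hypothetical. Let g be the smaller bucket count. Because g divides the larger count, a row in big-table bucket i can only match rows in small-table buckets j with j % g == i % g, so each simulated mapper builds its hash table from just those buckets.

```python
# Hypothetical data: (join_key, value) rows; keys are non-negative integers.
big_table_rows = [(1, "x1"), (2, "x2"), (3, "x3"), (5, "x5"), (6, "x6"), (6, "y6")]
small_table_rows = [(1, "a"), (2, "b"), (3, "c"), (6, "f")]


def bucketize(rows, num_buckets):
    """Split rows into buckets by key % num_buckets (simplified stand-in for Hive's hash)."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[row[0] % num_buckets].append(row)
    return buckets


def bucket_map_join(big_rows, small_rows, big_n, small_n):
    g = min(big_n, small_n)
    if max(big_n, small_n) % g != 0:
        # Bucket counts are not multiples: Hive would use a regular join instead.
        raise ValueError("bucket counts must be multiples of each other")
    big_buckets = bucketize(big_rows, big_n)
    small_buckets = bucketize(small_rows, small_n)
    output = []
    for i, big_bucket in enumerate(big_buckets):  # one simulated mapper per big bucket
        # g divides both counts, so a key in big bucket i satisfies key % g == i % g;
        # only small buckets j with j % g == i % g can hold matching keys.
        hash_table = {}
        for j, small_bucket in enumerate(small_buckets):
            if j % g == i % g:
                for key, value in small_bucket:
                    hash_table.setdefault(key, []).append(value)
        # Probe with the big bucket's rows; no sorting is required.
        for key, value in big_bucket:
            for small_value in hash_table.get(key, []):
                output.append((key, value, small_value))
    return output


print(sorted(bucket_map_join(big_table_rows, small_table_rows, 4, 2)))
```

With 4 buckets in the big table and 2 in the small table, each mapper loads exactly one small bucket. With 3 and 2 buckets the function raises instead, which corresponds to the case where Hive would fall back to a regular join.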