Tuesday, 6 August 2013

the executing process of common join in hive

the executing process of common join in hive

Suppose A join B on A.a=B.a, and both of them are big tables. Hive will
process this join operation through common join. The execution graph(given
by facebook):
But I'm confused by this graph, is there only on reducer?
In my understanding, the map output key is table_name_tag_prefix+join_key.
But in partition phase, it still uses the join_key to partition the
records. In reduce phase, each reducer reads the <join_key,value> which
have the same join key, the reducer needn't read all map splits.

No comments:

Post a Comment