Lucid DB Join Implementation&Optimization
2009-09-22 10:50:48| 分类:
dbms
| 标签:
|举报
|字号大中小 订阅
JOIN combines rows from two relational expressions to produce the result relation. A relational expression could be either a table scan or the result of other joins and aggregations. Join Condition matches rows from its input relations, and Join Types decide how result rows are formed.
The conditions in ON clause is divided into two categories:
1). Join Condition (if the both input to the JOIN are referenced)
2). Filter Condition(if only one input is referenced)
ANSI join semantics require ON clause conditions be applied before forming the join result, and WHERE clause conditions applied after the join and on the result of the join.
Filter Pushdown
For a Filter Condition, which is ON clause or WHERE clause condition that only references one input, it is often advantageous to "push down" the filter so that it is evaluated in the input to reduce the input size. However, not every Filter Condition can be pushed into the inputs of any type of join. The rule is that ON clause Filter Conditions can be pushed into the input on the non-preserved side(or the side that generates NULLs), and WHERE clause Filter Conditions can be pushed into the input on the preserved side.
Filters not pushed down are transformed into Join Conditions and evaluated .
Equi-Join Conditions are those that can be expressed as an equality check between an expression referencing columns from a row in the left-hand-side input with one that references a row from the right-hand side.
Inner Non Equi-Joins can be implemented either as Nested Loop Join(NLJ) or Cartesian Product with Filter. If the join is Left or Right Outer Join, only NLJ can be used.
For more general types of Outer Joins, there are two candidate join methods: Index Join and Nested Loop Join(Because RHS is scanned for each row from LHS).
Nested Loop Join(NLJ) is feasible for Left/Right Outer Join with certain join conditions. NLJ can only decide if a row from one input does not match any row from the other input when the other input is scanned completely. This means that the NULL generating side needs to be the RHS and the preserved side needs to be the LHS.
This is the most straight forward join algorithm. For each tuple from the left input, the entire right input is scanned for matching tuples. In this form, Nested Loop Join(NLJ) has comparable performance as Cartesian Product followed by filters. However, if the join condition is sargable, an index on the inner table can improve the performance.
If the join condition is not sargable, i.e. no part of the condition can be evaluated by any index lookup, for example when the query contains OR conditions
http://pub.eigenbase.org/wiki/LucidDbJoinImplementation
http://pub.eigenbase.org/wiki/LucidDbJoinOptimization
评论这张
转发至微博
转发至微博
评论