Join Plan Strategies

Join Plan Strategies

The type of join plan used by a DBMS depends on indexes, table sizes, and selectivity. (Selectivity is the measure of the distinct values in an index; see Chapter 9, "Indexes.") Figure shows the strategies normally used by the Big Eight.

Join Plans Used by DBMSs
  Nested-Loop Sort-Merge Hash
IBM Yes Yes Yes (hybrid)
Informix Yes Yes (hybrid) Yes
Ingres Yes Yes Yes
InterBase Yes Yes No
Microsoft Yes Yes Yes (hybrid)
MySQL Yes No Yes
Oracle Yes Yes Yes
Sybase Yes Yes Yes

Nested-Loop Joins

"A coward dies a thousand times before his death. The valiant never taste of death but once."

—William Shakespeare, Julius Caesar

A nested-loop join is based on some variation of the nested-loop pseudocode shown in Listing 5-1. Let's call the table in the outer loop the "outer table" and the table in the inner loop the "inner table." This is straightforward terminology, and better than the terms "driven table" and "driving table"—the term "driving table" is synonymous with "outer table," but we find that confusing. While studying Listing 5-1, keep this in mind:

  • Usually the word "matches" means "equal to" because the huge majority of joins are equijoins (joins using the equals operator).

  • Usually the word "pass" means "add values from both rows to a temporary table," but quite often the DBMS only needs to gather the row identifiers (ROWIDs) for the matching rows.

-1 Nested-Loop Pseudocode

for (each row in Table1) {                  /* outer loop */

  for (each row in Table2) {                /* inner loop */

    if (Table1 join column matches Table2 join column) pass

    else fail



Just using the technique shown in Listing 5-1 results in a huge improvement because the temporary table that results isn't a one-million-row Cartesian join—the code filters before adding rows. But there are still a million comparisons to do, so we must get even more sophisticated.

Consider that the important factor now is not the number of comparisons—comparisons are relatively cheap—but the number of page reads.[1] When we think of the loop in terms of pages, it looks like the pseudocode shown in Listing 5-2.

[1] A page is a fixed-size (usually 4KB, 8KB, or 16KB) disk storage unit holding multiple rows; see Chapter 8, "Tables."

-2 Nested-Loop Page Reads

for (each page in Table1) {              /* outer loop */

  for (each page in Table2) {            /* inner loop */

    for (each row in Table1-page) {      /* cheap stuff */

      for (each row in Table2-page) {

        if (join column matches) pass

        else fail





And now, a first aha! moment. If there are 11 pages in the DBMS cache, and the inner table (Table2) has only 10 pages, then after the DBMS has iterated once through the inner loop, the Table2-page is in the cache when the second iteration begins. In that case, the total number of page reads is not the product of the tables' sizes, but merely the sum of the number of pages in both Table1 and Table2. Result: end of geometric explosions. We can learn two lessons from this:

Lesson for the DBMS maker

When two tables differ greatly in size, the smaller table should be the inner table because there's more chance to fit all the pages in the cache, and caching matters only for the inner table.

Lesson for you

Fitting more rows into small tables has a major effect on some join speeds. Concentrate on making the "small" table smaller.

Actually, in a sequential pass through a file, it's common to read multiple pages at a time. That doesn't affect the logic of what we're saying here at all, but you may as well know that people don't talk about "paged nested-loop joins." Instead they talk about "blocked nested-loop joins" because a group of pages is often called a "block."

If the cache is too small, the DBMS can look for an index to alleviate the problem. If an index is doing its job, then there's no need to scan every row in the inner table. It's enough to look up the value in the table's index. If the table has a three-layer index, of which two layers are unlikely to be in the cache, then we can estimate that the number of page reads will be:

(number of pages in outer table) + (number of pages in inner table * 2)

And from this, we learn two more lessons:

Lesson for the DBMS maker

The table with the good index should be the inner table.

Lesson for you

Having an index on one table can help a lot—but having indexes on both tables is not quite as important. With a pure nested-loop join plan, one of the tables will be the outer table and the DBMS can go through all its pages sequentially.

What's a Good Index?

In Chapter 9, "Indexes," we'll talk about several kinds of indexes. Generally, the index that gets the most points for a nested-loop inner table is a B-tree with good selectivity, preferably with few layers, and preferably the cluster key on a clustered index or else a primary-key index.

So far we've looked only at the general case where entire tables are joined. That's frequent for reports, but a query in a short transaction usually takes this pattern:

SELECT * FROM Table1, Table2

   WHERE Table1.column1 = Table2.column1

     AND Table1.column2 = <literal>

The AND Table1.column2 = <literal> expression in this example is known as a restrictive expression. If you apply a restrictive expression, the number of rows decreases (because not all rows in a table will be true for the expression). In contrast, WHERE Table1.column1 = Table2.column1 is a join expression. If you apply a join expression, the number of rows increases (because several rows in Table2 might match the ones in Table1). Because of this, it's important to ensure that you restrict before you join.

Given that, which should be the outer table in this example? Answer: Table1, because if Table1 is the inner table, the DBMS would have to apply the restriction every time a row comparison is made in the inner loop—which is still (outer * inner rows) times. Two more lessons can be learned from this:

Lesson for the DBMS maker

If a table has a restrictive expression on it, that table should be the outer table.

Lesson for you

Use restrictive expressions on joins whenever you can, because that will reduce the number of iterations of the outer loop.

There are some joins that can be replaced by an INTERSECT . . . CORRESPONDING clause, and that works well enough if both tables have restrictive clauses (which can make nested-loop joining difficult because the DBMS has to apply a restriction every time a row comparison is made in the inner loop as well as the outer loop). However, because many DBMSs don't support INTERSECT, we won't discuss this option further.

Instead, let's take another look now at the join expression, which so far we've seen only as Table1.column1 = Table2.column1. Joins, of course, can get a little more complicated than that. First of all, there might be an ANDed expression, as in:

Swim with the Current

A DBMS will choose inner and outer tables based on these criteria (in order by ascending importance):

  • The smaller table should be inner, if it's very small.

  • The table with the best index should be inner.

  • The table with a restrictive clause should be outer.

Don't fight it. If Table1 is smaller and has a good index, then put your restrictive clauses on Table2Table2 should be the outer table anyway.

A good example of this situation is the case where a primary key must be joined to a foreign key:


   FROM Primary_key_table, Foreign_key_table

   WHERE Primary_key_table.key = Foreign_key_table.key

A salesperson will have more than one sale; a teller will have more than one transaction; a product will have more than one stock; and a primary key will have more than one foreign key—not a reliable rule, but a good guess. Another good guess is that the primary key has a unique index on the joining columns, while the foreign key doesn't. Most often, then, the DBMS will pick Primary_key_table as the inner table.

Swim Against the Current

In a nested-loop join, Big_table should be the outer table. But if there's a restrictive expression on Small_table, then the DBMS will choose Small_table as the outer table.

Some IBM experts propose fixing this by putting an unnecessary restrictive expression on Big_table. For example:

SELECT * FROM Small_table, Big_table

  WHERE Small_table.column1 = Big_table.column1

    AND Small_table.column2 > 10    /* this is meaningful */

    AND Big_table.column2 > ''      /* makes Big_table outer */

This tip comes from a reliable source, but as we saw in Chapter 2, "Simple Searches," redundant conditions cause losses more often than gains.

SELECT * FROM Table1, Table2

   WHERE Table1.column1 = Table2.column1

     AND Table1.column2 = Table2.column2

In this case, it's not enough to ensure that column1 and column2 in the inner table are indexed. Instead, you need to make a compound index on (column1, column2) in the inner table (GAIN: 7/8 if compound index rather than two separate indexes). It's also important to make sure that the joined columns in Table1 and Table2 have exactly the same data type and the same size (GAIN: 7/8 if the data types are exactly the same). That's not a trivial efficiency matter—many DBMSs won't even use the index if there's a size mismatch.

What if the inner table is too big to fit in a cache and has no useful index? Or what if there's a restrictive clause on the inner table too? Well, then you're in trouble! The only bright spot in this situation is if you're using Microsoft or Sybase. These DBMSs will internally "create temporary table which is clustered by the join key" then "populate temporary table by selecting the join columns and using the restrictive clause." Other DBMSs, more prosaically, simply make a temporary index on the inner table. That might sound like a solution, but it's better to have a permanent index than a temporary index that gets created and dropped whenever the query comes along.

Another thing you can do if both tables are indexed is encourage the searches to occur in index order. For example, suppose you have this query:

SELECT * FROM Table1, Table2

   WHERE Table1.column1 = Table2.column1

For this example, assume that Table1.column1 contains these values: {15, 35, 5, 7}. If lookups of Table2.column1 occur in that order, then the disk heads are jumping around. But if you force the DBMS to search Table2. column1 in numeric order, that problem goes away. To do this, change the query to:

SELECT * FROM Table1, Table2

   WHERE Table1.column1 = Table2.column1

     AND Table1.column1 > 0

GAIN: 4/7


Don't do this for Informix; it shows a loss. The gain shown is for only seven DBMSs.

Remember that this works only if Table1.column1 is indexed and the index is used by the DBMS. The values found are now {5, 7, 15, 35}, and the lookups in the inner loop will be going forward in the index, which is good for caching and which makes read-aheads more useful.

A variation of this trick when the inner table is not indexed, is to encourage the searches to occur in ascending ROWID order. This can be done if ROWIDs are accessible to the programmer, but a DBMS can also do this sort of thing automatically.

Here are our final three lessons about nested-loops:

When your join contains an ANDed expression, make a compound index on the inner table's join columns.

Make sure that the joined columns in your inner table and outer table have exactly the same data type and the same size.

Search in index order.

The Bottom Line: Nested-Loop Join Plans

The choice of inner/outer tables depends on restricting expressions, then index qualities, then table sizes.

You can influence the choice or you can go with the flow. When conditions are good, nested-loops can produce result rows only two or three times more slowly than selections from a single unjoined table.

Nested-loops are the only strategy that works with weird conjunctions, such as LIKE joins.

Because they're flexible and easy to implement, nested-loop joins are supported by all DBMSs and are the default choice in most situations.

Typical OLTP queries will involve nested-loop joins 100% of the time, and even reporting queries will involve nested-loop joins more than 50% of the time.

IBM's Hybrid Join

When it comes to tricks with ROWID order in the inner table, IBM is currently the most advanced DBMS. Consider this SQL statement:

SELECT * FROM Customer Outer, Invoice Inner

   WHERE Outer.customer_name = Inner.customer_name

     AND Outer.customer_name = 'SMITH'

To resolve this query, IBM does the following:

  • Sort data so outer table access will be in the order of the join column on the inner table.

  • When accessing the inner table, get RID (the IBM name for ROWID) and attach it to the row from the outer table. This results in a temporary table with {outer table rows, inner table RID}.

  • Sort the temporary table by inner table RID.

The result is that IBM repeatedly scans the inner table—but in order by inner table RID rather than rows. This means there will be prefetches and quick access.

Sort-Merge Joins

A sort-merge join, sometimes known as a merge scan or simply merge join, is a tad more complex than a nested-loop, but we can still describe sort-merge joins in the few lines of pseudocode shown in Listing 5-3.

-3 Sort-Merge Pseudocode

sort (Table1)

sort (Table2)

get first row (Table1)

get first row (Table2)

for (;;until no more rows in tables) {        /* merge phase */

  if (join-column in Table1 < join-column in Table2)

    get next row (Table1)

  elseif (join-column in Table1 > join-column in Table2)

    get next row (Table2)

  elseif (join-column in Table1 = join-column in Table2) {


    get next row (Table1)

    get next row (Table2)



Note: There is an assumption here that no duplicates exist.

Duplicates make the sort-merge plan more complex and slow.

In the merge phase of a sort-merge join, the DBMS is always going forward. It never needs to get the same row twice, so this is a one-pass rather than a nested-loop—a big advantage. The downside is the cost of sorting. However, efficient sorts should be faster than comparisons of everything to everything. In fact it's provable that sort-merge is faster than nested-loop if both tables are large—just compare the number of page reads.

Even so, DBMSs will prefer nested-loops because they require less RAM, are more flexible, and have no startup overhead. So all we'll look at here is the ideal sort-merge query. It looks like this:

SELECT * FROM Table1, Table2

   WHERE Table1.column1 = Table2.column1

What makes this example ideal is that the comparison operator is equals and there are no restrictive expressions—there are only join expressions, which is typical, in fact, of a query you'd use to produce a report. For such queries, you might gain with sort-merge but be warned that there's extra hassle, which might involve your DBA. (For example, with Sybase you need to explicitly "enable" sort-merge, and with Oracle you must have a large value for SORT_AREA_SIZE in init.ora.)

A wonderful opportunity for the sort-merge occurs when Table1 and Table2 are already in order by the same key. That would be true in cases where column1 is the clustered key (that is, the column chosen to be the index key for a clustered index; see Chapter 9, "Indexes") in both tables. However, if only one table is indexed, then nested-loop can take advantage of that index just as readily as sort-merge can.

The Bottom Line: Sort-Merge Join Plans

Sort-merge is excellent when the query type is right, the RAM is sufficient, and the two tables are so similar that neither is an obvious driver.

If DBMSs would improve support of sort-merge by taking more advantage of existing indexes, sort-merges would be more important for OLTP work than they are currently.

Hash Joins

A hash join is another method for producing a joined table. Given two input tables Table1 and Table2, processing is as follows:

  • For each row in Table1, produce a hash. (A hash is a number—often a 32-bit integer—that is derived from column values using a lossy compression algorithm.) Assign the hash to a hash bucket.

  • For each row in Table2, produce a hash. Check if the hash is already in the hash bucket. If it is, there's a join. If it is not, there's no join.

Thus, a hash join is a type of nested-loop join in which the inner table rows are looked up via a hash algorithm instead of via an index. Most DBMSs don't keep permanent hash tables so here we're talking about a temporary hash table that the DBMS creates in RAM just before beginning the loop and destroys after the end of the loop. For a hash join to work properly, the following conditions should all be true:

  • Condition #1: There is enough RAM set aside so that the hash table fits entirely in memory. (Usually there isn't much RAM. Cloudscape will avoid hash joins if more than 1MB is needed. Oracle allows setting a fixed size with yet another of its init.ora parameters: HASH_AREA_SIZE.)

  • Condition #2: The join is an equijoin.

  • Condition #3: Inner table join-column values are unique and can be uniquely hashed. With long character strings, collisions and uncertainties could wreck the plan.

  • Condition #4: The inner table is searched many times because the outer table is large and contains many rows that are not filtered out by whatever restrictive expression exists on the outer table.

It might appear at first glance that Condition #1 is a showstopper when the inner table has more than, say, 100,000 rows. In fact, though, all DBMSs will perform a trick to keep from overflowing the hash area—they will read partitions of the table. Thus, for example, a DBMS with a 400,000-row table to search reads only 100,000 rows at a time, makes the hash list, does the comparisons, then clears the hash list for the next read. It has to perform the nested-loop four times instead of just once, but that's a lot better than bringing in so many rows at once that the hash table overflows and disk swapping occurs.

The important condition is Condition #4. An in-memory hash lookup is always faster than an index lookup, true—but only by one or two I/Os for each iteration. Take our earlier example of a query with a restrictive expression:

SELECT * FROM Table1, Table2

   WHERE Table1.column1 = Table2.column1

     AND Table1.column2 = <literal>

If the AND Table1.column2 = <literal> expression is true 20 times, and it takes 20 I/Os to read Table2, then you're at breakeven point. The cost of setting up the hash list would not be made up by the savings in the inner loop when there are so few iterations of the inner loop.

The Bottom Line: Hash Join Plans

Hash joins beat sort-merge joins beat nested-loop joins when the join is an equijoin with no restrictive expression and the tables are large—even when there are indexes that hash joins and sort-merge joins ignore.

On the other hand, when there is any sort of restrictive expression, nested-loops can do the filtering before the joining, and will therefore beat both hash joins and sort-merge joins. Sort-merge is best only if there's lots of RAM for a sort, or there's been a presort, or if an efficient sorting mechanism is on disk.

     Python   SQL   Java   php   Perl 
     game development   web development   internet   *nix   graphics   hardware 
     telecommunications   C++ 
     Flash   Active Directory   Windows