How Does an Index Work?






How Does an Index Work?

SQL has several methods of accessing rows in a table. The two best known are the sequential access method (also called scanning or browsing) and the indexed access method.

The sequential access method is best described as "browsing through a table row by row." Each row in a table is read. If only one row has to be found in a table with many rows, this method is, of course, very time-consuming and inefficient. It is comparable to going through a telephone book page by page. If you are looking for the number of someone whose name begins with an L, you certainly do not want to start looking under the letter A.

When SQL uses the indexed access method, it reads only the rows that exhibit the required characteristics. To do this, however, an index is necessary. An index is a type of alternative access to a table and can be compared with the index in a book.

An index in SQL is built like a tree consisting of a number of nodes. Figure is a pictorial representation of an index. Notice that this is a simplification of what an index tree really looks like. Nevertheless, the example is detailed enough to understand how SQL handles indexes. At the top of the figure (in the light gray area) is the index itself, and at the bottom are two columns of the PLAYERS table: PLAYERNO and NAME. The nodes of the index are represented by the long rectangles. The node at the top forms the starting point of the index and is known as the root. Each node contains up to three values from the PLAYERNO column. Each value in a node points to another node or to a row in the PLAYERS table, and each row in the table is referenced through at least one node. A node that points to a row is called a leaf page. The values in a node have been ordered. For each node, apart from the root, the values in that node are always less than or equal to the value that points to that node. Leaf pages are themselves linked to one another. A leaf page has a pointer to the leaf page with the next set of values. In Figure, we represent these pointers with open arrows.

2. Example of an index tree


What does a pointer really look like? A pointer is nothing more than a row identification. We introduced this concept in the previous section. Because a row identification consists of two parts, the same also applies to an index pointer: the page in which the row occurs and the entity of the list that indicates the location of the row within the page.

Broadly speaking, SQL supports three algorithms for using indexes. The first algorithm is for searching rows in which a particular value occurs. The second algorithm is for browsing through an entire table or a part of a table via an ordered column. Finally, the third algorithm is used if several values of a column must be retrieved. We illustrate these algorithms with three examples. The first example is of how SQL uses the index to select particular rows.

1. Imagine that all rows with player number 44 must be found.

Step 1.
Look for the root of the index. This root becomes the active node.

Step 2.
Is the active node a leaf page? If so, continue with step 4. If not, continue with step 3.

Step 3.
Does the active node contain the value 44? If so, the node to which this value points becomes the active node; go back to step 2. If not, choose the lowest value that is greater than 44 in the active node. The node to which this value points becomes the active node; go back to step 2.

Step 4.
Look for the value 44 in the active node. Now this value points to all pages in which rows of the PLAYERS table appear where the value of the PLAYERNO column is 44. Retrieve all these pages from the database for further processing.

Step 5.
Find for each page the row where the value PLAYERNO column is equal to 44.

Without browsing through all the rows, SQL has found the desired row(s). In most cases, the time spent answering this type of question can be reduced considerably if SQL can use an index.

In the next example, SQL uses the index to retrieve ordered rows from a table.

2. Get all players ordered by player number.

Step 1.
Look for the leaf page with the lowest value. This leaf page becomes the active node.

Step 2.
Retrieve all pages to which the values in the active node are pointing for further processing.

Step 3.
If there is a subsequent leaf page, make this the active node and continue with step 2.

The disadvantage of this method is that if players are retrieved from disk, there is a good chance that a page must be fetched several times. For example, the second page in Figure must be fetched first to retrieve player 2. Next, the first page is needed for player 6, then the third page for player 7, and finally the fourth page for player 8. So far, there is no problem. However, if player 27 is to be retrieved next, the second page must be retrieved from disk again. Meanwhile, many other pages have been fetched, and because of that, the odds are that the second page is no longer in internal memory and, therefore, cannot be read again. The conclusion is that because the rows were not ordered in the file, many pages must be fetched several times, and that does not exactly improve the processing time.

To speed up this process, most products support clustered indexes. Figure contains an example. With a clustered index, the sequence of the rows in the file is determined by the index, and this can improve the execution time for the sorting process considerably. If we now retrieve the players from the file in an ordered way, each page will be fetched only once. SQL understands that when player 6 is retrieved, the correct page is already in the internal memory; the retrieval of player 2 caused this. The same applies to player 7.

3. Example of a clustered index


Clustered indexes offer no additional advantages for direct access (the first algorithm) to rows. Working with this index form is recommended when you want to retrieve ordered rows often.

The third algorithm is a combination of the first two.

3. Get all players with number 39 up to and including 95.

Step 1.
Look for the root of the index. This root becomes the active node.

Step 2.
Is the active node a leaf page? If so, continue with step 4. If not, continue with step 3.

Step 3.
Does the active node contain the value 39? If so, the node to which this value points becomes the active node; go back to step 2. If not, choose the lowest value that is greater than 39 in the active node. The node to which this value points becomes the active node; go back to step 2.

Step 4.
Look for the value 39 in the active node.

Step 5.
In the active node, retrieve all rows that belong to the values between 39 and 95. If 95 appears in this node, you are ready. Otherwise, continue with the following step.

Step 6.
If there is a subsequent leaf page, make this the active node and continue with step 5.

This algorithm can be useful when a SELECT statement contains conditions in which, for example, BETWEEN, a greater than operator, or certain LIKE operators occur.

Here are some remarks concerning indexes:

  • If values in a table are updated, or if rows are added or deleted, SQL automatically updates the index. So, the index tree is always consistent with the contents of the table.

  • In the previous table, an index was defined on the PLAYERNO column of the PLAYERS table. This is the primary key of this table and contains no duplicate values. An index can also be defined on a nonunique column, such as the NAME column. The result of this is that one value in a leaf page points to multiple rowsone pointer for each row in which the value occurs.

  • It is possible to define many indexes on a table, but because a clustered index affects the way in which rows are stored, each table may contain only one clustered index.

  • Indexes can also be defined on combinations of values. Those are called composite indexes. Each value in a node is then a concatenation of the individual values. The leaf pages point to rows in which that combination of values appears.

Several other important observations can be made about the use of indexes. The two most important are these:

  • Nodes of an index are just like rows in a table, stored in files. Therefore, an index takes up physical storage space (just like an index in a book).

  • Updates to tables can lead to updates to indexes. When an index must be updated, SQL tries, where it can, to fill the gaps in the nodes to complete the process as quickly as possible; however, an index can become so "full" that new nodes must be added. This can necessitate a total reorganization of the index which can be very time-consuming.

Several types of indexes exist. In this section, we discussed what is called the B-tree index. The letter B stands for "balanced." A characteristic feature of a B-tree index is that all the branches of the tree have roughly the same length. Later in this chapter, we describe other types of index.

As we already mentioned, this section presents a very simplified picture of the workings of an index. In practice, for example, a node in an index tree can accommodate not just three, but many values. For a more detailed description of indexes, see [DATE95].



 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows