Processing a SELECT Statement: The Steps






Processing a SELECT Statement: The Steps

Chapter 5, "SELECTStatement: Common Elements," described which clauses are executed successively during the processing of a SELECT statement. These clauses form a basic strategy for processing a statement. In a basic strategy, we assume sequential access to the data. This section discusses how the use of an index can change the basic strategy to an optimized strategy.

SQL tries to choose the most efficient strategy for processing each statement. This analysis is performed by a module within SQL, called the optimizer. (The analysis of statements is also referred to as query optimization.) The optimizer defines a number of alternative strategies for each statement. It estimates which strategy is likely to be the most efficient, based upon factors such as the expected execution time, the number of rows, and the presence of indexes. (In the absence of indexes, this can be the basic strategy.) SQL then executes the statement according to its chosen strategy.

Following are some examples to show what optimized processing strategies can look like.

Figure. Get all information about player 44. (We assume that there is an index defined on the PLAYERNO column.)

SELECT *
FROM   PLAYERS
WHERE  PLAYERNO = 44

The FROM clause: Usually, all rows would be retrieved from the PLAYERS table. Speeding up the processing by using an index means that only the rows in which the value in the PLAYERNO column is 44 are fetched.

The intermediate result is:

PLAYERNO NAME  ...
-------- ----- ---
      44 Baker ...

The WHERE clause: In this example, this clause was processed simultaneously with the FROM clause.

The SELECT clause: All columns are presented.

The difference between the basic strategy and this "optimized" strategy can be represented in another way.

The basic strategy is:

RESULT := [];
FOR EACH P IN PLAYERS DO
   IF P.PLAYERNO = 44 THEN
      RESULT :+ P;
ENDFOR;

The optimized strategy is:

RESULT := [];
FOR EACH P IN PLAYERS WHERE PLAYERNO = 44 DO
   RESULT :+ P;
ENDFOR;

With the first strategy, all rows are fetched by the FOR EACH statement. The second strategy works much more selectively. When an index is used, only those rows in which the player number is 44 are retrieved.

Figure. Get the player number and town of each player whose number is less than 10 and who lives in Stratford; order the result by player number.

SELECT   PLAYERNO, TOWN
FROM     PLAYERS
WHERE    PLAYERNO < 10
AND      TOWN = 'Stratford'
ORDER BY PLAYERNO

The FROM clause: Fetch all rows where the player number is less than 10. Again, use the index on the PLAYERNO column. Fetch the rows in ascending order, thus accounting for the ORDER BY clause. This is simple because the values in an index are always ordered.

The intermediate result is:

PLAYERNO ... TOWN      ...
-------- --- --------- ---
       2 ... Stratford ...
       6 ... Stratford ...
       7 ... Stratford ...
       8 ... Inglewood ...

The WHERE clause: The WHERE clause specifies two conditions. Each row in the intermediate result satisfies the first condition, which has already been evaluated in the FROM clause. Now, only the second condition must be evaluated.

The intermediate result is:

PLAYERNO ... TOWN      ...
-------- --- --------- ---
       2 ... Stratford ...
       6 ... Stratford ...
       7 ... Stratford ...

The SELECT clause: Two columns are selected.

The intermediate result is:

PLAYERNO TOWN
-------- ---------
       2 Stratford
       6 Stratford
       7 Stratford

The ORDER BY clause: Because of the use of an index during the processing of the FROM clause, no extra sorting needs to be done. The end result, then, is the same as the last intermediate result shown.

Next, we show the basic strategy and the optimized strategy for this example.

The basic strategy is:

RESULT := [];
FOR EACH P IN PLAYERS DO
   IF (P.PLAYERNO < 10)
   AND (P.TOWN = 'Stratford') THEN
      RESULT :+ P;
ENDFOR;

The optimized strategy is:

RESULT := [];
FOR EACH P IN PLAYERS WHERE PLAYERNO < 10 DO
   IF P.TOWN = 'Stratford' THEN
      RESULT :+ P;
ENDFOR;

6. Get the name and initials of each player who lives in the same town as player 44.

SELECT  NAME, INITIALS
FROM    PLAYERS
WHERE   TOWN =
       (SELECT   TOWN
        FROM     PLAYERS
        WHERE    PLAYERNO = 44)

Here are both strategies.

The basic strategy is:

RESULT := [];
FOR EACH P IN PLAYERS DO
   HELP := FALSE;
   FOR EACH P44 IN PLAYERS DO
      IF (P44.TOWN = P.TOWN)
      AND (P44.PLAYERNO = 44) THEN
         HELP := TRUE;
   ENDFOR;
   IF HELP = TRUE THEN
      RESULT :+ P;
ENDFOR;

The optimized strategy is:

RESULT := [];
FIND P44 IN PLAYERS WHERE PLAYERNO = 44;
FOR EACH P IN PLAYERS WHERE TOWN = P44.TOWN DO
   RESULT :+ P;
ENDFOR;

These were three relatively simple examples. As the statements become more complex, it also becomes more difficult for SQL to determine the optimal strategy. This, of course, also adds to the processing time. There is a noticeable quality difference among the optimizers of the various SQL products. Some SQL products have reasonably good optimizers, but others seldom find an optimal strategy and choose the basic strategy.

If you want to know more about the optimization of SELECT statements, see [KIM85]. However, you do not actually need this knowledge to understand SQL statements, which is why we have given only a summary of the topic.

Exercise 20.1:

For the following two statements, write the basic strategy and an optimized strategy; assume that there is an index defined on each column.

  1. SELECT   *
    FROM     TEAMS
    WHERE    TEAMNO > 1
    AND      DIVISION = 'second'
    

  2. SELECT   P.PLAYERNO
    FROM     PLAYERS AS P, MATCHES AS M
    WHERE    P.PLAYERNO = M.PLAYERNO
    AND      BIRTH_DATE > '1963-01-01'
    



 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows