Removing Duplicate Rows with DISTINCT






Removing Duplicate Rows with DISTINCT

A SELECT clause can consist of a number of expressions preceded by the word DISTINCT. (See the definition at the beginning of this chapter.) When DISTINCT is specified, SQL removes duplicate rows from the intermediate result.

Figure. Find all the different town names from the PLAYERS table.

SELECT   TOWN
FROM     PLAYERS

The result is:

TOWN
---------
Stratford
Stratford
Stratford
Inglewood
Eltham
Midhurst
Stratford
Inglewood
Stratford
Stratford
Douglas
Stratford
Eltham
Plymouth

In this result table, the towns Stratford, Inglewood, and Eltham appear seven, two, and two times, respectively. If the statement is expanded to include DISTINCT:

SELECT   DISTINCT TOWN
FROM     PLAYERS

it produces the following result, in which all duplicate rows are removed:

TOWN
---------
Stratford
Midhurst
Inglewood
Plymouth
Douglas
Eltham

5. Get every existing combination of street and town names.

SELECT   STREET, TOWN
FROM     PLAYERS

The result is:

STREET          TOWN
-------------- ---------
Stoney Road     Stratford
Haseltine Lane  Stratford
Edgecombe Way   Stratford
Station Road    Inglewood
Long Drive      Eltham
Old Main Road   Midhurst
Eaton Square    Stratford
Lewis Street    Inglewood
Edgecombe Way   Stratford
Magdalene Road  Stratford
High Street     Douglas
Haseltine Lane  Stratford
Stout Street    Eltham
Vixen Road      Plymouth

This result also contains duplicate rows; for example, Edgecombe Way and Haseltine Lane in Stratford are each mentioned twice. When DISTINCT is added:

SELECT DISTINCT STREET, TOWN
FROM PLAYERS

the result is:

STREET          TOWN
--------------  ---------
Edgecombe Way   Stratford
Eaton Square    Stratford
Haseltine Lane  Stratford
High Street     Douglas
Lewis Street    Inglewood
Long Drive      Eltham
Magdalena Road  Stratford
Old Main Road   Midhurst
Station Road    Inglewood
Stoney Road     Stratford
Stout Street    Eltham
Vixen Road      Plymouth

DISTINCT, then, is concerned with the whole row, and not only with the expression that directly follows the word DISTINCT in the statement. In these two constructs, the use of DISTINCT is superfluous (but not forbidden):

  • When the SELECT clause includes at least one candidate key for each table specified in the FROM clause, DISTINCT is superfluous. The most important property of a candidate key is that the set of columns that forms the candidate key never allows duplicate values, so a table that has a candidate key never has duplicate rows. The inclusion of a candidate key in the SELECT clause offers a guarantee that no duplicate rows will appear in the end result.

  • When the table expression results in none or only one row with values, DISTINCT is superfluous. For equal rows, at least two rows are necessary. For example, if you are looking for players with a certain player number (WHERE PLAYERNO = 45), the statement results in one row if that player number exists, and otherwise no rows.

The user is allowed to specify the word ALL in the same position in the statement as where DISTINCT appears. Note that ALL actually has the opposite effect to DISTINCT and does not alter the result of a "normal" table expression. In other words, the results of the following two statements are equivalent:

SELECT   TOWN
FROM     PLAYERS

and

SELECT   ALL TOWN
FROM     PLAYERS

Exercise 9.1:

In which of the following statements is DISTINCT superfluous?

  1. SELECT   DISTINCT PLAYERNO
    FROM     TEAMS
    

  2. SELECT   DISTINCT PLAYERNO
    FROM     MATCHES
    WHERE    TEAMNO = 2
    

  3. SELECT   DISTINCT *
    FROM     PLAYERS
    WHERE    PLAYERNO = 100
    

  4. SELECT   DISTINCT M.PLAYERNO
    FROM     MATCHES AS M, PENALTIES AS PEN
    WHERE    M.PLAYERNO = PEN.PLAYERNO
    

  5. SELECT   DISTINCT PEN.PAYMENTNO
    FROM     MATCHES AS M, PENALTIES AS PEN
    WHERE    M.PLAYERNO = PEN.PLAYERNO
    

  6. SELECT   DISTINCT PEN.PAYMENTNO, M.TEAMNO,
             PEN.PLAYERNO
    FROM     MATCHES AS M, PENALTIES AS PEN
    WHERE    M.PLAYERNO = PEN.PLAYERNO
    



 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows