Linking Tables via Row Identifications






Linking Tables via Row Identifications

In OO databases, all the rows (or their equivalents) have a unique identification. The user does not generate this identification with, for example, an INSERT statement, but it is instead generated by the system itself. These identifications are often called row identifications, object identifiers, or surrogate keys. These unique row identifications can be used to link rows and to have rows refer to each other.

SQL also has adopted this concept. Here, a unique identification is assigned to each row. The row identifications are of no value to the users, but only to the system itself. Although they can be requested and printed, they bear no information. If a row receives an identification, it belongs to that row forever. If the row is removed, the matching identification will never be reused. Note that unique row identifications are not the same as primary keys (even though they do show a resemblance). Later in this chapter, we explain the differences.

Row identifications are stored together with the row, but it is not necessary to define columns for them explicitly. These column values are generated automatically. One could say that each table has a hidden column in which the row identifications are stored.

The row identification (or the value of the hidden column) can be requested with the REF function.

4. Get the Row Identification of the Player with number 6.

SELECT   REF(PLAYERS)
FROM     PLAYERS
WHERE    PLAYERNO = 6

The result is:

REF(PLAYERS)
--------------------------------------------------
000028020915A58C5FAEC1502EE034080009D0DADE15538856

Explanation The REF function has the name of a table as its only parameter and returns the row identification. How row identifications actually look (on disk) depends on the product. As an example, a possible Oracle row identification is displayed.

As said, row identifications can be used to "link" rows. The identification of one row is stored within another row. In other words, one row refers or points to another.

5. Define the tables of the tennis club again, but use row identifications this time.

CREATE   TABLE PLAYERS
        (PLAYERNO    INTEGER PRIMARY KEY,
         NAME        CHAR(15) NOT NULL,
         :           :
         LEAGUENO    CHAR(4))

CREATE   TABLE TEAMS
        (TEAMNO      INTEGER PRIMARY KEY,
         PLAYER      REF(PLAYERS) NOT NULL,
         DIVISION    CHAR(6) NOT NULL)

CREATE   TABLE MATCHES
        (MATCHNO     INTEGER PRIMARY KEY,
         TEAM        REF(TEAMS) NOT NULL,
         PLAYER      REF(PLAYERS) NOT NULL,
         WON         SMALLINT NOT NULL,
         LOST        SMALLINT NOT NULL)

CREATE   TABLE PENALTIES
        (PAYMENTNO   INTEGER PRIMARY KEY,
         PLAYER      REF(PLAYERS) NOT NULL,
         PAY_DATE    DATE NOT NULL,
         AMOUNT      DECIMAL(7,2) NOT NULL)

CREATE   TABLE COMMITTEE_MEMBERS
        (PLAYER       REF(PLAYERS) PRIMARY KEY,
         BEGIN_DATE   DATE NOT NULL,
         END_DATE     DATE,
         POSITION     CHAR(20))

Explanation: Wherever a foreign key occurred, a column is now specified that points to another table. These are called reference columns. The link that is created between, for example, the PLAYERS and TEAMS tables could now be represented as in Figure.

1. Reference columns


Reference columns must be filled with row identifications. The INSERT and UPDATE statements have been extended for this purpose.

6. Add a new team. The captain of this team is player 112.

INSERT   INTO TEAMS (TEAMNO, PLAYER, DIVISION)
VALUES   (3, (SELECT REF(PLAYERS)
              FROM   PLAYERS
              WHERE  PLAYERNO = 112), 'first')

Explanation: The SELECT statement retrieves the row identification of player 6 and then stores it in the PLAYER column.

7. The captain of team 1 is no longer player 6, but player 44.

UPDATE   TEAMS
SET      PLAYER = (SELECT REF(PLAYERS)
                   FROM   PLAYERS
                   WHERE  PLAYERNO = 44)
WHERE    TEAMNO = 1

The linking of tables with row identifications has a great influence on the way in which joins can be formulated. It becomes much easier to formulate most joins.

8. Find, for each team, the team number and the name of the captain.

SELECT   TEAMNO, PLAYER.NAME
FROM     TEAMS

Explanation: For each row in the TEAMS table, two values are printed: the value of the TEAMNO column and the value of the expression PLAYER.NAME. This is an expression that we have not discussed yet. Let us deal with it in more detail.

For the sake of convenience, we call this a reference expression. A reference expression always begins with a reference column. In this case, it is the column PLAYER. This column refers to the PLAYERS table. After the reference column is the NAME column from that table. The final result is that the name of the player, who is the captain of the team, is printed.

It looks as if this statement does not execute a join at all, but the join is hidden in the reference expression PLAYER.NAME. For each team, SQL finds the row identification of the player (the captain). This identification is stored in the column PLAYER. Next, SQL looks for the row with this identification in the hidden column of the PLAYERS table. If it is found, the value in the NAME column is retrieved. In other words, this statement does not specify a classical join indeed. Behind the scenes, the previous SELECT statement is converted to the following:

SELECT   TEAMS.TEAMNO, PLAYERS.NAME
FROM     TEAMS, PLAYERS
WHERE    TEAMS.PLAYER = REF(PLAYERS)

First, the table to which the PLAYER column refers is added to the FROM clause. Then, a join condition is added to the statement. In this join condition, TEAMS.PLAYER is compared to the REF of the table to which the column refers.

Two aspects should be taken into account. First, a reference column can contain a NULL value, of course. In that case, no join will be executed on the other table, and the value of the reference expression is then NULL. Second, the reference column can contain a row identification that does not occur in the other table. Imagine that the row identification in the PLAYER column of the TEAMS table does not occur in the PLAYERS table. In that case, that team would not occur in the result when an inner join is executed. However, for this type of expression, an outer join is always executed. So actually, the previous statement is not executed, but the following one is:

SELECT   TEAMS.TEAMNO, PLAYERS.NAME
FROM     TEAMS LEFT OUTER JOIN PLAYERS
         ON (TEAMS.PLAYER = REF(PLAYERS))

9. For each match played by someone from Eltham and for a team from the first division, find the match number, the name of the player, and the name of the captain of the team.

SELECT   MATCHNO, PLAYER.NAME, TEAM.PLAYER.NAME
FROM     MATCHES
WHERE    PLAYER.TOWN = 'Eltham'
AND      TEAM.DIVISION = 'first'

Explanation: The statement contains three reference expressions: PLAYER.NAME, TEAM.PLAYER.NAME, and PLAYER.TOWN. The first and last have well-known forms, a reference column followed by a "normal" column. However, the second expression has a new form. Here, the reference column TEAM is followed by another reference column, PLAYER, which is followed by NAME. This expression must be read as: Get, for each row concerned, the NAME of the PLAYER who is captain of the TEAM. In fact, this reference expression replaces a join specification of the MATCHES table by that of TEAMS first and then that of PLAYERS.

No restriction governs the length of reference expressions. The only restriction is that the last column cannot be a reference column.

10. Create two tables with employee and department data.

CREATE   TABLE EMPLOYEES
        (EMPLOYEENO    INTEGER PRIMARY KEY,
         NAME          CHAR(15) NOT NULL,
         DEPARTMENT    REF(DEPARTMENTS))

CREATE   TABLE DEPARTMENTS
        (DEPARTMENTNO  INTEGER PRIMARY KEY,
         NAME          CHAR(15) NOT NULL,
         BOSS          REF(EMPLOYEES))

The following statement is now valid:

SELECT   DEPARTMENTNO, BOSS.DEPARTMENT.BOSS.NAME
FROM     DEPARTMENTS

Explanation: For each department, we want to know the name of the boss of the department and where the boss of each department works.

Reference columns can also refer to the table of which they are a part.

Figure. Create the PLAYERS table with the new columns FATHER and MOTHER. These two columns are used if the father and mother are also members of the tennis club.

CREATE   TABLE PLAYERS
        (PLAYERNO   INTEGER PRIMARY KEY,
         NAME       CHAR(15) NOT NULL,
         FATHER     REF(PLAYERS),
         MOTHER     REF(PLAYERS),
         :          :
         LEAGUENO   CHAR(4))

12. For each player whose mother also plays for the tennis club, get the player number and the name of the father.

SELECT   PLAYERNO, FATHER.NAME
FROM     PLAYERS
WHERE    MOTHER IS NOT NULL

13. Get the player number of each player whose grandfather also plays for the tennis club.

SELECT   PLAYERNO
FROM     PLAYERS
WHERE    MOTHER.FATHER IS NOT NULL
OR       FATHER.FATHER IS NOT NULL

Using references has a number of advantages.

  • Advantage 1: It is not possible to make mistakes when assigning a data type to a foreign key. The data type of a foreign key must always be equal to that of the primary key. This cannot go wrong because for a reference column, only the table name is specified.

  • Advantage 2: Some primary keys are very wide with regard to the number of columns and the number of bytes. The effect is that the foreign keys (that refer to it) are also wide and take up a lot of storage space. When you work with reference columns, only the row identification is stored. This could be smaller and, thus, saves storage space.

  • Advantage 3: Primary keys can be changed. If that happens, the foreign keys should also be adjusted; see Chapter 16, "Specifying Integrity Constraints." This slows the update, of course. However, this does not apply to references because, first, the row identifications (the hidden columns) cannot be changed. Second, there will never be additional changes in the other tables as a result of this. When you want to change one value in the primary key, you can change only that value.

  • Certain SELECT statements become easier to formulate; see Examples 34.8 and 34.9.

However, the use of references also has a number of disadvantages:

  • Disadvantage 1 Certain update statements become more difficult to formulate; see Examples 34.6 and 34.7.

  • Disadvantage 2 With respect to linking tables, the reference offers only one-way traffic. It is now easy to retrieve data about players for each match, but not the other way round. We illustrate this with an example.

    14. For each player, find the number and the numbers of his or her matches.

    SELECT   P.PLAYERNO, M.MATCHNO
    FROM     PLAYERS AS P, MATCHES AS M
    WHERE    REF(P) = M.PLAYER
    

  • Disadvantage 3 Designing databases also becomes more difficult. Initially, there was only one method to define relationships between two tables, but now there are two. The question is, then, which of the two should you use, and when? And must you use the same method everywhere, or does it depend on the situation? If you do not use the same method everywhere, users will have to pay close attention when they formulate their SQL statements. The following always applies with respect to database design: The more choices, the more difficult database design becomes.

  • Disadvantage 4 A reference column is not the same as a foreign key. The population of a foreign key is always a subset of that of a primary key, but this does not apply to reference columns. For example, if a player is removed from the PLAYERS table, the row identifications occurring in the other tables are not removed as well. The impact is that the MATCHES table will contain so-called dangling references. Therefore, reference columns cannot enforce the integrity of data in the way foreign keys can.



 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows