March 21, 2011, 9:32 p.m.
posted by telumel
Another reason for the appearance of unnecessarily wide tables (as in having too many attributes) is a lack of understanding of the true relationship between data items. Consider the example of subtypes . A company may have a mix of employees, some of whom are permanent, others who are contractors. They all have several properties in common (name, year of birth, department, room, phone number, and so forth), but there are also properties that are unique to each type of employee (for instance, hire date and salary for permanent employees, rate and contract reference for contractors). The manner in which the common attributes can be shared, while ensuring that the distinctive features are kept separate, introduces the topic of subtypes.
We can model this situation by defining three tables. First, the employee table contains all information that is common to every employee, regardless of their status. However, an attribute tells the status of each employee. It has as many distinct values as there are distinct employee types, for example "P" (for permanent employee), and "C" (for contract employee). This table uses an employee number as the primary key.
Next, we create additional tables, one for each employee type. In this case, there are two tables. Tables permanent and contract represent subtypes of the table employee, for example. Each permanent or contract employee inherits certain characteristics from the employee table, in addition to possessing unique characteristics, as defined in their own tables.
Now let's examine the creation of the primary keys between these two types of tables, as it's the primary key construct that implements the subtype relationships . The unique key for all tables is the unique identifier for each member of staffthe employee number. The set of primary keys of employee is the union of the primary keys of the various subtype tables, and the intersection of the primary keys of all subtype tables is by construction empty, because each employee belongs to just one, in this case, of the two categories. The primary keys of subtype tables are also foreign keys, referencing the primary key of employee.
Please note that assigning totally independent primary keys to the subtype tables would, of course, be a disastrous mistake. In the real world however, you will certainly find examples in which this disastrous mistake has been perpetrated. Note also that entity sub-types are not the same as master-detail relationships. They can quickly be distinguished on examination of their respective primary keys. For those who would think that this type of discussion is a bit academic (associating with the word "academic" some vague, slightly pejorative connotation), I'll just say that whenever different subtypes use a primary key that is not a subset of the primary key of the parent table, the result is almost invariably pathetic performance, from many points of view.
One of the main principles to follow in order to achieve efficient database access is a principle attributed to Philip II of Macedonia, father of Alexander the Great, and that principle is: Divide and Rule. It is quite likely that the vast majority of the queries executed by the HR department will belong to either of two categories: they will be either generic queries about all the people working in an organization or specific queries about one category of person. In both cases, by using subtypes correctly,[*] we will only need to examine that data which is most likely to provide the result that we require, and no time will be wasted examining irrelevant information. If we were to put everything into a single table, the most modest query would have to plow through a much greater quantity of data, most of which is useless in the context of that query.
Tables in which specific columns appear as null indicate the need for subtypes.