Symbol-Table Abstract Data Type





Symbol-Table Abstract Data Type

As with priority queues, we think of search algorithms as belonging to interfaces declaring a variety of generic operations that can be separated from particular implementations so that we can easily substitute alternative implementations. The operations of interest include

  • Insert a new item.

  • Search for an item (or items) having a given key.

  • Remove a specified item.

  • Select the kth largest item in a symbol table.

  • Sort the symbol table (show all the items in order of their keys).

  • Join two symbol tables.

As we do with many data types, we also need to add a construct operation (constructor), a test if empty operation, and perhaps a copy (clone) operation to this set. In addition, we often consider other practical modifications of the basic interface. For example, a search-and-insert operation is an attractive alternative for implementations where the search for a key, even if unsuccessful, nevertheless gives precisely the information needed to insert a new item with that key.

Program 12.1 ADT for symbol-table items

These ADT interfaces illustrate how to define symbol-table items, similar to the way that we defined items to be sorted in Section 6.2. Here, to emphasize the distinction between items and their keys, we define separately the key abstraction.

Our symbol-table implementations are KEY clients that use the methods equals and less to compare keys and also ITEM clients that use the method key to access keys in items, so they work with any implementations of ITEM and KEY (see text). The methods read, rand and toString are for use by clients.

class myItem implements ITEM // ADT interface 
  { // implementations and private members hidden 
    public KEY key() 
    void read() 
    void rand() 
    public String toString() 
  } 
class myKey implements KEY  // ADT interface 
  { // implementations and private members hidden 
    public boolean less(myKey) 
    public boolean equals(myKey) 
    void read() 
    void rand() 
    public String toString() 
  } 

We commonly use the term "search algorithm" to mean "symbol-table ADT implementation," although the latter more properly implies defining and building an underlying data structure for the symbol table and implementing ADT operations in addition to search. Symbol tables are so important to so many computer applications that they are available as high-level abstractions in many programming environments. Java has the Dictionary class as a standard utility, and the Hashtable class that extends it, using the approach that we shall cover in Chapter 14. As usual, it is difficult for a general-purpose implementation to meet the demanding performance needs of diverse applications. Our study of many of the ingenious methods that have been developed to implement the symbol-table abstraction will set a context to help us understand the characteristics of prepackaged implementations and to help us decide when to develop an implementation that is tailored to a particular application.

Program 12.2 Symbol-table key implementation example

This class implements the interface of Program 12.1 for records whose keys have integer values. It uses a constant M to specify an upper bound on key values, whose value is application-dependent and is omitted.

class myKey implements KEY 
  { 
    private int val; 
    public boolean less(KEY w) 
      { return val < ((myKey) w).val; } 
    public boolean equals(KEY w) 
      { return val == ((myKey) w).val; } 
    public void read() 
      { val = In.getInt(); } 
    public void rand() 
      { val = (int) (M * Math.random()); } 
    public String toString() 
      { return val + ""; } 
  } 

As we did with sorting, we will consider the methods without specifying the types of the items being processed, in the same manner that we discussed in detail in Section 6.2. But to emphasize the separate roles played by items and keys in search, we modify the approach that we used in Chapters 7 through 11 to define the item and key abstractions separately. For example, the myItem and myKey ADTs shown in Program 12.1 define the basic abstract operations that we want to perform. Using ADTs like these gives us the flexibility to implement and test various symbol-table implementations on various types of items and keys. The methods rand, read, and toString in Program 12.1 are for use by symbol-table clients, and the methods key, less, and equals are for use by symbol-table implementations.

Program 12.3 Symbol-table item implementation example

This class implements the myItem interface of Program 12.1 for records whose associated information is a floating-point number. The type of the key is determined by the implementation of myKey (see, for example, Program 12.2).

class myItem implements ITEM 
  { 
    private myKey val; 
    private float info; 
    myItem() 
      { val = new myKey(); } 
    public KEY key() { return val; } 
    void read() 
      { val.read(); info = In.getFloat(); } 
    void rand() 
      { val.rand(); info = (float) Math.random(); } 
    public String toString() 
      { return "(" + key() + " " + info + ")"; } 
  } 

Since symbol-table clients and symbol-table implementations both need to use classes like myItem and myKey, it is useful to distinguish their needs, we use the Java interface mechanism in the same way as we did in Section 6.2. Specifically, we define the interface

interface ITEM 
  { KEY key(); } 

so that search implementations can use ITEM types and the key method to access keys; and we define the interface

interface KEY 
  { 
    boolean less(KEY v); 
    boolean equals(KEY v); 
  } 

so that search implementations can use KEY types and the methods less and equals to compare them. Indeed, our symbol-table implementations only access items and keys through these methods—to use one of them, you only need to define appropriate classes that implement ITEM and KEY.

Program 12.2 is an example implementation for integer keys: it uses a constant M to specify the largest key value; in practice we might choose a more complicated interface to allow clients to specify this value. If, for example, keys are 9-digit social security numbers, we might use M = 109. Program 12.3 is an example implementation for items that can associate any type of key with a floating-point number. We might also use an object type instead of a primitive type for the associated information. Developing implementations like these for specific types of items and keys needed in practice is straightforward. For example, we could upgrade the item data type implementations for records and strings from Section 6.2 in order to use myKey in myItem for the key type, implement key() and change accesses to the key field to invoke key() instead, and implement a myKey class with appropriate less and equals methods.

We define less and equals as separate methods because several basic algorithms are naturally expressed in terms of these two separate primitives. Some of our implementations invoke less and equals successively for the same pair of keys, which might be wasteful if the comparisons are costly to perform. In such cases, it is worthwhile to switch to a three-way comparison method that returns -1 if the first key is less than the second, 0 if the two keys are equal, and 1 if the first key is greater than the second. Some other implementations do not use less or equals at all: for example, the first symbol-table implementation that we consider, in Section 12.2, uses integer keys as array indices but never explicitly compares them. In Chapters 14 and 15, our search algorithms are based on extracting pieces of keys using the basic radix operations that we used in Chapter 10. In all such cases, we omit or modify KEY and ITEM as appropriate.

As usual, defining classes for each abstraction leads to extra levels of indirection, so we might wish to avoid using the KEY interface or the myKey type, by replacing KEY in our code with the type name, as illustrated in Program 12.4. For primitive types, we can use the built-in operators < and == to compare keys. As for sorting algorithms, our implementations generally are written in terms of two-parameter static methods less and equals to compare two keys; this convention makes it easy to use either primitive or class types for keys. In some ap-plications, we may wish to make further adjustments to accommodate items that are primitive types.

Program 12.4 Symbol-table item with integer keys

When we want to use keys that are of a primitive type, we replace KEY in our code by the typename to avoid the use of an extra level of referencing, as illustrated in this implementation.

class intkeyItem 
  { 
    private int val; 
    private float info; 
    public int key() { return val; } 
    void read() 
      { val = In.getInt(); info = In.getFloat(); } 
    void rand() 
      { val = (int) (M * Math.random()); 
        info = (float) Math.random(); } 
    public String toString() 
      { return "(" + key() + " " + info + ")"; } 
  } 

Program 12.5 is an interface that defines the basic symbol-table operations (except join), in terms of the item and key abstractions that we just discussed. We shall use this interface between client programs and all the search implementations in this and the next several chapters.

We could define a version of the interface in Program 12.5 to manipulate handles (Object references) to items in a manner similar to Program 9.8 (see Exercise 12.7). In principle, the use of handles should obviate the need to search before removing, and so can admit faster algorithms. In practice, typical implementations do not retain sufficient structure to support efficient removal. For example, some implementations put items on linked lists—the lists would need to be doubly linked to support removal by reference. To avoid unnecessarily complicated code, we use removal by key and leave removal-by-reference implementations for exercises. The interface does not specify how we determine which item to remove, when duplicate keys are present. One reasonable interpretation would be "remove all items with the given key." Instead, most of our implementations use the interpretation "remove any item with the given key," with an implied search.

Program 12.5 Symbol-table ADT

This interface defines operations for a simple symbol table: initialize, return the item count, find an item with a given key, add a new item, remove an item with a given key, select the kth smallest item, and compute a string representation of the list of items in the table.

class ST // ADT interface 
  { // implementations and private members hidden 
    ST(int) 
    int count() 
    void insert(ITEM) 
    ITEM search(KEY) 
    void remove(KEY) 
    ITEM select(int) 
    public String toString() 
  } 

Some algorithms do not assume any implied ordering among the keys and therefore use only equals (and not less) to compare keys, but many of the symbol-table implementations use the ordering relationship among keys implied by less to structure the data and to guide the search. Also, the select and sort abstract operations explicitly refer to key order. The sort operation is packaged as a method that sends all the items in order to the output stream, without necessarily rearranging them. We can easily generalize sort implementations to make a method that visits the items in order of their keys, perhaps applying a method in an object passed as a parameter to each. Typically, we include toString implementations for symbol tables when they show the contents of the symbol table in sorted order (implement sort). Algorithms that do not use less do not require that keys be comparable to one another, and they do not necessarily support efficient implementations of select and sort, so we omit implementatons of select and toString in such cases.

As already noted with regard to the remove operation, the possibility of items with duplicate keys needs special consideration in symbol-table implementations. Some applications disallow duplicate keys so that keys can be used as handles. An example of this situation is the use of social security numbers as keys in personnel files. Other applications may involve numerous items with duplicate keys: for example, a bank may need to search its database for all transactions involving a particular customer. We can handle items with duplicate keys in one of several ways. One approach is to insist that the primary search data structure contain only items with distinct keys, and to maintain, for each key, a link to a list of items with duplicate keys. That is, we use items that contain a key and a link in our primary data structures and do not have items with duplicate keys. This arrangement is convenient in some applications, since all the items with a given search key are returned with one search or can be removed with one remove. From the point of view of the implementation, this arrangement is equivalent to leaving duplicate-key management to the client. The Java Dictionary interface uses this convention.

A second possibility is to leave items with equal keys in the primary search data structure and to return any item with the given key for a search. This convention is simpler for applications that process one item at a time, where the order in which items with duplicate keys are processed is not important. It may be inconvenient in terms of the algorithm design, because the interface might have to be extended to include a mechanism to retrieve all items with a given key or to call a specified method for each item with the given key.

A third possibility is to assume that each item has a unique identifier (apart from the key) and to require that a search find the item with a given identifier, given the key. Or, a more complicated mechanism might be necessary. These considerations apply to all the symbol-table operations in the presence of duplicate keys. Do we want to remove all items with the given key, or any item with the key, or a specific item (which requires an implementation that provides handles to items)? When describing symbol-table implementations, we indicate informally how items with duplicate keys might be handled, without necessarily considering each mechanism for each implementation.

Program 12.6 is a sample client that illustrates some of these conventions for symbol-table implementations. It uses a symbol table to find the distinct values in a sequence of keys (randomly generated or read from standard input), then prints them in sorted order.

Program 12.6 Example of a symbol-table client

This program is a client of our item, key, and symbol-table ADTs (see Programs 12.1 and 12.5) that uses a symbol table to omit items with duplicate keys from a sequence generated randomly or read from standard input. For each item, it uses search to check whether the key has been seen before. If not, it prints the item and inserts it into the symbol table.

class DeDup 
  { 
    public static void main(String[] args) 
      { int i, N = Integer.parseInt(args[0]), 
              sw = Integer.parseInt(args[1]); 
        ST st = new ST(N); 
        for (i = 0; i < N; i++) 
          { myItem v = new myItem(); 
            if (sw == 1) v.rand(); else v.read(); 
            if (st.search(v.key()) == null) 
              { st.insert(v); Out.println(v + ""); } 
          } 
        Out.print(N + " keys, "); 
        Out.println(N-st.count() + " dups"); 
      } 
  } 

As usual, we have to be aware that differing implementations of the symbol-table operations have differing performance characteristics, which may depend on the mix of operations. One application might use insert relatively infrequently (perhaps to build a table), then follow up with a huge number of search operations; another application might use insert and remove a huge number of times on relatively small tables, intermixed with search operations. Not all implementations will support all operations, and some implementations might provide efficient support of certain operations at the expense of others, with an implicit assumption that the expensive operations are performed rarely. Each of the fundamental operations in the symbol table interface has important applications, and many basic organizations have been suggested to support efficient use of various combinations of the operations. In this and the next few chapters, we shall concentrate on implementations of the fundamental operations construct, insert, and search, with some comment on remove, select, sort, and join when appropriate. The wide variety of algorithms to consider stems from differing performance characteristics for various combinations of the basic operations, and perhaps also from constraints on key values, or item size, or other considerations.

In this chapter, we shall see implementations where search, insert, remove, and select take time proportional to the logarithm of the number of items in the dictionary, on the average, for random keys, and sort runs in linear time. In Chapter 13, we shall examine ways to guarantee this level of performance, and we shall see one implementation in Section 12.2 and several in Chapters 14 and 15 with constant-time performance under certain circumstances.

Many other operations on symbol tables have been studied. Examples include finger search, where a search can begin from the point where a previous search ended; range search, where we want to count or show all the nodes falling within a specified interval; and, when we have a concept of distance between keys, near-neighbor search, where we want to find items with keys closest to a given key. In Part 7 we consider such operations in the context of geometric algorithms.

Exercises

graphics/icon01.gif 12.1 Write a myKey class implementation (similar to Program 12.2) to support having the symbol-table implementations process items with String keys.

12.2 Write an myItem class implementation that extends the Record class of Program 6.9 such that clients can build three symbol tables and search on any of the three keys.

graphics/icon01.gif 12.3 Use the symbol-table ADT defined by the interface Program 12.5 to implement stack and queue ADTs.

graphics/icon01.gif 12.4 Use the symbol-table ADT defined by the interface Program 12.5 to implement a priority-queue ADT that supports both remove-the-maximum and remove-the-minimum operations.

12.5 Use the symbol-table ADT defined by the interface Program 12.5 to implement an array sort compatible with those in Chapters 6 through 10.

graphics/icon01.gif 12.6 Add a clone method operator to Program 12.5 and make it Cloneable. (see Section 4.9).

12.7 Define an interface for a symbol-table ADT that allows client programs to remove specific items via handles and to change keys (see Section 9.5).

graphics/icon01.gif 12.8 Give an implementation of the myItem and myKey interfaces for items with two fields: a 16-bit integer key and a String object that contains information associated with the key.

graphics/roundbullet.gif 12.9 Give the average number of distinct keys that our example driver program (Program 12.6) will find among N random positive integers less than 1000, for N = 10, 102, 103, 104, and 105. Determine your answer empirically, or analytically, or both.


 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows