Heaps





Heaps

In this section, we'll look at a particular storage structure called a heap. A heap, or heap-organized table, is a structure for storing data in an unstructured manner. When you add something to a heap, it goes wherever free space is available, which probably means "at the end"—existing data is not moved to make free space available for new data. Heaps are the default structure. They're certainly the simplest type.

ROWID

Here is a nonstandard Oracle SQL-extension statement that seeks out a particular data row:


SELECT column1 FROM Table1

  WHERE ROWID = '00001F20.00A3.0001'

  FOR UPDATE OF column1

In this statement, the ROWID value has three parts: (a) the page number within a file, (b) the row number within the page, and (c) the file number within the list of database files that Oracle keeps in the system catalog. In other words, the Oracle ROWID is a row identifier—it uniquely describes a row of a heap-organized table to the DBMS. Other DBMSs have different formats and use different terms, as shown in Figure. But the name doesn't matter—the important thing is that most DBMSs have a row identifier that you can use in SQL statements, and we'll call this the "ROWID" throughout this book.

The question is—Should you?

The Oracle statement shown earlier is the fastest possible SELECT. With any other column, the DBMS has to take the column value and find the address by looking it up. With a ROWID, the column value is the address. Therefore, one way to go through a table is like this:


SELECT ROWID FROM Table1



SELECT * FROM Table1

  WHERE ROWID = ?

The first SELECT statement gets all the row identifiers into a program buffer so that you can then access each individual row with the second SELECT.

Using ROWID speeds up SELECT. On the other hand, the Oracle statement can also fail or can cause an error to appear in the database. The obvious danger is that another user could delete the row, which makes the row identifier invalid. The greater danger is that another user could delete the row, then INSERT a new row at the same location—which makes the row identifier a valid, but false, pointer. Some further dangers are specific to the DBMS or table type you're using:

  • Informix won't support true row identifiers if there's partitioning.

    DBMSs and ROWID
      ROWID Equivalent
    IBM RID
    Informix ROWID
    Ingres tid
    InterBase No support
    Microsoft RID
    MySQL _rowid
    Oracle ROWID
    Sybase RID

  • Microsoft will let rows change location if there's a clustered index.

  • PostgreSQL doesn't document how its row identifiers work so if something goes wrong you can't hold them responsible.

This is hardly surprising to anyone who understands relational theory. The fact is that row identifiers are not an SQL feature—they are a trap door for getting away from SQL. They are useful—if they weren't then DBMSs wouldn't allow them—but the only completely safe use of ROWID is inside a serialized transaction. (A serialized transaction is a transaction that prevents or avoids data changes by other users, and you shouldn't use one until you've read Chapter 15, "Locks.") Inside a transaction, row identifiers can be used for navigation—for example, to simulate subqueries and outer joins with application program code when the DBMS doesn't support such advanced features.

Incidentally, row identifiers are "pseudocolumns" rather than real columns. They take up no space in the row. They do not appear if you say SELECT * ... because SELECT * ... only gets defined columns. You must specify row identifiers explicitly to get them.

Migration

We saw in the last chapter that variable-length columns are common. Even if you think you use fixed-size data types, there's no way to avoid variable-length columns that works with all DBMSs.

Suppose you have a situation in which Row #1 on Page #1 contains one variable-length column with value NULL. This UPDATE is executed:


UPDATE Table1

  SET column1 = 'abcdefg'

This is called an expanding UPDATE because the row is clearly getting bigger. This is a problem because no free space is between Row #1 and Row #2 on the same page (because column1 is variable-length). Therefore, before the DBMS can modify Row #1, it must shift Row #2 down. Incidentally, because shifting takes time, it takes more time to update variable-length columns if you use large (16KB or more) page sizes. But that's not our immediate concern.

Our real concern is—What if Page #1 is already full? Pages are fixed-size—the DBMS can't make them bigger. And the DBMS can't simply shift rows forward in all subsequent pages—that could cause a single-row UPDATE to take hours. So the DBMS must find some free space in another page and put the new data in that page. In theory there are different ways to do this, but we found that DBMSs prefer these choices:

  • The row that gets moved is the row that's being updated—not the row that would otherwise be shifted past the end of the page. This choice saves time because if the DBMS moved a different row, it would have to get a lock on it before the UPDATE could proceed.

  • The new row location will probably be close to the old one, but it's not possible to guarantee that. Most DBMSs will either look for space within the same extent or will search through a list of pages that have free space in them (Oracle calls this list the freelist). However, if all pages are full, the DBMS has to allocate a new extent and put the changed row at the end.

  • The DBMS puts in a pointer at the original row location. The pointer is the row identifier of the new location. So even if the new row location is Page #7 Row #7, you can still access it at Page #1 Row #1 (the DBMS will automatically follow the pointer). That means if you accessed the row using a row identifier, you are safe because the original ROWID is still valid. There is also no need to change index keys because they can continue to point to the original location. However, all subsequent accesses of this row will take twice as long, until the table is reorganized.

The whole process of finding a home for an expanding UPDATE is called migration. As well as causing slow access due to pointers, migration causes space to be wasted. That's because the row is gone from the original page. All that's left is a pointer, so there's more free space now in the original page.

Fragmentation

In the long term, after a database has been subjected to data changes for a few weeks, it begins to show signs of fragmentation. A group of pages is fragmented if:

  • Excess free space is on many pages because of deletions or shrinking updates or migrations.

  • The logical order of rows (the ROWID order) is not the same as the physical arrangement because of pointers left by migrations.

  • Pages or extents for different tables are interleaved.

Fragmentation is easy to identify: SELECT all the ROWIDs and look at their page-number components. If some pages have surprisingly few rows, then be suspicious. Programs that do such monitoring are often supplied as DBMS utilities.

The DBMS can try to reduce fragmentation automatically, by reclaiming wasted space. For example, we know that Oracle keeps a freelist of reusable areas. But reclaiming only delays the evil day. It is not possible to stop fragmentation, and eventually you must rebuild the database or run a utility to reorganize the pages. (Incidentally, reorganizing will do nothing to reduce the fragmentation caused by the operating system. You must also run the operating system's defragger. You won't have to do this if you allocate huge extents at the start.)

The effect of fragmentation is that, on average, Row [n] gets moved farther away from Row [n + 1]. Is such a phenomenon important? That depends very much on your disk drive. Consider these specifications from a high-performance IBM drive:

  • Track-to-track seek time (to an adjacent track): 1 millisecond.

  • Full stroke seek time (from one end of disk to other end): 15 milliseconds.

  • Other factors (average latency, settle time, transfer rate): 7 milliseconds.

From these figures, we can calculate that a nearby disk access takes (7ms + 1ms) 8 milliseconds, while a far away disk access takes (15ms + 7ms) 22 milliseconds—or nearly three times as long. Such extreme cases would be rare though, because two randomly placed rows are normally separated by less than the width of the disk and also because elevator seeking happens. The term "elevator seeking" is an operating system term that means "traveling through a disk's physical locations in the manner of an elevator, instead of jumping backward or forward for each request." The term comes from the elevator analogy, which goes like this—A building is analogous to a disk drive, a building floor is analogous to a disk track, and a building elevator is analogous to a disk head. If you are alone in the building and you press "13th floor," the elevator goes directly to it[4] and the only factor is seek time. But when a dozen people are getting on or off and going up or down, the scenario changes: The elevator is constantly traveling from the building top to building bottom and back, opening on floors along the way. Then the major factor is not seek time (that's going to happen anyway) but the number of times that the doors open for floors you don't want. The analogous term for this factor is "head contention"—that is, the number of jobs that want to use the drive head at the same time.

[4] Unless you're in North America where, in the words of the poet:

"The architect he skipped direct/From twelve unto fourteen ..."

—Ogden Nash, "A Tale of the Thirteenth Floor"

The bottom line is that fragmentation does waste space and does cost time, but it would be more noticeable on a laptop (which is single-user) than on a very large machine with a heavy load. If there are occasional cleanups for both the DBMS and the operating system, and performance in single-user mode is not important, then your only remaining worry is that migrated rows take twice as long to read.

Free Page Space

When you create a table or index, you can usually set an attribute to control how much free space a page should contain initially. This attribute goes by various names. For example, Oracle calls it PCTFREE (percent to leave free), while Ingres calls it FILLFACTOR (percent to fill). Figure shows the PCTFREE equivalent for the Big Eight.

DBMSs and Free Space Attribute
  PCTFREE Equivalent
IBM PCTFREE
Informix FILLFACTOR
Ingres FILLFACTOR
InterBase No support
Microsoft FILLFACTOR
MySQL No support
Oracle PCTFREE
Sybase FILLFACTOR

Consider how Oracle does an INSERT.

  1. Calculate size of new row.

  2. Find the "last" page. (This is for the simplest case, a heap. When Oracle is adding to a heap, the DBMS is looking at pages in the last extent or the freelist.)

  3. Look at the page header to find out how much space is available.

  4. If (available space < PCTFREE)

    
    If (there are no more pages)
    
       Allocate new extent
    
       Goto Step #2
    
    
  5. Copy new row into page, update page header, and stop.

Given that PCTFREE just causes space to be wasted in every page, why would you want a PCTFREE greater than zero? That's easy: to avoid migration. If free space is in a page, then an expanding UPDATE will just grow into it, rather than migrating. It's generally accepted that migration is such an awful thing that you should be willing to accept 10% waste space in every page in return for reduced migration. Therefore, for example, Oracle's default PCTFREE setting is 10%—that is, Oracle reserves 10% of each page for updates to existing rows, by default.

Hold on, though. Remember that migration happens only if there is an expanding UPDATE. But expanding UPDATEs can't happen if (a) the table is read-only or (b) all the table's columns are fixed-size and there is no plan to ALTER TABLE to amend the column sizes. Under either of these conditions, you should override the default and set PCTFREE to equal zero (or FILLFACTOR to equal 100). Incidentally, if a table is read-only, then it's also good practice to use larger page sizes. Incidentally (again), if all columns are fixed-size, then the amount of per-page overhead space is slightly less. Remember that a column is not truly fixed-size unless it's defined as NOT NULL, and remember too that InterBase and Oracle use variable-length for almost every situation.

You can dispense with PCTFREE entirely if you allocate free space in selected rows at INSERT time—that is, by putting a filler in variable-length columns—as suggested in Chapter 7, "Columns."

The bottom line is: avoid migration. Here's how:

  • Add a bit of filler when you do the original INSERT.

  • Use PCTFREE (or its equivalent).

  • Don't do DELETE plus INSERT. Do UPDATE instead.

  • Put DELETEs and shrinking UPDATEs before expanding UPDATEs or INSERTs. Arrange batches in ROWID order.

  • If columns will be changed regularly, define them as fixed-size fields, not variable-length fields.

The Bottom Line: Heaps

Most DBMSs have a row identifier that you can use in SQL statements. While using ROWID allows you to write the fastest possible SELECT, ROWID also poses two dangers. The first is that another user could DELETE the row, which makes the row identifier invalid. The second, and greater, danger is that another user could DELETE the row and then INSERT a new row at the same location, which makes the row identifier a valid, but false, pointer.

The only completely safe use of ROWID is inside a serialized transaction.

The process of finding a home for an expanding UPDATE is called migration. Migration causes slow access due to pointers. It also wastes space.

A group of pages is fragmented if (a) excess free space is on many pages because of deletions, shrinking UPDATEs, or migrations, (b) the logical order of rows is not the same as the physical arrangement because of pointers left by migrations, or (c) pages or extents for different tables are interleaved. Fragmentation wastes space and costs time.

The DBMS can try to reduce fragmentation automatically by reclaiming wasted space. But it is not possible to stop fragmentation, and eventually you must rebuild the database or run a utility to reorganize the pages.

When you create a table or index, you can usually set an attribute to control how much free space a page should contain initially. Use this feature to avoid migration, which is a larger problem than wasted space.

Because expanding UPDATEs can't happen if (a) the table is read-only or (b) all the table's columns are fixed-size, set PCTFREE to zero (or FILLFACTOR to 100) in these cases.

If a table is read-only, it's good practice to use larger page sizes.


     Python   SQL   Java   php   Perl 
     game development   web development   internet   *nix   graphics   hardware 
     telecommunications   C++ 
     Flash   Active Directory   Windows