NULLs





NULLs

When you're creating a table, you can expect that each variable-length column you define will take at least one byte (and up to four bytes) extra storage for a size or offset value. The other likely per-column overhead is the storage for NULL. Because NULL is not equal to any other value, it can't be stored in the space allotted for the column—extra storage is required for a special flag indicator that says "this is NULL" or "this is not NULL." For example, in IBM or Ingres databases, this flag is one byte long and precedes the column, while in MySQL databases the flag is one bit per nullable column, rounded up to the nearest byte. And as we said earlier, nullable columns in Sybase databases and older versions of Microsoft are automatically variable-length.

This is horrific: What should be a mere "constraint" has become an unchangeable "physical storage descriptor." As a result, you will often have a hard time changing a column from nullable to NOT NULL, or vice versa.

NULL is the only constraint that affects storage. That's a little disappointing, because other constraints could also be used to good effect. For example:

  • If a column was defined with:

    
    ... DEFAULT 'a very long value'
    
    

    a BOOLEAN flag could be used to indicate that a value equals the DEFAULT value.

  • If a column was defined with a CHECK constraint, for example:

    
    ... CHECK (column1 in 'red', 'pink', 'fuschia')
    
    

    a TINYINT number could be used to map to each value, for example:

    
    0 (means 'red')
    
    

    The MySQL people already have moved one step in this direction with a data type that they call ENUM, because it's like C's enum operator.

Both of these suggestions are examples of storing a short code to represent a long value. This practice is rapidly becoming archaic because you can do a better job with a user-defined data type (UDT).

The Bottom Line: NULLs

NULL costs overhead. Because NULL is not equal to any other value, it can't be stored in the space allotted for the column—a special flag indicator that says "this is NULL" or "this is not NULL" must also be stored.

Microsoft and Sybase are anti-NULL DBMSs.

Recommendation: Prefer NOT NULL for performance and portability reasons.

Microsoft & Sybase versus NULL

—"NULL is not the same as a blank string."—Everybody

One could get the impression that Microsoft and Sybase databases are hostile environments for NULLs. Here are some indicators.

Nullable is not the default

With any other DBMS, and with Standard SQL, if you do this:


CREATE TABLE Table1 (

   column1 INTEGER)

then column1 can contain NULL by default. With Microsoft and Sybase, column1 is NOT NULL by default. To make column1 accept NULLs, you can create your table like this:


CREATE TABLE Table1 (

   column1 INTEGER NULL)

but this is not standard SQL syntax.[3]

NULL causes variable-length

We mentioned the implications of this early in this chapter.

NULL means blank

It's not possible to store a blank string in a Sybase database. If you do this:


UPDATE Table1 SET

   variable_column = ''    /* no space between '' */

Sybase will silently change the statement to:


UPDATE Table1 SET

   variable_column = ' '   /* one space between '' */

The reason? The only way to store a NULL is to store a completely empty string! (That's why nullable columns must be variable-length.) Therefore a truly blank column can't be stored because—contrary to what we all learned—blank means NULL. Previous versions of Microsoft also had this problem, as does Oracle.

Indications of this anti-NULL attitude crop up elsewhere. For example, a look at Microsoft's ODBC specification reveals that they love to return a blank string as a signal that a value is unknown.

This is a shame. The reasonable default assumption should be that a column can contain NULL. That is both sensible and conforms to the SQL Standard. Nevertheless, given Microsoft's and Sybase's attitude, NOT NULL must be the preference for performance reasons.

[3] Amusing side issue—Because the default can change if ANSI_NULL_DFLT is set, a script of CREATE TABLE statements could have different effects with different Microsoft databases. Therefore Microsoft books recommend you use NULL explicitly—for "portability."In other words, you must violate the SQL Standard so that you'll be portable.


     Python   SQL   Java   php   Perl 
     game development   web development   internet   *nix   graphics   hardware 
     telecommunications   C++ 
     Flash   Active Directory   Windows