UPDATE





UPDATE

The SQL Standard description of the typical UPDATE format is:


UPDATE <Table> SET

  { <column> = <column expression> [,...] |

    ROW = <row expression> }

    [ WHERE <search condition> ]

For example:


UPDATE Table1 SET

       column1 = 1,

       column2 = 2,

       column3 = 3

 WHERE column1 <> 1

    OR column2 <> 2

    OR column3 <> 3

This example updates multiple columns with the same SET clause, which is better than the alternative—using three UPDATE statements would reduce locking but increase logging (GAIN: –8/8 with three UPDATEs instead). A little-known fact about the SET clause is that evaluation must be from left to right. Therefore, if any assignment is likely to fail, put that assignment first in the SET clause. For example, if the column3 = 3 assignment in the last UPDATE is likely to fail, change the statement to:


UPDATE Table1 SET

       column3 = 3,

       column1 = 1,

       column2 = 2

 WHERE column1 <> 1

    OR column2 <> 2

    OR column3 <> 3

GAIN 6/8

With this change, the DBMS will fail immediately and not waste time setting column1 = 1 and column2 = 2.

The UPDATE example also contains a WHERE clause that specifies a precise exclusion of the conditions that are going to be true when the SET clause is done. This is like saying—Make it so unless it's already so. This WHERE clause is redundant, but you might be lucky and find that no rows need to be updated (GAIN: 5/8 if no data change required). By the way, this trick does not work if any of the columns can contain NULL.

Dependent UPDATE

Often two data changes occur in a sequence and are related. For example, you might UPDATE the customer balance then INSERT into a transaction table:


BEGIN

  UPDATE Customers

     SET balance = balance + 500

     WHERE cust_id = 12345;

  INSERT INTO Transactions

     VALUES (12345, 500);

END

This is improvable. First of all, it would be safer to say that the INSERT should only happen if the UPDATE changed at least one row (the number of changed rows is available to a host program at the end of any data-change statement). Second, if it's true that this is a sequence that happens regularly, then the INSERT statement should be in a trigger. (It goes without saying that the whole thing should be in a stored procedure; see Chapter 11, "Stored Procedures.")

When the sequence is two UPDATEs rather than an INSERT and an UPDATE, it sometimes turns out that the best optimizations involve ON UPDATE CASCADE (for a primary/foreign key relationship), or that the columns being updated should be merged into one table.

Batch UPDATE

An UPDATE statement contains a SET clause (the operation) and a WHERE clause (the condition). Which comes first, the condition or the operation?

You may not remember when batch processing was the norm. It's what used to happen in the days when a single sequential pass of the data was optimal, because of the nature of the devices being employed. The essential loop that identifies a batch-processor goes like this:


get next record, do every operation on that record, repeat

that is:


For each row {

  Do every operation relevant to this row }

This contrasts with the normal SQL set orientation, which goes like this:


find the records and then do the operations on them

that is:


For each operation {

  Do every row relevant to this operation }

You can change an SQL DBMS into a batch processor by using the CASE operator. For example:


/* the set-oriented method */

UPDATE Table1

  SET column2 = 'X'

  WHERE column1 < 100



UPDATE Table1

  SET column2 = 'Y'

  WHERE column1 >= 100

     OR column1 IS NULL



/* the batch-oriented method */

UPDATE Table1

  SET column2 =

    CASE WHEN column1 < 100 THEN 'X'

         ELSE 'Y'

    END

GAIN: 5/7

Portability

InterBase doesn't support CASE. The gain shown is for only seven DBMSs.


The batch-oriented method is reasonable if you are updating 100% of the rows in the table. Generally you should be leery of statements that select everything and then decide what to do based on the selection—such code might just be a transplant from a legacy system. In this example, though, there is an advantage because the WHERE clauses are dispensed with, and because the rows are processed in ROWID order.

The Bottom Line: UPDATE

Update multiple columns with the same UPDATE … SET clause, rather than with multiple UPDATE statements.

UPDATE … SET clause evaluation must be from left to right. If any assignment is likely to fail, put it first in the SET clause.

It can be helpful to add a redundant WHERE clause to an UPDATE statement, in cases where it's possible that no rows need to be updated. This won't work if any of the columns can contain NULL.

If you're updating all rows of a table, use batch processing for the UPDATE.

Check out ON UPDATE CASCADE (for a primary/foreign key relationship) if you find you're doing multiple UPDATE statements on related tables in sequence.

Consider whether columns belonging to multiple tables should belong to the same table if they're frequently being updated in sequence.

Put related—and frequently done—UPDATE statements into triggers and/or stored procedures.


     Python   SQL   Java   php   Perl 
     game development   web development   internet   *nix   graphics   hardware 
     telecommunications   C++ 
     Flash   Active Directory   Windows