Jan. 13, 2011, 7:44 a.m.
posted by madinteger
This Item illustrates one significant balance point between Items 8 and 9, "don't optimize prematurely" and "don't pessimize prematurely." That makes this a tough Item to write, lest it be misconstrued as "premature optimization." It is not that.
Here's the background and motivation: Memory and disk capacity continue to grow exponentially; for example, from 1988 to 2004 disk capacity grew by about 112% per year (nearly 1,900-fold growth per decade), whereas even Moore's Law is just 59% per year (100-fold per decade). One clear consequence is that whatever your code does today, it may be asked to do tomorrow against more data, much more data. Bad (worse-than-linear) asymptotic behavior of an algorithm will sooner or later bring even the most powerful system to its knees: just throw enough data at it.
Defending against that likely future means we want to avoid "designing in" what will become performance pits in the face of larger files, larger databases, more pixels, more windows, more processes, more bits sent over the wire. One of the big success factors in future-proofing of the C++ standard library has been its performance complexity guarantees for the STL container operations and algorithms.
Here's the balance: It would clearly be wrong to optimize prematurely by using a less clear algorithm in anticipation of large data volumes that may never materialize. But it would equally clearly be wrong to pessimize prematurely by turning a blind eye to algorithmic complexity, a.k.a. "big-Oh" complexity, namely the cost of the computation as a function of the number of elements of data being worked on.
There are two parts to this advice. First, even before knowing whether data volumes will be large enough to be an issue for a particular computation, by default avoid using algorithms that work on user data (which could grow) but that don't scale well with that data, unless there is a significant clarity and readability benefit to using a less scalable algorithm (see Item 6). All too often we get surprised: We write ten pieces of code thinking they'll never have to operate on huge data sets, and we turn out to be perfectly right nine of the ten times. The tenth time, we fall into a performance pit. We know it has happened to us, and we know it has happened or will happen to you. Sure, we go fix it and ship the fix to the customer, but it would be better to avoid such embarrassment and rework. So, all things being equal (including clarity and readability), prefer scalable algorithms up front.
Second, after measurements show that optimization is necessary and important, and especially if it's because data volumes are growing, focus on improving big-Oh complexity rather than on micro-optimizations like saving that one extra addition.
In sum: Prefer to use linear (or better) algorithms wherever possible. Avoid polynomial algorithms where reasonable. Avoid exponential algorithms with all your might.