May 17, 2011, 5:02 p.m.
posted by telumel
Closeness to the DBMS Kernel
The nearer to the DBMS kernel your code can execute, the faster it will run. This is where the true strength of the database lies. For example, several database management products allow you to extend them by adding new functions, which can sometimes be written in comparatively low-level languages such as C. The snag with a low-level language that manipulates pointers is that if you mishandle a pointer, you can end up corrupting memory. It would be bad enough if you were the only user affected. But the trouble with a database server is that, as the name implies, it can serve a large number of users: if you corrupt the server memory, you can corrupt the data handled by another, totally innocent program. As a consequence, responsible DBMS kernels run code in a kind of sandbox, where it can crash without taking everything with it in its downfall. For instance, Oracle implements a complicated communication mechanism between external functions and itself. In some ways, this process is similar to that which controls database links, by which communication between two (or more) database instances on separate servers is managed. If the overall gain achieved by running tightly tailored C functions rather than stored PL/SQL procedures is greater than the costs of setting up an external environment and context-switching, use external functions. But do not use them if you intend to call a function for every row of a very large table. It is a question of balance, of knowing the full implications of the alternative strategies available to solve any given problem.
If functions are to be used, try to always use those that are provided by the DBMS. It is not merely a matter of not reinventing the wheel: built-in functions always execute much closer to the database kernel than any code a third-party programmer can construct, and are accordingly far more efficient.
Here is a simple example using Oracle's SQL that will demonstrate the efficiencies to be gained by using Oracle functions. Let's assume we have some text data that has been manually input and that contains multiple instances of adjacent "space" characters. We require a function that will replace any sequence of two or more spaces by a single space. Ignoring the regular expressions available since Oracle Database 10g, our function might be written as follows:
create or replace function squeeze1(p_string in varchar2) return varchar2 is v_string varchar2(512) := ''; c_char char(1); n_len number := length(p_string); i binary_integer := 1; j binary_integer; begin while (i <= n_len) loop c_char := substr(p_string, i, 1); v_string := v_string || c_char; if (c_char = ' ') then j := i + 1; while (substr(p_string || 'X', j, 1) = ' ') loop j := j + 1; end loop; i := j; else i := i + 1; end if; end loop; return v_string; end; /
As a side note, 'X' is concatenated to the string in the inner loop to avoid testing j against the length of the string.
There are alternate ways of writing a function to eliminate multiple spaces, which can make use of some of the string functions provided by Oracle. Here's one alternative:
create or replace function squeeze2(p_string in varchar2) return varchar2 is v_string varchar2(512) := p_string; i binary_integer := 1; begin i := instr(v_string, ' '); while (i > 0) loop v_string := substr(v_string, 1, i) || ltrim(substr(v_string, i + 1)); i := instr(v_string, ' '); end loop; return v_string; end; /
And here's a third way to do it:
create or replace function squeeze3(p_string in varchar2) return varchar2 is v_string varchar2(512) := p_string; len1 number; len2 number; begin len1 := length(p_string); v_string := replace(p_string, ' ', ' '); len2 := length(v_string); while (len2 < len1) loop len1 := len2; v_string := replace(v_string, ' ', ' '); len2 := length(v_string); end loop; return v_string; end; /
When these three alternative methods are tested on a simple example, each behaves exactly as specified, and there is no visible performance difference:
SQL> select squeeze1('azeryt hgfrdt r') 2 from dual 3 / azeryt hgfrdt r Elapsed: 00:00:00.00 SQL> select squeeze2('azeryt hgfrdt r') 2 from dual 3 / azeryt hgfrdt r Elapsed: 00:00:00.01 SQL> select squeeze3('azeryt hgfrdt r') 2 from dual 3 / azeryt hgfrdt r Elapsed: 00:00:00.00
Assume now that this operation of stripping out multiple spaces is to be called many thousands of times each day. You can use the following code to create and populate a test table with random data, by which you can examine whether there are differences in performance among these three space-stripping functions under a more realistic load:
create table squeezable(random_text varchar2(50)) / declare i binary_integer; j binary_integer; k binary_integer; v_string varchar2(50); begin for i in 1 .. 10000 loop j := dbms_random.value(1, 100); v_string := dbms_random.string('U', 50); while (j < length(v_string)) loop k := dbms_random.value(1, 3); v_string := substr(substr(v_string, 1, j) || rpad(' ', k) || substr(v_string, j + 1), 1, 50); j := dbms_random.value(1, 100); end loop; insert into squeezable values(v_string); end loop; commit; end; /
This script creates a total of 10,000 rows in the test table (a fairly modest total when it is considered how many times some SQL statements are executed). The test can now be run as follows:
select squeeze_func(random_text) from squeezable;
When I ran this test, headers and screen display were all switched off. Getting rid of output operations ensured that the results reflected the space-reduction algorithm and not the time needed to display the results. The statements were executed several times to ensure that there was no caching effect.
Figure shows the results on the test machine.
Even though all functions can be called 10,000 times in under one second, squeeze3 is 1.8 times as fast as squeeze1, and squeeze2 almost 2.2 times as fast. Why? Simply because PL/SQL is not "as close to the kernel" as is a SQL function. The performance difference may look like a tiny thing when functions are executed once in a while, but it can make quite a difference in a batch programor on a heavily loaded OLTP server.
Code loves the SQL kernelthe closer they get, the hotter the code.