May 6, 2011, 9:12 p.m.
posted by sqrt
PHP Efficiency Issues
PHP's preg routines use PCRE, an optimized NFA regular-expression engine, so many of the techniques discussed in Chapters 4 through 6 apply directly. This includes benchmarking critical sections of code to understand practically, and not just theoretically, what is fast and what is not. Chapter 6 shows an example of benchmarking in PHP (☞ 234).
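As a rough illustration of that kind of benchmarking, here is a minimal sketch. The helper name time_regex, the two patterns, and the test data are all made up for this example and are not the Chapter 6 benchmark; the absolute numbers will vary from machine to machine.

```php
<?php
// Hypothetical helper: time $iterations applications of $pattern to $subject.
function time_regex($pattern, $subject, $iterations = 1000)
{
    // Warm-up call so one-time compilation is not part of the measurement.
    preg_match($pattern, $subject);

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_match($pattern, $subject);
    }
    return microtime(true) - $start;
}

// Made-up test data: a long run of text with a single match at the very end.
$subject = str_repeat('lorem ipsum dolor sit amet ', 2000) . '<b>x</b>';

printf("alternation: %.4f s\n", time_regex('/<b>|<i>/', $subject));
printf("char class:  %.4f s\n", time_regex('/<[bi]>/', $subject));
```

Comparing two equivalent patterns on the same data, as above, is usually more informative than timing one pattern in isolation.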
For particularly time-critical code, remember that a callback is generally faster than using the e pattern modifier (☞ 465), and that named capture with very long strings can result in a lot of extra data copying.
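A minimal sketch of the callback approach follows, using preg_replace_callback. The task (upper-casing the tag name in opening tags) and the sample string are assumptions chosen for illustration only. The e modifier version is not shown, since it was removed in PHP 7.0.

```php
<?php
// Hypothetical task: upper-case the tag name in every opening tag.
$html = 'some <b>bold</b> and <i>italic</i> text';

$result = preg_replace_callback(
    '/<(\w+)>/',
    function ($m) {
        // $m[0] is the whole match, $m[1] is the captured tag name.
        return '<' . strtoupper($m[1]) . '>';
    },
    $html
);

echo $result, "\n";
```

Closing tags are left alone here because the slash after '<' is not a word character, so the pattern never matches them.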
Regular expressions are compiled as they're encountered at runtime, but PHP has a huge 4,096-entry cache (☞ 242), so in practice, a particular pattern string is compiled only the first time it is encountered.
The S pattern modifier deserves special coverage: it "studies" a regex to try to achieve a faster match. (This is unrelated, by the way, to Perl's study function, which works with target text rather than a regular expression ☞ 359.)
The S Pattern Modifier: "Study"
Using the S pattern modifier instructs the preg engine to spend a little extra time up front analyzing the regex, in the hope of faster matches later. Currently, the situations where study can and can't help are fairly well defined: it enhances what Chapter 6 calls the initial class discrimination optimization (☞ 247).
I'll start by noting that unless you intend to apply a regex to a lot of text, there's probably not much time to save in the first place. You need to be concerned with the S pattern modifier only when applying the same regex to large chunks of text, or to many small chunks.
Standard optimizations, without the S pattern modifier
Consider a simple expression for which every match must begin with the '<' character. A regex engine can (and in the preg suite's case, does) take advantage of that by presearching the target string for '<' and applying the full regular expression at those locations only (since a match must begin with '<', applying it starting at any other character is pointless).
This simple presearch can be much faster than a full regex application, and therein lies the optimization. In particular, the less frequently the character in question appears in the target text, the greater the optimization. Also, the more work a regex engine must do to detect a first-character failure, the greater the benefit. For example, the optimization helps an expression built from four alternatives more than it helps a simpler expression, because without it the regex engine would have to attempt all four alternatives at each position before moving on to the next attempt. That's a lot of work to avoid.
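The presearch idea can be imitated in userland code, which may make it easier to picture. The sketch below is an illustration only: PCRE performs this step internally, and the function name match_with_presearch, the pattern, and the sample text are all assumptions made up for this example.

```php
<?php
// Illustration only: PCRE does this internally. Names and pattern are hypothetical.
function match_with_presearch($pattern, $firstChar, $text)
{
    $offset = 0;
    // Jump straight to each occurrence of the required first character.
    while (($pos = strpos($text, $firstChar, $offset)) !== false) {
        // \G in the pattern pins the attempt to exactly $pos.
        if (preg_match($pattern, $text, $m, 0, $pos)) {
            return array($pos, $m[0]);
        }
        $offset = $pos + 1;
    }
    return null;
}

$text = 'a < b, but <em>this</em> is a tag';
$hit = match_with_presearch('/\G<(\w+)/', '<', $text);
```

The full pattern is attempted only at the two positions holding '<'; every other character in the text is skipped by the cheap strpos scan.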
Enhancing the optimization with the S pattern modifier
The preg engine is smart enough to apply this optimization to most expressions that have only a single character that must start any match, as in the previous examples. However, the S pattern modifier tells the engine to preanalyze the expression to enable this optimization for expressions whose possible matches have multiple starting characters.
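As a small usage sketch, the modifier is simply appended after the closing delimiter. The pattern below is a hypothetical one whose matches can begin with either 't' or 'o', so it has no single required first character. Note also that on newer PHP builds based on PCRE2 (PHP 7.3 and later), the modifier is documented as having no effect, so any speedup applies to older versions only.

```php
<?php
// Hypothetical pattern: matches can begin with 't' or 'o'.
$plain   = '/this|that|other/';
$studied = '/this|that|other/S';

// Made-up test data: filler text followed by one match near the end.
$text = str_repeat('lorem ipsum ', 1000) . 'and that is all';

preg_match($plain, $text, $m1, PREG_OFFSET_CAPTURE);
preg_match($studied, $text, $m2, PREG_OFFSET_CAPTURE);

// The modifier is an optimization hint only, so both calls report the same match.
var_dump($m1 === $m2);
```

Because the results are identical either way, the only thing worth measuring is elapsed time on your own data.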
Here are several sample expressions, some of which we've already seen in this chapter, that require the S pattern modifier to be optimized in this way:
When the S pattern modifier can't help
It's instructive to look at the type of expressions that don't benefit from the S pattern modifier:
It doesn't take long for the preg engine to do the extra analysis invoked by the S pattern modifier, so if you'll be applying a regex to relatively large chunks of text, it doesn't hurt to use it. If you think there's any chance it might apply, the potential benefit makes it worthwhile.