Repetitive Searches

There are only two or three human stories, and they go on repeating themselves as fiercely as if they had never happened before.

O Pioneers!
WILLA CATHER

Did you spot the problem with the example program that searched for code snippets in a text file at the beginning of Chapter 18? In lines that have multiple code snippets, everything between the first "<CODE>" and the last "</CODE>" is listed as a single snippet. To separate the snippets, we first have to change the regular expression so that each match stops at the nearest closing tag instead of the last one. Here, we can replace the greedy ".*" with a nongreedy repetition:

string expr = "<CODE>(.*?)</CODE>";

Next, to resume searching after the text that matched, we have to change the code. Instead of searching the entire line from the file as a single string, we use a pair of iterators that delimit the contents of the line. After each match, we advance the first iterator to point at the character immediately following the match and search again.

Figure. Repeated Searches (regexiter/repeated.cpp)

#include <regex>
#include <iostream>
#include <fstream>
#include <string>
#include <stdlib.h>
using std::tr1::regex; using std::tr1::regex_search;
using std::tr1::smatch;
using std::string; using std::ifstream; using std::cout;

static void show_matches(const char *fname)
  { // scan file named by fname, line by line
  ifstream input(fname);
  string str;
  smatch match;
  string expr = "<CODE>(.*?)</CODE>";
  regex rgx(expr, regex::icase);
  while (getline(input, str))
    { // check line for match
    string::const_iterator first = str.begin();
    string::const_iterator second = str.end();
    while (regex_search(first, second, match, rgx))
      { // show match, then skip past it
      cout << match[1] << '\n';
      first += match.position() + match.length();
      }
    }
  }

int main(int argc, char *argv[])
  { // search for code snippets in text file
  if (argc != 2)
    { // wrong number of arguments
    cout << "Usage: snippets <filename>\n";
    return EXIT_FAILURE;
    }
  try
    { // search the file
    show_matches(argv[1]);
    }
  catch (...)
    { // something went wrong
    cout << "Error\n";
    return EXIT_FAILURE;
    }
  return 0;
  }

Don't be fooled, though: Repetitive searches aren't usually that easy to write. For example, if the regular expression begins with a "^", simply restarting the search after a match, as the previous example does, can lead to wrong answers. The following program searches the target text "abcdef" for subsequences that match the regular expression "^(abc|def)". The only one is the initial "abc", but the program finds two, reporting that "def" also matches. The reason is that each restarted search treats its own starting point as the beginning of the target text, so the "^" matches again just past the first match.

Figure. Naive Search Doesn't Work (regexiter/naive.cpp)

#include <regex>
#include <iostream>
#include <string>
using std::tr1::regex; using std::tr1::regex_search;
using std::tr1::smatch;
using std::string; using std::cout;

int main()
  { // search for regular expression in text
  string str = "abcdef";
  string::const_iterator first = str.begin();
  string::const_iterator second = str.end();
  smatch match;
  string expr = "^(abc|def)";
  regex rgx(expr);
  while (regex_search(first, second, match, rgx))
    { // check range for match
    cout << match[0] << '\n';
    first += match.position() + match.length();
    }
  return 0;
  }
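
One plausible repair for this particular failure, shown only as a sketch, is to tell the search engine on every restart that the start of the range is no longer the start of the text. The flag match_not_bol does that. The helper name find_all is hypothetical, and the sketch assumes a C++11 library, where these components live in namespace std rather than std::tr1. It also assumes the target text has no embedded line breaks at which "^" ought to match again.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every match of expr in str,
// restarting the search just past each match.
std::vector<std::string> find_all(const std::string& str,
                                  const std::string& expr)
{
    std::regex rgx(expr);
    std::smatch match;
    std::vector<std::string> found;
    std::string::const_iterator first = str.begin();
    std::string::const_iterator last = str.end();
    // The first search starts at the true beginning of the text.
    std::regex_constants::match_flag_type flags =
        std::regex_constants::match_default;
    while (std::regex_search(first, last, match, rgx, flags))
    {
        found.push_back(match.str());
        first = match[0].second;        // resume just past the match
        // Later searches do not start at the beginning of the text,
        // so "^" must not match at their first position.
        flags = std::regex_constants::match_not_bol;
        if (match.length() == 0)
        {   // empty match: step forward so the loop cannot stall
            if (first == last)
                break;
            ++first;
        }
    }
    return found;
}
```

With this change, find_all("abcdef", "^(abc|def)") yields only "abc", while the unanchored expression "(abc|def)" still yields both "abc" and "def".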

In this chapter, we look first at the complications that any repetitive search has to allow for and the techniques for fixing problems (Section 19.1). Then we look at prewritten solutions, in the form of the class template regex_iterator (Section 19.2) and the class template regex_token_iterator (Section 19.3).
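
As a brief preview of those prewritten solutions, the snippet search from the start of the chapter might be written with iterators along the following lines. This is a sketch, not the chapter's own listing: the function names snippets and snippets_by_token are hypothetical, and the code assumes the C++11 names std::sregex_iterator and std::sregex_token_iterator, which are the std::string specializations of the two class templates.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: text of every <CODE>...</CODE> snippet in a line,
// found by stepping a regex_iterator from match to match.
std::vector<std::string> snippets(const std::string& line)
{
    static const std::regex rgx("<CODE>(.*?)</CODE>", std::regex::icase);
    std::vector<std::string> found;
    std::sregex_iterator it(line.begin(), line.end(), rgx);
    std::sregex_iterator end;           // default-constructed: end of sequence
    for (; it != end; ++it)
        found.push_back((*it)[1].str()); // capture group 1 of each match
    return found;
}

// Same result with a regex_token_iterator that is asked to
// produce only capture group 1 of each match.
std::vector<std::string> snippets_by_token(const std::string& line)
{
    static const std::regex rgx("<CODE>(.*?)</CODE>", std::regex::icase);
    std::vector<std::string> found;
    std::sregex_token_iterator it(line.begin(), line.end(), rgx, 1);
    std::sregex_token_iterator end;
    for (; it != end; ++it)
        found.push_back(it->str());
    return found;
}
```

Both functions take care of restarting the search themselves, so the caller never has to adjust iterators or match flags by hand.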


