18.2. The sub_match Class Template

A sub_match object holds a Boolean value named matched that is true if the sub_match object points to a character sequence that was part of a successful match. In that case, its two iterator members, first and second, point to the beginning of the sequence and one past the end of the sequence, respectively. That is, given a sub_match object sub, if sub.matched is true, the half-open sequence [sub.first, sub.second) delimits the matching character sequence. Your code can create sub_match objects, but ordinarily, you'll use the ones contained in a match_results object.

template<class BidIt>
  class sub_match : public std::pair<BidIt, BidIt> {
public:
  bool matched;

  difference_type length() const;
  basic_string<value_type> str() const;
  operator basic_string<value_type>() const;

  int compare(const sub_match& right) const;
  int compare(const basic_string<value_type>& right) const;
  int compare(const value_type *right) const;

  typedef BidIt iterator;
  typedef typename iterator_traits<BidIt>::value_type
    value_type;
  typedef typename iterator_traits<BidIt>::difference_type
    difference_type;
};

The template argument BidIt must be a type that meets the requirements for a bidirectional iterator. Ordinarily, this argument comes from the template match_results that holds the sub_match objects, so as long as you provide a bidirectional iterator type to match_results, this requirement will be satisfied.

The class template sub_match<BidIt> is derived from std::pair<BidIt, BidIt>. This base class provides the two members, first and second, that hold the two iterator values. The class template also has a Boolean member, matched, that holds true if the iterators point to a character sequence that was part of a successful match. That sequence can be empty (that is, first and second are equal) for a zero-length match. The sequence will also be empty if the corresponding capture group was not part of a successful match. In this case, the member matched will hold the value false, and the members first and second will point to the end of the target sequence.

A zero-length match can occur when a capture group consists solely of an assertion or of a repetition that allows zero repeats. For example:

  • "^" matches the target sequence "". The sub_match object that designates the full match holds two iterators that both point to the first position in the target sequence, and its member matched holds true.

  • "a(b*)a" matches the target sequence "aa". The sub_match object that designates the capture group holds iterators that both point to the second character in the target sequence, and its member matched holds true.

  • "(a)|b" matches the target sequence "b". The capture group is not part of the match. The sub_match object that designates the capture group holds iterators that point to the end of the target sequence, and thus compare equal, and its member matched holds false.
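The three cases above can be checked directly. The following is a minimal sketch, assuming a C++11 or later library in which the TR1 names live in namespace std rather than std::tr1. The struct group_info and the function describe are hypothetical helpers introduced only for illustration; they are not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Summary of one sub_match: whether it matched, and the offsets of
// its two iterators from the start of the target sequence.
struct group_info
  {
  bool matched;
  std::ptrdiff_t first;    // offset of sub_match::first
  std::ptrdiff_t second;   // offset of sub_match::second
  };

// Hypothetical helper: match all of tgt against pattern and describe
// the sub_match at index idx (index 0 designates the full match).
// Assumes that the pattern matches and that idx is in range.
group_info describe(const char *pattern, const std::string& tgt,
  std::size_t idx)
  {
  std::regex rgx(pattern);
  std::smatch m;
  const bool ok = std::regex_match(tgt, m, rgx);
  assert(ok && idx < m.size());
  const std::ssub_match& sm = m[idx];
  group_info res = { sm.matched,
    sm.first - tgt.begin(), sm.second - tgt.begin() };
  return res;
  }
```

With this helper, describe("^", "", 0) and describe("a(b*)a", "aa", 1) both report matched as true with equal offsets, and describe("(a)|b", "b", 1) reports matched as false, also with equal offsets. Equal iterators alone therefore don't tell you whether a group took part in the match.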

Several of the member functions of sub_match<BidIt> take arguments or return objects of type basic_string<value_type>. As we'll see, value_type is a typedef for the character type that the iterators point to. So basic_string<value_type> is a basic_string object that holds characters. When the text you're searching consists of ordinary char objects, basic_string<value_type> is basic_string<char>, or, more simply, string.

Nested Types

typedef BidIt iterator;
typedef typename iterator_traits<BidIt>::value_type
  value_type;
typedef typename iterator_traits<BidIt>::difference_type
  difference_type;

The first type is a synonym for the first template type argument. The second and third types are synonyms for the iterator type's associated value_type and difference_type, respectively.

These type names can be convenient when you need to peer into the contents of the matching text. The type name iterator names the type of the iterators that the sub_match type holds; value_type is the character type that the iterators point to; and difference_type can hold the difference between two iterator values. For example:

typedef std::tr1::sub_match<const char*> cmatch;
cmatch::iterator iter;       // iter has type const char*
cmatch::value_type ch;       // ch has type char
cmatch::difference_type d;    // d has type std::ptrdiff_t

Access

bool matched;
BidIt first;      // inherited from pair
BidIt second;     // inherited from pair

If the capture group corresponding to the sub_match object was part of a successful match, the member matched holds true, and the members first and second designate the character range in the target sequence that matched the capture group. If the capture group was not part of a successful match, the member matched holds false, and the members first and second point to the end of the target sequence.

A newly constructed sub_match object has not been part of a successful match, so its matched member will hold false. As we'll see later, a call to a search algorithm that doesn't find a match leaves the sub_match objects in a match_results object in an unspecified state, so you cannot count on any particular pattern of values when a search fails. If a search succeeds, the member matched in each sub_match object that was part of the match holds true, and the member matched in each sub_match object that was not part of the match holds false.

Figure. Objects of type sub_match (regexres/subobjects.cpp)

#include <regex>
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <string>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::match_results; using std::tr1::sub_match;
using std::copy;
using std::ostream_iterator; using std::string;
using std::cout; using std::setw;


template <class BidIt>
void show(const char *title, const sub_match<BidIt>& sm)
  { // show the matched text, or a placeholder if there was no match
  typedef typename sub_match<BidIt>::value_type MyTy;
  cout << setw(20) << title << ": ";
  if (sm.matched)
    copy(sm.first , sm.second,
      ostream_iterator<MyTy>(cout));
  else
    cout << "[no match]";
  cout << '\n';
  }


int main()
  {
  regex rgx("(a+)|(b+)");
  string tgt("bbb");
  match_results<string::iterator> match;
  show("no search", match[0]);
  if (!regex_match(tgt.begin(), tgt.end(), match, rgx))
    cout << "search failed\n";
  else
    { // search succeeded, capture group 1 not part of match
    show("full match", match[0]);
    show("capture group 1", match[1]);
    show("capture group 2", match[2]);
    }
  return 0;
  }

In this example, the expression match[0] returns a reference to the sub_match object that represents the full match, and match[1] and match[2] return references to the sub_match objects that represent the subsequences that matched the first and second capture groups, respectively.

difference_type length() const;

The member function returns 0 if the member matched holds false; otherwise, distance(first, second).

This function returns the number of characters in the matching sequence delimited by [first, second) and returns 0 if the corresponding capture group was not part of the match. The function also returns 0 for a zero-length match, so don't use this return value to distinguish between those two cases. Use the member matched.
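To make the distinction concrete, here is a small sketch that separates the three outcomes by testing matched before looking at length(). It assumes a C++11 or later library that provides these names in namespace std rather than std::tr1. The enumeration group_state and the functions classify and classify_group1 are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical classification of a capture group's outcome.
enum group_state { not_matched, matched_empty, matched_nonempty };

// Test matched first; length() is 0 both for a group that did not
// take part in the match and for a zero-length match.
template <class BidIt>
group_state classify(const std::sub_match<BidIt>& sm)
  {
  if (!sm.matched)
    return not_matched;
  return sm.length() == 0 ? matched_empty : matched_nonempty;
  }

// Match all of tgt against pattern and classify capture group 1.
// Assumes that the pattern matches and has at least one group.
group_state classify_group1(const char *pattern,
  const std::string& tgt)
  {
  std::regex rgx(pattern);
  std::smatch m;
  const bool ok = std::regex_match(tgt, m, rgx);
  assert(ok && m.size() > 1);
  return classify(m[1]);
  }
```

For the patterns discussed earlier, classify_group1("a(b*)a", "aa") yields matched_empty and classify_group1("(a)|b", "b") yields not_matched, even though length() returns 0 in both cases.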

basic_string<value_type> str() const;
operator basic_string<value_type>() const;

The first member function returns an empty string object if matched holds false; otherwise, it returns basic_string<value_type>(first, second). The second member function returns str().

These member functions convert the matching sequence into a basic_string object. This will often be more convenient than using the raw iterators first and second. Here's the previous example, with the function show rewritten to use str().

Figure. String Conversions (regexres/strings.cpp)

#include <regex>
#include <iomanip>
#include <iostream>
#include <string>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::match_results; using std::tr1::sub_match;
using std::string;
using std::cout; using std::setw;


template <class BidIt>
void show(const char *title, const sub_match<BidIt>& sm)
  {
  cout << setw(20) << title << ":";
  if (sm.matched)
    cout << sm.str() << '\n';
  else
    cout << "[no match]\n";
  }

int main()
  {
  regex rgx("(a+)|(b+)");
  string tgt("bbb");
  match_results<string::iterator> m;
  show("no search", m[0]);
  if (!regex_match(tgt.begin(), tgt.end(), m, rgx))
    cout << "search failed\n";
  else
    { // search succeeded, capture group 1 not part of match
    show("full match", m[0]);
    show("capture group 1", m[1]);
    show("capture group 2", m[2]);
    }
  return 0;
  }
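The listing above calls str() explicitly. The conversion operator does the same job implicitly wherever a basic_string is expected. The following sketch assumes a C++11 or later library with these names in namespace std rather than std::tr1; the function group1_text is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: match all of tgt against pattern and return
// the text of capture group 1. The result is an empty string if
// that group did not take part in the match.
// Assumes that the pattern matches and has at least one group.
std::string group1_text(const char *pattern, const std::string& tgt)
  {
  std::regex rgx(pattern);
  std::smatch m;
  const bool ok = std::regex_match(tgt, m, rgx);
  assert(ok && m.size() > 1);
  std::string text = m[1];   // implicit conversion; same as m[1].str()
  return text;
  }
```

With the expression from the listing, group1_text("(a+)|(b+)", "aaa") returns "aaa", and group1_text("(a+)|(b+)", "bbb") returns an empty string because the first capture group was not part of the match.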

Comparison

Member Functions
int compare(const sub_match& right) const;
int compare(const basic_string<value_type>& right) const;
int compare(const value_type *right) const;

The first member function returns str().compare(right.str()). The second and third member functions return str().compare(right).

That is, these functions do a lexicographical comparison of the matched sequence and their argument,[1] returning a negative value if the matched sequence comes before the argument, zero if they are equal, and a positive value if the matched sequence comes after the argument.

[1] Technically, this comparison requires converting the sub_match object to a basic_string object, then calling its compare member function. That's a fairly expensive operation, which can usually be skipped. For sequences of characters of type char and wchar_t, the corresponding string types are basic_string<char> and basic_string<wchar_t>. Portable code can't tell, in these cases, whether the conversion to string was done, so under the as-if rule, the implementation doesn't have to do the conversion so long as it returns the right answer. For user-defined character types, the conversion is necessary because users are allowed to specialize basic_string for user-defined types. Such a specialization could make notes about whether it was used, so the as-if rule can't be used to eliminate the conversion.

Figure. The compare Member Functions (regexres/compare.cpp)

#include <regex>
#include <iostream>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::csub_match; using std::tr1::cmatch;
using std::cout;

static const char *blocked_sites[] =
{ // block list; any resemblance between the names here
  // and real URLs is probably accidental
"www.idontwantmykidshere.com",
"www.lotsofxxxstuff.com",
"www.nra.org"
};
const int nsites = sizeof(blocked_sites)
  / sizeof(*blocked_sites);

bool allow(const csub_match& match)
  { // return false if match is on the blocked list
  for (int i = 0; i < nsites; ++i)
    if (match.compare(blocked_sites[i]) == 0)
      return false;
  return true;
  }

bool check_url(const char *url)
  { // return false if URL is not a valid HTTP URL or
    // if the hostname is on the blocked list
  regex rgx("http://([^/:]+)(:(\\d+))?(/.*)?");
  cmatch match;
  return regex_match(url, match, rgx) && allow(match[1]);
  }

void connect(const char *url)
  { // connect to valid, unblocked URL
  if (check_url(url))
    {
    cout << "Okay to connect: " << url << '\n';
    // remainder of connection code left as exercise for the reader
    }
  else
    cout << "Invalid or blocked URL: "  << url << '\n';
  }

int main()
  { // connect to a couple of sites
  connect("http://www.xxx.com/risque/index.html");
  connect("http://www.petebecker.com/tr1book");
  connect("http:/invalid, for many reasons");
  return 0;
  }

In this example, I simplified the code by using some of the built-in typedefs instead of using the full names of the template instantiations. We'll look at these typedefs later. For now, cmatch is a synonym for match_results<const char*>, which is the appropriate type to hold the results of a search through an array of char. An object of type cmatch, in turn, holds objects of type sub_match<const char*>; the synonym for that one is csub_match.

The function allow does a linear search of the list of blocked URLs, to see whether the hostname passed to it is on the list. The function check_url checks whether its argument is a valid HTTP URL, and, if so, extracts the hostname and calls allow.[2]

[2] That rather hairy regular expression is taken from [Fri02], which explains its limitations and discusses possible improvements.
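The listing above tests the result of compare only for equality. The sign of the result carries the ordering, as in this sketch. It assumes a C++11 or later library with these names in namespace std rather than std::tr1, and it uses a simplified pattern without the port number. The functions sign and compare_host are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <regex>

// Reduce a compare() result to -1, 0, or 1; only the sign of the
// value returned by compare() is meaningful.
int sign(int n)
  {
  return n < 0 ? -1 : n > 0 ? 1 : 0;
  }

// Hypothetical helper: extract the hostname from url and compare it
// with name. Assumes that url matches the simplified pattern.
int compare_host(const char *url, const char *name)
  {
  std::regex rgx("http://([^/:]+)(/.*)?");
  std::cmatch m;
  const bool ok = std::regex_match(url, m, rgx);
  assert(ok);
  return sign(m[1].compare(name));   // compare(const value_type*)
  }
```

Here compare_host("http://www.b.com/x", "www.b.com") returns 0, compare_host("http://www.b.com/x", "www.c.com") returns -1, and compare_host("http://www.b.com", "www.a.com") returns 1.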

Nonmember Operators
template<class BidIt>
    bool operator==(const sub_match<BidIt>& left,
      const sub_match<BidIt>& right);

    // also operator!=, operator<, operator<=, operator>, operator>=
template<class BidIt /* maybe more */>
    bool operator==(
      various types left, const sub_match<BidIt>& right);
    // also operator!=, operator<, operator<=, operator>, operator>=
template<class BidIt /* maybe more */>
    bool operator==(
      const sub_match<BidIt>& left, various types right);
    // also operator!=, operator<, operator<=, operator>, operator>=

Each function template operator== returns true only if the argument left designates the same characters, in the same order, as the argument right.

Each function template operator!=(left, right) returns !(left == right).

Each function template operator< returns true only if the argument left designates a sequence of characters that lexicographically precedes the sequence of characters designated by the argument right.

Each function template operator<=(left, right) returns !(right < left).

Each function template operator>(left, right) returns right < left.

Each function template operator>=(left, right) returns !(left < right).

In addition to the overloaded member functions named compare, there's a long list of operators for comparing sub_match objects to various representations of character sequences. Rather than list all six comparison operators for each pair of types,[3] the preceding synopsis gives the declaration for operator==. The remaining five operators are all declared in the obvious way.

[3] If you want to see the full list, it's in Appendix A.1.

The argument types referred to as various types can be any of the following, where Ty is iterator_traits<BidIt>::value_type:

  • An object of type basic_string<Ty, Traits, Alloc>

  • A pointer of type const Ty*

  • A reference to type const Ty

That is, you can compare a sub_match<BidIt> object to another sub_match<BidIt> object, to a basic_string object that holds the same character type, to a null-terminated character string, and to a single character. Of course, the sub_match<BidIt> object can be on either side of the comparison.
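A short sketch of these combinations follows. It assumes a C++11 or later library with these names in namespace std rather than std::tr1. The function operators_agree, the target text "key=x", and the pattern are all hypothetical choices made for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical check: returns true if every comparison below gives
// the result that the lexicographical rules call for.
bool operators_agree()
  {
  const std::string tgt("key=x");
  std::regex rgx("(\\w+)=(\\w+)");
  std::smatch m;
  const bool ok = std::regex_match(tgt, m, rgx);
  assert(ok);   // the pattern is known to match this target
  const std::ssub_match& key = m[1];   // designates "key"
  const std::ssub_match& val = m[2];   // designates "x"
  return key == std::string("key")     // sub_match, basic_string
    && std::string("key") == key       // basic_string, sub_match
    && key != "kez"                    // sub_match, C string
    && "kex" < key                     // C string, sub_match
    && val == 'x'                      // sub_match, single character
    && 'w' < val                       // single character, sub_match
    && key < val;                      // sub_match, sub_match
  }
```

Each comparison reads the same way it would for two string objects, which is the main reason to prefer the operators over compare when only a yes-or-no answer is needed.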

Figure. Comparison Operators (regexres/operators.cpp)

#include <regex>
#include <iostream>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::csub_match; using std::tr1::cmatch;
using std::cout;

static const char *blocked_sites[] =
{ // block list; any resemblance between the names here
  // and real URLs is probably accidental
"www.idontwantmykidshere.com",
"www.lotsofxxxstuff.com",
"www.nra.org"
};
const int nsites = sizeof(blocked_sites)
  / sizeof(*blocked_sites);

bool allow(const csub_match& match)
  { // return false if match is on the blocked list
  for (int i = 0; i < nsites; ++i)
    if (match == blocked_sites[i])
      return false;
    else if (match < blocked_sites[i])
      return true;
  return true;
  }

bool check_url(const char *url)
  { // return false if URL is not a valid HTTP URL or
    // if the hostname is on the blocked list
  regex rgx("http://([^/:]+)(:(\\d+))?(/.*)?");
  cmatch match;
  return regex_match(url, match, rgx) && allow(match[1]);
  }

void connect(const char *url)
  { // connect to valid, unblocked URL
  if (check_url(url))
    {
    cout << "Okay to connect: " << url << '\n';
    // remainder of connection code left as exercise for the reader
    }
  else
    cout << "Invalid or blocked URL: " << url << '\n';
  }

int main()
  { // connect to a couple of sites
  connect("http://www.xxx.com/risque/index.html");
  connect("http://www.petebecker.com/tr1book");
  connect("http:/invalid, for many reasons");
  return 0;
  }

This example is a lot like the previous one but with two differences, both in the function allow. First, this example uses operator== to check whether the hostname is in the blocked list. Second, this example uses operator< to take advantage of the list's being in alphabetical order to cut the linear search short when it reaches a name that comes after the target hostname.


