Engine Templates in the TR1 Library






Engine Templates in the TR1 Library

The TR1 library provides six templates that define engines. Sections 13.2.1 13.2.4 describe the basic engines; Sections 13.2.5 and 13.2.6, compound engines. The six templates satisfy all the requirements for engines, so you can use them as patterns if you write your own engine. You don't have to follow any of these patterns, though, so long as your engine provides all the required operations.

The library also provides a class named random_device (Section 13.3) and nine predefined engines, in the form of typedefs for particular template specializations (Section 13.4).

In the discussion that follows, some of the details are in the template declaration itself. In the text following the declaration, I'll explain the things that need more details than can easily fit in the declaration, including the details of the algorithm that the template implements and the contents of the engine's state.

Every engine has a state that is used to generate successive values. The size of an engine's state and its text representation, written by the stream inserter and read by the stream extractor, are exactly specified by TR1.[8] As a result, applications compiled with different implementations of the library and running on different hardware can share an engine's state. This is used for parallelizing calculations and for checkpointing. Parallelizing a calculation means distributing parts of the calculation to different processors, which requires being able to start an engine at a known state, even with a different implementation of that engine. Checkpointing means saving a long-running application's state periodically so that the calculation can be resumed on the same machine if it gets interrupted. This does not require portability across systems.

[8] Of course, this doesn't mean that every implementation must exactly follow these details, just that it must act as if it did. In particular, several of the TR1 engines can be implemented more efficiently by holding more state values than TR1 requires. However, in all cases, the text written to a stream must be as specified.

13.2.1. Basic Engine linear_congruential

template<class UType, UType A, UType C, UType M>
  class linear_congruential {
public:
  // engine interface
  typedef UType result_type;
  linear_congruential() { seed(); }
  explicit linear_congruential(unsigned long x0) { seed(x0); }
  template<class Seeder>
    linear_congruential (Seeder& s) { seed(s); }
  void seed(unsigned long x0 = 1);
  template<class Seeder>
    void seed(Seeder& s) { seed(s()); }
  result_type min() const;
  result_type max() const;
  result_type operator()();

  // type-specific members
  static const UType multiplier = A;
  static const UType increment = C;
  static const UType modulus = M;
  };

The class template describes a simple engine that produces values of type UType, using the recurrence relation xi=(A*xi-1+C) mod M.

If the value of the template argument M is 0, it is treated as if it were std::numeric_limits<UType>::max() + 1.[9] The type argument UType must be an unsigned integral type that is large enough to store values up to M - 1. The values of the template arguments A and C must be in the range [0, M-1].

The size of the engine's state is 1, and its value is the last value returned by operator() or, if no call has been made to operator(), the seed value.

The member function seed(unsigned long x0) sets the state of the engine to the value x0, unless x0 is 0 and C mod M is 0, in which case it sets the state to 1.[10]

[9] Conceptually, that is. That value can't be represented in an object of type UType.

[10] This avoids the degenerate case of a generator that produces nothing but 0 values.

This template implements the commonly used linear congruential algorithm. If you choose the template arguments carefully, it can produce long random sequences. If you choose badly, the sequences can be short, or they can show other patterns. The predefined generators minstd_rand0 (see Section 13.4.1) and minstd_rand (see Section 13.4.2) are well understood and have good properties.

Linear congruential engines as a rule are fast, especially if the value of M is the same as the size of an unsigned integral type, which avoids having to do the final division on most hardware. The implementation should use an efficient integral type to hold intermediate values when it computes a new value. However, if M is larger than the square root of the system's largest unsigned integral value, the engine can be quite slow, because the implementation may have to resort to multiple-precision arithmetic.

13.2.2. Basic Engine mersenne_twister

template<class UType, int W, int N, int R,
    UType A, int U, int S,
    UType B, int T, UType C, int L>
  class mersenne_twister {
public:
  // engine interface
  typedef UType result_type;
  mersenne_twister() { seed(); }
  explicit mersenne_twister(unsigned long x0) { seed(x0); }
  template<class Seeder>
    mersenne_twister(Seeder& s) { seed(s); }
  void seed();
  void seed(unsigned long x0 = 5489);
  template<class Seeder>
    void seed(Seeder& s);
  result_type min() const;
  result_type max() const;
  result_type operator()();

  // type-specific members
  static const int word_size = W;
  static const int state_size = N;
  static const int shift_size = M;
  static const int mask_bits = R;
  static const int UType parameter_a = A;
  static const int output_u = U;
  static const int output_s = S;
  static const UType output_b = B;
  static const int output_t = T;
  static const UType output_c = C;
  static const int output_l = L;
  };

The class template describes a simple engine that produces values of type UType by applying the following recurrence relation:

  • yi = (UM &xi-N)|(LM &xi-(N-1))

  • If the lowest bit of yi is set, xi = xi-(N-M) Λ(yi >>1)ΛA

  • Otherwise, xi = xi-(N-M) Λ(yi >>1)

then computing the result, o(xi), from the new value of xi, as follows:

  • z1i = xi Λ(xi >>U)

  • z2i = z1i Λ((z1i <<S)&B)

  • z3i = z2i Λ(z2i <<T)&C)

  • o(xi)= z3i Λ(z3i >>1)

where all the computations are performed modulo 2W, UM is a value of type UType with only its upper W-R bits set, and LM is a value of type UType with only its lower R bits set.

The type argument UType must be an unsigned integral type whose width is at least W bits; that is, it must be able to hold values up to 2W - 1. In addition:

  • 1 M N

  • 0 R, U, S, T, L W

  • 0 A, B, C 2 W - 1

The size of the engine's state is N, and its value is the sequence of values xiN, ...,xi1, in that order.

The member function seed(unsigned long x0) sets x-N to x0 mod 2W, then, iteratively, sets

x-N+i = [i+1812433253((x-N+i-1>>(W -2))Λ(x-N+i-1))] mod 2W

for i = 1 ...N - 1.

The template member function seed(Gen& g) sets the object's state to the successive values g() mod 2W, respectively.

The template implements the Mersenne Twister algorithm. The most commonly used version of the Mersenne Twister is implemented by mt19937 (see Section 13.4.3).

The algorithm is fast, but it has a large state, requiring N*W bits of storage.[11] An implementation typically updates all the state values at once, rather than changing them one at a time, as the algorithm specifies. This approach is faster, but it requires twice as much storage space.

[11] For example, the most common form of the Mersenne Twister, embodied in mt19937, requires 624 32-bit values to store its state.

13.2.3. Basic Engine subtract_with_carry

template<class IType, IType M, int S, int R>
  class subtract_with_carry {
public:
  // engine interface
  typedef IType result_type;
  subtract_with_carry() { seed(); }
  explicit subtract_with_carry(unsigned long x0) { seed(x0); }
  template<class Seeder>
    subtract_with_carry(Seeder& s) { seed(s); }
  void seed(unsigned long x0 = 19780503);
  template<class Seeder>
    void seed(Seeder& s);
  result_type min() const;
  result_type max() const;
  result_type operator()();

  // type-specific members
  static const IType modulus = M;
  static const int short_lag = S;
  static const int long_lag = R;
  };

The class template describes a simple engine that produces values of type IType by applying the recurrence relation xi = (xi-S - xi-R - cyi-1) mod M, where cyi = 1 if xi-S - xi-R - cyi-1 < 0; otherwise, cyi = 0.

The type argument IType must be an integral type large enough to hold values up to M. In addition, 0 < S < R.

The size of the engine's state is R, and its value is the sequence of values xi-R, ...,xi-1, cyi-1, in that order.

The member function seed(unsigned long x0) creates a generator object, lcg, of type linear_congruential<unsigned long, 2147483563, 40014, 0>. It initalizes it with the value x0 if x0 is not zero; otherwise, with the value 19780503. It then generates successive state values x-R,...,x-1 with lcg() mod M, then sets cy-1 to 1 if x-1 is 0; otherwise, to 0.

The member function template seed(Gen& g) accumulates unsigned 32-bit values produced by calling g() to generate successive state values x-R,...,x-1, then sets cy-1 to 1 if x-1 is 0, otherwise to 0. Formally, with N=(M+31)/32, the function uses g() to generate N*R successive values Z0,Z1,...,ZN*R-1 and initializes the engine's state x-R,...,x-1 to the values (Z0· 2 0+· · · +ZN-1· 232(N-1)) mod M, ...,(ZN*R-N 20+... +ZN*R-1 232(N-1)) mod M.

The engine generates values without multiplication, so it is quite fast.[12] With suitable lags, it generates pretty good random sequences. It can be improved significantly by discarding some of the generated values, as done by ranlux3(see Section 13.4.4) and ranlux4 (see Section 13.4.5).

[12] Don't worry about the modulus operation; mathematically, that requires division, but in fact, it's simply a test and a subtraction.

13.2.4. Basic Engine subtract_with_carry_01

template<class RType, IType W, int S, int R>
  class subtract_with_carry_01 {
public:
  // engine interface
  typedef RType result_type;
  subtract_with_carry_01() { seed(); }
  explicit subtract_with_carry_01 (unsigned long x0)
    { seed (x0); }
template<class Seeder>
  subtract_with_carry_01 (Seeder& s) { seed (s); }
void seed(unsigned long x0 = 19780503);
template<class Seeder>
  void seed(Seeder& s);
result_type min() const;
result_type max() const;
result_type operator() ();

// type-specific members
static const IType word_size = W;
static const int short_lag = S;
static const int long_lag = R;
};

The class template describes a simple engine that produces values of type RType by applying the recurrence relation xi = (xiS - xiR - cyi1) mod 1, where cyi = 2-W if xiS - xiR - cyi1 < 0; otherwise, cyi = 0.

The type argument RType must be a floating-point type with enough precision to hold all the bits in the generated values. In addition, 0 < S < R.

The size of the engine's state is R. With N = |(W + 31)/32 |, each value xi-k can be represented as (zk,0 + zk,1 ·2 32 + ... + zk,N1 ·232 (N-1))/2W . The textual representation of the state is the textual representation of the values zR,0, ...,zR,N-1, ...,z1,0, ...,z1,N-1,cyi-1 ·2W.

The member function seed(unsigned long x0) creates a generator object, lcg, of type linear_congruential<unsigned long, 2147483563, 40014, 0>. It initalizes it with the value x0 if x0 is not zero; otherwise, with the value 19780503. It then generates successive state values x-R, ...,x-1 with lcg()· 2W mod 1, then sets cy-1 to 2-W if x-1 is 0; otherwise, to 0.

The member function template seed(Gen& g) accumulates unsigned 32-bit values produced by calling g() to generate successive state values x-R, ...,x-1, then sets cy-1 to 2-W if x-1 is 0, otherwise to 0. Formally, with N=(M+31)/32, the function uses g() to generate N*R successive values Z0,Z1,...,ZN*R-1 and initializes the engine's state x-R,...,x-1 to the values (Z0·20+· · ·+ZN-1·232(N-1))·2-W mod 1,...,(ZN*R-N ·20+· · · +ZN*R-1 232(N-1)) 2-W mod 1.

The engine generates floating-point values in the range [0.0, 1.0). If you wade through the details of the specification, you'll see that with a binary floating-point implementation, the returned values have no more than W bits in their fraction. Other floating-point representations may need more bits but must return the same values as the binary representation.

Just like subtract_with_carry, this algorithm is quite fast. Its predefined versions are ranlux_base_01 (Section 13.4.6) and ranlux64_base_01(Section 13.4.7). Also like subtract_with_carry, with suitable lags, it generates pretty good random sequences, which can be improved by discarding some of the values, as done by ranlux3_01 (see Section 13.4.8) and ranlux4_01 (see Section 13.4.9).

13.2.5. Compound Engine discard_block

template<class Eng, int P, int R>
  class discard_block {
public:
  // engine interface
  typedef typename base_type::result_type result_type;
  discard_block() : eng(), n(0) {}
  explicit discard_block(const base_type& e)
    : eng(e), n(0) {}
  explicit discard_block(unsigned long x0)
    : eng(x0), n(0) {}
  template<class Gen>
    discard_block (Gen& g) : eng(g), n(0) {}
  void seed() { eng. seed(); }
  template<class Gen>
    void seed(Gen& g) { eng.seed (g); }
  const base_type& base() const { return eng; }
  result_type min() const { return eng.min(); }
  result_type max() const { return eng.max(); }
  result_type operator()();

  // type-specific members
  typedef Eng base_type;
  static const int block_size = P;
  static const int used_block = R;

  // exposition only:
private:
  Eng eng;
  int n;
  };

The class template describes a compound engine whose operator() returns the first P values returned by the engine eng, then discards the next P - R values returned by eng, and then repeats this cycle as necessary.

The type argument Eng must be an engine type that satisfies the requirements set out at the beginning of this chapter. In addition, 0 < R <= P.[13]

The size of the engine's state is one more than the size of eng's state. Its textual representation is the textual representation of eng, followed by the textual representation of the number of calls to operator() since the beginning of the current cycle.

[13] The TR requires 0 <= R <= P, but passing 0 for the template argument R has the same effect as passing 1. There's no benefit from using 0.

The engine can sometimes be used to improve the quality of another engine.[14] See, for example, ranlux3 (Section 13.4.4), ranlux4 (Section 13.4.5), ranlux3_01 (Section 13.4.8), and ranlux4_01 (Section 13.4.9).

[14] But heed Knuth's warning that modifying random number generators often makes them worse. See [Knu98a, 26].

13.2.6. Compound Engine xor_combine

template<class Eng1, int S1, class Eng2, int S2>
  class xor_combine {
public:
  // engine interface
  typedef see below result_type;
  xor_combine() : eng1 (), eng2() {}
  xor_combine (const base1_type& e1, const base2_type& e2)
     : eng1 (e1), eng2 (e2) {}
  xor_combine(unsigned long x0) : eng1 (x0), eng2 (x0 + 1) {}
  template <class Gen>
     xor_combine(Gen& g) : eng1(g), eng2(g) {}
  void seed() { eng1.seed(); eng2.seed(); }
  template<class Gen>
    void seed(Gen& g) { eng1.seed(g); eng2.seed(g); }
  const base1_type& base1() const { return eng1; }
  const base2_type& base2() const { return eng2; }
  result_type min() const;
  result_type max() const;
  result_type operator()();
  // type-specific members
  typedef Eng1 base1_type;
  typedef Eng2 base2_type;
  static const int shift1 = S1;
  static const int shift2 = S2;

  // exposition only:
private:
  Eng1 eng1;
  Eng2 eng2;
  };

The class template describes a compound engine whose operator() returns (eng1() << shift1) Λ (eng2() << shift2).

The type arguments Eng1 and Eng2 must be engine types that satisfy the requirements set out at the beginning of this chapter. Each type's nested member typedef result_type must be an integral type. The template's nested type xor_combine::result_type is the one of those two nested types that provides the most storage. In addition, 0 <= s1 and 0 <= s2.

The size of the engine's state is the sum of the sizes of the states of eng1 and eng2. Its textual representation is the textual representation of eng1, followed by the textual representation of eng2.

The engine combines the results of two engines, using left shifts and bitwise exclusive OR. Except in unusual circumstances, at least one of the two shift values should be 0. For best results, the engine whose value is not shifted should produce values ranging from 0 to 2n - 1, and the shifted values from the other engine should be somewhere in that range.



 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows