June 23, 2011, 7:24 a.m.
posted by virtualbrain
Engine Templates in the TR1 LibraryThe TR1 library provides six templates that define engines. Sections 13.2.1 13.2.4 describe the basic engines; Sections 13.2.5 and 13.2.6, compound engines. The six templates satisfy all the requirements for engines, so you can use them as patterns if you write your own engine. You don't have to follow any of these patterns, though, so long as your engine provides all the required operations. The library also provides a class named random_device (Section 13.3) and nine predefined engines, in the form of typedefs for particular template specializations (Section 13.4). In the discussion that follows, some of the details are in the template declaration itself. In the text following the declaration, I'll explain the things that need more details than can easily fit in the declaration, including the details of the algorithm that the template implements and the contents of the engine's state. Every engine has a state that is used to generate successive values. The size of an engine's state and its text representation, written by the stream inserter and read by the stream extractor, are exactly specified by TR1.^{[8]} As a result, applications compiled with different implementations of the library and running on different hardware can share an engine's state. This is used for parallelizing calculations and for checkpointing. Parallelizing a calculation means distributing parts of the calculation to different processors, which requires being able to start an engine at a known state, even with a different implementation of that engine. Checkpointing means saving a long-running application's state periodically so that the calculation can be resumed on the same machine if it gets interrupted. This does not require portability across systems.
13.2.1. Basic Engine linear_congruentialtemplate<class UType, UType A, UType C, UType M> class linear_congruential { public: // engine interface typedef UType result_type; linear_congruential() { seed(); } explicit linear_congruential(unsigned long x0) { seed(x0); } template<class Seeder> linear_congruential (Seeder& s) { seed(s); } void seed(unsigned long x0 = 1); template<class Seeder> void seed(Seeder& s) { seed(s()); } result_type min() const; result_type max() const; result_type operator()(); // type-specific members static const UType multiplier = A; static const UType increment = C; static const UType modulus = M; };
This template implements the commonly used linear congruential algorithm. If you choose the template arguments carefully, it can produce long random sequences. If you choose badly, the sequences can be short, or they can show other patterns. The predefined generators minstd_rand0 (see Section 13.4.1) and minstd_rand (see Section 13.4.2) are well understood and have good properties. Linear congruential engines as a rule are fast, especially if the value of M is the same as the size of an unsigned integral type, which avoids having to do the final division on most hardware. The implementation should use an efficient integral type to hold intermediate values when it computes a new value. However, if M is larger than the square root of the system's largest unsigned integral value, the engine can be quite slow, because the implementation may have to resort to multiple-precision arithmetic. 13.2.2. Basic Engine mersenne_twistertemplate<class UType, int W, int N, int R, UType A, int U, int S, UType B, int T, UType C, int L> class mersenne_twister { public: // engine interface typedef UType result_type; mersenne_twister() { seed(); } explicit mersenne_twister(unsigned long x0) { seed(x0); } template<class Seeder> mersenne_twister(Seeder& s) { seed(s); } void seed(); void seed(unsigned long x0 = 5489); template<class Seeder> void seed(Seeder& s); result_type min() const; result_type max() const; result_type operator()(); // type-specific members static const int word_size = W; static const int state_size = N; static const int shift_size = M; static const int mask_bits = R; static const int UType parameter_a = A; static const int output_u = U; static const int output_s = S; static const UType output_b = B; static const int output_t = T; static const UType output_c = C; static const int output_l = L; };
The template implements the Mersenne Twister algorithm. The most commonly used version of the Mersenne Twister is implemented by mt19937 (see Section 13.4.3). The algorithm is fast, but it has a large state, requiring N*W bits of storage.^{[11]} An implementation typically updates all the state values at once, rather than changing them one at a time, as the algorithm specifies. This approach is faster, but it requires twice as much storage space.
13.2.3. Basic Engine subtract_with_carrytemplate<class IType, IType M, int S, int R> class subtract_with_carry { public: // engine interface typedef IType result_type; subtract_with_carry() { seed(); } explicit subtract_with_carry(unsigned long x0) { seed(x0); } template<class Seeder> subtract_with_carry(Seeder& s) { seed(s); } void seed(unsigned long x0 = 19780503); template<class Seeder> void seed(Seeder& s); result_type min() const; result_type max() const; result_type operator()(); // type-specific members static const IType modulus = M; static const int short_lag = S; static const int long_lag = R; };
The engine generates values without multiplication, so it is quite fast.^{[12]} With suitable lags, it generates pretty good random sequences. It can be improved significantly by discarding some of the generated values, as done by ranlux3(see Section 13.4.4) and ranlux4 (see Section 13.4.5).
13.2.4. Basic Engine subtract_with_carry_01template<class RType, IType W, int S, int R> class subtract_with_carry_01 { public: // engine interface typedef RType result_type; subtract_with_carry_01() { seed(); } explicit subtract_with_carry_01 (unsigned long x0) { seed (x0); } template<class Seeder> subtract_with_carry_01 (Seeder& s) { seed (s); } void seed(unsigned long x0 = 19780503); template<class Seeder> void seed(Seeder& s); result_type min() const; result_type max() const; result_type operator() (); // type-specific members static const IType word_size = W; static const int short_lag = S; static const int long_lag = R; };
The engine generates floating-point values in the range [0.0, 1.0). If you wade through the details of the specification, you'll see that with a binary floating-point implementation, the returned values have no more than W bits in their fraction. Other floating-point representations may need more bits but must return the same values as the binary representation. Just like subtract_with_carry, this algorithm is quite fast. Its predefined versions are ranlux_base_01 (Section 13.4.6) and ranlux64_base_01(Section 13.4.7). Also like subtract_with_carry, with suitable lags, it generates pretty good random sequences, which can be improved by discarding some of the values, as done by ranlux3_01 (see Section 13.4.8) and ranlux4_01 (see Section 13.4.9). 13.2.5. Compound Engine discard_blocktemplate<class Eng, int P, int R> class discard_block { public: // engine interface typedef typename base_type::result_type result_type; discard_block() : eng(), n(0) {} explicit discard_block(const base_type& e) : eng(e), n(0) {} explicit discard_block(unsigned long x0) : eng(x0), n(0) {} template<class Gen> discard_block (Gen& g) : eng(g), n(0) {} void seed() { eng. seed(); } template<class Gen> void seed(Gen& g) { eng.seed (g); } const base_type& base() const { return eng; } result_type min() const { return eng.min(); } result_type max() const { return eng.max(); } result_type operator()(); // type-specific members typedef Eng base_type; static const int block_size = P; static const int used_block = R; // exposition only: private: Eng eng; int n; };
The engine can sometimes be used to improve the quality of another engine.^{[14]} See, for example, ranlux3 (Section 13.4.4), ranlux4 (Section 13.4.5), ranlux3_01 (Section 13.4.8), and ranlux4_01 (Section 13.4.9).
13.2.6. Compound Engine xor_combinetemplate<class Eng1, int S1, class Eng2, int S2> class xor_combine { public: // engine interface typedef see below result_type; xor_combine() : eng1 (), eng2() {} xor_combine (const base1_type& e1, const base2_type& e2) : eng1 (e1), eng2 (e2) {} xor_combine(unsigned long x0) : eng1 (x0), eng2 (x0 + 1) {} template <class Gen> xor_combine(Gen& g) : eng1(g), eng2(g) {} void seed() { eng1.seed(); eng2.seed(); } template<class Gen> void seed(Gen& g) { eng1.seed(g); eng2.seed(g); } const base1_type& base1() const { return eng1; } const base2_type& base2() const { return eng2; } result_type min() const; result_type max() const; result_type operator()(); // type-specific members typedef Eng1 base1_type; typedef Eng2 base2_type; static const int shift1 = S1; static const int shift2 = S2; // exposition only: private: Eng1 eng1; Eng2 eng2; };
The engine combines the results of two engines, using left shifts and bitwise exclusive OR. Except in unusual circumstances, at least one of the two shift values should be 0. For best results, the engine whose value is not shifted should produce values ranging from 0 to 2^{n} - 1, and the shifted values from the other engine should be somewhere in that range. |
- Comment