July 8, 2011, 6:37 a.m.

posted by sunny

### Bits, Bytes, and Words

The key to understanding radix sorts is to recognize that (i) computers generally are built to process bits in groups called machine words, which are often grouped into smaller pieces call bytes; (ii) sort keys also are commonly organized as byte sequences; and (iii) byte values can also serve as array indices or machine addresses. Therefore, it will be convenient for us to work with the following abstractions.

**Definition 10.1**
A
byte
is a fixed-length sequence of bits; a
string
is a variable-length sequence of bytes; a
word
is a fixed-length sequence of bytes.

In radix sorting, depending on the context, a key may be a word or a string. Some of the algorithms that we consider in this chapter depend on the keys being fixed length (words); others are designed to adapt to the situation when the keys are variable length (strings).

A typical machine might have 8- or 16-bit bytes and 32- or 64-bit words, and Java has built-in primitive data types whose numbers of bits are explicitly specified, but we use the terms byte, string, and word in a generic sense in this chapter, because it will be convenient for us to consider various other byte and word sizes as well (generally small integer multiples or fractions of built-in machine sizes).

Thus, we use machine- and application-dependent defined constants for the number of bits per word and the number of bits per byte; for example,

static final int bitsword = 32; static final int bitsbyte = 8; static final int bytesword = bitsword/bitsbyte; static final int R = 1 << bitsbyte;

Also included in these definitions for use when we begin looking at radix sorts is the constant `R`, which is the number of different byte values. When using these definitions, we generally assume that `bitsword` is a multiple of `bitsbyte`, that the number of bits per machine word is not less than (typically, is equal to) `bitsword`, and that bytes are individually addressable.

Most computers have bitwise and and shift operations, which we can use to extract bytes from words. In Java, we can directly express the operation of extracting the `B`th byte of an integer key as follows:

static int digit(int key, int B) { return (key >> bitsbyte*(bytesword-B-1)) & (R-1); }

For example, this method would extract byte 2 (the third byte) of a 32-bit number by shifting right 32 - 3 * 8 = 8 bit positions, then using the mask `00000000000000000000000011111111` to zero out all the bits except those of the desired byte, in the 8 bits at the right.

Different computers have different conventions for referring to their bits and bytesâ€”we are considering the bits in a word to be numbered, left to right, from `0` to `bitsword-1`, and the bytes in a word to be numbered, left to right, from `0` to `bytesword-1`. In both cases, we assume the numbering to also be from most significant to least significant.

Another option is to arrange things such that the radix is aligned with the byte size, and therefore a single access will get the right bits quickly. This operation is supported directly for `String` objects in Java: We take `R` to be 2^{16} (since `String` objects are sequences of 16bit Unicode characters) and can access the `B`th character of a `String`
`st` either with the single method invocation `st.charAt(B)` or (after initially using `toCharArry` to convert each string to a key that is a character array) a single array access. In Java this approach could be used for numbers as well, because we are guaranteed that numbers will be represented the same way in all virtual machines. We also need to be aware that byte-access operations of this type might be implemented with underlying shift-and-mask operations similar to the ones in the previous paragraph in some implementations.

At a slightly different level of abstraction, we can think of keys as numbers and bytes as digits. Given a (key represented as a) number, the fundamental operation needed for radix sorts is to extract a digit from the number. When we choose a radix that is a power of 2, the digits are groups of bits, which we can easily access directly using one of the macros just discussed. Indeed, the primary reason that we use radices that are powers of 2 is that the operation of accessing groups of bits is inexpensive. In some computing environments, we can use other radices as well. For example, if a is a positive integer, the bth digit of the radix-R representation of a is

On a machine built for high-performance numerical calculations, this computation might be as fast for general R as for R = 2.

Yet another viewpoint is to think of keys as numbers between 0 and 1 with an implicit decimal point at the left, as shown in Figure. In this case, the bth digit of a is

If we are using a machine where we can do such operations efficiently, then we can use them as the basis for our radix sort. This model also applies when keys are variable length (strings).

Thus, for the remainder of this chapter, we view keys as radix-`R` numbers (with `bitsword`, `bitsbyte`,and `R` not specified) and use a `digit` method to access digits of keys, with confidence that we will be able to develop appropriate implementations of `digit` for particular applications. As we have been doing with `less`, we might want to implement `digit` as a single-parameter class method in our `myItem` class, then implement a two-parameter static `digit` which invokes that method; or, for primitive-type items, we can substitute the primitive type name for item type names in our code and use a direct implementation of `digit` like the one given earlier in this section. For clarity, we use the name `bit` instead of `digit` when `R` is 2.

**Definition 10.2**
A
key
is a radix-R number, with digits numbered from the left (starting at 0).

In light of the examples that we just considered, it is safe for us to assume that this abstraction will admit efficient implementations for many applications on most computers, although we must be careful that a particular implementation is efficient within a given hardware and software environment.

We assume that the keys are not short, so it is worthwhile to extract their bits. If the keys are short, then we can use the key-indexed counting method of Chapter 6. Recall that this method can sort N keys known to be integers between 0 and R
- 1 in linear time, using one auxiliary table of size R for counts and another of size N for rearranging records. Thus, if we can afford a table of size 2^{w}, then w-bit keys can easily be sorted in linear time. Indeed, key-indexed counting lies at the heart of the basic MSD and LSD radix-sorting methods. Radix sorting comes into play when the keys are sufficiently long (say, w = 64) that using a table of size 2^{w} is not feasible.

#### Exercises

10.1 How many digits are there when a 32-bit quantity is viewed as a radix-256 number? Describe how to extract each of the digits. Answer the same question for radix 2

^{16}.10.2 For N = 10

^{3}, 10^{6}, and 10^{9}, give the smallest byte size that allows any number between 0 and N to be represented in a 4-byte word.10.3 Implement a class

wordItemthat extends themyItemADT of Section 6.2 to include adigitmethod as described in the text (and the constantsbitsword,bitsbyte,bytesword, andR), for 64-bit keys and 8-bit bytes.10.4 Implement a class

bitsItemthat extends themyItemADT of Section 6.2 to include abitmethod as described in the text (and the constantsbitsword,bitsbyte,bytesword,andR), for 10-bit keys and 1-bit bytes.10.5 Implement a comparison method

lessusing thedigitabstraction (so that, for example, we could run empirical studies comparing the algorithms in Chapters 6 and 9 with the methods in this chapter, using the same data).10.6 Design and carry out an experiment to compare the cost of extracting digits using bit-shifting and arithmetic operations on your machine. How many digits can you extract per second, using each of the two methods? Note: Be wary; your compiler might convert arithmetic operations to bit-shifting ones, or vice versa!

10.7 Write a program that, given a set of N random decimal numbers (R = 10) uniformly distributed between 0 and 1, will compute the number of digit comparisons necessary to sort them, in the sense illustrated in Figure. Run your program for N = 10

^{3}, 10^{4}, 10^{5}, and 10^{6}.10.8 Answer Exercise 10.7 for R = 2, using random 32-bit quantities.

10.9 Answer Exercise 10.7 for the case where the numbers are distributed according to a Gaussian distribution.

- Comment