# Functional Correctness

### Functional Correctness

Some constructs in a design can be analyzed just by examining the code locally without knowing how it is connected to the rest of the circuit, whereas others require a more global scope. For example, the assignment in the sample in the previous section reveals a potential error. But an error of a loop consisting of combinational gates cannot be detected just by looking at a gate in the loop; the error can only be discovered by examining how the gates are connected (in other words, the structure of the circuits must be checked). Therefore, we classify checks into two categories: syntactical and structural. The first category can be analyzed locally and the second must be done globally.

#### Syntactical Checks

Because design codes are examined not just for errors but also for potential errors or undesirable coding styles, which may vary from project to project, instead of having a set of fixed rules, each project has its own set of design guidelines. It is not possible nor is it fruitful to enumerate all possible rules. Instead, let's study some common rules that all projects should enforce.

##### Operands of unequal width

All operands of an operator must be of equal width. Although most language such as Verilog automatically pad the shorter operands with zeros to make them of equal length, it's a good practice to have the designer make this explicit to avoid careless mistakes. An example of equal and unequal operand widths is shown here. Two operators are presented. The first one is a bitwise AND and the second is the assignment. The first assign is correct because the operands X, Y, and Z are all declared to be 32 bits wide. The second assign is also correct, because the operands X[10:2], Y[8:0], and Z[11:3] are all 9 bits wide, where X[10:2] means the part of bus X from bit 2 to bit 10. The third assign violates our equal-width guideline, because Z is 32 bits wide but the two operands on the right side are 9 bits and 8 bits wide respectively. Finally, in the last assignment, zeros are padded to an operand to make the operands equal width, where {} is a concatenation operator that combines the 1-bit constant 1'b0 with the 8-bit signal Y[8:1] to form a 9-bit signal:

```reg [31:0] X;
reg [31:0] Y;
reg [31:0] Z;

Z = X & Y; //all operands have equal width
Z[11:3] = X[10:2] & Y[8:0]; // all operands have equal width
Z = X[9:1] & Y[8:1]; // error: unequal operand width
Z[8:0] = X[9:1] & {1'b0,Y[8:1]}; // pad with zeros for equal width
```

While operands should have equal width, the width of the variable holding the operation result must be able to accommodate the result. In a 64-bit adder, the width of the sum is 65 bits, with the MSB holding carry-out. Similarly, in a 64-bit multiplier, the width of the product is 128 bits.

##### Implicitly embedded sequential state

Implicitly embedded sequential states result from a specification that has memory but does not explicitly create states. One such case is incompletely specified conditional statements. Conditional statements such as if-then-else and case statements need to have all cases specified. Failure to do so may result in unexpected circuit behavior. Consider the following description:

```case(X) // a 2-bit variable
2'b00: Q = 1'b1;
2'b11: Q = 1'b0;
endcase
```

In this case statement, the value of the 2-bit variable X is compared first with value 2'b00 (a 2-bit value of 00). If it matches, variable Q is assigned a value of 1. Next, X is compared with value 2'b11. If it matches, Q takes on value 0. Here, only two of four possible values are specified. Thus, if X takes on value 01 or 10, Q retains the old valuethat is, this code segment has memory. Thus the code, although it has the appearance of a multiplexor, is of a sequential nature because of the missing two cases (01 and 10). This implied sequential behavior should be made explicit if it is intended to be so; otherwise, the implied sequential behavior should be removed by completing the cases, as shown here:

```case(X) // a 2-bit variable
2'b00: Q = 1'b1;
2'b11: Q = 1'b0;
default: Q = 1'b0;
endcase
```

Similar situations apply to if-then-else statements. This implied sequential behavior is also known as the inferred latch phenomenon, because a latch or sequential element will be needed to preserve the value of Q in the absence of the default statement. To make this inferred latch phenomenon more concrete, let us construct a circuit implementing the previous description. An implementation is shown in Figure, where the flip-flop (FF) is required to preserve the state. Here we assume the case statement resides inside an always block with clock clk.

##### 1. An FF is required to preserve state in an incomplete case statement.

To avoid inferred latches, one can either fully specify all possible values of the conditioning variable or use a directive to instruct the synthesis tool not to generate inferred latches and to treat the unspecified values as don't-cares. This directive is a special comment that the synthesis tool understands and is often termed full_case, meaning that the case statement should be treated as if it were fully specified; if the conditioning variable takes on the unspecified cases, the output variable can take on any value (in other words, don't-cares for the synthesis tool). Note that because this directive appears as a comment, it has no effect on the simulator. If a full-case directive were used in Figure, then it would mean that when X is either 01 or 10, Q can be any value. In other words, with these unspecified cases taking on don't-care values implicitly, the conditional statement is now fully specifiedit has all (full) cases. When a synthesis tool sees a full_case directive, it can assign any value to the unspecified cases, as opposed to retaining the current value, and thus no latches are required to retain the current value. When a don't-care is encountered, a synthesis tool uses it for optimization (for example, producing a smaller circuit). The following example illustrates the use of don't-cares in synthesis.

Consider the following code segment and two versions of its implementation, one using the unspecified cases for the purpose of minimizing gate count and the other not:

```always @(posedge clock) begin
case (S):
3'b000: Q = a;
3'b011: Q = b;
3'b101: Q = a;
3'b111: Q = b;
endcase

assign F = ( (S == 3'b100) | (S == 3'b001) ) ? (a | b) : Q;
```

An implementation is shown in Figure, where the FF and the multiplexor implement the incomplete case statement and other gates, the assign statement.

##### 2. A circuit with an inferred FF implementing the HDL specification with incomplete cases

When we place a full_case directive next to the case statement:

```case (S): // synthesis full case
```

Then the synthesis tool recognizes the don't-care values of Q in the unspecified cases and uses them to optimize gate count. One optimization technique is to choose Q to be (a | b) when S is either 001 or 100. So, Q' and F', the resulting Q and F, become

```case (S): // synthesis full case
3'b000: Q' = a;
3'b011: Q' = b;
3'b101: Q' = a;
3'b111: Q' = b;
3'b001: Q' = a | b;
3'b100: Q' = a | b;
3'b010: 0;
3'b110: 0;
endcase
```

Note that Q' can be written as (S==3'b100)|(S==3'b001))?(a|b):Q, which is just F. Therefore, assign

```F = Q';
```

With this choice of don't-care, the new description yields the much simpler circuit shown in Figure.

##### 3. An optimized circuit making use of unspecified cases

Although using a full_case directive has the advantage of giving a synthesis tool more freedom for optimization, it has the side effect of creating two different versions from the same design description. One is the simulation version, for which the simulator assigns the current value to the output if the unspecified case is encountered. The second is the synthesized circuit, for which the output takes on whatever value the synthesis tool deems optimal. The danger of having a simulation version that is different from the synthesis version is obvious: You are not verifying what you are designing.

Another common source of creating an implied sequential state is a variable read in a block that is not on the block's sensitivity list, and none of the variables on the sensitivity list is a clock. (If one variable on the sensitivity list is a clock, the block models a latch, and is less likely a user error.) This type of description can result from a user who wants to model a combinational block but inadvertently leaves out a variable on the sensitivity list. To see why an incomplete sensitivity list gives rises to a sequential circuit, consider the following code. A change in X does not cause variable Y to be updated until there is a transition on Z. Meaning, before the transition of Z, Y still holds the old value of X; therefore, the block exhibits memory and thus is sequential:

```always @(Z)
Y = X & Z;
```

A warning needs to be issued if such a situation occurs. However, if a warning is issued for every such occurrence, there would be many warnings for intended FFs and other sequential elements. To avoid flooding of warnings, intended sequential elements should always be instantiated from library cells for which such check should be skipped.

##### Overlapping conditional cases

Another anomaly of case statements results from the order the condition variable compares against the case items. In the following case statement, condition variable S compares its value against the case items from top to bottomnamely, first 1?? and then ??1, where ? is a wild card that matches either 1 or 0. When a match is found, the corresponding statement is executed and the remaining case items are skipped. Therefore, when S is 101, S matches 1??, and therefore Q is assigned 0 even though 101 also matches the second case item ??1:

```Casex (S)
3'b1??: Q = 1'b0;
3'b??1: Q = 1'b1;
endcase
```

This first matched/first assigned nature implies a priority encoder in the case statement (in other words, the earlier case items have higher priorities). An N-bit encoder takes in 2N bits, of which only 1 bit is active (for example, 1), and it produces a binary code for the bit (for example, 101 is the output if the fifth bit is 1). A priority encoder can accept more than one active bit, but it only produces the code for the active bit with the highest priority.

If this case statement is given to a synthesis tool, a priority encoder will be produced. A portion of the circuit is shown in Figure. (This case statement is incomplete; therefore, a sequential element is needed to model completely the specification. For clarity, we only show the portion involving the priority encoder.) As dictated by the description, if the MSB of S, S [2], is 1, Q is assigned 0 regardless of the value of its least significant bit (LSB). In this case, the priority encoder gives 1, which selects 0 in the multiplexor. Having a priority encoder for a case statement may not be what the designer had in mind. If S[0] is 1 and S[2] is 0, then the priority encoder gives 0, which makes the multiplexor produce a 1. Note that if we can guarantee that only one case item will match the case variable, then a simple encoder can be used.

##### 4. A priority encoder is included to implement a case statement.

Therefore, if the designer is certain that variable S can never have a value straddling the two ranges of the case items, then she can instruct a synthesis tool not to use a priority encoder. Having only one match also means that comparisons with case items can be done in parallel. To relay this information to a synthesis tool, the designer can place a parallel-case directive (for example, //synthesis parallel case) next to the case statement.

Although the parallel-case directive rescues designers from having an implied priority encoder, there are side effects. First, the directive affects only synthesis tools, not simulators. Thus, two models exist: A simulator sees a prioritized case statement and a synthesis tool interprets a parallel case statement. Again, the model being verified is not what is being designeda dangerous double standard. What would happen if a parallel-case pragma is specified anyway, even though the case items overlap? The synthesized circuit may vary from one synthesis tool to another. An intelligent synthesis tool will use a priority encoder to resolve the conflict; others may just follow the pragma to use an encoder. Therefore, to ensure that the synthesis model and simulation model are consistent, case items should always be mutually exclusive.

##### Connection rules

There are ways to connect and access components, but explicit ones are preferred, thus reducing the chances of inadvertent mistakes. In Verilog there are two ways to connect ports during module instantiation. The implicit way connects instantiated ports with declared ports in a one-to-one correspondence in order. For example, in Figure, module Gate declares ports in order Q1, Q2, and Q3, and instance M1 declares ports in order P1, P2, and P3. Thus, connection by order connects P1 with Q1, P2 with Q2, and P3 with Q3.

##### 5. Port connection by order

Explicit connection connects instantiated ports and declared ports by name. Continuing the previous example, instance M2 has the same port connection as M1, even though its port order is different:

```Gate M2 (.Q2(P2), .Q3(P3), .Q1(P1)); // explicit port connection
```

Instance FF in Figure uses explicit connection, and the block diagram shows the actual connections. Explicit connection is less error prone and is preferred over connection by order.

Furthermore, no expression is allowed during port connection. For example, instead of having

```Gate M2 (.Q2(P2&R), .Q3(P3), .Q1(P1));
```

where input port Q2 is fed with express P2&R. One should create a variable that computes the expression and then gets connected to port Q2, as in

```assign V = P2 & R;
Gate M2 (.Q2(V), .Q3(P3), .Q1(P1));
```

The rationale is to separate logic computation and port connection for a better structured design.

Besides going through a port, the so-called hierarchical path is another way to access a signal. In a design there are usually several levels of hierarchies (or levels of module instantiation). For example, in a central processing unit (CPU) design, the top module is a CPU that contains several modules, such as a floating point unit (FPU), and a memory management unit (MMU). In turn, the FPU may contain several modules, such as an Arithmetic Logic Unit (ALU) and others. One can access net N inside the ALU from anywhere in the design through a hierarchical path, without going through the sequence of ports from the CPU to the FPU to the ALU. To write the value of N to another net M, one simply uses

```assign M = CPU.FPU.ALU.N; // access by hierarchical path
```

In a project, two types of HDL code exist. One type belongs to the design (the code that constitutes the chip) and the other belongs to the test bench (the code that forms the testing structure). It's strongly recommended that hierarchical accesses exist only in the test bench. In other words, the design should contain no hierarchical paths. The rationale is that access in hardware is always through ports; therefore, the design description should reflect this to be consistent. However, a test bench is used only for testing purpose. No real hardware will be fabricated from it. Therefore, it is reasonable to use hierarchical access.

Finally, it's a good practice to require that the top-level module have only module instantiations. That is, no logic circuitry. The reason for this is that having logic at the top level is indicative of poor design partitioningnamely, the logic not properly assigned to any submodules. Furthermore, the partition of modules at the RTL should correspond to the physical partition. This simplifies the equivalence checks between the RTL and the physical design.

##### Preferred design constructs

Some legal constructs are discouraged for several good reasons. Because it is not possible to enumerate all of them (some are project specific), let's take a look at the common ones. Loop constructs such as FOR/WHILE/REPEAT are discouraged for design code (test bench code has more relaxed rules). The reason for this is that the loop structure is more of a software entity than a hardware entity. Using them in the RTL creates a less direct correspondence between the RTL and the real hardware, making verification more complex. Sometimes loop constructs are used to create regular structures. An example is a memory array, which consists of a large, repetitive regular structure. A macro should be used instead, because directly coding a large, repetitive structure is error prone, whereas automatic macro generation gives the user an easy way to inspect for correctness. Macros can be written in Perl, m4, or C and, when called during preprocessing, will write out a Verilog description of the code.

When coding at the RTL, lower level constructs such as gate and transistor primitives should be avoided, some of which are AND, NOR, NOT, PMOS, NMOS, CMOS, trAN, trIREG, PULLUP, and PULLDOWN. The first three are logic gates, the next five are transistors, and the last two are resistive PULLUPs and PULLDOWNs. Furthermore, Verilog has a syntax for signal strength that is used to resolve conflict when a net is driven to opposite values at the same time. The signal with the greater strength wins. Strength constructs belong to transistor-level design and should be avoided in RTL code.

In addition, some constructs are mainly for test benches, for which no hardware counterpart is produced and thus have a coding style that can be more software oriented than hardware oriented. Examples of such Verilog constructs are force/release, fork/join, event, and wait.

Finally, user-defined primitives (UDPs), especially sequential ones, should be avoided as much as possible, because their functionality is not immediately obvious and they often hide unexpected side effects. The workaround is to replace UDPs with small circuits made of standard library cells.

#### Structural Checks

Structural errors and warnings result from connections between components. To detect them, code from multiple designers may be required for examination, and they may be much less obvious to individual designers.

##### Loop structure

A loop structure is a loop made of circuit components with outputs that feed to the inputs of the next component in the loop. An example is shown in Figure. A loop structure is detrimental if, at a moment in time, the signal can travel along the loop uninhibitedthat is, not blocked by an FF or a latch. A case in point is a loop made of only combinational gates. Another situation is a loop made of latches that can all become transparent at the same time. The latch loop in Figure can become transparent if clock1 is high and clock2 is low.

##### 6. A circuit with a combinational loop and a latch loop

A combinational loop can be detected by following combinational gates from input to output. When you arrive at the same net without going through a sequential device, you have found such a loop. A latch loop is more difficult to discover because, besides finding a loop of latches and combinational gates, you need to determine whether all latches in the loop can become transparent at the same time, and this decision is computationally expensive. In practice, the designer assists the checker by telling her about the phases of latch clocks. Even with hints, the checker may still make pessimistic assumptions and may have the designer validate or invalidate the latch loops.

##### Bus operation

A bus is a communication channel among devices that have the capability to relinquish the bus by driving its outputs to high impedance Z. Thus, all gates connected to a bus must be tristate devices. In theory, a bus should be in high impedance state when it is not driven. In practice, some designs pull up (or down) a floating bus with a weak resistor to avoid being a potentially problematic dangling net. The strength of the pull-up or pull-down resistor should be chosen to be just right to not have the bus stuck at one logic level. Given this complication, warnings should be issued in this situation.

Furthermore, drivers to a bus should be checked for mutual exclusion of their enable signals (in other words, at most, one driver of the bus can be active) at any time. A conclusive check is computationally expensive and belongs to the domain of formal verification. A compromise is to check mutual exclusion during simulation runs. Then, of course, the result is data dependent and indicates only partial validation. It is also desirable to ensure that when a bus receiver is enabled, a driver is already driving, or when a driver is active, at least one receiver is listening. Again, these checks should be done with formal verification tools or during simulation runs.

##### FF and latch configuration

Sequential elements are latches and FFs. How they are connected requires particular attention. Active high latches should not drive active high latches, and positive-edge trigger FFs should not drive active high latches, because the driven latches simply pass whatever the driving latches/FFs store, behaving just like buffers. This kind of configuration should invoke warnings, but not errors, because the driven latches could probably be used for other purposes than storage, such as for scanning. Similarly, an active low latch should not be driven by another active low latch or negative-edge trigger FF. Therefore, phases of sequential elements should alternate along any path.

Checking for sequential element configuration becomes complicated when there are multiple clock domainsnamely, parts of the design are driven by different clocks. Then the simple phase-alternating rule would generalize to check whether a driven sequential element retains its storage functionality. Again, it is not possible to be conclusive about whether a violation is necessarily an error; hence, a warning should be issued.

##### Sequential element reset

All sequential elements in control logic should be able to be reset or be able to be driven to a known state. The reason is that when the design is first powered on, the control logic should be in a known state. However, such a requirement is not necessary for the sequential elements in data path logic, which, if designed properly, is supposedly drivable to a known state.

## Rules to Check

1. All operands of an operator have equal length.

2. Conditional statements have complete and nonoverlapping cases.

3. If full_case or parallel_case directives are found, warn about the possible mismatching simulation and synthesis models.

4. Warn if an always block has an incomplete sensitivity list.

5. The explicit connection rule is obeyed and the use of hierarchical paths is forbidden in a design.

6. Certain constructs, although legal, will be warned. A list of such constructs consists of transistors, gate primitives, strength, delays, events, time, UDPs, and test bench constructs (for example, force/release, loops, deassign, fork/join).

7. Memory arrays should be derived from a macro library.

8. An error is issued when a combinational or a simultaneously opened latch loop is detected.

9. Issue a warning when a bus with a pull-up or pull-down resistor is detected.

10. Use runtime checks to discover mutual exclusion of bus drivers, and coordination of bus drivers and receivers.

11. Only alternating types of FF and latches should be cascaded.

12. All control logic can be reset.