Simulator Operations and Applications

This section studies typical features in a simulator and where these features are applicable. This section is not meant to be a substitute for simulator manuals, but rather it serves to introduce the concepts and commands that are available in a simulator. It is not feasible to cover all simulator commands, because of their enormous number and variation over simulators and time. However, simulator commands are just an embodiment of more fundamental concepts. It is these concepts that are precisely the focus of our study in this section. Therefore, the commands in this section are pseudocommands. Besides explaining functions in command categories, we will also discuss how commands are applied.

The Basic Simulation File Structure

Every simulator has a directory structure for input, output, and command files. The input directory, which usually has subdirectories, holds HDL design files, include files, library files, Makefiles, compiled images, and sometimes C/C++ files for PLIs. The HDL design directory often has subdirectories corresponding to the functional units of the design. Within a functional unit subdirectory are design files containing RTL code, along with macros, parameters, and constant definitions, which reside in include files. The library file contains cell definitions, such as FFs and encoders. A cell is defined to be a module at the lowest level of the design hierarchy; it contains no module instantiations. Makefiles perform various tasks, such as compiling C/C++ code and linking object code to produce executables, expanding macros to generate HDL files, and compiling the design for simulation. Compiled images are files produced by the simulator's compiler and are input to the simulator. The output directory, possibly having a couple of layers of subdirectories, contains log files generated from simulation runs, signal tracing files, error logs, and others. The command directory has script files for simulation compilation, simulation runs, debugging options, and others. An example simulation directory organization is shown in Figure 22.

22. Simulation directory structure

To guide a simulator or compiler in searching for a file or a directory, the information is passed through runtime options. For example, to specify an input file or to designate an output file, a full path to the file is given on the command line, or the directory holding the files is passed as an option argument and the simulator or compiler searches that directory for the files. If directories are passed, the compiler searches for files first in the current working directory and then in the specified directories. A typical command line for compilation may look like

compile -f filelist -y srcDirectory +option1 +define+MyVar=1 -output=logFile -o sim

where filelist contains paths to the HDL design files and include directories. The following is an example:
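
An illustrative filelist follows. The +incdir+ syntax and the design file names are hypothetical, as filelist syntax varies across simulators; only the two include directories are taken from the discussion below:

+incdir+/home/design/include
+incdir+/home/design/macro/
/home/design/src/alu.v
/home/design/src/decoder.v
/home/design/src/top.v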


The first two lines specify the include directories to be /home/design/include and /home/design/macro/ so that when an include file is encountered during compilation, these two directories will be searched. The remaining files are HDL design files. The argument after -y is the directory for library cells. The next item, +option1, can be any option known to the compiler. The next item, +define+MyVar=1, sets the value of the compile time variable MyVar to 1, so that whenever MyVar is encountered in the source files during compilation, it is replaced by 1. The next item designates logFile to be the output file. Finally, the last item specifies the name of the compiled image to be sim. After compilation, the simulator can be invoked using a command such as

simulate -image sim +option2

where the simulator loads the compiled image file sim and takes in runtime option option2.

Performance and Debugging

In this section we discuss simulator options for enhancing performance and debugability of the circuit. Options for performance and for debugability have opposite effects on simulation: High performance means less debugability and vice versa. This is because to increase simulation speed, the circuit representation often needs to be restructured. For instance, buffers and inverters are combined with other gates and are eliminated, bus bits are aggregated, redundant logic is pruned, and blocks with the same sensitivity are merged. Consequently, the eliminated nodes are not observable, and the resulting structure is not easily recognizable to the user, making the circuit more difficult to debug.

Most simulators have several levels of performance optimization. We assume the highest level means the highest performance and hence the lowest degree of debugability. Debugability usually refers to how the user may inquire about or manipulate circuit nodes or variables during simulation runtime or through user PLI tasks. Different modules in a design can be tailored to have different levels of optimization so that well-tested modules can be optimized to the greatest extent. Each level of optimization imposes its own debugging restrictions. An example guideline follows. At the highest level, nodes or variables can only be read. At the next level, values of nodes and variables can be modified, and delays of gates can be altered. Changing a node value can be done, for example, by using Verilog's force construct or PLI's tf_put, or by assigning a new value during an interactive simulation session. At the lowest level, all performance optimizations are disabled, and everything is readable, writable, and traceable. Traceable means that the circuit structure can be traversed through PLI routines (for example, inquiring about the fanouts or fanins of a node through PLI's acc_next_driver or VPI's vpi_iterate). Obviously, to enable traceability, the simulator must maintain a mechanism to support the PLI or VPI routines, which slows down performance. If a node or a variable is accessed in a way not permitted by the optimization option (for example, if it is assigned a new value while the highest performance option is specified), an error will result. An example compile command with tailored optimization options is as follows:

compile -f filelist +optimize+level2+file=ALU.v +optimize+level1

where the first optimization option specifies that file ALU.v be optimized at level 2 and the rest optimized at level 1.

To debug a circuit, viewing signal waveforms is a necessity. A common practice is to dump out signal traces during a simulation run and view them later with a waveform viewer. Using this method, the user can debug off-line and free up the simulator for others. Unless all resources have been exhausted, it is inefficient to dump out all signal traces in the design, especially when the design is large. Instead, only a portion of the design is selected for dumping, and this selection can be made by the user at compilation or simulation time. To implement selective dumping at compilation, Verilog's `ifdef guards a dumping code segment that can be activated to dump signals in a functional unit. When the variable of the `ifdef is defined, the dumping code is activated. If the code is not activated, signals from the unit are not dumped. For example, to create selective dumping for functional unit ALU, the following code is used:

`ifdef DEBUG_ALU
   $dumpvars(0, alu);
`endif

System task $dumpvars dumps out all node values inside module alu in value change dump (VCD) format. The first argument, 0, means that all levels of hierarchy inside alu are dumped. To activate this task at compile time, the following command is used, which defines variable DEBUG_ALU:

compile -f filelist +define+DEBUG_ALU ...

Because variable DEBUG_ALU is defined, the code $dumpvars(0, alu) is compiled with the rest of the circuit, and dumping is activated. Dumping can also be activated at simulation runtime, via a plusarg (short for plus argument). Change the previous `ifdef to the following if statement:

if ($value$plusargs("debug_alu+%d", debug_alu) && debug_alu == 1)
   $dumpvars(0, alu);

where system task $value$plusargs reads the value following +debug_alu+ on the simulator command line into variable debug_alu (declared elsewhere). If that value is 1, the dump statement is executed. To invoke this dumping at runtime, the simulator is invoked with the plus argument +debug_alu+1:

simulate -image sim +debug_alu+1

The plusarg +debug_alu+1 sets the value of the argument to 1, and hence turns on dumping of alu.

The differences between compilation time and simulation time selection are the size of the compiled image and the ability to decide what to dump based on actual simulation results. If dumping code is implemented as a compilation time option, the decision to dump (or not) must be made at compilation time. Once compiled, it cannot be changed without recompilation. The advantage is that, if dumping is not selected, the resulting compiled image is smaller. On the other hand, if dumping is implemented as a simulation time option, what to dump can be decided when a bug shows up, without recompiling. The disadvantage is that the dumping code is always compiled in, even when dumping is not selected.

The following table summarizes the effects of simulator options on compilation and simulation speed, as well as on debugging capability.

Effects of Simulator Options on Compilation and Simulation Speed and Debugability

- Enable read, write, and connectivity trace: slows down compilation and simulation but increases debugging capability.

- Enable two-state simulation: speeds up both compilation and simulation but decreases debugability.

- Disable timing checks: speeds up simulation but decreases debugability.

- Use a zero-delay or a unit-delay model: speeds up both compilation and simulation but decreases debugability.

- Perform structural optimization (combine bits, eliminate buffers): slows down compilation, speeds up simulation, and decreases debugability.

- Enable interactive simulation: slows down compilation and simulation but increases debugability.

Timing Verification

To verify timing properties, a delay model for the circuit must first be chosen. One delay model is the zero-delay model, in which all gates and blocks, whether their delays are specified explicitly or not, are assumed to have zero delay. This delay model does not reveal much about the circuit's timing properties and thus is used mainly for functional verification. A zero-delay model produces the fastest simulation speed of the delay models. Another model is the unit-delay model, in which all gates and blocks have a delay of one unit, and all specified delays are converted to unit delays. This delay model is not realistic, but it is a reasonable compromise between a realistic but slow delay model and the zero-delay model. Its main application is in detecting hazards. Finally, a full-delay model allows each gate, block, and interconnect to have its own delay. The delay values are usually extracted from the layout of the design and are back-annotated to the design for timing simulation. The full-delay model has the most accurate timing information, but it runs the slowest. It is used for timing closure verification after functional verification is completed.
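
As a sketch of why the unit-delay model helps detect hazards, consider the circuit y = a AND (NOT a), which is constantly 0 under a zero-delay model. The Python sketch below is an illustrative mini-simulation (one time unit of delay per gate, integer sample times) showing the transient glitch a unit-delay model exposes:

```python
# Sketch: why a unit-delay model can expose hazards that a zero-delay
# model hides. Circuit: y = a AND (NOT a). Signals are sampled at
# integer times; each gate adds one unit of delay.

def simulate_unit_delay(a_waveform):
    """a_waveform[t] is the value of input a at time t.
    Returns the waveforms of na = NOT a and y = a AND na."""
    T = len(a_waveform)
    na = [1 - a_waveform[0]] * T
    y = [a_waveform[0] & na[0]] * T
    for t in range(1, T):
        na[t] = 1 - a_waveform[t - 1]          # inverter: one unit late
        y[t] = a_waveform[t - 1] & na[t - 1]   # AND gate: one unit late
    return na, y

a = [0, 1, 1, 1, 1]           # a rises at time 1
na, y = simulate_unit_delay(a)
print(y)                      # y pulses to 1 for one time unit: a hazard
```

Under a zero-delay model, y would evaluate to a AND (NOT a) = 0 at every time step, and the hazard would go undetected.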

To build a full-delay model, delay information on gates and interconnects is computed based on the cell library and RC extractions from the design's layout. The delay numbers used in timing simulation are interconnect delays and gate propagation delays.

Interconnect delays are calculated from the interconnect's physical dimensions and the resistive and capacitive parameters of the IC fabrication process. Gate delay is determined by three variables: the input transition speed, the delay equation of the gate, and the output capacitive load. A steeper input transition produces a smaller gate delay; a larger capacitive load causes a larger gate delay. The delay equation of a gate takes in an input transition speed and an output load, and produces the gate's delay and its output transition speed. A gate's delay equation is obtained by characterizing the gate with a transistor-level simulator, such as SPICE. The characterization process simulates and measures the gate's propagation delays and output transition speeds over a range of input transition speeds and output capacitive loads. The measurements are then fitted to a set of equations.

To calculate a gate's delay in a layout, the gate's input and output capacitances are first extracted from the layout. Next, the input transition speed is obtained by computing the output transition speed of the driver loaded by the gate's input capacitance, using the driver's delay equation. With this input speed and the gate's output capacitance, the gate's propagation delay is calculated using the gate's delay equation. This iterative process is captured in Figure 23.

23. Calculating gate propagation delay from a delay equation
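
The two-step calculation can be sketched as follows. The linear form of the delay equation and all coefficient and capacitance values are illustrative assumptions, not taken from any real cell library:

```python
# Sketch of the iterative gate-delay calculation: the driver's delay
# equation gives the transition speed at the gate's input, and the
# gate's delay equation then gives the propagation delay.

def make_delay_equation(k0, k1, k2):
    """Return a delay equation mapping (input transition time, output
    capacitive load) to (propagation delay, output transition time).
    The linear form is an illustrative assumption."""
    def equation(t_transition_in, c_load):
        delay = k0 + k1 * t_transition_in + k2 * c_load
        t_transition_out = 0.5 * t_transition_in + 2.0 * c_load
        return delay, t_transition_out
    return equation

# Delay equations obtained by characterizing the driver and the gate.
driver_eq = make_delay_equation(0.1, 0.2, 1.5)
gate_eq = make_delay_equation(0.2, 0.3, 2.0)

# Capacitances extracted from the layout (illustrative values).
gate_input_cap = 0.05
gate_output_cap = 0.10

# Step 1: the driver's output transition, loaded by the gate's input
# capacitance, becomes the gate's input transition time.
_, t_in = driver_eq(0.2, gate_input_cap)

# Step 2: with this input transition and the gate's output load,
# compute the gate's propagation delay.
gate_delay, _ = gate_eq(t_in, gate_output_cap)
print(round(gate_delay, 3))
```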

The calculated delay values, gate and interconnect, are then stored in standard delay file (SDF) format. The exact format can be found in the OVI Standard Delay File (SDF) Format Manual. These delays are then written and annotated to the gate or block models using Verilog's #delay construct or the specify/endspecify construct.

A delay model can be selected as an option in the command line or as a compiler directive. When both are present, the former takes precedence over the latter. The exact syntax for delay model selection is not an IEEE standard, and thus is simulator specific. An example of delay model selection is as follows:

compile -f filelist -unit_delay_model // command-line option selecting unit-delay model

or as the compiler directive `use_unit_delay_model inside the HDL code. Both the keyword unit_delay_model and the directive use_unit_delay_model are understood by the compiler to select the unit-delay model.

So far, we have assumed a single value for each gate delay. As part of the IEEE standard, a delay can have three possible values: minimum, typical, and maximum. A gate with a delay triplet is declared as follows, in which its minimum is 1; typical, 1.2; and maximum, 1.5:

buffer #(1:1.2:1.5) gate1(...);

Which delay value is to be used in simulation can be selected by passing a simulator-specific option to the compiler, such as

compile -f filelist -typical_delay // command option selecting typical delay among minimum, typical, and maximum delays

Once a delay model is selected, a simulator can be configured to verify various timing properties. Some common properties are timing checks, race checks, and narrow pulse checks. There are eight IEEE standard built-in timing checks in Verilog: $setup, $hold, $setuphold, $width, $period, $skew, $nochange, and $recovery (based on IEEE 1364-1995). These timing checks perform three tasks: (1) determine the elapsed time between two events, (2) compare the elapsed time with a specified limit, and (3) report a timing error if the limit is violated. For example, $setup(data_in, posedge clock, 1) compares the time elapsed between a transition of signal data_in and a rising edge of clock. If the elapsed time is less than one unit of time, a violation is reported. The same applies to $hold and $setuphold. $width checks for pulses with a width narrower than a specified limit (glitch detection). $period flags an error if the period of the signal is smaller than a specified limit. $skew issues an error if the time interval between two transitions (the skew) is greater than a specified limit. Finally, $recovery checks the recovery time of a changed signal, whereas $nochange checks for a steady-state value of a signal within a time interval. For instance, $nochange(posedge clock, data, 0, 0) issues an error if data changes while clock is rising. For a more detailed description of these checks, please refer to IEEE 1364-1995 or a later version. A simulator can be configured to perform timing checks on selected modules. For example, the following command passes in a file, timing_file, that specifies which modules should be skipped for timing checks or which block delays should be replaced with zero delays:

compile -f filelist -timing timing_file

A typical format for timing_file is

<module path> <timing specification>

an example of which is

top_module.* no_timing_checks,

meaning all submodules under top_module should be skipped for timing checks.
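
The three tasks that a timing check such as $setup performs can be sketched in a few lines of Python. The event times and the message text here are illustrative:

```python
# Sketch of the three tasks of a timing check: (1) determine the
# elapsed time between two events, (2) compare it with a specified
# limit, and (3) report a timing error if the limit is violated.
# Modeled on $setup(data_in, posedge clock, limit).

def setup_check(data_change_time, clock_edge_time, limit):
    """Return None if the check passes, else a violation message."""
    elapsed = clock_edge_time - data_change_time          # task (1)
    if elapsed < limit:                                   # task (2)
        return (f"setup violation: data settled {elapsed} before "
                f"the clock edge; limit is {limit}")      # task (3)
    return None

# data_in changes at t=9, clock rises at t=10, limit 1: passes exactly.
assert setup_check(9, 10, 1) is None
# data_in changes at t=9.5: elapsed 0.5 < 1, so a violation is flagged.
print(setup_check(9.5, 10, 1))
```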

In a real circuit, every transition has a nonzero rise or fall time, and consequently it is possible that the finite rise and fall times shape a narrow pulse such that it does not have enough energy to propagate through a gate, as seen in Figure 24. This phenomenon is called narrow pulse filtering.

24. Effect of nonzero rise and fall times on narrow pulses. (A) A narrow pulse is filtered out. (B) Two closely spaced transitions fail to propagate the glitch.

RTL simulators combine rise and fall times with gate propagation delay, and use the combined delay as the overall gate delay. Effectively, all transitions have zero rise and fall times. Most simulators have a mechanism to detect narrow pulses. First, let us define some terms. The gate delay measured from an input transition to an output transition is called the transport delay. The minimum width a pulse must have to propagate to an output is called the inertial delay. The narrow pulse filtering effect is modeled by the inertial delay. A common practice is to automatically filter out pulses of a width less than or equal to the delay of the gate (transport delay = inertial delay). To override this, the user can pass, at compile time, options specifying a limit on the minimum pulse width in terms of a percentage of the gate delay. Furthermore, this option can be applied to selected modules or paths. An example command follows:

compile -f filelist -pulse_width_limit=50 -pathpulse ...

where -pulse_width_limit=50 sets the minimum pulse width to be 50% of the gate delay, and -pathpulse enables module path-specific pulse control rules. Module path-specific pulse control rules specify pulse widths for paths inside a module. The rules are embedded in RTL code within specparam with a keyword such as PATHPULSE$ = 5, meaning the minimum pulse width for the module is five units of time.
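
The inertial filtering rule can be sketched as follows. The waveform representation and the cancellation rule are simplifications for illustration; a real simulator schedules and cancels pending output events instead:

```python
# Sketch of inertial-delay pulse filtering: pulses no wider than the
# gate's inertial delay are removed from a waveform. A waveform is a
# list of (time, value) transitions.

def inertial_filter(transitions, inertial_delay):
    """Drop any pulse whose width is <= inertial_delay."""
    out = []
    for t, v in transitions:
        # A transition that reverses the previous one within the
        # inertial window cancels it: the pulse is too narrow, so
        # neither edge survives.
        if out and t - out[-1][0] <= inertial_delay:
            out.pop()
        else:
            out.append((t, v))
    return out

# A 2-unit-wide pulse (edges at 10 and 12) against inertial delay 3
# is filtered out; a later 5-unit-wide pulse survives.
events = [(10, 1), (12, 0), (20, 1), (25, 0)]
print(inertial_filter(events, 3))
```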

When a pulse violates a pulse control restriction (for example, a pulse width is narrower than the inertial delay), the output of the gate becomes unknown. When this situation occurs, the time at which the output becomes unknown can be determined using two methods. The first method, called alert on output transition, sets the time of unknown output to be the time when the first edge of the input pulse appears at the output. The rationale is that this is the time the output recognizes the input pulse. The second method, called alert on detection, sets the time of unknown output to be the moment the input pulse is determined to be in violation. The rationale here is that this is the time the violation occurs. Most simulators allow the user to choose either method of reporting. Figure 25 illustrates the two reporting methods. The pulse at input in of the inverter has a width of 2, whereas the gate has an inertial delay of 3; therefore, this pulse will flag an error. The transport delay of the gate is 3. View A in Figure 25 illustrates the rule of the first method. It produces an unknown output (shaded region) when the first transition of the input pulse has propagated to the output, which happens at time 5. The unknown value lasts until the second transition gets to the output, which occurs at time 7. View B illustrates the rule of the second method. It sends the output to unknown once the pulse is detected to be narrow. The detection time is when the second edge of the pulse arrives at the input, which is at time 4. This unknown value persists until the second edge of the pulse has reached the output at time 7.

25. Two different alert systems: (A) on first output transition and (B) on pulse violation detection
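
The two unknown-output intervals follow directly from the pulse edge times and the transport delay. This sketch assumes input pulse edges at times 2 and 4 (a width of 2) and a transport delay of 3, consistent with output transitions at times 5 and 7:

```python
# Sketch of the two reporting methods for a narrow-pulse violation.
# t1, t2: times of the pulse's two edges at the gate input;
# d: transport delay of the gate.

def alert_on_output_transition(t1, t2, d):
    """Method 1: unknown from when the first edge reaches the output
    until the second edge reaches the output."""
    return (t1 + d, t2 + d)

def alert_on_detection(t1, t2, d):
    """Method 2: unknown from the moment the violation is detected
    (the second edge arrives at the input) until the second edge
    reaches the output."""
    return (t2, t2 + d)

# Pulse edges at times 2 and 4, transport delay 3.
print(alert_on_output_transition(2, 4, 3))  # unknown region (5, 7)
print(alert_on_detection(2, 4, 3))          # unknown region (4, 7)
```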

Design Profiling

Design profiling gathers information on how simulation time is distributed across the design and the underlying operating system (OS). The main purpose of design profiling is to find simulation bottlenecks and optimize them for performance improvement. Note that activating profiling slows down simulation.

Profiling results can be collected at various levels of the design hierarchy. A profiling result in a design scope is sometimes called a view. One view is the overall summary of computing times spent on the design, the OS kernel, PLI calls, and signal dumping. An example of an overall view is shown in the table below, where the design took 313.2 seconds, about 36% of the total simulation time. OS kernel time is the time spent calling OS system tasks, such as those for file I/O. PLI task time is that used by PLI tasks. Signal trace dumping is a major consumer of simulation time (for example, dumping VCD files).

Example Simulation Profiling Summary

Scope                    Total Time, sec    Time, %
Design                   313.2              36
OS kernel
PLI tasks
Signal trace dumping

Inside a design view there can be finer scopes. Examples include the module view, in which runtime distribution statistics on modules are collected, and the construct view, in which statistics on always blocks, continuous assignments, functions/tasks, timing checks, UDPs, and other constructs are gathered. In the construct view, each construct is identified by file name and line number. An example of a construct view is shown in the table below, where, for example, 2.9% of the time is spent on an always block in file chip.v on lines 122 to 244.

Profiling Statistics of Constructs of a Design

Construct          Location              Time, %
always block       chip.v: 122-244       2.9
                   chip.v: 332-456
initial block      reset.v: 100-144
                   ecc.v: 320-544
                   mask.v: 124-235
                   cache.v: 212-326
Timing check       pipeline.v: 32
To activate design profiling, an argument is passed to the compiler so that the mechanism to collect the statistics can be constructed and compiled during compilation, such as

      compile -f filelist +profiling ...

Two-State and Four-State

Two-state simulation is faster, but four-state simulation detects unknown and high-impedance states that two-state simulation cannot. Some simulators allow users to specify at compilation time, with an option such as +two_state, whether two-state or four-state simulation is to be executed. When simulating in two states, some simulators convert the entire design to a two-state version by replacing x and z with 0, and ignoring the unconvertible constructs. Therefore, for these simulators, results from the unconvertible constructs may be wrong. Other simulators preserve certain constructs that are inherently four state; for these simulators, the acceleration is less. Therefore, when coding for simulation performance, it is important to know which constructs are inherently four state. The following is a list of four-state constructs.

  1. Strength data types. Verilog data types tri1 and tri0 model nets with implicit resistive pull-up and pull-down. If no driver is driving tri1, the value of tri1 is 1 with pull strength. Similarly, tri0 is 0 if it is not driven. Data type trireg models a net with charge storage with three storage strengths. These three data types should be preserved in two-state simulation; otherwise, wrong results will occur. This is because in two-state simulation there is no concept of strength: All strengths are the same. Therefore, when converting to two state, the implicit pull-up in tri1 is mapped to 1 and hence causes bus contention when tri1 is driven to 0. Consequently, all strength-related constructs should be preserved. Some such constructs are pull-ups, pull-downs, and primitives such as tran, rtran, and their relatives, which propagate signals with strength. Also, parameters with Xs and Zs should be preserved.

  2. Four-state expressions. Verilog operators such as === and !==, and statements such as casex and casez, operate on four-state expressions and hence should be preserved.

  3. User-defined four-state data type. Some simulators allow users to define four-state data types. An example is shown here, where wire w is defined through the stylized comment to be a four-state wire and hence should be preserved:

    wire /* four_state */ w;

  4. Any constructs connected to the previous four-state constructs or variables that propagate four-state data should be considered secondary four-state constructs and hence preserved. Consider the following:

    wire /* four_state */ a;
    assign b = a;
    buffer gate1(.in(b), .out(c));

    where wire a is declared as a four-state variable using a simulator pragma. Wires b and c should be preserved as four state because they form a conductive path for wire a. Any four-state value coming from wire a will be propagated to wires b and c.
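
To see concretely why strength constructs break under naive conversion, here is a Python sketch. The resolution function is a simplification of Verilog's actual strength-based resolution, and the x-and-z-to-0 mapping is the conversion described above:

```python
# Sketch of why a tri1 net must stay four state. Four-state values
# are 0, 1, 'x', and 'z'; the naive two-state conversion replaces
# x and z with 0.

def to_two_state(v):
    """Naive conversion: x and z both become 0."""
    return v if v in (0, 1) else 0

def resolve_tri1(drivers):
    """Simplified resolution for a tri1 net: the implicit resistive
    pull-up supplies 1 when every driver is high impedance."""
    active = [d for d in drivers if d != 'z']
    if not active:
        return 1                     # pull-up wins when undriven
    return active[0] if len(set(active)) == 1 else 'x'

# Four-state: an undriven tri1 net correctly reads 1.
assert resolve_tri1(['z', 'z']) == 1
# After naive conversion, the z drivers become 0, so the same net
# reads 0: the pull-up semantics are lost.
converted = [to_two_state(d) for d in ['z', 'z']]
print(converted)
```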

To preserve four-state constructs, simulators allow the user to select modules to be simulated in four state or two state, and the selections are made through a configuration file. A configuration file contains module identifications and designation of four state or two state. For example, the following line specifies that module mod1 and mod2 be simulated in four-state mode:

module {mod1, mod2} {four-state}.

The configuration file, 4state.config in this example, is then passed to the compiler on the command line (the -config option is a pseudocommand; the exact option name is simulator specific):

compile -f filelist +2_state +issue_2_state_warning -config 4state.config

which invokes a two-state simulation compilation, issues warnings on constructs that may cause simulation differences arising from the conversion of four-state constructs to two-state constructs (as requested by +issue_2_state_warning), and simulates in four-state mode the modules specified in configuration file 4state.config.

Cosimulation with Encapsulated Models

Encapsulated models, arising mainly from IPs and reused or shared libraries, are precompiled object models that offer simulation capability while concealing their functionality. An encapsulated model has an HDL wrapper that defines the model's interface and design parameters. An encapsulated model also provides view ports through which the user can access certain internal nodes for read or write (for example, loading memory inside the model or reading some control and status registers). To use an encapsulated model, it is first linked with the simulator and is then instantiated in the design through its wrapper interface. A simulator communicates with an encapsulated model through this wrapper interface. Two standard interfaces are the Open Model Interface (OMI) and SWIFT. To use an encapsulated model, the following steps are required:

  1. Install the encapsulated model.

  2. Link the simulator with an interface to the encapsulated model. The interface passes and retrieves values through the model ports.

  3. Modify the library path to include the installed directory.

  4. Instantiate the model wrapper in the design, then compile and simulate.

Hardware emulators can also be interfaced as an encapsulated object. Instead of having the wrapper interface talking to precompiled code, the wrapper communicates with the hardware emulator itself.

Figure 26 shows simulation with two encapsulated models: One is precompiled object code and the other is a hardware emulator. For the hardware emulator, an additional interface is sometimes needed between the emulator and the standard interface. Whenever the interface wrapper is encountered during simulation, the wrapper collects the inputs and passes them to the encapsulated model, which executes and returns the outputs to the interface wrapper, which in turn passes them up to the design.

26. Cosimulation with an encapsulated model
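
The wrapper flow just described can be sketched as follows. The class names and the trivial model function are hypothetical stand-ins, since a real encapsulated model is opaque object code reached through a standard interface such as OMI or SWIFT:

```python
# Sketch of the cosimulation flow: the wrapper collects input port
# values, hands them to the opaque precompiled model, and returns the
# model's outputs to the design.

class EncapsulatedModel:
    """Stand-in for a precompiled model: functionality is hidden
    behind a single evaluate() entry point."""
    def evaluate(self, inputs):
        # Internals are concealed from the user; this trivial adder
        # is only a placeholder for the real compiled behavior.
        return {'sum': inputs['a'] + inputs['b']}

class WrapperInterface:
    """HDL-wrapper analogue: the simulator talks only to this."""
    def __init__(self, model):
        self.model = model

    def tick(self, port_values):
        # Collect inputs, run the model, pass outputs back up.
        return self.model.evaluate(port_values)

wrapper = WrapperInterface(EncapsulatedModel())
print(wrapper.tick({'a': 2, 'b': 3}))
```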
