Returning to the Viterbi decoder example (from Section 12.4), we first set the environment for the design using the following worst-case conditions: a die temperature of 25 °C (fastest logic) to 120 °C (slowest logic); a power supply voltage of VDD = 5.5 V (fastest logic) to VDD = 4.5 V (slowest logic); and worst process (slowest logic) to best process (fastest logic). Assume that this ASIC should run at a clock frequency of at least 33 MHz (clock period of 30 ns). An initial synthesis run gives a critical path delay at nominal conditions (the default setting) of about 25 ns and nearly 35 ns under worst-case conditions using a high-density 0.6 μm standard-cell target library.
Estimates (using simulation and calculation) show that data arrives at the input pins 5 ns (worst-case) after the rising edge of the clock. The reset signal arrives 10 ns (worst-case) after the rising edge of the clock. The outputs of the Viterbi decoder must be stable at least 4 ns before the rising edge of the clock. This allows these signals to be driven to another ASIC in time to be clocked. These timing constraints are particularly devastating. Together the 5 ns input arrival time and the 4 ns output requirement effectively reduce the usable clock period by 9 ns. However, these figures are typical for board-level delays.
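The effect of these constraints on the time left for logic can be sketched with a little arithmetic. The following is an illustrative calculation only, using the figures quoted above; the function name, the four path categories, and the decision to ignore flip-flop clock-to-Q and setup times are assumptions made for this sketch, not part of the original design flow. The 10 ns reset arrival could be treated in the same way as the data arrival.

```python
# Figures from the text; names and path categories are illustrative assumptions.
CLOCK_PERIOD_NS = 30.0     # 33 MHz target clock
INPUT_ARRIVAL_NS = 5.0     # data valid at input pins after the rising clock edge
OUTPUT_REQUIRED_NS = 4.0   # outputs stable before the next rising clock edge

def available_ns(starts_at_input, ends_at_output):
    """Time left for logic on a path (clock-to-Q and setup times are ignored)."""
    budget = CLOCK_PERIOD_NS
    if starts_at_input:
        budget -= INPUT_ARRIVAL_NS
    if ends_at_output:
        budget -= OUTPUT_REQUIRED_NS
    return budget

budgets = {
    "register to register": available_ns(False, False),
    "input to register": available_ns(True, False),
    "register to output": available_ns(False, True),
    "input to output": available_ns(True, True),
}
for path, ns in budgets.items():
    print(f"{path}: {ns} ns")
```

Under these assumptions the full 9 ns penalty applies only to a path that runs from an input pin to an output pin without passing through a register; paths that start or end at a flip-flop lose only one of the two margins.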
The logic synthesizer can do little or no optimization across these module boundaries. The next step, then, is to rearrange the design hierarchy for synthesis. Flattening (merging or ungrouping) the six modules into a new cell, called critical, allows the synthesizer to reduce the critical path delay by optimizing one large module.
At present the last module in the critical path is output_decision. This combinational logic adds 2–3 ns to the output delay requirement of 4 ns (this means the outputs of the module metric must be stable 6–7 ns before the rising clock edge). Registering the output reduces this overhead and removes the module output_decision from the critical path. The disadvantage is an increase in latency by one clock cycle, but the latency is already 12 clock cycles in this design. Provided that registering the output reduces the critical path delay to less than 12/13 of its present value, the total latency (13 shorter cycles instead of 12 longer ones) will still improve.
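The 12/13 break-even point follows from comparing total latency in nanoseconds (number of cycles times clock period) before and after the output register is added. The sketch below assumes, purely for illustration, that the clock period before the change is 30 ns and tries two hypothetical periods after the change (27 ns and 28 ns); none of these "after" figures come from the design itself.

```python
from fractions import Fraction

def latency_ns(cycles, period_ns):
    """Total latency is the number of clock cycles times the clock period."""
    return cycles * period_ns

def breakeven_period_ns(old_cycles, new_cycles, old_period_ns):
    """Clock period at which the longer pipeline has the same total latency."""
    return Fraction(old_cycles, new_cycles) * old_period_ns

old_latency = latency_ns(12, 30)           # 12 cycles at an assumed 30 ns
limit = breakeven_period_ns(12, 13, 30)    # 12/13 of 30 ns

print(f"break-even period: {float(limit):.2f} ns")
for new_period in (27, 28):                # hypothetical periods after registering
    better = latency_ns(13, new_period) < old_latency
    print(f"{new_period} ns: latency improves = {better}")
```

With these assumed numbers, a 27 ns period gives 351 ns of latency against the original 360 ns, while a 28 ns period gives 364 ns and is therefore worse despite the faster clock.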
These changes move the performance closer to the target. Prelayout estimates indicate the die perimeter required for the I/O pads will allow more than enough area to hold the core logic. Since there is unused area in the core, it makes sense to switch to a high-performance standard-cell library with a slightly larger cell height (96 λ versus 72 λ). This cell library is less dense, but faster.
Typically, at this point, the design is improved by altering the HDL, the hierarchy, and the synthesis controls in an iterative manner until the desired performance is achieved. However, remember there is still no information from the layout. The best that can be done is to estimate the contribution of the interconnect using wire-load models. As soon as possible the netlist should be passed to the floorplanner (or the place-and-route software in the absence of a floorplanner) to generate better estimates of interconnect delays.
Table 12.13 is a timing report for the Viterbi decoder, which shows the critical path starts at a sequential logic cell (a D flip-flop in the present example), ends at a sequential logic cell (another D flip-flop), with 37 other combinational logic cells in between. The first delay is the clock-to-Q delay of the first flip-flop. The last delay is the setup time of the last flip-flop. The critical path delay is 24.56 ns, which gives a slack of 0.44 ns from the constraint of 25 ns (reduced from 30 ns to give an extra margin). We have met the timing constraint (otherwise we say it is violated).
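The slack figure is simply the constraint minus the critical path delay. As a minimal sketch using the two numbers quoted above (the function names are illustrative, and rounding to two decimal places is an assumption made to match the precision of the report):

```python
def slack_ns(required_ns, arrival_ns):
    """Slack is the required time minus the arrival time, rounded to 0.01 ns."""
    return round(required_ns - arrival_ns, 2)

def constraint_met(required_ns, arrival_ns):
    """A constraint is met when slack is zero or positive, violated otherwise."""
    return slack_ns(required_ns, arrival_ns) >= 0

slack = slack_ns(25.0, 24.56)
print(f"slack = {slack} ns, met = {constraint_met(25.0, 24.56)}")
```

A path delay longer than 25 ns would give negative slack, which is what a violated constraint means in this context.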
In Table 12.13 all instances in the critical path are inside instance v_1.u100. Instance name u100 is the new cell (cell name critical) formed by merging six blocks in module viterbi (instance name v_1).
The second column in Table 12.13 shows the timing arc of the cell involved on the critical path. For example, CP --> QN represents the path from the clock pin, CP, to the flip-flop output pin, QN, of a D flip-flop (cell name dfctnb). The pin names and their functions come from the library data book. Each company adopts a different naming convention (in this case CP represents a positive clock edge, for example). The conventions are not always explicitly shown in the data books but are normally easy to discover by looking at examples. As another example, B0 --> CO represents the path from the B input to the carry output of a 2-bit full adder (cell name ad02d1).
The fifth column (trs) describes whether the transition at the output node is rising (R) or falling (F). The timing analyzer examines each possible combination of rising and falling delays to find the critical path.
The last column (cell) is the cell name (from the cell-library data book). In this library suffix 'd1' represents normal drive strength with 'd0', 'd2', and 'd5' being the other available strengths.
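When scanning a long timing report it can help to pull the drive-strength suffix out of each cell name. The helper below is a hypothetical sketch, not part of any synthesis tool; it assumes the suffix is always the last two characters of the name and is one of the four strengths listed above. Cell names that do not end in such a suffix (dfctnb, for example) return None.

```python
import re

def drive_suffix(cell_name):
    """Return a trailing 'd0', 'd1', 'd2', or 'd5' suffix, or None if absent."""
    match = re.search(r"d[0125]$", cell_name)
    return match.group(0) if match else None

for name in ("ad02d1", "dfctnb"):
    print(name, drive_suffix(name))
```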