We return to the comparator/MUX example to see how timing analysis is applied to sequential logic. We shall use the same input code (comp_mux.v in Section 13.2), but this time we shall target the design to an Actel FPGA.
The estimated prelayout critical path delay is nearly 30 ns including the I/O-cell delays (ACT 3, worst-case, standard speed grade). This limits the operating frequency to 33 MHz (assuming we can get the signals to and from the chip pins with no further delays—highly unlikely). The operating frequency can be increased by pipelining the design as follows (by including three register stages: at the inputs, the outputs, and between the comparison and the select functions):
- Paths that start at an input pad and end on the data input of a sequential logic cell (the D input to a D flip-flop, for example). We might call this an entry path (or input-to-D path) to a pipelined design. The longest entry delay (or input-to-setup delay) is 4.52 ns.
- Paths that start at a clock input to a sequential logic cell and end at the data input of a sequential logic cell. This is a stage path (register-to-register path or clock-to-D path) in a pipeline stage. The longest stage delay (clock-to-D delay) is 9.99 ns.
- Paths that start at a sequential logic cell output and end at an output pad. This is an exit path (clock-to-output path) from the pipeline. The longest exit delay (clock-to-output delay) is 11.95 ns.
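The three path classes, and the frequency bound that the longest one sets, can be sketched in a few lines of Python. This is a minimal illustration. Like the text, it assumes the clock period is limited only by the single longest path, with no setup-time or board-level allowance. The helper names `classify_path` and `max_frequency_mhz` are hypothetical, and the delays are the preroute figures quoted above.

```python
# Classify register-bounded paths and bound the clock frequency.
# Hypothetical helpers; delays are the preroute figures quoted in the text.

# Longest preroute delay (ns) for each path class of the pipelined design.
PREROUTE_DELAYS_NS = {
    "entry": 4.52,   # input pad -> flip-flop D input (input-to-setup)
    "stage": 9.99,   # flip-flop clock pin -> flip-flop D input (clock-to-D)
    "exit": 11.95,   # flip-flop clock pin -> output pad (clock-to-output)
}


def classify_path(starts_at_clock_pin: bool, ends_at_d_input: bool) -> str:
    """Name a path by its start and end points."""
    if starts_at_clock_pin:
        return "stage" if ends_at_d_input else "exit"
    # No register at either end: the unpipelined, purely combinational case.
    return "entry" if ends_at_d_input else "pad-to-pad"


def max_frequency_mhz(critical_delay_ns: float) -> float:
    """Frequency bound 1/t for a critical path of t nanoseconds."""
    return 1000.0 / critical_delay_ns


critical = max(PREROUTE_DELAYS_NS, key=PREROUTE_DELAYS_NS.get)
print("critical class:", critical)
print(f"pipelined:   {max_frequency_mhz(PREROUTE_DELAYS_NS[critical]):.1f} MHz")
print(f"unpipelined: {max_frequency_mhz(30.0):.1f} MHz")
```

With the 11.95 ns exit path as the limit, the bound rises from about 33 MHz for the 30 ns unpipelined design to about 84 MHz.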
By pipelining the design we added three clock periods of latency, but we increased the estimated operating speed. The longest prelayout critical path is now an exit delay of approximately 12 ns, which more than doubles the maximum operating frequency (from about 33 MHz to about 83 MHz).
Next, we route the registered version of the design. The Actel software informs us that the postroute maximum stage delay is 11.3 ns (close to the preroute estimate of 9.99 ns). To check this figure we can perform another timing analysis, this time measuring the stage delays (the start points are all clock pins, and the end points are all inputs to sequential cells; in our case, the D input to a D flip-flop). We need to define the sets of nodes at which to start and end the timing analysis (similar to the path clusters we used to specify timing constraints in logic synthesis). In the Actel timing analyzer we can use the predefined sets 'clock' (flip-flop clock pins) and 'gated' (flip-flop inputs) as follows:
We could try to reduce the long stage delay (11.3 ns), but we have already seen from the preroute timing estimates that an exit delay may be the critical path. Next, we check some other important timing parameters.
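The set-to-set search that a timing analyzer performs between 'clock' and 'gated' can be sketched as a longest-path search over a directed acyclic timing graph. Passing `min` instead of `max` gives the shortest-path search needed for hold checks. This is a minimal Python sketch and is not the Actel analyzer's algorithm. The graph, node names, and delays are all hypothetical, chosen only to resemble flip-flop clock pins and D inputs.

```python
from functools import lru_cache

# Hypothetical timing graph: (from_node, to_node, delay_ns).
# Arcs stop at flip-flop D inputs, so the graph is acyclic.
EDGES = [
    ("a_r_ff:CLK", "a_r_ff:Q", 3.0),    # clock-to-Q arc
    ("a_r_ff:Q", "cmp:Y", 4.0),         # comparator logic
    ("cmp:Y", "sel_r_ff:D", 2.5),       # route to next register
    ("a_rr_ff:CLK", "a_rr_ff:Q", 3.0),  # clock-to-Q arc
    ("a_rr_ff:Q", "mux:Y", 2.0),        # select (MUX) logic
    ("mux:Y", "outp_ff:D", 1.5),        # route to output register
]


def extreme_path(edges, start_set, end_set, pick=max):
    """Longest (pick=max) or shortest (pick=min) path from any node in
    start_set to any node in end_set, as (delay_ns, nodes); None if none."""
    fanout = {}
    for src, dst, delay in edges:
        fanout.setdefault(src, []).append((dst, delay))

    @lru_cache(maxsize=None)
    def from_node(node):
        # Candidate (delay, path) pairs from this node to an end-set node.
        candidates = [(0.0, (node,))] if node in end_set else []
        for dst, delay in fanout.get(node, []):
            rest = from_node(dst)
            if rest is not None:
                candidates.append((delay + rest[0], (node,) + rest[1]))
        return pick(candidates, key=lambda c: c[0]) if candidates else None

    found = [p for p in map(from_node, start_set) if p is not None]
    return pick(found, key=lambda c: c[0]) if found else None


clock_pins = {"a_r_ff:CLK", "a_rr_ff:CLK"}  # plays the role of 'clock'
d_inputs = {"sel_r_ff:D", "outp_ff:D"}      # plays the role of 'gated'
print(extreme_path(EDGES, clock_pins, d_inputs, pick=max))
# (9.5, ('a_r_ff:CLK', 'a_r_ff:Q', 'cmp:Y', 'sel_r_ff:D'))
print(extreme_path(EDGES, clock_pins, d_inputs, pick=min))
# (6.5, ('a_rr_ff:CLK', 'a_rr_ff:Q', 'mux:Y', 'outp_ff:D'))
```

Memoizing `from_node` visits each node once, so the search runs in time linear in the size of the graph. Production analyzers use the same idea, usually as a forward pass in topological order.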
Hold-time problems can occur if there is clock skew between adjacent flip-flops, for example. We first need to find the shortest stage delays, using the same start and end sets ('clock' and 'gated') that we used to find the longest stage delays:
The shortest path delay, 4 ns, is between the clock input of a D flip-flop with instance name b_rr_ff_b1 (call this X) and the D input of flip-flop instance name outp_ff_b1 (Y). Because of clock skew, the clock signal may not arrive at both flip-flops simultaneously. Suppose the clock arrives at flip-flop Y 3 ns later than at flip-flop X. The D input to flip-flop Y is then stable for only (4 – 3) = 1 ns after the clock edge at Y. If the hold time of Y exceeds 1 ns, we have a hold-time violation. To check for hold-time violations we thus need to find the clock skew corresponding to each clock-to-D path. This is tedious, and timing-analysis tools normally check hold-time requirements automatically, but we shall show the steps to illustrate the process.
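The hold check just described reduces to a slack calculation per clock-to-D path: the shortest delay, minus the skew (capture arrival minus launch arrival), minus the flip-flop hold time. This Python sketch uses the text's path (X = b_rr_ff_b1, Y = outp_ff_b1, 4 ns) and its hypothetical 3 ns skew. The hold-time values of 0 ns and 1.5 ns are assumptions for illustration, not Actel data.

```python
# Each path: (launching FF, capturing FF, shortest clock-to-D delay in ns).
PATHS = [("b_rr_ff_b1", "outp_ff_b1", 4.0)]
# Hypothetical clock arrival times: the clock reaches Y 3 ns after X.
CLOCK_ARRIVAL_NS = {"b_rr_ff_b1": 0.0, "outp_ff_b1": 3.0}


def hold_slack_ns(delay_ns, launch_arrival_ns, capture_arrival_ns, t_hold_ns):
    """Time the capturing D input stays stable after its own clock edge,
    minus the hold requirement; a negative result is a violation."""
    skew_ns = capture_arrival_ns - launch_arrival_ns
    return delay_ns - skew_ns - t_hold_ns


def hold_report(paths, arrival_ns, t_hold_ns):
    """Slack for every path, as (launch, capture, slack_ns)."""
    return [
        (launch, capture,
         hold_slack_ns(delay, arrival_ns[launch], arrival_ns[capture],
                       t_hold_ns))
        for launch, capture, delay in paths
    ]


print(hold_report(PATHS, CLOCK_ARRIVAL_NS, t_hold_ns=0.0))  # 1 ns of margin
print(hold_report(PATHS, CLOCK_ARRIVAL_NS, t_hold_ns=1.5))  # violated
```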
Before we can measure clock skew, we need to analyze the entry delays, including the clock tree. The synthesis tools automatically add I/O pads and clock cells, so the netlist contains extra nodes with automatically generated names, and the EDIF conversion tools may then modify these names. To analyze the entry delays and the clock-network delay, we therefore first need to find the input node names. By looking for the EDIF 'rename' construct in the EDIF netlist, we can associate the input and output node names in the behavioral Verilog model, comp_mux_rrr, with the EDIF names:
Thus, for example, the EDIF conversion program has renamed input port a[2] to a_2_ because the design tools do not accept the Verilog bus notation, which uses square brackets. Next, we find the connections between the ports and the added I/O cells by looking for 'PAD' in the Actel-format netlist. 'PAD' indicates a connection to a pad and hence to a pin of the chip:
This tells us, for example, that the node we called clock in our behavioral model has been joined to a node (with an automatically generated name) called CLKBUF_30:PAD, using a net (connection) named DEF_NET_145 (again automatically generated). This net connects the node clock, which is dangling in the behavioral model, to the clock-buffer pad cell that the synthesis tools added automatically.
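The two lookups above chain together: a behavioral name maps to an EDIF name through 'rename', and the EDIF name maps to a pad pin through a net. This Python sketch shows that chain. The EDIF fragment is hypothetical, but it uses EDIF's real `(rename edifName "originalName")` form. The parsed-netlist dictionary is a hypothetical representation, not the actual Actel netlist syntax. Only DEF_NET_145, CLKBUF_30:PAD, and the a_0_/INBUF_26:PAD pairing come from the text, and the net name DEF_NET_150 is invented.

```python
import re

# Hypothetical EDIF fragment using the (rename edifName "originalName") form.
EDIF_TEXT = """
(port (rename a_2_ "a[2]") (direction INPUT))
(port (rename a_0_ "a[0]") (direction INPUT))
(port clock (direction INPUT))
"""

# Hypothetical, already-parsed netlist: net name -> connected pins.
NETS = {
    "DEF_NET_145": ["clock", "CLKBUF_30:PAD"],
    "DEF_NET_150": ["a_0_", "INBUF_26:PAD"],  # invented net name
}

RENAME = re.compile(r'\(rename\s+([^\s()"]+)\s+"([^"]*)"\)')


def behavioral_to_edif(edif_text):
    """Map original (behavioral) names to EDIF names."""
    return {orig: edif for edif, orig in RENAME.findall(edif_text)}


def port_pads(nets):
    """Map each top-level port to (pad pin, net) where a net joins them."""
    result = {}
    for net, pins in nets.items():
        pads = [p for p in pins if p.endswith(":PAD")]
        ports = [p for p in pins if ":" not in p]  # bare names are ports
        for port in ports:
            for pad in pads:
                result[port] = (pad, net)
    return result


def trace(name, edif_text, nets):
    """Behavioral name -> (EDIF name, pad pin, net); unrenamed names pass."""
    edif_name = behavioral_to_edif(edif_text).get(name, name)
    pad, net = port_pads(nets)[edif_name]
    return edif_name, pad, net


print(trace("clock", EDIF_TEXT, NETS))  # ('clock', 'CLKBUF_30:PAD', 'DEF_NET_145')
print(trace("a[0]", EDIF_TEXT, NETS))   # ('a_0_', 'INBUF_26:PAD', 'DEF_NET_150')
```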
We now know that the clock-pad input is CLKBUF_30:PAD, so we can find the exit delays (the longest path between the clock-pad input and an output) as follows, using the clock-pad input as the start set:
The input-to-clock delay, $t_{IC}$, due to the clock-buffer cell (or macro) CLKEXT_0, instance name CLKBUF_30/U0, is 7.9 ns. The clock-to-Q delay, $t_{CQ}$, of flip-flop cell DF1, instance name outp_ff_b0, is 4.5 ns. The delay, $t_{QO}$, due to the output buffer cell OUTBUF, instance name OUTBUF_33, is 3.7 ns. The longest path between clock-pad input and the output, $t_{CO}$, is thus
$$t_{CO} = t_{IC} + t_{CQ} + t_{QO} = 7.9\ \text{ns} + 4.5\ \text{ns} + 3.7\ \text{ns} = 16.1\ \text{ns}$$
The clock-buffer instance name, CLKBUF_30/U0, is hierarchical (with a '/' hierarchy separator). This indicates that there is more than one instance inside the clock-buffer cell, CLKBUF_30: instance CLKBUF_30/U0 is the input driver, and instance CLKBUF_30/U1 is the output driver (which is disabled and unused in this case).
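The clock-pad-to-output total is simply the sum of the series delays along the path. Keeping each element with its cell and instance name, as an analyzer report does, makes it easy to see which element dominates; here it is the clock buffer. This is a minimal Python sketch using the delays reported above. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathElement:
    symbol: str      # timing parameter name used in the text
    cell: str        # library cell (or macro)
    instance: str    # instance name in the netlist
    delay_ns: float  # delay contributed by this element


# Longest clock-pad-to-output path, values as reported in the text.
EXIT_PATH = [
    PathElement("t_IC", "CLKEXT_0", "CLKBUF_30/U0", 7.9),
    PathElement("t_CQ", "DF1", "outp_ff_b0", 4.5),
    PathElement("t_QO", "OUTBUF", "OUTBUF_33", 3.7),
]


def total_delay_ns(path):
    """Delays of elements in series along one path simply add."""
    return sum(element.delay_ns for element in path)


t_co = total_delay_ns(EXIT_PATH)
for element in EXIT_PATH:
    print(f"{element.symbol}: {element.delay_ns:.1f} ns "
          f"({element.cell}, {element.instance})")
print(f"t_CO = {t_co:.1f} ns")  # t_CO = 16.1 ns
```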
The external setup time, measured at the chip pins, is related to the internal setup time of a flip-flop by
$$t_{SU}(\text{external}) = t_{SU}(\text{internal}) - (\text{clock delay}) + (\text{data delay}) \tag{13.24}$$
where both clock and data delays end at the same flip-flop instance. We find the clock delays in Eq. 13.24 using the clock input pin as the start set and the end set 'clock'. The timing analyzer tells us that all 16 clock path delays in our design are the same, 7.9 ns, so the clock skew is zero. Actel’s clock distribution system minimizes clock skew, but clock skew will not always be zero. From the discussion in Section 13.7.1, we see there is no possibility of internal hold-time violations with a clock skew of zero.
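Given a clock-delay report like the one described above, the worst-case skew is the spread between the latest and earliest clock arrivals. This spread bounds the skew between any pair of flip-flops. In this Python sketch, the 16 flip-flop names are placeholders and not the netlist's instance names. The 7.9 ns value is from the text, and the two-flip-flop example with unequal delays is hypothetical.

```python
def clock_skew_ns(clock_delay_ns):
    """Worst-case (global) skew: latest minus earliest clock arrival.
    This bounds the skew between any pair of flip-flops."""
    arrivals = list(clock_delay_ns.values())
    return max(arrivals) - min(arrivals)


# 16 flip-flops, all with the reported 7.9 ns clock delay.
# The names ff_0 ... ff_15 are placeholders.
clock_delays = {f"ff_{i}": 7.9 for i in range(16)}
print(clock_skew_ns(clock_delays))  # 0.0

# Hypothetical unequal clock delays, for comparison.
print(round(clock_skew_ns({"ff_x": 7.9, "ff_y": 8.4}), 3))  # 0.5
```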
Next, we find the data delays in Eq. 13.24 using a start set of all input pads and an end set of 'gated':
We are only interested in the last six paths of this analysis (rank 10–15), which describe the delays from each data input pad (a[2], a[1], a[0], b[2], b[1], b[0]) to the D input of a flip-flop. The maximum data delay, 10 ns, occurs on input buffer instance name INBUF_26 (pad 26); pin INBUF_26:PAD is node a_0_ in the EDIF file or input a[0] in our behavioral model. Since every clock delay is 7.9 ns, the six $t_{SU}(\text{external})$ equations corresponding to Eq. 13.24 may be reduced to the following worst-case relation:
$$t_{SU}(\text{external})_{\max} = t_{SU}(\text{internal}) - 7.9\ \text{ns} + 10\ \text{ns} = t_{SU}(\text{internal}) + 2.1\ \text{ns}$$
We calculated the clock and data delay terms in Eq. 13.24 separately, but timing analyzers can normally perform a single analysis as follows:
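Since $t_{SU}(\text{internal})$ is always positive on Actel FPGAs, $t_{SU}(\text{external})_{\min}$ is always positive for this design. In large ASICs, with large clock delays, it is possible to have external hold-time requirements on inputs. When the clock delay exceeds the data delay, a change on an input pin shortly after the clock edge can reach the flip-flop before the delayed clock does, so the data must be held stable at the pins after the clock edge. This is the reason that some FPGAs (Xilinx, for example) have programmable delay elements that deliberately increase the data delay and eliminate irksome external hold-time requirements.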
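Eq. 13.24 and its hold-time counterpart can be sketched as small Python functions. The setup relation follows the text and uses the longest data delay. The external hold relation, $t_H(\text{external}) = t_H(\text{internal}) + (\text{clock delay}) - (\text{data delay})$, is the standard companion that uses the shortest data delay; it is not stated in this section. The "large ASIC" numbers are hypothetical, and `extra_data_delay_ns` only models the idea of a programmable input delay, not any vendor's actual delay element.

```python
def external_setup_ns(t_su_internal_ns, clock_delay_ns, data_delay_ns):
    """Eq. 13.24: t_SU(external) = t_SU(internal) - clock delay + data delay."""
    return t_su_internal_ns - clock_delay_ns + data_delay_ns


def external_hold_ns(t_h_internal_ns, clock_delay_ns, data_delay_ns):
    """Standard companion relation (assumed, not from the text):
    t_H(external) = t_H(internal) + clock delay - data delay."""
    return t_h_internal_ns + clock_delay_ns - data_delay_ns


def extra_data_delay_ns(t_h_internal_ns, clock_delay_ns, min_data_delay_ns):
    """Delay an input element would need to add so that no external hold
    time remains; it raises t_SU(external) by the same amount."""
    return max(0.0, external_hold_ns(t_h_internal_ns, clock_delay_ns,
                                     min_data_delay_ns))


# This design: 7.9 ns clock delay, 10 ns worst data delay (INBUF_26).
# Passing t_SU(internal) = 0 isolates the pin-to-pin offset.
offset = external_setup_ns(0.0, clock_delay_ns=7.9, data_delay_ns=10.0)
print(f"t_SU(external)max = t_SU(internal) + {offset:.1f} ns")

# Hypothetical large ASIC: 12 ns clock delay, 5 ns fastest data path,
# 0.5 ns internal hold time (illustrative values only).
print(extra_data_delay_ns(0.5, clock_delay_ns=12.0, min_data_delay_ns=5.0))
```

The second call reports 7.5 ns. Adding that much input delay removes the external hold requirement but increases the external setup time by 7.5 ns, which is the tradeoff such delay elements make.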