Since we are focusing on performance optimization through layout manipulation of VLSI chips after place and route, we need to reliably determine those layout parameters that have the largest effect on timing of a chip fabricated with a DSM technology. Therefore, we examine only the most appropriate timing analysis methods used for VLSI chips to determine which layout dimensions are the dominant parameters affecting timing. These parameters then give us the information necessary for layout optimization. We also need to understand the limitations and underlying assumptions in these timing analysis techniques. Looking at timing analysis and the dominant layout parameters, we will learn that substantial speed performance optimization can be achieved through adjustments of interconnect dimensions in conjunction with the buffer stages driving them.
The most appropriate timing analysis in VLSI chips depends on the character of the circuit and the timing information needed. For now, we will focus on digital circuits. The timing information needed to determine and optimize the performance of a digital circuit is very different from what we would need to know for an analog circuit. We will see, however, that effects such as interconnect cross-coupling turn out to be analog effects that call for analog information, such as detailed pulse shapes, even for digital circuits. Focusing on digital circuits, the next question concerns the level at which we want to determine and verify the correct circuit timing behavior. Do we need a timing analysis at a high level, a functional level, or do we have to go all the way down to the transistor level? Since we are interested in physical layout optimization involving the polygon level, the analysis will have to be at the lowest and most detailed level, the transistor level. We will see later on what this means in terms of the complexity of the transistor models used for the timing analysis.
Normally, analyzing the time behavior of a digital circuit involves determining how a circuit moves through its digital - its binary - states with time, since digital circuits are state machines. Unfortunately, processing state information for timing analysis is too time-consuming for most situations, especially for the current complex VLSI circuits. In addition, if we want to perform a state-dependent simulation, we would also need to generate the appropriate simulation vector suites before we could even get started.
Fortunately, for physical layout optimization, changes do not occur in a circuit that affect its functional, state-dependent behavior. This means that we can focus on timing alone and just on time delays for now.
To know the timing limit, that is, the highest clock frequency at which a VLSI digital circuit can operate, we have to determine the longest time delay among all the paths between the circuit elements that latch information on the clock edges. This longest path delay still has to fit within the shortest clock cycle desired for the circuit, so it determines the highest clock frequency at which the circuit can run.
Figure 3.2 showed the typical configuration of a circuit in order to simply highlight the parasitics in the active parts of a circuit and suggest the distributed nature of the passive parts, the interconnects. Figure 3.2 does not show how signals get latched with clocked registers or latches. This is, however, how signals propagate through digital circuits. This process of signal propagation is described in detail in many books on digital systems where parameters such as setup and hold times are carefully explained. The concepts are just summarized here:
The time delays along all the signal paths in a circuit must enable every signal coming from a latch at the beginning of a path to pass through the corresponding latch at the end of that path before the latch “closes.” It closes with the clock edge. In other words, the data has to be available and stable so that the clock latches the correct data. The longest path of this type is called the critical path. The path delays between clocked latches determine a circuit's maximum possible clocking frequency.
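The search for the critical path can be illustrated with a minimal sketch. It assumes the logic between latches is already reduced to a directed acyclic graph whose edges carry fixed delays in picoseconds. The function name `longest_path_delay`, the node names, and the delay values are hypothetical, and setup time and clock skew are ignored for simplicity.

```python
from collections import defaultdict, deque

def longest_path_delay(edges, sources):
    """Return (delay, path) for the slowest path in a delay-annotated DAG.

    edges   : iterable of (from_node, to_node, delay) tuples
    sources : nodes where signals are launched (latch outputs), arrival time 0
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(sources)
    for u, v, d in edges:
        successors[u].append((v, d))
        indegree[v] += 1
        nodes.update((u, v))

    arrival = {n: float("-inf") for n in nodes}
    for s in sources:
        arrival[s] = 0.0
    previous = {}

    # Visit nodes in topological order so each arrival time is final when used.
    ready = deque(n for n in nodes if indegree[n] == 0)
    while ready:
        u = ready.popleft()
        for v, d in successors[u]:
            if arrival[u] + d > arrival[v]:
                arrival[v] = arrival[u] + d
                previous[v] = u
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # The latest arrival time marks the end of the critical path.
    end = max(nodes, key=lambda n: arrival[n])
    path = [end]
    while path[-1] in previous:
        path.append(previous[path[-1]])
    return arrival[end], path[::-1]

# Hypothetical example: delays in picoseconds between two latch stages.
edges = [
    ("latch_a", "g1", 300),
    ("g1", "g2", 400),
    ("latch_a", "g2", 200),
    ("g2", "latch_b", 500),
    ("g1", "latch_c", 100),
]
critical_delay_ps, critical_path = longest_path_delay(edges, ["latch_a"])
max_clock_hz = 1e12 / critical_delay_ps  # ignores setup time and clock skew
print(critical_path, critical_delay_ps, max_clock_hz)
```

Note that the sketch uses no state information: it reports the structurally longest path whether or not the circuit can ever exercise it.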
By focusing exclusively on time delay, we will determine the paths in the digital circuit that may present problems, paths that are marginal. The tools that determine only the delay in digital circuits, and no state information, are the TAs.
TAs became popular even before interconnects started to dominate timing. In a “well controlled” environment, such as a silicon compiler, they were used successfully twenty years ago. The other main reason for their widespread usage was that TAs are literally designed for MOS technology, which became the unquestionably dominant design methodology at about that time.
Today, the TA approach is also very well suited for timing verification for the exceedingly complex VLSI chips, because just determining time delays is much simpler than simulating a circuit through its states. It is also very fast and TAs are a very good fit for layout optimization in DSM technologies. No simulation vectors need to be generated and, in return, state information will not be determined. The resulting data is strictly time delays. However, while TAs are very useful for determining time delays in digital circuits, their results are valid only under certain physical assumptions. We will examine the physical assumptions for which TAs apply later, when we discuss interconnect modeling for time delay.
In summary, digital simulation is still needed to determine the functionality of digital circuits. It is well known that TAs can report critical paths that cannot actually occur in the circuit. This is, of course, due to the lack of state information. The person using the TA has to know whether a very slow critical path indicated by the TA is actually a logically possible path. However, these shortcomings of TAs are far outweighed by their advantages. The result of the timing analysis is a knowledge of the critical timing paths. Layout optimization can then focus on these paths in correcting timing problems.
So far, we have discussed some of the techniques for determining approximate but sufficiently accurate capacitance values for interconnects. Together with other parameters, such as interconnect series resistance, these parasitic capacitances will be needed to model interconnects. The series resistance of an interconnect requires only a simple calculation based on sheet resistance.
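As a brief illustration of the sheet-resistance calculation, the resistance of a rectangular wire is the sheet resistance multiplied by the number of squares, that is, length divided by width. The function name and the numerical values below are assumptions chosen for illustration and do not describe any particular process.

```python
def wire_resistance(sheet_resistance_ohm_per_sq, length_um, width_um):
    """Series resistance of a rectangular wire: R = R_sheet * (L / W)."""
    if width_um <= 0:
        raise ValueError("width must be positive")
    squares = length_um / width_um  # number of squares along the wire
    return sheet_resistance_ohm_per_sq * squares

# Hypothetical values: 0.1 ohm/square metal, 1000 um long, 0.5 um wide.
r_wire = wire_resistance(0.1, 1000.0, 0.5)
print(r_wire)
```

Halving the width doubles the number of squares and therefore the resistance, which is one reason interconnect width is a dominant layout parameter for timing.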
Now that we have accurate values for capacitances and resistances, we need to find appropriate models for timing analysis. We will focus on finding models that are accurate enough for layout optimization yet computationally manageable. Such models do exist, and they will be discussed below. The main challenge, as indicated in Figure 3.2, is that the interconnect part of the chip increasingly behaves like a lossy but linear transmission line as layout geometries continue to shrink and clocking frequencies increase.
We assume that the series resistance in the interconnect dominates to such an extent that the inductance can be neglected, even for the latest VLSI geometries and operating speeds. As suggested in the literature, this is at present a very reasonable assumption. Also, most modeling discussions here and published information on interconnects/buffers make this assumption. New challenges will arise if and when series inductance starts to become important.
Lossy RC transmission lines in conjunction with the other circuit components on a chip create computational difficulties. Intelligent, discrete approximate circuits need to be found whenever distributed loads are present in a model. This is the path that was pursued for interconnects. Elmore and PRH have done pioneering work to find acceptable trade-offs between computational complexity and accuracy. We will first look at some results and then mention some of the assumptions made.
Before clocking frequencies were so high, when accurate models for interconnects were not so critical, the model for an interconnect was a simple RC circuit with a single R and a single C component. We will refer to this as a “lumped” equivalent circuit of the interconnect. The capacitance value was the total “parallel plate” capacitance of the interconnect over the length between buffer stages, and the resistance was the total series resistance over the same length. Gauging the range of possible solutions, this would be one limiting case. The limiting case at the other end of the spectrum would be a lossy RC transmission line, a continuum. But remember, this means solving a partial differential equation! We definitely need to find something in between.
An obvious compromise is: As few sections as possible with acceptable accuracy.
Elmore and PRH found a good compromise. Assuming a step function at the input of such an interconnect and focusing strictly on delay (not the detailed pulse shape), we show in Figure 3.6 the fruits of Elmore and PRH's work.
Fig. 3.6 The Result of Intelligent Curve-Fitting
The slower rising signal at the left side of the illustration (the one marked lumped) is the response to a step input to a simple RC circuit with a single R and a single C component as discussed above. Both C and R are equal to the total capacitance and total resistance of an interconnect.
The faster rising signal at the left of the illustration is the response to a step input to an exact representation of a distributed, lossy RC transmission line. Obviously, a simple, lumped RC circuit is inadequate in terms of accuracy. At the other extreme, a transmission line is not computationally tolerable. However, numerical calculations coupled with intelligent curve-fitting yield accurate responses for relatively simple circuits consisting of few lumps, as shown at the right side of the illustration in Figure 3.6.
One of the key assumptions for these approximations is a step-function input. Large errors start to occur for a slowly rising signal at the input of such an equivalent circuit.
The small percentage error is really remarkable, considering that a lossy transmission line is being approximated with just a few discrete components. Good delay and speed information can be calculated using these “simple” approximations.
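The trend from a single lump toward the distributed line can be sketched with the Elmore delay of a uniform RC ladder. This is a simplified illustration and not the curve-fitted circuits of the figure. It assumes the wire is split into equal segments with each segment's capacitance placed at its far end. The Elmore delay is a first-moment estimate of the delay for a step input, not the exact 50% crossing time. The function name and the values of R and C are hypothetical.

```python
def elmore_delay_ladder(r_total, c_total, n_sections):
    """Elmore delay at the far end of a uniform RC ladder with n sections.

    Each section has resistance r_total/n followed by capacitance c_total/n
    to ground (an assumed placement, chosen for simplicity).
    """
    if n_sections < 1:
        raise ValueError("need at least one section")
    r_seg = r_total / n_sections
    c_seg = c_total / n_sections
    delay = 0.0
    upstream_r = 0.0
    for _ in range(n_sections):
        upstream_r += r_seg            # resistance between the driver and this node
        delay += upstream_r * c_seg    # this capacitor charges through upstream_r
    return delay

# Hypothetical interconnect: 1 kOhm total resistance, 1 pF total capacitance.
R, C = 1000.0, 1e-12
for n in (1, 2, 4, 16, 1000):
    print(n, elmore_delay_ladder(R, C, n) / (R * C))
```

For this ladder the sum has the closed form RC(N + 1)/(2N), so the estimate falls from RC for a single lump toward RC/2 as the number of sections grows. Most of the improvement comes from the first few sections, which is consistent with the observation that a small number of lumps already gives good delay estimates.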
Some of the layout-related issues are:
Let us summarize some of the assumptions underlying the models used for TAs without going into complicated network theoretic arguments. We should also review what can and can not be done with these results:
Before chips became as large as they are today, the most popular and most accurate timing analysis was based on SPICE. It is still the standard of reference today. SPICE determines signal rise and fall times, measured from the 10% point to the 90% point of a signal, and the time delay from point to point, measured between the 50% points of the signals. In addition, SPICE yields other very useful information, such as the exact waveform and, therefore, the rate of change of a signal at any point along the waveform. This type of information will be of paramount importance for matters related to cross-coupling and signal integrity.
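The 10%-to-90% rise time and the 50% delay can be extracted from any sampled waveform, for example one exported from a circuit simulator. The sketch below assumes a monotonically rising waveform and uses linear interpolation between samples. The function names are hypothetical, and the test waveform is the analytic step response of a single lumped RC stage rather than actual SPICE output.

```python
import math

def crossing_time(times, values, level):
    """First time a rising waveform crosses `level` (linear interpolation)."""
    for i in range(len(times) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 < level <= v1:
            t0, t1 = times[i], times[i + 1]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")

def rise_time_10_90(times, values, v_final):
    """Time between the 10% and 90% points of the final value."""
    t10 = crossing_time(times, values, 0.1 * v_final)
    t90 = crossing_time(times, values, 0.9 * v_final)
    return t90 - t10

def delay_50(times, values, v_final, t_input=0.0):
    """Delay from the input step at t_input to the 50% point of the output."""
    return crossing_time(times, values, 0.5 * v_final) - t_input

# Test waveform: step response of one lumped RC stage with tau = 1 ns.
tau = 1e-9
times = [i * tau / 1000.0 for i in range(10001)]
values = [1.0 - math.exp(-t / tau) for t in times]

print(delay_50(times, values, 1.0) / tau)         # close to ln 2
print(rise_time_10_90(times, values, 1.0) / tau)  # close to ln 9
```

For the single RC stage the two measurements agree with the analytic values, tau times ln 2 for the 50% delay and tau times ln 9 for the rise time.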
While SPICE can analyze any circuit with resistors, capacitors, inductors, current and voltage sources, a SPICE simulation of major parts of a chip rapidly became impractical because it is simply too slow, even with the current computer power. It is suitable only for a rather small number of transistors. That is the reason why SPICE has to be used judiciously where accuracy and detailed timing information are needed.
Fortunately, detailed time behavior provided by SPICE is not needed most of the time for digital circuits, particularly for time delay analysis. This is the key to the success of TAs.