We can view addition in terms of generate, G[i], and propagate, P[i], signals.

$$
\begin{array}{l@{\qquad}l@{\qquad}r}
\text{method 1} & \text{method 2} & \\
G[i] = A[i]\cdot B[i] & G[i] = A[i]\cdot B[i] & (2.42)\\
P[i] = A[i]\oplus B[i] & P[i] = A[i] + B[i] & (2.43)\\
C[i] = G[i] + P[i]\cdot C[i-1] & C[i] = G[i] + P[i]\cdot C[i-1] & (2.44)\\
S[i] = P[i]\oplus C[i-1] & S[i] = A[i]\oplus B[i]\oplus C[i-1] & (2.45)
\end{array}
$$

where C[i] is the carry-out signal from stage i, equal to the carry in of stage (i + 1). Thus, C[i] = COUT[i] = CIN[i + 1]. We need to be careful because C[0] might represent either the carry in or the carry out of the LSB stage. For an adder we set the carry in to the first stage (stage zero), C[–1] or CIN[0], to '0'. Some people use delete (D) or kill (K) in various ways for the complements of G[i] and P[i], but unfortunately others use C for COUT and D for CIN, so I avoid using any of these. Do not confuse the two different methods (both of which are used) in Eqs. 2.42–2.45 when forming the sum, since the propagate signal, P[i], is different for each method.
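As an illustration, here is a minimal Python sketch of Eqs. 2.42–2.45, assuming unsigned n-bit operands held in Python integers with bit 0 as the LSB. The function name `add_gp` and its `method` parameter are hypothetical, chosen only for this example. Each method uses its own propagate signal and its own sum equation, and both give the same result.

```python
def add_gp(a: int, b: int, n: int, method: int = 1, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit unsigned integers using G/P signals; return (sum, carry out)."""
    s = 0
    c_prev = cin                        # C[i - 1]; for stage zero this is C[-1] = CIN[0]
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g = ai & bi                     # Eq. 2.42: G[i] = A[i] . B[i] (both methods)
        if method == 1:
            p = ai ^ bi                 # Eq. 2.43, method 1: P[i] = A[i] xor B[i]
            si = p ^ c_prev             # Eq. 2.45, method 1: S[i] = P[i] xor C[i - 1]
        else:
            p = ai | bi                 # Eq. 2.43, method 2: P[i] = A[i] + B[i]
            si = ai ^ bi ^ c_prev       # Eq. 2.45, method 2: S[i] = A[i] xor B[i] xor C[i - 1]
        c_prev = g | (p & c_prev)       # Eq. 2.44: C[i] = G[i] + P[i] . C[i - 1]
        s |= si << i
    return s, c_prev

print(add_gp(11, 6, 4))             # 11 + 6 = 17 -> (1, 1): sum bits 0001, carry out 1
print(add_gp(11, 6, 4, method=2))   # same result with the method 2 propagate -> (1, 1)
```

Mixing the methods, for example forming S[i] = P[i] ⊕ C[i – 1] with the method 2 propagate, gives the wrong sum bit whenever A[i] = B[i] = '1', because the OR-form P[i] is then '1' while the true half-sum is '0'.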

Figure 2.22(a) shows a conventional ripple-carry adder (RCA). The delay of an n-bit RCA is proportional to n, because the carry signal must ripple through all n stages. We can reduce delay by using pairs of "go-faster" bubbles to change AND and OR gates to fast two-input NAND gates as shown in Figure 2.22(a). Alternatively, we can write the equations for the carry signal in two different ways:

$$
\begin{array}{l@{\qquad}l@{\qquad}r}
\text{either} & C[i] = A[i]\cdot B[i] + P[i]\cdot C[i-1] & (2.46)\\
\text{or} & C[i] = \bigl(A[i] + B[i]\bigr)\cdot\bigl(P[i]' + C[i-1]\bigr), & (2.47)
\end{array}
$$

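where P[i]' = NOT(P[i]) and P[i] is the method 1 propagate signal, A[i] ⊕ B[i]. Equation 2.47 holds because (A[i] + B[i]) · P[i]' = A[i] · B[i], so expanding it gives A[i] · B[i] + (A[i] + B[i]) · C[i – 1], the same majority function as Eq. 2.46. Equations 2.46 and 2.47 allow us to build the carry chain from two-input NAND gates, one per cell, using different logic in even and odd stages (Figure 2.22b):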

$$
\begin{array}{l@{\qquad}l@{\qquad}r}
\text{even stages} & \text{odd stages} & \\
C1[i]' = P[i]\cdot C3[i-1]\cdot C4[i-1] & C3[i]' = P[i]\cdot C1[i-1]\cdot C2[i-1] & (2.48)\\
C2[i] = A[i] + B[i] & C4[i]' = A[i]\cdot B[i] & (2.49)\\
C[i] = C1[i]\cdot C2[i] & C[i] = C3[i]' + C4[i]' & (2.50)
\end{array}
$$

 FIGURE 2.22  The ripple-carry adder (RCA). (a) A conventional RCA. The delay may be reduced slightly by adding pairs of bubbles as shown to use two-input NAND gates. (b) An alternative RCA circuit topology using different cells for odd and even stages and an extra connection between cells. The carry chain is a fast string of NAND gates (shown in bold).

(for a carry in of '0' to stage zero, the carry inputs C3[–1] and C4[–1] are both '1', since Eq. 2.50 gives C[–1] = C3[–1]' + C4[–1]'). We can use the RCA of Figure 2.22(b) in a datapath, with standard cells, or on a gate array.
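The alternating cells can be checked with a short Python sketch that evaluates Eqs. 2.48–2.50 stage by stage, modeling each signal as a 0/1 integer rather than as gates. The function name `rca_even_odd` is hypothetical. The sketch assumes the method 1 propagate signal and a carry in of '0'; by Eq. 2.50 that requires C3[–1]' + C4[–1]' = '0', so both inputs to stage zero are set to '1'. Because the odd stages use Eq. 2.46 and the even stages use Eq. 2.47, comparing the result with integer addition also checks that the two carry equations agree.

```python
def rca_even_odd(a: int, b: int, n: int) -> tuple[int, int]:
    """Add two n-bit unsigned integers with the even/odd carry cells; return (sum, carry out)."""
    c3, c4 = 1, 1                       # inputs to stage zero: C[-1] = C3' + C4' = '0'
    c1, c2 = 0, 0                       # overwritten by stage zero before any odd stage reads them
    c_prev = 0                          # C[i - 1], needed only to form the sum bit
    s = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi                     # method 1 propagate
        if i % 2 == 0:                  # even stage
            c1 = 1 - (p & c3 & c4)      # Eq. 2.48: C1[i]' = P[i] . C3[i-1] . C4[i-1]
            c2 = ai | bi                # Eq. 2.49: C2[i]  = A[i] + B[i]
            c = c1 & c2                 # Eq. 2.50: C[i]   = C1[i] . C2[i]
        else:                           # odd stage
            c3 = 1 - (p & c1 & c2)      # Eq. 2.48: C3[i]' = P[i] . C1[i-1] . C2[i-1]
            c4 = 1 - (ai & bi)          # Eq. 2.49: C4[i]' = A[i] . B[i]
            c = (1 - c3) | (1 - c4)     # Eq. 2.50: C[i]   = C3[i]' + C4[i]'
        s |= (p ^ c_prev) << i          # S[i] = P[i] xor C[i - 1]
        c_prev = c
    return s, c_prev

# Compare with ordinary integer addition for every pair of 5-bit operands.
assert all(rca_even_odd(a, b, 5) == ((a + b) & 31, (a + b) >> 5)
           for a in range(32) for b in range(32))
```

Setting `c3, c4 = 0, 0` instead makes stage zero behave as if the carry in were '1': C1[0] becomes '1', so C[0] = A[0] + B[0].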

Instead of propagating the carries through each stage of an RCA, Figure 2.23 shows a different approach. A carry-save adder (CSA) cell CSA(A1[i], A2[i], A3[i], CIN, S1[i], S2[i], COUT) has three outputs:

$$
\begin{array}{l@{\qquad}r}
S1[i] = \mathrm{CIN} & (2.51)\\
S2[i] = A1[i]\oplus A2[i]\oplus A3[i] = \mathrm{PARITY}(A1[i], A2[i], A3[i]) & (2.52)\\
\mathrm{COUT} = A1[i]\cdot A2[i] + \bigl[(A1[i] + A2[i])\cdot A3[i]\bigr] = \mathrm{MAJ}(A1[i], A2[i], A3[i]) & (2.53)
\end{array}
$$

The inputs, A1, A2, and A3, and the outputs, S1 and S2, are buses. The input, CIN, is the carry from stage (i – 1). The carry in, CIN, is connected directly to the output bus S1, as indicated by the schematic symbol (Figure 2.23a). We connect CIN[0] to VSS. The output, COUT, is the carry out to stage (i + 1).

A 4-bit CSA is shown in Figure 2.23(b). The arithmetic overflow signal for ones' complement or two's complement arithmetic, OV, is XOR(COUT[MSB], COUT[MSB – 1]) as shown in Figure 2.23(c). In a CSA the carries are "saved" at each stage and shifted left onto the bus S1. There is thus no carry propagation, and the delay of a CSA is constant, independent of the number of bits. At the output of a CSA we still need to add the S1 bus (all the saved carries) and the S2 bus (all the sums) to get an n-bit result using a final stage that is not shown in Figure 2.23(c). We might regard the n-bit sum as being encoded in the two buses, S1 and S2, in the form of the parity and majority functions.
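A bit-parallel Python sketch of an n-bit CSA follows. It uses integer bitwise operators, so every bit position is computed independently and there is no loop over bits, mirroring the absence of carry propagation. The function name `csa` and the choice to return the carry out of the MSB separately are assumptions for this example; the two buses plus that carry represent the three-operand sum exactly.

```python
def csa(a1: int, a2: int, a3: int, n: int) -> tuple[int, int, int]:
    """n-bit carry-save adder: return (S1, S2, COUT of the MSB stage)."""
    mask = (1 << n) - 1
    s2 = (a1 ^ a2 ^ a3) & mask                    # Eq. 2.52: PARITY in every bit position
    cout = ((a1 & a2) | ((a1 | a2) & a3)) & mask  # Eq. 2.53: MAJ, i.e. COUT[i] for every i
    s1 = (cout << 1) & mask                       # Eq. 2.51: S1[i] = CIN[i] = COUT[i - 1]; CIN[0] = 0
    cout_msb = (cout >> (n - 1)) & 1              # carry out of the MSB stage
    return s1, s2, cout_msb

s1, s2, co = csa(0b0111, 0b0101, 0b0011, 4)       # 7 + 5 + 3
# A final carry-propagate adder (here simply Python's +) resolves the two buses.
print(s1, s2, co, s1 + s2 + (co << 4))            # prints 14 1 0 15
```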

We can use a CSA to add multiple inputs; as an example, an adder with four 4-bit inputs is shown in Figure 2.23(d). The last stage sums two input buses using a carry-propagate adder (CPA). We have used an RCA as the CPA in Figure 2.23(d) and (e), but we can use any type of adder. Notice in Figure 2.23(e) how the two CSA cells and the RCA cell abut together horizontally to form a bit slice (or slice) and then the slices are stacked vertically to form the datapath.
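The four-input adder of Figure 2.23(d) can be modeled in Python with two bitwise CSA rows followed by a ripple-carry adder as the final CPA. This sketch assumes the buses are widened to n + 2 bits so that the sum of four n-bit operands always fits; that widening and the names `csa`, `rca`, and `add4` are assumptions for illustration, not a description of the figure's exact bus widths.

```python
def csa(x: int, y: int, z: int, w: int) -> tuple[int, int]:
    """w-bit carry-save adder: return (S1, S2) = (saved carries shifted left, parity bits)."""
    mask = (1 << w) - 1
    s2 = (x ^ y ^ z) & mask                          # PARITY in every bit position
    s1 = (((x & y) | ((x | y) & z)) << 1) & mask     # MAJ shifted left; CIN[0] = 0
    return s1, s2

def rca(x: int, y: int, w: int) -> int:
    """w-bit ripple-carry adder used as the final carry-propagate adder (CPA)."""
    s, c = 0, 0
    for i in range(w):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        s |= (xi ^ yi ^ c) << i
        c = (xi & yi) | ((xi | yi) & c)              # the carry ripples from bit to bit
    return s

def add4(a: int, b: int, c: int, d: int, n: int = 4) -> int:
    """Sum of four n-bit unsigned operands: two CSA rows, then an RCA."""
    w = n + 2                                        # assumed width: 4 * (2**n - 1) < 2**(n + 2)
    s1, s2 = csa(a, b, c, w)                         # first CSA row: three operands -> two buses
    t1, t2 = csa(s1, s2, d, w)                       # second CSA row adds the fourth operand
    return rca(t1, t2, w)                            # only the CPA propagates carries

print(add4(1, 2, 3, 4), add4(15, 15, 15, 15))        # prints 10 60
```

Only `rca` contains a loop over bit positions; the CSA rows are purely bitwise, so their delay does not grow with n.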

 FIGURE 2.23  The carry-save adder (CSA). (a) A CSA cell. (b) A 4-bit CSA. (c) Symbol for a CSA. (d) A four-input CSA. (e) The datapath for a four-input, 4-bit adder using CSAs with a ripple-carry adder (RCA) as the final stage. (f) A pipelined adder. (g) The datapath for the pipelined version showing the pipeline registers as well as the clock control lines that use m2.

We can register the CSA stages by adding vectors of flip-flops as shown in Figure 2.23(f). This reduces the delay between results (the clock period) to that of the slowest adder stage, usually the CPA. By using registers between stages of combinational logic we use pipelining to increase the speed, at the price of increased area (for the registers) and added latency. It takes a few clock cycles (the latency, equal to n clock cycles for an n-stage pipeline) to fill the pipeline, but once it is filled, the answers emerge every clock cycle. Ferris wheels work much the same way. When the fair opens it takes a while (latency) to fill the wheel, but once it is full the people can get on and off every few seconds. (We can also pipeline the RCA of Figure 2.20. We add i registers on the A and B inputs before ADD[i] and add (n – i) registers after the output S[i], with a single register before each C[i].)
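A cycle-level Python sketch of the latency and throughput behavior just described. Each pipeline stage is an arbitrary function, and all registers load simultaneously on each clock edge. The name `run_pipeline` is hypothetical, and the three stage functions are ordinary integer additions standing in for the CSA rows and the CPA; they are placeholders, not the logic of Figure 2.23(f).

```python
from collections import deque
from typing import Any, Callable

def run_pipeline(inputs: list, stages: list[Callable[[Any], Any]],
                 n_cycles: int) -> list[tuple[int, Any]]:
    """Clock an n-stage pipeline; return (cycle, value) for each result leaving the last register."""
    regs: list = [None] * len(stages)    # pipeline registers, empty (None) at power-up
    feed = deque(inputs)
    results = []
    for cycle in range(1, n_cycles + 1):
        # Next register contents depend only on the current ones, so every
        # register is updated at once, as on a shared clock edge.
        nxt: list = [None] * len(stages)
        if feed:
            nxt[0] = stages[0](feed.popleft())
        for k in range(1, len(stages)):
            if regs[k - 1] is not None:
                nxt[k] = stages[k](regs[k - 1])
        regs = nxt
        if regs[-1] is not None:
            results.append((cycle, regs[-1]))
    return results

stages = [
    lambda v: (v[0] + v[1], v[2], v[3]),  # stage 1: add the first two operands
    lambda v: (v[0] + v[1], v[2]),        # stage 2: add the third
    lambda v: v[0] + v[1],                # stage 3: add the fourth
]
print(run_pipeline([(1, 2, 3, 4), (5, 6, 7, 8), (0, 0, 0, 1)], stages, 5))
# [(3, 10), (4, 26), (5, 1)]: the first result appears after 3 cycles, then one per cycle
```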

The problem with an RCA is that every stage has to wait to make its carry decision, C[i], until the previous stage has calculated C[i – 1]. If we examine the propagate signals we can bypass this critical path. Thus, for example, to bypass the carries for bits 4–7 (stages 5–8) of an adder we can compute BYPASS = P[4] · P[5] · P[6] · P[7], using the method 1 propagate signals, P[i] = A[i] ⊕ B[i], and then use a MUX as follows:

$$
C[7] = \bigl(G[7] + P[7]\cdot C[6]\bigr)\cdot \mathrm{BYPASS}' + C[3]\cdot \mathrm{BYPASS}. \tag{2.54}
$$
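A functional check of Eq. 2.54 in Python, for an 8-bit adder. It forms C[7] through the bypass MUX and compares it with the rippled carry. It models logic values, not timing, so it shows only that the bypass path selects the right carry. It assumes the method 1 propagate signals: with the method 2 form, P[i] = A[i] + B[i], a stage with A[i] = B[i] = '1' sets P[i] yet generates a carry, and the bypass would then be wrong. The function names are hypothetical.

```python
def bits(x: int, n: int) -> list[int]:
    """Bits of x, LSB first."""
    return [(x >> i) & 1 for i in range(n)]

def ripple_carries(a: int, b: int, n: int, cin: int = 0) -> list[int]:
    """C[0], ..., C[n-1] from Eq. 2.44 with method 1 propagate signals."""
    c, out = cin, []
    for ai, bi in zip(bits(a, n), bits(b, n)):
        c = (ai & bi) | ((ai ^ bi) & c)
        out.append(c)
    return out

def c7_bypass(a: int, b: int, cin: int = 0) -> int:
    """C[7] of an 8-bit adder formed with the bypass MUX of Eq. 2.54."""
    A, B = bits(a, 8), bits(b, 8)
    G = [x & y for x, y in zip(A, B)]
    P = [x ^ y for x, y in zip(A, B)]
    C = ripple_carries(a, b, 8, cin)
    bypass = P[4] & P[5] & P[6] & P[7]          # BYPASS = P[4] . P[5] . P[6] . P[7]
    return ((G[7] | (P[7] & C[6])) & (1 - bypass)) | (C[3] & bypass)

# Exhaustive comparison with the rippled C[7] for all 8-bit operands and carry ins.
assert all(c7_bypass(a, b, cin) == ripple_carries(a, b, 8, cin)[7]
           for a in range(256) for b in range(256) for cin in (0, 1))
print(c7_bypass(0xF0, 0x0F, 1))   # 0xF0 + 0x0F + 1 = 0x100: BYPASS = 1, so C[7] = C[3] = 1
```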

Adders based on this principle are called carry-bypass adders (CBA) [Sato et al., 1992]. Large custom adders employ Manchester-carry chains to compute the carries and the bypass operation using transmission gates (TGs) or just pass transistors [Weste and Eshraghian, 1993, pp. 530–531]. These types of carry chains may be part of a predesigned ASIC adder cell, but are not used by ASIC designers.

Instead of checking the propagate signals, we can check the inputs. For example, we can compute SKIP = (A[i – 1] ⊕ B[i – 1]) · (A[i] ⊕ B[i]), which is '1' only when both stage (i – 1) and stage i propagate a carry, and then use a 2:1 MUX to select C[i]. Thus,

$$
\mathrm{CSKIP}[i] = \bigl(G[i] + P[i]\cdot C[i-1]\bigr)\cdot \mathrm{SKIP}' + C[i-2]\cdot \mathrm{SKIP}. \tag{2.55}
$$

This is a carry-skip adder [Keutzer, Malik, and Saldanha, 1991; Lehman, 1961]. Carry-bypass and carry-skip adders may include redundant logic (since the carry is computed in two different ways, we just take the first signal to arrive). We must be careful that the redundant logic is not optimized away during logic synthesis.
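A functional check of Eq. 2.55 in Python, with hypothetical names. SKIP is formed as the AND of the two stages' method 1 propagate signals. The optional `use_or` flag (a parameter invented for this comparison) forms SKIP with an OR instead; that lets the MUX skip past a carry generated in stage (i – 1), and the check then fails.

```python
def skip_carries_ok(a: int, b: int, n: int, cin: int = 0, use_or: bool = False) -> bool:
    """True if CSKIP[i] (Eq. 2.55) equals the rippled C[i] for every stage i >= 1."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]
    G = [x & y for x, y in zip(A, B)]
    P = [x ^ y for x, y in zip(A, B)]           # method 1 propagate signals
    C = {-1: cin}                               # C[-1] is the carry in
    for i in range(n):
        C[i] = G[i] | (P[i] & C[i - 1])         # reference carries, Eq. 2.44
    for i in range(1, n):
        skip = (P[i - 1] | P[i]) if use_or else (P[i - 1] & P[i])
        cskip = ((G[i] | (P[i] & C[i - 1])) & (1 - skip)) | (C[i - 2] & skip)
        if cskip != C[i]:
            return False
    return True

assert all(skip_carries_ok(a, b, 6, cin)
           for a in range(64) for b in range(64) for cin in (0, 1))
print(skip_carries_ok(3, 1, 2, use_or=True))    # False: stage 0 generates, stage 1 propagates
```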

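If we evaluate Eq. 2.44 recursively for i = 1, with a carry in of C[–1] = '0', we get the following:

$$
\begin{aligned}
C[1] &= G[1] + P[1]\cdot C[0] = G[1] + P[1]\cdot\bigl(G[0] + P[0]\cdot C[-1]\bigr)\\
     &= G[1] + P[1]\cdot G[0].
\end{aligned}
\tag{2.56}
$$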

This result means that we can "look ahead" by two stages and calculate the carry into the third stage (bit 2), which is C[1], using only the first-stage inputs (to calculate G[0]) and the second-stage inputs. This is a carry-lookahead adder (CLA) [MacSorley, 1961]. If we continue expanding Eq. 2.44, we find:

$$
\begin{aligned}
C[2] &= G[2] + P[2]\cdot G[1] + P[2]\cdot P[1]\cdot G[0],\\
C[3] &= G[3] + P[3]\cdot G[2] + P[3]\cdot P[2]\cdot G[1] + P[3]\cdot P[2]\cdot P[1]\cdot G[0].
\end{aligned}
\tag{2.57}
$$
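Continuing the expansion, each carry becomes a flat sum of products: C[i] = G[i] + P[i] · G[i – 1] + P[i] · P[i – 1] · G[i – 2] + … + P[i] · … · P[1] · G[0], plus a term P[i] · … · P[0] · C[–1] when the carry in is not zero. The Python sketch below (hypothetical names) evaluates that expression directly and checks it against the rippled carries. In a two-level AND-OR implementation each product term is one AND gate whose fan-in grows with i, which is why the logic becomes harder to build from cells with few inputs as we look further ahead.

```python
def lookahead_carry(G: list[int], P: list[int], i: int, cin: int = 0) -> int:
    """C[i] as a sum of products of G and P signals (Eqs. 2.56 and 2.57, generalized)."""
    carry = 0
    for j in range(i + 1):
        term = G[j]                      # G[j] ...
        for k in range(j + 1, i + 1):
            term &= P[k]                 # ... . P[j+1] . ... . P[i]
        carry |= term
    all_p = 1
    for k in range(i + 1):
        all_p &= P[k]
    return carry | (all_p & cin)         # the carry-in term vanishes when C[-1] = 0

def check_cla(n: int) -> bool:
    """Compare lookahead carries with rippled carries for all n-bit operand pairs."""
    for a in range(1 << n):
        for b in range(1 << n):
            G = [((a >> k) & (b >> k)) & 1 for k in range(n)]
            P = [((a >> k) ^ (b >> k)) & 1 for k in range(n)]
            c = 0                        # rippled reference carry, C[-1] = 0
            for i in range(n):
                c = G[i] | (P[i] & c)    # Eq. 2.44
                if lookahead_carry(G, P, i) != c:
                    return False
    return True

print(check_cla(4))                      # True
```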

As we look ahead further these equations become more complex, take longer to calculate, and the logic becomes less regular when implemented using cells with a limited number of inputs. Datapath layout must fit in a bit slice, so the physical and logical structure of each bit must be similar. In a standard cell or gate array we are not so concerned about a regular physical structure, but a regular logical structure simplifies design. The Brent–Kung adder reduces the delay and increases the regularity of the carry-lookahead scheme [Brent and Kung, 1982]. Figure 2.24(a) shows a regular 4-bit CLA, using the carry-lookahead generator cell (CLG) shown in Figure 2.24(b).