ShareCG: ASICs .. the Book
10.2  A 4-bit Multiplier

This section presents a more complex VHDL example to motivate the study of the syntax and semantics of VHDL in the rest of this chapter.

10.2.1  An 8-bit Adder

Table 10.1 shows a VHDL model for the full adder that we described in Section 2.6, "Datapath Logic Cells." Table 10.2 shows a VHDL model for an 8-bit ripple-carry adder that uses eight instances of the full adder.

TABLE 10.1    A full adder.

entity Full_Adder is
	generic (TS : TIME := 0.11 ns; TC : TIME := 0.1 ns);
	port (X, Y, Cin: in BIT; Cout, Sum: out  BIT);
end Full_Adder;
architecture Behave of Full_Adder is
begin
Sum  <= X xor Y xor Cin after TS;
Cout <= (X and Y) or (X and Cin) or (Y and Cin) after TC;
end;



TS (Input to Sum) = 0.11 ns

TC (Input to Cout) = 0.1 ns


TABLE 10.2    An 8-bit ripple-carry adder.

entity Adder8 is
	port (A, B: in BIT_VECTOR(7 downto 0);
	Cin: in BIT; Cout: out BIT; 
	Sum: out BIT_VECTOR(7 downto 0));
end Adder8;
architecture Structure of Adder8 is
component Full_Adder
port (X, Y, Cin: in BIT; Cout, Sum: out BIT);
end component;
signal C: BIT_VECTOR(7 downto 0);
begin
Stages: for i in 7 downto 0 generate
	LowBit: if i = 0 generate
	FA:Full_Adder port map (A(0),B(0),Cin,C(0),Sum(0)); 
	end generate;
	OtherBits: if i /= 0 generate
	FA:Full_Adder port map 
		(A(i),B(i),C(i-1),C(i),Sum(i)); -- carry in from the stage below
	end generate;
end generate;
Cout <= C(7);
end;
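
A small self-checking testbench can exercise the ripple-carry adder. The following is only a sketch under stated assumptions: the entity name Test_Adder8 and the stimulus values are hypothetical (they do not come from the original text), and the code presumes that Full_Adder and Adder8 have already been compiled into library work.

```vhdl
-- Hypothetical testbench (Test_Adder8 is an assumed name).
entity Test_Adder8 is end;
architecture Check of Test_Adder8 is
	component Adder8
		port (A, B: in BIT_VECTOR(7 downto 0); Cin: in BIT;
			Cout: out BIT; Sum: out BIT_VECTOR(7 downto 0));
	end component;
	signal A, B, Sum : BIT_VECTOR(7 downto 0);
	signal Cin, Cout : BIT;
begin
	UUT: Adder8 port map (A, B, Cin, Cout, Sum);
	process begin
		-- 15 + 1 = 16, no carry out
		A <= X"0F"; B <= X"01"; Cin <= '0';
		wait for 10 ns; -- far longer than the carry ripple through eight stages
		assert Sum = X"10" and Cout = '0'
			report "15 + 1 failed" severity Failure;
		-- 255 + 0 + carry in = 256: sum wraps to zero and Cout is set
		A <= X"FF"; B <= X"00"; Cin <= '1';
		wait for 10 ns;
		assert Sum = X"00" and Cout = '1'
			report "255 + 0 + 1 failed" severity Failure;
		wait; -- suspend forever once the checks have run
	end process;
end;
```

The 10 ns settling time is an arbitrary choice; with the default generics the carry needs roughly 0.8 ns to ripple through all eight stages.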


10.2.2  A Register Accumulator

Table 10.3 shows a VHDL model for a positive-edge-triggered D flip-flop with an active-high asynchronous clear. Table 10.4 shows an 8-bit register that uses this D flip-flop model (this model provides only the Q outputs from the register and leaves the QB flip-flop outputs unconnected).

TABLE 10.3    Positive-edge-triggered D flip-flop with asynchronous clear.

entity DFFClr is 
	generic(TRQ : TIME := 2 ns; TCQ : TIME := 2 ns);
	port (CLR, CLK, D : in BIT; Q, QB : out BIT); 
end DFFClr;
architecture Behave of DFFClr is
signal Qi : BIT;
begin QB <= not Qi; Q <= Qi;
process (CLR, CLK) begin
	if CLR = '1' then Qi <= '0' after TRQ;
	elsif CLK'EVENT and CLK = '1' 
		then Qi <= D after TCQ;
	end if;
end process;
end;



TRQ (CLR to Q/QB) = 2 ns

TCQ (CLK to Q/QB) = 2 ns

TABLE 10.4    An 8-bit register.

entity Register8 is 
	port (D : in BIT_VECTOR(7 downto 0); 
	Clk, Clr: in BIT ; Q : out BIT_VECTOR(7 downto 0));
end Register8;
architecture Structure of Register8 is
	component DFFClr 
		port (Clr, Clk, D : in BIT; Q, QB : out BIT); 
	end component;
begin
	STAGES: for i in 7 downto 0 generate
		FF: DFFClr port map (Clr, Clk, D(i), Q(i), open);
	end generate;
end;


8-bit register. Uses DFFClr positive-edge-triggered flip-flop model.

Table 10.5 shows a model for a datapath multiplexer that consists of eight 2:1 multiplexers with a common select input (this select signal would normally be a control signal in a datapath). The multiplier will use the register and multiplexer components to implement a register accumulator.

TABLE 10.5    An 8-bit multiplexer.

entity Mux8 is 
	generic (TPD : TIME := 1 ns);
	port (A, B : in BIT_VECTOR (7 downto 0); 
	Sel : in BIT := '0'; Y : out BIT_VECTOR (7 downto 0));
end Mux8;
architecture Behave of Mux8 is
begin
	Y <= A after TPD when Sel = '1' else B after TPD;
end;


Eight 2:1 MUXs with single select input.


TPD (input to Y) = 1 ns

10.2.3  Zero Detector

Table 10.6 shows a model for a variable-width zero detector that accepts a bus of any width and will produce a single-bit output of '1' if all input bits are zero.

TABLE 10.6    A zero detector.

entity AllZero is 
	generic (TPD : TIME := 1 ns);
	port (X : BIT_VECTOR; F : out BIT );
end AllZero;
architecture Behave of AllZero is
begin process (X) begin F <= '1' after TPD;
	 for j in X'RANGE loop
		if X(j) = '1' then F <= '0' after TPD; end if;
	 end loop;
end process;
end;


Variable-width zero detector.


TPD (X to F) = 1 ns
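
Because port X is an unconstrained BIT_VECTOR, the same model can be attached to buses of different widths. The sketch below illustrates this with two instances; the entity name Test_AllZero, the signal names, and the 4-bit and 16-bit widths are assumptions chosen for illustration, and AllZero is presumed to be compiled into library work.

```vhdl
-- Hypothetical testbench (Test_AllZero and all signal names are assumed).
entity Test_AllZero is end;
architecture Check of Test_AllZero is
	component AllZero
		port (X : BIT_VECTOR; F : out BIT);
	end component;
	signal Narrow : BIT_VECTOR(3 downto 0);   -- initially all '0'
	signal Wide   : BIT_VECTOR(15 downto 0);  -- initially all '0'
	signal FN, FW : BIT;
begin
	-- Each instance takes its width from the actual signal connected to X.
	Z4 : AllZero port map (X => Narrow, F => FN);
	Z16: AllZero port map (X => Wide,   F => FW);
	process begin
		wait for 5 ns; -- longer than TPD = 1 ns
		assert FN = '1' and FW = '1'
			report "all-zero buses not detected" severity Failure;
		Wide <= X"0100"; -- set a single bit of the wide bus
		wait for 5 ns;
		assert FN = '1' and FW = '0'
			report "nonzero bus not detected" severity Failure;
		wait; -- suspend forever once the checks have run
	end process;
end;
```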

10.2.4  A Shift Register

Table 10.7 shows a variable-width shift register that shifts (left or right under input control, DIR) on the positive edge of the clock, CLK, gated by a shift enable, SH. The parallel load, LD, is synchronous and aligns the input LSB to the LSB of the output, filling unused MSBs with zero. Bits vacated during shifts are zero filled. The clear, CLR, is asynchronous.

TABLE 10.7    A variable-width shift register.

entity ShiftN is
	generic (TCQ : TIME := 0.3 ns; TLQ : TIME := 0.5 ns;
		TSQ : TIME := 0.7 ns);
	port(CLK, CLR, LD, SH, DIR: in BIT; 
		D: in BIT_VECTOR; Q: out BIT_VECTOR);
	begin assert (D'LENGTH <= Q'LENGTH) 
		report "D wider than output Q" severity Failure;
end ShiftN;
architecture Behave of ShiftN is
	begin Shift: process (CLR, CLK)
	subtype InB  is NATURAL range D'LENGTH-1 downto 0;
	subtype OutB is NATURAL range Q'LENGTH-1 downto 0;
	variable St: BIT_VECTOR(OutB);
	begin
		if CLR = '1' then 
			St := (others => '0'); Q <= St after TCQ;
		elsif CLK'EVENT and CLK='1' then
			if LD = '1' then 
				St := (others => '0'); 
				St(InB) := D; 
				Q <= St after TLQ;
			elsif SH = '1' then
				case DIR is 
				when '0' => St := '0' & St(St'LEFT downto 1);
				when '1' => St := St(St'LEFT-1 downto 0) & '0';
				end case;
				Q <= St after TSQ;
			end if;
		end if;
	end process;
end;


CLK Clock

CLR Clear, active high

LD Load, active high

SH Shift, active high

DIR Direction, 1 = left

D Data in

Q Data out


Variable-width shift register. Input width must not exceed the output width. Output is left-shifted or right-shifted under control of DIR. Unused MSBs are zero-padded during load. Clear is asynchronous. Load is synchronous.



TCQ (CLR to Q) = 0.3 ns

TLQ (LD to Q) = 0.5 ns

TSQ (SH to Q) = 0.7 ns
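
The load and shift behavior described for Table 10.7 can be checked with a short testbench. This is a sketch only: the entity name Test_ShiftN, the 4-bit input and 8-bit output widths, and the stimulus values are assumptions, and the code presumes that ShiftN (with D and Q ports as unconstrained BIT_VECTORs) is compiled into library work. The clock is toggled by hand to keep the example short.

```vhdl
-- Hypothetical testbench (Test_ShiftN and the stimulus are assumed).
entity Test_ShiftN is end;
architecture Check of Test_ShiftN is
	component ShiftN
		port (CLK, CLR, LD, SH, DIR: in BIT;
			D: in BIT_VECTOR; Q: out BIT_VECTOR);
	end component;
	signal CLK, CLR, LD, SH, DIR : BIT;
	signal D : BIT_VECTOR(3 downto 0) := "1011";
	signal Q : BIT_VECTOR(7 downto 0);
begin
	UUT: ShiftN port map (CLK, CLR, LD, SH, DIR, D, Q);
	process begin
		CLR <= '1'; wait for 2 ns; CLR <= '0';  -- asynchronous clear
		LD <= '1'; wait for 3 ns;               -- request a parallel load
		CLK <= '1'; wait for 5 ns; CLK <= '0'; wait for 5 ns;
		-- input LSB aligned to output LSB, upper bits zero filled
		assert Q = "00001011" report "load failed" severity Failure;
		LD <= '0'; SH <= '1'; DIR <= '1'; wait for 1 ns;  -- DIR = '1': left
		CLK <= '1'; wait for 5 ns; CLK <= '0'; wait for 5 ns;
		assert Q = "00010110" report "left shift failed" severity Failure;
		DIR <= '0'; wait for 1 ns;                         -- DIR = '0': right
		CLK <= '1'; wait for 5 ns; CLK <= '0'; wait for 5 ns;
		assert Q = "00001011" report "right shift failed" severity Failure;
		wait; -- suspend forever once the checks have run
	end process;
end;
```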

10.2.5  A State Machine

To multiply two binary numbers A and B, we can use the following algorithm:

If the LSB of A is '1', then add B into an accumulator.

Shift A one bit to the right and B one bit to the left.

Stop when all bits of A are zero.
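
The three steps above can be sketched behaviorally before committing to hardware. The following is an illustration only, not the datapath built in this section: the entity name ShiftAdd_Sketch, the function name Multiply, and the use of NATURAL operands (with division and multiplication by 2 standing in for the shifts) are all assumptions.

```vhdl
-- Behavioral sketch of shift-and-add multiplication.
-- ShiftAdd_Sketch and Multiply are hypothetical names.
entity ShiftAdd_Sketch is end;
architecture Behave of ShiftAdd_Sketch is
	function Multiply (A, B : NATURAL) return NATURAL is
		variable RA  : NATURAL := A;  -- copy of A, shifted right each pass
		variable RB  : NATURAL := B;  -- copy of B, shifted left each pass
		variable Acc : NATURAL := 0;  -- accumulator
	begin
		while RA /= 0 loop          -- stop when all bits of A are zero
			if RA mod 2 = 1 then    -- LSB of A is '1'
				Acc := Acc + RB;    -- add B into the accumulator
			end if;
			RA := RA / 2;           -- shift A one bit to the right
			RB := RB * 2;           -- shift B one bit to the left
		end loop;
		return Acc;
	end;
begin
	process begin
		assert Multiply(3, 5) = 15
			report "3 x 5 failed" severity Failure;
		assert Multiply(13, 11) = 143
			report "13 x 11 failed" severity Failure;
		assert Multiply(0, 7) = 0
			report "0 x 7 failed" severity Failure;
		wait; -- suspend forever once the checks have run
	end process;
end;
```

For 13 x 11, A = 1101 in binary, so B is added on the first, third, and fourth passes: 11 + 44 + 88 = 143.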

Table 10.8 shows the VHDL model for a Moore (outputs depend only on the state) finite-state machine for the multiplier, together with its state diagram.

TABLE 10.8    A Moore state machine for the multiplier.

entity SM_1 is 
	generic (TPD : TIME := 1 ns);
	port(Start, Clk, LSB, Stop, Reset: in BIT; 
	Init, Shift, Add, Done : out BIT);
end SM_1;
architecture Moore of SM_1 is
type STATETYPE is (I, C, A, S, E);
signal State: STATETYPE;
begin
Init <= '1' after TPD when State = I
	else '0' after TPD;
Add  <= '1' after TPD when State = A
	else '0' after TPD;
Shift <= '1' after TPD when State = S
	else '0' after TPD;
Done <= '1' after TPD when State = E
	else '0' after TPD;
process (CLK, Reset) begin
	if Reset = '1' then State <= E;
	elsif CLK'EVENT and CLK = '1' then
		case State is
		when I => State <= C;
		when C => 
			if LSB = '1' then State <= A;
			elsif Stop = '0' then State <= S;
			else State <= E;
			end if;
		when A => State <= S;
		when S => State <= C;
		when E => 
			if Start = '1' then State <= I; end if; 
		end case;
	end if;
end process;
end;


State  Function

E      End of multiply cycle.
I      Initialize: clear output register and load input registers.
C      Check if LSB of register A is zero.
A      Add shift register B to accumulator.
S      Shift input register A right and input register B left.

10.2.6  A Multiplier

Table 10.9 shows a schematic and the VHDL code that describes the interconnection of all the components for the multiplier. Notice that the schematic comprises two halves: an 8-bit-wide datapath section (consisting of the registers, adder, multiplexer, and zero detector) and a control section (the finite-state machine). The arrows in the schematic denote the inputs and outputs of each component. As we shall see in Section 10.7, VHDL has strict rules about the direction of connections.

TABLE 10.9    A 4-bit by 4-bit multiplier.


entity Mult8 is
port (A, B: in BIT_VECTOR(3 downto 0); Start, CLK, Reset: in BIT;
Result: out BIT_VECTOR(7 downto 0); Done: out BIT); end Mult8;
architecture Structure of Mult8 is use work.Mult_Components.all;
signal SRA, SRB, ADDout, MUXout, REGout: BIT_VECTOR(7 downto 0);
signal Zero, Init, Shift, Add, Low: BIT := '0'; signal High: BIT := '1';
signal F, OFL, REGclr: BIT; 
begin
REGclr <= Init or Reset; Result  <= REGout;
SR1 : ShiftN port map(CLK=>CLK,CLR=>Reset,LD=>Init,SH=>Shift,DIR=>Low ,D=>A,Q=>SRA);
SR2 : ShiftN port map(CLK=>CLK,CLR=>Reset,LD=>Init,SH=>Shift,DIR=>High,D=>B,Q=>SRB);
Z1 : AllZero port map(X=>SRA,F=>Zero);
A1 : Adder8  port map(A=>SRB,B=>REGout,Cin=>Low,Cout=>OFL,Sum=>ADDout);
M1 : Mux8    port map(A=>ADDout,B=>REGout,Sel=>Add,Y=>MUXout);
R1 : Register8 port map(D=>MUXout,Q=>REGout,Clk=>CLK,Clr=>REGclr);
F1 : SM_1    port map(Start,CLK,SRA(0),Zero,Reset,Init,Shift,Add,Done);
end;

10.2.7  Packages and Testbench

To complete and test the multiplier design we need a few more items. First we need the following "components list" for the items in Table 10.9:

package Mult_Components is
component Mux8 port (A,B:BIT_VECTOR(7 downto 0);
	Sel:BIT;Y:out BIT_VECTOR(7 downto 0));end component;
component AllZero port (X : BIT_VECTOR;
	F:out BIT );end component;
component Adder8 port (A,B:BIT_VECTOR(7 downto 0);Cin:BIT;
	Cout:out BIT;Sum:out BIT_VECTOR(7 downto 0));end component;
component Register8 port (D:BIT_VECTOR(7 downto 0);
	Clk,Clr:BIT; Q:out BIT_VECTOR(7 downto 0));end component;
component ShiftN port (CLK,CLR,LD,SH,DIR:BIT;D:BIT_VECTOR;
	Q:out BIT_VECTOR);end component;
component SM_1 port (Start,CLK,LSB,Stop,Reset:BIT;
	Init,Shift,Add,Done:out BIT);end component;
end Mult_Components;

Next we need some utility code to help test the multiplier. The following VHDL generates a clock with programmable "high" time (HT) and "low" time (LT):

package Clock_Utils is 
procedure Clock (signal C: out Bit; HT, LT:TIME);
end Clock_Utils;
package body Clock_Utils is
procedure Clock (signal C: out Bit; HT, LT:TIME) is
begin
	loop C<='1' after LT, '0' after LT + HT; wait for LT + HT;
	end loop;
end;
end Clock_Utils;

Finally, the following code defines two functions that we shall also use for testing. The functions convert an array of bits to a number and vice versa:

package Utils is 
	function Convert (N,L: NATURAL) return BIT_VECTOR;
	function Convert (B: BIT_VECTOR) return NATURAL;
end Utils;
package body Utils is
	function Convert (N,L: NATURAL) return BIT_VECTOR is
		variable T:BIT_VECTOR(L-1 downto 0);
		variable V:NATURAL:= N;
		begin for i in T'RIGHT to T'LEFT loop
			T(i) := BIT'VAL(V mod 2); V:= V/2;
		end loop; return T;
	end;
	function Convert (B: BIT_VECTOR) return NATURAL is
		variable T:BIT_VECTOR(B'LENGTH-1 downto 0) := B;
		variable V:NATURAL:= 0;
		begin for i in T'RIGHT to T'LEFT loop
			if T(i) = '1' then V:= V + (2**i); end if;
			end loop; return V;
	end;
end Utils;
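
A brief check of the two Convert functions shows how the overloads are selected by their argument lists. This is a minimal sketch: the entity name Test_Convert and the test values are assumptions, and the Utils package is presumed to be compiled into library work.

```vhdl
-- Hypothetical self-checking use of Utils (Test_Convert is an assumed name).
entity Test_Convert is end;
architecture Check of Test_Convert is
	use Work.Utils.all;
begin
	process begin
		-- number to bits: 5 in a 4-bit field, MSB on the left
		assert Convert(5, 4) = "0101"
			report "number to bits failed" severity Failure;
		-- bits to number: the rightmost bit carries weight 2**0
		assert Convert(BIT_VECTOR'("0101")) = 5
			report "bits to number failed" severity Failure;
		-- round trip through both overloads
		assert Convert(Convert(200, 8)) = 200
			report "round trip failed" severity Failure;
		wait; -- suspend forever once the checks have run
	end process;
end;
```

The two-argument call can only match the NATURAL-to-BIT_VECTOR version, and the one-argument call can only match the BIT_VECTOR-to-NATURAL version, so the compiler resolves each call without ambiguity.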

The following testbench exercises the multiplier model (this simple example is not a comprehensive test). First we reset the logic (the concurrent assignment to Reset) and then apply a series of values to the inputs, A and B. The clock generator (the procedure call labeled C) supplies a clock with a 20 ns period. The inputs are changed 1 ns after a positive clock edge, and remain stable for 20 ns through the next positive clock edge.

entity Test_Mult8_1 is end; -- runs forever, use break!!
architecture Structure of Test_Mult8_1 is 
use Work.Utils.all; use Work.Clock_Utils.all;
	component Mult8 port
		(A, B : BIT_VECTOR(3 downto 0); Start, CLK, Reset : BIT; 
		Result : out BIT_VECTOR(7 downto 0); Done : out BIT);
	end component;
signal A, B : BIT_VECTOR(3 downto 0);
signal Start, Done : BIT := '0';
signal CLK, Reset : BIT;
signal Result : BIT_VECTOR(7 downto 0);
signal DA, DB, DR : INTEGER range 0 to 255;
begin
C: Clock(CLK, 10 ns, 10 ns);
UUT: Mult8 port map (A, B, Start, CLK, Reset, Result, Done);
DR <= Convert(Result);
Reset  <= '1', '0' after 1 ns; 
process begin 
	for i in 1 to 3 loop for j in 4 to 7 loop
		DA <= i; DB <= j;
		-- drive the multiplier inputs from the loop indices
		A <= Convert(i, A'LENGTH); B <= Convert(j, B'LENGTH);
		wait until CLK'EVENT and CLK='1'; wait for 1 ns; 
		Start <= '1', '0' after 20 ns; wait until Done = '1';
		wait until CLK'EVENT and CLK='1';
	end loop; end loop; 
	for i in 0 to 1 loop for j in 0 to 15 loop
		DA <= i; DB <= j;
		A <= Convert(i, A'LENGTH); B <= Convert(j, B'LENGTH);
		wait until CLK'EVENT and CLK='1'; wait for 1 ns;
		Start <= '1', '0' after 20 ns; wait until Done = '1'; 
		wait until CLK'EVENT and CLK='1';
	end loop; end loop;
end process;
end;

Here is the signal trace output from the Compass Scout simulator:

      Time(fs) + Cycle            da           db           dr
----------------------  ------------ ------------ ------------
                  0+ 0:            0            0            0
                  0+ 1: *          1 *          4 *          0
           92000000+ 3:            1            4 *          4
          150000000+ 1: *          1 *          5            4
          193000000+ 3:            1            5 *          0
          252000000+ 3:            1            5 *          5
          310000000+ 1: *          1 *          6            5
          353000000+ 3:            1            6 *          0
          412000000+ 3:            1            6 *          6

Positive clock edges occur at 10, 30, 50, 70, 90, ... ns. You can see that the output (dr) changes from '0' to '4' at 92 ns, after five clock edges (with a 2 ns delay due to the output register, R1).
