IEEE-754 Floating Point Multiplier

Shyam Shankar H R
EE15B127
September 16, 2017

Problem statement

Question no: 18. To implement floating point multiplication with IEEE single precision format.

Procedure

1. Obtaining the sign of the product.
2. Adding the exponents and subtracting the bias (= 127).
3. Multiplying the mantissas with the MSB '1' concatenated.
4. Placing the binary point in the result and checking whether normalization is required.
5. Normalizing the mantissa and incrementing the exponent if needed.
6. Rounding off the mantissa to 23 bits.
7. Checking for underflow/overflow.

The Verilog code written is almost entirely structural.

Implementation (Module-wise)

0.1 Sign bit identification

To find the sign bit of the product, we simply XOR the sign bits of the multiplicand and multiplier. Sign bit 1 implies a negative number and 0 implies a positive number.

Verilog code:

module sign_bit(
    output wire sign,
    input wire[31:0] in1,
    input wire[31:0] in2
    );
    xor(sign,in1[31],in2[31]);
endmodule

0.2 Adding the exponents and subtracting the bias

We need an 8 bit adder and a 9 bit subtractor (the minuend includes the carry of the addition). Here, I made a full adder module and chained eight of them as a ripple carry adder. A ripple carry adder is sufficient here, as the overall delay is dominated by the mantissa multiplication (24 bit x 24 bit) part.

Verilog code:

//1 bit Full Adder
module full_adder(
    output wire sum,
    output wire cout,
    input wire in1,
    input wire in2,
    input wire cin
    );
    wire temp1;
    wire temp2;
    wire temp3;
    xor(sum,in1,in2,cin);
    and(temp1,in1,in2);
    and(temp2,in1,cin);
    and(temp3,in2,cin);
    or(cout,temp1,temp2,temp3);
endmodule

//8 bit Ripple-carry adder
module ripple_8(
    output wire[7:0] sum,
    output wire cout,
    input wire[7:0] in1,
    input wire[7:0] in2,
    input wire cin
    );
    wire c1,c2,c3,c4,c5,c6,c7;
    full_adder FA1(sum[0],c1,in1[0],in2[0],cin);
    full_adder FA2(sum[1],c2,in1[1],in2[1],c1);
    full_adder FA3(sum[2],c3,in1[2],in2[2],c2);
    full_adder FA4(sum[3],c4,in1[3],in2[3],c3);
    full_adder FA5(sum[4],c5,in1[4],in2[4],c4);
    full_adder FA6(sum[5],c6,in1[5],in2[5],c5);
    full_adder FA7(sum[6],c7,in1[6],in2[6],c6);
    full_adder FA8(sum[7],cout,in1[7],in2[7],c7);
endmodule

After adding the exponents, we need to subtract the bias, which is 127. Utilizing the fact that the subtrahend is always a constant (001111111), we can make two specialized full subtractors (where the subtrahend is fixed at 0 and at 1 respectively).

[Figure: Full subtractor with subtrahend = 1]
[Figure: Full subtractor with subtrahend = 0]

These logic circuits are obtained from simple truth tables, not shown here.

Verilog code:

//1 bit subtractor with subtrahend = 1
module full_subtractor_sub1(
    output wire diff, //difference
    output wire bout, //borrow out
    input wire min,   //minuend
    input wire bin    //borrow in
    );
    //Here, the subtrahend is always 1. We can implement it as:
    xnor(diff,min,bin);
    or(bout,~min,bin);
endmodule

//1 bit subtractor with subtrahend = 0
module full_subtractor_sub0(
    output wire diff, //difference
    output wire bout, //borrow out
    input wire min,   //minuend
    input wire bin    //borrow in
    );
    //Here, the subtrahend is always 0. We can implement it as:
    xor(diff,min,bin);
    and(bout,~min,bin);
endmodule

Finally we make the bias subtractor (9 bit subtractor) as follows:

Verilog code:

//9 bit subtractor
module subtractor_9(
    output wire [8:0] diff,
    output wire bout,
    input wire [8:0] min,
    input wire bin
    );
    wire b1,b2,b3,b4,b5,b6,b7,b8;
    full_subtractor_sub1 sub1(diff[0],b1,min[0],bin);
    full_subtractor_sub1 sub2(diff[1],b2,min[1],b1);
    full_subtractor_sub1 sub3(diff[2],b3,min[2],b2);
    full_subtractor_sub1 sub4(diff[3],b4,min[3],b3);
    full_subtractor_sub1 sub5(diff[4],b5,min[4],b4);
    full_subtractor_sub1 sub6(diff[5],b6,min[5],b5);
    full_subtractor_sub1 sub7(diff[6],b7,min[6],b6);
    //Two most significant subtrahend bits are 0 in 001111111.
    full_subtractor_sub0 sub8(diff[7],b8,min[7],b7);
    full_subtractor_sub0 sub9(diff[8],bout,min[8],b8);
endmodule

0.3 Multiplying the mantissa using Carry Save Multiplier

We need to append bit '1' to both mantissas and multiply the resulting 24 bit numbers. I have implemented a carry save multiplication routine to do this. In carry save multiplication, the carry from each level of partial product summands flows to the next level diagonally. It is depicted in the figure shown (taken from Computer Organization, 5th ed., Carl Hamacher):

[Figure]

We first make a module for the single bit cell:

Verilog code:

module block(
    output wire ppo,  //output partial product term
    output wire cout, //output carry out
    output wire mout, //output multiplicand term
    input wire min,   //input multiplicand term
    input wire ppi,   //input partial product term
    input wire q,     //input multiplier term
    input wire cin    //input carry in
    );
    wire temp;
    and(temp,min,q);
    full_adder FA(ppo,cout,ppi,temp,cin);
    or(mout,min,1'b0);
endmodule

Next, we extend this bit cell to implement a row of 24 such single bit cells:

Verilog code:

module row(
    output wire[23:0] ppo,
    output wire[23:0] mout,
    output wire sum,
    input wire[23:0] min,
    input wire[23:0] ppi,
    input wire q
    );
    wire c1,c2,c3,c4,c5,c6,c7,c8,c9,c10;
    wire c11,c12,c13,c14,c15,c16,c17,c18,c19,c20;
    wire c21,c22,c23;
    block b1 (sum,     c1,  mout[0],  min[0],  ppi[0],  q, 1'b0);
    block b2 (ppo[0],  c2,  mout[1],  min[1],  ppi[1],  q, c1);
    block b3 (ppo[1],  c3,  mout[2],  min[2],  ppi[2],  q, c2);
    block b4 (ppo[2],  c4,  mout[3],  min[3],  ppi[3],  q, c3);
    block b5 (ppo[3],  c5,  mout[4],  min[4],  ppi[4],  q, c4);
    block b6 (ppo[4],  c6,  mout[5],  min[5],  ppi[5],  q, c5);
    block b7 (ppo[5],  c7,  mout[6],  min[6],  ppi[6],  q, c6);
    block b8 (ppo[6],  c8,  mout[7],  min[7],  ppi[7],  q, c7);
    block b9 (ppo[7],  c9,  mout[8],  min[8],  ppi[8],  q, c8);
    block b10(ppo[8],  c10, mout[9],  min[9],  ppi[9],  q, c9);
    block b11(ppo[9],  c11, mout[10], min[10], ppi[10], q, c10);
    block b12(ppo[10], c12, mout[11], min[11], ppi[11], q, c11);
    block b13(ppo[11], c13, mout[12], min[12], ppi[12], q, c12);
    block b14(ppo[12], c14, mout[13], min[13], ppi[13], q, c13);
    block b15(ppo[13], c15, mout[14], min[14], ppi[14], q, c14);
    block b16(ppo[14], c16, mout[15], min[15], ppi[15], q, c15);
    block b17(ppo[15], c17, mout[16], min[16], ppi[16], q, c16);
    block b18(ppo[16], c18, mout[17], min[17], ppi[17], q, c17);
    block b19(ppo[17], c19, mout[18], min[18], ppi[18], q, c18);
    block b20(ppo[18], c20, mout[19], min[19], ppi[19], q, c19);
    block b21(ppo[19], c21, mout[20], min[20], ppi[20], q, c20);
    block b22(ppo[20], c22, mout[21], min[21], ppi[21], q, c21);
    block b23(ppo[21], c23, mout[22], min[22], ppi[22], q, c22);
    block b24(ppo[22], ppo[23], mout[23], min[23], ppi[23], q, c23);
endmodule

Finally, we extend the rows to form the diagonal grid. We get 24 rows, each of which takes one bit of the multiplier. The only inputs of this module are the multiplicand and the multiplier, and the output is the final product. This is the top level module of the carry save multiplier:

Verilog code:

module product(
    output wire[47:0] sum,
    input wire[23:0] min,
    input wire[23:0]q
    );
    wire [23:0] temp1,temp2,temp3,temp4,temp5,temp6,temp7,temp8,temp9,temp10; //diagonal m
    wire [23:0] temp11,temp12,temp13,temp14,temp15,temp16,temp17,temp18,temp19,temp20;
    wire [23:0] temp21,temp22,temp23,temp24;
    wire [23:0] ptemp1,ptemp2,ptemp3,ptemp4,ptemp5,ptemp6,ptemp7,ptemp8,ptemp9,ptemp10; //vertical p
    wire [23:0] ptemp11,ptemp12,ptemp13,ptemp14,ptemp15,ptemp16,ptemp17,ptemp18,ptemp19,ptemp20;
    wire [23:0] ptemp21,ptemp22,ptemp23;
    row r1 (ptemp1,  temp1,  sum[0],  min,    24'h000000, q[0]);
    row r2 (ptemp2,  temp2,  sum[1],  temp1,  ptemp1,  q[1]);
    row r3 (ptemp3,  temp3,  sum[2],  temp2,  ptemp2,  q[2]);
    row r4 (ptemp4,  temp4,  sum[3],  temp3,  ptemp3,  q[3]);
    row r5 (ptemp5,  temp5,  sum[4],  temp4,  ptemp4,  q[4]);
    row r6 (ptemp6,  temp6,  sum[5],  temp5,  ptemp5,  q[5]);
    row r7 (ptemp7,  temp7,  sum[6],  temp6,  ptemp6,  q[6]);
    row r8 (ptemp8,  temp8,  sum[7],  temp7,  ptemp7,  q[7]);
    row r9 (ptemp9,  temp9,  sum[8],  temp8,  ptemp8,  q[8]);
    row r10(ptemp10, temp10, sum[9],  temp9,  ptemp9,  q[9]);
    row r11(ptemp11, temp11, sum[10], temp10, ptemp10, q[10]);
    row r12(ptemp12, temp12, sum[11], temp11, ptemp11, q[11]);
    row r13(ptemp13, temp13, sum[12], temp12, ptemp12, q[12]);
    row r14(ptemp14, temp14, sum[13], temp13, ptemp13, q[13]);
    row r15(ptemp15, temp15, sum[14], temp14, ptemp14, q[14]);
    row r16(ptemp16, temp16, sum[15], temp15, ptemp15, q[15]);
    row r17(ptemp17, temp17, sum[16], temp16, ptemp16, q[16]);
    row r18(ptemp18, temp18, sum[17], temp17, ptemp17, q[17]);
    row r19(ptemp19, temp19, sum[18], temp18, ptemp18, q[18]);
    row r20(ptemp20, temp20, sum[19], temp19, ptemp19, q[19]);
    row r21(ptemp21, temp21, sum[20], temp20, ptemp20, q[20]);
    row r22(ptemp22, temp22, sum[21], temp21, ptemp21, q[21]);
    row r23(ptemp23, temp23, sum[22], temp22, ptemp22, q[22]);
    row r24(sum[47:24], temp24, sum[23], temp23, ptemp23, q[23]);
endmodule

0.4 Placing the binary point and normalizing the mantissa

In a normalized IEEE-754 floating point number, there is exactly one '1' bit to the left of the binary point. While multiplying the mantissas (each with 23 bits to the right of the binary point), we ignored the binary point. In the final product's mantissa, the lower 46 bits lie to the right of the binary point, i.e. the binary point occurs between the 45th and 46th bit (counting from the 0th bit at the LSB).

After placing the binary point, if there is just one '1' bit left of it, there is no need for normalization. But if the MSB '1' occurs one bit farther from the binary point, we need to normalize the mantissa, i.e. right shift it by one. Also, the result mantissa has just 23 bits, so we round off our result to 23 bits (simple truncation). The '1' bit left of the binary point is dropped in the product mantissa.

Here, we make a module normalize to perform these operations. It also generates a flag bit norm_flag, to indicate that normalization is done and we need to increment the exponent by 1. So, instead of right shifting and then dropping the excess bits, we just slice the bits, i.e. capture from [45:23] if there is no normalization and from [46:24] if there is normalization. Finally a multiplexer is used to channel the appropriate mantissa (shifted or intact) depending on norm_flag.

Verilog code:

module normalize(
    output wire[22:0] adj_mantissa, //adjusted mantissa (after extracting out required part)
    output wire norm_flag,
    input wire[47:0] prdt
    );
    //returns norm = 1 if normalization needs to be done.
    and(norm_flag,prdt[47],1'b1); //sel = 1 if leading one is at 47, needs normalization
    //if sel = 0, leading one not at 47, no need of normalization
    wire [1:0][22:0] results; //packed 2-D net: SystemVerilog syntax
    assign results[0] = prdt[45:23];
    assign results[1] = prdt[46:24];
    assign adj_mantissa = {results[norm_flag+0]};
endmodule

0.5 Underflow/overflow check and control module for operating other modules

In this module, we get the different slices of the final result by invoking the required modules in order. While finding the exponent, if exp1 + exp2 - bias gives a borrow out, it signifies an underflow. Here we invoke the normalization module to check whether the exponent needs to be incremented or not. The final exponent should lie between 1 and 254. If the final exponent (after normalization) is 255 or greater, then there is overflow. The final result is formed as slices and concatenated together.

Verilog code:

//Control module to drive and regulate required modules in order
module control(
    input wire[31:0] inp1,
    input wire[31:0] inp2,
    output wire[31:0] out,
    output wire underflow,
    output wire overflow
    );
    wire sign;
    wire [7:0] exp1;
    wire [7:0] exp2;
    wire [7:0] exp_out;
    wire [7:0] test_exp;
    wire [22:0] mant1;
    wire [22:0] mant2;
    wire [22:0] mant_out;
    sign_bit sign_bit1(sign,inp1,inp2);
    wire [7:0]temp1;
    wire dummy; //to connect unused cout ports of adder
    wire carry;
    wire [8:0] sub_temp;
    ripple_8 rip1(temp1,carry,inp1[30:23],inp2[30:23],1'b0);
    subtractor_9 sub1(sub_temp,underflow,{carry,temp1},1'b0);
    //if there is a borrow out => underflow
    and(overflow,sub_temp[8],1'b1); //if the exponent has more than 8 bits: overflow
    //taking product of mantissa:
    wire [47:0] prdt;
    product p1(prdt,{1'b1,inp1[22:0]},{1'b1,inp2[22:0]});
    wire norm_flag;
    wire [22:0] adj_mantissa;
    normalize norm1(adj_mantissa,norm_flag,prdt);
    ripple_8 ripple_norm(test_exp,dummy,sub_temp[7:0],{7'b0,norm_flag},1'b0);
    assign out[31] = sign;
    assign out[30:23] = test_exp;
    assign out[22:0] = adj_mantissa;
endmodule

Test bench

We implement the test bench so as to give four test inputs to the program, and the results are shown below.

Verilog code:

`timescale 1ns/1ps
module stimulus;
    reg [31:0] in1;
    reg [31:0] in2;
    wire [31:0] prdct;
    wire [7:0] test_exp;
    wire [22:0] mant;
    wire normy;
    wire underflow;
    wire overflow;
    control control1(
        .inp1(in1),
        .inp2(in2),
        .out(prdct),
        .underflow(underflow),
        .overflow(overflow)
    );
    initial begin
        $dumpfile("multiply.vcd");
        $dumpvars(0, stimulus);
        in1 = 32'b01000010110111010110001010110010;
        in2 = 32'b01000011001001100111010110110110;
        //product1 = 0 10001101 00011111111001111001010
        #10 in1 = 32'b11010110110110100101011101000110;
        in2 = 32'b01001010101110100101110001110010;
        //product2 = 1 11000100 00111101111001001000001
        #10 in1 = 32'b01000101010100100100011110001010;
        in2 = 32'b01001001101001011010001110110001;
        #10 in1 = 32'b11001001110100101101100001110100;
        in2 = 32'b11000011001110110011011010110100;
        #10 $finish;
    end
    initial begin
        $monitor("Multiplicand = %32b, Multiplier = %32b, Product = %32b, Underflow = %1b, Overflow = %1b",in1,in2,prdct,underflow,overflow);
    end
endmodule

Test inputs and results

Trial No.  Multiplicand                      Multiplier                        Product
1          01000010110111010110001010110010  01000011001001100111010110110110  01000110100011111111001111001010
2          11010110110110100101011101000110  01001010101110100101110001110010  11100010000111101111001001000001
3          01000101010100100100011110001010  01001001101001011010001110110001  01001111100010000000111010010000
4          11001001110100101101100001110100  11000011001110110011011010110100  01001101100110100011000100101010

In trial 1, the multiplicand is +1.729574441 × 2^6 and the multiplier is +1.300467252 × 2^7. The product is correctly obtained as +1.12462746 × 2^14.

In trial 2, the multiplicand is −1.705788373 × 2^46 and the multiplier is +1.455946207 × 2^22. The product is correctly obtained as −1.241768056 × 2^69.

The waveforms obtained for the above observations, using gtkwave, are shown below:

[Figure: gtkwave waveforms]
[Figure: Command prompt output]

References

[1] Hamacher, Computer Organization, 5th ed.
[2] V. P. Heuring, Computer Systems Design and Architecture.
[3] IEEE 754-2008, IEEE Standard for Floating-Point Arithmetic, 2008.
[4] Douglas J. Smith, VHDL & Verilog Compared & Contrasted Plus Modeled Example Written in VHDL, Verilog and C.
[5] http://www.ece.umd.edu/class/enee359a/verilog_tutorial.pdf
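As a check on steps 1 to 6 of the procedure, trial 1 from the results table can be worked by hand. The symbols s, E_1, E_2, M_1, M_2 (sign, biased exponent fields and mantissas with the hidden 1 restored) are notation introduced only for this example, and the decimal values are rounded.

```latex
\begin{align*}
s &= 0 \oplus 0 = 0 \\
E_1 &= 10000101_2 = 133, \qquad E_2 = 10000110_2 = 134 \\
E_1 + E_2 - 127 &= 133 + 134 - 127 = 140 \\
M_1 M_2 &= 1.729574441 \times 1.300467252 \approx 2.249255 \;\ge\; 2
  \;\Rightarrow\; \texttt{norm\_flag} = 1 \\
M &= M_1 M_2 / 2 \approx 1.124627, \qquad E = 140 + 1 = 141 = 10001101_2
\end{align*}
```

The exponent field 10001101 and the value +1.12462746 × 2^14 (141 − 127 = 14) agree with the product reported for trial 1.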
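The structural design can be cross-checked in simulation against a short behavioral model. The following is only a sketch under stated assumptions: the module name ref_multiplier and its signal names are hypothetical and not part of the original code, it uses the built-in * operator instead of the structural multiplier, and it assumes normalized, non-zero operands with truncation as the only rounding (no zero, denormal, infinity or NaN handling, and no overflow/underflow flags).

```verilog
// Hypothetical behavioral reference, for comparison in a test bench only.
// Assumes normalized, non-zero inputs; mantissa is truncated, not rounded.
module ref_multiplier(
    output wire [31:0] out,
    input  wire [31:0] inp1,
    input  wire [31:0] inp2
    );
    // 24 x 24 bit mantissa product with the hidden '1' restored
    wire [47:0] prdt = {1'b1, inp1[22:0]} * {1'b1, inp2[22:0]};
    // a leading one at bit 47 means the product is 2 or more: normalize
    wire norm_flag = prdt[47];
    // exp1 + exp2 - bias, plus 1 when normalizing; 10 bits keep carry/borrow
    wire [9:0] exp_sum = inp1[30:23] + inp2[30:23] - 10'd127 + norm_flag;
    // same slices as the normalize module: [46:24] if shifted, else [45:23]
    wire [22:0] mant = norm_flag ? prdt[46:24] : prdt[45:23];
    assign out = {inp1[31] ^ inp2[31], exp_sum[7:0], mant};
endmodule
```

One possible use is to instantiate it in the stimulus module next to control1, for example ref_multiplier ref1(ref_out, in1, in2); with a 32 bit wire ref_out, and to add ref_out to the $monitor list so that it can be compared with prdct for each trial.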