# Low-Cost FIR Filter Designs Based on Faithfully Rounded Truncated Multiple Constant Multiplication/Accumulation

Neha Bharti<sup>1</sup>, Jaikaran Singh Chauhan<sup>2</sup>

<sup>1</sup>M.Tech Scholar, SSSUT&MS Sehore, nehabharti01@gmail.com, India; <sup>2</sup>Assoc. Professor, SSSUT&MS Sehore, jksingh81@gmail.com, India

**Abstract** – Low-cost finite impulse response (FIR) designs are presented using the concept of faithfully rounded truncated multipliers. We jointly consider the optimization of bit width and hardware resources without sacrificing the frequency response and output signal precision. Nonuniform coefficient quantization with proper filter order is proposed to minimize total area cost. Multiple constant multiplication/accumulation in a direct FIR structure is implemented using an improved version of truncated multipliers. Comparisons with previous FIR design approaches show that the proposed designs achieve the best area and power results.

*Keywords*: Digital signal processing (DSP), Faithful rounding, Finite impulse response (FIR) filter, truncated multipliers, VLSI design.

## I. Introduction

The finite impulse response (FIR) digital filter is one of the fundamental components in many digital signal processing (DSP) and communication systems. It is also commonly used in many portable applications with limited area and power budgets. A general FIR filter of order M can be expressed as

$$y[n] = \sum_{i=0}^{M} a_i x[n-i].$$

In the case of linear phase, the coefficients are either symmetric or antisymmetric, with $a_i = a_{M-i}$ or $a_i = -a_{M-i}$. There are two basic FIR structures, direct form and transposed form, as shown in Fig. 1 for a linear-phase even-order FIR filter. In the direct form in Fig. 1(a), the multiple constant multiplication (MCM)/accumulation (MCMA) module performs the parallel multiplications of individual delayed signals and respective filter coefficients, followed by accumulation of all the products. Thus, the operands of the multipliers in MCMA are the delayed input signals $x[n - i]$ and the coefficients $a_i$. In the transposed form in Fig. 1(b), the operands of the multipliers in the MCM module are the current input signal $x[n]$ and the coefficients. The results of the individual constant multiplications go through structure adders (SAs) and delay elements.

Fig. 1: Structures of linear-phase even-order FIR filters: (a) direct form and (b) transposed form.

In the past, there have been many papers on the designs and implementations of low-cost or high-speed FIR filters. In order to avoid costly multipliers, most earlier hardware implementations of digital FIR filters can be divided into two categories: multiplierless based and memory based. Multiplierless-based designs realize MCM with shift-and-add operations and share the common suboperations using canonical signed digit (CSD) recoding and common subexpression elimination (CSE) to minimize the adder cost of MCM. More area savings are achieved by jointly considering the optimization of coefficient quantization and CSE. Most multiplierless MCM-based FIR filter designs use the transposed structure to allow for cross-coefficient sharing and tend to be faster, particularly when the filter order is large.
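The shift-and-add idea behind multiplierless MCM can be illustrated with a minimal Python sketch. It recodes a constant into signed digits with no two adjacent nonzero digits (the CSD property) and multiplies by shifting and adding or subtracting. The function names `csd_digits` and `shift_add_multiply` are hypothetical and chosen only for this illustration; the sketch does not perform CSE across coefficients.

```python
def csd_digits(value: int) -> list[int]:
    """Recode a non-negative integer into signed digits in {-1, 0, 1}, LSB first.

    The result has no two adjacent nonzero digits (canonical signed digit form).
    """
    digits = []
    while value != 0:
        if value & 1:
            # value % 4 == 1 -> digit +1, value % 4 == 3 -> digit -1,
            # so that the remainder becomes divisible by 4.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def shift_add_multiply(x: int, digits: list[int]) -> int:
    """Multiply x by the constant encoded in `digits` using only shifts and adds/subtracts."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


# Example: 23 = 10111b has four nonzero bits; its CSD form 32 - 8 - 1 has three.
digits_23 = csd_digits(23)
product = shift_add_multiply(-5, digits_23)
```

Each nonzero digit costs one adder or subtractor, so fewer nonzero digits means a cheaper constant multiplier.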

However, the area of the delay elements is larger compared with that of the direct form due to the range expansion of the constant multiplications and the subsequent additions in the SAs. Gustafson presented high-throughput (TP) FIR filter designs by pipelining the carry-save adder trees in the constant multiplications, using integer linear programming to minimize the area cost of full adders (FAs), half adders (HAs), and registers (algorithmic and pipelined registers).

Memory-based FIR designs consist of two types of methods: lookup table (LUT) methods and distributed arithmetic (DA) methods. The LUT-based design stores odd multiples of the input signal in ROMs to realize the constant multiplications in MCM. The DA-based methods recursively accumulate the bit-level partial results for the inner product computation in FIR filtering. An important design issue of FIR filter implementation is the optimization of the bit widths for filter coefficients, which has a direct impact on the area cost of arithmetic units and registers. Moreover, since the bit widths grow after multiplications, many DSP applications do not need full-precision outputs. Instead, it is desirable to generate faithfully rounded outputs, where the total error introduced in quantization and rounding is no more than one unit of the last place (ulp), defined as the weight of the least significant bit (LSB) of the outputs.
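The faithful-rounding criterion above can be checked numerically. The following sketch assumes integer (fixed-point) values and uses the hypothetical helper `is_faithfully_rounded`; it follows the text's wording that the error is no more than one ulp. It shows that both plain truncation and round-to-nearest satisfy the criterion, while a result that additionally discards one more bit does not.

```python
def is_faithfully_rounded(exact: int, kept: int, dropped_bits: int) -> bool:
    """Check that `kept` (with `dropped_bits` fewer LSBs) is within one ulp of `exact`.

    One ulp is the weight of the LSB of the kept result, i.e. 2**dropped_bits
    in units of the full-precision value.
    """
    ulp = 1 << dropped_bits
    return abs(exact - (kept << dropped_bits)) <= ulp


DROPPED = 4                      # hypothetical number of discarded LSBs
HALF_ULP = 1 << (DROPPED - 1)
values = range(-256, 256)        # hypothetical full-precision values

# Truncation (floor) and round-to-nearest both stay within one ulp.
truncation_ok = all(is_faithfully_rounded(v, v >> DROPPED, DROPPED) for v in values)
nearest_ok = all(is_faithfully_rounded(v, (v + HALF_ULP) >> DROPPED, DROPPED) for v in values)

# Clearing one extra bit of the result can exceed one ulp, so it is not faithful.
too_coarse_ok = all(
    is_faithfully_rounded(v, ((v >> DROPPED) >> 1) << 1, DROPPED) for v in values
)
```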

In this brief, we present low-cost implementations of FIR filters based on the direct structure with faithfully rounded truncated multipliers. The MCMA module is realized by accumulating all the partial products (PPs), where excessive PP bits (PPBs) are removed without affecting the final precision of the outputs. The bit widths of all the filter coefficients are minimized using nonuniform quantization.

## II. Methods

### II.1. FIR Filters

Finite impulse response (FIR) filtering is one of the most commonly used DSP functions. It is achieved by convolving the input data samples with the desired unit impulse response of the filter. The output $Y[n]$ of an $N$-tap FIR filter is given by the weighted sum of the latest $N$ input data samples:

$$Y[n] = \sum_{i=0}^{N-1} A[i] \cdot X[n-i].$$

The weights A[i] in the expression are the filter coefficients. The number of taps (N) and the coefficient values are derived so as to satisfy the desired filter response of pass-band ripple and stop-band attenuation.
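As a minimal sketch of the weighted sum above, the following Python function evaluates an N-tap FIR filter sample by sample. The name `fir_direct` is hypothetical, and samples before the start of the input are assumed to be zero.

```python
def fir_direct(coeffs: list[float], samples: list[float]) -> list[float]:
    """Compute Y[n] = sum_{i=0}^{N-1} A[i] * X[n-i] for every input sample.

    Samples with a negative index (before the input starts) are treated as zero.
    """
    n_taps = len(coeffs)
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for i in range(n_taps):
            if n - i >= 0:
                acc += coeffs[i] * samples[n - i]
        outputs.append(acc)
    return outputs


# Feeding a unit impulse returns the coefficients, i.e. the impulse response.
impulse_response = fir_direct([1, 2, 3], [1, 0, 0, 0])
```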





### II.2. FIR Filter Design Using MCMAT

The FIR filter design in this brief adopts the direct form in Fig. 3, where the MCMA module sums up all the products $a_i \times x[n - i]$. Instead of accumulating the individual multiplication result for each product, it is more efficient to collect all the PPs into a single PPB matrix and apply carry-save addition to reduce the height of the matrix to two, followed by a final carry-propagate adder. Fig. 3 illustrates the difference between individual multiplications and combined multiplication for $A \times B + C \times D$. In order to avoid the sign extension bits, we complement the sign bit of each PP row and add a bias constant using the property $\bar{s} = 1 - s$, where $s$ is the sign bit of a PP row, as shown in Fig. 1. All the bias constants are collected into the last row in the PPB matrix. The complements of PPBs are denoted by white circles with overbars.

Fig. 3: Direct-form linear-phase FIR filter
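The sign-bit complement property can be verified with a small Python sketch. For a row of width $w$ in two's complement, the sign bit contributes $-s \cdot 2^{w-1} = \bar{s} \cdot 2^{w-1} - 2^{w-1}$, so each row can be summed as an unsigned number with its sign bit complemented, and the constants $-2^{w-1}$ can be collected into one bias term. The helper names are hypothetical, and for simplicity all rows are assumed to have the same width and weight, which is not the case in a real PPB matrix.

```python
def flip_sign_bit_unsigned(value: int, width: int) -> int:
    """Return the two's-complement pattern of `value` with its sign bit complemented,
    read as an unsigned integer."""
    pattern = value & ((1 << width) - 1)
    return pattern ^ (1 << (width - 1))


def sum_rows_without_sign_extension(rows: list[int], width: int) -> int:
    """Sum signed rows using only unsigned additions plus one collected bias constant."""
    bias = -len(rows) * (1 << (width - 1))  # one -2**(width-1) per row, collected together
    return sum(flip_sign_bit_unsigned(r, width) for r in rows) + bias


rows = [-3, 5, -8, 7]            # hypothetical 4-bit two's-complement rows
total = sum_rows_without_sign_extension(rows, width=4)
```

No row needs to be sign-extended to the full accumulator width, which is what removes the sign extension bits from the matrix.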

Two rows of PPBs are set undeletable because they will be removed at the subsequent truncation and rounding. The error ranges of deletion, truncation, and rounding are considered before and after adding the offset constants. The gray circles, crossed green circles, and crossed red circles represent, respectively, the deleted bits, truncated bits, and rounded bits.
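The interplay of deletion, offset, and truncation can be illustrated with a deliberately simplified sketch. It is not the procedure of this brief: it models a single unsigned multiplier, deletes whole low-order PPB columns, adds one offset constant, and truncates. The function name `truncated_multiply` and the parameters `guard` and `offset` are assumptions made for this illustration. With 6-bit operands, three guard columns, and an offset of half an ulp, the total error stays below one ulp for every operand pair.

```python
def truncated_multiply(a: int, b: int, n: int, guard: int, offset: int) -> int:
    """Unsigned n x n multiply that keeps the n most significant product bits.

    PPBs in columns below (n - guard) are deleted, `offset` is added to
    compensate, and the n low-order columns are truncated at the end.
    """
    first_kept_column = n - guard
    acc = offset
    for i in range(n):
        for j in range(n):
            column = i + j
            if column >= first_kept_column and (a >> i) & 1 and (b >> j) & 1:
                acc += 1 << column
    return acc >> n


N, GUARD = 6, 3                 # hypothetical operand width and guard columns
ULP = 1 << N                    # weight of the LSB of the kept result
OFFSET = ULP // 2               # hypothetical offset constant (half an ulp)

max_error = max(
    abs(a * b - (truncated_multiply(a, b, N, GUARD, OFFSET) << N))
    for a in range(1 << N)
    for b in range(1 << N)
)
is_faithful = max_error < ULP
```

Deleting columns removes PPBs before they are ever generated, which is where the area saving comes from; the offset recenters the resulting error.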

## III. Proposed System

Pipelining the MCMA leads to a reduction in the critical path. It either increases the clock speed (or sampling speed) or reduces the power consumption at the same speed in a DSP system. Pipelining reduces the effective critical path by introducing pipelining latches along the critical data path. In the pipelined implementation, introducing two additional latches reduces the critical path from $T_M + 2T_A$ to $T_M + T_A$, where $T_M$ and $T_A$ denote the multiplication and addition times, respectively. In the schedule of events for this pipelined system, two consecutive outputs are computed in an interleaved manner at any time.
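A short worked example makes the critical-path reduction concrete. The delay values below are hypothetical and chosen only to show the arithmetic; they are not taken from the synthesis results of this design.

```python
def critical_path(t_mult: float, t_add: float, adders_in_path: int) -> float:
    """Critical path of one multiplier followed by `adders_in_path` adders in series."""
    return t_mult + adders_in_path * t_add


T_M = 10.0   # hypothetical multiplier delay (arbitrary time units)
T_A = 2.0    # hypothetical adder delay (arbitrary time units)

unpipelined = critical_path(T_M, T_A, adders_in_path=2)   # T_M + 2*T_A
pipelined = critical_path(T_M, T_A, adders_in_path=1)     # T_M + T_A
speedup = unpipelined / pipelined                          # ratio of maximum clock rates
```

With these assumed delays the clock period drops from 14 to 12 time units, at the cost of the extra latches and added latency.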

Multiple constant multiplication/accumulation in a direct FIR structure is implemented using an improved version of truncated multipliers. Comparisons with previous FIR design approaches show that the proposed designs achieve the best area and power results.

A generic flow of FIR filter design and implementation can be divided into three stages: finding the filter order, coefficient quantization, and hardware optimization. In the first stage, the filter order and the corresponding coefficients of infinite precision are determined to satisfy the specification of the frequency response. Then, the coefficients are quantized to finite bit accuracy. Finally, various optimization approaches such as CSE are used to minimize the area cost of hardware implementations. Most previous FIR filter implementations focus on the hardware optimization stage. After FIR filter operations, the output signals have a larger bit width due to bit width expansion after multiplications. In many practical situations, only part of the bits of the full-precision outputs are needed.
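The coefficient quantization stage can be sketched as follows. The coefficient values and per-coefficient bit widths are hypothetical, and `quantize` is an illustrative helper rather than the procedure used in this design. The sketch contrasts a uniform choice (the same number of fractional bits for every coefficient) with a nonuniform one (possibly different bit widths per coefficient).

```python
def quantize(coefficient: float, frac_bits: int) -> float:
    """Round a coefficient to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(coefficient * scale) / scale


# Hypothetical symmetric (linear-phase) coefficients of infinite precision.
coeffs = [0.0127, -0.0931, 0.3125, 0.5412, 0.3125, -0.0931, 0.0127]

# Uniform quantization: every coefficient gets 8 fractional bits.
uniform = [quantize(c, 8) for c in coeffs]

# Nonuniform quantization: a hypothetical per-coefficient choice of bit widths.
bit_widths = [6, 8, 10, 10, 10, 8, 6]
nonuniform = [quantize(c, b) for c, b in zip(coeffs, bit_widths)]

# The quantization error of each coefficient is at most half of its own LSB weight.
errors = [abs(q - c) for q, c in zip(nonuniform, coeffs)]
```

In a complete flow, the frequency response would be re-evaluated after quantization to confirm that the specification is still met; that check is omitted here.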



Fig. 4: Stages of digital FIR filter design and implementation

For example, assuming that the input signals of the FIR filter have 12 bits and the filter coefficients are quantized to 10 bits, the bit width of the resultant FIR filter output signals is at least 22 bits, but we might need only the 12 most significant bits for subsequent processing. In this brief, we adopt the direct FIR structure with MCMA because the area cost of the flip-flops in the delay elements is smaller compared with that of the transposed form. Also, we jointly consider the three design stages in order to achieve a more efficient hardware design with faithfully rounded output signals. Unlike conventional uniform quantization of filter coefficients with equal bit width, the nonuniform quantization technique with possibly different bit widths is adopted in this brief.
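The 12-bit by 10-bit example can be checked with a short sketch. It assumes signed two's-complement operands, which the text does not state explicitly, and the helper names are hypothetical. It computes the bit width needed for a single product and the number of low-order bits dropped when only the 12 most significant bits are kept.

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of a two's-complement number with `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def signed_bits_needed(lo: int, hi: int) -> int:
    """Smallest two's-complement width that can hold every value in [lo, hi]."""
    bits = 1
    while True:
        min_value, max_value = signed_range(bits)
        if min_value <= lo and hi <= max_value:
            return bits
        bits += 1


def product_bits(input_bits: int, coeff_bits: int) -> int:
    """Width needed for the product of a signed input and a signed coefficient."""
    x_lo, x_hi = signed_range(input_bits)
    c_lo, c_hi = signed_range(coeff_bits)
    corners = [x * c for x in (x_lo, x_hi) for c in (c_lo, c_hi)]
    return signed_bits_needed(min(corners), max(corners))


full_width = product_bits(12, 10)      # width of one full-precision product
dropped_bits = full_width - 12         # LSBs discarded when keeping the 12 MSBs
```

Accumulating several products widens the result further, which is consistent with the phrase "at least 22 bits" above.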



Fig. 5: Block diagram of pipelined FIR filter

## IV. Result

According to the synthesis result, the maximum time delay produced is 1.108 ns. That constraint yields a minimum clock period of 0.72 ns. The minimum input arrival time before the clock is 0.730 ns, and the maximum output required time after the clock is 1 ns.

Let us also look at the macro statistics, i.e., the number of registers, flip-flops, and multiplexers required. To implement the whole design, 108 registers are required. The synthesis results also show the top-level block diagram, the RTL view, and the technology view of the implemented FIR filter.



Fig. 6: RTL Design

International Journal of advancement in electronics and computer engineering (IJAECE) Volume 4, Issue 8, Dec. 2015, pp.550-554, ISSN 2278 -1412 Copyright © 2012: IJAECE (www.ijaece.com)







Fig. 8: Technology Design



Fig. 9: Technology Internal Design



Fig. 10: Design Simulation

## V. Conclusion

This paper presented low-cost FIR filter designs by jointly considering the optimization of coefficient bit width and hardware resources in implementations. Although most previous designs are based on the transposed form, we observe that the direct FIR structure with faithfully rounded MCMAT leads to the smallest area cost and power consumption.
