DUAL FIXED-POINT: AN EFFICIENT ALTERNATIVE TO FLOATING-POINT COMPUTATION FOR DSP APPLICATIONS

Chun Te Ewe
Department of Electrical & Electronic Engineering, Imperial College London, Exhibition Road, London SW7 2BT, United Kingdom.
[email protected]

Digital signal processing (DSP) applications are usually specified in a floating-point environment. The large dynamic range of floating-point means designers need not worry about scaling. Hardware implementations of these applications, on the other hand, rely on fixed-point approximations to reduce cost and power consumption while increasing throughput, because fixed-point systems are far less complex than their floating-point counterparts. The main disadvantage of fixed-point is its limited dynamic range.

This project aims to provide an alternative data representation to floating-point for cases where a larger dynamic range is required. Known as Dual FiXed-point (DFX), the new data representation employs a single exponent bit that selects between two different fixed-point scalings. By doing so, DFX retains the implementation simplicity of fixed-point systems while offering an improved dynamic range like that of floating-point. Hence, designs that employ DFX are generally smaller, faster and more energy efficient than equivalent floating-point designs.

An $n$-bit DFX number consists of an exponent bit $E$ and $n - 1$ bits of a signed significand $X$, as shown in Figure 1(a). The exponent selects between two scalings for the significand $X$, giving two possible ranges for the number. The lower number range is referred to as Num0 and the higher number range as Num1. To achieve two different scalings, Num0 is defined to have $p_0$ fractional bits and Num1 to have $p_1$ fractional bits, with $p_0 > p_1$. The range and precision of Num0 and Num1 are illustrated in Figure 1(b). The value of a DFX number, $D$, is given by (1).

$$D = \begin{cases} X \cdot 2^{-p_0} & \text{if } E = 0 \\ X \cdot 2^{-p_1} & \text{if } E = 1 \end{cases} \tag{1}$$
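The value rule in (1) can be illustrated with a minimal Python sketch. The function name `dfx_decode`, the placement of the exponent bit as the most significant bit, and the two's-complement encoding of the significand are assumptions made for illustration; the text only states that the word holds one exponent bit and an (n − 1)-bit signed significand.

```python
def dfx_decode(word: int, n: int, p0: int, p1: int) -> float:
    """Return the real value of an n-bit DFX word according to equation (1).

    Assumptions (not stated in the text): E is the most significant bit,
    and the (n-1)-bit significand X is stored in two's complement.
    """
    e = (word >> (n - 1)) & 1            # exponent bit E
    x = word & ((1 << (n - 1)) - 1)      # raw (n-1)-bit significand field
    if x >= 1 << (n - 2):                # sign bit set: undo two's complement
        x -= 1 << (n - 1)
    # E = 0 selects the Num0 scaling (p0), E = 1 the Num1 scaling (p1).
    return x * 2.0 ** -(p1 if e else p0)


# Toy 8-bit format with p0 = 4 and p1 = 1 (illustrative values only).
print(dfx_decode(0b00010100, 8, 4, 1))   # X = 20, E = 0 -> 1.25
print(dfx_decode(0b10010100, 8, 4, 1))   # X = 20, E = 1 -> 10.0
```

The same significand bits decode to two different values depending only on E, which is what lets DFX reuse plain fixed-point datapaths.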



   



    



   

  



 



Fig. 1. (a) DFX number format; (b) the Num0 and Num1 ranges in DFX.

Table 1. Dynamic range comparison. The DFX is 32 bits wide, $p_0 = 16$ and $p_1 = 4$.

Number System     Format         Dynamic Range
DFX               32-bit         $2^{46} \approx 276$ dB
Fixed-Point       32-bit         $2^{31} \approx 187$ dB
Floating-Point    32-bit IEEE    $2^{254} \approx 1529$ dB

A boundary value, $B$, is needed to decide which scaling to use and hence the value of $E$. The value of $E$ is determined by (2).

$$E = \begin{cases} 0 & \text{if } -B \le D < B \\ 1 & \text{if } D < -B \text{ or } D \ge B \end{cases} \tag{2}$$

To simplify the design of the arithmetic units, the boundary value is defined as the next incremental value after the maximum positive number of Num0, i.e. $B = 2^{n-p_0-2}$ (the $-2$ accounts for the exponent and sign bits).

Dynamic range is defined as the ratio between the largest and the smallest non-zero absolute values in the data format. The smallest absolute value of a DFX number is $2^{-p_0}$ while the largest absolute value is $2^{n-p_1-2}$, so the dynamic range of a DFX number is given by (3). As expected, DFX's dynamic range lies between those of fixed-point and floating-point, as shown in Table 1. Floating-point offers a far larger dynamic range, but in many cases number precision is more vital [1].
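The boundary rule in (2), together with $B = 2^{n-p_0-2}$, can be sketched as a small Python encoder. This is an illustration under stated assumptions rather than the authors' hardware: the name `dfx_encode`, truncation toward negative infinity, and saturation of out-of-range values are all choices made here, since the text does not specify the rounding or overflow behaviour.

```python
import math
from typing import Tuple


def dfx_encode(d: float, n: int, p0: int, p1: int) -> Tuple[int, int]:
    """Return (E, X) for a real value d in an n-bit DFX format.

    E follows equation (2) with boundary B = 2**(n - p0 - 2).
    Assumptions (not from the text): X is obtained by truncation
    (floor) and is saturated to the signed (n-1)-bit range.
    """
    b = 2.0 ** (n - p0 - 2)              # boundary value B
    e = 0 if -b <= d < b else 1          # equation (2)
    p = p1 if e else p0                  # fractional bits of the chosen scaling
    x = math.floor(d * 2 ** p)           # assumed truncation, not rounding
    lo, hi = -(1 << (n - 2)), (1 << (n - 2)) - 1
    return e, max(lo, min(hi, x))        # assumed saturation on overflow


# Toy 8-bit format with p0 = 4 and p1 = 1, so B = 4 (illustrative values only).
print(dfx_encode(1.25, 8, 4, 1))   # inside [-B, B)  -> (0, 20)
print(dfx_encode(4.0, 8, 4, 1))    # exactly at B    -> (1, 8)
print(dfx_encode(-4.0, 8, 4, 1))   # exactly at -B   -> (0, -64)
```

Note the asymmetry of the interval in (2): $-B$ still fits in Num0 because the signed significand reaches one step further on the negative side, while $+B$ is the first value that needs the Num1 scaling.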

0-7803-9362-7/05/$20.00 ©2005 IEEE



$$\text{Dynamic range} = 20 \log_{10}\left(2^{\,n+p_0-p_1-2}\right)\ \text{dB} \tag{3}$$
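As a quick numerical check of (3), the short Python sketch below evaluates the formula and compares it with the ratio of the largest to the smallest absolute value stated in the text. The function names and the toy 8-bit parameters are hypothetical and chosen only for illustration.

```python
import math


def dfx_dynamic_range_db(n: int, p0: int, p1: int) -> float:
    """Dynamic range of an n-bit DFX format in dB, per equation (3)."""
    return 20 * math.log10(2.0 ** (n + p0 - p1 - 2))


def ratio_db(largest: float, smallest: float) -> float:
    """Dynamic range in dB from the largest and smallest absolute values."""
    return 20 * math.log10(largest / smallest)


# Toy 8-bit format with p0 = 4 and p1 = 1 (illustrative values only).
n, p0, p1 = 8, 4, 1
largest = 2.0 ** (n - p1 - 2)    # largest absolute value, 2^(n - p1 - 2)
smallest = 2.0 ** -p0            # smallest absolute value, 2^(-p0)

# Both routes give the same figure, since largest / smallest = 2^(n + p0 - p1 - 2).
print(dfx_dynamic_range_db(n, p0, p1))
print(ratio_db(largest, smallest))
```

Each extra bit of difference between $p_0$ and $p_1$ adds about 6.02 dB, which is why widening the gap between the two scalings is the main lever for extending DFX's range.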

DFX arithmetic modules were constructed and tested on a Xilinx Virtex II FPGA. The results show that DFX arithmetic modules are indeed smaller and faster than their floating-point equivalents. In [1], the authors showed that 32-bit DFX arithmetic modules were about 1.2 to 4 times faster and smaller than equivalent floating-point implementations. An IIR filter was then built using DFX modules for comparison with floating-point. DFX filters were found to be at least 2 times smaller and 1.5 times faster than floating-point filters for similar signal-to-noise ratio (SNR) performance. By choosing the right scaling, DFX can also achieve performance similar to fixed-point while handling a wider dynamic range.

To obtain an efficient DFX implementation that satisfies the computational accuracy constraints imposed by the designer, design automation tools that automatically determine the optimum parameters of a DFX design need to be developed. One prerequisite for the development of such tools is an accurate truncation/rounding error model for the DFX modules. Traditional error analysis techniques, such as the additive roundoff error model for fixed-point and the relative roundoff error model for floating-point, cannot be used because DFX is a scaled number representation that is not normalised. As the errors in DFX arithmetic modules depend heavily on the temporal and spatial correlation of their inputs, a single profiling simulation run is required to obtain the distribution of those inputs. With the input profiles, the errors for each module can be predicted accurately.

As mentioned earlier, future work lies in fully automating the design process. This work will extend the Synoptix program, a complete synthesis system proposed by Constantinides [2], to include the DFX data representation in its optimisation process. The automated design process will be able to decide whether DFX is a suitable number representation for a given design/signal. Little work has been done here so far, but initial results show that when a design does not need the wide dynamic range that only floating-point can handle, DFX is a clear winner because of the extra precision it gives. The main contest for performance is between DFX and fixed-point. Generally, the area and speed of fixed-point designs are hard to match for a given SNR performance. However, DFX has proven worthwhile when the inputs/signals are exponentially distributed (i.e. have a high proportion of small-magnitude values compared to large-magnitude values).

1. REFERENCES

[1] C. T. Ewe, P. Y. K. Cheung, and G. A. Constantinides, “Dual FiXed-Point: An efficient alternative to floating-point computation,” in Field Programmable Logic and Application: 14th International Conference, FPL 2004, Leuven, Belgium, August 2004, pp. 200–208.

[2] G. A. Constantinides, “High level synthesis and word length optimization of digital signal processing systems,” Ph.D. dissertation, Imperial College of Science, Technology and Medicine, University of London, London, U.K., September 2001.
