# Novel Parallel - Perfix Structure Binary to Residue Number System Conversion Method # **Omar Dajani** Pepe Siy Wayne State University Department of Electrical and Computer Engineering #### **Abstract** In the present world there is always a demand for faster algorithms and techniques that could boost up the speed of the computations. With the help of VLSI fabrication techniques and using residue number system (RNS) arithmetic we can achieve the faster speeds. In this paper we propose a novel parallel prefix binary to residue number system conversion method. The method that we present in this paper utilizes parallel-prefix technique with multiplexers and modulo adders as the main building blocks which makes it practical and suitable for VLSI implementation. **Keywords:** Residue Number System, Binary to Residue Conversion, Multiplexer, Modulo Adder. ### 1. INTRODUCTION In the last decade, Residue number system (RNS) has received increased attention due to its ability to support high speed concurrent arithmetic applications [1-3] such as Fast Fourier transform (FFT), image processing and digital filters utilize the efficiencies of RNS arithmetic in addition and multiplication. The advancements in very large scale integration (VLSI) technology and demand for parallelism computation have enabled researchers to consider RNS as an alternative approach to high speed concurrent arithmetic. Several methods are found in literature for binary to RNS conversion. Alia and Martinelli [4] have proposed a method for binary to residue conversion based on powers of 2. A modification to the above method was proposed by Cappocelli and Giancarlo [5]. Anandmohan [6] has proposed a similar method but with difference that his method is based on the cyclic property of power of 2 moduli set. Behrooa[7] proposed a table lookup schemes for binary to Residue conversions. In this paper, we present a novel binary to Residue Number System conversion method that we used to build Residue Arithmatic logic unit (RALU). RALU has three main units: Binary to Residue unit, ALU and Residue to Binary unit [8]. The organization of this paper is as follows. Section two explains RNS system. In section three we present new conversion from binary to RNS algorithm. Section four and five show illustrative example and implementation selection techniques. Section six is comparison between the new method and pervious work. Conclusion is in section seven. # 2. RESIDUE NUMBER SYSTEM Any n-bit nonnegative integer number X, in the range $0 \le X \le 2^n$ -1 is represented in binary number system as $X = 2^{n-1}b_{n-1} + ... + 2^2b_2 + 2b_1 + b_0 = \sum_{i=0}^{n-1} 2^ib_i$ where $b_{i} \in \{0, 1\}$ . Meanwhile in RNS, X is represented by k residue digits $x_i$ as $X = \{x_1, x_2, x_3, ..., x_k\}$ where $x_i = X \mod m_i$ and $m_i$ belong to set of relatively prime moduli; $m_i \in \{m_1, m_2, m_3, ..., m_k\}$ [9]. If the moduli are relatively prime numbers, there is a unique RNS representation for each integer in range $0 \le X \le \prod_{i=1}^s m_i$ # 3. NEW NOVEL CONVERSION METHOD FROM BINARY TO RESIDUE REPRESENTATION As shown above an integer number X can be represented in Binary system as $$X = 2^{n-1}b_{n-1} + ... + 2^{2}b_{2} + 2b_{1} + b_{0} = \sum_{i=0}^{n-1} 2^{i}b_{j}$$ And RNS representation of number X is $$\begin{aligned} \left| \mathbf{X} \right|_{\mathbf{m}} &= \left| \sum_{\mathbf{j}=0}^{\mathbf{n}-\mathbf{l}} \left| 2^{\mathbf{j}} \mathbf{b}_{\mathbf{j}} \right|_{\mathbf{m}} \right|_{\mathbf{m}} & \text{for } \mathbf{m} > 2 \\ &= \left| \sum_{\mathbf{j}=0}^{\mathbf{n}-\mathbf{l}} \left| 2^{\mathbf{j}} \right|_{\mathbf{m}} \mathbf{b}_{\mathbf{j}} \right|_{\mathbf{m}} & (1) \end{aligned}$$ Let $M_{A_1A_0}=(A_1,A_0)[Y_0\ ,\ Y_1,\ Y_2,Y_3]$ denotes a 2-bit multiplexer where the 2 control bits (A1, A0) select the inputs ( $Y_0,Y_1,\ Y_2,Y_3$ ) to be outputted <u>Lemma 1</u>: For any pair of bits $b_i \& b_i$ for $j \& i \ge 0$ , $$\left\|2^{j}\right\|_{m}b_{j}+\left|2^{i}\right|_{m}b_{i}\Big|_{m}=\left|X_{ji}\right|_{m}$$ can be implemented using 2-bit multiplexer: $$\mathbf{M}_{ji} = (\mathbf{b}_{j}, \mathbf{b}_{i})[0, |2^{i}|_{m}, |2^{j}|_{m}, |2^{j}|_{m} + |2^{i}|_{m}]$$ (2) Where the control bits (A<sub>1</sub>, A<sub>0</sub>) equal (b<sub>i</sub>, b<sub>i</sub>) # Proof: $$\begin{split} \text{Rewrite equation } \left\|2^j\right|_{\mathbf{m}} b_{\mathbf{j}} + \left|2^i\right|_{\mathbf{m}} b_{\mathbf{i}}\right|_{\mathbf{m}} \text{ as } \\ & (0.\overline{b}_{\mathbf{j}}.\overline{b}_{\mathbf{i}}) + \left(2^i\right|_{\mathbf{m}}.\overline{b}_{\mathbf{j}}.b_{\mathbf{i}}) + \left(2^j\right|_{\mathbf{m}}.b_{\mathbf{j}}.\overline{b}_{\mathbf{i}}) + \left(2^j\right|_{\mathbf{m}}.b_{\mathbf{j}}.b_{\mathbf{i}}) \end{split}$$ This is equivalent to 2-bit multiplexer $M_{ji}$ with control bits $(A_1, A_0)$ equal $(b_j, b_i)$ . Figure (1) shows the implementation for equation (2) with $b_i = b_1$ and $b_i = b_0$ FIGURE 1: Two Bits (b1 & b0) Binary to Residue Number System Conversion This pre-processing operator $M_{ji}$ is represented in acyclic graph as node " $\bigcirc$ " in figure (2a), where all the inputs are constants and pre-calculated . FIGURE 2: Prefix logic operation and their implementation In three bit system, let $M_{A_2A_1A_0}=(A_2,A_1,A_0)[Y_0,Y_1,Y_2,Y_3,Y_4,Y_5,Y_6,Y_7]$ denotes a 3-bit multiplexer where the 3 control bits (A<sub>2</sub>, A<sub>1</sub>, A<sub>0</sub>) select the inputs (Y<sub>0</sub>,Y<sub>1</sub>,Y<sub>2</sub>,Y<sub>3</sub>,Y<sub>4</sub>,Y<sub>5</sub>,Y<sub>6</sub>,Y<sub>7</sub>) to be outputted. <u>Lemma 2:</u> For any three bits $b_k$ , $b_j$ & $b_i$ for k, j & $i \ge 0$ , $$\left\|2^k\right|_m b_k + \left|2^j\right|_m b_j + \left|2^i\right|_m b_i \bigg|_m = \left|X_{kji}\right|_m \text{can be}$$ implemented using 3-bit multiplexer: $$\mathbf{M}_{kji} = (\mathbf{b}_{k}, \mathbf{b}_{j}, \mathbf{b}_{i})[0, |2^{i}|_{m}, |2^{j}|_{m}, |2^{j}|_{m} + |2^{i}|_{m}|_{m}, |2^{k}|_{m}, |2^{k}|_{m} + |2^{i}|_{m}|_{m}, |2^{k}$$ Where control bits $(A_2, A_1, A_0)$ equal $(b_k, b_i, b_i)$ Proof: $$\begin{split} & \text{Rewrite} \ \left\| 2^k \right\|_{\mathbf{m}} b_k + \left| 2^j \right|_{\mathbf{m}} b_j + \left| 2^i \right|_{\mathbf{m}} b_i \right|_{\mathbf{m}} \text{ as } \\ & (0.\overline{b}_k.\overline{b}_j.\overline{b}_i) + (\left| 2^i \right|_{\mathbf{m}}.\overline{b}_k\overline{b}_j.b_i) + (\left| 2^j \right|_{\mathbf{m}}.\overline{b}_k.b_j.\overline{b}_i) + \\ & (\left\| 2^j \right\|_{\mathbf{m}} + \left| 2^i \right|_{\mathbf{m}} \right|_{\mathbf{m}}.\overline{b}_k.b_j.b_i) + (\left| 2^k \right|_{\mathbf{m}}.b_k.\overline{b}_j.\overline{b}_i) + \\ & (\left\| 2^k \right\|_{\mathbf{m}} + \left| 2^i \right|_{\mathbf{m}} \right|_{\mathbf{m}},b_k\overline{b}_j.b_i) + (\left\| 2^k \right|_{\mathbf{m}} + \left| 2^j \right|_{\mathbf{m}} \right|_{\mathbf{m}} b_k.b_j.\overline{b}_i) + \\ & (\left| 2^k \right|_{\mathbf{m}} + \left| 2^j \right|_{\mathbf{m}} + \left| 2^i \right|_{\mathbf{m}}.b_k.b_j.b_i) \end{split}$$ Above equation is equivalent to 3-bit multiplexer with $b_k$ , $b_j$ & $b_i$ as selection control inputs. Figure (3) shows the implementation for equation 3 with $b_k = b_2$ $b_i = b_1$ and $b_i = b_0$ FIGURE 3: Three bits (b<sub>2</sub>, b<sub>1</sub>, b<sub>0</sub>) Binary to Residue Number System Conversion This pre-processing operator $M_{kji}$ is represented in acyclic graph as node " $\square$ " in figure (2b), where all the inputs are constants and pre-calculated. <u>Theorem 1:</u> For Any two pairs of bits $(b_i \& b_k)$ $(b_j \& b_i \text{ for } j, i, l \& k \ge 0 \text{ with the given expression}$ $$\|2^{1}\|_{m}b_{1} + \|2^{k}\|_{m}b_{k} + \|2^{j}\|_{m}b_{j} + \|2^{i}\|_{m}b_{i}\|_{m} = \|X_{1kji}\|_{m}$$ can be implemented using 2-bit multiplexer $$\mathbf{M}_{lkji} = (b_l + b_k), (b_j + b_i)[0, \mathbf{M}_{ji}, \mathbf{M}_{lk}, \mathbf{M}_{lk} + \mathbf{M}_{ji}]$$ (4) Where control bits $(A_1, A_0)$ equal $(b_i, +b_k, b_i, +b_i)$ $$\frac{\text{Proof}}{\left|X_{1kji}\right|_{m}} = \left\|2^{1}\right|_{m} b_{1} + \left|2^{k}\right|_{m} b_{k} + \left|2^{j}\right|_{m} b_{j} + \left|2^{i}\right|_{m} b_{i} \Big|_{m}$$ $$\left|\mathbf{X}_{1kji}\right|_{m} = \left|\mathbf{M}_{1k} + \mathbf{M}_{ji}\right|_{m} \tag{5}$$ Where $\mathbf{M}_{lk} = \left\|2^l\right\|_m \mathbf{b}_1 + \left|2^k\right\|_m \mathbf{b}_k \right\|_m$ and $\mathbf{M}_{ji} = \left\|2^j\right\|_m \mathbf{b}_j + \left|2^i\right\|_m \mathbf{b}_i \right\|_m$ by Lemma 1 Let $\mathbf{b}_{lk} = (\mathbf{b}_l + \mathbf{b}_k)$ and $\mathbf{b}_{ji} = (\mathbf{b}_j + \mathbf{b}_i)$ $$\text{Rewrite equation (5) as } (0.\overline{b}_{lk}.\overline{b}_{ji}) + (M_{ji}.\overline{b}_{lk}.b_{ji}) + (M_{lk}.b_{ji}) + (M_{lk}+M_{ji}.b_{lk}.b_{ji}).$$ And this is equivalent to 2-bit multiplexer $M_{lkji}$ with control bits $(A_1, A_0)$ equal $(b_l + b_k, b_j + b_i)$ . Figure (4) shows implementation for two pair bits $(b_3, b_2)$ & $(b_1, b_0)$ **FIGURE 4:** Four Bits Binary to RNS $\left|X_{3..0}\right|_{m} = \left\|2^{3}\right|_{m} b_{3} + \left|2^{2}\right|_{m} b_{2} + \left|2^{1}\right|_{m} b_{1} + b_{0}\right|_{m}$ ## Lemma 3: Combining two pairs of bits (b<sub>i</sub> & b<sub>k</sub>) (b<sub>j</sub> & b<sub>i</sub>) requires one 2-bit multiplexer and one 2 input mod adder. The delay time $\tau_{Total} = \tau_{mux_2} + \tau_{modadder_2}$ # **Proof** Equation (4) and figure (4) show that $(b_l + b_k), (b_j + b_i)[0, M_{ji}, M_{lk}, M_{lk} + M_{ji}]$ is equivalent to one 2-bit multiplexer and one 2-input mod adder; and delay time is equal to $\tau_{mux_2} + \tau_{modadder_2}$ . Figure (2c) represents acyclic graph " lacktriangle " for node $M_{lkji}$ where $M_{lk}$ & $M_{ji}$ are inputs. # Lemma 4 The parallel prefix operator has the following properties 1) Commutative $$M_{lk} \bullet M_{ji} = M_{ji} \bullet M_{lk}$$ 2) Associative $$M_{lk} \bullet (M_{hg} \bullet M_{ji}) = (M_{lk} \bullet M_{hg}) \bullet M_{ji}$$ Proof: $$M_{lk} \bullet M_{ji} = M_{lkji}$$ $$= (b_{1} + b_{k}), (b_{j} + b_{i})[0, M_{ji}, M_{lk}, M_{lk} + M_{ji}]$$ $$= ||2^{l}|_{m} b_{1} + |2^{k}|_{m} b_{k} + |2^{j}|_{m} b_{j} + |2^{i}|_{m} b_{i}|_{m} (6)$$ $$M_{ji} \bullet M_{lk} = M_{jilk}$$ $$= (b_{j} + b_{i}), (b_{1} + b_{k})[0, M_{kl}, M_{ji}, M_{ji} + M_{lk}]$$ $$= ||2^{j}|_{m} b_{j} + |2^{i}|_{m} b_{i} + |2^{l}|_{m} b_{1} + |2^{k}|_{m} b_{k}|_{m} (7)$$ Both expressions (6) and (7) are the same by commutative property of "+" hence ● operator is commutative $$M_{lk} \bullet (M_{hg} \bullet M_{ji}) = M_{lk} \bullet M_{hgji}$$ $$= M_{lkhgji}$$ $$= ||2^{l}|_{m} b_{k} + |2^{k}|_{m} b_{k} + |2^{h}|_{m} b_{h} + |2^{g}|_{m} b_{g} + |2^{j}|_{m} b_{j} + |2^{i}|_{m} b_{i}|_{m}$$ $$(M_{lk} \bullet M_{hg}) \bullet M_{ji} = M_{lkhg} \bullet M_{ji}$$ $$= M_{lkhgji}$$ $$= ||2^{l}|_{m} b_{k} + |2^{k}|_{m} b_{k} + |2^{h}|_{m} b_{h} + |2^{g}|_{m} b_{g} + |2^{j}|_{m} b_{j} + |2^{i}|_{m} b_{i}|_{m}$$ $$(9)$$ Both expressions (8) and (9) are the same by associative property of "+" hence ● operator is associative Theorem 2: For any three pairs of bits (b<sub>1</sub> & b<sub>k</sub>), $$(b_i \& b_i)$$ and $(b_h \& b_g)$ for I, k, j, i, h & g $\geq 0$ with given expression $$\left\| 2^{1} \right\|_{m} b_{1} + \left| 2^{k} \right|_{m} b_{k} + \left| 2^{j} \right|_{m} b_{j} + \left| 2^{i} \right|_{m} b_{i} + \left| 2^{h} \right|_{m} b_{h} + \left| 2^{g} \right|_{m} b_{g} = \left| X_{lkjihg} \right|_{m}$$ can be implemented using 3-bit multiplexer $$\begin{split} M_{lkjihg} = &(b_l + b_k), (b_j + b_i), (b_h + b_g)[0, M_{hg}, M_{ji}, M_{ji} + M_{hg}, M_{lk}, M_{lk} + M_{hg}, \\ M_{lk} + M_{ji}, M_{lk} + M_{ji} + M_{hg}] \end{split} \tag{10}$$ Where control bits $(A_2, A_1, A_0)$ equal $(b_1+b_k, b_i+b_i, b_h+b_a)$ $$\left|X_{lkjihg}\right|_{m} = \left\|2^{1}\right|_{m} b_{1} + \left|2^{k}\right|_{m} b_{k} + \left|2^{j}\right|_{m} b_{j} + \left|2^{i}\right|_{m} b_{i} + \left|2^{h}\right|_{m} b_{h} + \left|2^{g}\right|_{m} b_{g}\right|_{m}$$ $$\left|X_{lkjihg}\right|_{m} = \left|M_{lk} + M_{ji} + M_{hg}\right|_{m} \tag{11}$$ Where $$\boldsymbol{M}_{lk}=\left\|\boldsymbol{2}^{l}\right|_{m}\boldsymbol{b}_{l}+\left|\boldsymbol{2}^{k}\right|_{m}\boldsymbol{b}_{k}\right|_{m}$$ , $$\mathbf{M}_{ji} = \left\| 2^{j} \right\|_{m} \mathbf{b}_{j} + \left| 2^{i} \right\|_{m} \mathbf{b}_{i} \right\|_{m} \quad \text{and} \quad$$ $$\mathbf{M}_{\mathrm{hg}} = \left\| 2^{\mathrm{h}} \right\|_{\mathrm{m}} \mathbf{b}_{\mathrm{h}} + \left| 2^{\mathrm{g}} \right|_{\mathrm{m}} \mathbf{b}_{\mathrm{g}} \right\|_{\mathrm{m}}$$ by Lemma 1 Let $$b_{lk} = (b_l + b_k)$$ , $b_{ji} = (b_j + b_i)$ and $b_{hg} = (b_h + b_g)$ Rewrite equation (11) as $$\begin{split} &(0.\overline{b}_{lk}.\overline{b}_{ji}.\overline{b}_{hg}) + (M_{hg}.\overline{b}_{lk}\overline{b}_{ji}.b_{hg}) + (M_{ji}.\overline{b}_{lk}.b_{ji}.\overline{b}_{hg}) + ((M_{ji} + M_{hg})\overline{b}_{lk}.b_{ji}.b_{hg}) + \\ &(M_{lk}.b_{lk}.\overline{b}_{ji}.\overline{b}_{hg}) + ((M_{lk} + M_{hg}).b_{lk}\overline{b}_{ji}.b_{hg}) + \\ &((M_{lk} + M_{ji}).b_{lk}.b_{ji}.\overline{b}_{hg}) + \\ &((M_{lk} + M_{ji}).b_{lk}.b_{ji}.\overline{b}_{hg}) + \\ &((M_{lk} + M_{ji}).b_{lk}.b_{ji}.b_{hg}) \end{split}$$ This is equivalent to 3-bit multiplexer $M_{lkjihq}$ with control bits $(A_2, A_1, A_0)$ equal $(b_l + b_k, b_i + b_i, b_i)$ $b_h + b_a$ ) Figure (5) shows implementation for three pairs of bits (b<sub>5</sub>, b<sub>4</sub>), (b<sub>3</sub>, b<sub>2</sub>) & (b<sub>1</sub>, b<sub>0</sub>) FIGURE 5: Six Bits Binary to Residue Number System Conversion $$\left| \mathbf{X}_{5..0} \right|_{\mathbf{m}} = \left\| 2^{5} \right|_{\mathbf{m}} \mathbf{b}_{5} + \left| 2^{4} \right|_{\mathbf{m}} \mathbf{b}_{4} + \left| 2^{3} \right|_{\mathbf{m}} \mathbf{b}_{3} + \left| 2^{2} \right|_{\mathbf{m}} \mathbf{b}_{2} + \left| 2^{1} \right|_{\mathbf{m}} \mathbf{b}_{1} + \mathbf{b}_{0} \right|_{\mathbf{m}}$$ #### Lemma 5: Combining three pairs of bits $(b_i \& b_k)$ , $(b_j \& b_i) \& (b_h \& b_g)$ requires one 3- bit multiplexer and three 2-input mod adder and one 3-input mod adder. The delay time equals $$\tau_{\text{Total}} = \tau_{\text{mux}_3} + 2\tau_{\text{modadder}_2} = \tau_{\text{mux}_3} + \tau_{\text{modadder}_3}$$ # **Proof** Equation (10) and figure (5) show that $(b_l+b_k),(b_j+b_i),(b_h+b_g)[0,M_{\rm hg},$ $M_{ji}$ , $M_{ji}$ + $M_{hg}$ , $M_{lk}$ , $M_{lk}$ + $M_{hg}$ , $M_{lk}$ + $M_{ji}$ , $M_{lk}$ + $M_{ji}$ + $M_{hg}$ ] is equivalent to one 3-bit multiplexer and three 2-input mod adder and one 3-input mod adder; and delay time is equal to $\tau_{mux_3}$ + $2\tau_{modadder}$ , Figure (2d) represents acyclic graph for "■ " for node M<sub>Ikhqii</sub> where M<sub>Ik,</sub> M<sub>hq</sub> & M<sub>ii</sub> are inputs # Lemma 6: The parallel prefix operator has the following properties 1) Commutative $$M_{lkjihg}$$ $M_{tsrqpo} = M_{tsrqpo}$ $M_{lkjihg}$ 2) Associative $$M_{lkjihg}$$ $\blacksquare$ $(M_{tsrqpo}$ $\blacksquare$ $M_{zyxwru}) = (M_{lkjihg}$ $\blacksquare$ $M_{tsrqpo})$ $\blacksquare$ $M_{zyxwru}$ ### Proof: The proof is similar to Lemma 4 # 4. ILLUSTRATIVE EXAMPLE In this section, we will use illustrate how theorem 1, theorem 2, lemma 1 and lemma 2 can be combined to design a binary to residue convertor. Figure (6) shows how $\left|X\right|_m$ for n = 8 is computed. In the first layer, using pre-processing operator $\bigcirc$ each consecutive pair of bits are group together $(b_7,b_6)$ $(b_5,b_4)$ $(b_3,b_2)$ $(b_1,b_0)$ creating nodes $M_{76}$ , $M_{54}$ , $M_{32}$ , $M_{10}$ . In the second layer, using parallel prefix operator $\bullet$ each consecutive M node are combined $(M_{76},M_{54})$ $(M_{32},M_{10})$ forming nodes $M_{7..4}$ & $M_{3..0}$ . In the last layer, using parallel prefix operator $\bullet$ the last 2 M nodes are combined $(M_{7..4},M_{3..0})$ forming node $M_{7..0} = \left|X_{7..0}\right|_m$ . Figure (7a) shows the actual hardware implementation. FIGURE 6: Prefix Structure of 8 Bits Binary to RNS Total delay time for this example is calculated by counting the delay introduce by the operator in each layer by using lemma 3 as follows Layer 1: delay time is $\tau_{mux}$ , pre-processing operator O doesn't requires an adder Layer 2 : delay time is $\tau_{\text{mux}_2} + \tau_{\text{modadder}_2}$ Layer 3: delay time is $\tau_{\text{mux}_2} + \tau_{\text{modadder}_2}$ Total delay is the sum of all layers delay time $au_{Total} = 3 au_{mux_2} + 2 au_{modadder_2}$ To show that hardware works, the signal propagation for binary number $\left|X\right|_7 = \left|(1110 \quad 0110)_2\right|_7 = \left|244_{10}\right|_7 = 6$ is illustrated in figure (7b). Similarly, the reader can try any bit pattern in figure (7b) to check the validity of the design. For example $\left|X\right|_7 = \left|(1111 \quad 1111)_2\right|_7 = \left|255_{10}\right|_7 = 3$ where each multiplexer select line is 3 and the selected output are shown in parenthesis. FIGURE 7A: Eight Bits Binary to Residue Number System Conversion $$\left|X_{7..0}\right|_{m} = \left\|2^{7}\right|_{m}b_{7} + \left|2^{6}\right|_{m}b_{6} + \left|2^{5}\right|_{m}b_{5} + \left|2^{4}\right|_{m}b_{4} + \left|2^{3}\right|_{m}b_{3} + \left|2^{2}\right|_{m}b_{2} + \left|2\right|_{m}b_{1} + b_{0}\right|_{m}$$ **FIGURE 7B:** Example for Signal propagation of $|(1110 \quad 0110)_2|_7$ and $|(1111 \quad 1111)_2|_7$ # 5. IMPLEMENTATION SELECTION There are several possible binary to RNS imp- lementations using a combination of 2-bit and 3-bit multiplexers. Figure (8) shows three different imp- lementations (design 1, design 2 and design 3) for 10 bits binary to residue conversion system. To simplify comparison, the following reasonable assumptions are made $$\tau_{\text{mux}_2} = \tau_{\text{mux}_3}; \quad \tau_{\text{modadder}_3} = 2\tau_{\text{modadder}_2}$$ Design 1 uses nine 2-bit multiplexers and four 2-input mod adders with Layer 1: delay time is $\tau_{mux}$ Layer 2: delay time is $\tau_{mux_2} + \tau_{modadder_2}$ # Omar Dajani Layer 3: delay time is $\tau_{\text{mux}_2} + \tau_{\text{modadder}_2}$ Layer 4: delay time is $\tau_{mux_2} + \tau_{modadder_2}$ Total delay is sum of all layers delay time $\tau_{Total} = 4\tau_{mux_2} + 3\tau_{modadder_2}$ FIGURE 8: Prefix Structure of 10 Bits Binary to RNS $$\left|X_{9..0}\right|_{m} = \left|2^{9}\right|_{m}b_{9} + \left|2^{8}\right|_{m}b_{8} + \left|2^{7}\right|_{m}b_{7} + \left|2^{6}\right|_{m}b_{6} + \left|2^{5}\right|_{m}b_{5} + \left|2^{4}\right|_{m}b_{4} + \left|2^{3}\right|_{m}b_{3} + \left|2^{2}\right|_{m}b_{2} + \left|2\right|_{m}b_{1} + b_{0}\right|_{m}$$ <u>Design 2</u> uses four 3-bit multiplexers, one 2-bit multiplexers, four 2-input mod adders and one 3-input adder with Layer 1: delay time is $\tau_{mux_3}$ Layer 2: delay time is $\tau_{\text{mux}_3} + 2\tau_{\text{modadder}_2}$ Layer 3: delay time is $$\tau_{mux_2} + \tau_{modadder_2}$$ $\tau_{Total} = \tau_{mux_2} + 2\tau_{mux_3} + 3\tau_{modadder_2}$ $= 3\tau_{mux_2} + 3\tau_{modadder_2}$ Design 3 uses three 3-bit multiplexers, three 2-bit multiplexer and three 2-input mod adders with Layer 1: delay time is $\tau_{mux}$ Layer 2 : delay time is $\tau_{mux_2} + \tau_{modadder_2}$ Layer 3: delay time is $\tau_{mux_2} + \tau_{modadder_2}$ $$\begin{split} \tau_{Total} &= 2\tau_{mux_2} + \tau_{mux_3} + 2\tau_{modadder_2} \\ &= 3\tau_{mux_2} + 2\tau_{modadder_2} \end{split}$$ From Table (1), it shows that Design 3 uses less hardware and the fastest | Design | | Hardware count | | | | |--------|----------------------------------------------------|------------------|------------------|-------------|-------------| | # | Time Delay | Mux <sub>2</sub> | Mux <sub>3</sub> | Mod<br>add₂ | Mod<br>add₃ | | 1 | $4\tau_{\text{mux}_2} + 3\tau_{\text{modadder}_2}$ | 9 | 0 | 4 | 0 | | 2 | $3\tau_{\text{mux}_2} + 3\tau_{\text{modadder}_2}$ | 1 | 4 | 4 | 1 | | 3 | $3\tau_{\text{mux}_2} + 2\tau_{\text{modadder}_2}$ | 3 | 3 | 3 | 0 | **TABLE 1:** shows comparison between the three designs implementation for n = 10 #### 6. COMPARISON SELECTION TO PERVIOUS WORK This Novel method has hardware advantages than any competitive converters. In 1984, Alia and Martinelli [3] published binary to RNS conversion design based on power 2 mod $m_i$ . The design uses processing elements (PE) and each PE is associated with two registers. Each of these registers is serially loaded with $\left|2^i\right|_m$ and $\left|2^{i+1}\right|_m$ respectively and it enabled to put their content or zeros' out depending on value 1 or 0 of $b_j$ and $b_{j+1}$ respectively. The two outputs are added in a modular adder. Thus, at first level, n/2 PEs are required. The number of stages in this method is $[\log_2 n]$ and after successive transformation and addition, the residue result is available. Cappocelli and Giancarlo [4] suggested the use of t PEs where $t = n/\log_2 n$ , each PE computing the residue corresponding to k- bit binary word where $k = \log_2 n$ , the residue $2^{kt}$ mod $m_i$ is serially fed to $\hat{k}$ th PE ( $\hat{k}=0,\ 1,\ 2,\ ...,\ t-1$ ), Based on these initial residues, the residues corresponding to the next (k-1) powers are computed by first doubling and then weighting according to the input bits in each PE. The partial residues of k-bit words computed over parallel t PEs are then added to yield the final residue. Anandmohan [5] has proposed a similar method but with a difference that X is divided into t sections based on the cyclic property of $2^j$ mod $m_i$ . Using the fact that, $2^{j}$ , $2^{j+lo}$ and $2^{j+2lo}$ have the same residues due to periodicity of period lo, lo bits are first added. The width of the result is confined to lo bits by adding the carry bit resulting from previous addition to LSB of the result. The residue results is then determined by using methods given in [3] # 7. CONCLUSIONS In this paper we presented a new novel binary to Residue conversion method that eliminates the need for processing elements (PE) as the above competitive converter designs and doesn't use table lookup as in Behrooa Parhami [6]. The new method that we present here is based on multiplexers concept which makes it practical and suitable for VLSI implementation # 8. REFERENCES - [1] M.A. Bayoumi, "Digital filter VLSI systolic arrays over finite fields for DSP applications," in Proc. 6th IEEE annual Phoenix Conf. Computers and Communications, pp 194-199, Feb 1987. - [2] M. A. Soderstrand et al., Eds., "Residue Number System Arithmetic: Modern Applications in Digital Signal Processing" New York: IEEE Press, 1986 - [3] K. Konstantinides and V. Bhaskaran, "Monolithic architectures for image processing and compression," IEEE Computer Graphics & Applications, pp. 75-86, Nov. 1992 - [4] G. Alia and E. Martinelli, "A VLSI algorithm for direct and reverse conversion from weighted binary number to residue number system," IEEE Trans. Circuits Syst., vol. 31, pp. 1425–1431, Dec. 1984. - [5] R. M. Capocelli and R. Giancarlo, "Efficient VLSI networks for converting an integer from binary system to residue number system and vice versa," IEEE Trans. Circuits Syst., vol. 35, pp. 1425–1431, Nov. 1988. - [6] A. Mohan, "Novel design for binary to RNS converters," in Proc. Int.Symp. Circuits and Systems, London, U.K., 1994, pp. 357–360. - [7] Behrooa Parhami, "Optimal Table-Lookup Schemes for Binary-to-Residue and Residueto-Binary Conversions," IEEE Trans, Circuits Syst., 1993 - [8] Mohamed. Akkal and Pepe Siy, "A new Mixed Radix Conversion algorithm", Journal of Systems Architecture, Volume 5, lusse 9, September 2007, Pages 577-586 - [9] N. S. Szabo and R. I. Tanaka, "Residue Arithmetic and Its Applications to Computer Technology". New York: McGraw Hill, 1967.