**3.4 Spice simulations for delay time estimation and electrical verification**

The netlist, including the parasitic capacitances from the layout of Fig. 18, is simulated to verify the circuit operation. The voltage waveforms for the median datum are shown in Fig. 19. The signals S0, S1, ..., S7 correspond to the 8-bit median datum of the set of inputs {D1, D2, ..., D9}, given by (decimal values in parentheses): D1 = 00101011 (212), D2 = 01000011 (194), D3 = 00100010 (69), D4 = 00001101 (176), D5 = 00011100 (56), D6 = 00011110 (120), D7 = 00010111 (232), D8 = 00001111 (240), and D9 = 00010101 (168).

In this simulation, the median of the input set, expressed in base 10 as {212, 194, 69, 176, 56, 120, 232, 240, 168}, is 176, which corresponds to the datum D4.

VLSI Design of Sorting Networks in CMOS Technology 107


In binary, this median is represented as D4 = 00001101 (176 in base 10). Note that the output median datum is enabled until the acknowledge signal (ACK) is high and CLK is low.
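The median selection reported above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original design flow: the helper names `median_datum` and `bits_s0_to_s7` are hypothetical, and the bit ordering (S0 written first, i.e. least significant bit on the left) is an assumption inferred from the fact that D4 = 00001101 corresponds to 176.

```python
# Decimal values of the nine inputs as listed in the text.
inputs = {
    "D1": 212, "D2": 194, "D3": 69, "D4": 176, "D5": 56,
    "D6": 120, "D7": 232, "D8": 240, "D9": 168,
}


def median_datum(data):
    """Return (name, value) of the middle element after sorting by value."""
    ranked = sorted(data.items(), key=lambda item: item[1])
    return ranked[len(ranked) // 2]


def bits_s0_to_s7(value):
    """8-bit string written from S0 to S7 (assumed LSB-first ordering)."""
    return format(value, "08b")[::-1]


name, value = median_datum(inputs)
print(name, value, bits_s0_to_s7(value))
```

Under these assumptions the script selects D4 with value 176, whose bit string written from S0 to S7 is 00001101, in agreement with the simulated output.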

Fig. 19. Simulated output waveforms of the nine data sorting network

#### **4. Piecewise linear function computation using sorting network**

As a second example of the application of sorting networks in dedicated VLSI systems, an Application Specific Instruction Processor (ASIP) for piecewise linear function evaluation is described. Piecewise linear (PWL) functions allow multidimensional nonlinear functions or models to be approximated in a form that is convenient to evaluate with computing systems (Julian et al., 1999). The simplicity of the representation and evaluation methods, combined with the scalability in the number of dimensions, has driven the adoption of PWL functions as the modelling abstraction in a broad spectrum of systems. The first and most traditional area has been the computationally efficient solution of nonlinear circuits (Chua & Ying, 1983), which requires high-performance function evaluation in terms of speed. More recent areas are communication systems (Kaddoum et al., 2007) and power electronics (Pejovic & Maksimovic, 1995), as well as the nonlinear operations involved in predistortion (Hammi et al., 2005). Motivated by these applications, the ASIP for piecewise linear function evaluation, hereafter denoted as PWLR6-µp, was designed and implemented in CMOS technology to provide a flexible environment for the computation of 6-dimensional PWL functions. Because the PWL evaluation algorithm requires a sorting procedure, sorting networks have been embedded in this design, as described in this section.


#### **4.1 The sorting networks for nonlinear function evaluation**

Given an *n*-dimensional domain divided into a simplicial partition with a regular grid, an *n*-dimensional PWL function can be expressed as the weighted sum:

$$F(X) = \sum_{i=0}^{n} \mu_i c_i \tag{2}$$

where *X* = {*x*1, ..., *x*n} is the point in the *n*-dimensional domain where the function is evaluated, *µi* are scaling parameters depending on *X*, and *ci* are the values of the function at the simplex vertices, *i* = 0, ..., *n*. To compute the PWL equation, both *ci* and *µi* are required. Usually the *ci* parameters are stored in a RAM, while the *µi* need to be computed. For this operation, the algorithmic procedure defined in (Agustin et al., 2011) is followed. It involves decomposing the *xi* components into integer and fractional parts, sorting the fractional parts, and performing successive subtractions of the sorted fractional parts.
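The three steps named above (decomposition, sorting, successive subtraction) can be illustrated with a short sketch. It assumes the standard simplicial interpolation over a unit grid, with the fractional parts sorted in descending order; the exact procedure of (Agustin et al., 2011) may differ in detail. The names `pwl_weights`, `pwl_eval` and the callable `f` are hypothetical, and the vertex values *ci* are obtained here by calling `f` instead of reading a RAM.

```python
import math


def pwl_weights(x):
    """Split x into integer/fractional parts and derive the mu_i weights."""
    base = [math.floor(v) for v in x]            # integer parts (grid cell corner)
    frac = [v - b for v, b in zip(x, base)]      # fractional parts in [0, 1)
    # Sort the dimension indices by descending fractional part.
    order = sorted(range(len(x)), key=lambda k: frac[k], reverse=True)
    # Successive subtraction over the padded sequence 1, d1, ..., dn, 0.
    d = [1.0] + [frac[k] for k in order] + [0.0]
    mu = [d[i] - d[i + 1] for i in range(len(x) + 1)]
    return base, order, mu


def pwl_eval(f, x):
    """Evaluate F(X) = sum(mu_i * c_i) with c_i = f(vertex_i)."""
    base, order, mu = pwl_weights(x)
    vertex = list(base)
    total = mu[0] * f(tuple(vertex))
    for i, k in enumerate(order, start=1):
        vertex[k] += 1                           # step to the next simplex vertex
        total += mu[i] * f(tuple(vertex))
    return total


# A linear function is reproduced exactly by the interpolation.
print(pwl_eval(lambda v: 2 * v[0] + 3 * v[1] + 1, (1.25, 2.75)))
```

In this sketch the weights are non-negative and sum to one, so F(X) is a convex combination of the function values at the *n* + 1 vertices of the simplex containing *X*.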

#### **4.2 Micro-architecture for the PWL evaluation through a sorting scheme**

The micro-architecture of an ASIP strongly depends on its target application. A successful ASIP provides the hardware required to solve the target set of computational problems in an optimized way in terms of execution time, power, or chip resources, while maintaining the flexibility and programmability of a general purpose microprocessor. A trade-off exists between the level of optimization and the level of flexibility; thus, an ASIP can be considered an intermediate point between a general purpose microprocessor and an Application Specific Integrated Circuit (ASIC). The three main architectural blocks of the PWLR6-µp, namely Data Path, I/O, and Control, were designed taking into account the special operations required to perform the PWL calculation. The result is a nearly basic microprocessor with special features that accelerate the PWL computation. This section addresses the sorting step of the algorithm and its relationship with the resources provided by the PWLR6-µp. For further details about the rest of the architecture and its special resources for PWL function evaluation, the reader is referred to (Agustin et al., 2011).

Sorting constitutes the second part of the PWL function evaluation algorithm (as mentioned above, it is required to compute the *µi* parameters). In this implementation, the six fractional parts are sorted following the comparison sequence of the so-called Bose-Nelson sorting network (Knuth, 1997). However, in order to preserve the microprocessor structure and to avoid the area overhead of 12 CS blocks, only one comparator (the one provided by the ALU) is used. Consequently, the Bose-Nelson sorting network is embedded in the PWLR6-µp through an appropriate hardware-software design.

The hardware resources provided by the PWLR6-µp architecture are shown in Fig. 20. They are used during the sorting step as follows. First, the RF (register file), which is composed of six registers, stores the fractional parts to be sorted. Then, the two bidirectional ports, Port A (connected to Register A) and Port B (connected to Register B), transfer data between these registers and the RF. After that, a compare operation is performed on Register A and Register B and, depending on the result, their values may be written back into the RF with sources and destinations switched.
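The load, compare, and write-back cycle described above can be sketched in software as a fixed comparison sequence executed with a single comparator. This is an illustrative model only: the comparator list below is one known 12-comparator network for six inputs (two three-element sorts followed by a merge), and whether it matches the exact sequence programmed in the PWLR6-µp is not stated in the text. The names `COMPARATORS`, `sort_with_single_comparator`, `reg_a`, and `reg_b` are hypothetical, and ascending order is assumed.

```python
# Fixed comparison sequence: 12 compare-swap steps on register indices 0..5.
COMPARATORS = [
    (1, 2), (4, 5), (0, 2), (3, 5), (0, 1), (3, 4),   # sort each triple
    (2, 5), (0, 3), (1, 4), (2, 4), (1, 3), (2, 3),   # merge the triples
]


def sort_with_single_comparator(register_file):
    """Sort six values in ascending order, one comparison at a time."""
    rf = list(register_file)
    for i, j in COMPARATORS:
        reg_a, reg_b = rf[i], rf[j]      # load through Port A and Port B
        if reg_a > reg_b:                # single compare (the ALU comparator)
            rf[i], rf[j] = reg_b, reg_a  # write back with destinations switched
    return rf


print(sort_with_single_comparator([0.7, 0.1, 0.9, 0.3, 0.5, 0.2]))
```

Because the sequence of comparisons does not depend on the data, the same instruction stream sorts any six values, which is what allows the network to be stored as a short program instead of being built from 12 CS blocks.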


Fig. 20. Architecture for the six-data sorting network embedded in the PWL microprocessor

#### **5. Conclusion**

In this chapter, sorting networks have been addressed from the perspective of a physical CMOS realization, with applicability to VLSI design. The CS circuit, analyzed at the beginning of this chapter, was introduced as the fundamental cell from which more complex sorting topologies can emerge. It must be pointed out that, because the speed of the CS design is limited by the delay of the *n*-bit carry-out critical path and by the delay of the transmission gates, future research on this work should aim to achieve higher overall frequencies. The two examples provided, the median filter architecture and the PWL evaluation scheme, show how sorting networks can be included in these specific applications. In this sense, the following particular conclusions can be drawn from the examples. First, in the sorting network embedded in the median filter, the main advantage is its regular structure: although it is not optimal in the number of comparisons (21), several CS elements are executed in parallel. Second, the embedded sorting strategy in the PWL ASIP was chosen for the simplicity it affords the PWLR6-µp architecture (compared with other sorting algorithms such as bubble sort or quicksort, which are designed to sort bigger datasets); given the small size of the input, this strategy is efficient in terms of hardware resources and code length.

#### **6. Acknowledgment**

This work has been partially supported by Universidad Veracruzana and by the CB-SEP-CONACyT Project No. 102669 of Instituto Tecnológico Superior de Xalapa, México. Some partial research results from the PROMEP/103.5/09/4482 project of Universidad Veracruzana, México, and PICT 2003 No. 13468 of Universidad Nacional del Sur, Argentina, have been referred to in this chapter.

#### **7. References**

Agustin Rodriguez J., Lifschitz Omar D., Jimenez-Fernandez Victor M., Julian Pedro. (2011). "Application Specific Processor for Piecewise Linear Functions Computation". *IEEE Transactions on circuits and systems*, Vol. 58, No. 5, May 2011, pp. 971-981, 1459-8328

Batcher K. E. (1962). "Sorting networks and their applications", *Proceedings of AFIPS Spring Joint Computer Conference*, ISBN: n.d., April 1962

Chua L. & R. Ying. (1983). "Canonical piecewise-linear analysis," *IEEE Transactions on Circuits Systems*, Vol. 30, No. 3, Mar 1983, pp. 125–140, 0098-4094

Faundez Zanuy M. (2001). *"Digital voice and image processing with multimedia application"*, First Edition, Alfaomega-Marcombo, 97-88-42671244-8, Barcelona, Spain

Hammi O., S. Boumaiza, M. Jaidane-Saidane, and F. Ghannouchi. (2005). "Digital subband filtering predistorter architecture for wireless transmitters," *IEEE Transactions Microwave Theory Tech.*, Vol. 53, No. 5, May 2005, pp. 1643–1652, 0018-9480

Jimenez F. Victor, Martinez-Navarrete Denisse, Ventura-Arizmendi Carlos, Hernandez-Paxtian Zulma, Ramirez-Rodriguez Joel. (2011). *International Journal of Circuits, Systems and Signal Processing*, Vol. 5, No. 3, Apr. 2011, pp. 297-304, 1998-4464

Julian P., Desages A., and Agamennoni O. (1999). "High-level canonical piecewise linear representation using a simplicial partition," *IEEE Transactions on Circuits and Systems-I, Fundam. Theory Appl.*, Vol. 46, No. 4, Apr. 1999, pp. 463–480, 1057-7122

Kaddoum, G.; Roviras, D.; Charge, P.; Fournier-Prunaret, D. (2007). "Analytical calculation of BER in communication systems using a piecewise linear chaotic map". *Circuit Theory and Design, 2007. ECCTD 2007. 18th European Conference on*, 978-1-4244-1341-6, Toulouse, France, Aug. 2007

Kang S. M. and Yusuf Leblebici. (2003). *"CMOS Digital Integrated Circuits,"* Third Edition, Mc Graw Hill, 0-07-246053-9, New York, N.Y.

Knuth E. Donald. (1997). *"The Art of Computer Programming,"* Third Edition. Addison-Wesley, 1997. 0-201-89685-0, Massachusetts, USA

Pejovic, P. Maksimovic, D. (1995). "A new algorithm for simulation of power electronic systems using piecewise-linear device models", *IEEE Transactions on Power Electronics*, Vol. 10, No. 3, May 1995, pp. 340-348, 0885-8993

Pursley Bryan. (2008). Sorting Networks, *n.d.*, Aug 09, 2011, URL: http://brianpursley.com/Files/CSC204\_FinalProject\_BrianPursley.pdf

Rabaey J. M., Chandrakasan A. P., and Nikolic B. (2003). *"Digital Integrated Circuits: A Design Perspective,"* Second Edition, Prentice Hall Electronics and VLSI Series, Upper Saddle River, NJ: Prentice Hall/Pearson Education, 0-13-0909960-3, n.d.
