### **1. Introduction**

Although sorting networks have been extensively reported in the literature (Batcher, 1962), few references explain in detail their VLSI (Very Large Scale Integration) realization in CMOS (Complementary Metal-Oxide-Semiconductor) technology (Turan et al., 2003). From an algorithmic point of view, a sorting network is defined as a sequence of compare-and-interchange operations that depends only on the number of elements to be sorted. From a hardware perspective, sorting networks can be viewed as combinational circuits in which a set of compare-swap (CS) circuits is connected according to a specific network topology (Knuth, 1997). In this chapter, the design of sorting networks in CMOS technology, with applicability to VLSI design, is approached at the block, transistor, and layout levels. Special attention is given to the hierarchical structure of sorting schemes, in which the CS circuit constitutes the fundamental standard cell. The CS circuit is characterized through SPICE simulation, with particular emphasis on silicon area and delay time. To illustrate the use of sorting networks in specific applications, such as signal processing and nonlinear function evaluation, two previously reported integrated circuit designs are provided as examples (Agustin et al., 2011; Jimenez et al., 2011).
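The algorithmic definition above can be illustrated with a minimal Python sketch. The five-comparator, four-input network used here is a textbook example chosen for illustration, not a topology taken from this chapter, and the names `compare_swap`, `NETWORK_4`, and `sort_with_network` are hypothetical. The point it shows is that the list of CS operations is fixed in advance and never depends on the data values.

```python
from itertools import product

def compare_swap(a, b):
    """Ideal CS operator: returns (max, min), i.e. (top output, bottom output)."""
    return (a, b) if a >= b else (b, a)

# Illustrative 4-input network (an assumption, not the chapter's topology).
# Each pair (i, j) with i < j is one CS element between wire i (top) and wire j (bottom).
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

def sort_with_network(values, network=NETWORK_4):
    """Apply the fixed comparator sequence; the largest value ends on wire 0."""
    wires = list(values)
    for top, bottom in network:
        wires[top], wires[bottom] = compare_swap(wires[top], wires[bottom])
    return wires

def sorts_all_binary_inputs(network, n):
    """Check the network on every 0/1 input vector of length n."""
    for bits in product((0, 1), repeat=n):
        if sort_with_network(bits, network) != sorted(bits, reverse=True):
            return False
    return True

print(sort_with_network([3, 1, 4, 2]))
print(sorts_all_binary_inputs(NETWORK_4, 4))
```

Checking only 0/1 inputs relies on the zero-one principle for comparator networks: a network that sorts every binary input vector sorts arbitrary inputs.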

### **2. Compare-swap block design in CMOS technology**

In an algorithmic context, the CS element is conceived as an ideal operator, free of the delay that a real signal experiences when propagating through it. It can be seen as a simple two-input/two-output component capable of sorting any two numbers. The CS element takes in two numbers and simultaneously places the minimum at the bottom output and the maximum at the top output, performing a swap if necessary (Pursley, 2008). Figure 1 shows the typical Knuth diagram for a CS operator. In this pictorial representation, the horizontal lines at the input carry the two numbers to be sorted (A and B) and, at the output, max(A,B) and min(A,B) denote the maximum and minimum numbers, respectively. In turn, the vertical connector line represents the element dedicated to comparing and interchanging (swapping) data.

<sup>\*</sup> Ana D. Martínez<sup>1</sup>, Joel Ramírez<sup>1</sup>, Jesús S. Orea<sup>1</sup>, Omar Alba<sup>2</sup>, Pedro Julián<sup>3</sup>, Juan A. Rodríguez<sup>3</sup>, Osvaldo Agamennoni<sup>3</sup> and Omar D. Lifschitz<sup>3</sup>

*<sup>1</sup>Universidad Veracruzana/Facultad de Instrumentación Electrónica, México*

*<sup>2</sup>Instituto Tecnológico Superior de Xalapa/Departamento de Electrónica, México*

*<sup>3</sup>Universidad Nacional del Sur/Departamento de Ingeniería Eléctrica y de Computadoras, Argentina*

VLSI Design of Sorting Networks in CMOS Technology 95

Fig. 1. Knuth diagram for a compare-swap element

However, this is only a theoretical viewpoint: when the CS element is realized in silicon it is affected by parasitic elements, and each output exhibits a different delay. Because the CS circuit is the main structural element of a sorting network, its internal design is described in detail. In this section, the CS circuit design is covered at the transistor-schematic and layout levels; furthermore, area and delay time are estimated for a given 0.5 micron process technology.

#### **2.1 Design at transistor level for a compare-swap standard cell**

The CS element is a combinational circuit that accepts two binary signals (numbers) as input, compares their magnitudes, and outputs the maximum on the max(A,B) bus line and the minimum on the min(A,B) bus line. This block consists of one full-adder and two multiplexers, as shown in Fig. 2.

Fig. 2. Block level diagram for the CS circuit

Notice that, because one input is complemented, the full-adder is in fact configured as a subtractor. The most significant bit resulting from the subtraction, the carry out (Cout), drives the selector of the multiplexers. If a greater number is subtracted from a lesser one, the result is negative, which in binary terms is identified by Cout going high (logic "1"). When Cout is high, the data are swapped; otherwise, the input data are not interchanged.
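The selection mechanism can be sketched at word level in Python. The wiring is an assumption, since the text does not state which input is complemented: here A is inverted and the carry-in is tied to 0, so the adder forms (NOT A) + B and Cout is 1 exactly when B > A. The function name `compare_swap_word` and the `width` parameter are hypothetical, and extending the one-bit block to wider words is also an assumption.

```python
def compare_swap_word(a, b, width=4):
    """Word-level model of the CS block: adder carry-out drives two multiplexers."""
    mask = (1 << width) - 1
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("inputs must be unsigned and fit in `width` bits")
    # Assumed wiring: A is complemented and carry-in is 0, so the adder
    # computes (2**width - 1 - a) + b.
    total = ((~a) & mask) + b
    cout = (total >> width) & 1      # carry out of the most significant bit
    # Both 2:1 multiplexers share Cout as their selector.
    maximum = b if cout else a       # swap only when Cout is high
    minimum = a if cout else b
    return maximum, minimum, cout

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1), (9, 12), (12, 9)]:
    print(a, b, compare_swap_word(a, b))
```

With this wiring, a carry is generated only when b - a - 1 is non-negative, that is, when B is strictly greater than A, so equal inputs pass through without a swap.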

To translate the CS block diagram of Fig. 2 into a transistor-level circuit description, the two well-known standard cells for the full-adder and the multiplexer (Kang & Leblebici, 2003) are used. Figure 3 shows the transistor-level schematic of the one-bit full-adder and the one-bit multiplexer. The full-adder is composed of 24 MOS transistors connected in a CMOS configuration, where 12 PMOS transistors (M1…M12) belong to the pull-up network and 12 NMOS transistors (M13…M24) form the pull-down network. The multiplexer consists of two transmission gates (transistors M25-M28) and one NOT gate; it has two inputs (In0, In1), one selector (Sw), and one output (Out). The multiplexer output depends only on the Cout of the full-adder, since Cout is connected to the selector, which activates one pair of transistors. If the selector is low (logic "0"), the top transmission gate (M25 and M26) switches ON, so In0 becomes the output; otherwise the bottom transmission gate (M27 and M28) is ON, hence In1 is the output.
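The behaviour of the two cells can be mirrored by a small Boolean model in Python. This is only a logic-level abstraction of the schematic, not a transistor simulation: each transmission gate is modelled as an AND with the selector. The connection of the adder inputs (A complemented, carry-in tied to 0) and the assignment of A and B to In0/In1 of each multiplexer are assumptions, and the function names are hypothetical.

```python
def full_adder(x, y, cin):
    """One-bit full adder from the standard Boolean equations: returns (sum, cout)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout

def mux2(in0, in1, sw):
    """2:1 multiplexer: two transmission gates plus one inverter on the selector."""
    sw_n = 1 - sw                # NOT gate generates the complementary selector
    top_gate = in0 & sw_n        # top gate (M25, M26) conducts when Sw = 0
    bottom_gate = in1 & sw       # bottom gate (M27, M28) conducts when Sw = 1
    return top_gate | bottom_gate

def one_bit_cs(a, b):
    """One-bit CS block: full adder used as subtractor, Cout selects both muxes."""
    # Assumed wiring: x = NOT A, y = B, carry-in = 0.
    _, cout = full_adder(1 - a, b, 0)
    max_out = mux2(a, b, cout)   # In0 = A, In1 = B
    min_out = mux2(b, a, cout)   # In0 = B, In1 = A
    return max_out, min_out, cout

for a in (0, 1):
    for b in (0, 1):
        print(a, b, one_bit_cs(a, b))
```

Under these assumptions the selector is low for three of the four input combinations, so In0 is passed through and no swap takes place.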

Fig. 3. Transistor level schematic of the one-bit full-adder (on left) and one-bit multiplexer (on right)

#### **2.2 Design at layout level for a compare-swap standard cell**

Mask layouts of the CMOS full-adder and multiplexer circuits, using minimum-size transistors, are depicted in Fig. 4 and Fig. 5. It is important to point out that the (W/L) ratios of all the NMOS and PMOS transistors in this layout were chosen to optimize the transient performance of the circuit, specifically to balance the high-to-low and low-to-high propagation times. A (10λ/2λ) ratio was used for PMOS transistors and a (6λ/2λ) ratio for NMOS transistors. Since a 0.5 micron process technology is used, the physical dimension of lambda is λ = 0.35 microns. The layout follows the well-known "line of diffusion" style commonly used for standard cells in automated layout systems (Weste & Eshraghian, 1993). In this style, four horizontal strips can be identified: a metal ground rail at the bottom of the cell (GND or VSS), n-diffusion for all the NMOS transistors, an n-well with the corresponding p-diffusion for all the PMOS transistors, and a metal power rail at the top (VDD). A set of vertical polysilicon lines connects the transistor gates, while metal layers within the cell connect the transistors in accordance with the schematic diagram.
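As a quick arithmetic check, the lambda-based ratios can be converted to drawn dimensions with the stated λ = 0.35 microns. The short Python sketch below uses hypothetical names (`LAMBDA_UM`, `to_microns`) and contains only the values given in the text.

```python
LAMBDA_UM = 0.35  # lambda in microns, as stated for this 0.5 micron process

def to_microns(lambdas, lam=LAMBDA_UM):
    """Convert a dimension expressed in lambda units to microns."""
    return lambdas * lam

# (W/L) ratios from the text: PMOS 10 lambda / 2 lambda, NMOS 6 lambda / 2 lambda.
PMOS_W, PMOS_L = to_microns(10), to_microns(2)
NMOS_W, NMOS_L = to_microns(6), to_microns(2)

print(f"PMOS W/L = {PMOS_W:.2f} um / {PMOS_L:.2f} um (ratio {PMOS_W / PMOS_L:.1f})")
print(f"NMOS W/L = {NMOS_W:.2f} um / {NMOS_L:.2f} um (ratio {NMOS_W / NMOS_L:.1f})")
```

This gives a drawn PMOS of about 3.5 by 0.7 microns and a drawn NMOS of about 2.1 by 0.7 microns, so the PMOS devices are 10/6 times wider than the NMOS devices.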


time (difference between the input transition at 50% and the 50% output level). The simulated output voltage obtained for the one-bit CS circuit is shown in Fig. 6. In this simulation, a supply voltage (VDD) of 5 V and a frequency of 5 MHz are considered. Hereafter, the simpler notation 0 or 1 is used instead of logic "0" or logic "1". The SPICE simulation yields the outputs MAX(A,B)={0,1,1,1} and MIN(A,B)={0,0,0,1} when the inputs are A={0,0,1,1} and B={0,1,0,1}. Notice that the signal CARRY\_OUT (Cout) is high only when A=0 and B=1 (the only case where a swap is needed).
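The 50%-to-50% delay definition mentioned above can be sketched as a small post-processing routine in Python. The waveform samples below are synthetic values invented for illustration, not results from the chapter's SPICE runs, and the function names are hypothetical. Only the first crossing of each waveform is considered, which is sufficient for a single input edge.

```python
def crossing_time(times, volts, level):
    """Time of the first crossing of `level`, using linear interpolation."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        # A crossing lies in this interval if the level is between v0 and v1.
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")

def propagation_delay(times, v_in, v_out, vdd=5.0):
    """Delay between the 50% point of the input and the 50% point of the output."""
    half = 0.5 * vdd
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)

# Synthetic example (assumed data): time in ns, voltages in V, VDD = 5 V.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in = [0.0, 0.0, 5.0, 5.0, 5.0]    # input reaches 2.5 V at t = 1.5 ns
v_out = [0.0, 0.0, 0.0, 5.0, 5.0]   # output reaches 2.5 V at t = 2.5 ns

print(propagation_delay(t, v_in, v_out))
```

For a falling output the same routine applies, since the crossing test does not depend on the direction of the transition.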

Fig. 6. Simulated output voltage obtained for the one-bit CS circuit

Fig. 7. Worst-case delay time for the one-bit CS circuit

Fig. 5. Mask layout of the one-bit multiplexer circuit
