wire usage. Clock load placement should be done in such a way that the clock capacitance is lowered, which results in lower dynamic power consumption.

Placement and routing (P&R) on the chip also affects the dynamic power consumption because it determines the total parasitic capacitance in the design. To minimize the parasitic capacitance, it is essential to optimize the P&R strategy. It is always advisable to place two connected functional instances close together because this reduces the interconnect wire length, which in turn reduces the capacitive loading of the net and leads to dynamic power reduction. Modern FPGA development software typically supports power‐driven layout to accomplish this task automatically. Power‐driven layout tools examine the connections between functional instances for optimization [68–70]. Power‐analysis tools are used to further optimize the power saving; they examine each subcomponent in a design hierarchy to highlight its power consumption. Careful examination of this information and subsequent manipulation of the design can result in significant power savings.

Reducing the power supply of the I/O can save up to 80% of dynamic power. The switching activity of the I/O can be controlled by using techniques such as time multiplexing, design partitioning for minimum I/O count [71–73], and reducing I/O drive strength/slew rates. A considerable amount of dynamic power can be saved by adopting differential and resistively terminated I/O standards for the highest toggling frequencies and single‐ended I/O standards for low toggling frequencies.

Tsang et al. [74] have studied the effectiveness of employing precomputation in reducing dynamic power consumption in commercial off‐the‐shelf (COTS) FPGAs. Precomputation is a high‐level logic optimization technique that lowers the power consumption of a design by disabling part of the circuit based on a few relatively simple precomputation conditions. With careful design considerations and increased logic utilization, its associated power consumption can be reduced by disabling a much larger part of the design with a negligible increase in resource overhead.

228 Field - Programmable Gate Array

**Figure 2.** Proposed new programmable low‐power FPGA routing switch [65].

**5. SRAM power reduction**

The design of low‐power, high‐performance SRAM cells has become a necessity in today's FPGAs because SRAM is a critical component of FPGA design. Although an SRAM‐based FPGA occupies a larger area on the chip, one of the most useful SRAM‐based structures is the lookup table (LUT).

SRAM‐based FPGAs, such as those manufactured by Xilinx and Altera, comprise the largest fraction of the overall market. These FPGAs utilize SRAM for routing and programmability, typically through the use of LUTs and multiplexers. Due to the large number of cells within SRAM FPGA interconnects, a considerable leakage current (on the order of milliamps) flows at standby [78]. Moreover, leakage current increases as process geometry shrinks, which further exacerbates the power problem. The dynamic power consumption in the cell is also a serious concern because the large parasitic capacitance (due to the long metallic bitline) results in larger charging/discharging activity at the bitline. Studies of the leakage current and dynamic power in the Xilinx Spartan‐3 FPGA [79] (**Figure 3**) and the Xilinx Virtex‐4 [80] (**Figure 4**) show that the major contributor to power consumption in an FPGA is the configurable SRAM; hence, new design techniques become essential to increase battery lifetime.

Several techniques have been proposed in the literature [81–85] to address the power consumption problem in the SRAM cell. It is worthwhile to disable SRAM devices that are temporarily unused, which avoids power consumption by unused components. A system controller can deactivate a device when it is not required in the current operation, or put the device in its sleep mode when it will not be accessed for an extended period of time. Implementing such a system controller in the FPGA reduces the overall switching activity of the system. As discussed by Tuan et al. [86], the data of a configurable SRAM cell change only when the FPGA is configured, and the FPGA is configured only when the power supply is turned on. Therefore, it is necessary to control the leakage current in the cell during the idle phase to save overall power.
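The system‐controller policy described above can be illustrated with a minimal behavioral sketch. It is not taken from any of the cited works: the state names, the function `next_power_state`, the `expected_idle_cycles` input, and the `sleep_threshold` value are all hypothetical and chosen only to show the decision between keeping a device active, deactivating it, and putting it to sleep.

```python
from enum import Enum


class PowerState(Enum):
    ACTIVE = "active"            # device is needed in the current operation
    DEACTIVATED = "deactivated"  # not needed right now, may be needed soon
    SLEEP = "sleep"              # not expected to be accessed for a long time


def next_power_state(needed_now, expected_idle_cycles, sleep_threshold=1000):
    """Pick a power state for one SRAM device.

    sleep_threshold is an assumed tuning parameter (in clock cycles),
    not a value from the chapter.
    """
    if expected_idle_cycles < 0:
        raise ValueError("expected_idle_cycles must be non-negative")
    if needed_now:
        return PowerState.ACTIVE
    if expected_idle_cycles >= sleep_threshold:
        return PowerState.SLEEP
    return PowerState.DEACTIVATED
```

In a real controller the idle estimate would come from the application schedule, and the threshold would have to account for the time and energy needed to wake the device.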

Wang et al. [87] have proposed an ultra‐low‐voltage 9T SRAM cell. The cell consists of a 6T SRAM part (for the write operation) and a dedicated read port. The read port comprises three NMOS transistors that equalize bitline leakage and improve the bitline sensing margin on a single‐ended read bitline (RBL). The write access paths and the data storage latch are implemented with high‐threshold‐voltage (HVT) devices for leakage reduction, while the read port employs low‐threshold‐voltage (LVT) devices for better performance. Their test chip shows a 40% improvement in energy efficiency, with a minimum energy per operation of 2.07 pJ at 0.4 V. The drawback of this design is increased fabrication complexity due to the use of both LVT and HVT transistors.

Although much research has been done on power‐efficient SRAM circuits, interest in power‐efficient cell design at the architecture level continues to increase because configurable SRAM cells and their circuitry occupy a considerable fraction of the total chip area in an FPGA design. Ye et al. [13] have observed that more than 40% of the total FPGA logic block area is occupied by SRAM cells. Such a large area overhead results in longer wires, which leads to larger parasitic capacitance at the load. This increased capacitance increases the dynamic power consumption. The most widely used and well‐accepted SRAM cell is the 6T cell [88] (shown in **Figure 5**) due to its symmetric structure and larger data storage capacity. The cell has two cross‐coupled inverters that form a latch to keep the programmed data intact. Two pass transistors are used to transfer data from the bitline to the cell node (write operation) or from the cell node to the bitline (read operation). The actual control of the FPGA is handled by the Q and Qbar outputs. The main drawbacks of the conventional 6T cell are poor stability, large power consumption, and degraded performance.
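The link between wire length, parasitic capacitance, and dynamic power noted above follows from the standard first‐order expression for CMOS switching power. The symbols are introduced here for illustration and are not defined in the chapter: $\alpha$ is the switching activity factor, $C_L$ the total load capacitance (including wire parasitics), $V_{dd}$ the supply voltage, and $f$ the clock frequency.

$$P_{\text{dyn}} = \alpha \, C_L \, V_{dd}^{2} \, f$$

Under this model, dynamic power scales linearly with $C_L$, so any increase in wire capacitance caused by the SRAM area overhead translates directly into higher dynamic power at a fixed supply voltage and frequency.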

**Figure 3.** Leakage power breakdown in Xilinx Spartan‐3 [79].

**Figure 4.** Dynamic power breakdown in Xilinx Virtex‐4 [80].

**Figure 5.** Architecture of the conventional 6T SRAM cell [88].

#### **5.1. Subthreshold SRAM cell**


Subthreshold operation is achieved when the device is allowed to operate at a power supply voltage (*V*dd) lower than its threshold voltage. Using this concept, researchers [89–94] have proposed subthreshold SRAM cells to reduce the overall power consumption in the cell. Teman et al. [95] have designed a robust, low‐voltage SRAM bit cell that uses five transistors instead of the six in the standard 6T circuit. Their cell can operate at voltages as low as 400 mV in a commercial 40 nm CMOS process. At this supply voltage, the proposed bit cell provides 6*σ* stability and an average static power reduction of 21× compared to the 6T cell. The main drawback of the circuit is its extra processing complexity due to the use of HVT and SVT transistors.
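For orientation, the current that flows in this regime is usually described by the textbook first‐order subthreshold model below. It is not taken from the cited works, and all symbols are assumptions introduced here: $I_0$ is a process‐ and geometry‐dependent prefactor, $n$ the subthreshold slope factor, $V_{th}$ the threshold voltage, and $V_T$ the thermal voltage.

$$I_{\text{sub}} \approx I_0 \, \exp\!\left(\frac{V_{GS} - V_{th}}{n V_T}\right)\left(1 - \exp\!\left(-\frac{V_{DS}}{V_T}\right)\right), \qquad V_T = \frac{kT}{q}$$

The exponential dependence on $V_{GS} - V_{th}$ is what makes subthreshold cells both very low power and very sensitive to threshold‐voltage variation, which is why stability figures such as 6*σ* are quoted alongside the power savings.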

Calhoun et al. [90] have proposed a 10T subthreshold bitcell (**Figure 6**). Transistors M1 through M6 form a conventional 6T cell, except that the sources of M3 and M6 are tied to a virtual supply voltage rail (VVDD). The proposed cell has distinct read and write ports to improve the stability of the cell. Transistors M7–M10 remove the read static noise margin (SNM) problem by buffering the stored data during the read operation; M10 is included mainly to control the leakage current. Eliminating the read SNM problem allows this bitcell to operate at half the *V*dd of a 6T cell while retaining the same 6*σ* stability. Their experimental results show that the proposed cell saves 2.5× and 3.8× leakage power at *V*dd = 0.6 V and *V*dd = 0.4 V, respectively, at room temperature. The saving is larger (60×) when the power supply is scaled down to 0.3 V.

Jiangzheng et al. [96] have proposed a 10T SRAM that employs voltage‐lowering techniques to effectively control the leakage current when the cell operates in the subthreshold region. The proposed circuit generates a subthreshold read pulse for transferring data out of the SRAM. The floating write bitlines minimize write bitline leakage at the cost of degraded stability. Short read bitlines improve read speed and suppress read power at the cost of area overhead.

**Figure 6.** Architecture of 10‐T subthreshold bitcell [90].

Kushwah et al. [97] have proposed a single‐ended dynamic feedback control 8T SRAM cell to enhance the static noise margin (SNM) at ultralow supply voltages. It achieves a write SNM 1.4× and 1.28× that of the isoarea 6T and read‐decoupled 8T (RD‐8T) cells, respectively, at 300 mV. The standard deviation of the write SNM for the 8T cell is reduced to 0.4× and 0.56× that of the 6T and RD‐8T cells, respectively. The proposed 8T cell consumes about 0.6× less write power and 0.48× less read power than the 6T cell.

#### **5.2. Data‐aware power‐efficient SRAM cell**

The main drawbacks of subthreshold cells are poor stability and degraded performance. Besides cell leakage, bitline leakage is another dominant factor in power consumption, and the overall bitline power consumption is data dependent. Many data‐aware cells have been reported in the literature to control bitline power consumption [98–102]. Chiu et al. [103] have proposed an 8T single‐ended subthreshold SRAM with a cross‐point data‐aware write operation. In this circuit, the write operation is performed by a traditional write circuit as in the 6T cell, whereas a 2T stacked read buffer is used for the read operation. The stacked read circuit controls the leakage current and improves stability, and the data‐aware cross‐point write operation improves writeability. The main drawback of the circuit is the large voltage swing on the bitline during the write operation.

Chang et al. [104] proposed a 130 mV SRAM with expanded write and read margins for subthreshold applications to reduce the voltage swing on the respective bitlines during the write operation. They use two separate signals, SCR and SCL, to perform the write operation. Properly selected values of these two signals control the write power consumption by reducing the discharging activity at the bitline. The isolated read circuit improves the stability of the cell at the cost of large parasitic capacitance and the resource burden of two extra signals.

Singh et al. [105] have designed a data‐aware dynamic 9T SRAM cell to reduce bitline power consumption. The dynamic nature of the cell flips the data faster at the bitline so that the average discharging activity is reduced. The cell contains nine transistors with isolated read and write circuits. The write operation is performed using a write signal WS, whose value is chosen based on the write operation. The simulation results predict 47% lower write power consumption compared to the 6T cell. They also observed that the power saving varies from 42.45% to 61.3% when no peripheral devices are included in the array during hold mode, because of lower leakage current from the write bitlines and lower discharging activity at the RBL. The cell imposes a hardware and wiring burden due to the extra signal.

Wen et al. [106] have proposed a bit‐interleaving‐enabled 8T SRAM architecture. The proposed cell features a shared data‐aware write structure and eliminates the half‐select disturbance. In their design, shared write and separated read behaviors are implemented by activating horizontal cells and vertical bitlines instead of enabling blocks. They also proposed a reference‐based sense amplifier (SA) to coordinate with the column‐selection array and further optimize the area efficiency. The proposed SRAM operates at a frequency of 125 kHz and consumes a total power of 5.1 μW.
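To put the last two figures in perspective, they can be combined into an energy per clock cycle. This is a derived back‐of‐the‐envelope value, not a number reported in [106], and it assumes that the 5.1 μW figure is the average total power at the 125 kHz operating frequency:

$$E_{\text{cycle}} = \frac{P}{f} = \frac{5.1\ \mu\text{W}}{125\ \text{kHz}} = 40.8\ \text{pJ}$$

Because this covers the whole array and its periphery, it is not directly comparable with per‐operation energies quoted for individual cells or test chips elsewhere in this section.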

#### **5.3. Data‐dependent‐write‐assist dynamic (DDWAD) SRAM cell**


Recently, we have designed a power‐efficient SRAM cell [107] that uses the dynamic data‐aware concept for the write operation and the stack effect to control the read leakage current. The architecture of the cell is shown in **Figure 7(a)**. The cell has distinct read and write ports with a single bitline to improve the overall stability of the cell. To flip the data at the storage node faster, without waiting for the bitline BL to charge or discharge completely, we have introduced a write signal WS and broken the latch of the cell (since WL = high). To control the leakage current in the read circuit during the write operation and hold mode, a stack technique (three series‐connected OFF transistors in the read path) is used at the cost of increased delay. The write signal WS is generated according to the data to be stored at Q and Qbar with the help of the circuit shown in **Figure 7(b)** [107]. During read and hold modes, WS maintains its previous value and the latch nature of the cell is restored to keep the stored data intact.

The proposed cell and other cells were simulated at layout level using Cadence 6.1 CMOS design rules for 65 nm technology. The large write power saving (**Figure 8**) is due to the absence of discharging activity at the bitline BL, which sees a highly resistive path (NM1 turns OFF because WS = 0 in a write 1 operation). Similarly, for WS = high, the OFF transistor PM1 does not allow any current to flow between *V*dd and ground, which causes a low voltage at the storage node Q. In both write operations, the small voltage drop at BL results in considerable dynamic power saving. Because transistors NM4 and NM6 are OFF in the read path (since RWL = 0 during the write operation), the leakage current through RBL is restricted.
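The logical behavior of the write signal described above (WS = 0 for a write 1, WS = high for a write 0, and WS held during read and hold) can be summarized in a small behavioral model. This is only a sketch of the signal's logic levels, not of the transistor‐level circuit in Figure 7(b); the class name `WriteSignalLatch` and the `write_enable` input are hypothetical names introduced for illustration.

```python
class WriteSignalLatch:
    """Behavioral model of WS: data dependent on a write, held otherwise."""

    def __init__(self, initial_ws=0):
        # Assumed power-up value; the chapter does not specify one.
        self.ws = initial_ws

    def update(self, write_enable, data):
        """Return WS after one operation.

        write_enable: True for a write, False for read or hold.
        data: bit to be stored at Q (only used when writing).
        """
        if data not in (0, 1):
            raise ValueError("data must be 0 or 1")
        if write_enable:
            # WS = 0 for a write 1, WS = 1 for a write 0.
            self.ws = 1 - data
        # During read and hold, WS keeps its previous value.
        return self.ws
```

For example, a write 1 followed by a hold and then a write 0 yields WS values of 0, 0, and 1.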

Due to the forbidden discharging of the precharged RBL during a read 1 operation and the stack effect in the read path, a considerable power saving is achieved compared to the conventional 6T cell (**Figure 9**). In hold mode, WS maintains its value due to the internal latch. The static power consumption of the proposed cell is lower than that of the 6T cell and other cells proposed in the literature, irrespective of the power supply (**Figure 10**). The lower static power of the proposed cell [107] is due to the lower leakage current through the write bitline BL and the stack effect in the read circuit. During simulation, we observed that the proposed cell shows only a nominal variation in static power consumption with temperature, which reflects the robustness of the cell against temperature. The data at the storage nodes are strongly maintained at their respective values over a power supply range of 300 mV ≤ *V*ddmin ≤ 400 mV. The proposed cell shows greater immunity to statistical variation due to the signal WS, as discussed in detail in our published paper [107].

**Figure 7.** (a) Architecture of DDWAD SRAM cell. (b) Circuit to generate appropriate WS signal depending on write operation [107].

**Figure 8.** Total power consumption in data aware cell [107].

**Figure 9.** Read power consumption [107].

**Figure 10.** Hold leakage power at various power supplies [107].


Although the proposed cell imposes an area overhead compared to the conventional 6T cell, this is not a serious concern in FPGA implementation: because of the lower leakage current through the bitline, a larger number of cells can be connected to a single bitline in the array.

In an SRAM‐based FPGA, memory accesses are performed with a designed clock and a series of interface circuits such as the row/column decoders and the write/read enable circuits. These peripheral circuits consume considerable power in the chip. To implement an array using the proposed cell, we have adopted a hierarchical design approach in which, instead of giving individual signals (WS, WL, and RWL) to each cell, global signal circuits are used [108]. The main advantage of the hierarchical design is the use of shorter wires within local blocks, which reduces parasitic capacitances. In this approach, only one block address can be activated at a time, which saves considerable power. Each global signal is connected to the corresponding local signal through an NMOS pass transistor to save area. A column‐based approach is adopted in which the signal WS is routed parallel to the write bitline BL. To avoid the column half‐select disturbance in the array due to toggling of the signal WS during the write operation, we proposed the circuit shown in **Figure 7(b)** [107].
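The hierarchical scheme described above can be sketched at the logic level as a one‐hot block select that gates the global signals onto local lines. The block and row counts, the address split, and the function names below are hypothetical and are not taken from [108]; the NMOS pass transistor is idealized as a simple gate that forwards the global value when its block is selected and gives 0 otherwise.

```python
def decode_address(address, rows_per_block=32, num_blocks=8):
    """Split a row address into a one-hot block select and a local row.

    rows_per_block and num_blocks are assumed array dimensions.
    """
    if not 0 <= address < rows_per_block * num_blocks:
        raise ValueError("address out of range")
    block, local_row = divmod(address, rows_per_block)
    # Exactly one block is active at a time.
    block_select = [int(i == block) for i in range(num_blocks)]
    return block_select, local_row


def local_signals(global_signals, block_select):
    """Gate each global signal (e.g. WS, WL, RWL) onto every block's local line.

    Idealized pass transistor: local = global if the block is selected, else 0.
    """
    return [
        {name: (value if selected else 0) for name, value in global_signals.items()}
        for selected in block_select
    ]
```

With these assumed dimensions, address 70 selects block 2 and local row 6, and only that block's local WS, WL, and RWL lines follow the global signals.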

#### **5.4. Proposed decoder circuits and sense amplifier**

The most important signals that affect power dissipation in an SRAM memory are the address lines, the read and write enable circuits, the block select, and the sense amplifier. To address these concerns, we have designed new architectures for these circuits to reduce power consumption. Details of these circuits are available in our published work [108, 109].

The proposed column decoder circuit is shown in **Figure 11** [108], where *C*Lj represents the address of the column to be selected (*j* is an integer). The architecture of the other decoder circuits is explained in Ref. [108]. Since the proposed decoder is implemented without the NAND gates used in the conventional decoder, the number of transistors is reduced to 546, compared to 1939 in the conventional decoder [108]. The reduced number of transistors results in lower parasitic capacitance, which leads to approximately 76% power saving [108]. The proposed WL driver consumes less power than other designs due to the compactness of the circuit.
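As a quick check of the transistor counts quoted above, the relative reduction can be computed directly. The helper name `relative_reduction` is introduced here for illustration only; the two counts are the ones given in the text.

```python
def relative_reduction(proposed, conventional):
    """Fractional reduction of `proposed` relative to `conventional`."""
    if conventional <= 0:
        raise ValueError("conventional count must be positive")
    return 1 - proposed / conventional


# Transistor counts quoted in the text for the two decoders.
reduction = relative_reduction(546, 1939)
print(f"Transistor count reduction: {reduction:.1%}")
```

The transistor count falls by roughly 72%, which is of the same order as, but not identical to, the approximately 76% power saving reported in [108]; the two figures measure different quantities.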

As is well known, most of the current in the SRAM is dissipated by the sense amplifier. To address this issue, we have also designed a single‐ended sense amplifier (SA) [109]. The proposed SA reduces power consumption by controlling the leakage current during the evaluation/precharge mode. The circuit can be used even at higher temperatures with minimum power consumption. The operation of the circuit is explained in detail in Ref. [109].

**Table 1** compares the read power consumption of various sense amplifiers. The lower power consumption of the proposed circuit is mainly due to the lower average current during the evaluation mode, the small voltage drop on RBL, and the lower leakage current compared to other circuits [110, 111]. During hold mode, the power consumption of the proposed circuit is lower than that of the other circuits [110, 111] due to the gating effect.

We have implemented a 32 kb SRAM array using the proposed cell, decoder circuits, and sense amplifier. The simulation results were compared with the array of Ref. [112]. The results are encouraging in terms of power consumption, as seen in **Figures 12** and **13**. The lower hold power obtained in the implemented array is due to the write signal WS and the stack effect in the read path.

Power Efficient Data-Aware SRAM Cell for SRAM-Based FPGA Architecture http://dx.doi.org/10.5772/67257 237

**Figure 11.** Proposed decoder [108].



**Table 1.** Read power consumption in various SA [109].

The overall reduction in dynamic and static power in the proposed cell, decoder, and sense amplifier makes them an ideal choice for implementing power‐efficient and reliable SRAM‐based FPGAs.

**Figure 12.** Write power consumption in 32 kb SRAM array.

**Figure 13.** Read power consumption in 32 kb array.
