### **3. Architecture of the normalized correlator module**

The block diagram of the normalized correlator module is shown in **Figure 4**. The module performs filtering, block energy computation, correlation computation, detection, and buffering. Filtering is the preprocessing step for spike detection; it suppresses the DC offset and reduces noise. The block energy computation finds $\|\mathbf{x}_m\|^2$, which the subsequent correlation computation uses to calculate the normalized correlation. The detection results are then produced by comparison operations. The detected spikes are stored in the switch buffer, which the OSORT module accesses for subsequent clustering operations. Without loss of generality, the spike length is set to *N* = 64 throughout this discussion.
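
As a point of reference for the hardware stages, the detection chain can be modeled directly in floating point: form each block, compute its energy, compute the normalized correlation with each template, and compare against a threshold. The sketch below is such a golden model, not the authors' implementation. It assumes the block layout $\mathbf{x}_m = (x[m], x[m-1], \ldots, x[m-N+1])$, templates stored in the same sample order with nonzero energy, and a threshold `eta`; the function name `detect_reference` is hypothetical, and no neighboring-block filtering is applied.

```python
import math
from typing import List, Sequence


def detect_reference(x: Sequence[float],
                     templates: Sequence[Sequence[float]],
                     eta: float, n: int = 64) -> List[int]:
    """Direct floating-point model of normalized-correlation detection.

    Returns every index m whose block x_m has a normalized correlation
    above eta with at least one template (no k-out-of-K filtering).
    """
    hits = []
    for m in range(n - 1, len(x)):
        # Assumed block layout: x_m = (x[m], x[m-1], ..., x[m-n+1]).
        block = [x[m - k] for k in range(n)]
        energy = sum(v * v for v in block)          # ||x_m||^2
        if energy == 0:
            continue                                 # all-zero block: cannot normalize
        for t in templates:
            t_norm = math.sqrt(sum(v * v for v in t))
            dot = sum(b * v for b, v in zip(block, t))
            rho = dot / (math.sqrt(energy) * t_norm)  # normalized correlation
            if rho > eta:                            # comparison against the threshold
                hits.append(m)
                break                                # at most one hit per block
    return hits
```

For example, with `x = [0, 0, 0, 1, 2, 3, 4, 0, 0, 0]`, a single template `[4, 3, 2, 1]` and `n=4`, a threshold of 0.99 yields hits at `[6]`, while 0.95 yields `[5, 6]`: one pulse triggers two overlapping blocks, which is the multiple-hit effect addressed by the threshold unit.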

**Figure 4.** The block diagram of the normalized correlator module.

#### **3.1. Filter unit and block energy computation unit**

The filter unit is a hardware implementation of a band-pass Butterworth filter. It contains multipliers, shift registers, and adders, and is a straightforward direct form I realization of an IIR filter. For the block energy computation unit, we first note that a basic approach requires *N* multiplications for the energy computation, resulting in high area cost. The proposed approach is based on the fact that

$$\left\|\mathbf{x}_{m}\right\|^{2} = \left\|\mathbf{x}_{m-1}\right\|^{2} + x^{2}[m] - x^{2}[m-N]. \tag{6}$$
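
Eq. (6) can be checked directly from the definition of the block energy. Assuming the block layout $\mathbf{x}_m = [x[m], x[m-1], \ldots, x[m-N+1]]^T$, which is consistent with Eq. (6) and with a shift register holding $x[m-1], \ldots, x[m-N]$, we have $\|\mathbf{x}_{m-1}\|^2 = \sum_{k=1}^{N} x^2[m-k]$, and therefore

$$
\begin{aligned}
\|\mathbf{x}_m\|^2 &= \sum_{k=0}^{N-1} x^2[m-k] = x^2[m] + \sum_{k=1}^{N} x^2[m-k] - x^2[m-N] \\
&= \|\mathbf{x}_{m-1}\|^2 + x^2[m] - x^2[m-N].
\end{aligned}
$$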

Consequently, the calculation of $\|\mathbf{x}_m\|^2$ needs only two multiplications, because $\|\mathbf{x}_{m-1}\|^2$ (i.e., the energy of the previous block) is already available. **Figure 5** shows the resulting design, which contains two multipliers, a single *N*-stage shift register, and two adders. The shift register stores the previous samples *x*[*k*], *k* = *m* − 1, …, *m* − *N*, of *x*[*m*], and therefore supplies the sample *x*[*m* − *N*] for the calculation of *x*²[*m* − *N*]. The same shift register is also used for the correlation computation.
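
The recursive update of Eq. (6) can be sketched in software as follows. This is a minimal model under stated assumptions: a `deque` with a fixed `maxlen` stands in for the *N*-stage shift register, samples are integers (as in a fixed-point datapath), and the window starts filled with zeros. The class name `BlockEnergy` and method `push` are hypothetical.

```python
from collections import deque


class BlockEnergy:
    """Running block energy ||x_m||^2 updated with Eq. (6)."""

    def __init__(self, n: int = 64) -> None:
        # Models the N-stage shift register; index 0 is x[m-1], index -1 is x[m-N].
        self.shift_reg = deque([0] * n, maxlen=n)
        self.energy = 0                      # ||x_{m-1}||^2, initially zero

    def push(self, sample: int) -> int:
        """Accept x[m] and return ||x_m||^2 using two multiplications."""
        oldest = self.shift_reg[-1]          # x[m-N], leaving the window
        self.energy += sample * sample - oldest * oldest
        self.shift_reg.appendleft(sample)    # maxlen discards x[m-N] on the right
        return self.energy
```

For instance, with `n=3` the inputs `1, 2, 3, 4` produce energies `1, 5, 14, 29`; the last value is 2² + 3² + 4², after 1² has dropped out of the window. With integer samples the running sum never drifts; a floating-point version could accumulate rounding error over long recordings.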

**Figure 5.** Architecture of block energy computation unit.
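
The filter unit described at the start of this subsection is a direct form I IIR realization, which keeps separate delay lines for past inputs and past outputs. The sketch below models that structure in floating point with generic coefficient vectors `b` (feed-forward) and `a` (feedback). The band-pass Butterworth order, band edges, and fixed-point word lengths of the hardware are not given in this section; coefficients could be designed offline, for example with `scipy.signal.butter(order, [f_lo, f_hi], btype='bandpass', fs=fs)`, where all four arguments are placeholders. The function name is hypothetical.

```python
from typing import List, Sequence


def iir_direct_form_1(b: Sequence[float], a: Sequence[float],
                      x: Sequence[float]) -> List[float]:
    """Direct form I IIR filter:
    a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    x_hist = [0.0] * len(b)          # x[n], x[n-1], ...: feed-forward delay line
    y_hist = [0.0] * (len(a) - 1)    # y[n-1], y[n-2], ...: feedback delay line
    out = []
    for sample in x:
        x_hist = [sample] + x_hist[:-1]                       # shift in new input
        acc = sum(bk * xk for bk, xk in zip(b, x_hist))       # feed-forward MACs
        acc -= sum(ak * yk for ak, yk in zip(a[1:], y_hist))  # feedback MACs
        y = acc / a[0]
        if y_hist:
            y_hist = [y] + y_hist[:-1]                        # shift in new output
        out.append(y)
    return out
```

As a quick check, `b=[1.0]`, `a=[1.0, -0.5]` applied to an impulse gives the geometric response `1.0, 0.5, 0.25, 0.125`. Note that `scipy.signal.lfilter` uses a transposed direct form II structure, so its intermediate states differ from this direct form I model even when the outputs agree.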

#### **3.2. Correlator unit**

8 Field - Programmable Gate Array

The goal of this unit is to compute the normalized correlation between the block $\mathbf{x}_m$ and each normalized template. The normalized templates can be obtained offline from the OSORT circuit, so only the normalized block $\mathbf{x}_m/\|\mathbf{x}_m\|$ needs to be found online. A simple approach is to divide each sample of $\mathbf{x}_m$ by $\|\mathbf{x}_m\|$; because $\mathbf{x}_m$ contains *N* samples, *N* dividers are required. The proposed architecture instead employs a novel postnormalization approach, in which the inner product between $\mathbf{x}_m$ and the normalized template is computed first. Because this inner product is a scalar, a single divider suffices: the normalized correlation is obtained by dividing the inner product by $\|\mathbf{x}_m\|$.
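
The saving from postnormalization can be illustrated by computing the same normalized correlation both ways and counting divisions. This is a floating-point sketch, assuming a template already normalized to unit length offline (`t_hat`) and a nonzero block; both function names are hypothetical, and the second element of each returned tuple is the number of divisions performed.

```python
import math
from typing import Sequence, Tuple


def prenormalized_correlation(block: Sequence[float],
                              t_hat: Sequence[float]) -> Tuple[float, int]:
    """Normalize every sample of x_m first: N divisions per block."""
    norm = math.sqrt(sum(v * v for v in block))
    x_hat = [v / norm for v in block]                # N divisions
    return sum(a * b for a, b in zip(x_hat, t_hat)), len(block)


def postnormalized_correlation(block: Sequence[float],
                               t_hat: Sequence[float]) -> Tuple[float, int]:
    """Compute the scalar inner product first, then divide once by ||x_m||."""
    norm = math.sqrt(sum(v * v for v in block))
    dot = sum(a * b for a, b in zip(block, t_hat))   # scalar inner product
    return dot / norm, 1                             # a single division
```

Both functions return the same correlation up to floating-point rounding, but the first performs *N* divisions per block and the second only one, which is the property the postnormalization approach exploits.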

**Figure 6** shows the architecture of the correlator unit for the case of two templates. The circuit consists of 2*N* multipliers, one square root circuit, two accumulators, and one divider. In addition, two registers store the first and second normalized templates. Recall that the shift register in the block energy computation unit holds the samples of $\mathbf{x}_m$. Based on $\mathbf{x}_m$ and the two normalized templates, the two normalized correlations are computed in parallel, and the multiplication results are accumulated in a pipelined fashion. The accumulation results are then scaled by the factor $1/\|\mathbf{x}_m\|$. Because the block energy computation unit already provides $\|\mathbf{x}_m\|^2$, only a square root circuit and an inverse circuit are needed to calculate $1/\|\mathbf{x}_m\|$, as shown in **Figure 6**.
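
The dataflow of the two-template correlator can be modeled as below. This is a behavioral sketch, not a cycle-accurate model: in hardware the 2*N* products are formed in parallel and accumulated in a pipeline, whereas here they are summed in a loop. It assumes the block energy $\|\mathbf{x}_m\|^2$ arrives from the block energy unit as `energy`, that both templates are pre-normalized, and, as an assumption of this sketch, that an all-zero block reports a correlation of 0. The function name `correlator_unit` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def correlator_unit(block: Sequence[float], energy: float,
                    t1_hat: Sequence[float],
                    t2_hat: Sequence[float]) -> Tuple[float, float]:
    """Behavioral model of a two-template correlator with postnormalization."""
    acc1 = 0.0                            # accumulator for template 1
    acc2 = 0.0                            # accumulator for template 2
    for xk, a, b in zip(block, t1_hat, t2_hat):
        acc1 += xk * a                    # N products for template 1
        acc2 += xk * b                    # N products for template 2
    # One square root and one inverse, shared by both templates.
    inv_norm = 1.0 / math.sqrt(energy) if energy > 0 else 0.0
    return acc1 * inv_norm, acc2 * inv_norm
```

Sharing `inv_norm` between the templates mirrors the single square root and inverse circuit: adding templates adds multipliers and accumulators but no further normalization hardware.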

**Figure 6.** Architecture of correlator unit.

#### **3.3. Threshold unit**

Although this unit could be implemented with a simple comparison circuit, detection accuracy can be further improved by taking the detection results of neighboring blocks into account. Because neighboring blocks overlap, they may be similar, and so may their normalized correlation values. Consequently, a single spike may produce multiple hits.

To solve this problem, a hit is not declared immediately when the normalized correlation value of a block exceeds the threshold. Instead, the architecture examines the normalized correlation values of the preceding blocks, and a hit is issued only if at least *k* of the *K* preceding blocks have normalized correlation values above the threshold. In this way, the false alarm rate (FAR) can be effectively lowered. **Figure 7** shows the corresponding architecture, which contains a *K*-stage shift register storing the comparison results of the *K* previous blocks. Each stage holds a single bit: 1 indicates that the corresponding block has a normalized correlation value above the threshold *η*, and 0 otherwise. Therefore, if the sum of all *K* stages is greater than or equal to *k*, then at least *k* preceding blocks have normalized correlation values above the threshold, and the architecture issues a hit.
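
The k-out-of-K rule can be sketched with a bit shift register and a population count. The sketch interprets the text as follows (an assumption): a hit requires the current block to exceed *η* and at least *k* of the *K* stored bits for the preceding blocks to be 1. No post-hit suppression is modeled, since the text does not describe any. The class name `KOutOfKThreshold` is hypothetical.

```python
from collections import deque


class KOutOfKThreshold:
    """k-out-of-K hit logic over a K-stage single-bit shift register."""

    def __init__(self, eta: float, k: int, K: int) -> None:
        self.eta = eta
        self.k = k
        # One bit per preceding block; 1 means "above eta". maxlen=K drops the oldest.
        self.bits = deque([0] * K, maxlen=K)

    def step(self, rho: float) -> bool:
        """Process the normalized correlation of the current block."""
        above = rho > self.eta
        # Sum of the K stages, compared against k (assumed interpretation).
        hit = above and sum(self.bits) >= self.k
        self.bits.appendleft(int(above))     # shift in the current comparison bit
        return hit
```

With `eta=0.9`, `k=2`, `K=3`, the correlation sequence `0.95, 0.2, 0.95, 0.95, 0.95, 0.1` produces hits only at the fourth and fifth blocks: the isolated first crossing is rejected, but, as modeled, a long run of above-threshold blocks can still yield consecutive hits.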

**Figure 7.** Architecture of threshold unit.

#### **3.4. Switch buffer**


The goal of the switch buffer is to store the detected spikes for subsequent clustering operations. As shown in **Figure 8**, the circuit contains two buffers, denoted Buffer x and Buffer y. While one buffer stores newly detected spikes, the other provides previously stored spikes to the OSORT module for clustering. The switch controller determines which buffer stores the detected spikes; the flowchart of its operations is shown in **Figure 9**. As the flowchart shows, the controller assigns detected spikes to a buffer according to that buffer's availability. A buffer is available when it has empty cells for new detected spikes and is not currently providing spikes to the OSORT module.
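
The availability rule of the switch controller can be sketched as a two-buffer (ping-pong) model. Several details are assumptions not stated in the text: the buffer capacity, the preference for Buffer x when both are available, a readout that is triggered only for a full buffer, and dropping a spike when neither buffer is available. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class SwitchBuffer:
    """Ping-pong buffer model; buffer names "x" and "y" follow Figure 8."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity                      # assumed cells per buffer
        self.cells: Dict[str, List[object]] = {"x": [], "y": []}
        self.reading: Optional[str] = None            # buffer currently feeding OSORT

    def _available(self, name: str) -> bool:
        # Available: has empty cells and is not being read by OSORT.
        return len(self.cells[name]) < self.capacity and self.reading != name

    def store(self, spike: object) -> Optional[str]:
        """Write a detected spike to the first available buffer."""
        for name in ("x", "y"):
            if self._available(name):
                self.cells[name].append(spike)
                return name
        return None                                   # assumed: spike dropped

    def begin_readout(self) -> Optional[str]:
        """Hand a full buffer to OSORT if no readout is in progress."""
        if self.reading is None:
            for name in ("x", "y"):
                if len(self.cells[name]) == self.capacity:
                    self.reading = name
                    break
        return self.reading

    def end_readout(self) -> List[object]:
        """OSORT has consumed the buffer: return its spikes and empty it."""
        if self.reading is None:
            return []
        spikes, self.cells[self.reading] = self.cells[self.reading], []
        self.reading = None
        return spikes
```

With a capacity of 2, spikes `a` and `b` fill Buffer x; once x is handed to OSORT, spikes `c` and `d` go to Buffer y, and a fifth spike finds no available buffer until the readout of x completes.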

**Figure 8.** Architecture of switch buffer.

**Figure 9.** Flowchart of switch controller.
