**2. The algorithm**

and classification operations for separating spikes from background noise and clustering the detected spikes. Large amount of spike-trains would impose heavy computation load for a

One approach to reduce the computation time is to implement a spike sorting system by hardware. A number of spike sorting systems based on field programmable gate array (FPGA) [3] have been proposed. In [4], a spike classification architecture using probabilistic neural networks [5] is proposed. Although high speed-up over its software counterpart is observed, the system does not provide spike detection. In addition, the architecture requires the user to input the number of clusters. Therefore, it does not support a fully unsupervised hardware classification. Similarly, the architecture proposed in [6] also focuses on the classification. The generalized Hebbian algorithm (GHA) [7] is implemented by FPGA in this work for feature extraction. The features produced by the architecture are then clustered by the k-means algorithm. The architecture does not include the hardware implementation of the k-means algorithm. In addition, the number of clusters still needs to be prespecified in the k-means algorithm. The architecture in [8] is able to carry out the feature extraction and clustering in hardware. In the architecture, the feature extraction and clustering are based on GHA and fuzzy c-means [9] algorithm, respectively. Therefore, similar to the architecture in [6], fully unsupervised classification is difficult for the architecture in [8] because the number of clusters

An alternative FPGA-based hardware architecture [10] is to adopt OSORT algorithm [11] for spike-sorting. The OSORT algorithm is able to perform clustering without the prior knowledge of the number of clusters. Unsupervised classification therefore can be carried out. The architecture also includes a spike detection circuit based on the nonlinear energy operators (NEOs) [12], which is an energy-based detection algorithm. Similar to other energy-based algorithms [13, 14], the NEO algorithm is simple and effective. However, although the energybased algorithms can operate in conjunction with a number of automatic threshold algorithms [12–14], proper selection of threshold values for these algorithms may still be difficult when noise becomes large. Therefore, their performance may deteriorate rapidly as noise energy

The template-based spike detection algorithm may be suited for the detection of spikes from the source sequences with high noise level. A matched filter [15, 16] is a typical technique based on templates. To detect the presence of the templates, the filter correlates known templates with the input spike trains. This operation can also be viewed as a likelihood ratio detection (LRT) [17]. A drawback of the matched filter is that the templates are required. Although the adaptive generation of templates is possible [18], only a single template is produced for spike trains. However, because spike trains in general are formed from two or more neurons, a single template may not be sufficient for the detection of all the spikes generated by the neurons. The template-based algorithm presented by [19] has been found to be effective for the detection of noisy spikes. It adopts the OSORT algorithm for automatic template generation. Normalized correlation operations then carry out the spike detection using the templates produced by the OSORT algorithm. Nevertheless, in the algorithm, both the template generation and correlation computation have high computational complexities.

software spike sorting system, resulting in long processing time.

still needs to be known beforehand.

2 Field - Programmable Gate Array

increases.

Before presenting the hardware architectures for spike sorting, we first review the spike detection and clustering algorithms adopted by this work. Detailed discussions of these algorithms can be found in [19].

### **2.1. Normalized correlation**

Consider a spike train **X**, where the *m*th sample of **X** is denoted by *x*[*m*]. Moreover, the *m*th segment of the spike train **X** is denoted by x*m* = [*x*[*m*], *x*[*m* − 1], …, *x*[*m* − *N* + 1]]*<sup>T</sup>*, where *N* is the length of the segment. Suppose the spike train is processed by a matched filter with template t = [*t*[0], …, *t*[*<sup>N</sup>* − 1]]*<sup>T</sup>*. Let and be the normalized version of **x***m* and **t**, respectively. That is,

$$
\overline{\mathbf{x}}\_m = \frac{\mathbf{x}\_m}{\left\| \mathbf{x}\_m \right\|}, \quad \overline{\mathbf{t}} = \frac{\mathbf{t}}{\left\| \mathbf{t} \right\|}. \tag{1}
$$

The normalized output at *m*, denoted by, , is computed from

#### 4 Field - Programmable Gate Array

$$\overline{y}\begin{bmatrix} m \end{bmatrix} = \sum\_{k=0}^{N-1} \overline{\mathbf{x}}\begin{bmatrix} m-k \end{bmatrix} \overline{t}\begin{bmatrix} k \end{bmatrix} = \overline{\mathbf{x}}\_m^T \overline{\mathbf{t}} \tag{2}$$

This is the inner product of segment and template , which indicates the normalized correlation between these two vectors. The segment **x***m* is detected as a spike when is larger than a prespecified threshold *η*. Let , be the squared distance between and . It can be shown that

$$d(\overline{\mathbf{x}}\_m, \overline{\mathbf{t}}) = 2 - 2\overline{\mathbf{x}}\_m^T \overline{\mathbf{t}}.\tag{3}$$

Because , <sup>≥</sup> <sup>0</sup>,

$$
\overline{\mathbf{x}}\_m^T \overline{\mathbf{t}} \le 1.\tag{4}
$$

Our normalized correlation operations are based on and . When ≥ , then **x***m*is detected as a spike. From Eq. (4), it follows that

$$
\eta \le 1 \tag{5}
$$

The normalized correlation operations shown in Eq. (2) may have high computational complexities. Although the template can be computed beforehand, the computation of

still needs to be carried out online, involving the calculation of||**x***m*||and <sup>=</sup> . Note

that *N* multiplications, *N*−1 additions, and one squared root operation are required for the computation of ||**x***m*||. Moreover, *<sup>N</sup>* divisions are needed for . Finally, the inner product of

 requires *N* multiplications and *N*−1 additions. In total, the basic implementation of the normalized correlation operations requires 2*N* multiplications, (2*N*−2) additions, *N* divisions, and 1 squared root operation. When the fast computation is an important concern, the hardware implementation may then be desirable.

#### **2.2. OSORT algorithm**

The OSORT algorithm is an unsupervised template-based clustering algorithm for spike sorting. It does not require feature extraction, and the number of clusters *c* is automatically determined by the algorithm. Define , = 1, …, , as the clusters of spikes generated by the OSORT algorithm, where **t***<sup>i</sup>* **,***i* **= 1, …,** *c***,** is the average value of the spikes belonging to .

Given a current detected spike **s** for clustering, in the OSORT algorithm, the squared distances *di* = *d*(**s**, **t***<sup>i</sup>* ) for *i* = 1, …, *c* are first computed. The minimum distance \* is then identified, where \* = arg min . We will assign **s** to \* when \* is less than a prespecified threshold *τ*1. In this case, because \* has a new member, its mean value \* will also be updated. Otherwise, a new cluster is created, where **s** is its only member. After the updating of \*, the cluster merging process will be activated. The process involves the computation of the distance between \* and **t***<sup>j</sup>* , *i*\* ≠ *j*. Two clusters \*and \*will be merged when \*, \* < 2, where\* = arg min , \* \*, . **Figure 1** summarizes the operations of the OSORT algorithm.

**Figure 1.** The flowchart of OSORT operations.

1

é ù é ù éù = -= ë û ë û ëû å **x t**

**x t** £ 1. *<sup>T</sup>*

The normalized correlation operations shown in Eq. (2) may have high computational complexities. Although the template can be computed beforehand, the computation of

that *N* multiplications, *N*−1 additions, and one squared root operation are required for the computation of ||**x***m*||. Moreover, *<sup>N</sup>* divisions are needed for . Finally, the inner product of

 requires *N* multiplications and *N*−1 additions. In total, the basic implementation of the normalized correlation operations requires 2*N* multiplications, (2*N*−2) additions, *N* divisions, and 1 squared root operation. When the fast computation is an important concern, the

The OSORT algorithm is an unsupervised template-based clustering algorithm for spike sorting. It does not require feature extraction, and the number of clusters *c* is automatically

still needs to be carried out online, involving the calculation of||**x***m*||and <sup>=</sup>

Our normalized correlation operations are based on and . When

This is the inner product of segment and template , which indicates the normalized correlation between these two vectors. The segment **x***m* is detected as a spike when is larger than a prespecified threshold *η*. Let , be the squared distance between and . It can be

*T m*

*ym xm kt k* (2)

( ,) 2 2 . **x t xt** = - *<sup>T</sup> m m <sup>d</sup>* (3)

*<sup>m</sup>* (4)

*η* £ 1 (5)

≥ , then **x***m*is detected

 

. Note


*N*

0

=

*k*

shown that

**2.2. OSORT algorithm**

Because , <sup>≥</sup> <sup>0</sup>,

4 Field - Programmable Gate Array

as a spike. From Eq. (4), it follows that

hardware implementation may then be desirable.

### **2.3. Normalized correlation and OSORT algorithm for spike sorting**

By combining the normalized correlation with the OSORT algorithm, an effective spike sorting system for both spike detection and classification can be realized. The system is a feedback system capable of automatic template generation for spike detection and unsupervised clustering for the classification. The block diagram of the system is revealed in **Figure 2**.

**Figure 2.** The block diagram of the spike detection/sorting system based on GLRT test, normalized correlator, and OSORT algorithms.

Initially, the clusters and templates produced by the OSORT are not available. As a result, it may be difficult to carry out the normalized correlation for spike detection. One way to solve this problem is to use only the block energy for the detection of spikes. The detected spikes are then clustered by the OSORT algorithm for the generation of initial templates.

After the templates become available, the spike detection is then based on the normalized correlation . The input block is detected as a spike when any of the c normalized correlation exceeds the threshold *η*. Because of the normalized correlation operations, the threshold value is bounded as shown in Eq. (5). Note that the templates for spike detection will be updated regularly so that the variations of input signals can be tracked to improve the spike detection performance.

The hardware architecture for implementing the spike sorting system is depicted in **Figure 3**. The architecture contains two modules and one controller. The first module of the architecture, termed normalized correlator module, is responsible for the spike detection. It is capable of performing both the GLRT and normalized correlation operations. The second module, termed OSORT module, carries out the unsupervised OSORT spike sorting. The global controller coordinates the operations of these two modules. The architecture and detailed operations of the normalized correlator module and OSORT module are presented in the following two sections.

**Figure 3.** The hardware architecture of the proposed spike sorting system.

**2.3. Normalized correlation and OSORT algorithm for spike sorting**

By combining the normalized correlation with the OSORT algorithm, an effective spike sorting system for both spike detection and classification can be realized. The system is a feedback system capable of automatic template generation for spike detection and unsupervised clustering for the classification. The block diagram of the system is revealed in **Figure 2**.

**Figure 2.** The block diagram of the spike detection/sorting system based on GLRT test, normalized correlator, and

Initially, the clusters and templates produced by the OSORT are not available. As a result, it may be difficult to carry out the normalized correlation for spike detection. One way to solve this problem is to use only the block energy for the detection of spikes. The detected spikes are

After the templates become available, the spike detection is then based on the normalized

exceeds the threshold *η*. Because of the normalized correlation operations, the threshold value is bounded as shown in Eq. (5). Note that the templates for spike detection will be updated regularly so that the variations of input signals can be tracked to improve the spike detection

. The input block is detected as a spike when any of the c normalized correlation

then clustered by the OSORT algorithm for the generation of initial templates.

OSORT algorithms.

6 Field - Programmable Gate Array

correlation

performance.
