**4. The architecture of OSORT module**

The OSORT module contains buffers, distance computation unit, mean updating unit, comparator, and controller, as shown in **Figure 10**. The centroid and the size of each cluster are stored in the buffers. The distance computation unit and mean updating unit are responsible for squared distance computation and the updating of centroid of the clusters, respectively. The control unit coordinates different components of the OSORT module for carrying out the unsupervised clustering operations.

**Figure 10.** Architecture of OSORT module.

#### **4.1. Buffers**

**Figure 8.** Architecture of switch buffer.

12 Field - Programmable Gate Array

**Figure 9.** Flowchart of switch controller.

There are three buffers in the OSORT module, which are denoted by Buffer 1, Buffer 2, and Buffer 3, respectively. Buffer 1 holds an input spike detected by the correlators. Buffers 2 and 3 contain the mean value and size of each cluster, respectively. Buffer 1 is a simple *N*-stage shift register, fetching or delivering one sample of the input spike at a time. As shown in **Fig‐ ure 11**, Buffer 2 contains *QN*-stage shift registers. Each shift register holds the mean value of a cluster. Therefore, *Q* is the upper-bound of the number of clusters. Buffer 2 updates or provides mean values of clusters one at a time. Because the mean value **t***<sup>i</sup>* of a cluster is the average value of the spikes mapping to that cluster, the mean value also contains *N* samples. Accessing the mean value of the cluster is also carried out one sample at a time. Buffer 3 records the size of each cluster. There are *Q* entries in the buffer. The *i*th entry contains the number of spikes in the cluster .

**Figure 11.** Architecture of Buffer 2.

#### **4.2. Distance computation unit and mean updating unit**

Because buffers in the memory unit can be accessed one sample at a time, the circuits in the distance computation unit and mean updating unit provide only sample-wise computations. This is beneficial for reducing the area costs. There are two cases when the distance computation unit needs to be activated. In the first case, a new spike is arrived. To find the cluster for the new spike, the squared distance computation is required. In the second case, it is desired to merge two clusters. The squared distance calculation is needed for finding the closest clusters. The distance computation unit takes the samples fetched from Buffer 1 and Buffer 2 as inputs. The unit finds the squared distance between the spikes stored in Buffer 1 and the mean value of a cluster selected in Buffer 2. Upon the completion of the squared distance computation, the comparison of the new squared distance with the current minimum distance stored in a register of the unit is carried out. If the new squared distance is smaller than the current minimum distance, then it becomes the new current minimum distance. The same squared distance computation and comparison operations will be repeated for until all the mean values in Buffer 2 are searched. This scheme is useful for finding the best matching mean value stored in Buffer 2 to the spike stored in Buffer 1.

The mean updating unit is activated after a new spike is assigned to a cluster, or after two clusters are merged. In these cases, the mean of the updated cluster needs to be computed. Note that the clusters to be updated are determined by the distance computation unit. The mean updating unit is only responsible for the computation of the new mean of the updated clusters. The circuit takes waveforms stored in Buffer 1 and Buffer 2, and the cluster size stored in Buffer 3 as inputs. The updated mean is the weighted sum of the waveforms obtained from Buffer 1 and Buffer 2, as shown in **Figure 12**. The weights are determined from the cluster sizes from Buffer 3. The updated results are then stored back to Buffer 2 and Buffer 3.

**Figure 12.** Architecture of mean updating unit.

#### **4.3. Control unit**

**Figure 11.** Architecture of Buffer 2.

14 Field - Programmable Gate Array

**4.2. Distance computation unit and mean updating unit**

value stored in Buffer 2 to the spike stored in Buffer 1.

Because buffers in the memory unit can be accessed one sample at a time, the circuits in the distance computation unit and mean updating unit provide only sample-wise computations. This is beneficial for reducing the area costs. There are two cases when the distance computation unit needs to be activated. In the first case, a new spike is arrived. To find the cluster for the new spike, the squared distance computation is required. In the second case, it is desired to merge two clusters. The squared distance calculation is needed for finding the closest clusters. The distance computation unit takes the samples fetched from Buffer 1 and Buffer 2 as inputs. The unit finds the squared distance between the spikes stored in Buffer 1 and the mean value of a cluster selected in Buffer 2. Upon the completion of the squared distance computation, the comparison of the new squared distance with the current minimum distance stored in a register of the unit is carried out. If the new squared distance is smaller than the current minimum distance, then it becomes the new current minimum distance. The same squared distance computation and comparison operations will be repeated for until all the mean values in Buffer 2 are searched. This scheme is useful for finding the best matching mean

The mean updating unit is activated after a new spike is assigned to a cluster, or after two clusters are merged. In these cases, the mean of the updated cluster needs to be computed. Note that the clusters to be updated are determined by the distance computation unit. The mean updating unit is only responsible for the computation of the new mean of the updated clusters. The circuit takes waveforms stored in Buffer 1 and Buffer 2, and the cluster size stored The control unit activates components of the memory unit and cluster computation unit for the unsupervised clustering. The states of the control unit are summarized in **Table 1**. In addition to the tasks carried out by each state, the activated circuit components associated with each state are also included in the table. **Figure 13** shows the flowchart of the OSORT algorithm in terms of the states defined in **Table 1**.


**Table 1.** States of the control unit in OSORT module.

**Figure 13.** The flowchart of the controller in the OSORT module.

The flowchart in **Figure 13** is consistent with that shown in **Figure 1**. Although the combinations of the states shown in **Table 1** are able to implement the OSORT algorithm, additional modifications may still be desirable to facilitate the hardware implementation. One modification implemented in the controller is to handle the cases when the current number of clusters reaches the upper limit *Q*, and the creation of new cluster is still desired. In this case, the least recently updated cluster will be replaced by the new cluster. To carry out this modification, an additional field is added to each entry of Buffer 3. The entry indicates the number of updates in the past for the corresponding cluster. This modification can be viewed as an additional function supported in State 3 in **Table 1** for the creation of a new cluster.

### **5. Global controller and NOC**

As depicted in **Figure 3**, the global controller in the proposed spike sorting system coordinates the operations of the normalized correlator module and OSORT module. The major goal of the global controller is to fetch detected spikes from the normalized correlator module and deliver them to the OSORT module. When a buffer in the switch buffer of the normalized correlator module becomes full, the global controller starts to fetch the detected spikes one at a time from the buffer to the OSORT module.

The fetching operations are repeated until all the spikes stored in the buffer are fetched. At this time, the buffer becomes available again for storing the new detected spikes, as shown in **Figure 14**. The proposed hardware spike sorting system is configured as a user component in a NOC system, which is designed by the QSYS platform. In addition to the proposed system, we see from **Figure 15** that the NOC contains the NIOS II processor, a DMA controller, an onchip RAM, and a hardware timer.

**Figure 14.** The flowchart of the global controller.

**Figure 13.** The flowchart of the controller in the OSORT module.

16 Field - Programmable Gate Array

**5. Global controller and NOC**

The flowchart in **Figure 13** is consistent with that shown in **Figure 1**. Although the combinations of the states shown in **Table 1** are able to implement the OSORT algorithm, additional modifications may still be desirable to facilitate the hardware implementation. One modification implemented in the controller is to handle the cases when the current number of clusters reaches the upper limit *Q*, and the creation of new cluster is still desired. In this case, the least recently updated cluster will be replaced by the new cluster. To carry out this modification, an additional field is added to each entry of Buffer 3. The entry indicates the number of updates in the past for the corresponding cluster. This modification can be viewed as an additional

As depicted in **Figure 3**, the global controller in the proposed spike sorting system coordinates the operations of the normalized correlator module and OSORT module. The major goal of the global controller is to fetch detected spikes from the normalized correlator module and

function supported in State 3 in **Table 1** for the creation of a new cluster.

The raw spike trains are stored in the on-chip RAM. The DMA controller is responsible for delivering the spike trains to the proposed hardware system without the intervention of the NIOS II processor. The DMA controller is able to halt the delivery of spike trains automatically when both buffers in the switch buffer of the normalized correlator module are unavailable and/or full. The hardware timer is used to measure the computation speed of the proposed system. The NIOS II processor integrates different components in the NOC. It activates the DMA controllers for the delivery of spike trains. After that, the processor collects the spike sorting results from the OSORT modules of the proposed spike sorting system. The processor is also able to read the information provided by the hardware timer for the measurement of the computation speed of the proposed circuit.

**Figure 15.** The proposed NOC system for spike sorting.
