The input vector **x**(*n*) and the synaptic weight vector **w***j*(*n*) are obtained from Buffer A and Buffer C of the memory unit, respectively. When both **x**(*n*) and **w***j*(*n*) are available, the PCC unit computes *yj*(*n*). After *yj*(*n*) is obtained, the SWU unit can compute **w***j*(*n* + 1). Figure 13 shows the timing diagram of the proposed architecture. It can be observed from Figure 13 that *y*<sub>*j*+1</sub>(*n*) and **w***j*(*n* + 1) are computed concurrently. The throughput of the proposed architecture can therefore be effectively enhanced.

Fig. 13. The timing diagram of the proposed architecture.
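For readers who want a software reference model, the schedule above can be mimicked in a few lines of NumPy. This is a minimal sketch of the standard GHA update rule, not the hardware implementation itself; the function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def gha_present(x, W, lr):
    """Present one input vector x (length m) to p GHA neurons.

    W is the p-by-m synaptic weight matrix; row j holds w_j(n).
    The loop mirrors the pipeline schedule: computing y_{j+1}(n)
    (the PCC's job) needs only x(n) and w_{j+1}(n), so it does not
    have to wait for the update of w_j(n+1) (the SWU's job).
    """
    p, _ = W.shape
    y = np.zeros(p)
    lower = np.zeros(len(x))              # running sum of y_k(n) * w_k(n), k <= j
    for j in range(p):
        y[j] = W[j] @ x                   # PCC: principal component computation
        lower += y[j] * W[j]              # uses w_j(n) before it is overwritten
        W[j] += lr * y[j] * (x - lower)   # SWU: synaptic weight update (Sanger's rule)
    return y, W
```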

**3.4 SOPC-based GHA training system**

The proposed architecture is used as custom user logic in a SOPC system consisting of a softcore NIOS CPU (Altera Corp., 2010), a DMA controller and SDRAM, as depicted in Figure 14. All training vectors are stored in the SDRAM and transported to the proposed circuit via the Avalon bus. The softcore NIOS CPU runs simple software that coordinates the different components of the SOPC, including the proposed custom circuit. The proposed circuit operates as a hardware accelerator for GHA training. The resulting SOPC system is able to perform efficient on-chip training for GHA-based applications.

Fig. 14. The SOPC system for implementing GHA.

**4. Experimental results**

This section presents experimental results of the proposed architecture applied to texture classification. The target FPGA device for all the experiments in this paper is the Altera Cyclone III (Altera Corp., 2010). The design platform is Altera Quartus II with SOPC Builder and the NIOS II IDE. Two sets of textures are considered in the experiments. The first set, shown in Figure 15, consists of three different textures. The second set, shown in Figure 16, contains four different textures. The size of each texture in Figures 15 and 16 is 320 × 320.

Fig. 15. The first set of textures for the experiments.

Fig. 16. The second set of textures for the experiments.

In the experiment, the principal component based *k* nearest neighbor (PC-*k*NN) rule is adopted for texture classification. Two steps are involved in the PC-*k*NN rule. In the first step, the GHA is applied to the input vectors to transform *m*-dimensional data into *p* principal components; the synaptic weight vectors obtained after the convergence of GHA training span the linear transformation matrix. In the second step, the *k*NN method is applied in the principal subspace for texture classification.
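As a point of reference, the two steps of the PC-*k*NN rule can be summarised in a short NumPy sketch. It assumes a converged GHA weight matrix `W` (*p* × *m*) whose rows span the transformation; all names are illustrative rather than taken from the paper:

```python
import numpy as np

def pc_knn_classify(x, W, train_feats, train_labels, k=3):
    """Classify one m-dimensional vector x with the PC-kNN rule.

    Step 1: project x onto the p principal components (rows of W).
    Step 2: majority vote among the k nearest training features in
            the p-dimensional principal subspace.
    train_labels is an integer array with class labels 0..C-1.
    """
    z = W @ x                                        # p-dimensional feature
    dists = np.linalg.norm(train_feats - z, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]
    return np.bincount(train_labels[nearest]).argmax()
```

Here `train_feats` holds the projections `(W @ X_train.T).T` of the training vectors, so both steps operate entirely in the *p*-dimensional subspace.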

Figures 17 and 18 show the distributions of the classification success rate (CSR) of the proposed architecture for the texture sets in Figures 15 and 16, respectively. The CSR is defined as the number of test vectors that are successfully classified divided by the total number of test vectors. The number of principal components is *p* = 4, and the vector dimension is *m* = 16 × 16. Each distribution is based on 20 independent GHA training processes. The distribution for the architecture presented in (Lin et al., 2011) with the same *p* is also included for comparison; the vector dimension for (Lin et al., 2011) is *m* = 4 × 4.
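Stated as code, the CSR definition is simply the fraction of correctly classified test vectors (a trivial sketch with illustrative names):

```python
import numpy as np

def classification_success_rate(predicted, truth):
    """CSR = successfully classified test vectors / total test vectors."""
    predicted, truth = np.asarray(predicted), np.asarray(truth)
    return np.mean(predicted == truth)
```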

Fig. 17. The distribution of CSR of the proposed architecture for the texture set in Figure 15.

Fig. 18. The distribution of CSR of the proposed architecture for the texture set in Figure 16.

It can be observed from Figures 17 and 18 that the proposed architecture attains a better CSR. This is because the vector dimension of the proposed architecture (*m* = 16 × 16) is higher than that in (Lin et al., 2011), so the spatial information of the textures is exploited more effectively. The proposed architecture is able to implement hardware GHA for a larger vector dimension because the area cost of its SWU unit is independent of the vector dimension. By contrast, the area cost of the SWU unit in (Lin et al., 2011) grows with the vector dimension; therefore, only a smaller vector dimension (i.e., *m* = 4 × 4) can be implemented.

To further elaborate these facts, Tables 1 and 2 show the hardware resource consumption of the proposed architecture (for vector dimensions *m* = 4 × 4 and *m* = 16 × 16) and of the architecture in (Lin et al., 2011) (for *m* = 4 × 4), respectively. Three area costs are considered: logic elements (LEs), embedded memory bits, and embedded multipliers. It can be observed from Tables 1 and 2 that, for the same *m* = 4 × 4 and the same *p*, the proposed architecture consumes significantly fewer hardware resources than the architecture in (Lin et al., 2011). Although the area costs of the proposed architecture increase as *m* grows to 16 × 16, as shown in Table 1, they are only slightly higher than those of (Lin et al., 2011) in Table 2. The proposed architecture is therefore well suited to GHA training with a larger vector dimension, which allows better exploitation of spatial information.

| *p* | LEs (*m* = 4 × 4) | Memory bits (*m* = 4 × 4) | Embedded multipliers (*m* = 4 × 4) | LEs (*m* = 16 × 16) | Memory bits (*m* = 16 × 16) | Embedded multipliers (*m* = 16 × 16) |
|---|---|---|---|---|---|---|
| 3 | 3942 | 1152 | 36 | 63073 | 1152 | 569 |
| 4 | 4097 | 1152 | 36 | 65291 | 1152 | 569 |
| 5 | 4394 | 1280 | 36 | 70668 | 1280 | 569 |
| 6 | 4686 | 1280 | 36 | 75258 | 1280 | 569 |
| 7 | 4988 | 1280 | 36 | 79958 | 1280 | 569 |

Table 1. Hardware resource consumption of the proposed GHA architecture for vector dimensions *m* = 4 × 4 and *m* = 16 × 16.

| *p* | LEs | Memory bits | Embedded multipliers |
|---|---|---|---|
| 3 | 22850 | 0 | 204 |
| 4 | 31028 | 0 | 272 |
| 5 | 38261 | 0 | 340 |
| 6 | 45991 | 0 | 408 |
| 7 | 53724 | 0 | 476 |

Table 2. Hardware resource consumption of the GHA architecture in (Lin et al., 2011) for vector dimension *m* = 4 × 4.

Fig. 19. The CPU time of the NIOS-based SOPC system using the proposed architecture as the hardware accelerator for various numbers of training iterations with *p* = 4.

Figure 19 shows the CPU time of the NIOS-based SOPC system using the proposed architecture as the hardware accelerator for various numbers of training iterations with *p* = 4. The clock rate of the NIOS CPU in the system is 50 MHz. The CPU time of the software counterpart is also depicted in Figure 19 for comparison. The software training runs on a general purpose 2.67-GHz Intel *i*7 CPU. It can be clearly observed from Figure 19 that the proposed hardware architecture attains a high speedup over its software counterpart. In particular, when the number of training iterations reaches 1800, the CPU time of the proposed SOPC system is 1861.3 ms, whereas the CPU time of the Intel *i*7 is 13860.3 ms. The speedup of the proposed architecture over its software counterpart is therefore 7.45.
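The reported speedup follows directly from the two measured CPU times:

$$\text{speedup} = \frac{T_{\text{Intel }i7}}{T_{\text{SOPC}}} = \frac{13860.3\ \text{ms}}{1861.3\ \text{ms}} \approx 7.45$$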

**5. Concluding remarks**

Experimental results reveal that the proposed GHA architecture has superior speed performance over its software counterpart. In addition, the architecture attains a higher classification success rate for texture classification than other existing GHA architectures, and has a low area cost for PCA with a high vector dimension. The proposed architecture is therefore an effective alternative for on-chip learning applications requiring low area cost, a high classification success rate and high-speed computation.

**6. References**

Alpaydin, E. (2010). *Introduction to Machine Learning*, second ed., MIT Press, Massachusetts, USA.

Altera Corporation (2010). *Cyclone III Device Handbook*. http://www.altera.com/products/devices/cyclone3/cy3-index.jsp

Altera Corporation (2010). *NIOS II Processor Reference Handbook ver 10.0*. http://www.altera.com/literature/lit-nio2.jsp

Boonkumklao, W., Miyanaga, Y., and Dejhan, K. (2001). Flexible PCA architecture realized on FPGA. *International Symposium on Communications and Information Technologies*, pp. 590 - 593.

Bravo, I., Mazo, M., Lazaro, J.L., Gardel, A., Jimenez, P., and Pizarro, D. (2010). An Intelligent Architecture Based on FPGA Designed to Detect Moving Objects by Using Principal Component Analysis. *Sensors*, 10(10), pp. 9232 - 9251.

Carvajal, G., Valenzuela, W., and Figueroa, M. (2007). Subspace-Based Face Recognition in Analog VLSI. In: *Advances in Neural Information Processing Systems*, 20, pp. 225 - 232, MIT Press, Cambridge.

Carvajal, G., Valenzuela, W., and Figueroa, M. (2009). Image Recognition in Analog VLSI with On-Chip Learning. *Lecture Notes in Computer Science* (ICANN 2009), Vol. 5768, pp. 428.

Chen, D., and Han, J.-Q. (2009). An FPGA-based face recognition using combined 5/3 DWT with PCA methods. *Journal of Communication and Computer*, Vol. 6, pp. 1 - 8.

Chengcui, Z., Xin, C., and Wei-bang, C. (2006). A PCA-Based Vehicle Classification Framework. *Proceedings of the 22nd International Conference on Data Engineering Workshops*, pp. 17.

Dogaru, R., Dogaru, I., and Glesner, M. (2004). CPCA: a multiplierless neural PCA. *Proceedings of IEEE International Joint Conference on Neural Networks*, vol. 2684, pp. 2689 - 2692.

El-Bakry, H.M. (2006). A New Implementation of PCA for Fast Face Detection. *International Journal of Intelligent Systems and Technologies*, 1(2), pp. 145 - 153.

Gunter, S., Schraudolph, N.N., and Vishwanathan, S.V.N. (2007). Fast Iterative Kernel Principal Component Analysis. *Journal of Machine Learning Research*, pp. 1893 - 1918.

Haykin, S. (2009). *Neural Networks and Learning Machines*, third ed., Pearson.

Jolliffe, I.T. (2002). *Principal Component Analysis*, second ed., Springer, New York.

Karhunen, J., and Joutsensalo, J. (1995). Generalizations of principal component analysis, optimization problems, and neural networks. *Neural Networks*, 8(4), pp. 549 - 562.

Kim, K., Franz, M.O., and Scholkopf, B. (2005). Iterative kernel principal component analysis for image modeling. *IEEE Trans. Pattern Analysis and Machine Intelligence*, pp. 1351 - 1366.

Lin, S.-J., Hung, Y.-T., and Hwang, W.-J. (2011). Efficient hardware architecture based on generalized Hebbian algorithm for texture classification. *Neurocomputing*, 74(17), pp. 3248 - 3256.

**10**

**The Basics of Linear Principal Components Analysis**

Yaya Keho
*Ecole Nationale Supérieure de Statistique et d'Economie Appliquée (ENSEA), Abidjan*
*Côte d'Ivoire*

**1. Introduction**

When you have obtained measures on a large number of variables, there may be redundancy in those variables. Redundancy means that some of the variables are correlated with one another, possibly because they measure the same "thing". Because of this redundancy, it should be possible to reduce the observed variables to a smaller number of variables. For example, if a group of variables are strongly correlated with one another, you do not need all of them in your analysis but only one, since you can predict the evolution of all the variables from that of one. This raises the central issue of how to select or build a representative variable for each group of correlated variables. The simplest way is to keep one variable and discard all the others, but this is not reasonable. Another alternative is to combine the variables in some way, perhaps by taking a weighted average, as in the well-known Human Development Indicator published by the UNDP. However, such an approach raises the basic question of how to set the appropriate weights. If one has sufficient insight into the nature and magnitude of the interrelations among the variables, one might choose the weights using one's own judgment. Obviously, this introduces a certain amount of subjectivity into the analysis and may be questioned by practitioners. To overcome this shortcoming, another method is to let the data set itself uncover the relevant weights of the variables. Principal Components Analysis (PCA) is a variable reduction method that can be used to achieve this goal. Technically, this method delivers a relatively small set of synthetic variables, called principal components, that account for most of the variance in the original dataset.
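To make the idea of data-driven weights concrete, here is a minimal sketch (not part of the original chapter) of how PCA derives the weights of each synthetic variable from the correlation matrix of the standardised data:

```python
import numpy as np

def pca_components(X, n_components=2):
    """Leading principal components of X (observations x variables).

    The 'weights' PCA chooses are eigenvectors of the correlation
    matrix, ordered by the share of variance they account for.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise each variable
    R = np.corrcoef(Z, rowvar=False)           # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    weights = eigvecs[:, order]                # one weight vector per component
    scores = Z @ weights                       # the synthetic variables
    return weights, scores
```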

Introduced by Pearson (1901) and Hotelling (1933), Principal Components Analysis has become a popular data-processing and dimension-reduction technique, with numerous applications in engineering, biology, economics and social science. Today, PCA can be implemented through statistical software by students and professionals, but it is often poorly understood. The goal of this Chapter is to dispel the magic behind this statistical tool. The Chapter presents the basic intuitions for how and why principal component analysis works, and provides guidelines for interpreting the results. The mathematical aspects will be kept to a minimum. By the end of this Chapter, readers of all levels will have gained a better understanding of PCA as well as the when, the why and the how of applying this technique. They will be able to determine the number of meaningful components to retain from a PCA, create factor scores and interpret the components. Emphasis will be placed on examples explaining in detail the steps of implementing PCA in practice.