**5. Neuro-EM and neuro-k-means clustering approach for VLSI design partitioning**

This section focuses on the use of two clustering methods: k-means (J. B. MacQueen, 1967) and the Expectation-Maximization (EM) methodology (Kaban & Girolami, 2000).

#### **5.1 Neuro-EM model**

The system consists of three parts: data extraction, a learning stage, and a recognition stage. In the data extraction part, the circuit is bipartitioned and represented as data, and then partitioned into 10 clusters (a user-defined value) using K-means (J. B. MacQueen, 1967) and the EM methodology (Kaban & Girolami, 2000), respectively. In the recognition stage, the extracted parameters, that is, the centroids and the probabilities, are fed separately into the generalized delta rule algorithm, and the network is trained to recognize sub-circuits with the minimum number of interconnections between them. Block diagrams of the model using K-means and the EM methodology together with a neural network are shown in Fig. 6 and Fig. 7.

The block diagram of the overall model for partitioning a circuit is depicted in Fig. 8.




Fig. 6. Block diagram of K-means with neural network

Fig. 7. Block diagram of EM methodology with neural network


Fig. 8. Block Diagram of Model for Partitioning a Circuit
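As a rough illustration of the flow in Figs. 6, 7 and 8, the sketch below clusters a feature matrix describing the circuit cells with K-means and with an EM-based Gaussian mixture, and feeds the extracted centroids and probabilities to a small feed-forward classifier. The feature encoding, the labels, the use of scikit-learn, and all variable names are assumptions made only for illustration; the chapter itself trains the network with the generalized delta rule described later in this section, and the value of 10 clusters is the user-defined value mentioned above.

```python
# Hedged sketch: extract clustering features and train a small classifier.
# The feature matrix X (one row per cell) and the labels y (desired bipartition)
# are hypothetical stand-ins for the circuit data used in the chapter.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((60, 4))          # 60 cells, 4 structural features each (assumed)
y = (X[:, 0] > 0.5).astype(int)  # placeholder bipartition labels

# Data-extraction stage: K-means centroids and EM cluster probabilities.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
centroid_features = kmeans.cluster_centers_[kmeans.labels_]   # centroid of each cell's cluster

em = GaussianMixture(n_components=10, random_state=0).fit(X)
probability_features = em.predict_proba(X)                    # soft cluster memberships

# Recognition stage: train a small network on each feature set separately.
for name, feats in [("K-means centroids", centroid_features),
                    ("EM probabilities", probability_features)]:
    net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(feats, y)
    print(name, "training accuracy:", net.score(feats, y))
```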

#### **5.2 Sample data set**

A sample circuit and its bipartition are shown in Fig. 9 and Fig. 10.

Fig. 9. Sample Circuit


Fig. 10. Bipartition circuit: sub-circuit 1 (A, B, C), total edges = 7; sub-circuit 2 (D, E, F), total edges = 10


| Cell | No. of edges | Bipartition |
|------|--------------|-------------|
| A    | 2            | 1           |
| B    | 2            | 1           |
| C    | 3            | 1           |
| D    | 3            | 0           |
| E    | 3            | 0           |
| F    | 4            | 0           |

Table 1. Bipartition matrix

Data representation: sub-circuit 1 (A, B, C): 0010 0010 0011; sub-circuit 2 (D, E, F): 0011 0011 0100
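The counts in Table 1 can be derived mechanically from a netlist. The sketch below uses a hypothetical edge list (the actual connectivity of the circuit in Fig. 9 is not reproduced in the text), counts the edges incident on each cell, and records which half of the bipartition the cell belongs to; with the circuit's real edge list the figures in Table 1 would be reproduced exactly.

```python
# Hedged sketch: derive per-cell edge counts and bipartition flags from an edge list.
# The edge list below is hypothetical; the real connectivity comes from Fig. 9.
from collections import Counter

edges = [("A", "B"), ("A", "C"), ("B", "C"),   # assumed edges inside sub-circuit 1
         ("D", "E"), ("D", "F"), ("E", "F"),   # assumed edges inside sub-circuit 2
         ("C", "F")]                           # assumed edge crossing the cut

partition = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0, "F": 0}  # 1 = sub-circuit 1, 0 = sub-circuit 2

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

cut_size = sum(1 for u, v in edges if partition[u] != partition[v])

print("Cell  #edges  bipartition")
for cell in sorted(partition):
    print(f"{cell:4}  {degree[cell]:6}  {partition[cell]:11}")
print("edges crossing the cut:", cut_size)
```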

#### **5.3 Expectation Maximization algorithms**

The EM algorithm was explained and given its name in a classic 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin in the Journal of the Royal Statistical Society (Dempster et al., 1977). They pointed out that the method had been "proposed many times in special circumstances" by other authors, but the 1977 paper generalized the method and developed the theory behind it.



The EM algorithm for clustering is described in detail in Witten and Frank (Witten & Frank, 2005). The Expectation-Maximization (EM) algorithm is part of the Weka clustering package. EM is a statistical method that makes use of the finite Gaussian mixtures model. The basic approach and logic of this clustering method are as follows. Suppose a single continuous variable is measured in a large sample of observations. Further, suppose that the sample consists of two clusters of observations with different means (and perhaps different standard deviations); within each cluster, the distribution of values for the continuous variable follows the normal distribution. The resulting distribution of values (in the population) may look as shown in Fig. 11.

Fig. 11. Two normal distributions of EM Algorithm (Screen Shot)

i. Mixtures of distributions. The illustration in Fig. 11 shows two normal distributions with different means and different standard deviations, and the sum of the two distributions. Only the mixture (sum) of the two normal distributions (with different means and standard deviations) would be observed. The goal of EM clustering is to estimate the means and standard deviations for each cluster so as to maximize the likelihood of the observed data (distribution). Put another way, the EM algorithm attempts to approximate the observed distribution of values based on mixtures of different distributions in different clusters.

In some implementations of the EM algorithm, one may be able to select (for continuous variables) different distributions, such as the normal, lognormal, and Poisson distributions (Karlis, 2003), and to select different distributions for different variables, thus deriving clusters for mixtures of different types of distributions.

ii. Categorical variables. The EM algorithm can also accommodate categorical variables. The method will at first randomly assign different probabilities (weights, to be precise) to each class or category, for each cluster. In successive iterations, these probabilities are refined (adjusted) to maximize the likelihood of the data given the specified number of clusters (Kim, 2002).


iii. Classification probabilities instead of classifications. The results of EM clustering are different from those computed by k-means clustering. The latter will assign observations to clusters to maximize the distances between clusters. The EM algorithm does not compute actual assignments of observations to clusters, but classification probabilities. In other words, each observation belongs to each cluster with a certain probability. Of course, as a final result one can usually review an actual assignment of observations to clusters, based on the (largest) classification probability (Gyllenberg et al., 2000).
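To make point (iii) concrete, the short sketch below turns a matrix of EM classification probabilities into hard cluster assignments by taking the largest probability per observation; the numbers are invented purely for illustration.

```python
# Hedged sketch: convert EM classification probabilities into hard assignments.
import numpy as np

# Rows = observations, columns = clusters; values are made-up membership probabilities.
probs = np.array([[0.90, 0.10],
                  [0.55, 0.45],
                  [0.20, 0.80]])

hard_assignment = probs.argmax(axis=1)   # cluster with the largest probability
print(hard_assignment)                   # [0 0 1]
```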

The algorithm is similar to the K-means procedure in that a set of parameters is recomputed until a desired convergence value is achieved. The finite mixtures model assumes all attributes to be independent random variables.

A mixture is a set of *N* probability distributions where each distribution represents a cluster. An individual instance is assigned a probability that it would have a certain set of attribute values given that it was a member of a specific cluster. In the simplest case, N = 2, the probability distributions are assumed to be normal and the data instances consist of a single real-valued attribute. Under this scenario, the job of the algorithm is to determine the values of five parameters, specifically:

1. the mean and standard deviation for cluster 1,
2. the mean and standard deviation for cluster 2, and
3. the sampling probability *P* for cluster 1 (the probability for cluster 2 is 1 − *P*).


The general procedure is given below:

1. Guess initial values for the five parameters.
2. Use the probability density function for a normal distribution to compute the cluster probability for each instance. In the case of a single independent variable with mean μ and standard deviation σ, the formula is:


$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{\frac{-(x-\mu)^2}{2\sigma^2}} \tag{10}$$

3. Use the probability scores to re-estimate the five parameters.
4. Return to Step 2.

In the two-cluster case, there are two probability distribution formulae each having differing mean and standard deviation values.


The algorithm terminates when a formula that measures cluster quality no longer shows significant increases. One measure of cluster quality is the likelihood that the data came from the dataset determined by the clustering. The likelihood is computed as the product, over all instances, of the sum of the weighted cluster probabilities for that instance. With two clusters *A* and *B* containing instances $x_1, x_2, \ldots, x_n$, where $P_A = P_B = 0.5$, the computation is:

$$\left[0.5\,P(x_1 \mid A) + 0.5\,P(x_1 \mid B)\right]\left[0.5\,P(x_2 \mid A) + 0.5\,P(x_2 \mid B)\right]\cdots\left[0.5\,P(x_n \mid A) + 0.5\,P(x_n \mid B)\right] \tag{11}$$
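A minimal sketch of the two-cluster procedure above, assuming a one-dimensional data set: it guesses the five parameters, computes the cluster probabilities from Eq. (10), re-estimates the parameters from the probability scores, and stops when the likelihood corresponding to Eq. (11) (taken in log form for numerical stability) no longer shows a significant increase. The data, starting values, and tolerance are invented for illustration; note that Eq. (11) fixes both cluster weights at 0.5, whereas the sketch also re-estimates the sampling probability, which is one of the five parameters listed above.

```python
# Hedged sketch: EM for a mixture of two 1-D normal distributions (five parameters).
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.5, 300)])  # invented data

def normal_pdf(x, mu, sigma):
    # Eq. (10): probability density of a normal distribution.
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# Step 1: guess initial values for the five parameters.
mu_a, sd_a, mu_b, sd_b, p_a = 0.0, 1.0, 1.0, 1.0, 0.5

prev_ll = -np.inf
for _ in range(200):
    # Step 2: cluster probabilities for each instance (E-step).
    wa = p_a * normal_pdf(x, mu_a, sd_a)
    wb = (1 - p_a) * normal_pdf(x, mu_b, sd_b)
    resp_a = wa / (wa + wb)                     # P(cluster A | x_i)

    # Step 3: re-estimate the five parameters from the probability scores (M-step).
    p_a = resp_a.mean()
    mu_a = np.average(x, weights=resp_a)
    mu_b = np.average(x, weights=1 - resp_a)
    sd_a = np.sqrt(np.average((x - mu_a) ** 2, weights=resp_a))
    sd_b = np.sqrt(np.average((x - mu_b) ** 2, weights=1 - resp_a))

    # Termination: likelihood as in Eq. (11), in log form.
    ll = np.log(wa + wb).sum()
    if ll - prev_ll < 1e-6:                     # no significant increase -> stop
        break
    prev_ll = ll                                # Step 4: return to Step 2

print(f"cluster A: mean={mu_a:.2f} sd={sd_a:.2f}  cluster B: mean={mu_b:.2f} sd={sd_b:.2f}  P(A)={p_a:.2f}")
```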


The output of this algorithm is the probability for each cluster: EM assigns a probability distribution to each instance, which indicates the probability of it belonging to each of the clusters.

In the context of recognizing the sub-circuits of a circuit with minimum interconnections between them, the artificial neurons are structured into the three usual types of layers, input, hidden and output, which together form the artificial neural network. The input layer is responsible for receiving the feature vectors, that is, the centroids and the probabilities extracted by the K-means and EM algorithms, respectively. The number of neurons in the output layer is determined by the size of the set of desired outputs, with each possible output being represented by a separate neuron. Between these two layers there can be many hidden layers; these internal layers contain many neurons in various interconnected structures.

Step 3. Initialize the weights of the network. Each weight should be set to a random value between –0.1 and 1.

Step 4. Calculate the activation of the hidden nodes:

$$x_j^h = g\left(\sum_k w_{jk}^h\, x_k^{in}\right) = \frac{1}{1 + e^{-\sum_k w_{jk}^h\, x_k^{in}}} \tag{12}$$

Step 5. Calculate the output from the output layer:

$$x_i^o = g\left(\sum_j w_{ij}^o\, x_j^h\right) = \frac{1}{1 + e^{-\sum_j w_{ij}^o\, x_j^h}} \tag{13}$$

Step 6. Compare the actual outputs with the desired outputs and compute a measure of the error.

Step 7. From the comparison, determine in which direction (+ or –) to change each weight in order to reduce the error.

Step 8. Find the amount by which to change each weight, apply the corrections to the weights, and repeat all of the above steps with all training vectors until the error for all the vectors in the training set is reduced to an acceptable value.

Step 9. End.
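A compact sketch of Steps 3 to 9 follows, assuming one hidden layer with the sigmoid activation of Eqs. (12) and (13) and a squared-error measure; the feature vectors, desired outputs, network sizes, learning rate, and stopping tolerance are placeholders rather than values from the chapter.

```python
# Hedged sketch: generalized delta rule (backpropagation) for one hidden layer.
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((6, 4))                     # placeholder feature vectors (e.g. centroids/probabilities)
T = np.array([[1], [1], [1], [0], [0], [0]], dtype=float)  # desired bipartition outputs

def g(z):
    # Sigmoid activation used in Eqs. (12) and (13).
    return 1.0 / (1.0 + np.exp(-z))

# Step 3: initialize each weight to a small random value.
W_h = rng.uniform(-0.1, 0.1, (4, 5))       # input -> hidden weights
W_o = rng.uniform(-0.1, 0.1, (5, 1))       # hidden -> output weights
eta = 0.5                                  # learning rate (assumed)

for epoch in range(10000):
    # Step 4: activation of the hidden nodes, Eq. (12).
    x_h = g(X @ W_h)
    # Step 5: output of the output layer, Eq. (13).
    x_o = g(x_h @ W_o)

    # Step 6: compare actual and desired outputs; squared-error measure.
    error = T - x_o
    if np.mean(error ** 2) < 1e-3:         # error acceptable for all training vectors
        break                              # Step 9: end

    # Steps 7-8: direction and amount of each weight change (gradient of the error).
    delta_o = error * x_o * (1 - x_o)
    delta_h = (delta_o @ W_o.T) * x_h * (1 - x_h)
    W_o += eta * x_h.T @ delta_o
    W_h += eta * X.T @ delta_h

print("final mean squared error:", np.mean((T - g(g(X @ W_h) @ W_o)) ** 2))
```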

**6. Evaluation of fuzzy ARTMAP with DBSCAN in VLSI partition application** 

This section describes a new model for partitioning a circuit using DBSCAN and a fuzzy ARTMAP neural network.

#### **6.1 Overview of ARTMAP**

The basic ART system is an unsupervised learning model. It typically consists of a comparison field and a recognition field composed of neurons, a vigilance parameter, and a reset module. The vigilance parameter has considerable influence on the system: higher vigilance produces highly detailed memories (many, fine-grained categories), while lower vigilance results in more general memories (fewer, more general categories). The comparison field takes an input vector (a one-dimensional array of values) and transfers it to its best match in the recognition field. Its best match is the single neuron whose set of weights (weight vector) most closely matches the input vector. Each recognition field neuron outputs a negative signal (proportional to that neuron's quality of match to the input vector) to each of the other recognition field neurons and inhibits their output accordingly. In this way the recognition field exhibits lateral inhibition, allowing each neuron in it to represent a category to which input vectors are classified. After the input vector is classified, the reset module compares the strength of the recognition match to the vigilance parameter. If the vigilance threshold is met, training commences. Otherwise, if the match level does not meet the vigilance parameter, the firing recognition neuron is inhibited until a new input vector is applied. Training commences only upon completion of a search procedure. In the search procedure, recognition neurons are disabled one by one by the reset function until the vigilance parameter is satisfied by a recognition match. If no committed recognition neuron's match meets the vigilance threshold, an uncommitted neuron is committed and adjusted towards matching the input vector.
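A highly simplified sketch of the match and vigilance cycle described above, for binary input vectors (ART-1 style): the best-matching category is tried first, it is reset if the match does not satisfy the vigilance parameter, and a new category is committed when no existing one passes. The vigilance value, choice parameter, and inputs are assumptions; this is a didactic reduction, not the fuzzy ARTMAP network used in this chapter.

```python
# Hedged sketch: simplified ART-1 style category search with a vigilance test.
import numpy as np

def art_step(categories, x, rho=0.7, alpha=0.001):
    """Present one binary input vector x; return the index of the accepted category."""
    if not categories:
        categories.append(x.copy())                 # commit the first category
        return 0
    # Choice function: how well each stored weight vector matches the input.
    scores = [np.sum(np.minimum(w, x)) / (alpha + np.sum(w)) for w in categories]
    for j in np.argsort(scores)[::-1]:              # try the best match first, reset on failure
        w = categories[j]
        match = np.sum(np.minimum(w, x)) / np.sum(x)
        if match >= rho:                            # vigilance threshold met -> train
            categories[j] = np.minimum(w, x)        # fast learning: w <- w AND x
            return j
    categories.append(x.copy())                     # no category passed: commit a new one
    return len(categories) - 1

categories = []
inputs = np.array([[1, 1, 0, 0, 1],
                   [1, 1, 0, 0, 0],
                   [0, 0, 1, 1, 1]])
for x in inputs:
    print("input", x, "-> category", art_step(categories, x))
```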

