6. Genetic clustering

Figure 6. Flow chart of the C-FCM algorithm.

26 Recent Applications in Data Clustering

Figure 7. C-FCM algorithm results. (a) Original image; (b)–(i) C-FCM Cluster No: 1–8.

The genetic algorithm (GA) is a very popular method in evolutionary computation. It was first developed by Holland in 1975 [18]. The algorithm mimics natural evolutionary processes: it optimizes a population of structures by applying a set of evolutionary operators.

The method maintains a population of structures, each of which is an individual. Each individual is evaluated by a function called the fitness function. The evolutionary operators are selection, recombination, and mutation.

In genetic algorithms (GAs), each individual represents a candidate solution in binary form and is called a chromosome. After an initial population is generated, random crossover and mutation operations are executed at each iteration step.

The following terms are useful for describing genetic algorithms: gene, chromosome, population, reproduction process, and fitness (conformity) value.

A gene is a unit that carries partial information. Bringing these units together produces the chromosome sequence that encodes a candidate solution; for this reason, how the genes are coded must be decided carefully.

Chromosomes are structures that contain the information needed to solve the problem. The population is formed by combining chromosomes. Which information is placed on the chromosome is at the designer's discretion.

The population is the set (heap) of possible solutions. During the GA process, the population size remains constant, while poor chromosomes are removed from it. The population size is an important parameter that must be chosen carefully: an overcrowded population increases the time needed by the heuristic search, whereas a population that is too small may fail to yield any acceptable solution.

The reproduction process selects the sequences to be transferred from the current population to the next. The sequences carried over are the genetically fittest ones. The requirement for transfer is that the specified fitness level has been reached.

In genetic algorithms, the fitness value specifies which sequences are transferred to the next generation and which are lost. The fitness (conformity) value reflects the objective of the problem.

In 2002, Bandyopadhyay and Maulik [19] used a GA to determine the number of clusters K automatically. GA-based clustering aims to assign each data object to the kth cluster using genetic operators.

To illustrate the basic concept of GA-based clustering, the genetically guided algorithm [20] is used. The fitness value for fuzzy clusters is given in Eq. (13):

$$j(M) = \sum_{j=1}^{N} \left( \sum_{i=1}^{K} D_{ij}^{1/(1-m)} \right)^{1-m} \tag{13}$$

where $i \in [1, K]$ and $j \in [1, N]$. $D_{ij}$ represents the distance between the $i$th cluster prototype vector and the $j$th data object, and $m$ represents the fuzzification coefficient. The calculation process of the genetic algorithm is shown in Figure 8.
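As an illustration of how Eq. (13) can be evaluated for one candidate prototype matrix, the following is a minimal sketch rather than the implementation used in this study. The function name `reformulated_fitness`, the use of the squared Euclidean distance for $D_{ij}$, and the small constant `eps` are assumptions made for the example only.

```python
import numpy as np

def reformulated_fitness(data, prototypes, m=2.0, eps=1e-12):
    """Evaluate Eq. (13) for one candidate prototype matrix.

    data:       (N, d) array of data objects
    prototypes: (K, d) array of cluster prototype vectors
    m:          fuzzification coefficient (m > 1)
    """
    # D[i, j]: distance between prototype i and data object j.
    # Squared Euclidean distance is an assumption of this sketch.
    diff = prototypes[:, None, :] - data[None, :, :]
    D = np.sum(diff ** 2, axis=2) + eps  # eps avoids 0 raised to a negative power

    inner = np.sum(D ** (1.0 / (1.0 - m)), axis=0)  # sum over i = 1..K
    return float(np.sum(inner ** (1.0 - m)))        # sum over j = 1..N
```

With this reading of the equation, prototypes that lie close to the data give a smaller value than prototypes placed far from every object, so a GA built on this sketch would treat lower values as fitter.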

In this GA-based optimization, P individuals are first initialized, each representing a K × d prototype matrix encoded in Gray code. The fitness value of each individual is then calculated, and tournament selection is used to choose the parents for reproduction. New individuals are generated by two-point crossover and bitwise mutation, and the fittest of them are identified with the same fitness equation. These steps are repeated until the termination condition ($D_{ij} < \delta$) is satisfied.
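The Gray-code encoding of a prototype matrix can be sketched as follows. This is an illustrative example under stated assumptions: the helper names, the 8-bit quantization, and the value range `[lo, hi]` (for example, grayscale intensities from 0 to 255) are hypothetical and are not taken from the study.

```python
def binary_to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)

def gray_to_binary(g: int) -> int:
    """Invert binary_to_gray by XOR-ing all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def encode_prototypes(prototypes, lo, hi, bits=8):
    """Quantize each prototype value in [lo, hi] and emit a flat list of Gray-coded bits."""
    levels = (1 << bits) - 1
    chromosome = []
    for row in prototypes:
        for value in row:
            q = round((value - lo) / (hi - lo) * levels)
            g = binary_to_gray(q)
            # Most significant bit first.
            chromosome.extend((g >> k) & 1 for k in reversed(range(bits)))
    return chromosome

def decode_chromosome(chromosome, K, d, lo, hi, bits=8):
    """Rebuild the K x d prototype matrix (list of rows) from a Gray-coded bit list."""
    levels = (1 << bits) - 1
    values = []
    for start in range(0, K * d * bits, bits):
        g = 0
        for bit in chromosome[start:start + bits]:
            g = (g << 1) | bit
        values.append(lo + gray_to_binary(g) / levels * (hi - lo))
    return [values[r * d:(r + 1) * d] for r in range(K)]
```

In a Gray code, consecutive quantization levels differ in exactly one bit, so a single-bit mutation can move a prototype value to a neighbouring level.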

Chromosomes are selected at random, and the best of them is chosen as a parent in the tournament selection process. After tournament selection, two-point crossover and bitwise mutation are performed, in that order. These processes are shown in Figures 9 and 10, respectively.
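Tournament selection as described above can be sketched in a few lines. The function name, the tournament size `k`, and the convention that a lower fitness value is better are assumptions of this example.

```python
import random

def tournament_select(population, fitness, k=2, rng=random):
    """Draw k distinct individuals at random and return the fittest one.

    population: list of chromosomes
    fitness:    list of fitness values, aligned with population
                (lower is treated as better in this sketch)
    """
    contenders = rng.sample(range(len(population)), k)
    best = min(contenders, key=lambda idx: fitness[idx])
    return population[best]
```

Calling the function twice yields the two parents for crossover; a larger `k` increases the selection pressure toward the best individuals.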

Figure 9 illustrates the crossover process. In two-point crossover, two bitwise crossover points are determined, and the segments between them are then exchanged between the parents.

Several crossover operators have been used in different works [21–23], including 1-point crossover, K-point crossover, and shuffle crossover [24].
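The two-point crossover can be sketched as below for chromosomes stored as lists of bits. The function name and the optional `points` argument, which fixes the cut positions for reproducibility, are hypothetical.

```python
import random

def two_point_crossover(parent_a, parent_b, rng=random, points=None):
    """Exchange the segment between two cut points of two equal-length parents."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    n = len(parent_a)
    if points is None:
        # Two distinct interior cut points; requires n >= 3.
        lo, hi = sorted(rng.sample(range(1, n), 2))
    else:
        lo, hi = sorted(points)
    child_a = parent_a[:lo] + parent_b[lo:hi] + parent_a[hi:]
    child_b = parent_b[:lo] + parent_a[lo:hi] + parent_b[hi:]
    return child_a, child_b
```

For example, with parents `[0] * 8` and `[1] * 8` and cut points `(2, 5)`, the first child is `[0, 0, 1, 1, 1, 0, 0, 0]` and the second is its complement.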

Partitional Clustering

29

http://dx.doi.org/10.5772/intechopen.75836

The mutation process introduces genetic variation between parent and child chromosomes. Many mutation operators are defined in the literature, including bit-flip, swap, and inversion mutation [25]. In the mutation process, mutation points are specified; in this example, three mutation points are chosen, as shown in Figure 10. After mutation, the bits at the mutation points are changed. If the mutation rate is too high, the parent generation may be lost. In general, the mutation rate in a GA is between 0.05 and 0.15% of the parent.
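A bit-flip mutation over a binary chromosome can be sketched as follows. The function name and the per-bit `rate` parameter are assumptions of this example, and the default value is only a placeholder, not a recommended setting.

```python
import random

def bit_flip_mutation(chromosome, rate=0.05, rng=random):
    """Flip each bit independently with probability `rate` and return a new list."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]
```

A rate of 0 returns an unchanged copy of the parent, whereas a rate of 1 flips every bit, which illustrates why a very high mutation rate destroys the information carried by the parent.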

Figure 8. Flow chart of the genetic clustering.

Figure 9. Two-point crossover process.

Figure 10. Mutation process.

The GA-based fuzzy clustering process for four clusters is shown in Figure 11. The GA is applied to the original image in Figure 11a, which can be represented in four clusters as in Figure 11b–e, respectively. This algorithm can only be applied to grayscale images. Cluster 1 shows the roads in the landscape, cluster 2 the green areas, cluster 3 the sea, and cluster 4 the sky.

Figure 11. Genetic algorithm results. (a) Original image, (b) Cluster No:1, (c) Cluster No:2, (d) Cluster No:3, and (e) Cluster No:4.

Another method for overcoming high computational requirements is genetic optimization. For large datasets, genetic optimization should be used together with fuzzy clustering. In this technique, all individuals or pixels are genetically optimized so as to form fuzzy sets.

In some datasets, clusters cannot always be sharply separated. For such datasets, fuzzy clustering overcomes the limitation of crisp clustering. Fuzzy clustering allows the nearest optimum cluster to be found for each data object. This technique provides more information about the data structure.

The most important point in the search techniques of partitional clustering is optimum parameter selection, which is itself an optimization problem. This problem can be addressed with genetic algorithms, which can be a useful solution for very large-scale datasets.

In partitional clustering, determining the number of clusters is important. This choice differs from dataset to dataset. If a dataset includes more features to classify, more clusters will be needed. Unfortunately, the number of clusters is not known for many clustering problems and is generally chosen from experience. Estimating the number of clusters is one of the major problems for validation.

As a result, image understanding can be carried out with the partitional clustering techniques presented here. For many applications, such as biomedical image understanding, robotics, and security, these techniques can be useful for the pre-processing stage of some algorithms.

Acknowledgements

This study was performed at Gazi University, Engineering Faculty, Electrical & Electronics Engineering Department. The MATLAB® 2016b software platform was used for this study. The study was performed on a computer with 12 GB RAM and an Intel i5 processor at 2.6 GHz.

Competing interests

The author declares no conflicts of interest.

Thanks

I would like to thank my wife Hande Kutbay, who has given me a lot of support in my life. I thank my daughter, Gökçe Kutbay, who makes life difficult but meaningful. Special thanks to my mother Cemile KUTBAY and my father Hür Uğur KUTBAY for being in my life. In addition, I would like to thank Assoc. Prof. Dr. Fırat Hardalaç for his academic help.
