**C(W) Algorithm (Additive Version)**

*Inputs*: The graph G(N, E) with node set N = {1, …, n}, the cost vector c(e), e ∈ E, and an initial value Wo for W.

Edges are indexed e(1), …, e(|E|), with e(s) = (p(s), q(s)), and are ordered so that the costs satisfy c(e(1)) ≤ c(e(2)) ≤ … ≤ c(e(|E|)).

Set W = Wo and sLast = |E|

#### **Begin Outer Loop**

While W < Large

*Initialization(A)*. Set Wnext = Large, K = {1, …, n}, and for each k ∈ K let L(k) = k, N<sup>k</sup> = {k}, E<sup>k</sup> = ∅, and MinCost(k) = Large.

*Initialization(B)*. Let i′ = p(1) and i″ = q(1), and select e(1) (= (i′, i″)) to create the first nondegenerate cluster (one containing more than one node and hence at least one edge) as follows: identify k′ = L(i′) and k″ = L(i″), and absorb N<sup>k″</sup> into N<sup>k′</sup> to create the cluster N<sup>k′</sup> := N<sup>k′</sup> ∪ N<sup>k″</sup> = {i′, i″} with edge set E<sup>k′</sup> = {e(1)}. Set L(i″) = k′ and MinCost(k′) = c(e(1)), and conclude by eliminating the superfluous cluster N<sup>k″</sup> (now contained within N<sup>k′</sup>) by setting K := K \ {k″}. Finally, initialize the edge index s by setting s = 1.

A Class of Parametric Tree-Based Clustering Methods http://dx.doi.org/10.5772/intechopen.76406 139

#### **Begin Inner Loop**

While s < sLast

Set s := s + 1 and identify edge e(s) = (p(s), q(s)). Let i′ = p(s) and i″ = q(s), and let k′ = L(i′) and k″ = L(i″). There are three cases:

Case (1): If k′ = k″ (i′ and i″ belong to the same cluster), then continue to the next iteration of the Inner Loop.

Case (2): If c(e(s)) > W + MinCost0, where MinCost0 = Min(MinCost(k′), MinCost(k″)), then edge e(s) is forbidden to be added to join the clusters N<sup>k′</sup> and N<sup>k″</sup> into a single cluster. In this case, compute Wnext = Min(Wnext, c(e(s)) – MinCost0) and continue to the next iteration of the Inner Loop.

Case (3) (If (1) and (2) do not apply)<sup>1</sup>: Absorb N<sup>k″</sup> into N<sup>k′</sup> to create the larger cluster N<sup>k′</sup> := N<sup>k′</sup> ∪ N<sup>k″</sup> with its associated edge set E<sup>k′</sup> := E<sup>k′</sup> ∪ E<sup>k″</sup> ∪ {e(s)}. Correspondingly, update L(i) by setting L(i) = k′ for all i ∈ N<sup>k″</sup>, and set MinCost(k′) := Min(MinCost(k′), MinCost(k″), c(e(s))). Finally, eliminate the superfluous cluster N<sup>k″</sup> (whose elements are now contained within N<sup>k′</sup>) by setting K := K \ {k″}.

Endwhile.

// The node and edge sets for the collection of clusters C(W) for the current W are given by N<sup>k</sup> and E<sup>k</sup> for k ∈ K. The node sets can alternatively be recovered by reference to the values L(i), i = 1, …, n.

W = Wnext

Endwhile

#### **End of C(W) Algorithm**

<sup>1</sup> Case (3) generalizes Initialization(B).

We employ the customary convention that a loop of the form "While x < Constant" is bypassed if the initial value of x does not satisfy x < Constant, and that an iteration of the loop is not interrupted if x is changed within it so that x ≥ Constant (the loop then terminates at the conclusion of that iteration). Hence, for example, when s := s + 1 in the Inner Loop results in s = sLast, the current iteration, which processes the last edge e(sLast), continues until it ends.

We now make several observations about the algorithm.

138 Recent Applications in Data Clustering

Remark 1: The multiplicative version of the C(W) Algorithm results from modifying Case (2) to replace W + MinCost0 by W∙MinCost0 and to replace Wnext = Min(Wnext, c(e(s)) – MinCost0) by Wnext = Min(Wnext, c(e(s))/MinCost0). (Hence, addition is replaced by multiplication and subtraction by division.) These approaches will generate the same collection of clusters under the assumption that all c(e) > 0, for the following reason: for any positive value W in the additive case, a positive value W′ can always be found for the multiplicative case that will cause Wnext to screen out the same set of elements, and vice versa. This relationship can also be extended to cover the situation where all c(e) are nonnegative.
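As a concrete reference point, here is a minimal Python sketch of the whole algorithm, covering both the additive rule of Case (2) and the multiplicative rule of this remark. It rests on assumptions not in the text: nodes are labelled 0, …, n − 1 rather than 1, …, n; edges are (cost, p, q) tuples; `math.inf` stands in for Large; and, because Case (3) generalizes Initialization(B), e(1) is processed by the same loop as every other edge. The multiplicative branch assumes W > 0 and all c(e) > 0, and treats an edge between two isolated nodes as never forbidden (the limit of W∙Large as Large grows). The names `cw_pass`, `cw_sequence` and `partition` are hypothetical.

```python
import math

INF = math.inf  # stands in for the text's "Large" (an assumption)


def cw_pass(n, edges, w, multiplicative=False):
    """One execution of the Inner Loop for a fixed W (hypothetical helper).

    n     -- number of nodes, labelled 0..n-1 here (the text uses 1..n)
    edges -- (cost, p, q) tuples in nondecreasing order of cost
    Returns (clusters, w_next); clusters maps k -> (N^k, E^k) for k in K.
    """
    label = list(range(n))                    # L(i)
    nodes = {k: {k} for k in range(n)}        # N^k; the keys play the role of K
    elist = {k: [] for k in range(n)}         # E^k
    min_cost = {k: INF for k in range(n)}     # MinCost(k)
    w_next = INF
    for c, p, q in edges:                     # e(1) is handled like Case (3)
        k1, k2 = label[p], label[q]           # k' = L(i'), k'' = L(i'')
        if k1 == k2:                          # Case (1): same cluster
            continue
        m0 = min(min_cost[k1], min_cost[k2])  # MinCost0
        if m0 == INF:                         # both nodes isolated: never forbidden
            limit, gap = INF, INF
        elif multiplicative:                  # Remark 1 rule (assumes W > 0)
            limit, gap = w * m0, c / m0
        else:                                 # additive rule of Case (2)
            limit, gap = w + m0, c - m0
        if c > limit:                         # Case (2): e(s) is forbidden
            w_next = min(w_next, gap)
            continue
        for i in nodes[k2]:                   # Case (3): absorb N^k'' into N^k'
            label[i] = k1
        nodes[k1] |= nodes.pop(k2)
        elist[k1] += elist.pop(k2) + [(c, p, q)]
        min_cost[k1] = min(min_cost[k1], min_cost.pop(k2), c)
    return {k: (nodes[k], elist[k]) for k in nodes}, w_next


def cw_sequence(n, edges, w0, multiplicative=False):
    """Outer Loop: the collections C(W) for W = Wo, then each Wnext below Large."""
    edges = sorted(edges)                     # c(e(1)) <= c(e(2)) <= ...
    w, result = w0, []
    while w < INF:
        clusters, w_next = cw_pass(n, edges, w, multiplicative)
        result.append((w, clusters))
        w = w_next                            # W = Wnext
    return result


def partition(clusters):
    """Node sets of one collection C(W), as sorted tuples for easy comparison."""
    return sorted(tuple(sorted(ns)) for ns, _ in clusters.values())


edges = [(1, 0, 1), (2, 2, 3), (5, 1, 2)]
for w, cl in cw_sequence(4, edges, w0=0):
    print(w, partition(cl))
# 0 [(0, 1), (2, 3)]
# 4 [(0, 1, 2, 3)]
for w, cl in cw_sequence(4, edges, w0=1, multiplicative=True):
    print(w, partition(cl))
# 1 [(0, 1), (2, 3)]
# 5.0 [(0, 1, 2, 3)]
```

On this three-edge path the additive run reports C(0) and C(4), and the multiplicative run started at Wo = 1 reports C(1) and C(5); in both, the cost-5 edge is the one that Case (2) holds back at the first value of W.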

Remark 2: The assignment W = Wnext at the end of the outer loop can be replaced by W := Wnext + Δ for a chosen increment Δ to generate only a subset of the possible C(W) collections. Experimentation with a given class of cluster applications may additionally lead to identifying upper and lower bounds on W (or specific intervals for W) that prove most effective for that class.
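The effect of the Δ update can be seen in a small standalone sketch. It uses the additive rule of Case (2) under the assumptions that nodes are labelled 0, …, n − 1, edges are (cost, p, q) tuples and `math.inf` stands in for Large; the `delta` parameter and the function names are hypothetical, and `delta = 0` recovers the plain update W = Wnext.

```python
import math

INF = math.inf  # stands in for "Large" (assumption)


def cw_pass(n, edges, w):
    """Additive Inner Loop for a fixed W; edge sets E^k are omitted for brevity.

    edges are (cost, p, q) tuples sorted by cost; nodes are labelled 0..n-1.
    Returns (sorted node partition, Wnext).
    """
    label = list(range(n))                    # L(i)
    nodes = {k: {k} for k in range(n)}        # N^k, keyed by k in K
    min_cost = {k: INF for k in range(n)}     # MinCost(k)
    w_next = INF
    for c, p, q in edges:
        k1, k2 = label[p], label[q]
        if k1 == k2:                          # Case (1)
            continue
        m0 = min(min_cost[k1], min_cost[k2])  # MinCost0
        if c > w + m0:                        # Case (2): forbidden edge
            w_next = min(w_next, c - m0)
            continue
        for i in nodes[k2]:                   # Case (3): merge the clusters
            label[i] = k1
        nodes[k1] |= nodes.pop(k2)
        min_cost[k1] = min(min_cost[k1], min_cost.pop(k2), c)
    return sorted(tuple(sorted(s)) for s in nodes.values()), w_next


def cw_sequence(n, edges, w0=0, delta=0):
    """Outer Loop with the Remark 2 update W := Wnext + delta (hypothetical)."""
    edges = sorted(edges)
    w, out = w0, []
    while w < INF:
        part, w_next = cw_pass(n, edges, w)
        out.append((w, part))
        w = w_next + delta                    # INF + delta stays INF, ending the loop
    return out


edges = [(1, 0, 1), (3, 1, 2), (6, 2, 3)]
print([w for w, _ in cw_sequence(4, edges)])            # [0, 2, 5]
print([w for w, _ in cw_sequence(4, edges, delta=4)])   # [0, 6]
```

With Δ = 4 the collection for W = 2 is skipped, and the collection reported for W = 6 coincides with the one for W = 5 on this graph.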

Remark 3: To reduce the updating effort of Case (3), the indexes i′ = p(s) and i″ = q(s) can be interchanged (hence also interchanging k′ and k″) to assure that |N<sup>k″</sup>| ≤ |N<sup>k′</sup>|, so that the L(i) values are updated only for the smaller of the two clusters. (More comprehensive ways of reducing computation are identified in Sections 4 and 7.)
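The interchange in Remark 3 is the union-by-size rule familiar from disjoint-set data structures: relabelling the smaller cluster keeps the number of L(i) updates low. The standalone sketch below counts those updates with and without the interchange; it assumes nodes labelled 0, …, n − 1, sorted (cost, p, q) edge tuples and `math.inf` for Large, and the `swap` flag, the counter and the function name are illustrative only.

```python
import math

INF = math.inf  # stands in for "Large" (assumption)


def cw_pass(n, edges, w, swap):
    """Additive Inner Loop; returns (node partition, number of L(i) updates)."""
    label = list(range(n))
    nodes = {k: {k} for k in range(n)}
    min_cost = {k: INF for k in range(n)}
    updates = 0
    for c, p, q in edges:
        k1, k2 = label[p], label[q]
        if k1 == k2:                                    # Case (1)
            continue
        m0 = min(min_cost[k1], min_cost[k2])
        if c > w + m0:                                  # Case (2); Wnext omitted
            continue
        if swap and len(nodes[k2]) > len(nodes[k1]):    # Remark 3 interchange
            k1, k2 = k2, k1
        for i in nodes[k2]:                             # Case (3): relabel N^k''
            label[i] = k1
            updates += 1
        nodes[k1] |= nodes.pop(k2)
        min_cost[k1] = min(min_cost[k1], min_cost.pop(k2), c)
    partition = sorted(tuple(sorted(s)) for s in nodes.values())
    return partition, updates


# Node 4 joins last through an edge that lists it first, so without the
# interchange the four-node cluster {0, 1, 2, 3} is the one relabelled.
edges = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 4, 0)]
print(cw_pass(5, edges, w=100, swap=False))  # ([(0, 1, 2, 3, 4)], 7)
print(cw_pass(5, edges, w=100, swap=True))   # ([(0, 1, 2, 3, 4)], 4)
```

Both runs produce the same cluster; only the relabelling work differs.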

Remark 4: The justification for terminating the outer loop of the algorithm when W = Large (after setting W = Wnext at the conclusion of the inner loop) derives from the observation that Wnext = Large implies that the condition c(e(s)) > W + MinCost0 of Case (2) was never satisfied. (When this terminating condition occurs in a connected graph, the method will have generated a min cost spanning tree.) Moreover, if the algorithm is repeated for W = Large, the same outcome will result.
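With W = Large the condition of Case (2) can never hold, so a single pass of the Inner Loop is exactly Kruskal's minimum spanning tree algorithm: scan the edges by nondecreasing cost and keep each edge that joins two different clusters. The standalone sketch below, assuming nodes labelled 0, …, n − 1, sorted (cost, p, q) tuples and `math.inf` for Large (the helper name is hypothetical), runs the Outer Loop from Wo = 0 until Wnext = Large and compares the final collection with a pass at W = Large on a small connected graph.

```python
import math

INF = math.inf  # stands in for "Large" (assumption)


def cw_pass(n, edges, w):
    """Additive Inner Loop for a fixed W; returns (list of E^k, Wnext)."""
    label = list(range(n))
    nodes = {k: {k} for k in range(n)}
    elist = {k: [] for k in range(n)}
    min_cost = {k: INF for k in range(n)}
    w_next = INF
    for c, p, q in edges:
        k1, k2 = label[p], label[q]
        if k1 == k2:                                # Case (1)
            continue
        m0 = min(min_cost[k1], min_cost[k2])
        if c > w + m0:                              # Case (2); never true if w = INF
            w_next = min(w_next, c - m0)
            continue
        for i in nodes[k2]:                         # Case (3)
            label[i] = k1
        nodes[k1] |= nodes.pop(k2)
        elist[k1] += elist.pop(k2) + [(c, p, q)]
        min_cost[k1] = min(min_cost[k1], min_cost.pop(k2), c)
    return list(elist.values()), w_next


edges = sorted([(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)])

w = 0
while True:                                         # Outer Loop from Wo = 0
    forest, w_next = cw_pass(4, edges, w)
    if w_next == INF:                               # Remark 4 stopping condition
        break
    w = w_next

mst, _ = cw_pass(4, edges, INF)                     # W = Large: plain Kruskal
print(w, forest == mst, sum(c for c, _, _ in mst[0]))  # 2 True 6
```

Here the loop stops after W = 2, and the last collection is the spanning tree of cost 6 built from the edges of cost 1, 2 and 3.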

Remark 5: When Wo = 0 (or Wo = 1 for the multiplicative case), each resulting node-disjoint subgraph (N<sup>k</sup>, E<sup>k</sup>) in the collection C(W) consists of a tree in which the cost c(e) for all edges e ∈ E<sup>k</sup> is the same.
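Remark 5 can be illustrated on a small example. In the standalone sketch below (nodes labelled 0, …, n − 1, sorted (cost, p, q) tuples and `math.inf` for Large, all assumptions; the helper name is hypothetical), a single additive pass with W = 0 yields two trees, one using only cost-1 edges and one using only cost-2 edges.

```python
import math

INF = math.inf  # stands in for "Large" (assumption)


def cw_pass(n, edges, w):
    """Additive Inner Loop for a fixed W; returns a list of (N^k, E^k) pairs."""
    label = list(range(n))
    nodes = {k: {k} for k in range(n)}
    elist = {k: [] for k in range(n)}
    min_cost = {k: INF for k in range(n)}
    for c, p, q in edges:
        k1, k2 = label[p], label[q]
        if k1 == k2:                          # Case (1)
            continue
        m0 = min(min_cost[k1], min_cost[k2])
        if c > w + m0:                        # Case (2); Wnext not needed here
            continue
        for i in nodes[k2]:                   # Case (3)
            label[i] = k1
        nodes[k1] |= nodes.pop(k2)
        elist[k1] += elist.pop(k2) + [(c, p, q)]
        min_cost[k1] = min(min_cost[k1], min_cost.pop(k2), c)
    return [(nodes[k], elist[k]) for k in nodes]


edges = sorted([(1, 0, 1), (1, 1, 2), (2, 2, 3), (2, 3, 4), (2, 4, 5), (3, 0, 5)])
clusters = cw_pass(6, edges, w=0)
for ns, es in clusters:
    # a tree on |N^k| nodes has |N^k| - 1 edges; the set of edge costs is a singleton
    print(sorted(ns), len(es) == len(ns) - 1, {c for c, _, _ in es})
# [0, 1, 2] True {1}
# [3, 4, 5] True {2}
```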

Remark 6: In a complete graph, the algorithm will leave at most one node isolated (with N<sup>k</sup> = {k} and E<sup>k</sup> = ∅) at the conclusion of the Inner Loop for any W. (Two nodes that remained isolated would be joined by an edge e(s) with MinCost(k′) = MinCost(k″) = Large, and Case (2) cannot forbid such an edge.) In a graph that is not complete or not connected, no node that is not isolated in G will be left isolated in the collection C(W) for W sufficiently large. (To permit additional isolated nodes, a limit clim may be imposed that prevents C(W) from including any edges e such that c(e) > clim.)
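One simple way to impose the limit clim, assumed here rather than prescribed by the text, is to discard every edge with c(e) > clim before the passes begin; because the edges are sorted, this keeps a prefix of the list and amounts to lowering sLast. The standalone sketch below (nodes labelled 0, …, n − 1, `math.inf` for Large, hypothetical helper names) uses the complete graph on four nodes: without the limit, none of the values of W tried leaves more than one node isolated, while clim = 5 leaves two isolated nodes at W = 0.

```python
import math

INF = math.inf  # stands in for "Large" (assumption)


def cw_pass(n, edges, w):
    """Additive Inner Loop for a fixed W; returns the sorted node partition."""
    label = list(range(n))
    nodes = {k: {k} for k in range(n)}
    min_cost = {k: INF for k in range(n)}
    for c, p, q in edges:
        k1, k2 = label[p], label[q]
        if k1 == k2:                          # Case (1)
            continue
        m0 = min(min_cost[k1], min_cost[k2])
        if c > w + m0:                        # Case (2)
            continue
        for i in nodes[k2]:                   # Case (3)
            label[i] = k1
        nodes[k1] |= nodes.pop(k2)
        min_cost[k1] = min(min_cost[k1], min_cost.pop(k2), c)
    return sorted(tuple(sorted(s)) for s in nodes.values())


def isolated(partition):
    """Number of isolated nodes (clusters with N^k = {k} and E^k empty)."""
    return sum(1 for part in partition if len(part) == 1)


# Complete graph on four nodes, edges already sorted by cost.
edges = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 2), (5, 1, 3), (6, 2, 3)]
print([isolated(cw_pass(4, edges, w)) for w in (0, 1, 2, 10)])   # [0, 1, 0, 0]

c_lim = 5                                       # hypothetical value of clim
limited = [e for e in edges if e[0] <= c_lim]   # a prefix, i.e. a smaller sLast
print(cw_pass(4, limited, w=0))                 # [(0, 1), (2,), (3,)]
```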

Remark 7: When there are tied (duplicate) cost values c(e), all orderings of e(1) to e(|E|) satisfying c(e(1)) ≤ c(e(2)) ≤ … ≤ c(e(|E|)) will produce the same collection of clusters C(W) in the following sense: For a given value of W, all orderings will produce the same node sets N<sup>k</sup> defining C(W), and the sum of costs over the edge sets E<sup>k</sup> will also be the same, though the edges within these sets may differ.
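Remark 7 can be spot-checked, though not proved, by brute force. The standalone sketch below (nodes labelled 0, …, n − 1, `math.inf` for Large, hypothetical helper names) enumerates every ordering of a small edge list that keeps the costs nondecreasing and, for several values of W, compares the node sets together with their per-cluster cost totals, and separately the edge sets chosen.

```python
import itertools
import math

INF = math.inf  # stands in for "Large" (assumption)


def cw_pass(n, edges, w):
    """Additive Inner Loop for a fixed W; returns a list of (N^k, E^k) pairs."""
    label = list(range(n))
    nodes = {k: {k} for k in range(n)}
    elist = {k: [] for k in range(n)}
    min_cost = {k: INF for k in range(n)}
    for c, p, q in edges:
        k1, k2 = label[p], label[q]
        if k1 == k2:                          # Case (1)
            continue
        m0 = min(min_cost[k1], min_cost[k2])
        if c > w + m0:                        # Case (2)
            continue
        for i in nodes[k2]:                   # Case (3)
            label[i] = k1
        nodes[k1] |= nodes.pop(k2)
        elist[k1] += elist.pop(k2) + [(c, p, q)]
        min_cost[k1] = min(min_cost[k1], min_cost.pop(k2), c)
    return [(nodes[k], elist[k]) for k in nodes]


def summary(clusters):
    """Node sets plus the edge-cost total of each cluster."""
    return tuple(sorted((tuple(sorted(ns)), sum(c for c, _, _ in es))
                        for ns, es in clusters))


def edge_set(clusters):
    """All edges selected across the collection C(W)."""
    return frozenset(e for _, es in clusters for e in es)


edges = [(1, 0, 1), (1, 1, 2), (1, 0, 2), (2, 2, 3), (2, 1, 3)]
# every ordering of the edges that keeps the costs nondecreasing
orders = [o for o in itertools.permutations(edges)
          if all(a[0] <= b[0] for a, b in zip(o, o[1:]))]

for w in (0, 1, 5):
    runs = [cw_pass(4, o, w) for o in orders]
    print(w, len(orders), len({summary(r) for r in runs}), len({edge_set(r) for r in runs}))
# 0 12 1 3
# 1 12 1 6
# 5 12 1 6
```

For each W, all 12 orderings give a single (node sets, cost totals) summary, while the selected edges vary, as the remark describes.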
