140 Recent Applications in Data Clustering

**4. Fundamental relationships for accelerating the algorithm**

A number of key relationships hold for the C(W) Algorithm that make it possible to accelerate its execution. We discuss the relationships here in broad outline and then incorporate them in Section 7 within a template for a computer code that applies not only to C(W) but to additional related types of cluster collections C(Y) and C(Z) whose algorithms are described in Sections 5 and 6.

**4.1. Early termination of the inner loop**

The Inner Loop can typically terminate far in advance of satisfying the condition s = sLast for sLast = |E|, hence making it unnecessary to examine all edges of the graph.

First note that the process of examining the edges in ascending cost order implies that once c(e(s)) > W + MinCost(k) for a given k = ko ∈ K, then the inequality c(e(v)) > W + MinCost(ko) will also hold for all subsequent edges e(v) for v > s. Hence, by Case (2) of the algorithm, no nodes or edges will be adjoined to the cluster sets N<sup>ko</sup> and E<sup>ko</sup> for v > s. In addition, it will be unnecessary to update Wnext by reference to ko in the future.

It may further be observed that the MinCost(k) values are generated in a sequence that makes it possible to readily identify (without sorting) the values k(1), k(2), …, k(m), so that MinCost(k(1)) ≤ MinCost(k(2)) ≤ … ≤ MinCost(k(m)). It is convenient to define m so that these values refer just to those k ∈ K such that MinCost(k) < Large. (Recall that MinCost(k) = Large implies that N<sup>k</sup> consists of a single node k, and E<sup>k</sup> = ∅.)

Thus if c(e(s)) > W + MinCost(k(m)), we know that none of the clusters indexed from k(1) to k(m) can take part in the creation of new clusters. Alternatively, if we start by checking whether c(e(s)) > W + MinCost(k(h)) holds for h = 1 and work forward until finding the first index k(h\*) for which the inequality does not hold, then on future encounters with Case (2) it is possible to start from k(h\*) rather than k(1) to begin checking whether c(e(s)) > W + MinCost(k(h)).

In consideration of these relationships, it should be kept in mind that when two clusters k′ and k″ are joined, then MinCost(k″) will no longer be referenced (since the cluster k″ will no longer exist). To see the consequences of this, suppose that k′ and k″ are interchanged, if necessary, so that MinCost(k′) ≤ MinCost(k″). Then when N<sup>k″</sup> is absorbed into N<sup>k′</sup>, the following two possibilities arise:

**i.** MinCost(k′) < Large (hence MinCost(k′) identifies the cost of an edge previously added) and MinCost(k′) will be unchanged;

**ii.** MinCost(k′) = Large, and the new MinCost(k′) will be the value c(e(s)) of the edge e(s) currently added.

This implies that in the sequence MinCost(k(1)) ≤ MinCost(k(2)) ≤ … ≤ MinCost(k(m)), the value MinCost(k″) will drop out, and the value MinCost(k′) will either be unchanged and retain its position, or else it will change from a Large value to become the new value MinCost(k(m)) at the end of the ordered list.

However, applying this knowledge to shortcut the checks performed in Case (2) does not make it possible to save appreciable computation, since the amount of effort to perform the

**4.2. Advanced starting for successive W values**

We can also accelerate the computation of the algorithm by saving information to produce an advanced start on successive executions of the Inner Loop. The underlying relationships are as follows.

Let s2(1), s2(2), …, s2(v<sup>2</sup>) denote the s indexes starting with s2(1) = 1 (when Wnext = Large), where the values s2(v) for v > 1 identify successive edges e(s) for which a new (smaller) value of Wnext is identified in Case (2). Also, starting with W(1) = Large, let W(1), W(2), …, W(v<sup>2</sup>) denote the corresponding values for Wnext identified at these points (hence, for v<sup>2</sup> > 1, W(1) > W(2) > … > W(v<sup>2</sup>)). Similarly, let s3(1), s3(2), …, s3(v<sup>3</sup>) denote the s indexes starting with s3(1) = 1, where the values s3(v) for v > 1 identify successive edges e(s) that are added in Case (3) of the Inner Loop (to generate the current cluster collection C(W)).
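To make this bookkeeping concrete, the following is a minimal Python sketch of one pass of a simplified Inner Loop that records the three lists. It is an illustration under stated assumptions, not the chapter's code: the function name, the data layout, the treatment of Case (1) as "skip an edge whose endpoints already share a cluster", and the Wnext candidate c(e(s)) − Min(MinCost(k′), MinCost(k″)) are all assumptions made here for the example.

```python
LARGE = float("inf")  # stands in for the chapter's "Large"


def inner_loop_with_history(num_nodes, edges, W):
    """Run one simplified Inner Loop pass and record s2, the W history, and s3.

    edges: list of (cost, p, q) tuples, already sorted by ascending cost.
    Edge indexes s are 1-based, as in the text.
    """
    label = list(range(num_nodes))                    # L(i): cluster index of node i
    nodes = {k: [k] for k in range(num_nodes)}        # N^k
    min_cost = {k: LARGE for k in range(num_nodes)}   # MinCost(k)
    w_next = LARGE
    s2, w_hist, s3 = [1], [LARGE], []                 # s2(1) = 1 pairs with W(1) = Large

    for s, (cost, p, q) in enumerate(edges, start=1):
        k1, k2 = label[p], label[q]
        if k1 == k2:
            continue  # assumed Case (1): edge lies inside a single cluster
        if min_cost[k1] > min_cost[k2]:
            k1, k2 = k2, k1  # interchange so MinCost(k') <= MinCost(k'')
        threshold = min_cost[k1]
        if cost > W + threshold:
            # Case (2): edge rejected; the candidate formula is an assumption
            candidate = cost - threshold
            if candidate < w_next:
                w_next = candidate
                s2.append(s)          # a new, smaller Wnext was found at edge s
                w_hist.append(candidate)
            continue
        # Case (3): add the edge and absorb N^k'' into N^k'
        for i in nodes[k2]:
            label[i] = k1
        nodes[k1].extend(nodes.pop(k2))
        min_cost[k1] = min(min_cost[k1], min_cost.pop(k2), cost)
        s3.append(s)                  # the first entry is 1 when edge 1 joins two distinct nodes

    return w_next, s2, w_hist, s3


# Hypothetical 4-node path graph with edge costs 1, 2, and 5
example_edges = [(1.0, 0, 1), (2.0, 1, 2), (5.0, 2, 3)]
w_next, s2, w_hist, s3 = inner_loop_with_history(4, example_edges, W=0.5)
```

On this small example, edges 1 and 3 are added and edge 2 is rejected, so the sketch yields s2 = [1, 2], a W history of [Large, 1.0], and s3 = [1, 3]. No early termination is applied, so every edge is examined.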

After completing the Inner Loop for W = Wo (while saving this information), upon assigning W the value Wnext = W(v<sup>2</sup>), the fact that Wnext < W(v) for v < v<sup>2</sup> implies that the algorithm will perform exactly the same sequence of steps until reaching s = s2(v<sup>2</sup>), at which point the edge e(s) for s = s2(v<sup>2</sup>) will be added to the construction (although this edge was not added on the previous execution of the Inner Loop).

Consequently, all edges e(s3(v)) for s3(v) < s2(v<sup>2</sup>) will again be added to the current construction, and the values s2(v) for v < v<sup>2</sup> will also be unchanged. Hence, letting v\* = Max(v: s3(v) < s2(v<sup>2</sup>)), we can start the current construction by simply adding the edges e(s) for s = s3(1) to s3(v\*), followed by adding the edge e(s) for s = s2(v<sup>2</sup>) (whose index s2(v<sup>2</sup>) therefore becomes recorded as the new index s3(v\* + 1)). Then the customary Inner Loop for W > Wo can be executed starting with s initialized by setting s = s2(v<sup>2</sup>) instead of s = 1. Subsequent executions of the Inner Loop continue to save the same information, which is used again to create an advanced start in the manner described.

By this means, we avoid examining all edges e(s) for s < s2(v<sup>2</sup>) that were not added to the previous construction. We also avoid having to re-do the checks to determine that the remaining edges qualify to be added. Together this can amount to a considerable saving in computation.

A possibility arises to save additional computation by using more memory. Each time a new candidate for Wnext is identified, in the process of identifying the indexes s2(1), s2(2), …, s2(v<sup>2</sup>), we can save a current copy of the arrays N<sup>k</sup>, E<sup>k</sup>, MinCost(k) and K used by the algorithm, avoiding the burden of excessive memory by overwriting the previous copy each time a new one is made. Then the latest copy will be available at the point where the edge e(s) for s = s2(v<sup>2</sup>) is added on the current execution of the loop, making it possible to recover the arrays without having to regenerate them to resume the current loop.

However, it may not be possible to take advantage of a current copy of the saved arrays on every iteration of the Inner Loop (unless previous copies are not overwritten when new ones are made). After re-starting by recovering the arrays for s2(v<sup>2</sup>), if now a new Wnext value is determined for s > s2(v<sup>2</sup>) (referring to the v<sup>2</sup> of the previous execution), then we can proceed by again making a copy of the arrays for the next execution of the loop. But if no new value of Wnext is found for s > s2(v<sup>2</sup>), then the previous value W(v<sup>2</sup> – 1) (for s = s2(v<sup>2</sup> – 1)) will be the new final Wnext value, and no copies of the arrays remain in memory for this value.

Consequently, in this latter case we resort to the construction that does not rely on the copied arrays, generating the arrays instead in the process of adding edges. Thus, on the next execution of the Inner Loop we will again have the copies available. Hence in this fashion we will be able to take advantage of the copied arrays at least on every second execution of the loop, if not more frequently.

As previously noted, the foregoing relationships and their implications are embodied in a format suitable for creating a computer code in Section 7, after we first describe two additional algorithmic variants that can be exploited by analogous relationships.

A Class of Parametric Tree-Based Clustering Methods http://dx.doi.org/10.5772/intechopen.76406

**5. Algorithm C(Y): a node-based algorithmic variant**

It is possible to formulate a node-based variant of the C(W) algorithm which follows a closely related format and is supported by a similar rationale.

In the node-based approach, we replace the parameter W by a parameter Y which is linked to costs associated with nodes in N<sup>k</sup> rather than to costs associated with edges in E<sup>k</sup>. (More precisely, the costs associated with nodes are also derived from edges, i.e., the edges that meet these nodes, though these edges are different from those referenced in the C(W) algorithm.)

Accompanying this parameter change, we replace the value MinCost(k) associated with the sets indexed by k ∈ K with a value MinCostB(i) associated with the nodes i ∈ N, and more particularly, we replace MinCost(k′) for k′ = L(i′) by MinCostB(i′), and replace MinCost(k″) for k″ = L(i″) by MinCostB(i″).

This replacement changes the updating rule when N<sup>k″</sup> is absorbed into N<sup>k′</sup> in Case (3). Specifically, the values MinCostB(i′) and MinCostB(i″) are updated by setting MinCostB(i) := Min(MinCostB(i), c(e(s))) for i = i′ and i″, in contrast to the update involving MinCost(k′) and MinCost(k″) (which sets MinCost(k′) := Min(MinCost(k′), MinCost(k″), c(e(s)))).

The reason for these changes is as follows. In the node-based version, to permit the edge e(s) = (i′, i″) (= (p(s), q(s))) to be added and hence to join the subgraphs (N<sup>k′</sup>, E<sup>k′</sup>) and (N<sup>k″</sup>, E<sup>k″</sup>), we require that c(e(s)) ≤ Y + MinCostB(i) for both i = i′ and i″. Hence we require c(e(s)) ≤ Y + MinCostB0, for MinCostB0 = Min(MinCostB(i′), MinCostB(i″)). On the other hand, if c(e(s)) > Y + MinCostB0, we are prevented from adding edge e(s), and by the preceding relationships this causes the first part of Case (2) to retain exactly the same form as in the C(W) algorithm.

To update MinCostB(i′) and MinCostB(i″) in Case (3), we must account for the fact that each of these two values is affected only by the cost of the edge e(s), and hence will either retain its present value or become equal to c(e(s)), according to which is smaller. (It may be noted that once node i for i = i′ or i″ has been assigned an edge cost c(e(s)), then MinCostB(i) will not change thereafter, since any edge e(s) that is added later to meet node i will have a cost no less than that of the earlier edge.)

Based on these observations, we can state the form of the C(Y) algorithm as follows.

**C(Y) Algorithm (Node-Based Version)**

*Inputs*: The graph G(N, E), cost vector c(e), e ∈ E, initial Yo value for Y.

Edges are ordered so that the costs satisfy c(e(1)) ≤ c(e(2)) ≤ … ≤ c(e(|E|)).

Set Y = Yo and sLast = |E|.

**Begin Outer Loop**

While Y < Large
