connection. The approach is to transform the planning problem into a linear optimization problem which is a combination of a set covering problem and a network flow problem. As a result, the backbone is a connected graph, but with no fault-tolerance. Another disadvantage is the intractability of the proposed approaches: for some inputs, the algorithm takes too long for the result to be useful. This is because the underlying linear optimization problem is a binary integer problem, which is well known to be NP-complete. Paper [39] addresses this issue with a decomposition method, but the algorithm still runs for about 22 hours for a network with 58 nodes. This is acceptable for the mentioned scenarios, but for network reconfiguration in automation scenarios a faster algorithm is required. Extending these algorithms to fault-tolerance would further increase the complexity. Paper [41] considers the problem of coverage control in wireless sensor networks, including various aspects like activating/deactivating the nodes, finding the coverage characteristics of a given network, and sensor node deployment. However, all considerations include only the aspect of last mile coverage, i.e. the sensing function of the nodes. They do not consider the problem of backbone connectivity for communicating the sensed data to a central instance.

Our approach is to extend the existing methods from infrastructure network planning to planning multi-hop wireless mesh networks with fault-tolerance aspects. Other papers about fault-tolerance in wireless multi-hop networks can benefit from our approach for generating a fault-tolerant topology. Papers considering fault-tolerant routing, for instance [4, 19, 27], have a biconnected backbone network as a prerequisite, but do not address the base station planning problem. The base station planning problem has received little attention so far because in most mobile ad-hoc and sensor network scenarios the number and positions of the nodes are considered uncontrolled or hardly controlled. However, in automation scenarios the networks are typically planned to provide service in some predefined geographical area (e.g. a production hall). This requires careful base station planning for ensuring high availability of the radio coverage.

The topology control problem is to configure a given instance of a multi-hop network such that it is connected and a quality of service property is fulfilled. Depending on the configured parameter, these methods adjust the transmission power [6] or the duration of the active and sleeping periods of the nodes [5]. Paper [6] presents an algorithm for distributed adjustment of the transmission powers of the nodes with the purpose of minimizing the interference and keeping the network topology connected with a high probability. Paper [5] presents a distributed protocol for topology management which determines the active and sleeping periods of the nodes in such a way that the network is connected, the energy consumption is minimized, and the data is delivered with real-time guarantees. Paper [40] considers the issue of data forwarding in industrial wireless sensor networks and the integration into a wired backbone. It proposes a chain-based communication protocol for real-time communication over multiple hops. Common to all topology control protocols is that they operate on some existing instance of a multi-hop network. For achieving the required quality of service property, these protocols require some topological properties of the network (like connectivity or k-connectivity). The difference is that our base station planning algorithm plans a network so that it is deployed with the desired topological properties. In this way, our algorithm can be used in a first phase for planning the topological properties of the network. In a second phase, a topology control algorithm can be used to additionally adjust the transmission powers or active/sleep times of the nodes for achieving the required QoS property.

### **3. Fault-tolerant radio coverage and connectivity**

This section presents our approach for fault-tolerant radio coverage and connectivity of wireless mesh networks in dynamic propagation environments. A preliminary version of this approach, considering only radio coverage, was published in [22].

#### **3.1. Fault-tolerance approach**

At a general level of abstraction, the goal of this chapter is to guarantee the availability of the services (radio coverage and connectivity) of a system (a wireless mesh network) which is exposed to dynamic external behavior (the dynamic propagation environment). The environmental dynamics are an external factor for the wireless network. They result from the changing surroundings of the wireless network.

For this general type of problem, a well-known method exists in the field of dependable computing. This is the *fault-tolerance* approach [3]. Fault-tolerance avoids service failures in the presence of faults. *Service failure*, or *failure*, is the inability of a system to perform a service according to the service specification. *Error* is a part of the system state which may lead to a subsequent service failure. A *fault* is the cause for an error. The fault-tolerant system design includes *fault model definition*, *error detection* and *system recovery*. The fault model definition identifies a set of faults, for which service failures do not occur. The error detection identifies errors in the system, caused by the faults. The system recovery transforms a system with errors to a system without errors. The idea is to detect errors and perform system recovery *before* the errors lead to failures. In this way, the fault-tolerance approach avoids failures if faults from the fault model occur. In this chapter we apply the fault-tolerance approach for guaranteeing availability of radio coverage and connectivity of wireless mesh networks in dynamic propagation environments.

#### **Fault model definition**

A fault in our system is the *environmental dynamics.* Environmental dynamics are changes of the radio attenuation properties of the environment (e.g. new obstacles, movement of obstacles, increased humidity). The *attenuation* describes the ability of the radio propagation environment to absorb and weaken the radio waves. An increased attenuation has a negative effect on radio coverage and connectivity. Regarding radio coverage, it reduces the radio signal strength at the service locations, so that some service locations may no longer be covered. Regarding connectivity, some backbone links can be lost, which can disconnect the backbone network. If no measures are taken, the fault *environmental dynamics* can lead to service failures. Our fault model covers every event of environmental dynamics which decreases the *ARSS* (Average Radio Signal Strength) by at most a user-specified amount Δ*ARSS*.

#### **Fault-tolerant system design**

Our system design uses redundancy for tolerating the faults. Figure 3 shows the state machine of our fault-tolerant system, including the system states, their attributes and their entry actions. The initial state is the normal state. In addition to the *correct service*, the normal system state contains *redundancy* for compensating the faults at run-time. In this normal state the system performs *concurrent error detection*, meaning that the error detection takes place during the normal service delivery. In the *error* state the redundancy is lost due to a fault, but the service is correct because the initial redundancy has compensated the negative effects of the fault. In this state, the system performs system recovery, which restores the initial redundancy. In the following sections we specify how we applied this concept to the services *radio coverage* and *connectivity*. For each service we define the correct service specification, the redundancy and the error. A failure for both services occurs when the service consumer (a mobile station) tries to use the service and the service is not correct. Our fault-tolerant system design avoids the failures.
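The two-state model described above (Figure 3) can be illustrated with a minimal sketch. The class and method names are hypothetical and chosen only for illustration; the sketch models the two states and their entry actions, not the actual detection or recovery mechanisms.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"  # correct service, redundancy present
    ERROR = "error"    # correct service, redundancy lost


class FaultTolerantService:
    """Two-state model for one service (radio coverage or connectivity)."""

    def __init__(self):
        # The initial state is the normal state.
        self.state = State.NORMAL
        self.log = []

    def on_error_detected(self):
        # Transition "Error occurs": entering the error state starts recovery.
        if self.state is State.NORMAL:
            self.state = State.ERROR
            self.log.append("start system recovery")

    def on_recovery_finished(self):
        # Transition "Recovery finishes": redundancy is restored and
        # concurrent error detection continues in the normal state.
        if self.state is State.ERROR:
            self.state = State.NORMAL
            self.log.append("resume error detection")


service = FaultTolerantService()
service.on_error_detected()
service.on_recovery_finished()
print(service.state, service.log)
```

Note that the model has no failure state: as long as recovery finishes before a further fault exhausts the service, the service stays correct in both states.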

**Figure 3.** The states of our fault-tolerant system

[Figure not reproduced. Recoverable content: the initial state is the *Normal state* (correct service, redundancy; error detection). The transition "Error occurs" leads to the *Error* state (correct service, no redundancy; system recovery), and the transition "Recovery finishes" leads back to the normal state.]

#### *3.1.1. Radio coverage*

#### Correct service

Radio coverage is correct if every service location is covered by at least one base station with a radio signal strength of at least *ARSSMin*.

#### Redundancy

In order to ensure correct radio coverage in case of faults, the normal system state uses radio signal strength redundancy. This means that every service location is covered by at least one base station with a radio signal strength of at least *ARSSRED*. *ARSSRED* is the value of the redundant radio signal strength needed for compensating the environmental dynamics during the error detection and system recovery (*ARSSRED* = *ARSSMin* + Δ*ARSS*).

#### Error

In the error state, the radio coverage is not as good as in the normal state, but it is still correct. An error exists if, at some service location, the *ARSS* is less than the redundancy value but still reaches the minimum threshold for correct coverage: *ARSSRED* > *ARSS* ≥ *ARSSMin*.
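The three coverage conditions can be summarized in a short sketch. The function name, the threshold values and the assumption that the ARSS is given in dBm are illustrative and not taken from the chapter; the label "incorrect" denotes coverage below *ARSSMin*, which becomes a failure once a mobile station tries to use the service.

```python
def classify_coverage(arss_by_location, arss_min, delta_arss):
    """Classify radio coverage as 'normal', 'error' or 'incorrect'.

    arss_by_location maps each service location to the ARSS values
    (assumed to be in dBm) received from the individual base stations.
    """
    arss_red = arss_min + delta_arss  # ARSS_RED = ARSS_Min + delta_ARSS
    # A location counts as covered by its strongest base station.
    best = [max(values, default=float("-inf"))
            for values in arss_by_location.values()]
    worst = min(best, default=float("inf"))
    if worst >= arss_red:
        return "normal"     # redundancy present at every service location
    if worst >= arss_min:
        return "error"      # ARSS_RED > ARSS >= ARSS_Min at some location
    return "incorrect"      # some location is below ARSS_Min


ARSS_MIN = -80.0    # hypothetical minimum threshold
DELTA_ARSS = 10.0   # hypothetical tolerated decrease

print(classify_coverage({"A": [-65.0, -90.0], "B": [-60.0]}, ARSS_MIN, DELTA_ARSS))
print(classify_coverage({"A": [-65.0], "B": [-75.0]}, ARSS_MIN, DELTA_ARSS))
print(classify_coverage({"A": [-65.0], "B": [-85.0]}, ARSS_MIN, DELTA_ARSS))
```

With these example values the redundancy threshold is -70, so a fault that lowers the ARSS by up to 10 still leaves every location at or above the minimum threshold.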

#### *3.1.2. Connectivity*

#### Correct service

Connectivity is correct if the backbone graph is connected.

#### Redundancy

In order to ensure correct connectivity in case of faults, the backbone graph is *biconnected* (2-connected). A graph is biconnected if any two vertices can be joined by two independent paths [9]. This backbone redundancy compensates for the loss of a backbone link as a result of a fault.

#### Error

In the error state, the backbone graph is not biconnected, but it is connected. The loss of biconnectivity can be caused by environmental dynamics leading to link loss. The loss of a link is not necessarily a connectivity error. It is an error only if it leads to a loss of biconnectivity.
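The distinction between normal, error and incorrect connectivity can be made concrete with a definition-based sketch. It checks biconnectivity directly by removing each vertex in turn, which is simple but not efficient. The function names are hypothetical, and treating graphs with fewer than three vertices as not biconnected is an assumption (conventions differ).

```python
def is_connected(vertices, edges):
    """Return True if the graph induced on `vertices` is connected."""
    vertices = set(vertices)
    if not vertices:
        return True
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        if a in vertices and b in vertices:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adjacency[v] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices


def classify_connectivity(vertices, edges):
    """Classify the backbone as 'normal', 'error' or 'incorrect'."""
    vertices = set(vertices)
    if not is_connected(vertices, edges):
        return "incorrect"  # the backbone is disconnected
    # Biconnected: still connected after removing any single vertex.
    biconnected = len(vertices) >= 3 and all(
        is_connected(vertices - {v}, edges) for v in vertices)
    return "normal" if biconnected else "error"


ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
chord = [("A", "C")]

print(classify_connectivity("ABCD", ring + chord))  # ring with one extra link
print(classify_connectivity("ABCD", ring))          # extra link lost
print(classify_connectivity("ABCD", ring[:-1]))     # one ring link lost
print(classify_connectivity("ABCD", ring[:2]))      # D has no link left
```

The first two calls illustrate that losing a link is not necessarily an error: the ring stays biconnected without the chord.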

#### **3.2. Error detection**

When faults occur and lead to errors, the errors have to be automatically detected by the system. Since we are considering two services, radio coverage and connectivity, we need methods for detecting radio coverage errors and connectivity errors. Figure 4 shows our methods for error detection and their integration in our fault-tolerant system design.


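**Figure 4.** The error detection and system recovery of our fault-tolerant system

[Figure not reproduced. Recoverable labels: Wireless Mesh Network; link states; radio signal measurements; error detection (connectivity: biconnectivity testing; radio coverage: radio coverage assessment); error; system recovery (base station planning; reconfiguration instructions; reconfiguration); new base stations.]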

#### *3.2.1. Connectivity error detection*

For detecting connectivity errors, we use monitoring at the routing layer and a classic biconnectivity testing algorithm from graph theory [9]. This algorithm takes the backbone graph as input and determines whether it is biconnected. If the graph is not biconnected, there is an error. The information required for biconnectivity testing is the set of edges (links) among the vertices (base stations) of the graph. In our scenario, this information is globally available at the management appliance. As part of the routing protocol, the base stations monitor the backbone link states by exchanging control messages with other base stations [17]. The state of every link is determined by *two communication endpoints* (base stations): one of them sends control messages, and the other determines the link state from statistics on the received messages. The link state information is periodically updated and communicated, so the management appliance has an up-to-date global view of the backbone network. Based on this global view, the management appliance performs biconnectivity testing. The fact that *every link state is determined by two communication endpoints* enables us to detect connectivity errors by *monitoring* at the routing layer. If the backbone link state information is not available globally, distributed biconnectivity testing algorithms can be used (e.g. [34]).
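As an illustration of centralized biconnectivity testing on the global link state view, the following sketch uses the standard depth-first search with low-point values to find cut vertices. Whether this is exactly the algorithm referred to in [9] is an assumption, and the function names and the link report format are hypothetical. The recursion is adequate for backbones of moderate size.

```python
def build_adjacency(base_stations, up_links):
    """Build an undirected graph from the links currently reported as up."""
    adjacency = {bs: set() for bs in base_stations}
    for a, b in up_links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def find_cut_vertices(adjacency):
    """Return (connected, cut_vertices) for an undirected simple graph."""
    disc, low, cut = {}, {}, set()

    def dfs(v, parent):
        disc[v] = low[v] = len(disc)  # discovery index of v
        children = 0
        for w in adjacency[v]:
            if w not in disc:
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                # No back edge from the subtree of w reaches above v.
                if parent is not None and low[w] >= disc[v]:
                    cut.add(v)
            elif w != parent:
                low[v] = min(low[v], disc[w])
        # The root is a cut vertex only if it has several DFS children.
        if parent is None and children > 1:
            cut.add(v)

    if adjacency:
        dfs(next(iter(adjacency)), None)
    return len(disc) == len(adjacency), cut


def has_connectivity_error(adjacency):
    """Error: the backbone is connected but not biconnected."""
    connected, cut = find_cut_vertices(adjacency)
    biconnected = connected and not cut and len(adjacency) >= 3
    return connected and not biconnected


stations = ["BS1", "BS2", "BS3", "BS4", "BS5"]
ring = [("BS1", "BS2"), ("BS2", "BS3"), ("BS3", "BS4"),
        ("BS4", "BS5"), ("BS5", "BS1")]

print(has_connectivity_error(build_adjacency(stations, ring)))
# Link BS5-BS1 is lost: the ring degrades to a chain.
degraded = build_adjacency(stations, ring[:-1])
print(has_connectivity_error(degraded), find_cut_vertices(degraded))
```

The set of cut vertices is useful beyond the yes/no answer, since it identifies the base stations whose loss would disconnect the backbone.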

#### *3.2.2. Radio coverage error detection*

The information required for radio coverage error detection is the radio signal strength *at every service location*. However, there is no communication endpoint at every service location. Therefore, radio coverage errors cannot be detected by monitoring, as is done for connectivity errors. Nevertheless, a method for detecting these errors is needed because the environmental dynamics affect the radio coverage. The radio coverage should be guaranteed for every service location *before* a mobile station moves to that location.

Our approach is to use a model-based assessment for detecting radio coverage errors at the physical layer. We use a radio propagation model for assessing the radio signal strength at every service location. This model is tightly related to the propagation environment. We use measurements from the wireless network for calibrating the model to reality.

In state-of-the-art assessment approaches the radio propagation models are static, meaning that they do not reflect the dynamics of the environment. The innovation of our approach is that the radio propagation model *automatically calibrates* to the real environment. *Radio model calibration* is the process of adjusting the model parameters in such a way that the model better reflects a set of measurements from the actual propagation environment. *Radio coverage assessment* is the model-based estimation of the radio signal strength for the purpose of error detection. The radio model calibration method is out of the scope of this chapter, but the reader can find a detailed description in [20, 23, 24].

#### **3.3. System recovery**

The system recovery transforms a system with errors to a system without errors. In our approach we use the same mechanism for recovery from radio coverage errors and for recovery from connectivity errors. This mechanism adds new base stations to the network. The new base stations improve the radio coverage by increasing the radio signal strength at the service locations. The new base stations also improve the connectivity by adding new links to the backbone network. Given a wireless mesh network with radio coverage and/or connectivity errors, we have to decide how many base stations to install and where to install them in order to correct the errors. For this purpose, we have developed an automatic base station planning algorithm (see section 4).

The error recovery includes automatic base station planning and manual reconfiguration (see Figure 4). The management appliance runs the base station planning algorithm and gives instructions to the operating staff for the reconfiguration. The operating staff performs the reconfiguration, which restores the redundancy of the services.

### **4. Automatic base station planning**

This section describes our algorithm for automatic base station planning. It starts with a problem definition for the base station planning, followed by an overview of our approach in section 4.2. The following sections define the details of the algorithm, namely the link state model used, the optimization approach and the graph consolidation approach. This algorithm is published in [25]; in addition, this section describes the integration with the presented fault-tolerance framework.

#### **4.1. Problem definition**

The problem of the base station planning algorithm is to find a minimum number of base stations to be installed which transform a wireless mesh network with radio coverage errors and/or connectivity errors to a system without errors. The existing algorithms for this type of problem in wireless mesh networks are computationally intractable, or do not provide the required fault-tolerance (see section 2.2 for a discussion). The following input information is given to the base station planning algorithm:

• Service location information. This is information about the service locations which have to be covered.

• Candidate sites information. This is information about possible locations of the base stations. The candidate sites and the service locations are specified by the deployment staff.

• Radio coverage information. This information is obtained from the radio propagation model. It specifies, for every service location, the candidate sites which cover this service location, if base stations were installed at all candidate sites.

• Connectivity information. This specifies, for every candidate site, the candidate sites with which it has a link in the backbone network, if base stations were installed at all candidate sites. For this purpose, we use our calibrated radio propagation model and a link state model (section 4.3).

• The currently installed base stations and their positions.

The base station planning algorithm has to determine the number and positions of base stations to be installed such that:

• The radio coverage and the connectivity enter the normal state. The normal state includes the redundancy in the services which has been defined in section 3.

• The algorithm should provide an acceptable relation between base station minimality and running time. The running time of the algorithm should be appropriate for error detection and system recovery in a dynamic propagation environment.

The challenge of the defined problem is the connectivity requirement. The coverage requirement can be formally defined as a local property which depends only on the considered entities (e.g. a base station covers a service location). For the connectivity, the requirement is global. It includes all network paths among all pairs of base stations. The existence of a path between two base stations depends not only on the considered base stations, but also on the number and positions of all other base stations in the network. The fault-tolerance (biconnectivity) requirement increases the complexity of the problem. It has been shown that finding a minimum number of base stations for this type of problem is NP-complete. For this reason, we are looking for an approach with a good balance between minimality and running time.
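As a compact restatement of the base station planning problem, one possible formalization is sketched below. The notation is introduced here for illustration only and is not taken from [25]: $L$ is the set of service locations, $C$ the set of candidate sites, $B \subseteq C$ the set of currently installed base stations (assuming they are located at candidate sites), $\mathrm{cov}(l) \subseteq C$ the candidate sites which cover location $l$ with the redundant signal strength, and $G[S]$ the backbone graph induced by the potential links among the sites in $S$.

```latex
\begin{aligned}
\min_{N \subseteq C \setminus B} \quad & |N| \\
\text{subject to} \quad
  & \mathrm{cov}(l) \cap (B \cup N) \neq \emptyset
    \quad \text{for all } l \in L, \\
  & G[B \cup N] \text{ is biconnected.}
\end{aligned}
```

The first constraint is local, since it can be checked per service location. The second is global, since it depends on the whole set $B \cup N$, which reflects why connectivity is the harder requirement.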
