nodes which move within the service area and communicate with other stations via the WMN. The stations in a WMN use a *multi-hop routing protocol* for communication. This protocol automatically discovers the network topology and delivers the messages to the destination, if needed over multiple hops. We can think of a WMN as an infrastructure wireless network in which the backbone is replaced by a wireless one and the communication is done in a (multi-hop) ad-hoc way.

We consider a wireless mesh network which supports a business process and is under the administration of an organization. This is not a MANET (Mobile Ad-hoc Network) consisting of self-dependent mobile nodes, as is often assumed in the literature. The organization has control over the network infrastructure and aims at providing radio coverage and connectivity in a clearly defined service area. The *management appliance* is a central instance for basic configuration and diagnosis of the WMN, including topology monitoring, protocol settings, traffic management, etc.

*Radio coverage* and *connectivity* are basic *services* of a wireless mesh network which are required for communication. Radio coverage ensures that the mobile stations can access the network infrastructure (backbone) while they are located or moving in the service area. Connectivity ensures that the topology of the backbone is connected.

**1.1. Radio coverage**

The service *radio coverage* is *correct* if the service area is *covered* by the base stations. The service area is covered if the union of the radio cells of all base stations contains the whole service area. The radio cell of a base station is the part of the space around it in which a mobile station observes the base station with a radio signal strength sufficient for communication. Sufficient radio signal strength in the service area is a basic requirement for the mobile stations to be able to access the WMN. The radio coverage service ensures this sufficient signal strength in the service area. A *service location* is a point of the service area, specified by its coordinates. A service location is covered if the union of the radio cells of all base stations contains the service location.

**1.2. Connectivity**

The service *connectivity* is correct if the backbone graph is connected. The *backbone graph* is a graph with the base stations as vertices and the routing layer links among them as edges. A *link* exists if two wireless devices can communicate through the wireless medium while obeying some qualitative parameters (see section 4.3 for more information). The backbone graph represents the network topology at the routing layer. This graph is connected if a *path* (a sequence of edges) exists between every two vertices. A connected backbone graph means a connected routing layer topology, which is a basic requirement for communication through the WMN. The connectivity service ensures that the backbone graph is connected.

In the example WMN in Figure 1, the radio coverage and the connectivity are correct. The union of the radio cells contains the service area and the backbone graph is connected.

**1.3. Problem exposition and contributions**

In this chapter, we address the problem of guaranteeing radio coverage of Wireless Mesh Networks, which are exposed to environmental dynamics.

**2. Related work**

First, we present related work on the availability of radio coverage and connectivity. Then we discuss related work on the automatic base station planning algorithm.

#### **2.1. Availability of the radio coverage**

The availability of the service *radio coverage* is a necessary condition for reliable communication in wireless networks. Reliable communication over the wireless medium has been investigated extensively in the design of every wireless communication system. Since the wireless medium is unshielded, the effect of the environment on wireless communication is specific to that environment. Different methods have been developed for increasing the reliability of communication over the wireless medium, most of them at the physical layer: for instance, robust modulation methods (e.g. MIMO), frequency hopping, spread spectrum transmission, and redundancy of antennas and transmitters. At the data link layer, error correction codes and retransmissions are typical measures. These methods mostly address the time-variability of the wireless channel caused by multi-path propagation. However, all of them require a minimum radio signal strength at the receiver, which is a basic requirement for decoding the frames successfully. Providing this minimum radio signal strength is a matter of network deployment and configuration in the particular environment.


Achieving Fault-Tolerant Network Topology in Wireless Mesh Networks 207


The state-of-the-art method for ensuring radio coverage has a static nature (e.g. [8, 10, 35]). Figure 2 shows the general procedure of this method. The method ensures radio coverage during the network deployment, before the network starts operation. Usually, an *expert* plans the base station properties so that the requirements for the radio coverage are fulfilled. The expert bases this planning on knowledge about the environment and the requirements. For this purpose, measurements in the particular environment are typically needed. Then, the base stations are installed. After the installation, a manual site survey is conducted in order to verify that the requirements are satisfied. The site survey includes manual measurements of the radio signal strength at selected service locations in the whole area. If the requirements are not satisfied, adjustments have to be made. The adjustments are site-specific and may include removing obstacles, changing frequencies, or adding new equipment [10]. When the requirements are fulfilled, the wireless network enters the operational phase. In the operational phase, *there is no automatic function for monitoring and maintaining the radio coverage*. The only way to do this is a manual site survey, which is expensive in terms of time and effort. The loss of radio coverage can only be detected by the mobile stations and the applications: the network connection is lost and no communication is possible. The repair of radio coverage is started when the applications report a problem of this kind. During the radio coverage repair, the presence of an expert is required for troubleshooting and base station planning.

**Figure 2.** Static deployment method for radio coverage (stages shown: requirements, base station planning, installation, radio coverage assessment by site survey, operational phase; a service and application failure triggers measurements and troubleshooting)
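The site survey step described above can be illustrated with a minimal sketch. It assumes that each survey sample records a service location and the strongest signal strength measured there, and that the coverage requirement is a single minimum signal strength. The names `Measurement` and `uncovered_locations`, and the threshold of -80 dBm, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One site-survey sample (hypothetical structure, not from the chapter)."""
    x: float        # coordinates of the service location
    y: float
    rss_dbm: float  # strongest base-station signal measured there, in dBm


def uncovered_locations(samples, min_rss_dbm):
    """Return the samples whose measured signal strength is below the requirement."""
    return [s for s in samples if s.rss_dbm < min_rss_dbm]


# Illustrative survey data; the values are made up.
survey = [
    Measurement(0.0, 0.0, -62.0),
    Measurement(10.0, 0.0, -71.5),
    Measurement(10.0, 10.0, -88.0),
]

# Assumed requirement: at least -80 dBm at every surveyed service location.
failed = uncovered_locations(survey, min_rss_dbm=-80.0)
for sample in failed:
    print(f"adjustment needed at ({sample.x}, {sample.y}): {sample.rss_dbm} dBm")
```

An empty result corresponds to the case in which the requirements are fulfilled and the network can enter the operational phase. Any returned sample marks a location where site-specific adjustments would be needed.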

To compensate for the dynamics of the environment, the static method uses static radio signal strength redundancy, called fade margin. In communication systems design, the term *fade margin* (or margin) denotes the signal strength reserve, i.e. the power added on top of the minimum level needed for receiving the frames at the receiver. The fade margin is configured during the planning phase via adequate selection of transmitters and antennas [8, 37]. It is used for compensating temporal variations in the environment. When the environment changes, the radio coverage may degrade. If the redundancy is sufficient, the radio coverage is still correct and the applications are not affected. However, the radio coverage may have entered a critical state, meaning that further changes in the environment may lead to service failure. Since there are no automatic monitoring functions for the radio coverage, this state of lost redundancy is not detected and persists in the system. In this state, the next change in the environment can lead to service failures.
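The role of the fade margin can be made concrete with a small sketch. It assumes that the margin is the difference between the received signal strength and the receiver's minimum level (both in dBm), and that the "critical" state is reached when the remaining margin drops below a reserve chosen at planning time. The function names and all numeric values are illustrative assumptions, not taken from the chapter.

```python
def fade_margin_db(rss_dbm: float, min_level_dbm: float) -> float:
    """Signal strength reserve above the minimum level needed for reception."""
    return rss_dbm - min_level_dbm


def coverage_state(rss_dbm: float, min_level_dbm: float, reserve_db: float) -> str:
    """Classify a service location by its remaining fade margin.

    'normal'   : coverage correct, planned reserve still available
    'critical' : coverage correct, but the reserve is (partly) used up
    'failed'   : signal below the minimum level, coverage lost
    The three labels and the reserve threshold are assumptions of this sketch.
    """
    margin = fade_margin_db(rss_dbm, min_level_dbm)
    if margin < 0:
        return "failed"
    if margin < reserve_db:
        return "critical"
    return "normal"


# Assumed values: minimum level -85 dBm, planned reserve 10 dB.
for rss in (-70.0, -80.0, -90.0):
    print(rss, coverage_state(rss, min_level_dbm=-85.0, reserve_db=10.0))
```

With the static method, nothing evaluates such a classification during operation, which is why the critical state goes unnoticed until the applications fail.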

In the context of this chapter, we have high availability requirements. The environment can change in an unpredictable way during the network's life cycle, which is typically longer than 10-20 years. For this reason, it is hardly possible to plan sufficient static redundancy for all possible changes of the environment, since they are not known at the deployment phase. Even if this were possible, it would be extremely inefficient. Consequently, a new method is needed for guaranteeing radio coverage. When the factory layout changes to adapt to a new market, the method should enable an easy adaptation of the WMN and should guarantee high availability of the radio coverage and the connectivity.

#### **2.2. Connectivity and base station planning**


In this section we focus on the deployment and operation of the base stations, which is an essential function for connectivity. For the routing protocol and the topology discovery, we build on the research within our working group (e.g. [15, 29, 32]).

Industrial automation networks have usually been isolated single-cell networks or classic infrastructure networks with multiple cells. This means that base station planning is required only for the 'last mile', i.e. the connection between a base station and a mobile station (e.g. [8]). In the case of multi-hop wireless mesh networks, the planning of the backbone network is a new research aspect that needs to be considered. Research on radio network planning considers network throughput as the main planning goal (e.g. [7]). However, the most common requirement of industrial networks is availability. With the introduction of technologies for multi-hop communication in industrial environments (e.g. Zigbee, WirelessHART), the base station planning problem gains importance. Paper [37], for instance, presents the challenges of developing a planning tool for industrial wireless sensor networks. However, to the best of our knowledge, no systematic approach exists for planning multi-hop wireless networks with respect to the fault-tolerance requirements of industrial automation networks.

Wireless Mesh Networks – Efficient Link Scheduling, Channel Assignment and Network Planning Strategies

The existing algorithms for base station planning in wireless mesh networks [2, 39] have a different goal: to design a mesh network with a minimum number of base stations such that the end-to-end throughput requirements of application flows are fulfilled. These requirements are typical for Internet access in areas with no alternative high-speed wired connection. The approach is to transform the planning problem into a linear optimization problem which is a combination of a set covering problem and a network flow problem. As a result, the backbone is a connected graph, but with no fault-tolerance. Another disadvantage is the intractability of the proposed approaches: for some inputs, the algorithm takes too much time for the result to be useful. This is because the underlying linear optimization problem is a binary integer problem, which is known to be NP-complete. Paper [39] addresses this issue with a decomposition method, but the algorithm still runs for about 22 hours for a network with 58 nodes. This is acceptable for the mentioned scenarios, but network reconfiguration in automation scenarios requires a faster algorithm. Extending these algorithms to fault-tolerance would further increase the complexity. Paper [41] considers the problem of coverage control in wireless sensor networks, including aspects such as activating and deactivating nodes, finding the coverage characteristics of a given network, and sensor node deployment. However, all considerations cover only the aspect of last-mile coverage, i.e. the sensing function of the nodes. They do not consider the problem of backbone connectivity for communicating the sensed data to a central instance.
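The difference between a backbone that is merely connected and one that tolerates the loss of a base station can be sketched as follows. The sketch assumes that the backbone graph is given as a vertex list and an undirected edge list, and it reads "fault-tolerance" in the narrow sense that no single base station disconnects the rest when removed. The brute-force check and the names `is_connected` and `single_points_of_failure` are illustrative assumptions, not the planning algorithms of [2, 39] or of this chapter.

```python
from collections import deque


def is_connected(vertices, edges, removed=None):
    """Breadth-first check that all vertices (except `removed`) are mutually reachable."""
    nodes = set(vertices)
    nodes.discard(removed)
    if not nodes:
        return True
    adjacency = {v: set() for v in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == nodes


def single_points_of_failure(vertices, edges):
    """Base stations whose removal disconnects the remaining backbone."""
    return sorted(v for v in vertices if not is_connected(vertices, edges, removed=v))


stations = ["A", "B", "C"]
chain = [("A", "B"), ("B", "C")]             # connected, but B is critical
ring = [("A", "B"), ("B", "C"), ("C", "A")]  # survives the loss of any one station

print(is_connected(stations, chain), single_points_of_failure(stations, chain))
print(is_connected(stations, ring), single_points_of_failure(stations, ring))
```

The chain is a valid result for a planner that only requires connectivity, yet it fails as soon as station B is lost. The ring has no such station. Removing each vertex in turn is quadratic in the worst case and is used here only for clarity.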

The topology control problem is to configure a given instance of a multi-hop network such that it is connected and a quality of service property is fulfilled. Depending on the configured parameter, these methods adjust the transmission power [6] or the active and sleeping periods of the nodes [5]. Paper [6] presents an algorithm for distributed adjustment of the transmission powers of the nodes with the purpose of minimizing the interference and keeping the network topology connected with a high probability. Paper [5] presents a distributed protocol for topology management which determines the active and sleeping periods of the nodes in such a way that the network is connected, the energy consumption is minimized, and the data is delivered with real-time guarantees. Paper [40] considers the issue of data forwarding in industrial wireless sensor networks and the integration into a wired backbone. It proposes a chain-based communication protocol for real-time communication over multiple hops. Common to all topology control protocols is that they operate on an existing instance of a multi-hop network. For achieving the required quality of service property, these protocols require some topological properties of the network (like connectivity or k-connectivity). In contrast, our base station planning algorithm plans a network before it is deployed, so that it has the desired topological properties. In this way, our algorithm can be used in a first phase for planning the topological properties of the network. In a second phase, a topology control algorithm can be used to additionally adjust the transmission powers or the active/sleep times of the nodes for achieving the required QoS property.

Our approach is to extend the existing methods from infrastructure network planning to the planning of multi-hop wireless mesh networks with fault-tolerance aspects. Other papers about fault-tolerance in wireless multi-hop networks can benefit from our approach for generating a fault-tolerant topology. Papers considering fault-tolerant routing, for instance [4, 19, 27], presuppose a biconnected backbone network but do not address the base station planning problem. The base station planning problem has received little attention so far because in most mobile ad-hoc and sensor network scenarios the number and position of the nodes are considered uncontrolled or hardly controlled. However, in automation scenarios the networks are typically planned to provide service in a predefined geographical area (e.g. a production hall). This requires careful base station planning for ensuring high availability of the radio coverage.

**3. Fault-tolerant radio coverage and connectivity**

This section presents our approach for fault-tolerant radio coverage and connectivity of wireless mesh networks in dynamic propagation environments. A preliminary version of this approach, considering only radio coverage, has been published in [22].

**3.1. Fault-tolerance approach**

We consider the goal of this chapter at a general abstraction level. It is to guarantee the availability of the services (radio coverage and connectivity) of a system (wireless mesh network) which is exposed to dynamic external behavior (the dynamic propagation environment). The environmental dynamics are an external factor to the wireless network. They result from the changing surroundings of the wireless network.

For this general type of problem, a well-known method exists in the field of dependable computing: the *fault-tolerance* approach [3]. Fault-tolerance avoids service failures in the presence of faults. A *service failure*, or *failure*, is the inability of a system to perform a service according to the service specification. An *error* is a part of the system state which may lead to a subsequent service failure. A *fault* is the cause of an error. The fault-tolerant system design includes *fault model definition*, *error detection* and *system recovery*. The fault model definition identifies a set of faults for which service failures do not occur. The error detection identifies errors in the system caused by the faults. The system recovery transforms a system with errors into a system without errors. The idea is to detect errors and perform system recovery *before* the errors lead to failures. In this way, the fault-tolerance approach avoids failures if faults from the fault model occur. In this chapter we apply the fault-tolerance approach for guaranteeing the availability of radio coverage and connectivity of wireless mesh networks in dynamic propagation environments.

**Fault model definition**

A fault in our system is the *environmental dynamics.* Environmental dynamics are changes of the radio attenuation properties of the environment (e.g. new obstacles, movement of obstacles, increased humidity). The *attenuation* describes the ability of the radio propagation environment to absorb and weaken the radio waves. An increased attenuation has a negative effect on radio coverage and connectivity. Regarding radio coverage, it reduces the radio signal strength at the service locations, so that some service locations may no longer be covered. The effect on connectivity is that some backbone links can be lost, which can disconnect the backbone network. If no measures are taken, the fault *environmental dynamics* can lead to service failures. A fault is the event of environmental dynamics which decreases the *ARSS* (Average Radio Signal Strength) by up to a user-specified amount Δ*ARSS*.

**Fault-tolerant system design**

Our system design uses redundancy for tolerating the faults. Figure 3 shows the state machine of our fault-tolerant system. The figure shows the system states, their attributes and their entry actions. The initial state is the normal state. In addition to the *correct service*, the normal system state contains *redundancy* for compensating the faults at run-time. In this normal state the system performs *concurrent error detection*, meaning that the error detection takes place during the normal service delivery. In the *error* state the redundancy is lost due to a fault, but the service is correct because the initial redundancy has compensated the negative effects of
