**1. Introduction**

As the Internet grows, networks become larger and more complex, and the number of components, such as routers, switches, and fiber cables, increases. In such complicated systems, it is difficult to implement global network management across several Internet service providers (ISPs), each of which uses many network components in a large-scale topology. Fault management is a particularly important network management issue in complex network systems because the Internet has become essential to business and research. However, we are only beginning to learn how to deal with global network failures in large networks.

Failures have been reported in the Sprint Internet Protocol (IP) backbone, which shows that failures can be observed in everyday operation (Iannaccone et al., 2002; Markopoulou et al., 2004). However, the network failures observed by Iannaccone et al. (2002) and Markopoulou et al. (2004) were short-lived and small in scale, and their impacts were analyzed only in the context of a single ISP. Most network backup or fault restoration methods have been studied and proposed for individual layers such as wavelength division multiplexing (WDM), multi-protocol label switching (MPLS), or IP (Fumagalli & Valcarenghi, 2000; Gerstel & Ramaswami, 2000; Ramamurthy et al., 2003; Sahasrabuddhe et al., 2004; Sharma & Hellstrand, 2003). Yet the proposed backup and restoration methods have not been fully implemented and deployed in real networks. Since real networks are more complicated than theoretical ones, the impacts of network failures on users and ISPs cannot be completely predicted and analyzed. Significant network failures due to natural disasters such as earthquakes, floods, or fires could have a particularly wide impact on several ISPs.

We discuss the critical network failures that occurred after the Taiwan earthquake in December 2006, which cut fibers and caused network failures. We also explain how restoration methods such as automatic Border Gateway Protocol (BGP) (Lougheed & Rekhter, 1989) re-routing, BGP policy changes, and switch reconfiguration were conducted. We hope that the experience and knowledge we gained during the process of recovering from this huge natural disaster, which affected the global Internet, can be shared and can contribute to future Internet network management research. To the best of our knowledge, this is the first detailed study of network restoration after global network failures due to a natural disaster.

Although many natural disasters have occurred in the 21st century, until recently there had been no simultaneous outage of the global Internet backbone. However, the earthquake that occurred around Taiwan in 2006 made several Asia Pacific Research and Education (R&E) networks unreachable. At 21:26 and 21:34 on December 26th (UTC+9), 2006, two large undersea earthquakes struck off the coast of Taiwan, measuring 7.1 and 6.9, respectively, on the moment magnitude scale (Hanks & Kanamori, 1979). The earthquake caused significant damage to the undersea fiber cable systems in that area. Several ISPs were affected because each cable system is shared by multiple ISPs. The earthquake had the effect of dividing the Asia Pacific R&E networks into an eastern and a western group. The Asia Pacific R&E networks, in particular, were seriously damaged but were fully restored after several restoration steps: automatic BGP re-routing, BGP policy changes, and switch port reconfigurations.

The first step in recovery after the earthquake was taken automatically by BGP routers, which detoured traffic along redundant routes. In BGP routing, there are usually multiple redundant autonomous system (AS) paths. Redundant BGP routes served as backup paths but provided poor-quality connectivity, i.e., a long round trip time (RTT). Because congestion was subsequently reported on the narrow-bandwidth links, operators took manual control of traffic to improve communication quality. The second step was a traffic engineering process intended to prevent narrow-bandwidth links from filling up with detoured traffic. The operators changed the BGP routing policy related to the congested ASs. In spite of the routing-level restoration, a few institutions were still not directly connected to the R&E network community because they had only a single link to the network. For these single-link networks, the commodity link was used temporarily for connectivity. However, the commodity link was not stable and was not sufficient to carry a large volume of traffic or to provide next generation Internet service. To restore the single-link networks, cable connection configurations at the switches were changed.

The fiber break caused by the Taiwan earthquake raised restoration issues related to BGP re-routing. In such an emergency, the backup routes should be chosen based on available bandwidth and RTT. Since the fiber break required an urgent network recovery process, network operators configured re-routing based on their experience with bandwidth and RTT.

From this experience, we have learned that redundant physical backup links and routes are important to providing bandwidth and connectivity and that the Quality-of-Service (QoS) after recovery is also important. From the viewpoint of restoration after network failures, there are still challenges that cannot be automatically overcome by network management systems. A systematic risk management plan that includes collaboration among operators of the next-generation Internet is needed.

The remainder of this chapter is organized as follows. In Section 2, the Asia Pacific R&E networks that were damaged by the earthquake or related events are introduced. Section 3 introduces R&E connections, especially in the Asia Pacific area, and the issues caused by such inter-connectivity of R&E networks. Section 4 is a detailed report of the network failures that were observed after the earthquake. Section 5 describes the processes for restoring the disrupted communications in the area. Section 6 discusses what we have learned from the observation of the network failures and recovery processes. Finally, we conclude the chapter in Section 7.

**2. Research and education network activities in the Asia Pacific area** 

**2.1 Asian Internet Interconnection Initiative (AIII)** 

When AIII (Asian Internet Interconnection Initiative [AIII], n.d.) started in 1996, it was the first next generation Internet R&E network project in Asia. The basic idea of AIII is to provide Internet services via satellite. There are AIII members in TH, MY, ID, and SG (Table 3). NP also recently joined the project. AIII's major effort is to build up access points for non-broadband networks. In Asia, it was very hard to complete the network over telephone lines, and setting up earth stations offered a better chance of accessing the Internet. The biggest drawback of this project is that the total bandwidth is limited to between 1.5 and 8 Mbps. In 1996, 2 Mbps was enough bandwidth to start research activities. Now, however, even 8 Mbps is insufficient for network technology research activities. These days, AIII concentrates on developing and deploying certain advanced technologies on its network, for example, IPv6 unicast, IPv6 multicast, Unidirectional Link Routing (UDLR), and advanced TCP.

**2.2 Asia-Pacific Advanced Network (APAN)** 

APAN (Asia-Pacific Advanced Network [APAN], n.d.) started in 1997. APAN is a research consortium that also operates a next generation Internet service. The bandwidths of the APAN backbone networks in 1998 were between 1 Mbps and 35 Mbps. By comparison, the US next generation Internet project "very high speed Backbone Network Service" (vBNS) already offered 155 Mbps in its backbone network. The operating policy of APAN was to provide a high performance data transfer service because, in 1998, it was still impossible to implement a huge bandwidth network in the Asia Pacific area. Now, the APAN network covers the Asia Pacific area, providing bandwidth between 45 Mbps and 10 Gbps. In 2011, APAN became a non-profit organization, and it strongly supports the research activities over the R&E networks in the Asia Pacific area.

**2.3 Trans-Eurasian Information Network 2 (TEIN2)** 

TEIN, or TEIN1, was an EU project that connected Europe and Korea. It started with a bandwidth of 10 Mbps. TEIN2 (Delivery of Advanced Network Technology to Europe [DANTE], n.d.) differs from TEIN in that it has two goals. One is to provide an access network to Europe, and the other is to develop an interconnection network in the Southeast Asian area whose bandwidth is between 45 Mbps and 1 Gbps. The main characteristic of the design topology is that it is not star-shaped. That is, the TEIN2 network has its own backbone connecting the four network operations centers (NOCs), and there are more than two routes for communication between NOCs. It provides each Southeast Asian site with several routes to access the others. TEIN3 started in 2010, and TW, KH, LA, IN, LK, NP, PK, BD, and BT (Table 3) joined this activity.

**3. Background of R&E network** 

**3.1 R&E network status in 2006** 

In the old star-shaped R&E network topology (Fig. 1), communication between the point sites had to go through the center of the star. Communication along these routes was frequently delayed due to the configuration. Now, however, the R&E network community has grown, and the former point sites are now sometimes the center of a star. In some cases this growth generated several routes between two sites (Fig. 2).
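As noted earlier in this chapter, when several routes exist between two sites, a backup route should be chosen based on available bandwidth and RTT. The following is a minimal sketch of one way such a rule could be written down, not the procedure the operators actually followed. The `Route` class, the `pick_backup_route` function, the route names, and all numbers are hypothetical and for illustration only; they are not measurements from the 2006 event.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Route:
    """A candidate backup path (hypothetical model, not a BGP data structure)."""
    name: str
    available_mbps: float  # spare bandwidth on the narrowest link of the path
    rtt_ms: float          # measured round trip time


def pick_backup_route(routes: Sequence[Route], demand_mbps: float) -> Optional[Route]:
    """Prefer routes that can absorb the detoured traffic; among those, lowest RTT.

    If no route has enough spare bandwidth, fall back to the route with the
    most spare bandwidth so that congestion is at least reduced.
    """
    if not routes:
        return None
    fits = [r for r in routes if r.available_mbps >= demand_mbps]
    if fits:
        return min(fits, key=lambda r: r.rtt_ms)
    return max(routes, key=lambda r: r.available_mbps)


# Illustrative numbers only (assumed, not taken from the chapter).
candidates = [
    Route("via-A", available_mbps=45.0, rtt_ms=120.0),
    Route("via-B", available_mbps=600.0, rtt_ms=310.0),
    Route("via-C", available_mbps=1000.0, rtt_ms=180.0),
]
best = pick_backup_route(candidates, demand_mbps=200.0)
print(best.name)
```

In this sketch, bandwidth acts as a feasibility filter and RTT as the tie-breaker. This mirrors the concern described above that a short but narrow link can fill up with detoured traffic.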
