**5.2 Traffic engineering with BGP policy change**

BGP by itself does not carry any information about link capacity or available bandwidth. Moreover, with recent VLAN (Varadaraja, 1997) technology, the AS-level distance between two ASs has no relation to their physical distance. Thus, the QoS of the detour routes must be examined by the operators, which makes traffic engineering reliant on human knowledge. To remove the congestion caused by the long detour AS path, we changed the BGP routing policy as shown in Fig. 13 and Fig. 14.
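The following minimal Python sketch illustrates why operators had to check the detour routes by hand. It models only two steps of the BGP decision process (highest local preference, then shortest AS path) and is not a full implementation. The capacities are the Tokyo-to-Sydney figures quoted in the lessons of this chapter (10 Gbps via Seattle, a 155 Mbps bottleneck via Honolulu); the AS paths, the local preference values, the per-hop capacity split, and all names such as `Route` and `bgp_best` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    as_path: tuple      # hypothetical AS numbers (documentation range)
    local_pref: int     # hypothetical local preference value
    link_mbps: tuple    # per-hop capacity; this is NOT carried by BGP


def bgp_best(routes):
    """Simplified BGP choice: highest local_pref, then shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


def bottleneck_mbps(route):
    """Capacity of the narrowest link, which the operator must look up."""
    return min(route.link_mbps)


routes = [
    # Assumed AS paths; only the capacities come from the text.
    Route("Tokyo-Seattle-Sydney", (64496, 64497, 64498), 100, (10_000, 10_000)),
    Route("Tokyo-Honolulu-Sydney", (64499, 64500), 100, (155, 10_000)),
]

best = bgp_best(routes)
print(best.name, bottleneck_mbps(best), "Mbps")
```

Under these assumed AS paths, the selection picks the shorter path through Honolulu even though its bottleneck is far narrower, because capacity never enters the comparison. Raising `local_pref` on the Seattle route is the kind of policy change an operator would apply to override that choice.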

The members of TEIN2 (VN, MY, SG, ID, PH) lost their connections to APAN Tokyo XP because of the fiber break. AARNET NOC proposed backup routes for accessing APAN Tokyo XP through AU and Hawaii. However, this solution caused congestion on both the CN-KR and Hawaii-JP links. In addition, CN traffic took an asymmetric path.

Therefore, to solve this traffic engineering issue, Tokyo XP decided to split the CN traffic by announcing CN IP prefixes through KR NOC and grouping CN prefixes at Tokyo XP. The results were monitored with Cisco NetFlow (CISCO, n.d.). The operators found that half of the traffic arriving from KR was from CN. A careful examination revealed that part of the CN traffic was from CERNET and the other part was from TEIN2.
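The per-source breakdown that the operators derived from flow data can be sketched as a simple aggregation. This is only an illustration: the record layout `(source, destination, bytes)`, the function name `rate_by_source`, and the 300-second interval are assumptions, not the NetFlow export format, and the byte counts are made up so that the result matches the approximate shares reported for Figure 19 (0.2 Gbps KR, 0.1 Gbps CERNET, 0.1 Gbps TEIN2).

```python
from collections import defaultdict


def rate_by_source(flows, interval_s):
    """Sum bytes per source network and convert to Gbps over the interval."""
    totals = defaultdict(int)
    for source, _destination, nbytes in flows:
        totals[source] += nbytes
    return {src: total * 8 / interval_s / 1e9 for src, total in totals.items()}


# Hypothetical flow records seen on the KR link during one 300 s interval.
flows = [
    ("KR", "JP", 3_750_000_000),
    ("KR", "JP", 3_750_000_000),
    ("CERNET", "JP", 3_750_000_000),
    ("TEIN2", "JP", 3_750_000_000),
]

rates = rate_by_source(flows, interval_s=300)
for source, gbps in sorted(rates.items()):
    print(f"{source}: {gbps:.1f} Gbps")
print(f"total: {sum(rates.values()):.1f} Gbps")
```

An interface counter would report only the 0.4 Gbps total; grouping flow records by source is what separates the KR share from the CN traffic transiting KR.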

Figure 19 shows the traffic load for each source AS. Although the total traffic was about 0.4 Gbps, the real KR traffic was only about 0.2 Gbps; about 0.1 Gbps was CERNET traffic and the rest was TEIN2 traffic.

Fig. 19. Traffic from KR monitored at APAN JP (Dec. 26 2006 - Jan. 26 2007)

plugged to the TEIN2-HK router. Finally, CUHK was directly connected to TEIN2. With this solution, CUHK and CSTNET were supplied with high-bandwidth and short-RTT connections to the R&E network.

**6. Lessons**

From the process of recovering from network failures across several ISPs in Asia Pacific R&E networks, we encountered several network management challenges, especially regarding fault management. In this section, we describe the lessons learned during the recovery operations.

**6.1 Fault-tolerant fiber topology design**

BGP routing policies are usually made to avoid asymmetric or useless AS paths by setting the appropriate local preference values. However, these alternative AS paths worked as backup paths. Before the network failures from the earthquake, Asia Pacific R&E network operators thought that removing the useless routes was urgent, because routing had become too complicated after TEIN2 started. However, this complicated routing was able to provide valuable connections during the network failures. This shows that maintaining full-mesh style routing information is very important for fault-tolerant routing.

Though BGP re-routing over the redundant AS paths was successful as the first step in restoration, it was not sufficient to provide full backup service without congestion, because it did not consider the traffic load. Since BGP routing does not carry QoS information, such as link capacity, link utilization, or available bandwidth, traffic re-routed to the backup AS path experienced poor QoS, such as long delays. Therefore, QoS-aware BGP routing or traffic-engineering-aware BGP routing is necessary.

**6.2 Integrated network management**

During the restoration process, we used various network monitoring tools such as MRTG (Oetiker, n.d.a), a network weather map, a BGP routing table visualizer, and a flow monitor. At first, the link outage was noticed on the network weather map, and the abrupt change of traffic load was noticed on MRTG. However, a fast fault detection method that encompasses the physical, link, routing, and application layers is necessary, because it can identify the exact failure points and visualize their impact on the network. In addition, a simulator or emulator that could show the network topology and the traffic load before and after failures would be very useful in predicting the effects of fault-management decisions. While we took the various restoration steps, we had to process the information collected by each network monitoring tool separately. Finally, the operators interpreted the situation and implemented recovery decisions manually.

If iperf (GOOGLE, n.d.) or bwctl (INTERNET2, n.d.) is available throughout the network, the end-to-end available bandwidth between ASs can be easily estimated. For example, to access Sydney from Tokyo, there are two possible routes. One is Tokyo-Seattle-Sydney, and the other is Tokyo-Honolulu-Sydney. The former provides 10 Gbps but has a long RTT. The latter includes a 155 Mbps bottleneck along the path but has a short RTT. In addition, to make the final decision, we had to check the flow data, because neither MRTG nor RRDTool (Oetiker, n.d.b) breaks traffic down by source/destination AS. When the traffic from KR increased suddenly, the operators could not understand the reason. This shows that integrated network monitoring or management systems would be very
