*Condition-Based Maintenance for Data Center Operations Management DOI: http://dx.doi.org/10.5772/intechopen.93945*

*Operations Management - Emerging Trend in the Digital Era*

downtime, eliminates unnecessary maintenance, and cuts related costs. Maintenance activities such as repairs or replacements are therefore carried out only when a decision analysis of the maintenance condition shows they are needed, before the failure occurs [3]. Various techniques and technologies can be implemented for the data collection, processing, diagnostics, and prognostics that CBM requires during system operation. Lee (1998) [4] divides the CBM strategic approach into three scenarios: data-driven, model-based, and knowledge-based.

First, the data-driven scenario applies historical and statistical data to build a numerical model of systematic determinants such as mean time between failures (MTBF), mean time to repair (MTTR), and maximum tolerable period of disruption (MTPD) [5]. However, this scenario depends on the accuracy of the sensing devices, the operational data, the data interpretation, and the perceived condition under stressful situations.

Second, the model-based scenario deploys analytical algorithms, such as simulation modeling, to demonstrate system reliability, system degradation, and system efficiency. This scenario usually needs high-level application software for the simulation models, such as MATLAB, or reliability block diagram (RBD) tools.

Last, the knowledge-based scenario depends on human experience, either by analyzing real past cases or by deriving data from past project information related to data collection, gathering, analysis, decision, and execution. It is a systematic application of engineering knowledge and maintenance attention to facility systems, intended to guarantee their proper function and reduce their rate of deterioration. In the future, the knowledge-based scenario may also be performed through machine learning or AI.

**Figure 2.** *Optimization of the P-F interval under the CBM method [6].*

CBM approaches provide load or trend profiles that give the earliest probable prediction of a device or system failure, with the advantages of reduced maintenance time, labor, and inventory costs, eliminated downtime, longer device or system life, and lower capital expenditure. The P-F curve in **Figure 2** depicts the performance condition of a device or system as it declines over time, first to a detectable potential failure (P) and then to functional failure (F). A CBM system performs on-line monitoring, control, and inspection, which provides the longest P-F intervals with less interruption than traditional time-based maintenance (TBM). This helps inspectors plan downtime inspections. Because inspection processes and routines differ in how often they are performed, the P-F interval is a useful basis for setting them. Applying CBM methods is an economically feasible way to avoid off-line inspections, which frequently cause data center downtime and damage reputation. The most commonly applied CBM monitoring techniques are:


#### **2.3 Data center reliability**

Data center reliability is reinforced by building a redundant topology into each system, such as utility supplies, backup power supplies (generators and UPSs), fiber optic communication links, network connectivity, environmental controls, and security devices. The Emerson report [7], presented in **Figure 3**, describes the critical devices related to system failure. The top three ranked causes of incidents are UPS battery failure, exceeded UPS capacity, and human error.
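The effect of a redundant topology can be estimated with a reliability block diagram in which N units must work while extra units stand by, that is, a k-out-of-n system. The following is a minimal sketch, assuming identical units that fail independently, perfect switchover, and a hypothetical per-module reliability of 0.95 over the study period; the function names are illustrative and do not come from any DCIM product. Common-cause failures, such as a shared battery problem, break the independence assumption, so real plants will do worse than these figures.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n identical, independent units are working.

    r is the reliability of a single unit over the period of interest.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("unit reliability r must lie in [0, 1]")
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    # Sum the binomial probabilities of exactly i working units, for i = k..n.
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def n_plus_s_reliability(needed: int, spares: int, r: float) -> float:
    """Reliability of an N+S topology: `needed` units carry the load, `spares` stand by."""
    return k_out_of_n_reliability(needed, needed + spares, r)


if __name__ == "__main__":
    r_module = 0.95  # hypothetical reliability of one UPS module
    needed = 2       # hypothetical: two modules are required to carry the IT load
    for spares, label in [(0, "N"), (1, "N+1"), (2, "N+2")]:
        print(f"{label}: {n_plus_s_reliability(needed, spares, r_module):.5f}")
```

Under these assumptions, adding a single spare module raises system reliability from about 0.90 to about 0.99, which is the intuition behind the N+1 and higher redundancy levels in Table 1.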

**Figure 3.** *Root causes and failure analysis inside data center operations.*

**Table 1.** *BICSI 002 system reliability classification.*

| System/class | Class F0 | Class F1 | Class F2 | Class F3 | Class F4 |
|---|---|---|---|---|---|
| Description | Single path without any one of the following: alternative power source; UPS; proper IT grounding | Single path | Redundant component/single path | Concurrently maintainable and operable | Fault tolerant |
| Utility | Single feed | Single feed | Single feed | 1 source with 2 inputs, or 1 source with single input electrically diverse from backup generator input | Dual feed from different utility substations |
| Topology | N or <N | N | N+1 | N+1 | 2N, 2(N+1) |
| Redundancy | No requirement | N | N | N+1 | Greater than N+1 |
| Generator fuel run time | No requirement | 8 hrs. | 24 hrs. | 72 hrs. | 96 hrs. |
| Impact of downtime | Sub-local | Local | Regional | Multi-regional | Enterprise-wide |
| Annual allowable planned maintenance (hours) | >400 | 100–400 | 50–99 | 0–49 | 0 |
| Availability as % | >99.00 | 99.00–99.90 | 99.90–99.99 | 99.99–99.999 | 99.999–99.9999 |

**Figure 6.** *Condition failure mode of power distribution systems in data centers.*

**Figure 4.** *The principle of the prognostics method.*

In the prognostics method, the condition monitoring process can be performed either continuously or periodically. Continuous monitoring may require sensing devices and data collection systems through data center infrastructure management (DCIM) [8, 9]. **Figure 4** illustrates how the prognostics method works: the deterioration trend of the device condition is plotted with operating time on the horizontal axis and condition level on the vertical axis, together with the monitored trend and the forecast point. The failure limit line marks the borderline between the operating and failure zones. If the forecasted trend line reaches or exceeds the failure limit, appropriate maintenance can be planned and scheduled ahead of the forecast point [10]. The ability to predict the future deterioration trend is the core of the prognostics method in the preventive maintenance strategy.
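The forecasting step can be sketched as a straight-line fit to periodic condition readings, extrapolated to the time at which it crosses the failure limit. This is a minimal sketch under stated assumptions: a linear deterioration model fitted by ordinary least squares, a condition metric for which higher readings mean worse condition, and a hypothetical safety margin (`lead_time`) for scheduling. None of these choices come from the chapter, and real degradation is often nonlinear.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class TrendForecast:
    slope: float          # change in condition level per unit of operating time
    intercept: float      # fitted condition level at time zero
    forecast_time: float  # time at which the fitted trend reaches the failure limit


def fit_linear_trend(times: Sequence[float], levels: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of level = intercept + slope * time."""
    n = len(times)
    if n < 2 or n != len(levels):
        raise ValueError("need at least two paired (time, level) samples")
    mean_t = sum(times) / n
    mean_y = sum(levels) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, levels))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def forecast_failure(times, levels, failure_limit: float) -> Optional[TrendForecast]:
    """Extrapolate the fitted trend to the failure limit (higher level = worse condition)."""
    slope, intercept = fit_linear_trend(times, levels)
    if slope <= 0:
        return None  # no deterioration toward the limit in the monitored window
    forecast_time = (failure_limit - intercept) / slope
    return TrendForecast(slope, intercept, forecast_time)


def latest_maintenance_start(forecast: TrendForecast, lead_time: float) -> float:
    """Latest start time that still leaves `lead_time` before the forecast point."""
    return forecast.forecast_time - lead_time


if __name__ == "__main__":
    # Hypothetical monthly readings of a UPS battery's internal resistance (milliohms).
    months = [0, 1, 2, 3, 4]
    resistance = [10.0, 10.5, 11.0, 11.5, 12.0]
    fc = forecast_failure(months, resistance, failure_limit=14.0)
    if fc is not None:
        print(f"forecast point: month {fc.forecast_time:.1f}")
        print(f"latest start with 2-month margin: month {latest_maintenance_start(fc, 2.0):.1f}")
```

With these made-up readings the fitted slope is 0.5 mΩ per month, so the trend crosses the hypothetical 14 mΩ limit at month 8, and a 2-month margin puts the latest maintenance start at month 6.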

PPM can be defined as a strategic approach to improve the availability and reliability performance of a particular data center device or system. CBM is one type of PPM that extrapolates and predicts device or system condition over time, utilizing probability equations to assess and predict the downtime risks.
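As one hedged example of such a probability equation, the sketch below assumes a constant failure rate (the exponential model). Under that model the probability of a failure within an operating period t is 1 - exp(-t/MTBF), and long-run downtime scales with MTTR / (MTBF + MTTR). The constant-rate assumption is ours, not the chapter's: it ignores wear-out, the very deterioration that CBM tracks, so it gives only a baseline risk against which condition data can be compared. The MTBF and MTTR values are hypothetical.

```python
import math


def failure_probability(period_hours: float, mtbf_hours: float) -> float:
    """P(at least one failure within period_hours) under a constant failure rate.

    With failure rate lambda = 1 / MTBF, the exponential model gives
    P = 1 - exp(-lambda * t).
    """
    if mtbf_hours <= 0 or period_hours < 0:
        raise ValueError("MTBF must be positive and the period non-negative")
    return 1.0 - math.exp(-period_hours / mtbf_hours)


def expected_downtime_hours(period_hours: float, mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run expected downtime in a period: period * MTTR / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0 or period_hours < 0:
        raise ValueError("invalid MTBF, MTTR, or period")
    return period_hours * mttr_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical component: MTBF of 100,000 h and MTTR of 8 h, over one year.
    year = 8760.0
    print(f"P(failure within a year) = {failure_probability(year, 100_000):.4f}")
    print(f"expected downtime = {expected_downtime_hours(year, 100_000, 8):.2f} h/yr")
```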

How can these causes of data center failure be prevented? First, redundant system design is the primary means of preventing failure, while selecting devices and systems with the highest MTBF is another strong option. The Uptime Tier Classification [11] and BICSI-002 [12] classify the solutions that protect against the causes of failure. **Figure 5** presents the Uptime levels of system protection, from Tier 1 (lowest) to Tier 4 (highest), while **Table 1** presents the BICSI 002 levels, from Class F0 (lowest) to Class F4 (highest). The annual allowable planned maintenance time is a crucial factor in preventing data center downtime.
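The availability bands in Table 1 can be converted into allowable downtime per year, and the emphasis on high-MTBF equipment follows from the steady-state availability relation A = MTBF / (MTBF + MTTR). The sketch below shows both calculations. The 8,760-hour year, the mapping from a computed availability to a class (a hypothetical lookup, not a BICSI procedure, with Class F0 left out), and the MTBF and MTTR figures are assumptions for illustration; a class rating also depends on topology and redundancy, not on availability alone.

```python
HOURS_PER_YEAR = 8760.0  # assumption: 365-day year

# Lower availability bounds (%) of BICSI 002 Classes F1-F4 as listed in Table 1,
# ordered from the strictest class downward.
CLASS_LOWER_BOUNDS = [("F4", 99.999), ("F3", 99.99), ("F2", 99.90), ("F1", 99.00)]


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Inherent availability A = MTBF / (MTBF + MTTR), as a fraction."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(availability_percent: float) -> float:
    """Unavailable hours per year implied by an availability percentage."""
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


def availability_class(availability_percent: float) -> str:
    """Highest class whose lower availability bound is met (hypothetical lookup)."""
    for name, lower in CLASS_LOWER_BOUNDS:
        if availability_percent >= lower:
            return name
    return "below F1"


if __name__ == "__main__":
    for name, lower in CLASS_LOWER_BOUNDS:
        print(f"{name}: >= {lower}% allows about {annual_downtime_hours(lower):.3f} h of downtime per year")
    a = steady_state_availability(mtbf_hours=50_000, mttr_hours=4) * 100
    print(f"MTBF 50,000 h, MTTR 4 h -> {a:.4f}% ({availability_class(a)})")
```

Under these assumptions, the F3 lower bound of 99.99% corresponds to roughly 0.88 h (about 53 minutes) of downtime per year, and the F4 lower bound of 99.999% to roughly 5 minutes.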

**Figure 5.** *Uptime data center tier classification.*

