**Novel Rule Base Development from IED-Resident Big Data for Protective Relay Analysis Expert System**

Mohammad Lutfi Othman, Ishak Aris and Thammaiah Ananthapadmanabha

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/63756

#### **Abstract**

Many Expert Systems for performance analysis of intelligent electronic devices (IEDs), such as protective relays, have been developed to ascertain operations, maximize availability, and subsequently minimize misoperation risks. However, manual handling of the overwhelming volume of relay-resident big data and heavy dependence on protection experts' contrasting knowledge and inundating relay manuals have hindered the maintenance of these Expert Systems. Thus, the objective of this chapter is to study the design of an Expert System called Protective Relay Analysis System (PRAY), which is embedded with a rule base construction module. This module provides the facility of intelligently maintaining the knowledge base of PRAY through the prior discovery of relay operation (association) rules using a novel integrated data mining approach of Rough-Set-Genetic-Algorithm-based rule discovery and Rule Quality Measure. The developed PRAY runs its relay analysis by first validating whether a protective relay under test operates correctly as expected, by comparing hypothesized and actual relay behavior. In the case of relay maloperations or misoperations, it diagnoses the presented symptoms by identifying their causes. This study illustrates how, with the prior hybrid-data-mining-based knowledge base maintenance of an Expert System, regular and rigorous analyses of protective relay performance carried out by power utility entities can be conveniently achieved.

**Keywords:** association rule, data mining, digital protective relay, expert system, power system protection analysis, rough set theory

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **1. Introduction**

According to the IEEE Working Group D10 of the Line Protection Subcommittee, Power System Relaying Committee, Expert Systems have been proposed since the early 1980s as potential tools for engineers to develop intelligent performance analysis systems for intelligent electronic devices (IEDs) such as protective relays [1]. Some of the works in which protection performance analyses can be identified are in the area of offline tasks such as settings coordination, postfault analysis, and fault diagnosis [2–13].

Kezunovic et al. [6] explain automated substation fault analysis using an Expert System method based on disturbance data retrieved from digital fault recorders (DFRs). This fault analysis helps protection engineers identify the correctness of protective relay operation. **Figure 1** illustrates the block diagram of the Expert System. The knowledge base, expressed as rules in CLIPS (an Expert System shell) and applied to processed data by the forward chaining inference engine, is built by interviewing experts, using an empirical approach based on Electromagnetic Transient Program (EMTP) simulation, and utilizing actual big field substation data.

**Figure 1.** The Expert System block diagram [6].

Luo and Kezunovic's [10] implementation of the Expert System in automated protection analysis is more specifically tailored to the detailed analysis of a specific protective relay, relying on the recorded big data found only within it. **Figure 2** illustrates the block diagram of the analysis system, which is built on the CLIPS language within a Visual C++ framework. The analysis system revolves around the strategy of comparing predicted (hypothesized) and actual (factual) protection operation in terms of the statuses and corresponding timings of logic operands. Any match between the predicted and actual protection operations validates the correctness of the actual status and timing of that operand. Otherwise, a misoperation is identified, and diagnosis is initiated to trace the reasons. The predicted statuses and timings of active logic operands are essentially a hypothesization of relay operations, which is done by way of forward chaining reasoning. They form the knowledge base in the rules used in the CLIPS inference engine.


**Figure 2.** The Expert System block diagram for validation and diagnosis of protective relay [10].


2 Big Data on Real-World Applications


**Figure 3.** Structure of Expert System for protection coordination [13].

Tuitemwong and Premrudeepreechacharn [13] implement Expert System (ES) analysis for improving the protection coordination settings of protective devices in a distribution system in the presence of distributed generators (DG). By selecting suitable protection coordination settings, this analysis system determines the correct protection system performance in a DG-present power distribution system. The proposed structure of the ES is shown in **Figure 3**. The inference engine uses coordination rules and selection rules to generate satisfactory coordination settings based on the processed equipment data, circuit data, protection data, and DG data in the knowledge base. In the case of conflicting settings, the user can make their own decision. The rules are set for the specific distribution system protection and may be changed when necessary.

The common problem with the aforementioned implementations of rule-based Expert Systems in protection system analysis is the difficulty of upgrading the knowledge base, which is made up of "if-then" rules used by the decision-making inference engine. Upgrading by expansion and refinement is necessary so as to adapt the Expert System to continuously changing power network topologies, protection strategies, and multiplicity in protective relay functions [14]. However, acquiring knowledge of relay operation characteristics for upgrading the knowledge base has not been an easy task due to

**i.** the burdensome manual handling of voluminous protective relay stored data and

**ii.** the heavy dependence on the protection experts' differing knowledge and inundating relay manuals.

It would be beneficial if a novel technique could be formulated to relieve the undue effort needed to acquire knowledge in building and maintaining the knowledge base. This technique should allow adjustment of the knowledge base by training a protective relay device on as many disturbances as exhaustively possible in order to produce a complete inventory of rules. To help realize this, the authors' previous work on an integrated data mining approach under the Knowledge Discovery in Database (KDD) framework shall be the prior step before the eventual Expert System knowledge base upgrading strategy is performed [15–17].

#### **2. Integrated data mining approach to hypothesize expected relay behavior from recorded relay event report**

Under the KDD framework, Othman et al. [15–17] investigate the implementation of a novel integrated data mining approach under supervised learning in order to discover the knowledge of (i.e., to "hypothesize") the expected relay behavior. This knowledge, extracted from the resident large event reports of a digital distance protective relay, comes in the form of association rules as shown in **Figure 4**. The integrated data mining encompasses the adoption of the following computational intelligence methods:

**i.** Rough set theory: Used to *select* the minimal subsets (i.e., reduction) of attributes while maintaining the original syntax of the relay's big data of event report.

**ii.** Genetic algorithm: Used to *explore* the optimal sets of the above subsets of reduced attributes from which simple yet accurate prediction rules (i.e., decision algorithm) can be constructed.

**iii.** Rule quality measure: Used to *extract* the pertinent association rule from the above population of prediction rules in order to determine the tripping logic of the relay upon fault detection. This is what is referred to as hypothesization of protective relay operation. This final version of knowledge representation shall be the main constituent of the Expert System knowledge base.

**Figure 4.** Data mining analysis steps in hypothesizing distance relay operation characteristics from big relay event data.

In the study, the large event report is a PSCAD-simulated raw operation recording of an AREVA-modeled distance protective relay, as shown in **Table 1** (only a portion of the time events is shown to reduce page usage). This big data, prior to data preparation, is a representation of the relay's decision system (*DS*) for a zone 1 A–G fault, the so-called predata-preparation *DS* [18].


**Table 1.** Predata-preparation of distance protective relay's decision system for zone 1 A-G fault (only a portion of attribute columns (from a total of 108) and time events are shown to reduce page usage).

The decision system is an information table of the event report that can be considered as a pair (*U, A*) of finite, nonempty sets. *U* is the universe of objects (i.e., time-tagged relay events *tn*, hence the name event report) and *A* is the set of attributes {e.g., *ir, irp, vam, iam, ibm, icm, CB52a\_B, CB52b\_B, VTmcb\_B, CRZ4, pg\_Z3PkUp, pg\_Z4PkUp, pp\_Z1PkUp, pp\_Z2PkUp, AGflt, c50\_Z1, b50\_Z3, Dist\_ab\_Z2, pg\_TrpZ1f, TrpBOPZ1, WI\_CRTrp, Trip\_PhA*, etc.}. Each attribute *a* ∈ *A* defines an information function *fa*: *U* → *Va*, where *Va* is the set of values of the attribute *a*, called the domain of *a*. For instance, the set of values of the attribute *pg\_Z1PkUp* (the "zone 1 ground distance pick-up" element) is expressed as *pg\_Z1PkUp*: *U* → {0, 1}, which defines the relay element's active states according to the presence of a ground fault in the protected section of the transmission line (i.e., no fault present or zone 1 ground fault present).
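The pair (*U, A*) and the information function *fa* can be illustrated with a minimal Python sketch. The three events and their attribute values below are hypothetical and are not taken from Table 1; only the attribute names come from the text.

```python
# Hypothetical miniature decision system DS = (U, A).
# The event values are illustrative only, not data from the relay event report.
UNIVERSE = ["t1", "t2", "t3"]                      # U: time-tagged relay events
ATTRIBUTES = ["pg_Z1PkUp", "AGflt", "Trip_PhA"]   # A: a small subset of attributes

TABLE = {
    "t1": {"pg_Z1PkUp": 0, "AGflt": 0, "Trip_PhA": 0},
    "t2": {"pg_Z1PkUp": 1, "AGflt": 1, "Trip_PhA": 0},
    "t3": {"pg_Z1PkUp": 1, "AGflt": 1, "Trip_PhA": 1},
}


def info_function(attribute):
    """Return f_a: U -> V_a for one attribute as a callable."""
    return lambda event: TABLE[event][attribute]


def domain(attribute):
    """Return V_a, the set of values the attribute takes over U."""
    return {TABLE[event][attribute] for event in UNIVERSE}


f_pickup = info_function("pg_Z1PkUp")
print(f_pickup("t2"), domain("pg_Z1PkUp"))
```

A dictionary keyed by event keeps the sketch close to the definition; a real event report with 108 attributes would more likely be held in a tabular structure.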



**Table 2.** The predata-mining *DS* of distance protective relay subjected to zone 1 A-G fault.

Here, *A* = *C* ∪ *D* is the nonempty finite union of the sets of condition and decision attributes: the condition attributes *ci* ∈ *C* represent the multifunctional protective elements and analog measurands, while the decision attribute *di* ∈ *D* represents the relay's trip output.

The sheer size of this big data hinders the laborious manual extraction of relay operation characteristics for Expert System development. Thus, the aforementioned novel integrated data mining strategy is necessary to address this issue.

The resulting prepared decision table (after data selection, preprocessing, and transformation) of the distance protective relay's decision system is shown in **Table 2**. It is also called the postdata-preparation *DS* or predata-mining *DS*. In the table, "." denotes data patterns that are similar to the events immediately before and after them; they are omitted in order to reduce the table dimension. Notably, the data preparation strategy has substantially reduced the number of attributes from the original 108 in the large raw event report to merely 46.

The important analysis steps in the framework of Rough-Set-based data mining for deriving the distance relay decision algorithm from its event database are illustrated in **Figure 4** and discussed herewith.

The *computation of reducts*, which is the process of reducing the number of attributes while still maintaining the original data syntax, is performed first. Within this step, the following substeps are executed:

**a.** Computation of the *D*-discernibility matrix of *C* (denoted as *MC*(*D*)). An element of *MC*(*D*) is defined as the set of all condition attributes that discern events *ti* and *tj*, where *ti* and *tj* do not belong to the same equivalence class of the relation *U*|*IND*(*D*).

**b.** Subsequent derivation of the discernibility function *fC*(*D*) in Conjunctive Normal Form (CNF) (also called product-of-sums (POS) form in Boolean algebra) from *MC*(*D*). The CNF is reduced to its final form by applying the absorption law and omitting duplicate disjunctive terms (sums), without multiplying out the disjunctive terms of the final CNF.

**c.** In an empirical database such as this relay event data, the calculation toward the final Disjunctive Normal Form (DNF), from which the eventual reducts are found, is extremely computationally intensive (the DNF is obtained if the multiplication among the disjunctive terms of the final CNF is performed). The generation of reducts is in this case considered an NP-hard problem [19]. Thus, a Genetic Algorithm is adopted to compute approximations of reducts by finding approximate minimal hitting sets (analogous to reducts) of the sets corresponding to the discernibility function [20, 21].

Next, *prediction rules* (denoted as $C \stackrel{pred}{\Rightarrow} D$) are generated, with the discovered reducts serving as the templates from which the prediction rules are created. This is principally done by superimposing each reduct in the reduct set over the original decision table *DS* and then reading off the domain values of the condition and decision attributes. The resulting logical patterns $C \stackrel{pred}{\Rightarrow} D$, which relate descriptions of condition classes to decision classes, have the representation shown in Eq. (**1**):

$$C \stackrel{pred}{\Rightarrow} D: \text{IF } c_i = v_{c_i} \text{ AND} \dots \text{AND } c_k = v_{c_k} \text{ THEN } Trip = v_{Trip} \tag{1}$$

These prediction rules, which are an exact representation of the characteristics of the relay decision system (table) *DS*, can be described as the relay decision algorithm and designated as *ALG*(*DS*), i.e.,

$$ALG\left(DS\right) = \bigcup_{t \in U} \left(C \stackrel{pred}{\Rightarrow} D\right)_t \tag{2}$$

where $\left(C \stackrel{pred}{\Rightarrow} D\right)_t$ is the set of minimal prediction rules $C \stackrel{pred}{\Rightarrow} D$ for an event *t* ∈ *U*, i.e.,

$$\left(C \stackrel{pred}{\Rightarrow} D\right)_t: \text{IF } c_i = v_{c_i}(t) \text{ AND} \dots \text{AND } c_k = v_{c_k}(t) \text{ THEN } Trip = v_{Trip}(t) \tag{3}$$

This *ALG*(*DS*) can then be evaluated for its accuracy.


The discovered *ALG*(*DS*) has been evaluated and verified by Othman et al. [15–17] as capable of predicting and discriminating future relay events whose trip state is unknown, in unsupervised learning. This evaluation is necessary before the eventual deduction of the relay association rule is allowed to take place.
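The reduct computation and rule generation steps can be sketched on a toy decision table. This is only an illustration under stated assumptions: the four events and their values are hypothetical, and an exhaustive search for minimal hitting sets stands in for the Genetic Algorithm used in the study, which is feasible here only because the table is tiny.

```python
from itertools import combinations

# Hypothetical toy decision table; values are illustrative only.
CONDITIONS = ["pg_Z1PkUp", "Q32", "Zload"]
DECISION = "Trip"
DS = {
    "t1": {"pg_Z1PkUp": 0, "Q32": 0, "Zload": 0, "Trip": 0},
    "t2": {"pg_Z1PkUp": 0, "Q32": 1, "Zload": 0, "Trip": 0},
    "t3": {"pg_Z1PkUp": 1, "Q32": 1, "Zload": 0, "Trip": 1},
    "t4": {"pg_Z1PkUp": 1, "Q32": 1, "Zload": 1, "Trip": 0},
}


def discernibility_entries(table):
    """Collect the non-empty entries of the D-discernibility matrix of C."""
    entries = []
    for ti, tj in combinations(table, 2):
        # Only pairs of events in different decision classes are discerned.
        if table[ti][DECISION] != table[tj][DECISION]:
            entry = frozenset(
                c for c in CONDITIONS if table[ti][c] != table[tj][c]
            )
            if entry:
                entries.append(entry)
    return entries


def reducts(entries, attributes):
    """Exhaustively find minimal hitting sets of the entries (not a GA)."""
    found = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            candidate = set(subset)
            hits_all = all(candidate & entry for entry in entries)
            is_minimal = not any(r <= candidate for r in found)
            if hits_all and is_minimal:
                found.append(candidate)
    return found


def prediction_rules(table, reduct):
    """Superimpose a reduct on the table and read off IF-THEN patterns."""
    ordered = [c for c in CONDITIONS if c in reduct]
    rules = []
    for row in table.values():
        rule = (tuple((c, row[c]) for c in ordered), row[DECISION])
        if rule not in rules:  # drop duplicate patterns
            rules.append(rule)
    return rules


found = reducts(discernibility_entries(DS), CONDITIONS)
rules = prediction_rules(DS, found[0])
for antecedent, trip in rules:
    print("IF", " AND ".join(f"{c} = {v}" for c, v in antecedent), "THEN Trip =", trip)
```

The exhaustive search grows exponentially with the number of attributes, which is the reason the study resorts to approximate hitting sets for the 46-attribute table.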

Finally, postpruning (or filtering) is performed on the generated prediction rules ($C \stackrel{pred}{\Rightarrow} D$) so as to discover relay *association rules* (denoted as $C \stackrel{assoc}{\Rightarrow} D$). These pertinent association rules essentially characterize the tripping decision logic of the protective relay upon fault detection. This was referred to at the outset as the hypothesization of protective relay operation. This final version of knowledge representation shall be the main constituent of the Expert System knowledge base.

Because there are too many prediction rules to filter, it is difficult to determine manually which rules are more useful, interesting, or important. Therefore, a measure of rule quality called the *G2 Likelihood Ratio Statistic*, as well as a measure of rule interestingness, is used to select the most appropriate relay association rules and filter away the unwanted ones.
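As an illustration of a likelihood-ratio rule quality measure, the sketch below computes the general G² statistic on a 2×2 contingency table that crosses "rule antecedent matches or not" with "event belongs to the rule's decision class or not". The exact variant used in the study is not given in this section, so this formulation, the function name, and the counts are assumptions.

```python
import math


def g2_statistic(n_rc, n_r_notc, n_notr_c, n_notr_notc):
    """General G^2 = 2 * sum(O * ln(O / E)) over a 2x2 contingency table.

    n_rc        : events matched by the rule and in its decision class
    n_r_notc    : events matched by the rule but in another class
    n_notr_c    : events not matched by the rule but in its class
    n_notr_notc : events neither matched nor in its class
    """
    observed = [[n_rc, n_r_notc], [n_notr_c, n_notr_notc]]
    total = sum(sum(row) for row in observed)
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += 2.0 * o * math.log(o / expected)
    return g2


# Hypothetical counts: a rule that separates the classes perfectly scores
# higher than one whose matches are independent of the decision class.
strong = g2_statistic(10, 0, 0, 10)
weak = g2_statistic(10, 10, 10, 10)
print(strong > weak)
```

Ranking the prediction rules by such a score and keeping the top-ranked ones is one way to realize the filtering step; the threshold or number of rules kept would be a further design choice.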

As mentioned above, these finally discovered relay association rules essentially describe the logical pattern correlating the descriptions of conditions (i.e., *C*, the attribute set for various multifunctional protection elements) with the decision class (i.e., *D*, the attribute for trip assertion status). Thus, the symbol *CD* is used to denote the *C-D* association, and such a rule is labeled a "*CD-association rule*".

The final *CD*-association rule for one such fault condition, a zone 1 A–G fault, is shown in Eq. (**4**). Different fault conditions would yield correspondingly different association rules to describe the relay's behavior.

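$$\begin{aligned}
\textbf{IF}\ & \mathit{Zag}(123)\ \text{AND}\ \mathit{CB52\_A}(\text{closed})\ \text{AND}\ \mathit{pg\_PkUp}(123)\ \text{AND}\ \mathit{FltType}(\text{AGflt}) \\
& \text{AND}\ \mathit{pp50\_Z3}(\text{A})\ \text{AND}\ \mathit{pp50\_Z4}(\text{A})\ \text{AND}\ \mathit{p50\_Z1}(\text{A})\ \text{AND}\ \mathit{p50\_Z3}(\text{A}) \\
& \text{AND}\ \mathit{r50}(1234)\ \text{AND}\ \mathit{Q32}(\text{Fwd})\ \text{AND}\ \mathit{Zload}(0)\ \text{AND}\ \mathit{Q50}(1234) \\
& \text{AND}\ \mathit{Dist\_ag}(123)\ \text{AND}\ \mathit{pg\_Trp}(1) \\
\textbf{THEN}\ & \mathit{Trip}(\text{A})
\end{aligned} \tag{4}$$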

It is important to note that Eq. (**4**) defines the necessary triggering of the required relay multifunctional protective elements (antecedent) in order to recognize the zone 1 phase-A-to-ground fault and consequently assert the trip signal (consequent) to open pole A of the circuit breaker concerned. This is what protection engineers would like to know in understanding how the distance relay responds to the fault.

Thus, it is necessary to verify how accurately this rule interprets the distance relay behavior under a zone 1 A–G fault as represented by the predata-mining *DS* in **Table 2**. Out of all the relay events in the entire relay event report, events *t90* and *t91*, identified as the *fault detection* and *trip signal assertion* instances, respectively, are the focus of cross-referencing to verify the exactness of the above-mentioned rationalized *CD*-association rule. In **Table 2**, the rule is seen to be an exact interpretation of relay events *t90* and *t91*. Thus, the discovered rationalized *CD*-association rule is verified.
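The cross-reference between a *CD*-association rule and the events *t90* and *t91* can be sketched as a simple matching check. The event values below are hypothetical (Table 2 is not reproduced here), only a few antecedents are used, and the matching criterion, namely that every antecedent appears somewhere in the event window and the consequent appears at its last event, is an assumption made for illustration.

```python
# A few antecedents taken from the rule's attribute names;
# the full rule has more terms.
RULE_ANTECEDENT = {"FltType": "AGflt", "Q32": "Fwd", "Zload": 0, "pg_Trp": 1}
RULE_CONSEQUENT = ("Trip", "A")

# Hypothetical rows standing in for events t90 (fault detection)
# and t91 (trip signal assertion).
EVENTS = {
    "t90": {"FltType": "AGflt", "Q32": "Fwd", "Zload": 0, "pg_Trp": 0, "Trip": "none"},
    "t91": {"FltType": "AGflt", "Q32": "Fwd", "Zload": 0, "pg_Trp": 1, "Trip": "A"},
}


def rule_matches(events, window):
    """Check the rule against an ordered window of event labels."""
    rows = [events[label] for label in window]
    # Every antecedent must be observed in at least one event of the window.
    antecedent_ok = all(
        any(row.get(attr) == value for row in rows)
        for attr, value in RULE_ANTECEDENT.items()
    )
    # The consequent must hold at the last event of the window.
    attr, value = RULE_CONSEQUENT
    consequent_ok = rows[-1].get(attr) == value
    return antecedent_ok and consequent_ok


print(rule_matches(EVENTS, ["t90", "t91"]))
```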

The eventually discovered ($C \stackrel{assoc}{\Rightarrow} D$), and thus the desired hypothesis, has been proven to be an exact manifestation of the relay operation characteristics hidden in the event report [15–17]. The intelligent data mining framework provides the facility to conveniently and exhaustively discover the available knowledge of relay behavior from big event data covering as many fault contingencies as possible. Ultimately, a complete rule base for the inference execution of an Expert System for relay operation analysis can be developed. This is the motivation for developing an Expert System called Protective Relay Analysis System (PRAY), which provides a platform for gathering previously discovered rules for its knowledge base construction.

#### **3. Developing protective relay analysis system (PRAY) expert system**

The concept of protective relay performance analysis follows the convention that, in any analysis, known or correct events must first be hypothesized (expected operations are assumed); an analysis is then performed to confirm (validate) or refute the hypothesis by matching the expected operations against the actual operations of the device under test [22]. If it is determined that the protective relay operation was incorrect, a diagnosis of the cause must be performed [8]. This fundamental concept forms the very basis of developing PRAY for distance protection.


PRAY is developed as an application tool under the LabVIEW framework from National Instruments [23]. The main components of PRAY are shown in **Figure 5** and described as follows:

**Figure 5.** Architecture of Protective Relay Analysis System (PRAY).

**i.** Construction of a rule base for PRAY's inference engine by collating as an array all relay *CD*-association rules discovered from the KDD processes performed on the trained relay. All attributes of each rule in the rule base shall be time tagged and arranged in chronological order so that validation and diagnosis of the analyzed relay's operations can be presented in an apparent logical sequence of operations.

**ii.** Construction of phase and ground distance impedance channels (attributes) and a fault-type channel. Using these channels, further identification processes of fault type, faulted zone, and distance to fault are executed and later used in singling out the most suitable relay *CD*-association rule from the rule base.

**iii.** Inferring, from the rule base according to both impending fault type and zone of pick-up, an expected relay *CD*-association rule to be best chosen as a hypothesis for the prediction of operations logic of the relay under analysis.

**iv.** Validation of occurrence of protective element pick-ups and their correctness of operations against the hypothesis of the selected relay *CD*-association rule.

**v.** Symptom of relay element misoperation and its diagnosis as well as possible solution suggestion.

**vi.** Graphical plots of ground and phase impedance locus against respective ground and phase distance quadrilateral characteristics. The distance characteristics are constructed based on parameter settings taken from the relay under analysis. Instantaneous filtered voltages and currents and logic operands are also plotted.

#### **3.1. PRAY inputs**

The different inputs needed by PRAY for its analysis functions are as follows:

**i.** Relay *CD*-association rules: These rules, saved in plain text format in the KDD process, are collated via graphical user interface (GUI) dialog input. The user is prompted for a sufficient number of rules to be imported. The collated rules are converted into an array to form a rule base for the Expert System inference engine. Each rule input is an outcome of KDD after the Rough-Set-and-Genetic-Algorithm-based data mining and Rule Quality Measure (*G*² Likelihood Ratio Statistic) in ROSETTA [24]. In its untreated form, each rule input consists of a number of sub-*CD*-association rules. These subrules are rationalized into a single *C* ⇒ *D* form by taking their conjunction and using the concept of Boolean function manipulation by applying the law of absorption.

**ii.** Analyzed relay event reports in the form of raw and prepared decision systems (relay *DS*s): The raw relay *DS* is data converted from the relay resident IEEE COMTRADE format to the DIAdem native format (.tdm), which is needed for processing in LabVIEW [25]. The prepared relay *DS* is the file resulting from the same data preparation process as that in the KDD for the trained relay. This prepared relay *DS* in DIAdem format (.tdm) has the same data structure as that used in the KDD; the latter is ready for the Rough Set data mining, although this is not executed for the expert system analysis. Having the same data structure is important so that the prepared *DS* of the relay under analysis can be correctly cross validated with a *CD*-association rule chosen from the PRAY rule base.

#### **3.2. PRAY reasoning strategy for validation and diagnosis**

The reasoning for validation and diagnosis in relay operations analysis starts with identification of fault type, faulted zone, and distance to fault by PRAY itself. The fault type and picked-up faulted zone are then used to determine the index in the rule base array that identifies the subarray containing the appropriate relay *CD*-association rule for analyzing the relay under analysis. This chosen rule shall act as the hypothesis of anticipated operations of individual protective elements in the relay under analysis when a particular fault has occurred. All the antecedents and the consequent in the rule have been initially arranged in sequential order during the rule base construction according to the time instances that have been tagged alongside them. Time tagging is important so that validation and diagnosis of relay operations can be executed according to the logical sequence stipulated by the hypothesis. This logical sequence is in fact indicative of relay operations logic. The following is a fictitious example of a relay operation hypothesis based on a chosen relay *CD*-association rule:

0.000 *CB52\_B*(closed) *Q32*(Fwd)

0.096 *p50\_Z1*(B)

0.097 *FltType*(BGflt)

0.100 *Q50*(1234) *r50*(1234)

0.104 *Zload*(0)

0.107 *Dist\_bg*(123) *Zbg*(123) *pg\_PkUp*(123) *pg\_Trp*(1)

0.108 *Trip*(B)
The consequent *Trip*(B) is associated with the antecedents that occur before it. Protective elements (antecedents) on the same row share a time tag, which indicates that they pick up (or stay in certain states) concurrently. As expected, the last row, which has the highest time tag, must be the consequent (decision attribute) *Trip*(B).
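The row structure of the fictitious hypothesis can be illustrated with a short Python sketch. It is not PRAY's LabVIEW implementation; the function names `parse_hypothesis` and `split_rule` and the plain-text row layout are assumptions made for illustration only.

```python
import re

# Fictitious hypothesis rows from the text: "<time tag> <attr>(<value>) ..."
HYPOTHESIS_TEXT = """\
0.000 CB52_B(closed) Q32(Fwd)
0.096 p50_Z1(B)
0.097 FltType(BGflt)
0.100 Q50(1234) r50(1234)
0.104 Zload(0)
0.107 Dist_bg(123) Zbg(123) pg_PkUp(123) pg_Trp(1)
0.108 Trip(B)
"""

# One protective element: attribute name followed by its value in parentheses.
ELEMENT = re.compile(r"(\w+)\(([^)]*)\)")


def parse_hypothesis(text):
    """Return a list of (time_tag, [(attribute, value), ...]) rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        time_tag, _, rest = line.strip().partition(" ")
        rows.append((float(time_tag), ELEMENT.findall(rest)))
    return rows


def split_rule(rows, decision="Trip"):
    """Sort rows by time and separate antecedent rows from the consequent."""
    ordered = sorted(rows, key=lambda row: row[0])
    *antecedents, last = ordered
    if [name for name, _ in last[1]] != [decision]:
        raise ValueError("the latest row must hold only the consequent")
    return antecedents, last


antecedents, consequent = split_rule(parse_hypothesis(HYPOTHESIS_TEXT))
print(consequent)      # (0.108, [('Trip', 'B')])
print(antecedents[3])  # (0.1, [('Q50', '1234'), ('r50', '1234')])
```

Elements that share a row end up in the same list, which mirrors the concurrency implied by a common time tag.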

The validation of the analyzed relay's operations starts by iterating through all antecedents in the hypothesis and comparing each one with the corresponding attribute of the prepared *DS* of the relay under analysis. Matched values result in messages describing the correctness of operations of the respective protective elements. On the other hand, any differences in the cross matches (due either to wrong pick-up values or to nonassertion of the respective protective elements) produce messages describing the relay's failed elements. The result of the validation is presented starting from the consequent (decision attribute, "*Trip*") at the top, followed by the antecedents arranged in descending sequence according to the order of the time tags in the hypothesis.
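A minimal Python sketch of this cross matching is given below, under stated assumptions: the prepared *DS* is reduced to a hypothetical dictionary of observed attribute values (an absent key stands for nonassertion), and the message wording is invented for illustration. It is not the PRAY implementation.

```python
# Hypothesis as (time_tag, attribute, expected_value); values are illustrative.
HYPOTHESIS = [
    (0.096, "p50_Z1", "B"),
    (0.100, "Q50", "1234"),
    (0.100, "r50", "1234"),
    (0.108, "Trip", "B"),
]


def validate(hypothesis, prepared_ds):
    """Cross-match every hypothesized element with the prepared DS.

    prepared_ds maps attribute name -> observed value and omits an attribute
    whose element never asserted (hypothetical stand-in for the .tdm data).
    Returns (attribute, ok, message) tuples ordered by descending time tag,
    so the consequent comes first.
    """
    report = []
    ordered = sorted(hypothesis, key=lambda entry: entry[0], reverse=True)
    for _, attr, expected in ordered:
        observed = prepared_ds.get(attr)
        if observed == expected:
            report.append((attr, True, f"{attr}({expected}) operated correctly"))
        elif observed is None:
            report.append((attr, False, f"{attr} failed to assert; expected {expected}"))
        else:
            report.append((attr, False, f"{attr} picked up {observed}; expected {expected}"))
    return report


# Illustrative relay data: Trip never asserted and p50_Z1 has a wrong value.
relay_ds = {"Q50": "1234", "r50": "1234", "p50_Z1": "0"}
for attr, ok, message in validate(HYPOTHESIS, relay_ds):
    print("OK  " if ok else "FAIL", message)
```

Both failure modes named in the text, a wrong pick-up value and nonassertion, are reported with separate messages.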

Diagnosis is carried out on failed, inoperative, or misoperative protective elements. To view the cause–effect of events, a hierarchical tree is constructed from the hypothesis, in which all nodes are time sequenced, increasing in time from the downstream nodes toward the root node. The root node (topmost) is the consequent of all the downstream antecedent nodes. Antecedents at the same level (i.e., having the same indentation) are concurrent in time. For the above-mentioned hypothesis, the diagnosis follows this hierarchy:

*Trip*(B)

- *Dist\_bg*(123)
- *Zbg*(123)
- *pg\_PkUp*(123)
- *pg\_Trp*(1)
  - *Zload*(0)
    - *Q50*(1234)
    - *r50*(1234)
      - *FltType*(BGflt)
        - *p50\_Z1*(B)
          - *CB52\_B*(closed)
          - *Q32*(Fwd)
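How such a hierarchy follows from the time tags can be sketched in Python as follows. The function name `build_hierarchy` and the text rendering are assumptions for illustration; PRAY itself presents the tree in its LabVIEW GUI.

```python
def build_hierarchy(rows):
    """Render hypothesis rows as an indented tree, root (consequent) first.

    rows: list of (time_tag, [element, ...]). Elements sharing a time tag
    are concurrent and are printed at the same indentation; each earlier
    time tag is indented one level deeper.
    """
    ordered = sorted(rows, key=lambda row: row[0], reverse=True)
    lines = []
    for depth, (_, elements) in enumerate(ordered):
        for element in elements:
            prefix = "" if depth == 0 else "  " * (depth - 1) + "- "
            lines.append(prefix + element)
    return lines


# Rows of the fictitious hypothesis used in the text.
ROWS = [
    (0.000, ["CB52_B(closed)", "Q32(Fwd)"]),
    (0.096, ["p50_Z1(B)"]),
    (0.097, ["FltType(BGflt)"]),
    (0.100, ["Q50(1234)", "r50(1234)"]),
    (0.104, ["Zload(0)"]),
    (0.107, ["Dist_bg(123)", "Zbg(123)", "pg_PkUp(123)", "pg_Trp(1)"]),
    (0.108, ["Trip(B)"]),
]

print("\n".join(build_hierarchy(ROWS)))
```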


#### **4. PRAY analysis system results**

In the rule base construction of PRAY, each imported *CD*-association rule is first formatted by ROSETTA into a text file, before being rationalized using the concept of Boolean function manipulation by applying the law of absorption. When imported into PRAY, the file is cleared of all unnecessary data, such as comments and numerical rule interestingness measures, leaving only the required relay *CD*-association rules for subsequent rationalization.
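The law of absorption, X ∧ (X ∨ Y) = X, can be illustrated with a small Python sketch. The exact form of ROSETTA's subrules is not reproduced here, so the representation below (a conjunction of OR-clauses over descriptor strings) and the function name `absorb` are assumptions made purely to show the simplification step.

```python
def absorb(clauses):
    """Simplify a conjunction of OR-clauses with X AND (X OR Y) = X.

    Each clause is an iterable of descriptor strings read as a disjunction.
    A clause is dropped when another clause is a proper subset of it;
    exact duplicates collapse as well (idempotence).
    """
    unique = {frozenset(clause) for clause in clauses}
    kept = [c for c in unique if not any(other < c for other in unique)]
    return sorted(kept, key=lambda c: (len(c), sorted(c)))


# Illustrative clauses built from descriptors that appear in the text.
sub_rule_clauses = [
    {"Q50(1234)"},
    {"Q50(1234)", "r50(1234)"},  # absorbed by the first clause
    {"Zbg(123)", "Zload(0)"},
    {"Zbg(123)", "Zload(0)"},    # duplicate clause
]

for clause in absorb(sub_rule_clauses):
    print(" OR ".join(sorted(clause)))
```

The four input clauses reduce to two, which is the kind of shortening that lets several subrules be expressed as a single *C* ⇒ *D* rule.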

**Figure 6** illustrates the GUI for the constructed rule base. The size of the rule base and the selected subarray (0-indexed) of the collated rule base array are shown. The size of the rule base reflects the number of fault contingencies the trained relay has been subjected to during training.


**Figure 6.** GUI for constructed rule base.
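Selecting a subarray of the rule base from the identified fault type and zone of pick-up might look like the following Python sketch. The index map, the fault-type labels other than `BGflt`, and the placeholder rule contents are all hypothetical; the text does not state how PRAY computes the index.

```python
# Hypothetical map: (fault type, zone of pick-up) -> 0-indexed rule base position.
RULE_INDEX = {("AGflt", 1): 0, ("BGflt", 1): 1, ("AGflt", 2): 2}

# Placeholder rule base; each subarray would hold one time-tagged rule.
RULE_BASE = [
    ["rule rows for AG fault, zone 1"],
    ["rule rows for BG fault, zone 1"],
    ["rule rows for AG fault, zone 2"],
]


def select_hypothesis(rule_base, rule_index, fault_type, zone):
    """Return (index, subarray) of the rule matching the fault type and zone."""
    key = (fault_type, zone)
    if key not in rule_index:
        raise LookupError(f"no rule trained for {fault_type} in zone {zone}")
    index = rule_index[key]
    return index, rule_base[index]


index, hypothesis = select_hypothesis(RULE_BASE, RULE_INDEX, "BGflt", 1)
print(index, hypothesis)
```

A missing key raises an error rather than returning a default, since a fault contingency that was never trained has no hypothesis to validate against.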


**Figure 7** illustrates the GUI for analysis of a distance protective relay operation that has been subjected to a zone-1-AG fault. Using data in the relay's raw tdm file, PRAY discovered that an AG fault has indeed occurred in zone 1 of the relay under analysis at approximately 39 km from its location in the substation. From this information, an appropriate relay *CD*-association rule has been chosen and displayed in the GUI. This rule shall be used to analyze whether any appropriate measures have been taken by the relay under analysis to clear the fault. In validating the individual operations of protective elements, the Validation field displays the correctness of actions taken by the relay after cross matching anticipated operations of individual protective elements hypothesized by the rule with the corresponding attributes obtained from the preprocessed tdm relay file under analysis. The consequent "Trip" is validated to have correctly sent a pole A trip signal to the circuit breaker. This is followed by correct antecedent statuses arranged in descending sequence according to the hypothesis. The relay tripping time of 1.2 ms is compliant with the TNB requirement of 25 ms for zone 1 operation. The circuit breaker operating time and fault clearance time are also displayed in the GUI.

**Figure 7.** GUI for analysis of distance protective relay operations.

**Figure 8.** GUI for ground distance quadrilateral characteristics plots.

**Figure 8** shows the graphical plots of the ground impedance locus against the respective ground distance quadrilateral characteristics. Since the fault is an AG fault occurring in zone 1, only the trajectory of *Zag* traverses into zone 1 of the ground quadrilateral characteristics, while all phase impedances stay outside the phase quadrilateral characteristics, as expected.

**Figure 9.** Validation of misoperative relay.


**Figure 9** illustrates a screenshot of PRAY's validation for a distance relay that had failed to operate (maloperated) when the transmission line it was protecting was subjected to a zone-1-AG fault. PRAY discovered that an AG fault had occurred in one of the relays under analysis at approximately 40 km forward of its location in the substation. (This is the same fault as in the above analysis, in which the same relay operated successfully.) From this information, an appropriate relay *CD*-association rule had been chosen as the hypothesis (similar to the above) and used to validate that appropriate measures had not been taken to clear the fault. The consequent "Trip" was validated to have not sent a pole-A trip signal to the circuit breaker. The descending sequence of antecedents indicated that although the negative sequence overcurrent (*Q50*) and residual overcurrent supervision (*r50*) elements had operated correctly, signifying the impending A–G imbalanced fault, the zone-1 overcurrent supervision element (*p50\_Z1*) had failed to do likewise. This is believed to have contributed to the relay's failure to trip. Looking at the operation logic of the different protective elements at different levels of sequence in the Diagnosis field's hierarchical tree, the failure of the overcurrent element *p50\_Z1* is diagnosed to be the possible cause of the relay maloperation. Looking up the symptom related to the malfunctioning *p50\_Z1* element, as shown in **Figure 10**, reveals that an incorrect threshold setting could have caused its failure.

**Figure 10.** Diagnosis of misoperative relay.
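One way to picture this diagnosis step is the Python sketch below. It assumes, as a simplification not stated in the chapter, that the earliest failed element in the time sequence is reported as the candidate cause. The time tags and the symptom table are hypothetical; only the *p50\_Z1* entry reflects the case described above.

```python
# Hypothetical symptom table: failed element -> suggested cause.
SYMPTOMS = {"p50_Z1": "incorrect threshold setting"}


def diagnose(report):
    """Pick the earliest failed element as the candidate cause.

    report: list of (time_tag, element, ok) validation results.
    Returns (element, suggestion), or None when every element operated.
    """
    failed = [entry for entry in report if not entry[2]]
    if not failed:
        return None
    _, element, _ = min(failed, key=lambda entry: entry[0])
    return element, SYMPTOMS.get(element, "no symptom recorded")


# Illustrative validation results for the misoperating relay.
report = [
    (0.096, "p50_Z1", False),
    (0.100, "Q50", True),
    (0.100, "r50", True),
    (0.108, "Trip", False),
]
print(diagnose(report))  # ('p50_Z1', 'incorrect threshold setting')
```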

#### **5. Summary**

The developed Protective Relay Analysis (PRAY) Expert System has demonstrated how the problems related to the maintenance of the rule base of an Expert System can be addressed. By collating all the necessary relay *CD*-association rules discovered previously from the earlier KDD processes involving integrated-Rough-Set-and-Genetic-Algorithm data mining, Rule Quality Measure, and rule interestingness and importance judgments (as discussed in the authors' cited works), a maintainable knowledge base for the inference strategy can be conveniently prepared. Although this study revolves around analyzing a modeled distance relay's big event data by hypothesis discovery, validation, and diagnosis, it is envisaged that a more rigorous analysis of actual protective relays of different types can be undertaken using this approach.

#### **Acknowledgements**

This work was supported by the Universiti Putra Malaysia under the Geran Putra IPB scheme with the project no. GP-IPB/2013/9412101.

#### **Nomenclature**

*Trip* relay pole trip signals

*U*|*IND*(*D*) indiscernibility-relation/equivalence-class/elementary-sets about universe of relay events *U* with respect to *D*

*Zbg* zone of ground distance pick-up

*Zload* impedance encroaching load characteristic

### **Author details**

Mohammad Lutfi Othman1\*, Ishak Aris1 and Thammaiah Ananthapadmanabha2

\*Address all correspondence to: lutfi@upm.edu.my

1 Center for Advanced Power and Energy Research, Department of Electrical and Electronics Engineering, Faculty of Engineering, Universiti Putra Malaysia, Serdang, Selangor, Malaysia

2 Department of Electrical and Electronics Engineering, The National Institute of Engineering, Mysore, Karnataka, India

#### **References**

[1] M. Ennas, L. Budler, T. W. Cease, A. Elneweihi, E. Guro, and M. Kezunovic. Potential applications of expert systems to power system protection. IEEE Transactions on Power Delivery. 1994;9(2):720–728.

[2] C. Fukui and J. Kawakami. An expert system fault section estimation using information from protective relays and circuit breakers. IEEE Transactions on Power Delivery. 1986;1(4):83–90.

[3] Y. Sun and C. C. Liu. RETEX (Relay Testing Expert): an expert system for analysis of relay testing data. IEEE Transactions on Power Delivery. 1992;7(2):986–994.

[4] D. Kosy, V. Grinberg, and M. Siegel. Screening digital relay data to detect power network fault response anomalies. In: SPIE Proc. 2nd International Symposium on Measurement Technology and Intelligent Instruments (ISMTII); Wuhan, People's Republic of China; 29 Oct–5 Nov 1993.

[5] M. Kezunovic, P. Spasojevic, C. Fromen, and D. Sevcik. An expert system for transmission substation event analysis. IEEE Transactions on Power Delivery. 1993;8(4):1942–1949.

[6] M. Kezunovic, I. Rikalo, and C. W. Fromen. Expert system reasoning streamlines disturbance analysis. IEEE Computer Applications in Power. 1994;7(2):15–19.

[7] M. Kezunovic, I. Rikalo, C. W. Fromen, and D. R. Sevcik. New automated fault analysis approaches using intelligent system technologies. CiteSeerx Scientific Literature Digital Library and Search Engine, College of Info. Sci. and Tech., Pennsylvania State Univ [Internet]. 1998 [Updated: 1998]. Available from: http://eppe.tamu.edu/k/ee/china94.pdf [Accessed: 21 Dec 2015].


Simulation: Transactions of Society for Modelling and Simulation International. 2014;90(6):660–686.

