**3.2 Perceptor**

Similar to the concept of the percept in AI agents [39], a perception process is performed on incoming measurements in both the brain and the CDS. The perceptor of the CDS extracts useful information from the noisy measurements, which the executive subsequently uses to optimize its actions and improve the information gain for the following cycles. Those actions, performed by the executive under CC, are called cognitive actions. However, unlike the role of the percept in AI, the perceptor perceives the environment directly and extracts relevant information from it, which the cognitive controller, residing in the executive, in turn uses to sense the environment indirectly. To perform its function, the perceptor is made up of the generative model and the Bayesian filter, which are reciprocally coupled to each other.

**Figure 1.** *Architectural structure of CDS for the nonlinear SG.*

*Cognitive Dynamic System for AC State Estimation and Cyber-Attack Detection in Smart Grid DOI: http://dx.doi.org/10.5772/intechopen.94093*

#### *3.2.1 Generative model*

As defined in [6], the first component of the perceptor of the CDS is conceptually the *Bayesian generative model*, which acts as a classifier for the observables received from the environment. However, in [7] it was argued that, due to the dynamic nature of the SG, the Bayesian generative model would not be suitable for this specific application. Given the complexity of the SG and its adoption across almost all applications, it is of utmost importance to detect anomalies or cyber-attacks as soon as possible, before they can infect the network further and start a domino effect of cascaded problems throughout the entire network and its end users. Therefore, inspired by quickest detection theory, the generative model proposed for the perceptor is based on the cumulative sum (CUSUM) and is written as follows:

$$\mathbf{B}\_k = \sum\_{i=k-L}^k \mathbf{x}\_i \tag{16}$$

where $k$ refers to the current cycle number, $L$ is the window over which the past states are accumulated, $\mathbf{B}\_k$ is the vector retaining the cumulative sum for each cycle and $\mathbf{x}\_i$ is the vector of states output by the AC state estimator for cycle $i$. While CUSUM-based detection methods have been very effective in detecting FDI attacks [40, 41], they fall short when the attacker has prior knowledge of the applied threshold. Indeed, the attacker can then craft an attack that remains undetected. However, the CDS allows us to bypass this problem through the use of the dynamic *entropic state*, as will be elaborated later. The entropic state is the foundation of control and attack detection in this CDS structure adapted for the SG. Lastly, the CUSUM-based generative model also possesses other desirable traits, such as smoothing out noise under the slow dynamics of the SG.
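The windowed accumulation of (16) can be sketched as follows. This is a minimal illustration only; `windowed_cusum` and its names are hypothetical, not part of the referenced implementation:

```python
from collections import deque

def windowed_cusum(window_len):
    """Return a step function that accumulates the last `window_len` + 1
    state vectors x_i into the running sum B_k of Eq. (16)."""
    history = deque(maxlen=window_len + 1)  # holds x_{k-L} .. x_k

    def step(x_k):
        history.append(list(x_k))
        # element-wise sum over the retained window
        return [sum(col) for col in zip(*history)]

    return step

step = windowed_cusum(window_len=2)  # L = 2
step([1.0, 2.0])
step([1.0, 2.0])
B_k = step([1.0, 2.0])
# B_k == [3.0, 6.0]: each component is summed over the 3-cycle window
```

Because the `deque` evicts the oldest vector automatically, the oldest cycle drops out of the sum once more than $L+1$ states have been seen.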

#### *3.2.2 Bayesian filter*

The second component of the perceptor is the Bayesian filter, which is coupled to the generative model. Although the equations describing the SG for state estimation are nonlinear in nature, we can work with linearized state estimates and use the *Kalman* filter under the assumption of additive white Gaussian noise [42]. Since we assume in this paper that the power system is quasi-static in nature [43–45], the well-known Kalman filter can serve as the Bayesian filter in the perceptor. The Kalman filter is based on a state-space model that operates on a pair of equations, known as the process equation and the measurement equation, respectively. Moreover, under quasi-static assumptions, the state variables **x** at time $k+1$ deviate only by a small amount from their values at the previous cycle $k$. Consequently, we can simplify this relationship to the following equation:

$$\mathbf{x}\_{k+1} = \mathbf{x}\_k + \boldsymbol{\omega}\_k \tag{17}$$

where $\boldsymbol{\omega}\_k$ is an independent Gaussian noise vector with zero mean. Based on (17), we can propose the measurement equation as follows:

$$\mathbf{Y}\_k = \mathbf{L}\_k \mathbf{B}\_k + \boldsymbol{\omega}\_k \tag{18}$$

and the covariance matrix of $\boldsymbol{\omega}\_k$ as:

$$\mathbf{R} = \text{diag}\left[\sigma\_{\omega}^2\right], \quad \sigma\_{\omega}^2 = \text{var}[\omega\_i] \tag{19}$$

As we are assuming that the system is operating under quasi-static conditions, a random walk model can be employed as the process equation as follows:

$$\mathbf{B}\_{k+1} = \mathbf{F}\_k \mathbf{B}\_k + \mathbf{v}\_k \tag{20}$$

where $\mathbf{v}\_k$ is the process noise vector, assumed to be statistically independent with zero mean. The covariance matrix of $\mathbf{v}\_k$ is:

$$\mathbf{Q} = \text{diag}\left[\sigma\_{v}^2\right], \quad \sigma\_{v}^2 = \text{var}[v\_i] \tag{21}$$

Referring to (18) and (20), the system matrix $\mathbf{L}\_k$ and the predictive transition matrix $\mathbf{F}\_k$ are both assumed to be identity matrices. Given the measurement and process equations above, the computational steps of the Kalman filter start with some predefined initial estimates of the states, $\hat{\mathbf{B}}\_{k|k}$, and of the predicted error covariance, $\mathbf{P}\_{k|k}$, which are used for the time update steps as follows:

The predicted estimate of the generative model's states and the predicted error covariance, $\hat{\mathbf{B}}\_{k+1|k}$ and $\mathbf{P}\_{k+1|k}$ respectively, are calculated using the following equations:

$$\hat{\mathbf{B}}\_{k+1|k} = \mathbf{F}\_{k+1,k}\hat{\mathbf{B}}\_{k|k} \tag{22}$$

$$\mathbf{P}\_{k+1|k} = \mathbf{F}\_{k+1,k} \mathbf{P}\_{k|k} \mathbf{F}\_{k+1,k}^T + \mathbf{Q} \tag{23}$$

When the next cycle starts, those two estimates are used in the measurement update stage to calculate the Kalman gain, $\mathbf{K}\_k$, the filtered accumulated estimate, $\hat{\mathbf{B}}\_{k|k}$, and the updated error covariance matrix, $\mathbf{P}\_{k|k}$, according to the equations below:

$$\mathbf{K}\_{k} = \mathbf{P}\_{k|k-1} \mathbf{L}\_{k}^{T} \left(\mathbf{L}\_{k} \mathbf{P}\_{k|k-1} \mathbf{L}\_{k}^{T} + \mathbf{R}\right)^{-1} \tag{24}$$

$$
\hat{\mathbf{B}}\_{k|k} = \hat{\mathbf{B}}\_{k|k-1} + \mathbf{K}\_k \left( \mathbf{Y}\_k - \mathbf{L}\_k \hat{\mathbf{B}}\_{k|k-1} \right) \tag{25}
$$

$$\mathbf{P}\_{k|k} = \mathbf{P}\_{k|k-1} - \mathbf{K}\_k \mathbf{L}\_k \mathbf{P}\_{k|k-1} \tag{26}$$

As a result, through the iteration of the time update and measurement update steps, the preceding *a posteriori* estimates are used to predict new *a priori* estimates.
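Because $\mathbf{F}\_k$ and $\mathbf{L}\_k$ are identity matrices and **Q** and **R** are diagonal, the filter decouples per state, and one iteration of the time and measurement updates of Eqs. (22)–(26) can be sketched in scalar form. This is an illustrative sketch; `kf_cycle` and its variable names are assumptions, not part of any referenced implementation:

```python
def kf_cycle(B_prev, P_prev, y, q, r):
    """One predict/update cycle of the Kalman filter of Eqs. (22)-(26),
    reduced to scalar (per-state) form for F = L = 1 and diagonal Q, R."""
    # Time update, Eqs. (22)-(23): with F = 1 the prediction carries over
    B_pred = B_prev
    P_pred = P_prev + q                    # P_{k+1|k} = P_{k|k} + Q

    # Measurement update, Eqs. (24)-(26)
    K = P_pred / (P_pred + r)              # Kalman gain, Eq. (24)
    B_filt = B_pred + K * (y - B_pred)     # innovation correction, Eq. (25)
    P_filt = P_pred - K * P_pred           # covariance update, Eq. (26)
    return B_filt, P_filt

# Track a constant cumulative sum observed without disturbance
B, P = 0.0, 1.0
for y in [1.0, 1.0, 1.0]:
    B, P = kf_cycle(B, P, y, q=0.0324, r=0.01)
# After a few cycles B has converged close to the measured value 1.0
```

With the small measurement variance used here, the gain stays near one and the estimate locks onto the measurements within a couple of cycles, mirroring the settling behavior discussed for CC start-up later in the chapter.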

#### **3.3 Feedback channel**

The feedback channel has a very distinctive role in the CDS, as it completes the PAC by bringing together the perceptor and the executive. It is mainly related to control and cyber-attack detection in the SG. In order for the CDS to supervise the SG, the feedback channel holds the entropic-information processor, which is tasked with calculating the *entropic state* and the internal rewards during reinforcement learning in the executive. This will be elaborated in sub-Section 3.4 (Executive), where it is more relevant to the role of the executive during planning.

#### *3.3.1 Entropic-information processor*

The directed cyclic flow of information from the perceptor to the executive is known as the *entropic state of the perceptor*. The entropic state is built on the perceptual posterior: it can be viewed as the incoming filtered posterior, embodying the essence of the generative model, the Kalman filter and entropy as derived from *Shannon's information theory* [46]. The entropic state at time $k$ in this architecture is calculated using:

$$h\_{k|k} = \frac{\operatorname{Tr}\left\{ \mathbf{P}\_{k|k-1} - \left( \text{diag}\left\{ \hat{\mathbf{B}}\_{k|k-1} - \mathbf{Y}\_k \right\} \right)^2 \right\}}{\operatorname{Tr}\left\{ \mathbf{P}\_{k|k-1} \right\}}\tag{27}$$

where Tr represents the trace operator, diag{.} is the diagonal operator and $h\_{k|k}$ is the entropic state. In [7], the efficiency of (27) for control and cyber-attack detection was proven and illustrated; for this reason, it is retained for the CDS architecture elaborated here. Mathematically, (27) condenses the information between the filtering-error covariance $\mathbf{P}\_{k|k-1}$ and the error between the state estimate $\hat{\mathbf{B}}\_{k|k-1}$ and the current states calculated at cycle $k$ into a single metric. The denominator of (27) normalizes the equation such that $h\_{k|k}$ can only take values ranging from 0 to 1 when the environment is operating in the absence of uncertainty. The degree of disturbance affecting the SG can then be characterized through the entropic state: the lower $h\_{k|k}$ is, the greater the amount of disturbance or uncertainty in the system. Since the SG will face different situations during its operation, such as the normal day-to-day routine and cyber-attacks, the entropic state can be further characterized by two important properties.
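For a diagonal filtering-error covariance, the trace operations in (27) reduce to sums over the per-state terms, and the entropic-state computation can be sketched as follows. This is an illustrative sketch; `entropic_state` and the sample numbers are hypothetical:

```python
def entropic_state(P_pred, B_pred, Y):
    """Entropic state h_{k|k} of Eq. (27) for a diagonal P_{k|k-1}: the
    traces reduce to sums of the per-state variances minus the squared
    innovations. All arguments are per-state lists."""
    num = sum(p - (b - y) ** 2 for p, b, y in zip(P_pred, B_pred, Y))
    return num / sum(P_pred)

# Perfect agreement between prediction and observation gives h = 1 ...
h_clean = entropic_state([0.5, 0.5], [3.0, 4.0], [3.0, 4.0])
# ... while a large innovation (e.g. an injected measurement) drives h down.
h_attacked = entropic_state([0.5, 0.5], [3.0, 4.0], [3.0, 6.0])
# h_clean == 1.0, h_attacked == (1.0 - 4.0) / 1.0 == -3.0
```

The second call illustrates the detection principle: an innovation far outside the range the filter expects pushes the entropic state well below its nominal range, flagging a disturbance.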


#### **3.4 Executive**

From a design perspective, the executive is the most important entity of the CDS, as it is responsible for the control of the SG in the face of uncertainty. With this goal in mind, it consists of Reinforcement Learning (RL) and Cognitive Control (CC), which can be further subdivided into the action space, planner, working memory and policy.

#### *3.4.1 Reinforcement learning: Bayes-UCB*

Aside from its role in the calculation of the entropic state during each PAC, the feedback channel is also involved in the calculation of internal rewards during the planning stages of the RL [39] algorithm in the executive. RL in the CDS is based on the current entropic state at each cycle, which is subsequently used to optimize an objective function for optimal control in the network. Before we elaborate on the pivotal role of RL alongside the other components of the executive, the Bayes-UCB [47] RL algorithm will be covered briefly in order to give an overview of how it operates. Bayes-UCB represents the current state of the art in a class of multi-armed bandit algorithms called UCB algorithms [48], which are based on the principle of optimism in the face of uncertainty. In this approach to the multi-armed bandit problem, the algorithm updates the estimate of the reward distribution of each action using a Bayesian method; the action applied is then the one expected to yield the highest reward. Consequently, Bayes-UCB is an index policy that picks, for each action, a dynamic quantile of its posterior distribution as the index. Hence, at each discrete time $t$, the algorithm selects the action $A\_t$ that satisfies the following condition:

$$A\_t = \underset{a}{\text{argmax}} \; q\_a(t), \quad q\_a(t) = Q\left(1 - \frac{1}{t(\log t)^c}, \lambda\_a^{t-1}\right) \tag{28}$$

where $Q(\alpha, \pi)$ refers to the quantile of order $\alpha$ of the distribution $\pi$. Moreover, by assuming that the rewards follow a Bernoulli distribution and that the prior distribution of each action is Beta(1,1), [49] shows that (28) can be further simplified. To maintain consistency with the notation used in this paper, (28) can be reduced to:

$$A\_k = \underset{a}{\text{argmax}} \; q\_a(k) = Q\left(1 - \frac{1}{k(\log k)^c},\ \text{Beta}(S\_a(k) + 1,\ N\_a(k) - S\_a(k) + 1)\right) \tag{29}$$

where $k$ is the PAC cycle number, $S\_a(k)$ is the cumulative reward for action $a$, $N\_a(k)$ is the number of times action $a$ has been chosen and $c$ is a real parameter. As the CDS is a construct that draws its origin from the neuroscience of the brain, it is to be emphasized that Bayes-UCB shares many common traits with the Bayesian approach to decision making in the human brain [50]. Following this brief coverage of Bayes-UCB, the next section, pertaining to Cognitive Control, will show how the RL algorithm integrates the system configuration **H** of the power grid, the generative model of the perceptor and the process model of the Kalman filter for optimal state estimation.
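A minimal sketch of the selection rule (29) is shown below. Since the Beta inverse CDF is not in the Python standard library, the quantile is estimated here by Monte Carlo sampling (an illustrative stand-in for an exact routine such as `scipy.stats.beta.ppf`); `bayes_ucb_select` and its parameters are assumptions, not the chapter's implementation:

```python
import math
import random

def bayes_ucb_select(S, N, k, c=0.0, n_samples=4000, seed=0):
    """Pick the action maximizing the Beta-posterior quantile of Eq. (29).
    S and N hold, per action, the cumulative reward S_a and pull count N_a;
    c is the real parameter of the quantile order."""
    rng = random.Random(seed)
    level = 1.0 - 1.0 / (k * math.log(k) ** c)  # quantile order of Eq. (29)
    best_arm, best_q = 0, float("-inf")
    for a, (s, n) in enumerate(zip(S, N)):
        # Posterior Beta(S_a + 1, N_a - S_a + 1) under a Beta(1,1) prior,
        # with the quantile read off the sorted Monte Carlo draws
        draws = sorted(rng.betavariate(s + 1, n - s + 1)
                       for _ in range(n_samples))
        q = draws[min(int(level * n_samples), n_samples - 1)]
        if q > best_q:
            best_arm, best_q = a, q
    return best_arm

# Arm 1 has the better empirical success rate (8/10 vs 2/10), so its
# upper posterior quantile dominates and it is selected.
arm = bayes_ucb_select(S=[2, 8], N=[10, 10], k=50)
```

Because the quantile order $1 - 1/(k(\log k)^c)$ grows with the cycle number, the index becomes an increasingly tight upper bound on each action's mean reward, which is what yields the optimism-driven exploration described above.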

#### *3.4.2 Cognitive control*

CC can be considered in many ways as the heart of the CDS, as it brings together all the components described so far for goal-oriented action on the SG. CC is made up of two important modules, namely the *planner* and the *policy*. The planner is involved in the extraction of a set of prospective actions from the action space $A$ and their evaluation during the planning cycles (i.e., shunt cycles [6] in CDS terminology) within each PAC. Consequently, under the influence of attention from one PAC to the next, the policy learns the most appropriate actions, yielding the maximum rewards, to be applied. In the context of the SG, the action space consists of discrete weight values that can be attributed to the different meters. Thus, under the influence of attention, the CDS will learn the optimal weight values of the different meters for optimal state estimation. Meters that are detrimental to state estimation will be assigned lower weight values, while those that are crucial will be given larger weight values, as the CDS keeps learning about its environment to better perform its set goal.

Planning in CC brings together all the other modules previously discussed. The process starts with the selection of a randomly chosen prospective action $a\_k^{i,j}$, which represents weight value $a\_i$ for meter $j$ during cycle $k$. This hypothesized weight value is then applied virtually to the weight matrix **W** in (5) and (6) to form $\mathbf{W}\_k^{i,j}$. $\mathbf{W}\_k^{i,j}$ is then used to calculate a new planned state estimate, $\hat{\mathbf{x}}\_k^p$, using the same procedures mentioned in the last paragraph of Section 2.2. Thus, the preceding calculated state of the AC state estimator, $\mathbf{x}\_{k-1}$, is used as the initial guess for the current cycle using any of the iterative techniques cited. However, the number of iterations is limited to $N\_p$ this time around. Because different weight matrices are being examined, each iteration using a $\mathbf{W}\_k^{i,j}$ will also involve a different hypothesized gain, $\mathbf{G}\_k^p$, during planning. Since state estimation is computationally costly, carrying out this process with a restricted number of iterations allows the CDS to learn during the planning stages at a lower resource cost. With $\hat{\mathbf{x}}\_k^p$ denoting the planned state estimate using the modified weight matrix with the hypothesized weight, the planned cumulative sum involving $\hat{\mathbf{x}}\_k^p$ is then calculated:

$$\mathbf{B}\_k^p = \sum\_{i=k-L}^{k-1} \mathbf{x}\_i + \hat{\mathbf{x}}\_k^p \tag{30}$$

where $\mathbf{B}\_k^p$ is the planned cumulative sum involving $\hat{\mathbf{x}}\_k^p$ instead of $\hat{\mathbf{x}}\_k$. Using this new cumulative sum, a planned entropic state, $h\_{k|k}^p$, is subsequently calculated as follows:

$$h\_{k|k}^p = \frac{\operatorname{Tr}\left\{ \mathbf{P}\_{k|k-1} - \left( \text{diag}\left\{ \hat{\mathbf{B}}\_{k|k-1} - \mathbf{B}\_k^p \right\} \right)^2 \right\}}{\operatorname{Tr}\left\{ \mathbf{P}\_{k|k-1} \right\}}\tag{31}$$

The presence of uncertainties in the environment, whether stochastic or probabilistic, will cause the output of the generative model of the perceptor to deviate from the estimated hidden state of the Kalman filter. Hence, the goal of (31) is to reduce this divergence by finding the best configuration weights for the respective meters. This condition is satisfied whenever $\mathbf{W}\_k^{i,j}$ generates an $h\_{k|k}^p$ closer to the optimal value of 1, which implies that the planned estimated state of the AC state estimator reduces the propagated variation in the generative model.

#### *3.4.3 Internal rewards*

Moving forward with the equations that describe the planning steps, the stage is now set to relate the previous steps to the calculation of the internal rewards during RL. The hypothesized internal reward, $r\_k^{i,j}$, associated with each prospective action $a\_k^{i,j}$ for cycle $k$ can be written as:

$$r\_k^{i,j} = h\_{k|k}^p - h\_{k|k} \tag{32}$$
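Taken together, Eqs. (30)–(32) amount to the following per-action computation, sketched here for a diagonal covariance; `planned_reward` and the sample values are hypothetical illustrations:

```python
def planned_reward(past_states, x_planned, B_pred, P_pred, h_current):
    """Hypothesized reward r_k^{i,j} of Eq. (32) for one prospective
    weight action, via the planned CUSUM of Eq. (30) and the planned
    entropic state of Eq. (31). All vector arguments are per-state lists."""
    # Planned cumulative sum, Eq. (30): past window plus planned estimate
    B_p = [sum(col) + xp for col, xp in zip(zip(*past_states), x_planned)]
    # Planned entropic state, Eq. (31): trace = sum over diagonal entries
    num = sum(p - (b - bp) ** 2 for p, b, bp in zip(P_pred, B_pred, B_p))
    h_p = num / sum(P_pred)
    # Internal reward, Eq. (32)
    return h_p - h_current

r = planned_reward(
    past_states=[[1.0], [1.0]],   # x_{k-L} .. x_{k-1}
    x_planned=[1.0],              # planned estimate under W_k^{i,j}
    B_pred=[3.0],                 # predicted cumulative sum
    P_pred=[0.5],                 # diagonal of P_{k|k-1}
    h_current=0.8,
)
# B_p = [3.0], so h_p = (0.5 - 0) / 0.5 = 1.0 and r = 1.0 - 0.8 = 0.2
```

A positive reward, as here, indicates that the hypothesized weight would move the planned entropic state closer to 1 than the current cycle achieved.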

As can be seen from (32), the objective of RL operating under CC is to minimize the amount of uncertainty in the SG by searching, during every PAC, for an improved weight configuration that will result in a better entropic state than the previous cycle. In other words, RL attempts to restrict the amount of uncertainty or disturbance during the state estimation process to the range computed by the Kalman filter in the perceptor. Referring back to the steps that led to (32), we can see that the CDS, as defined in this specific architecture, learns from past and present actions to pick the best actions for the future. To assist in this task, after undergoing the shunt cycles during every PAC, the working memory temporarily holds the actions that have achieved the highest quantile from Bayes-UCB in (29) and applies them to the system before starting the next PAC. Thus, when the next PAC starts and a new set of prospective actions is evaluated according to their quantile values, if any of those actions achieves a higher quantile than that of its respective meter in the working memory, then the higher-achieving action replaces the previously considered best action. This way of performing control in the SG can also be viewed from a bandit perspective, whereby it can be considered a contextual bandit problem in which every cycle presents new situations to be faced. Under those conditions, the actions performed on the SG will modify the system configuration to a new set point, to which the RL algorithm will have to adapt. This continues until the CDS is brought to rest. The complete algorithm of the methodology presented in this chapter can be found in [51], where it is integrated with a cyber-attack mitigation strategy known as Cognitive Risk Control, which was not discussed in this chapter. In [51], a greater discussion of the parameters and their selection is provided and contrasted with other popular cyber-attack detection methods.
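The working-memory replacement rule described above can be sketched as a dictionary keyed by meter id that holds each meter's current best (action, quantile) pair; the structure and names are illustrative assumptions:

```python
def update_working_memory(memory, candidates):
    """Keep, per meter, the action with the highest Bayes-UCB quantile,
    replacing a stored best action whenever a new prospective action
    from the current PAC's shunt cycles beats it."""
    for meter, (action, quantile) in candidates.items():
        stored = memory.get(meter)
        if stored is None or quantile > stored[1]:
            memory[meter] = (action, quantile)
    return memory

# Meter 2 already holds weight 50; a new candidate with a higher
# quantile replaces it, and meter 15 gains a first entry.
mem = {2: (50, 0.61)}
mem = update_working_memory(mem, {2: (100, 0.70), 15: (25, 0.40)})
# mem == {2: (100, 0.70), 15: (25, 0.40)}
```

Before the next PAC begins, the actions held in this memory are the ones applied to the system, matching the replacement behavior described in the text.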

#### **4. Computational experiments**

In this section, two different experiments are carried out to show the capability of CC in this new CDS architecture adapted for the smart grid. The first experiment shows CC's potential for optimal state estimation, using the optimization of the entropic state as the objective function. In the second experiment, the capability of the entropic state as an attack detector is demonstrated in four different scenarios based on the amount of information an attacker has and his access to the sensors. As IEEE bus networks have generally been used as evaluation benchmarks in the other papers previously referenced on this topic, the IEEE 14-bus network will be used to assess the architecture proposed in this chapter. Since this particular network comprises a large number of measurements and states, the results for the two experiments will focus on those aspects of the network that are relevant to the actual simulation. For both experiments, the data used to simulate the network configuration comes from the 14-bus case file in *MATPOWER* [53], an electric power system simulation and optimization tool for MATLAB and Octave. Moreover, in order to bring about the modifications for the AC state estimation algorithm, the doSE function of *MATPOWER* was adapted to the requirements of the architecture. Originally, the algorithm uses the honest Gauss-Newton method with a maximum of 100 iterations and an error tolerance of $10^{-5}$. It also uses a *flat start* initialization each time the function is called, whereby all the values of the different states for the initial guess are set to 1 unit.

#### **4.1 Cognitive control for BDD**

In the first experiment, the measurement signals relating to the state values were available from the case data in *MATPOWER* [52]. For this simulation, a noisy version of those signals was generated with a signal-to-noise ratio (SNR) of 20 dB to create **z**. From the case data, 39 measurement signals are used to calculate the 29 state values of the IEEE 14-bus network, half of which are the voltage magnitudes and the other half the voltage angles of the different busses involved. The total duration of this experiment is 2000 s. The parameter $L$ of the generative model of the perceptor, which is the window over which the past states are accumulated, was set to 20. In regards to the initialization of the Kalman filter, the initial estimates of the values to be received from the generative model are assigned a value of 0, the diagonal elements of **Q** were set to 0.0324 and those of **R** were assigned a value of 0.01. On the executive side of the CDS for CC, the action space is made up of 156 actions, whereby each meter can be assigned a weight value from the following: 25, 50, 100, and 200. The goal of this experiment is to highlight this architecture's adaptability and robustness towards optimal state estimation under changing conditions. Consequently, in order to create a perturbation in the system, the SNR of the following meters is changed to 5 dB at the mentioned times: **t = 1000 s** for meter 2 and **t = 1200 s** for meter 15. This simulated context can be viewed as a meter malfunction or a random attack, where the attacker only has limited access to meters to perform his task. In this simulation, CC is started at **t = 300 s**. As mentioned in the earlier sections, CC is not started at **t = 0 s** because some time (cycles) has to be allowed for the Kalman filter to settle on track so that the algorithm can operate effectively.

Referring to **Figure 2**, it can be seen that CC makes the whole network dynamic, whereby the executive of the CDS assigns the best weight values to the meters for optimal state estimation on a cycle-to-cycle basis. Consequently, the cognitive controller shows its ability to learn from current and past cycles to choose the best actions for the future. Moreover, the constant modification of the weight values adds another level of nonlinearity on top of the already very complex and nonlinear AC state estimator. While this may appear over-complicated at first, the results show that it is not only feasible but also makes the SG more powerful. As can be seen in **Figure 2**, the first instance of meter malfunction, for meter 2 at **t = 1000 s**, has virtually no effect on the system, as the CDS has assigned that meter a lower weight value compared to the rest. While **Figure 2** shows the weight values for some of the meters pertinent to this simulation, it is left to the reader to realize that all the meters undergo weight reconfiguration every cycle. Thus, the respective weight values of the meters are not all the same, since the cognitive controller continuously adapts to the probabilistic nature of the noisy signals. It is also shown that the algorithm is able to apply more than one action during each PAC in a stable manner. At **t = 1200 s**, when meter 15 starts malfunctioning, we can really see the capability of the architecture. As shown in **Figure 2**, it takes only a couple of cycles for the cognitive controller to learn and adapt to the new situation by lowering the weight assigned to meter 15 and compensating for it by boosting the other meters. Thus, we can see that state 6 is the most affected and state 25 is also afflicted to a lower extent. Compared to the traditional AC state estimator, the CC algorithm is able to keep this perturbation under control, as demonstrated in the referenced figure. Consequently, this shows the robustness of the algorithm in adapting and acting according to evolving situations. Although some of those weight values are changed at a later point in time, this is due to the frequentist approach of Bayes-UCB coupled with the probabilistic origin of the noise. As a result of those reconfigurations in earlier situations, this highlights the cognitive ability of the controller to trust certain meters more than others. This simulation demonstrated CC's ability to pick the best set of meters for state estimation on the go. Referring back to **Figure 2**, it can be seen that CC has performed better than the traditional algorithm. Lastly, the Chi-square test was not implemented in this experiment, as it is based on statistical properties of the signals, while the proposed approach is rooted in the principle of cognition of the brain.

**Figure 2.** *Graphs of some affected states, weights and entropic state.*
