**Figure 1.** Predictive model based control framework defined for the metro station PdG-L3.

**3. Basics of Bayesian Networks**

**3.1. Inference propagation**

According to the framework outlined in Section 2, the current state of a building will be monitored in real time by means of a sensor network deployed inside it. However, in order to limit the number of sensors to be installed, only a fraction of the relevant physical variables are measured directly, while the remaining variables are derived from models. In other words, these models provide indirect measurements estimated from the measurements made directly by the sensors. In the case study presented in this Chapter, these models were developed in the form of Bayesian Networks, mainly because they are suitable for reducing complex domains to computationally manageable models. This is a key feature when computations must be performed in real time: while numerical models (e.g. the Dymola™ model mentioned in Section 6) take hours to simulate the fluid dynamics and thermal behaviour of the PdG station, Bayesian Networks perform their computations in a matter of seconds or minutes. Hence they are well suited to real-time predictions, provided that a reliable procedure for their development is available.

Other features of Bayesian Networks that may prove advantageous in these applications are their ability to manage incomplete information (e.g. when some data are unavailable because the corresponding sensors are broken) and uncertain information (e.g. when uncertainty in sensor measurements is included, or when inputs come from forecasts of disturbance actions).

Whenever a Bayesian Network estimates indirect measurements from direct measurements, it does so by running inference algorithms. Such inferences are computationally feasible thanks to the conditional probability relationships defined among the variables of the domain under analysis. This makes it possible to consider only the most relevant relationships among the variables (i.e. nodes), which are stored in the form of conditional probabilities of the kind *P(X₂|X₁)*, where *X₁* is a parent of *X₂* (its child node) and *X₂* is conditionally independent of any variable in the domain which is not its parent [21]. The "chain rule" descends from this concept: the joint probability of a group of variables in a domain can be determined from the knowledge of the state of just their parents, thus limiting the database required for inference [22]:


$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1}) \tag{1}$$
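As a small illustration of eq. (1), the sketch below (Python, with purely illustrative numbers that are not taken from the case study) factorises the joint probability of a three-node chain into the product of each node's probability conditioned only on its parent.

```python
import numpy as np

# Hypothetical three-node chain X -> Y -> Z, each variable with two states (0 and 1).
p_x = np.array([0.7, 0.3])                  # P(X)
p_y_given_x = np.array([[0.9, 0.1],         # P(Y | X): row index = state of X, column = state of Y
                        [0.4, 0.6]])
p_z_given_y = np.array([[0.8, 0.2],         # P(Z | Y): row index = state of Y, column = state of Z
                        [0.3, 0.7]])

def joint(x, y, z):
    """Eq. (1): P(x, y, z) = P(x) * P(y | x) * P(z | y), each factor conditioned only on the parent."""
    return p_x[x] * p_y_given_x[x, y] * p_z_given_y[y, z]

# The factorisation defines a proper joint distribution: it sums to one over all joint states.
total = sum(joint(x, y, z) for x in range(2) for y in range(2) for z in range(2))
print(joint(0, 1, 1), total)                # 0.049 and 1.0
```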

More remarkably, Bayesian Networks make it possible to perform inferences: any node can be conditioned upon new evidence, even when that evidence concerns multiple variables. This feature is particularly important when a control system must work in real time, because evidence acquired about a state variable (i.e. from sensor measurements) must be propagated to update the state of the rest of the domain. This process requires conditioning, and it is also called probability propagation or belief updating. It is performed via a flow of information throughout the network, without limitation in the number of nodes [1]. When it is run in the MPC framework, the controller queries a set of nodes of the network, whose probability distributions are computed from the state of other nodes upon which observations (or evidence) are already available (e.g. the future state of disturbance variables and the current state of the physical domain). To this purpose, the Bayes theorem is exploited whenever the inference needs to be reversed. In particular, if inference runs from causes to consequences it is called predictive reasoning; if it is directed from consequences to causes, it is called diagnostic reasoning. Inference in Bayesian Networks is solved by complex combinations of algorithms [1]. In order to show how this works in the case of BEMs, a short example will be discussed. The first step in the development of any Bayesian Network is defining its graphical structure, which requires all the variables of the domain to be ordered and the causal relationships among them to be defined. The three elementary structures used to order variables in Bayesian Networks are causal chains, causal trees and poly-trees (Fig. 2); other, more complex structures may be formed as combinations or enhancements of these elementary fragments, with a corresponding change in computational burden.

**Figure 2.** Graphic representation of a causal chain made up of three nodes (a), a causal tree (b) and a causal poly-tree (c).
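For concreteness, each of the elementary structures in Fig. 2 is fully described by listing the parents of every node. The following sketch (node names are purely illustrative) writes the three structures as parent maps.

```python
# Each elementary structure of Fig. 2 is fully described by a map from node to parents
# (node names are illustrative only).
causal_chain = {"X": [], "Y": ["X"], "Z": ["Y"]}                       # Fig. 2-a: X -> Y -> Z

causal_tree = {"X": [], "Y1": ["X"], "Y2": ["X"], "Z": ["Y1"]}         # Fig. 2-b: at most one parent per node

causal_polytree = {"X1": [], "X2": [], "Y": ["X1", "X2"], "Z": ["Y"]}  # Fig. 2-c: several parents allowed,
                                                                       # but no undirected loops
```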

To provide an example, the first structure depicted in Fig. 2-a could be used to represent the case of sun radiation hitting and heating up the external surface of an envelope, thus raising its temperature and driving a heat flux towards the interior. If thermal heat gains cannot be measured directly, an indirect estimation can be made instead: first, the sun radiation hitting the external surface of the wall is measured (e.g. by means of a pyranometer); then a model of the envelope is developed in the form of a Bayesian Network, which estimates the heat gains in real time by means of inference propagation. Such a model takes the sun radiation intensity measured by the pyranometer as its input, and inference yields the most likely value of the internal heat gains (Fig. 3). This indirect estimation relies on belief propagation (or probabilistic inference) based on dedicated algorithms. The notation in Fig. 3 corresponds to the notation in Fig. 2-a, provided that *X* stands for "sun radiation", *Y* for "wall temperature" and *Z* for "heat gain". In the latter figure, four states were defined for each random variable (or event) and all the probability values were set to a uniform distribution (i.e. all states equally likely), because no learning had yet been done. The states of the variables were separated into intervals according to the knowledge of the physical system available to the model's developer, who assumed the sun radiation to be limited between 0 and 400 W/m², the wall temperature between 0 and 65 °C, and the heat gains between 0 and 4.5 W/m². Once probability learning is performed as explained in the next sub-section, the network can be used to evaluate how evidence about the first variable ("sun radiation") is propagated towards the rightmost variable ("heat gain") through the intermediate one ("wall temperature").


**Figure 3.** Example where a Bayesian chain can be used to infer indirect measurements (heat gains) from direct measurements (sun radiation intensity).
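A minimal sketch of how the chain of Fig. 3 might be encoded before any learning takes place is given below, assuming four intervals per variable over the ranges quoted above; the interval edges (equal-width bins) and the uniform tables are illustrative assumptions, not the ones used in the case study.

```python
import numpy as np

# Discretisation of the three variables of the chain in Fig. 3.
# The ranges come from the text; the split into 4 equal-width bins is an assumption.
sun_radiation_bins    = np.linspace(0.0, 400.0, 5)   # W/m^2,  4 states
wall_temperature_bins = np.linspace(0.0, 65.0, 5)    # deg C,  4 states
heat_gain_bins        = np.linspace(0.0, 4.5, 5)     # W/m^2,  4 states

n_states = 4

# Before learning, the prior and both conditional tables are uniform (all states equally likely).
p_sun             = np.full(n_states, 1.0 / n_states)              # P(X)
p_temp_given_sun  = np.full((n_states, n_states), 1.0 / n_states)  # P(Y | X), rows indexed by states of X
p_gain_given_temp = np.full((n_states, n_states), 1.0 / n_states)  # P(Z | Y), rows indexed by states of Y

def discretise(value, bins):
    """Map a measured value to the index of the interval (state) it falls into."""
    return int(np.clip(np.digitize(value, bins) - 1, 0, n_states - 1))

# Example: a pyranometer reading of 250 W/m^2 falls into the third state of "sun radiation".
print(discretise(250.0, sun_radiation_bins))   # -> 2
```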

In order to make inference propagation feasible, a conditional probability table must be defined for each pair of linked variables in the chain. The model-based knowledge embedded in any Bayesian Network is represented graphically by a directed link between any variable *X* (e.g. "sun radiation") and *Y* (e.g. "wall temperature"), which is quantified by a fixed conditional probability matrix:


$$M_{Y|X} = P(y \mid x) = \begin{bmatrix} P(y_1 \mid x_1) & P(y_2 \mid x_1) & \dots & P(y_n \mid x_1) \\ \dots & \dots & \dots & \dots \\ P(y_1 \mid x_m) & P(y_2 \mid x_m) & \dots & P(y_n \mid x_m) \end{bmatrix} \tag{2}$$

where *y*<sub>i</sub> and *x*<sub>i</sub> are the generic states of the variable nodes *Y* and *X*, respectively. In the case of Fig. 3, both *X* and *Y* may occur in one of four possible states. Inference propagation first needs to know the probability assigned to every state of the second node *Y* conditioned on each state of the first node *X*. A more comprehensive notation is *BEL(x)*, which reflects the overall belief accorded to the proposition *X=x* by all the evidence received so far, hence *BEL(x)=P(x|ξ)*. This represents the set of dynamic values obtained by updating the belief accorded to the proposition *X=x* once all the evidence about its parents is collected in the event *ξ*. In the simplest Bayesian topology, the "chain", if the evidence *ξ={Y=y}* is observed, then from the Bayes theorem the belief distribution of *X* (diagnostic reasoning) is given by:

$$BEL(x) = P(x \mid \xi) = \frac{P(x) \cdot P(\xi \mid x)}{P(\xi)} = \beta \cdot P(x) \cdot \lambda(x) \tag{3}$$

where the likelihood vector is given by:


$$\lambda(x) = P(\xi \mid x) = P(Y = y \mid x) = M_{y|x} \tag{4}$$

As a consequence, the likelihood of *x* is the column of the matrix in eq. (2) corresponding to the observed state *y*. It is stored as an input in node *Y*, so that it can be transmitted as a message to *X*, thus enabling *X* to compute its belief distribution *BEL(x)*, which has as many states as the number of rows of the matrix *M*<sub>y|x</sub>.
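The following sketch mirrors eqs. (3)–(4) with invented numbers: when *Y* is observed in a given state, the likelihood vector λ(x) is read off as the corresponding column of M<sub>y|x</sub>, and BEL(x) is obtained by normalising the product of the prior and the likelihood.

```python
import numpy as np

# Illustrative conditional probability matrix M_{y|x}: rows are states of X, columns are states of Y.
M_y_x = np.array([[0.70, 0.20, 0.05, 0.05],
                  [0.30, 0.40, 0.20, 0.10],
                  [0.10, 0.30, 0.40, 0.20],
                  [0.05, 0.15, 0.30, 0.50]])

p_x = np.full(4, 0.25)        # prior P(x): uniform before learning

y_observed = 2                # evidence xi = {Y = y}, here the third state of Y (0-based index)

lam_x = M_y_x[:, y_observed]  # eq. (4): lambda(x) = P(Y = y | x), i.e. the observed y's column of M_{y|x}
bel_x = p_x * lam_x
bel_x = bel_x / bel_x.sum()   # eq. (3): BEL(x) = beta * P(x) * lambda(x), with beta normalising the product

print(lam_x, bel_x)
```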

Propagation is possible even if *Y* is not observed directly but is supported by indirect observation *ξ={Z=z}* of a descendant *Z* of *Y*, which means the chain is of the kind in Fig. 2-a. It can still be written:

$$BEL(x) = P(x \mid \xi) = \beta \cdot P(x) \cdot \lambda(x) \tag{5}$$

Then, conditioning and summing upon the values of *Y*:

$$\lambda(x) = P(\xi \mid x) = \sum_{y} P(\xi \mid y, x) \cdot P(y \mid x) = \sum_{y} P(\xi \mid y) \cdot P(y \mid x) = M_{y|x} \cdot \lambda(y) \tag{6}$$

where the fact that *Y* separates *X* from *Z* was used. Due to eq. (4), *λ(y)=P(ξ|y)=M*<sub>z|y</sub>, so in this case the likelihood is derived from the conditional probability matrix between variables *Y* and *Z*. For the purpose of conditional updating in the chain, the state of *X* is irrelevant for inferring the state of *Z* once the state of *Y* is known, which is expressed by the sentence "*Y* d-separates *X* from *Z*" [1].
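The same mechanism extends to eqs. (5)–(6): when only the descendant *Z* is observed, λ(y) is read from M<sub>z|y</sub> and passed backwards through M<sub>y|x</sub>. A sketch with illustrative matrices:

```python
import numpy as np

# Illustrative conditional probability matrices for the chain X -> Y -> Z (rows indexed by the parent's states).
M_y_x = np.array([[0.70, 0.20, 0.05, 0.05],
                  [0.30, 0.40, 0.20, 0.10],
                  [0.10, 0.30, 0.40, 0.20],
                  [0.05, 0.15, 0.30, 0.50]])
M_z_y = np.array([[0.60, 0.25, 0.10, 0.05],
                  [0.25, 0.45, 0.20, 0.10],
                  [0.10, 0.25, 0.45, 0.20],
                  [0.05, 0.10, 0.25, 0.60]])

p_x = np.full(4, 0.25)        # prior on X

z_observed = 3                # evidence xi = {Z = z} on the descendant Z (0-based index)

lam_y = M_z_y[:, z_observed]  # lambda(y) = P(xi | y): the observed z's column of M_{z|y}
lam_x = M_y_x @ lam_y         # eq. (6): lambda(x) = sum_y P(y | x) * lambda(y) = M_{y|x} . lambda(y)
bel_x = p_x * lam_x
bel_x = bel_x / bel_x.sum()   # eq. (5): BEL(x) = beta * P(x) * lambda(x)

print(bel_x)
```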

Starting from these concepts, further and more complex computations can be accomplished, so as to propagate inference throughout networks with any kind of admissible connections, but they all need the prior definition of conditional probability tables between linked variables.

**3.2. Probability learning**

As a consequence, a Bayesian Network is made up of a set of local conditional probability distributions and a set of assertions of conditional independence. These conditional independences can be described as in eq. (7), where the parents of *X*<sub>i</sub> are grouped in the set *Π*<sub>i</sub> and are useful to "d-separate" any variable (i.e., to make it conditionally independent) from the rest of *B*<sub>s</sub>:

$$P(X_i \mid X_1, \dots, X_{i-1}) = P(X_i \mid \Pi_i) \tag{7}$$

Given *B*<sub>s</sub>, let *r*<sub>i</sub> be the number of states of variable *X*<sub>i</sub>; and let *q*<sub>i</sub> (the product of the numbers of states *r*<sub>l</sub> of all the parents *X*<sub>l</sub> ∈ *Π*<sub>i</sub>) be the number of states of *Π*<sub>i</sub>. Let *θ*<sub>ijk</sub> denote the physical probability of *X*<sub>i</sub>*=k* given *Π*<sub>i</sub>*=j*, for *i=1,..,n*, *j=1,..,q*<sub>i</sub>, *k=1,..,r*<sub>i</sub>. Adopting this notation, the following equivalences are valid [23]:

$$\vartheta_{ij} \equiv \bigcup_{k=1}^{r_i} \{\vartheta_{ijk}\}, \qquad \vartheta_{B_s} \equiv \bigcup_{i=1}^{n} \bigcup_{j=1}^{q_i} \{\vartheta_{ij}\} \tag{8}$$

In other words, eq. (8a) states that the two notations are equivalent and represent the case where all the physical probabilities of *x*<sub>i</sub>*=k* are grouped, once any *x*<sub>i</sub> in the domain *U* is selected and the state of its parents is fixed at any *Π*<sub>i</sub>*=j*. As a consequence, eq. (8b) represents all the physical probabilities of the joint space *B*<sub>s</sub> (i.e. the Bayesian Network structure), because it encompasses all the states of *Π*<sub>i</sub> (i.e., the parents of *x*<sub>i</sub>) and all the variables *x*<sub>i</sub> in the domain *U*, where "⋃ …" stands for the union of all the states represented by that expression. The probability distribution of each child node must be described by a probability distribution, which may then be updated according to the evidence acquired about its parents. The software Hugin™ uses the EM algorithm, which defines a Dirichlet distribution for each variable *θ*<sub>ij</sub> (i.e. one distribution for any variable of *B*<sub>s</sub> given that its parents are in the state *j*) [23]:

$$p(\vartheta_{ij} \mid B_s, \xi) = c \cdot \prod_{k=1}^{r_i} \vartheta_{ijk}^{\,N'_{ijk}-1} \tag{9}$$

where *c* is a normalization constant (in the form of a gamma function), the *N'*<sub>ijk</sub> are the multinomial parameters of that distribution, limited between 0 and 1, and *ξ* is the observed evidence. The Dirichlet distribution describes the probability of one of the variables *x*<sub>i</sub> as it varies over all of its states. One of the advantages of using a Dirichlet distribution is that it is completely defined by its multinomial parameters. In addition, its shape can easily be adapted to fit various probability density functions. Finally, and more importantly, it can be demonstrated that the learning process is made easier in this way [23]. In fact, the values *N'*<sub>ijk</sub> represent expert knowledge introduced in the network by the developer. Then, if *N*<sub>ijk</sub> is the number of observations in the database *D* in which *x*<sub>i</sub>*=k* and *Π*<sub>i</sub>*=j*, we are able to update that distribution by adding the empirical information conveyed by the parameter *N*<sub>ijk</sub>:

$$p(\vartheta_{ij} \mid D, B_s, \xi) = c \cdot \prod_{k=1}^{r_i} \vartheta_{ijk}^{\,N'_{ijk}+N_{ijk}-1} \tag{10}$$