**1. Introduction**

Many real-world systems exhibit some form of randomness, and this is reflected in their mathematical models, which are often probabilistic. There are two basic strategies for interpreting the randomness captured by probabilistic models. The first treats the frequency of occurrence of a random phenomenon as a measurable quantity that can be used to predict how likely the phenomenon is to be observed in the future. The other approach quantifies the uncertainty about the occurrence of a random phenomenon more subjectively, that is, as a degree of belief or expectation of observing the phenomenon, given domain knowledge and past experience. This latter approach led to a broad area of general Bayesian methods [1], Bayesian data analysis [2], Bayesian signal processing [3], Bayesian regression [4], Bayesian machine learning [5], and Bayesian optimization [6]. In tasks of statistical inference, the assumption of knowing (or not knowing) a prior distribution crucially affects the feasibility as well as the structure of estimators [7]. There is also an intimate connection between Bayesian probabilistic models and making causal inferences [8], as will be discussed in Section 3.

Bayesian methods have found widespread application in many probabilistic modeling frameworks. These methods are all rooted in the surprisingly simple Bayes's theorem, that is,

$$p(\theta|\mathbf{x}) = \frac{p(\mathbf{x}|\theta)p(\theta)}{p(\mathbf{x})} \tag{1}$$

where $p(\cdot)$ denotes the probability density (for continuous random variables) or the probability (for discrete random variables). Eq. (1) quantifies how our belief about the parameter $\theta$ changes from its prior, $p(\theta)$, to its posterior, $p(\theta|\mathbf{x})$, after observing data $\mathbf{x}$. The conditional term, $p(\mathbf{x}|\theta)$, represents the likelihood of the parameter $\theta$, given the observations, $\mathbf{x}$. The scaling term, $p(\mathbf{x})$, in Eq. (1) is usually referred to as the evidence. It should be noted that both the parameter and the data can have an arbitrary number of dimensions.
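To make Eq. (1) concrete, the following minimal sketch applies Bayes's theorem to a hypothetical coin whose unknown bias $\theta$ is restricted to three candidate values; the candidate values, the prior, and the observed data are illustrative assumptions, not part of the original text.

```python
import numpy as np

# Hypothetical example: a coin whose bias theta takes one of three
# discrete values, with a uniform prior p(theta).
thetas = np.array([0.3, 0.5, 0.7])   # candidate parameter values
prior = np.array([1/3, 1/3, 1/3])    # p(theta)

# Observed data x: 8 heads out of 10 independent tosses.
heads, tosses = 8, 10

# Likelihood p(x | theta) for each candidate theta (binomial, up to a
# constant factor that cancels in the normalization).
likelihood = thetas**heads * (1 - thetas)**(tosses - heads)

# Bayes's theorem, Eq. (1): posterior = likelihood * prior / evidence.
evidence = np.sum(likelihood * prior)       # p(x), the scaling term
posterior = likelihood * prior / evidence   # p(theta | x)

print(posterior)  # belief shifts toward theta = 0.7
```

Because the parameter space is discrete here, the evidence $p(\mathbf{x})$ is a simple sum; in the continuous case it becomes the marginalization integral of Eq. (2) below.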

The majority of tasks in Bayesian inference involve explicitly or implicitly solving one of the following integrals (or sums, for discrete random variables), that is,

$$\text{marginalization:}\ p(\mathbf{x}) = \int\ p(\mathbf{x},\theta)d\theta\tag{2}$$

$$\text{summarization:}\ E[f(\mathbf{x})|\theta] = \int f(\mathbf{x})\, p(\mathbf{x}|\theta)\, d\mathbf{x} \tag{3}$$

$$\text{prediction:}\ p(\mathbf{x}_{t+1}) = \int p(\mathbf{x}_{t+1}|\mathbf{x}_t)\, p(\mathbf{x}_t)\, d\mathbf{x}_t \tag{4}$$

where

$$p(\mathbf{x}_{t+1}|\mathbf{x}_t) = \int p(\mathbf{x}_{t+1}|\theta)\, p(\theta|\mathbf{x}_t)\, d\theta.$$
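When integrals such as (2)–(4) lack closed forms, a standard remedy is to approximate them by sampling. As a minimal sketch, the code below estimates the summarization integral of Eq. (3) by Monte Carlo; the particular model ($\mathbf{x} \sim \mathcal{N}(\theta, 1)$) and the function $f(\mathbf{x}) = \mathbf{x}^2$ are assumptions chosen only so the answer can be checked analytically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model (an assumption for this sketch): given theta,
# x ~ Normal(theta, 1), and we want E[f(x) | theta] from Eq. (3)
# for f(x) = x**2.
theta = 2.0

def f(x):
    return x**2

# Monte Carlo estimate: draw samples from p(x | theta) and average f.
samples = rng.normal(loc=theta, scale=1.0, size=100_000)
estimate = np.mean(f(samples))

print(estimate)  # close to the exact value theta**2 + 1 = 5
```

The marginalization and prediction integrals of Eqs. (2) and (4) can be approximated in the same way, by sampling the variable being integrated out and averaging the remaining factor.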

Unfortunately, in all but a few real-world scenarios, the expressions (2)–(4) are not mathematically tractable. In particular, the distributions often involve multiple sums and/or integrals, and their closed-form expressions cannot be obtained. The distributions are sometimes known only up to a scaling constant. Then, even numerically computing Eq. (1) can be rather challenging, since many complex distributions are multimodal, with a large number of local minima and maxima. Moreover, in online data processing, the distributions must be updated continuously as new data arrive.
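In the rare tractable cases, the continuous updating required by online processing reduces to a closed-form recursion. A toy sketch, assuming a conjugate Beta prior on the bias of a Bernoulli data stream (the stream itself is illustrative), is shown below; in general, no such conjugate update exists, which motivates the approximate methods of Section 2.

```python
# Online Bayesian updating in one of the rare tractable cases:
# a Beta(a, b) prior on the bias theta of a Bernoulli stream is
# conjugate, so the posterior after each observation is again Beta
# and no integral needs to be evaluated.
a, b = 1.0, 1.0                # Beta prior hyperparameters (uniform prior)
stream = [1, 0, 1, 1, 0, 1]    # incoming binary observations (assumed)

for x in stream:
    # Closed-form conjugate update after each new observation.
    a += x
    b += 1 - x
    print(f"posterior: Beta({a:.0f}, {b:.0f}), mean = {a / (a + b):.3f}")
```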

The rest of this chapter is organized as follows. The strategies for performing Bayesian inference with intractable distributions, including sampling, filtering, approximation, and likelihood-free methods, are outlined in Section 2. The applications of Bayesian inference, including Bayesian experiment design, Bayesian hypothesis testing, Bayesian machine learning, and Bayesian optimization, are discussed in Section 3. Bayesian Monte Carlo simulations are reviewed in Section 4. Although the chapter mostly reviews known concepts and frameworks in the Bayesian analysis of probabilistic models, Section 4 also contributes a description of augmented Monte Carlo simulations, which aim to provide explainability and improve information gain.

The references cited at the end of this chapter are by no means comprehensive; rather, they are suggested starting points for finding further information about the topics discussed in this chapter.
