A Comparative Study of Maximum Likelihood Estimation and Bayesian Estimation for Erlang Distribution and Its Applications

Kaisar Ahmad and Sheikh Parvaiz Ahmad

## Abstract

In this chapter, the Erlang distribution is considered. For parameter estimation, the maximum likelihood method, the method of moments, and the Bayesian method of estimation are applied. In the Bayesian methodology, different prior distributions are employed under various loss functions to estimate the rate parameter of the Erlang distribution. At the end, a simulation study is conducted in R to compare these methods by mean squared error with varying sample sizes. Real-life applications are also examined in order to compare the behavior of the data sets in the parameter estimation. The comparison is also made among the different loss functions.

Keywords: Erlang distribution, prior distributions, loss functions, simulation study, applications

### 1. Introduction

The Erlang distribution is a continuous probability distribution with wide applicability, primarily due to its relation to the exponential and gamma distributions. It was developed by Erlang [1] to examine the number of telephone calls that could be made at the same time to switching station operators. The distribution can model waiting times and message lengths in telephone traffic: if the durations of individual calls are exponentially distributed, then the duration of a succession of calls follows the Erlang distribution. The Erlang variate is a gamma variate whose shape parameter is an integer (for details see Evans et al. [2]). Bhattacharyya and Singh [3] obtained a Bayes estimator for the Erlangian queue under two prior densities. Haq and Dey [4] addressed the problem of Bayesian estimation of the parameters of the Erlang distribution assuming different independent informative priors. Suri et al. [5] used the Erlang distribution to design a simulator for time estimation in the project management process. Damodaran et al. [6] obtained expected time between failure measures and showed that the predicted failure times are close to the actual failure times. Jodra [7] gave a procedure for computing the asymptotic expansion of the median of the Erlang distribution.

The probability density function of an Erlang variate is given by

$$f(x; \lambda, k) = \frac{\lambda^k}{(k-1)!}\, x^{k-1} e^{-\lambda x} \quad \text{for } x > 0,\ k \in \mathbb{N},\ \lambda > 0. \tag{1}$$

DOI: http://dx.doi.org/10.5772/intechopen.85627

where λ and k are the rate and the shape parameters, respectively, such that k is an integer.
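As a quick illustration (the chapter's own simulations are done in R, but the arithmetic is the same in any language), the sketch below codes Eq. (1) directly with hypothetical parameter values and checks numerically that the density integrates to one:

```python
import math

def erlang_pdf(x, lam, k):
    """Eq. (1): f(x; lam, k) = lam^k * x^(k-1) * exp(-lam*x) / (k-1)!"""
    if x <= 0:
        return 0.0
    return (lam ** k) * (x ** (k - 1)) * math.exp(-lam * x) / math.factorial(k - 1)

# Hypothetical parameters; the total probability mass should be one.
lam, k = 2.0, 3
h = 1e-3
area = h * sum(erlang_pdf(i * h, lam, k) for i in range(1, 20001))  # Riemann sum on (0, 20]
print(round(area, 3))   # → 1.0
```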

#### 1.1 Graphical representation of pdf for Erlang distribution

In this chapter, the Erlang distribution is considered and some of its structural properties are obtained. The parameter estimation of the Erlang distribution is carried out by the maximum likelihood method of estimation, the method of moments, and the Bayesian method of estimation in different sections of this chapter. In the Bayesian approach, the parameters are estimated using Jeffrey's and Quasi priors under different loss functions (Figure 1).

#### 1.2 Relationship of Erlang distribution with other distributions


i. The gamma distribution is a generalized form of the Erlang distribution.

ii. If the shape parameter k is 1, then the Erlang distribution reduces to the exponential distribution.

iii. If the scale parameter is 2 (i.e., λ = 1/2), then the Erlang distribution with shape k reduces to the Chi-square distribution with 2k degrees of freedom.

Thus from the above descriptions, we can say that the exponential distribution and the Chi-square distribution are sub-models of the Erlang distribution.
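These relationships can be checked by simulation. The sketch below (Python, with hypothetical values) uses the fact that an Erlang(k, λ) variate is the sum of k independent exponential variates with rate λ, and compares the sample mean and variance with the theoretical values k/λ and k/λ²:

```python
import random
import statistics

random.seed(7)
lam, k, n = 1.5, 4, 200_000

# Sum of k independent Exp(lam) variates is Erlang(k, lam).
samples = [sum(random.expovariate(lam) for _ in range(k)) for _ in range(n)]

print(round(statistics.mean(samples), 2))      # close to k/lam ≈ 2.67
print(round(statistics.variance(samples), 2))  # close to k/lam**2 ≈ 1.78
```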

Figure 1. PDFs of the Erlang distribution for different values of λ and k.


### 2. Methods used for parameter estimation


In this chapter, we have used different approaches for parameter estimation. The first two methods come under the classical approach, which was founded by Fisher in a series of fundamental papers around 1930.

The alternative approach is the Bayesian approach, which originates with Reverend Thomas Bayes. In this chapter, we have used two different priors for parameter estimation. Three loss functions are also used, which are discussed in their respective sections. A number of symmetric and asymmetric loss functions have been used by various researchers; see Zellner [8], Ahmad and Ahmad [9], Ahmad et al. [10], etc.

These methods of estimation are elaborated in their respective sections.

#### 2.1 Maximum likelihood (MLH) estimation

The most general method of estimation is the method of maximum likelihood (MLH), initially formulated by Gauss. In the early 1920s, Fisher introduced MLH as a general method of estimation and later developed it in a series of papers. He revealed the advantages of this method by showing that it yields sufficient estimators, which are asymptotically minimum-variance unbiased estimators (MVUEs). Thus the important feature of this method is that we look at the values of the random sample and then select, as our estimate of the unknown population parameter, the value for which the probability of getting the observed data is maximum.

Suppose the observed sample values are $(x_1, x_2, \ldots, x_n)$. When X is a discrete random variable, we can write $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = f(x_1, x_2, \ldots, x_n)$, which is the value of the joint probability distribution at the sample point $(x_1, x_2, \ldots, x_n)$. Since the sample values have been observed and are therefore fixed numbers, we consider $f(x_1, x_2, \ldots, x_n; \lambda)$ as the value of a function of the parameter λ, referred to as the likelihood function.

The definition applies similarly when the random sample comes from a continuous population, except that in that case $f(x_1, x_2, \ldots, x_n; \lambda)$ is the value of the joint pdf at the sample point $(x_1, x_2, \ldots, x_n)$. The likelihood function at the sample value $(x_1, x_2, \ldots, x_n)$ is given by

$$L(x_1, x_2, \ldots, x_n \mid \lambda) = \prod_{i=1}^{n} f(x_i; \lambda).$$

The principle of maximum likelihood consists in finding an estimator of the parameter which maximizes the likelihood function for variation in the parameter. Thus, if there exists a function $\hat{\lambda} = \hat{\lambda}(x_1, x_2, \ldots, x_n)$ of the sample values which maximizes $L(x \mid \lambda)$ for variation in λ, then $\hat{\lambda}$ is taken as the estimator of λ; we usually call $\hat{\lambda}$ the ML estimator. Thus $\hat{\lambda}$ is the solution of

$$\frac{\partial L(x \mid \lambda)}{\partial \lambda} = 0 \quad \text{and} \quad \frac{\partial^2 L(x \mid \lambda)}{\partial \lambda^2} < 0.$$

Since $L(x \mid \lambda) > 0$, $\log L(x \mid \lambda)$ is an increasing function of $L(x \mid \lambda)$, so $L(x \mid \lambda)$ and $\log L(x \mid \lambda)$ attain their extreme values at the same $\hat{\lambda}$. Therefore, the equation becomes

$$\frac{1}{L(x \mid \lambda)} \frac{\partial L(x \mid \lambda)}{\partial \lambda} = 0 \implies \frac{\partial \log L(x \mid \lambda)}{\partial \lambda} = 0,$$

a form which is more convenient from the practical point of view.

The MLH estimation of the rate parameter of Erlang distribution is obtained in the following theorem:

Theorem 2.1: Let $(x_1, x_2, \ldots, x_n)$ be a random sample of size n from the Erlang density function Eq. (1); then the maximum likelihood estimator of λ is given by

$$
\hat{\lambda} = \frac{nk}{\sum_{i=1}^{n} x_i}.
$$

Proof: The likelihood function of a random sample of size n from the Erlang density function Eq. (1) is given by

$$L(x; \lambda, k) = \left(\frac{\lambda^k}{(k-1)!}\right)^{n} \prod_{i=1}^{n} x_i^{k-1}\, e^{-\lambda \sum_{i=1}^{n} x_i}. \tag{2}$$


The log likelihood function is given by

$$\log L(x; \lambda, k) = nk \log \lambda + (k-1)\sum_{i=1}^{n} \log x_i - \lambda \sum_{i=1}^{n} x_i - n \log\,(k-1)!. \tag{3}$$

Differentiating Eq. (3) w.r.t. λ and equating to zero, we get

$$\frac{\partial}{\partial \lambda}\left(nk \log \lambda + (k-1)\sum_{i=1}^{n} \log x_i - \lambda \sum_{i=1}^{n} x_i - n \log\,(k-1)!\right) = 0$$

$$\hat{\lambda} = \frac{nk}{\sum_{i=1}^{n} x_i}. \tag{4}$$
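A small numerical check of Eq. (4) follows (a sketch with simulated data and hypothetical parameter values; the chapter's own simulation study uses R):

```python
import random

random.seed(1)
true_lam, k, n = 2.0, 3, 50_000

# Erlang(k, lam) sample drawn as sums of k exponentials.
xs = [sum(random.expovariate(true_lam) for _ in range(k)) for _ in range(n)]

# Eq. (4): MLE of the rate parameter when the shape k is known.
lam_hat = n * k / sum(xs)
print(round(lam_hat, 2))   # close to the true rate 2.0
```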

#### 2.2 Method of moments (MM)

One of the simplest and oldest methods of estimation is the method of moments, introduced by Karl Pearson in 1894. It estimates population parameters such as the mean and variance (which need not themselves be moments) by equating sample moments with the corresponding unobservable population moments and then solving those equations for the quantities to be estimated. The method of moments applies whenever we need to estimate some known function of a finite number of unknown moments.

Suppose $f(x; \lambda_1, \lambda_2, \ldots, \lambda_p)$ is the density function of the parent population with p parameters $(\lambda_1, \lambda_2, \ldots, \lambda_p)$. Let $\mu_s'$ be the sth moment of the random variable about the origin, given by

$$\mu_s' = \int_{-\infty}^{\infty} x^s f\left(x; \lambda_1, \lambda_2, \ldots, \lambda_p\right) dx; \quad s = 1, 2, \ldots, p.$$

In general, $(\mu_1', \mu_2', \ldots, \mu_p')$ will be functions of the parameters $(\lambda_1, \lambda_2, \ldots, \lambda_p)$. Let $(x_i;\ i = 1, 2, \ldots, n)$ be a random sample of size n from the given population. The method of moments consists in solving the p equations for $(\lambda_1, \lambda_2, \ldots, \lambda_p)$ in terms of $(\mu_1', \mu_2', \ldots, \mu_p')$ and then replacing the moments $(\mu_s';\ s = 1, 2, 3, \ldots, p)$ by the sample moments, e.g.,

$$\hat{\lambda}_i = \lambda_i\left(m_1', m_2', \ldots, m_p'\right); \quad i = 1, 2, \ldots, p,$$


where $m_i'$ is the ith moment about the origin in the sample.

Then by the method of moments, $(\hat{\lambda}_1, \hat{\lambda}_2, \ldots, \hat{\lambda}_p)$ are the estimators of $(\lambda_1, \lambda_2, \ldots, \lambda_p)$, respectively. The MM estimation of the rate parameter of the Erlang distribution is obtained in the following theorem:

Theorem 2.2: Let $(x_1, x_2, \ldots, x_n)$ be a random sample of size n from the Erlang density function Eq. (1); then the moment estimator of λ is given by

$$
\hat{\lambda} = \frac{k}{\overline{x}}.
$$

Proof: If the numbers $(x_1, x_2, \ldots, x_n)$ represent a set of data, then an unbiased estimator for the rth moment about the origin is

$$
\hat{m}_r = \frac{1}{n} \sum_{i=1}^{n} x_i^r \tag{5}
$$

where $\hat{m}_r$ stands for the estimate of $m_r$.

The rth moment of the two-parameter Erlang distribution about the origin is given by

$$\mu_r' = \int_0^\infty x^r f(x; \lambda, k)\, dx. \tag{6}$$

Using Eq. (1) in Eq. (6), we have

$$
\mu\_r' = \frac{\Gamma(r+k)}{\lambda^r(k-1)!}.\tag{7}
$$

If r = 1 in Eq. (7), we get


$$
\mu_1' = \frac{k}{\lambda}.
$$

If r = 2, then Eq. (7) becomes

$$
\mu\_2' = \frac{k(k+1)}{\lambda^2}.
$$

Thus the variance is given by $\sigma^2 = \frac{k}{\lambda^2}$.

When we divide $\sigma^2$ by ${\mu_1'}^2$, we get an expression which is a function of k only and is given by

$$\frac{\sigma^2}{\mu\_1'^2} = \frac{\left(\frac{k}{\lambda^2}\right)}{\left(\frac{k}{\lambda}\right)^2}$$

$$\frac{\sigma^2}{\mu\_1'^2} = \frac{1}{k}.\tag{8}$$

On taking the square root of Eq. (8), we have the coefficient of variation

$$
\frac{\sigma}{\mu\_1'} = \sqrt{\left(\frac{1}{k}\right)}.\tag{9}
$$
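Eq. (9) also suggests a moment-based estimate of the shape: since the coefficient of variation equals $\sqrt{1/k}$, we have $k \approx (\mu_1'/\sigma)^2$. A sketch with simulated data (hypothetical values, not from the chapter's data sets):

```python
import random
import statistics

random.seed(3)
true_lam, true_k = 1.0, 4

# Simulated Erlang(true_k, true_lam) sample via sums of exponentials.
xs = [sum(random.expovariate(true_lam) for _ in range(true_k)) for _ in range(50_000)]

cv = statistics.stdev(xs) / statistics.mean(xs)   # sample coefficient of variation
k_hat = 1 / cv ** 2                               # invert Eq. (9): cv = sqrt(1/k)
print(round(k_hat))   # close to the true shape 4
```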

The rate parameter λ can then be estimated using the following equation:

$$m\_1' = \mu\_1'.$$


If r = 1 in Eq. (5), then

$$m\_1' = \overline{\mathfrak{X}}.$$

Also if r = 1 in Eq. (7), then

$$
\mu_1' = \frac{k}{\lambda}.
$$

Thus, $m_1' = \mu_1'$ gives $\overline{x} = \frac{k}{\lambda}$, where $\overline{x}$ is the mean of the data, and hence

$$
\hat{\lambda} = \frac{k}{\overline{x}}.
$$
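Note that, with the shape k known, the moment estimator coincides algebraically with the maximum likelihood estimator of Eq. (4), since $k/\overline{x} = nk/\sum_{i=1}^n x_i$. A minimal sketch with hypothetical observations:

```python
k = 3
xs = [1.2, 0.8, 2.1, 1.5, 0.9, 1.7]   # hypothetical observations

x_bar = sum(xs) / len(xs)
lam_mm = k / x_bar                    # moment estimator k / x-bar
lam_ml = len(xs) * k / sum(xs)        # MLE nk / sum(x_i), Eq. (4)

print(round(lam_mm, 4), round(lam_ml, 4))   # both ≈ 2.1951
```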

### 3. Bayesian method of estimation

Nowadays, the Bayesian school of thought is garnering more attention, and at an increasing rate. This school of statistics goes back to Reverend Thomas Bayes, who first discovered the theorem that now bears his name. It was written up in the paper "An Essay Towards Solving a Problem in the Doctrine of Chances," found after his death by his friend Richard Price, who had it published posthumously in the Philosophical Transactions of the Royal Society in 1763. Bayes showed how inverse probability could be used to calculate the probability of antecedent events from the occurrence of the consequent event. His methods were adopted by Laplace and other scientists in the nineteenth century. By the mid-twentieth century, interest in Bayesian methods was renewed by De Finetti, Jeffreys and Lindley, among others, who developed a complete method of statistical inference based on Bayes' theorem. Bayesian analysis is suited to situations where scientists have a priori information about the values of the parameters to be estimated. In everyday life, uncertainty often permeates our choices, and when choices need to be made, past experience frequently proves a helpful aid. Bayesian theory provides a general and consistent framework for dealing with uncertainty.

Within Bayesian inference, there are also different interpretations of probability, and different approaches based on those interpretations. Early efforts to make Bayesian methods accessible for data analysis were made by Raiffa and Schlaifer [11], DeGroot [12], Zellner [13], and Box and Tiao [14]. The most popular interpretations and approaches are objective Bayesian inference and subjective Bayesian inference. Excellent expositions of these approaches are given by Bayes and Price [15], Laplace [16], Jeffrey's [17], Anscombe and Aumann [18], Berger [19, 20], Gelman et al. [21], Leonard and Hsu [22], De-Finetti [23]. Modern Bayesian data analysis and methods based on Markov chain Monte Carlo methods are presented in Bernardo and Smith [24], Robert [25], Gelman et al. [26], Marin and Robert [27], Carlin and Louis [28]. Good elementary introductions to the subject are Ibrahim et al. [29], Ghosh [30], Bansal [31], Koch [32], Hoff [33].

In Bayesian statistics probability is not defined as a frequency of occurrence but as the plausibility that a proposition is true, given the available information. The parameters are treated as random variables. The rules of probability are used


directly to make inferences about the parameters. Probability statements about parameters must be interpreted as "degree of belief." We revise our beliefs about the parameters after getting the data by using Bayes' theorem, which yields the posterior distribution: the relative weight given to each parameter value after analyzing the data. The posterior distribution comes from two sources: the prior distribution and the observed data. This means that the inference is based on the actually occurring data, not on all possible data sets that might have occurred.

In this section, the posterior distribution of the Erlang distribution is obtained by using Jeffrey's prior and the Quasi prior. The rate parameter of the Erlang distribution is estimated with the help of different loss functions. For parameter estimation we have used the approach of Ahmad et al. [34], Ahmad et al. [35], etc. Some important prior distributions and loss functions which we have used in this chapter are given below:

#### 3.1 Prior distributions used

m0 <sup>1</sup> ¼ μ<sup>0</sup> 1:

m0 <sup>1</sup> ¼ x:

μ0 <sup>1</sup> <sup>¼</sup> <sup>k</sup> λ :

m0 <sup>1</sup> ¼ μ<sup>0</sup> 1

^<sup>λ</sup> <sup>¼</sup> <sup>k</sup> x :

Nowadays, the Bayesian school of thought is garnering more attention and at an increasing rate. This thought of statistics was given by Reverend Thomas Bayes. He first discovered the theorem that now bears his name. It was written up in a paper "An Essay Towards Solving a Problem in the Doctrine of Chances." This paper was found after his death by his friend Richard Price, who had it published posthumously in the Philosophical Transactions of the Royal Society in 1763. Bayes showed how inverse probability could be used to calculate probability of antecedent events from the occurrence of the consequent event. His methods were adopted by Laplace and other scientists in the nineteenth century. By mid twentieth century interest in Bayesian methods was renewed by De Finetti, Jeffreys and Lindley, among others. They developed a complete method of statistical inference based on Bayes' theorem. Bayesian analysis is to be used by practitioners for situations where scientists have a priori information about the values of the parameters to be estimated. In everyday life, uncertainty often permeates our choices, and when choices need to be made, past experience frequently proves a helpful aid. Bayesian theory provides a general

Within Bayesian inference, there are also different interpretations of probability, and different approaches based on those interpretations. Early efforts to make Bayesian methods accessible for data analysis were made by Raiffa and Schlaifer [11], DeGroot [12], Zellner [13], and Box and Tiao [14]. The most popular interpretations and approaches are objective Bayesian inference and subjective Bayesian inference. Excellent expositions of these approaches are with Bayes and Price [15], Laplace [16], Jeffrey's [17], Anscombe and Aumann [18], Berger [19, 20], Gelman et al. [21], Leonard and Hsu [22], De-Finetti [23]. Modern Bayesian data analysis and methods based on Markov chain Monte Carlo methods are presented in Bernardo and Smith [24], Robert [25], Gelman et al. [26], Marin and Robert [27], Carlin and Louis [28]. Good elementary introductions to the subject are Ibrahim

In Bayesian statistics probability is not defined as a frequency of occurrence but as the plausibility that a proposition is true, given the available information. The parameters are treated as random variables. The rules of probability are used

, where x is the mean of the data and

and consistent framework for dealing with uncertainty.

et al. [29], Ghosh [30], Bansal [31], Koch [32], Hoff [33].

If r = 1 in Eq. (5), then

Statistical Methodologies

Thus,

$$\bar{x} = \frac{k}{\lambda},$$

where $\bar{x}$ is the mean of the data, so that the moment estimator of λ for known k is $\hat{\lambda} = k/\bar{x}$. Also, if r = 1 in Eq. (7), the corresponding expression simplifies in the same way.

#### 3. Bayesian method of estimation

Prior distribution is the basic part of Bayesian analysis which represents all that is known or assumed about the parameter. Usually the prior information is subjective and is based on a person's own experience and judgment, a statement of one's degree of belief regarding the parameter.

Another important feature of Bayesian analysis is the choice of the prior distribution. If the data carry sufficient signal, even a bad prior will not greatly influence the posterior. We can examine the impact of the prior by observing the stability of the posterior distribution across different choices of prior. If the posterior is highly dependent on the prior, then the data (the likelihood function) may not contain sufficient information; if, however, the posterior is relatively stable over a range of priors, then the data indeed contain significant information.
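This stability check is easy to carry out numerically. The sketch below (in Python; the chapter's own simulations use R) contrasts the posterior means of the Erlang rate λ under Jeffreys' prior and under a Quasi prior, using the fact, derived in Sections 3.3 and 3.4, that the respective posteriors are Gamma(nk, Σxᵢ) and Gamma(nk − d + 1, Σxᵢ). The sample sizes, k, d, and the true rate below are arbitrary illustrative choices.

```python
# Posterior stability across priors: the posterior means of the Erlang rate
# under Jeffreys' prior and a Quasi prior differ by (d - 1)/sum(x), which
# shrinks as the sample grows -- the likelihood dominates the prior.
import random

random.seed(1)
k, lam_true, d = 3, 2.0, 2.0
diffs = []
for n in (10, 100, 1000):
    # an Erlang(k, lam) variate is the sum of k Exponential(lam) variates
    data = [sum(random.expovariate(lam_true) for _ in range(k)) for _ in range(n)]
    s = sum(data)
    mean_jeffreys = n * k / s           # posterior mean under Jeffreys' prior
    mean_quasi = (n * k - d + 1) / s    # posterior mean under the Quasi prior
    diffs.append(abs(mean_jeffreys - mean_quasi))
    print(n, round(mean_jeffreys, 3), round(mean_quasi, 3))
```

The gap between the two posterior means shrinks roughly as 1/n, which is the numeric counterpart of the stability argument above.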

Prior distributions may be categorized in different ways. One common classification is the dichotomy between "proper" and "improper" priors.

A prior distribution is proper if it does not depend on the data and its total mass

$$\int\_{-\infty}^{\infty} g(\lambda)\, d\lambda \quad \text{or} \quad \sum\_{\lambda} g(\lambda)$$

is one. If the prior does not depend on the data but does not integrate or sum to one, the prior is said to be improper. In this chapter, we use two different priors, Jeffreys' prior and the Quasi prior, which are given below.

#### 3.1.1 Jeffrey's prior

An invariant form for the prior probability in estimation problems was given by Jeffreys [36].

The general formula for the Jeffreys prior is

$$g(\lambda) \propto \sqrt{I(\lambda)} \propto \left( -E\left[\frac{\partial^2 \log L(\lambda|x)}{\partial \lambda^2}\right] \right)^{\frac{1}{2}},$$

where $I(\lambda)$ is the Fisher information for the parameter λ. When there are multiple parameters, $I$ is the Fisher information matrix, the matrix of expected second partial derivatives

$$I(\lambda)\_{ij} = -E\left(\frac{\partial^2 \log L(\lambda|x)}{\partial \lambda\_i \partial \lambda\_j}\right).$$
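For the Erlang density of Eq. (1) with known k, the second derivative of the log-density with respect to λ is −k/λ² for every x, so the per-observation Fisher information is I(λ) = k/λ² and Jeffreys' prior is g(λ) ∝ √I(λ) ∝ 1/λ. A quick finite-difference check (Python sketch; x0, k, λ0 and the step size are arbitrary values):

```python
# d^2 log f / d lambda^2 for the Erlang density Eq. (1) is -k/lambda^2 for
# every x, so the Fisher information is k/lambda^2 per observation and
# Jeffreys' prior is proportional to 1/lambda.
import math

def log_f(x, lam, k):
    return (k * math.log(lam) + (k - 1) * math.log(x)
            - lam * x - math.log(math.factorial(k - 1)))

x0, k, lam0, h = 1.7, 3, 2.0, 1e-4    # arbitrary evaluation point and step
d2 = (log_f(x0, lam0 + h, k) - 2 * log_f(x0, lam0, k) + log_f(x0, lam0 - h, k)) / h ** 2
fisher = -d2
print(round(fisher, 4), k / lam0 ** 2)  # both ~ 0.75
```

Because the second derivative does not depend on x, no expectation over the data is actually needed here.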


DOI: http://dx.doi.org/10.5772/intechopen.85627


In this situation, the Jeffreys prior is given by

$$
g(\lambda) \propto \sqrt{\det(I(\lambda))}.
$$

Jeffreys suggested the following rules of thumb for specifying a non-informative prior for a parameter λ:

Rule 1: if λ ∈ (−∞, ∞), take g(λ) to be constant, i.e., take λ to be uniformly distributed.

Rule 2: if λ ∈ (0, ∞), take g(λ) ∝ 1/λ, i.e., take log λ to be uniformly distributed.

Rule 1 is invariant under linear transformations, and rule 2 is invariant under any power transformation of λ.

#### 3.1.2 Quasi prior

When there is no further information about the distribution parameter, one may use the Quasi density, given by

$$g(\lambda) = \frac{1}{\lambda^d}, \quad \lambda > 0, \; d > 0.$$
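Both priors used here are improper in the sense of Section 3: their total mass on (0, ∞) diverges. The sketch below (Python, crude midpoint rule; the grid limits are arbitrary choices) contrasts the Jeffreys-type prior g(λ) = 1/λ, whose mass grows without bound, with a proper density (the Exponential(1) density, used only as a stand-in example), which integrates to one.

```python
# Crude midpoint-rule check: the mass of 1/lambda over (1, T) grows like
# log(T) (improper), while the Exponential(1) density integrates to 1 (proper).
import math

def midpoint(f, a, b, steps=400000):
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

masses = [midpoint(lambda t: 1.0 / t, 1.0, upper) for upper in (1e2, 1e4, 1e6)]
proper = midpoint(lambda t: math.exp(-t), 0.0, 50.0)
print([round(m, 2) for m in masses], round(proper, 4))
```

The improper masses keep growing as the upper limit is enlarged, whereas the proper integral stabilizes at one.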

#### 3.2 Loss functions used

The concept of a loss function is as old as Laplace and was reintroduced into statistics by Abraham Wald [37]. In statistics, a loss function is typically used for parameter estimation, and the event in question is some function of the difference between the estimated and true values for an occurrence of the data. In economics, the loss function is usually an economic cost. In optimal control, the loss is the penalty for failing to achieve a desired value.

The word "loss" is used in place of "error," and the loss function serves as a measure of that error: it is presumably greater for a large error than for a small one. We want the loss to be small, that is, the estimate to be close to what it is estimating. Since the loss depends on the sample, we cannot hope to make it small for every possible sample, but we can try to make it small on average. Our objective is therefore to select an estimator that makes the average loss (risk) small, ideally the estimator with the smallest risk.

In this chapter, we use three different loss functions, given below.

#### 3.2.1 Precautionary loss function (PLF)

The concept of the precautionary loss function (PLF) was introduced by Norstrom [38], who presented an alternative asymmetric loss function and a general class of precautionary loss functions containing it as a special case. These loss functions approach infinity near the origin to prevent underestimation, thus giving conservative estimators, especially when low failure rates are being estimated. Such estimators are very useful when underestimation may lead to serious consequences. A very useful and simple asymmetric precautionary loss function (PLF) is

$$L(\hat{\lambda}, \lambda) = \frac{\left(\hat{\lambda} - \lambda\right)^2}{\lambda}.$$
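A small numeric illustration of this asymmetry (Python sketch; λ = 2 and the error sizes are arbitrary choices): underestimating by one unit costs three times as much as overestimating by one unit, and the loss blows up as the estimate approaches zero.

```python
# The precautionary loss (lam_hat - lam)^2 / lam_hat diverges as lam_hat -> 0,
# so underestimates of a given absolute size cost more than overestimates.
def plf(lam_hat, lam):
    return (lam_hat - lam) ** 2 / lam_hat

lam = 2.0
under = plf(lam - 1.0, lam)   # estimate 1.0 -> loss 1.0
over = plf(lam + 1.0, lam)    # estimate 3.0 -> loss 1/3
tiny = plf(1e-6, lam)         # near the origin the loss explodes
print(under, over, tiny)
```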


#### 3.2.2 Al-Bayyati's loss function (ALF)


The loss function proposed by Al-Bayyati [39] is an asymmetric loss function and is given by

$$l\_A(\hat{\lambda}, \lambda) = \lambda^c \left(\hat{\lambda} - \lambda\right)^2, \quad c \in \mathbb{R},$$

where λ and $\hat{\lambda}$ represent the true and estimated values of the parameter. This loss function is frequently used because of its analytical tractability in Bayesian analysis.

#### 3.2.3 LINEX loss function (LLF)

The LINEX loss function (LLF) was introduced by Klebanov [40] and used by Varian [41] in the context of real estate assessment. The formula of the LLF is given by

$$L(\hat{\lambda}, \lambda) = \left( \exp \left( a(\hat{\lambda} - \lambda) \right) - a(\hat{\lambda} - \lambda) - 1 \right)$$

where λ and $\hat{\lambda}$ represent the true and estimated values of the parameter, and the constant a determines the shape of the loss function.
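For a > 0 the LINEX loss rises exponentially for overestimation but only roughly linearly for underestimation (the roles reverse for a < 0). A quick check (Python sketch; a = 1.5 and the error sizes are arbitrary illustrative values):

```python
# LINEX loss exp(a*e) - a*e - 1 in the estimation error e = lam_hat - lam:
# for a > 0, positive errors are penalized exponentially, negative errors
# only about linearly; the loss is zero exactly when the error is zero.
import math

def linex(lam_hat, lam, a):
    d = a * (lam_hat - lam)
    return math.exp(d) - d - 1

a = 1.5
over = linex(3.0, 1.0, a)    # error +2 -> loss exp(3) - 4  (~16.09)
under = linex(-1.0, 1.0, a)  # error -2 -> loss exp(-3) + 2 (~2.05)
print(round(over, 2), round(under, 2))
```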

#### 3.3 Posterior density under Jeffrey's prior

Let $(x\_1, x\_2, \ldots, x\_n)$ be a random sample of size n from the Erlang density function Eq. (1), which is given by

$$f(x; \lambda, k) = \frac{\lambda^k}{(k-1)!} x^{k-1} e^{-\lambda x} \quad \text{for } x > 0, \; k \in N \text{ and } \lambda > 0,$$

and the likelihood function Eq. (2) given as below

$$L(x; \lambda, k) = \left(\frac{\lambda^k}{(k-1)!}\right)^n \prod\_{i=1}^n x\_i^{k-1} e^{-\lambda \sum\_{i=1}^n x\_i}.$$

Jeffreys' non-informative prior for λ is given by

$$g(\lambda) \propto \sqrt{\det(I(\lambda))}.$$

where $I(\lambda) = -nE\left[\frac{\partial^2 \log f(x; \lambda, k)}{\partial \lambda^2}\right]$ is the Fisher information for the probability density function Eq. (1).

On solving the above expression, we have

$$\mathbf{g}(\lambda) = \frac{\mathbf{1}}{\lambda}. \tag{10}$$

By using the Bayes theorem, we have

$$
\pi\_1\left(\boldsymbol{\lambda}|\underline{\mathbf{x}}\right) \propto L\left(\boldsymbol{x}|\boldsymbol{\lambda}\right)\mathbf{g}\left(\boldsymbol{\lambda}\right).\tag{11}
$$

Using Eqs. (2) and (10) in Eq. (11), we get

$$
\pi\_1\left(\lambda|\underline{x}\right) \propto \frac{\lambda^{nk-1}}{\left((k-1)!\right)^n} \prod\_{i=1}^n x\_i^{k-1} e^{-\lambda \sum\_{i=1}^n x\_i}
$$

$$
\pi\_1\left(\lambda|\underline{x}\right) = \rho \lambda^{nk-1} e^{-\lambda \sum\_{i=1}^n x\_i} \tag{12}
$$


where ρ is independent of λ and

$$
\rho^{-1} = \int\_0^\infty \lambda^{nk-1} e^{-\lambda \sum\_{i=1}^n x\_i} d\lambda
$$

$$
\rho = \frac{\left(\sum\_{i=1}^n x\_i\right)^{nk}}{\Gamma nk}.
$$

Using the value of ρ in Eq. (12)

$$
\pi\_1\left(\lambda|\underline{x}\right) = \left(\frac{\lambda^{nk-1} e^{-\lambda \sum\_{i=1}^n x\_i} \left(\sum\_{i=1}^n x\_i\right)^{nk}}{\Gamma nk}\right). \tag{13}
$$
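Eq. (13) is the Gamma(nk, Σxᵢ) density in λ. The sketch below (Python; n, k, and Σxᵢ are arbitrary assumed values) checks numerically that it integrates to one and that its mean is nk/Σxᵢ, the quantity that reappears in the estimators of Section 4.

```python
# Numeric check that Eq. (13) is the Gamma(nk, sum(x)) density: its total
# mass is 1 and its mean is nk / sum(x).
import math

n, k, S = 8, 2, 10.0       # assumed sample size, shape, and sum of the data
a = n * k                  # posterior shape parameter nk

def post(lam):
    return lam ** (a - 1) * math.exp(-lam * S) * S ** a / math.gamma(a)

h, T = 1e-4, 20.0          # midpoint grid; T comfortably covers the support
mass = mean = 0.0
for i in range(int(T / h)):
    t = (i + 0.5) * h
    p = post(t) * h
    mass += p
    mean += t * p
print(round(mass, 4), round(mean, 4))  # ~1.0 and ~1.6
```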

#### 3.4 Posterior density under Quasi prior

Let $(x\_1, x\_2, \ldots, x\_n)$ be a random sample of size n from the Erlang density function Eq. (1), with the likelihood function Eq. (2).

The Quasi prior for λ is given by

$$\mathbf{g}(\boldsymbol{\lambda}) = \frac{\mathbf{1}}{\boldsymbol{\lambda}^d}. \tag{14}$$

By using the Bayes theorem, we have

$$
\pi\_2\left(\lambda|\underline{\mathbf{x}}\right) \propto L\left(\mathbf{x}|\lambda\right)\mathbf{g}\left(\lambda\right). \tag{15}
$$

Using Eqs. (2) and (14) in Eq. (15), we have

$$
\pi\_2\left(\lambda|\underline{x}\right) \propto \frac{\lambda^{nk-d}}{\left((k-1)!\right)^n} \prod\_{i=1}^n x\_i^{k-1} e^{-\lambda \sum\_{i=1}^n x\_i}
$$

$$
\pi\_2\left(\lambda|\underline{x}\right) = \rho \lambda^{nk-d} e^{-\lambda \sum\_{i=1}^n x\_i} \tag{16}
$$

where ρ is independent of λ and

$$
\rho^{-1} = \int\_0^\infty \lambda^{nk-d} e^{-\lambda \sum\_{i=1}^n x\_i} d\lambda
$$

$$
\rho = \frac{\left(\sum\_{i=1}^n x\_i\right)^{nk-d+1}}{\Gamma(nk-d+1)}.
$$

By using the value of ρ in Eq. (16), we have


$$\pi\_2\left(\lambda|\underline{\mathbf{x}}\right) = \left(\frac{\lambda^{nk-d}e^{-\lambda\sum\_{i=1}^n \mathbf{x}\_i} \left(\sum\_{i=1}^n \mathbf{x}\_i\right)^{nk-d+1}}{\Gamma(nk-d+1)}\right). \tag{17}$$

#### 4. Estimation of parameters under Jeffrey's prior

In this section, parameter estimation for the Erlang distribution is carried out using Jeffreys' prior under different loss functions. The procedure for calculating the Bayesian estimate was given in Section 3. The estimates are obtained in the following theorems.

Theorem 4.1: Assuming the loss function $L\_p(\hat{\lambda}, \lambda)$, the Bayesian estimator of the rate parameter λ, if the shape parameter k is known, is of the form

$$
\hat{\lambda}\_p = \frac{\sqrt{nk(nk+1)}}{\left(\sum\_{i=1}^n \mathbf{x}\_i\right)}.
$$

Proof: The risk function of the estimator $\hat{\lambda}$ under the precautionary loss function $L\_p(\hat{\lambda}, \lambda)$ is given by the formula

$$R\left(\hat{\lambda}\right) = \int\_0^\infty \frac{\left(\hat{\lambda} - \lambda\right)^2}{\hat{\lambda}} \pi\_1\left(\lambda|\underline{\mathbf{x}}\right) d\lambda. \tag{18}$$

Using Eq. (13) in Eq. (18), we get

$$\begin{split} R(\widehat{\lambda}) &= \int\_{0}^{\infty} \frac{(\widehat{\lambda} - \lambda)^{2}}{\widehat{\lambda}} \left( \frac{\lambda^{nk-1} \left(\sum\_{i=1}^{n} \mathbf{x}\_{i}\right)^{nk} e^{-\lambda \left(\sum\_{i=1}^{n} \mathbf{x}\_{i}\right)}}{\Gamma nk} \right) d\lambda \\ R(\widehat{\lambda}) &= \frac{\left(\sum\_{i=1}^{n} \mathbf{x}\_{i}\right)^{nk}}{\Gamma nk} \left[ \widehat{\lambda} \int\_{0}^{\infty} \lambda^{nk-1} e^{-\lambda \left[\sum\_{i=1}^{n} \mathbf{x}\_{i}\right]} d\lambda + \frac{1}{\widehat{\lambda}} \int\_{0}^{\infty} \lambda^{nk+1} e^{-\lambda \left[\sum\_{i=1}^{n} \mathbf{x}\_{i}\right]} d\lambda - 2 \int\_{0}^{\infty} \lambda^{nk} e^{-\lambda \left[\sum\_{i=1}^{n} \mathbf{x}\_{i}\right]} d\lambda \right] .\end{split}$$

On solving the above expression, we get

$$R\left(\hat{\lambda}\right) = \hat{\lambda} + \frac{\mathbf{1}}{\hat{\lambda}} \frac{nk(nk+1)}{\left(\sum\_{i=1}^{n} \mathbf{x}\_{i}\right)^{2}} - \frac{2nk}{\left(\sum\_{i=1}^{n} \mathbf{x}\_{i}\right)}.$$

Minimization of the risk with respect to ^λ gives us the optimal estimator i.e.,

$$\frac{\partial}{\partial \hat{\lambda}} \left[ R(\hat{\lambda}) \right] = \mathbf{0}$$

$$\hat{\lambda}\_p = \frac{\sqrt{nk(nk+1)}}{\left( \sum\_{i=1}^n x\_i \right)}. \tag{19}$$
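The minimization in the proof amounts to $\hat{\lambda}\_p^2 = E(\lambda^2)$ under the posterior of Eq. (13), and $E(\lambda^2) = nk(nk+1)/(\sum x\_i)^2$ for a Gamma(nk, Σxᵢ) posterior, which gives Eq. (19). A Monte Carlo check (Python sketch; the posterior shape nk, Σxᵢ, and the draw count are arbitrary choices):

```python
# Theorem 4.1 check: under the posterior Gamma(nk, S) of Eq. (13),
# E(lambda^2) = nk(nk+1)/S^2, so sqrt(E(lambda^2)) equals Eq. (19).
import math
import random

random.seed(7)
nk, S = 16, 10.0                      # assumed posterior shape n*k and sum(x)
closed_form = math.sqrt(nk * (nk + 1)) / S
draws = [random.gammavariate(nk, 1.0 / S) for _ in range(200000)]
mc = math.sqrt(sum(d * d for d in draws) / len(draws))
print(round(closed_form, 4), round(mc, 4))
```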

Theorem 4.2: Assuming the loss function $l\_A(\hat{\lambda}, \lambda)$, the Bayesian estimator of the rate parameter λ, if the shape parameter k is known, is of the form

$$
\hat{\lambda}\_A = \frac{(nk+c)}{\left(\sum\_{i=1}^n \mathbf{x}\_i\right)}.
$$


Proof: The risk function of the estimator $\hat{\lambda}$ under Al-Bayyati's loss function $l\_A(\hat{\lambda}, \lambda)$ is given by the formula

$$\mathcal{R}\left(\hat{\lambda}\right) = \int\_0^\infty \lambda^c \left(\hat{\lambda} - \lambda\right)^2 \pi\_1\left(\lambda|\underline{\mathbf{x}}\right) d\lambda. \tag{20}$$


On substituting Eq. (13) in Eq. (20), we have

$$\begin{split} R(\hat{\lambda}) &= \int\_{0}^{\infty} \lambda^c \left(\hat{\lambda} - \lambda\right)^2 \left( \frac{\lambda^{nk-1} \left(\sum\_{i=1}^n x\_i\right)^{nk} e^{-\lambda \sum\_{i=1}^n x\_i}}{\Gamma nk} \right) d\lambda \\ R(\hat{\lambda}) &= \frac{\left(\sum\_{i=1}^n x\_i\right)^{nk}}{\Gamma nk} \left[ \hat{\lambda}^2 \int\_{0}^{\infty} \lambda^{nk+c-1} e^{-\lambda \sum\_{i=1}^n x\_i} d\lambda + \int\_{0}^{\infty} \lambda^{nk+c+1} e^{-\lambda \sum\_{i=1}^n x\_i} d\lambda - 2\hat{\lambda} \int\_{0}^{\infty} \lambda^{nk+c} e^{-\lambda \sum\_{i=1}^n x\_i} d\lambda \right]. \end{split}$$

On solving the above expression, we get

$$R\left(\hat{\lambda}\right) = \frac{\hat{\lambda}^2 \Gamma(nk + c)}{\Gamma nk \left(\sum\_{i=1}^n x\_i\right)^c} + \frac{\Gamma(nk + c + 2)}{\Gamma nk \left(\sum\_{i=1}^n x\_i\right)^{c+2}} - \frac{2\hat{\lambda}\Gamma(nk + c + 1)}{\Gamma nk \left(\sum\_{i=1}^n x\_i\right)^{c+1}}.$$

Minimization of the risk with respect to ^λ gives us the optimal estimator i.e.,

$$\frac{\partial}{\partial \hat{\lambda}} \left[ R(\hat{\lambda}) \right] = \mathbf{0}$$

$$\hat{\lambda}\_A = \frac{(nk+c)}{\left( \sum\_{i=1}^n x\_i \right)}. \tag{21}$$
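Setting the derivative of the risk to zero gives $\hat{\lambda}\_A = E(\lambda^{c+1})/E(\lambda^c)$ under the posterior of Eq. (13), and for a Gamma(nk, Σxᵢ) posterior this ratio is exactly (nk + c)/Σxᵢ. A Monte Carlo check (Python sketch; the numbers below are arbitrary choices):

```python
# Theorem 4.2 check: the ALF Bayes estimator is E(lambda^{c+1})/E(lambda^c)
# under the posterior Gamma(nk, S), which equals (nk + c)/S.
import random

random.seed(11)
nk, S, c = 16, 10.0, 1.0     # assumed posterior shape, sum(x), and ALF constant
closed_form = (nk + c) / S
draws = [random.gammavariate(nk, 1.0 / S) for _ in range(200000)]
mc = sum(d ** (c + 1) for d in draws) / sum(d ** c for d in draws)
print(round(closed_form, 3), round(mc, 3))
```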

Theorem 4.3: Assuming the loss function $L\_l(\hat{\lambda}, \lambda)$, the Bayesian estimator of the rate parameter λ, if the shape parameter k is known, is of the form

$$
\hat{\lambda}\_l = \frac{nk \log\left(1 + \frac{a}{\left(\sum\_{i=1}^n x\_i\right)}\right)}{a}.
$$

Proof: The risk function of the estimator $\hat{\lambda}$ under the LINEX loss function $L\_l(\hat{\lambda}, \lambda)$ is given by the formula

$$R\left(\hat{\lambda}\right) = \int\_0^\infty \left( \exp\left(a(\hat{\lambda} - \lambda)\right) - a\left(\hat{\lambda} - \lambda\right) - 1 \right) \pi\_1\left(\lambda|\underline{x}\right) d\lambda. \tag{22}$$

Using Eq. (13) in Eq. (22), we have

$$R(\hat{\lambda}) = \int\_0^\infty \left( \exp\left(a(\hat{\lambda} - \lambda)\right) - a\left(\hat{\lambda} - \lambda\right) - 1 \right) \left( \frac{\lambda^{nk-1} \left(\sum\_{i=1}^n x\_i\right)^{nk} e^{-\lambda \sum\_{i=1}^n x\_i}}{\Gamma nk} \right) d\lambda$$


$$\begin{split} R(\hat{\lambda}) = \frac{\left(\sum\_{i=1}^n x\_i\right)^{nk}}{\Gamma nk} \Bigg[ & \exp\left(a\hat{\lambda}\right) \int\_0^\infty \exp\left(-\left(a + \sum\_{i=1}^n x\_i\right)\lambda\right) \lambda^{nk-1} d\lambda - a\hat{\lambda} \int\_0^\infty \exp\left(-\left(\sum\_{i=1}^n x\_i\right)\lambda\right) \lambda^{nk-1} d\lambda \\ & + a \int\_0^\infty \exp\left(-\left(\sum\_{i=1}^n x\_i\right)\lambda\right) \lambda^{nk} d\lambda - \int\_0^\infty \exp\left(-\left(\sum\_{i=1}^n x\_i\right)\lambda\right) \lambda^{nk-1} d\lambda \Bigg]. \end{split}$$

On solving the above expression, we get


$$R\left(\hat{\lambda}\right) = \frac{\left(\sum\_{i=1}^{n} x\_i\right)^{nk} \exp\left(a\hat{\lambda}\right)}{\left(a + \sum\_{i=1}^{n} x\_i\right)^{nk}} - a\hat{\lambda} + \frac{a\Gamma(nk+1)}{\Gamma nk \left(\sum\_{i=1}^{n} x\_i\right)} - 1.$$

Minimization of the risk with respect to ^λ gives us the optimal estimator i.e.,

$$\frac{\partial}{\partial \hat{\lambda}} \left[ R(\hat{\lambda}) \right] = \mathbf{0}$$

$$\hat{\lambda}\_l = \frac{nk \log \left( \mathbf{1} + \frac{a}{\left( \sum\_{i=1}^n x\_i \right)} \right)}{a}. \tag{23}$$
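For the LINEX loss, minimizing the risk is equivalent to $\hat{\lambda}\_l = -\frac{1}{a}\log E\left(e^{-a\lambda}\right)$ under the posterior of Eq. (13); for a Gamma(nk, Σxᵢ) posterior the expectation is $(\sum x\_i/(\sum x\_i + a))^{nk}$, which reduces to Eq. (23). A Monte Carlo check (Python sketch; the numbers below are arbitrary choices):

```python
# Theorem 4.3 check: -log(E[exp(-a*lambda)])/a under the posterior
# Gamma(nk, S) equals nk*log(1 + a/S)/a, i.e. Eq. (23).
import math
import random

random.seed(13)
nk, S, a = 16, 10.0, 0.8     # assumed posterior shape, sum(x), LINEX constant
closed_form = nk * math.log(1 + a / S) / a
draws = [random.gammavariate(nk, 1.0 / S) for _ in range(200000)]
mgf = sum(math.exp(-a * d) for d in draws) / len(draws)   # E[exp(-a*lambda)]
mc = -math.log(mgf) / a
print(round(closed_form, 4), round(mc, 4))
```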

#### 5. Estimation of parameters under Quasi prior

In this section, parameter estimation for the Erlang distribution is carried out using the Quasi prior under different loss functions. The procedure for obtaining the Bayesian estimate is given in Section 3. The estimates of the parameter are obtained in the following theorems.

Theorem 5.1: Assuming the loss function $L\_p(\hat{\lambda}, \lambda)$, the Bayesian estimator of the rate parameter λ, if the shape parameter k is known, is of the form

$$
\hat{\lambda}\_p = \frac{\sqrt{(nk - d + 1)(nk - d + 2)}}{\left(\sum\_{i=1}^n \mathbf{x}\_i\right)}.
$$

Proof: The risk function of the estimator λ under the precautionary loss function $L\_p\left(\hat{\lambda},\lambda\right)$ is given by the formula

$$\mathcal{R}\left(\hat{\lambda}\right) = \int\_0^\infty \frac{\left(\hat{\lambda} - \lambda\right)^2}{\hat{\lambda}} \, \pi\_2\left(\lambda|\underline{x}\right) d\lambda \tag{24}$$

Using Eq. (17) in Eq. (24), we have

$$R(\hat{\lambda}) = \int\_0^\infty \frac{(\hat{\lambda} - \lambda)^2}{\hat{\lambda}} \left( \frac{\lambda^{nk-d} \left( \sum\_{i=1}^n \mathbf{x}\_i \right)^{nk-d+1} e^{-\lambda \left( \sum\_{i=1}^n \mathbf{x}\_i \right)}}{\Gamma(nk - d + 1)} \right) d\lambda$$

$$R(\hat{\lambda}) = \frac{\left(\sum\_{i=1}^n \mathbf{x}\_i \right)^{nk-d+1}}{\Gamma(nk - d + 1)} \left[ \hat{\lambda} \int\_0^\infty \lambda^{nk-d} e^{-\lambda \left( \sum\_{i=1}^n \mathbf{x}\_i \right)} d\lambda + \frac{1}{\hat{\lambda}} \int\_0^\infty \lambda^{nk-d+2} e^{-\lambda \left( \sum\_{i=1}^n \mathbf{x}\_i \right)} d\lambda - 2 \int\_0^\infty \lambda^{nk-d+1} e^{-\lambda \left( \sum\_{i=1}^n \mathbf{x}\_i \right)} d\lambda \right]$$

On solving the above expression, we get

$$R(\hat{\lambda}) = \hat{\lambda} + \frac{1}{\hat{\lambda}} \frac{\Gamma(nk - d + 3)}{\Gamma(nk - d + 1) \left(\sum\_{i=1}^{n} \mathbf{x}\_i\right)^2} - \frac{2\Gamma(nk - d + 2)}{\Gamma(nk - d + 1) \left(\sum\_{i=1}^{n} \mathbf{x}\_i\right)}.$$

Minimization of the risk with respect to ^λ gives us the optimal estimator i.e.,

$$\frac{\partial}{\partial \hat{\lambda}} \left[ R(\hat{\lambda}) \right] = \mathbf{0}$$

$$\hat{\lambda}\_p = \frac{\sqrt{(nk - d + 1)(nk - d + 2)}}{\left( \sum\_{i=1}^n \mathbf{x}\_i \right)}. \tag{25}$$

DOI: http://dx.doi.org/10.5772/intechopen.85627

Remark: Replacing d = 1 in Eq. (25), the same Bayes estimator is obtained as in Eq. (19).

Theorem 5.2: Assuming the loss function $L\_A\left(\hat{\lambda},\lambda\right)$, the Bayesian estimator of the rate parameter λ, if the shape parameter k is known, is of the form

$$
\hat{\lambda}\_A = \frac{(nk - d + c + 1)}{\left(\sum\_{i=1}^n x\_i\right)}.
$$

Proof: The risk function of the estimator λ under the Al-Bayyati's loss function $L\_A\left(\hat{\lambda},\lambda\right)$ is given by the formula

$$\mathcal{R}\left(\hat{\lambda}\right) = \int\_0^\infty \lambda^c \left(\hat{\lambda} - \lambda\right)^2 \pi\_2\left(\lambda|\underline{x}\right) d\lambda. \tag{26}$$

By using Eq. (17) in Eq. (26), we have.

$$\begin{split} R(\widehat{\lambda}) &= \int\_{0}^{\infty} \lambda^{c} (\widehat{\lambda} - \lambda)^{2} \left( \frac{\lambda^{nk-d} (\sum\_{i=1}^{n} \mathbf{x}\_{i})^{nk-d+1} e^{-\lambda \sum\_{i=1}^{n} \mathbf{x}\_{i}}}{\Gamma(nk-d+1)} \right) d\lambda \\ R(\widehat{\lambda}) &= \frac{\left(\sum\_{i=1}^{n} \mathbf{x}\_{i}\right)^{nk-d+1}}{\Gamma(nk-d+1)} \left[ \widehat{\lambda}^{2} \int\_{0}^{\infty} \lambda^{nk-d+c} e^{-\lambda \sum\_{i=1}^{n} \mathbf{x}\_{i}} d\lambda + \int\_{0}^{\infty} \lambda^{nk-d+c+2} e^{-\lambda \sum\_{i=1}^{n} \mathbf{x}\_{i}} d\lambda - 2\widehat{\lambda} \int\_{0}^{\infty} \lambda^{nk-d+c+1} e^{-\lambda \sum\_{i=1}^{n} \mathbf{x}\_{i}} d\lambda \right] \end{split}$$

On solving the above expression, we get

$$R(\hat{\lambda}) = \frac{\hat{\lambda}^2 \Gamma(nk - d + c + 1)}{\Gamma(nk - d + 1) \left(\sum\_{i=1}^n \mathbf{x}\_i\right)^c} + \frac{\Gamma(nk - d + c + 3)}{\Gamma(nk - d + 1) \left(\sum\_{i=1}^n \mathbf{x}\_i\right)^{c+2}} - \frac{2\hat{\lambda}\Gamma(nk - d + c + 2)}{\Gamma(nk - d + 1) \left(\sum\_{i=1}^n \mathbf{x}\_i\right)^{c+1}}$$

Minimization of the risk with respect to $\hat{\lambda}$ gives us the optimal estimator, i.e., $\frac{\partial}{\partial \hat{\lambda}}\left[R(\hat{\lambda})\right] = 0$

$$\hat{\lambda}\_A = \frac{(nk - d + c + 1)}{\left(\sum\_{i=1}^n \mathbf{x}\_i\right)}.\tag{27}$$

Remark: Replacing d = 1 in Eq. (27), the same Bayes estimator is obtained as in Eq. (21).

Theorem 5.3: Assuming the loss function $L\_l\left(\hat{\lambda},\lambda\right)$, the Bayesian estimator of the rate parameter λ, if the shape parameter k is known, is of the form


$$
\hat{\lambda}\_l = \frac{(nk - d + 1)\log\left(1 + \frac{a}{\left(\sum\_{i=1}^n x\_i\right)}\right)}{a}.
$$

Proof: The risk function of the estimator λ under the LINEX loss function $L\_l\left(\hat{\lambda},\lambda\right)$ is given by the formula

$$\mathcal{R}\left(\hat{\lambda}\right) = \int\_0^\infty \left(\exp\left(a\left(\hat{\lambda} - \lambda\right)\right) - a\left(\hat{\lambda} - \lambda\right) - \mathbf{1}\right) \pi\_2\left(\lambda|\underline{x}\right) d\lambda. \tag{28}$$

Using Eq. (17) in Eq. (28), we have.


$$\begin{split} R(\hat{\lambda}) &= \int\_{0}^{\infty} \left( \exp\left(a(\hat{\lambda} - \lambda)\right) - a\left(\hat{\lambda} - \lambda\right) - 1\right) \left( \frac{\lambda^{nk-d} \left(\sum\_{i=1}^{n} x\_{i}\right)^{nk-d+1} e^{-\lambda \sum\_{i=1}^{n} x\_{i}}}{\Gamma(nk-d+1)} \right) d\lambda \\ R(\hat{\lambda}) &= \frac{\left(\sum\_{i=1}^{n} x\_{i}\right)^{nk-d+1}}{\Gamma(nk-d+1)} \left[ \exp\left(a\hat{\lambda}\right) \int\_0^\infty \exp\left(-\left(a + \sum\_{i=1}^{n} x\_{i}\right)\lambda\right) \lambda^{nk-d} d\lambda - a\hat{\lambda} \int\_0^\infty \exp\left(-\lambda \sum\_{i=1}^{n} x\_{i}\right) \lambda^{nk-d} d\lambda \right. \\ &\quad \left. + a \int\_0^\infty \exp\left(-\lambda \sum\_{i=1}^{n} x\_{i}\right) \lambda^{nk-d+1} d\lambda - \int\_0^\infty \exp\left(-\lambda \sum\_{i=1}^{n} x\_{i}\right) \lambda^{nk-d} d\lambda \right] \end{split}$$

On solving the above expression, we get

$$R(\hat{\lambda}) = \frac{\exp\left(a\hat{\lambda}\right) \left(\sum\_{i=1}^{n} \mathbf{x}\_i\right)^{nk-d+1}}{\left(a + \sum\_{i=1}^{n} \mathbf{x}\_i\right)^{nk-d+1}} - a\hat{\lambda} + \frac{a\Gamma(nk-d+2)}{\Gamma(nk-d+1)\left(\sum\_{i=1}^{n} \mathbf{x}\_i\right)} - 1.$$

Minimization of the risk with respect to ^λ gives us the optimal estimator i.e.,

$$\frac{\partial}{\partial \hat{\lambda}} \left[ R(\hat{\lambda}) \right] = \mathbf{0}$$

$$\hat{\lambda}\_l = \frac{(nk - d + 1) \log \left( 1 + \frac{a}{\left( \sum\_{i=1}^{n} x\_i \right)} \right)}{a}. \tag{29}$$

Remark: Replacing d = 1 in Eq. (29), the same Bayes estimator is obtained as in Eq. (23).
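Every Quasi-prior estimator in Eqs. (25), (27) and (29) involves the common quantity $nk - d + 1$, so the d = 1 remarks can be checked numerically. A minimal Python sketch (the chapter used R; the helper name and defaults are illustrative):

```python
import math

def quasi_estimators(x, k, d, c=1.0, a=0.5):
    """Bayes estimators of the Erlang rate under the Quasi prior.

    Implements Eqs. (25), (27) and (29); d = 1 recovers the
    Jeffrey's-prior estimators of Eqs. (19), (21) and (23).
    """
    n, s = len(x), sum(x)
    m = n * k - d + 1                          # common exponent term nk - d + 1
    lam_p = math.sqrt(m * (m + 1)) / s         # precautionary loss, Eq. (25)
    lam_A = (m + c) / s                        # Al-Bayyati's loss, Eq. (27)
    lam_l = m * math.log(1 + a / s) / a        # LINEX loss, Eq. (29)
    return lam_p, lam_A, lam_l
```

With d = 1 and k known, `m` reduces to `n * k`, so all three values coincide with the Jeffrey's-prior results, exactly as the remarks state.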

#### 6. Entropy estimation of Erlang distribution

The concept of entropy was introduced by Claude Shannon [42] in the paper "A Mathematical Theory of Communication." Shannon's entropy plays the central role in information theory and is sometimes referred to as a measure of uncertainty. Shannon entropy provides an absolute limit on the best possible lossless encoding or compression of any communication, assuming that the communication may be represented as a sequence of independent and identically distributed random variables. Entropy is typically measured in bits, when the log is to the base 2, and in nats, when the log is to the base e.

Shannon's definition of entropy, when applied to an information source, can determine the minimum channel capacity required to reliably transmit the source as encoded binary digits. The entropy of a random variable is defined in terms of its probability distribution and can be shown to be a good measure of randomness or uncertainty. For deriving the entropy of probability distributions, we need the following two definitions, which are discussed further in Cover et al. [43].


Definition (i): The entropy of the discrete random variable defined on the probability space is given by

$$H\_P(f) = -\sum\_{i=1}^n p(f=a\_i)\log\left(p(f=a\_i)\right).$$

It is obvious that $H\_P(f) \ge 0$.

Definition (ii): The entropy of the continuous random variable defined on the real line is given by

$$H(f) = E(-\log f(x)) = -\int\_{-\infty}^{\infty} f(x) \log f(x) dx.$$

In this section, entropy estimation of the two parameter Erlang distribution is discussed, which is given below.

Theorem 6.1: Let $(x\_1, x\_2, \ldots, x\_n)$ be $n$ positive independent and identically distributed random samples drawn from a population having Erlang density function Eq. (2); then the Shannon's entropy of the two parameter Erlang distribution is given by

$$H(f(x; \lambda, k)) = -\log\left(\frac{\lambda^k}{(k-1)!}\right) - (k-1)(\psi(k) - \log \lambda) + k.$$

Proof: Shannon's entropy for a continuous random variable is defined as

$$H(f(x;\lambda,k)) = E(-\log f(x)) = -\int\_{-\infty}^{\infty} f(x) \log f(x) dx \tag{30}$$

Using Eq. (1) in Eq. (30), we have

$$H(f(x; \lambda, k)) = E\left\{-\log\left(\frac{\lambda^{k}}{(k-1)!} x^{k-1} e^{-\lambda x}\right)\right\}$$

$$H(f(x; \lambda, k)) = -\log\left(\frac{\lambda^{k}}{(k-1)!}\right) - (k-1)E(\log x) + \lambda E(x).\tag{31}$$

Now

$$E(\log x) = \int\_0^\infty \log\left(x\right) f(x) dx$$

$$E(\log x) = \frac{\lambda^k}{(k-1)!} \int\_0^\infty \log\left(x\right) x^{k-1} e^{-\lambda x} dx$$

$$\text{Put } \lambda x = t \;\Rightarrow\; dx = \frac{dt}{\lambda}; \text{ as } x \to 0, t \to 0 \text{ and as } x \to \infty, t \to \infty$$


$$E(\log \left(x\right)) = \frac{\lambda^{k-1}}{(k-1)!} \int\_{0}^{\infty} \log \left(\frac{t}{\lambda}\right) \left(\frac{t}{\lambda}\right)^{k-1} e^{-t} dt$$

$$E(\log \left(x\right)) = \frac{1}{(k-1)!} \left[ \int\_0^{\infty} \log t \left(t\right)^{k-1} e^{-t} dt - \log \lambda \int\_0^{\infty} (t)^{k-1} e^{-t} dt \right]$$

$$\begin{aligned} E(\log \left(x\right)) &= \frac{\Gamma'(k)}{\Gamma k} - \log \lambda\\ E(\log \left(x\right)) &= (\psi(k) - \log \lambda). \end{aligned} \tag{32}$$

Also


$$E(x) = \int\_{0}^{\infty} x f(x; \lambda, k) dx$$

$$E(x) = \frac{\lambda^{k}}{(k-1)!} \int\_{0}^{\infty} x^{k} e^{-\lambda x} dx$$

$$E(x) = \frac{\lambda^{k}}{(k-1)!} \frac{\Gamma(k+1)}{\lambda^{k+1}}$$

$$E(x) = \frac{k}{\lambda}. \tag{33}$$

Substituting the values of Eqs. (32) and (33) in Eq. (31), we have

$$H(f(x;\lambda,k)) = -\log\left(\frac{\lambda^k}{(k-1)!}\right) - (k-1)(\psi(k) - \log \lambda) + k.\tag{34}$$
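Eq. (34) can be sanity-checked numerically: for integer k the digamma value is $\psi(k) = -\gamma + \sum_{j=1}^{k-1} 1/j$, and the closed form should match a Monte Carlo estimate of $-E[\log f(X)]$. A Python sketch (the chapter's computations were in R; function names here are illustrative):

```python
import math, random

EULER_GAMMA = 0.5772156649015329

def erlang_entropy(k, lam):
    """Closed-form Shannon entropy of Erlang(k, lam), Eq. (34)."""
    psi_k = -EULER_GAMMA + sum(1.0 / j for j in range(1, k))  # digamma at integer k
    return (-math.log(lam ** k / math.factorial(k - 1))
            - (k - 1) * (psi_k - math.log(lam)) + k)

def erlang_entropy_mc(k, lam, n=100_000, seed=1):
    """Monte Carlo estimate of -E[log f(X)] for comparison."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sum(rng.expovariate(lam) for _ in range(k))  # Erlang(k, lam) draw
        log_f = (k * math.log(lam) + (k - 1) * math.log(x) - lam * x
                 - math.log(math.factorial(k - 1)))
        total -= log_f
    return total / n
```

For k = 2 and λ = 0.5 the closed form gives about 2.2704 nats, and the simulated value agrees to within Monte Carlo error.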

#### 7. AIC and BIC criterion for Erlang distribution

For model selection, the approaches of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), based on entropy estimation, are used. The Akaike information criterion (AIC) was introduced by Hirotugu Akaike [44], who proposed it as a measure of the goodness of fit of an estimated statistical model. It is a measure of the relative quality of a statistical model for a given set of data. In information-theoretic terms, it offers a relative estimate of the information lost when a given model is used to represent the process that generates the data. The AIC is not a test of the model in the sense of hypothesis testing; rather it is a test between models—a tool for model selection. Given a data set, several competing models may be ranked according to their AIC, with the one having the lowest

The AIC is not a test of the model in the sense of hypothesis testing; rather it is a test between models—a tool for model selection. Given a data set, several competing models may be ranked according to their AIC, with the one having the lowest AIC being the best.

The formula for AIC is given by

$$AIC = 2K - 2\log\left(L(\hat{\lambda})\right).$$

where K is the number of parameters and $L(\hat{\lambda})$ is the maximized value of the likelihood function for the estimated model.
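As a concrete illustration, AIC (together with BIC and AICC, which are introduced below and computed the same way from the maximized log-likelihood) can be evaluated for an Erlang fit; the log-likelihood follows from the density $f(x;\lambda,k) = \lambda^k x^{k-1} e^{-\lambda x}/(k-1)!$. This Python sketch uses illustrative helper names (the chapter's analysis was done in R):

```python
import math

def erlang_loglik(x, lam, k):
    """Log-likelihood of an i.i.d. Erlang(k, lam) sample, k a positive integer."""
    n = len(x)
    return (n * k * math.log(lam) - n * math.log(math.factorial(k - 1))
            + (k - 1) * sum(math.log(v) for v in x) - lam * sum(x))

def aic_bic_aicc(ll, K, n):
    """AIC = 2K - 2 log L, BIC = K log n - 2 log L,
    AICC = AIC + 2K(K+1)/(n-K-1).  Lower values indicate the better model."""
    aic = 2 * K - 2 * ll
    bic = K * math.log(n) - 2 * ll
    aicc = aic + 2 * K * (K + 1) / (n - K - 1)
    return aic, bic, aicc
```

Evaluating the log-likelihood at the fitted parameters and passing it to `aic_bic_aicc` reproduces the kind of comparison reported for the sub-models later in the chapter.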

AICC was first introduced by Hurvich and Tsai [45] and its different derivations were proposed by Burnham and Anderson [46]. AICC is AIC with a correction for finite sample sizes and is given by

$$AICC = AIC + \frac{2K(K+1)}{(n-K-1)}.$$

Burnham and Anderson [47] strongly recommended using AICC instead of AIC when the sample size is small or K is large, since AICC converges to AIC as the sample size gets large. Using AIC instead of AICC when the sample size is not many times larger than $K^2$ increases the probability of selecting models that have too many parameters, i.e., of overfitting. The probability of AIC overfitting can be substantial in some cases.

The Bayesian information criterion (BIC), also known as the Schwarz criterion, is used as a substitute for full calculation of the Bayes factor, since it can be calculated without specifying a prior distribution. In BIC, the penalty for additional parameters is stronger than that of the AIC.

The formula for the BIC is given by

$$BIC = K \log n - 2 \log \left( L(\hat{\lambda}) \right).$$

where K is the number of parameters, n is the sample size and $L(\hat{\lambda})$ is the maximized value of the likelihood function.

The AIC and BIC of the two parameter Erlang distribution are obtained in this section, which are given below.

The Shannon's entropy of the two parameter Erlang distribution is given by

$$\text{SH}(\text{ED}) = \log\left(k - 1\right)! - \log \lambda^k - (k - 1)E(\log x) + \lambda E(x)$$

$$\hat{H}(\text{ED}) = \log\left(k - 1\right)! - \log \lambda^k - (k - 1)\log \overline{x} + \lambda \overline{x}.\tag{35}$$

Also

$$ll(x; \lambda, k) = n \log \lambda^{k} - n \log\left(k - 1\right)! + (k - 1) \sum\_{i=1}^{n} \log x\_{i} - \lambda \sum\_{i=1}^{n} x\_{i}$$

$$-ll\left(x; \hat{\lambda}, \hat{k}\right) = n \left[ \log\left(k - 1\right)! - \log \lambda^{k} - (k - 1) \log \overline{x} + \lambda \overline{x} \right]. \tag{36}$$

Comparing Eqs. (35) and (36), we have

$$ll\left(x; \hat{\lambda}, \hat{k}\right) = -n\hat{H}(ED).$$

The AIC and BIC methodology attempts to find the model that best explains the data with the minimum of their values, so we have:

If $ll(x; \hat{\lambda}, \hat{k}) = -n\hat{H}(ED)$, then for the Erlang family we have

$$AIC = 2K + 2n\hat{H}(ED) \tag{37}$$

or

$$AIC = 2K - 2ll$$

and

$$BIC = K \log n + 2n\hat{H}(ED) \tag{38}$$

or

$$BIC = K \log n - 2ll.$$

### 8. Simulation study of Erlang distribution

We have generated data from the Erlang distribution for different sample sizes (15, 30 and 60) in R software for each pair $(\lambda, k)$, where $k = 1, 2$ and $\lambda = 0.5, 1.0$. The values of the loss parameters are ($c = -1, 1$) and ($a = 0.5, 1.0$). The values of the extension $d$ are ($d = 0.5, 1.0$). The estimates of the rate parameter for each method are calculated. The results are presented in the following tables.

| N | k | λ | $\hat{\lambda}\_{ML}$ | $\hat{\lambda}\_p$ | $\hat{\lambda}\_A$ (c = -1) | $\hat{\lambda}\_A$ (c = 1) | $\hat{\lambda}\_l$ (a = 0.5) | $\hat{\lambda}\_l$ (a = 1.0) |
|---|---|---|---|---|---|---|---|---|
| 15 | 1 | 0.5 | 0.15350 | 0.16179 | 0.10530 | 0.18428 | 0.13350 | 0.12597 |
| 15 | 2 | 1.0 | 0.51869 | 0.54673 | 0.43100 | 0.58896 | 0.47443 | 0.44430 |
| 30 | 1 | 0.5 | 0.03480 | 0.03238 | 0.03238 | 0.03266 | 0.04191 | 0.03259 |
| 30 | 2 | 1.0 | 0.06850 | 0.04258 | 0.07049 | 0.02085 | 0.03077 | 0.02171 |
| 60 | 1 | 0.5 | 0.03910 | 0.03320 | 0.02856 | 0.03488 | 0.03110 | 0.03060 |
| 60 | 2 | 1.0 | 0.10349 | 0.09918 | 0.08981 | 0.10244 | 0.09398 | 0.09200 |

ML = maximum likelihood estimate, p = precautionary LF, A = Al-Bayyati's LF, l = LINEX LF.

#### Table 1.

Mean squared error for $\hat{\lambda}$ under Jeffrey's prior.

| N | k | λ | d | $\hat{\lambda}\_{ML}$ | $\hat{\lambda}\_p$ | $\hat{\lambda}\_A$ (c = -1) | $\hat{\lambda}\_A$ (c = 1) | $\hat{\lambda}\_l$ (a = 0.5) | $\hat{\lambda}\_l$ (a = 1.0) |
|---|---|---|---|---|---|---|---|---|---|
| 15 | 1 | 0.5 | 0.5 | 0.15350 | 0.14710 | 0.11081 | 0.18979 | 0.13901 | 0.13148 |
| 15 | 1 | 0.5 | 1.0 | 0.15350 | 0.16179 | 0.10530 | 0.18428 | 0.13350 | 0.12597 |
| 15 | 2 | 1.0 | 0.5 | 0.51869 | 0.51528 | 0.43951 | 0.59747 | 0.48294 | 0.45281 |
| 15 | 2 | 1.0 | 1.0 | 0.51869 | 0.54673 | 0.43100 | 0.58896 | 0.47443 | 0.44430 |
| 30 | 1 | 0.5 | 0.5 | 0.03480 | 0.03286 | 0.02179 | 0.02206 | 0.21630 | 0.02200 |
| 30 | 1 | 0.5 | 1.0 | 0.03480 | 0.03238 | 0.03238 | 0.03266 | 0.04191 | 0.03259 |
| 30 | 2 | 1.0 | 0.5 | 0.06850 | 0.05828 | 0.06361 | 0.01397 | 0.02389 | 0.01483 |
| 30 | 2 | 1.0 | 1.0 | 0.06850 | 0.04258 | 0.07049 | 0.02085 | 0.03077 | 0.02171 |
| 60 | 1 | 0.5 | 0.5 | 0.03910 | 0.03206 | 0.03206 | 0.03534 | 0.03156 | 0.03106 |
| 60 | 1 | 0.5 | 1.0 | 0.03910 | 0.03320 | 0.02856 | 0.03488 | 0.03110 | 0.03060 |
| 60 | 2 | 1.0 | 0.5 | 0.10349 | 0.09641 | 0.09021 | 0.10284 | 0.09439 | 0.09241 |
| 60 | 2 | 1.0 | 1.0 | 0.10349 | 0.09918 | 0.08981 | 0.10244 | 0.09398 | 0.09200 |

ML = maximum likelihood estimate, p = precautionary LF, A = Al-Bayyati's LF, l = LINEX LF.

#### Table 2.

Mean squared error for $\hat{\lambda}$ under Quasi prior.
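The simulation loop behind the tables above can be sketched as follows. The chapter's simulations were run in R; this Python version, with illustrative names and a reduced replication count, estimates the MSE of the ML and Quasi-prior Bayes estimators for one (N, k, λ, d) configuration. Setting d = 1 corresponds to the Jeffrey's-prior case.

```python
import math, random

def mse_of_estimators(n, k, lam, d=0.5, c=1.0, a=0.5, reps=2000, seed=7):
    """Monte Carlo mean squared errors of ML and Quasi-prior Bayes estimators
    of the Erlang rate lam (a sketch of the Section 8 study)."""
    rng = random.Random(seed)
    sq = {"ML": 0.0, "p": 0.0, "A": 0.0, "l": 0.0}
    for _ in range(reps):
        # sum of n Erlang(k, lam) draws = sum of n*k exponential(lam) draws
        s = sum(rng.expovariate(lam) for _ in range(n * k))
        m = n * k - d + 1
        est = {
            "ML": n * k / s,                   # maximum likelihood
            "p": math.sqrt(m * (m + 1)) / s,   # precautionary LF, Eq. (25)
            "A": (m + c) / s,                  # Al-Bayyati's LF, Eq. (27)
            "l": m * math.log(1 + a / s) / a,  # LINEX LF, Eq. (29)
        }
        for key, v in est.items():
            sq[key] += (v - lam) ** 2
    return {key: v / reps for key, v in sq.items()}
```

Running this over the grid of (N, k, λ) pairs and loss parameters reproduces the layout of Tables 1 and 2, up to Monte Carlo variation.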

#### 9. Comparison of Erlang distribution (ED) with its sub-models

The flexibility and potential of the Erlang distribution are compared with those of its sub-models, examined by using different criteria such as AIC, BIC and AICC, with the help of the following illustration.

#### Illustration I:

We examine the compatibility of the Erlang distribution (ED) with its sub-models, the Chi-square and exponential distributions. For this purpose, we generated a data set from the Erlang distribution with a large sample size (i.e., 200) in R software for the pair $(\lambda, k)$, where $k = 2$ and $\lambda = 2.5$. The data analysis is given in the following table:


10. Results and discussion

λ

DOI: http://dx.doi.org/10.5772/intechopen.85627

#### Table 3.

AIC, BIC and AICC criterion for different sub-models of ED.

| Model | Parameter estimate | Standard error | $-\log l$ | AIC | BIC | AICC |
|---|---|---|---|---|---|---|
| Chi-square | $\hat{k}$ = 1.090 | 0.0586 | 312.3096 | 625.6346 | 628.9329 | 625.6543 |
| Erlang (k = 2) | $\hat{\lambda}$ = 1.1890 | 0.0594 | 289.9327 | 581.8654 | 585.1637 | 581.9245 |
| Exponential | $\hat{\lambda}$ = 0.5945 | 0.04203 | 315.6853 | 633.3705 | 636.6689 | 633.3902 |

#### Illustration II:

The data set is taken from Lawless [48]. The observations are the numbers of millions of revolutions to failure for each of 23 ball bearings; the individual bearings were inspected periodically to determine whether failure had occurred. Treating the failure times as continuous, the 23 failure times are:

(17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.40, 51.84, 51.96, 54.02, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12, 93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40)
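As a quick check (our own computation, not from the chapter), the exponential maximum likelihood estimate of the rate for these data is the reciprocal of the sample mean, which lands near the exponential entry of Table 4:

```python
import statistics

failures = [17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.40, 51.84, 51.96,
            54.02, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12, 93.12, 98.64,
            105.12, 105.84, 127.92, 128.04, 173.40]

# Exponential MLE of the rate: lambda_hat = 1 / sample mean
lam_hat = 1 / statistics.mean(failures)
print(round(lam_hat, 5))  # → 0.01385
```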

#### Table 4.

AIC, BIC and AICC criterion for different sub-models of ED.

| Model | Parameter estimate | Standard error | $-\log l$ | AIC | BIC | AICC |
|---|---|---|---|---|---|---|
| Chi-square | $\hat{k}$ = 2.9420 | 1.1745 | 177.5835 | 357.1670 | 358.3025 | 357.327 |
| Erlang (k = 1) | $\hat{\lambda}$ = 0.05577 | 0.01683 | 113.0298 | 230.0597 | 232.3307 | 230.5212 |
| Exponential | $\hat{\lambda}$ = 0.01387 | 0.002877 | 121.4324 | 244.8648 | 246.0003 | 245.0248 |

### 10. Results and discussion

We primarily studied maximum likelihood (ML) estimation and Bayesian estimation of the rate parameter of the Erlang distribution. In the Bayesian method, we used Jeffreys' prior and the Quasi prior under three different loss functions. These methods were compared through a simulation study, and the results are presented in Tables 1 and 2, respectively.

From the results obtained in Tables 1 and 2, we observe that in most cases the Bayesian estimator under Al-Bayyati's loss function has the smallest mean squared error (MSE) for both Jeffreys' prior and the Quasi prior, as compared with the other loss functions and the maximum likelihood estimator. Thus we conclude that the Bayes estimator under Al-Bayyati's loss function is efficient when the loss parameter C is 1.

We also estimated the unknown parameters of the sub-models of the Erlang distribution. The Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the corrected Akaike information criterion (AICC) were used to compare the candidate distributions; the best distribution is the one with the lowest $-\log l$, AIC, BIC and AICC values.

From the results obtained in Tables 3 and 4, we observe that the Erlang distribution is competitive with its sub-models (the exponential and Chi-square distributions). In fact, the AIC, BIC and AICC values show clearly that the Erlang distribution provides the best fit for these data among all the models considered.

### 11. Conclusions

In this chapter we generated three types of data sets with varying sample sizes from the Erlang distribution. These data sets were simulated in R Software, and the behavior of the data was examined for parameter estimation of the Erlang distribution. From the data analysis we are able to predict the estimate of the rate parameter of the Erlang distribution under three different loss functions using two different prior distributions. These results also allow a comparison between the loss functions and the priors.

The comparison of the Erlang distribution with its sub-models was also carried out. The results in Tables 3 and 4 show clearly that the Erlang distribution performs better than its sub-models (the exponential and Chi-square distributions) on the basis of the above procedures.

## Acknowledgements

The authors thank the editor for the encouragement to finalize the chapter, and extend profound thanks to the anonymous referee whose critical comments have immensely improved its presentation. We also sincerely thank all the authors whose papers we consulted for this work.


## Conflict of interest

The authors declare that there is no conflict of interests regarding the publication of this chapter.

## Author details

Kaisar Ahmad<sup>1</sup>\* and Sheikh Parvaiz Ahmad<sup>2</sup>

1 Department of Statistics, Sheikh-ul-Alam Memorial Degree College, Budgam, J&K, India

2 Department of Statistics, University of Kashmir, Srinagar, J&K, India

\*Address all correspondence to: ahmadkaisar31@gmail.com

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


### References


[1] Erlang AK. The theory of probabilities and telephone conversations. Nyt Tidsskrift for Matematik B. 1909;20(6):87-98

[2] Evans M, Hastings N, Peacock B. Statistical Distributions. 3rd ed. New York: John Wiley and Sons, Inc.; 2000

[3] Bhattacharyya SK, Singh NK. Bayesian estimation of the traffic intensity in an M/Ek/1 queue. Far East Journal of Mathematical Sciences. 1994;2:57-62

[4] Haq A, Dey S. Bayesian estimation of Erlang distribution under different prior distributions. Journal of Reliability and Statistical Studies. 2011;4(1):1-30

[5] Suri PK, Bhushan B, Jolly A. Time estimation for project management life cycles: A simulation approach. International Journal of Computer Science and Network Security. 2009; 9(5):211-215

[6] Damodaran D, Gopal G, Kapur PK. A Bayesian Erlang software reliability model. Communication in Dependability and Quality Management. 2010;13(4):82-90

[7] Jodra P. Computing the asymptotic expansion of the median of the Erlang distribution. Mathematical Modelling and Analysis. 2012;17(2):281-292

[8] Zellner A. Bayesian estimation and prediction using asymmetric loss functions. Journal of the American Statistical Association. 1986;81:446-451

[9] Ahmad SP, Ahmad K. Bayesian analysis of Weibull distribution using R Software. Australian Journal of Basic and Applied Sciences. 2013;7(9): 156-164. ISSN: 1991-8178

[10] Ahmad K, Ahmad SP, Ahmed A, Reshi JA. Bayesian analysis of generalized gamma distribution using R Software. Journal of Statistics Applications & Probability. 2015;2(4): 323-335

[11] Raiffa H, Schlaifer R. Applied Statistical Decision Theory. Division of Research, Graduate School of Business Administration, Harvard University; 1961

[12] DeGroot MH. Optimal Statistical Decisions. New York: McGraw-Hill; 1970

[13] Zellner A. Bayesian and non-Bayesian analysis of the log-normal distribution and log normal regression. Journal of the American Statistical Association. 1971;66:327-330

[14] Box GEP, Tiao GC. Bayesian Inference in Statistical Analysis. New York: John Wiley and Sons; 1973

[15] Bayes TR. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society. 1763;53:370-418. Reprinted in Biometrika, 48, 296-315

[16] Laplace PS. Essai philosophique sur les probabilités. Paris; 1814. The book went through five editions (the fifth in 1825), each revised by Laplace; an English translation was published by Dover Publications, New York, in 1951. The essay also appeared as a preface to his earlier Théorie analytique des probabilités, pp. 621-656

[17] Jeffreys H. Theory of Probability. 3rd ed. (1st ed. 1939, 2nd ed. 1948). Oxford: Clarendon Press; 1961

[18] Anscombe FJ, Aumann RJ. A definition of subjective probability. The Annals of Mathematical Statistics. 1963;34(1):199-205

[19] Berger JO. Statistical Decision Theory and Bayesian Analysis. 2nd ed. New York: Springer-Verlag; 1985

[20] Berger JO, Bernardo JM. Estimating a product of means: Bayesian analysis with reference priors. American Journal of Mathematical Statistical Association. 1989;84:200-207

[21] Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. London: Chapman and Hall; 1995

[22] Leonard T, Hsu JSJ. Bayesian Methods. Cambridge: Cambridge University Press; 1999

[23] De Finetti B. A critical essay on the theory of probability and on the value of science. "probabilismo". Erkenntnis. 1931;31:169-223 English translation as "Probabilism"

[24] Bernardo JM, Smith AFM. Bayesian Theory. Chichester, West Sussex: John Wiley and Sons; 2000

[25] Robert CP. The Bayesian Choice. 2nd ed. New York: Springer-Verlag; 2001

[26] Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2nd ed. Boca Raton, FL: Chapman and Hall/ CRC; 2004

[27] Marin JM, Robert CP. Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer-Verlag; 2007

[28] Carlin BP, Louis TA. Bayesian Methods for Data Analysis. IIIrd ed. Boca Raton, FL: Chapman and Hall/ CRC; 2008

[29] Ibrahim JG, Chen M-H, Sinha D. Bayesian Survival Analysis. New York: Springer Verlag; 2001

[30] Ghosh JK, Delampady M, Samanta T. An Introduction to Bayesian Analysis Theory and Methods. Springer-Verlag; 2006


[31] Bansal AK. Bayesian Parametric Inference. New Delhi: Narosa Publishing House; 2007

[32] Koch KR. Introduction to Bayesian Statistics. New York: Springer-Verlag; 2007

[33] Hoff PD. A First Course in Bayesian Statistical Methods. Springer-Verlag; 2009

[34] Ahmad K, Ahmad SP, Ahmed A. Classical and Bayesian approach in estimation of scale parameter of inverse Weibull distribution. Mathematical Theory and Modeling. 2015;5. ISSN 2224-5804

[35] Ahmad K, Ahmad SP, Ahmed A. Classical and Bayesian approach in estimation of scale parameter of Nakagami distribution. Journal of Probability and Statistics. 2016;2016 Article ID 7581918, 1-8

[36] Jeffreys H. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London, Series A. 1946;186:453-461

[37] Wald A. Statistical Decision Functions. Wiley; 1950

[38] Norstrom JG. The use of precautionary loss functions in risk analysis. IEEE Transactions on Reliability. 1996;3:400-403

[39] Al-Bayyati. Comparing methods of estimating Weibull failure models using simulation [PhD thesis]. Iraq: College of Administration and Economics, Baghdad University; 2002

[40] Klebanov LB. A universal loss function and unbiased estimation. Dokl. Akad. Nauk SSSR. 1972;203:1249-1251


[41] Varian HR. A Bayesian approach to real estate assessment. In: Fienberg SE, Zellner A, editors. Studies in Bayesian Econometrics and Statistics in Honor of Leonard J. Savage. Amsterdam: North-Holland; 1975. pp. 195-208

[42] Shannon CE. A mathematical theory of communication. Bell System Technical Journal. 1948;27:623-659

[43] Cover TM, Thomas JA. Elements of Information Theory. New York: Wiley; 1991

[44] Akaike H. Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F, editors. 2nd International Symposium on Information Theory. Budapest: Akademia Kiado; 1973. pp. 267-281

[45] Hurvich CM, Tsai CL. Regression and time series model selection in small samples. Biometrika. 1989;76:297-307

[46] Burnham KP, Anderson DR. Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research. 2004; 33:261-304

[47] Burnham KP, Anderson DR. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. Springer-Verlag; 2002

[48] Lawless JF. Statistical Models and Methods for Lifetime Data. 2nd ed. Wiley-InterScience; 2002

#### Chapter 7

### Asymptotic Normality of Hill's Estimator under Weak Dependence

Boualam Karima and Berkoun Youcef

## Abstract

This note is devoted to the asymptotic normality of Hill's estimator when data are weakly dependent in the sense of Doukhan. The primary results on this setting rely on the observations being strong mixing. This assumption is often the key tool for establishing the asymptotic behavior of this estimator. A number of attempts have been made to relax the assumption of stationarity and mixing. Relaxing this condition, and assuming weak dependence, we extend the results obtained by Rootzen and Starica. This approach requires less restrictive conditions than the previous results.

Keywords: tail index, Hill's estimator, regularly varying function, linear process, weak dependence

### 1. Introduction

Extreme value theory (EVT) is a branch of statistics which focuses on modeling and measuring extreme events occurring with small probability. Rare events can have severe consequences for human and economic society. The protection against these events is therefore of particular interest. EVT has been extensively applied in many fields including hydrology, finance, insurance and telecommunications. Unlike most traditional statistical analyses that deal with the center of the underlying distribution, EVT enables us to restrict attention to the behavior of the tails of the distribution, which is strongly connected to the limiting distribution of extreme values, i.e., the maximum or minimum of a sample.

Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with a common distribution $F$, and let $X_{(n)} \le \ldots \le X_{(1)}$ be the order statistics pertaining to $X_1, X_2, \ldots, X_n$, where $X_{(n)} = \min(X_i)$ and $X_{(1)} = \max(X_i)$. Suppose that there exist two normalizing constants $a_n$, $b_n$ ($b_n > 0$) and a nondegenerate distribution function $H$ such that $F^n(b_n x + a_n) \to H(x)$ for every continuity point $x$ of $H$; then $H$ belongs to one of the three types:

• Type I ($\beta > 0$): $\Phi_\beta(x) = \exp\left(-x^{-\beta}\right)$ if $x > 0$, and $0$ if $x \le 0$
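Hill's estimator named in the abstract admits a compact empirical illustration. The following sketch is ours, for heavy-tailed i.i.d. Pareto data rather than the weakly dependent setting studied in this chapter; it recovers the reciprocal tail index $1/\beta$ from the largest order statistics:

```python
import math
import random

def hill(x, k):
    # Hill's estimator from the k largest order statistics:
    # (1/k) * sum_{i<=k} log X_(i) - log X_(k+1); estimates 1/beta
    xs = sorted(x, reverse=True)
    return sum(math.log(xs[i]) for i in range(k)) / k - math.log(xs[k])

rng = random.Random(0)
# Pareto sample with tail P(X > x) = x^(-2) for x >= 1, so 1/beta = 0.5
sample = [rng.random() ** (-1 / 2.0) for _ in range(5000)]
print(round(hill(sample, 500), 2))
```

The estimate stabilizes near 0.5 for a wide range of choices of k, the number of upper order statistics used.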
