Density Estimation in Inventory Control Systems under a Discounted Optimality Criterion

Jesús Adolfo Minjárez-Sosa

### Abstract

This chapter deals with a class of discrete-time inventory control systems where the demand process $\{D\_t\}$ is formed by independent and identically distributed random variables with unknown density. Our objective is to introduce a suitable density estimation method which, combined with optimal control schemes, defines a procedure to construct optimal policies under a discounted optimality criterion.

Keywords: discounted optimality, density estimation, inventory systems, optimal policies, Markov decision processes

AMS 2010 subject classifications: 93E20, 62G07, 90B05

### 1. Introduction

Inventory systems are among the most studied sequential decision problems in the fields of operations research and operations management. Their origin lies in the problem of determining how much inventory of a certain product should be kept on hand to meet the demand of buyers at the lowest possible cost. Specifically, the question is: how much should be ordered, or produced, to satisfy the demand that will arise during a certain period? Clearly, the behavior of the inventory over time depends on the ordered quantities and the demand for the product in successive periods. Indeed, let $I\_t$ and $q\_t$ be the inventory level and the order quantity at the beginning of period $t$, respectively, and let $D\_t$ be the random demand during period $t$. Then $\{I\_t\}\_{t \ge 0}$ is a stochastic process whose evolution in time is given as

$$I\_{t+1} = \max\{0, I\_t + q\_t - D\_t\} =: \left(I\_t + q\_t - D\_t\right)^+, \ t = 0, 1, \dots$$

Schematically, this process is illustrated in the following figure.

(Standard inventory system)

In this case, the inventory manager (IM) observes the inventory level $I\_t$ and then selects the order quantity $q\_t$ as a function of $I\_t$. The order quantity process causes costs in the operation of the inventory system. For instance, if the quantity ordered is relatively small, then the items are very likely to be sold out and there will be unmet demand. In this case the holding cost is reduced, but there is a significant cost due to shortage. Otherwise, if the size of the order is large, there is a risk of having surpluses with a high holding cost. These facts give rise to a stochastic optimization problem, which can be modeled as a Markov decision process (MDP). That is, the inventory system can be analyzed as a stochastic optimal control problem whose objective is to find the optimal ordering policy that minimizes a total expected cost.

The control problem associated with inventory systems has been analyzed under several scenarios: discrete-time and continuous-time systems with finite or infinite capacity, inventory systems with bounded and unbounded one-stage costs, as well as partially observable models, among others (see, e.g., [1–5, 7]). Moreover, such scenarios have their own methods and techniques to solve the corresponding control problem. However, in most cases, it has been assumed that all the components that define the behavior of the inventory system are known to the IM, an assumption that in certain situations can be too strong and unrealistic. Hence it is necessary to implement schemes that allow the IM to learn or collect information about the unknown components during the evolution of the system, so that each decision is made with as much information as possible.

In this chapter we study a class of inventory control systems where the density of the demand is unknown to the IM. In this sense, our objective is to propose a procedure that combines density estimation methods and control schemes to construct optimal policies under a total expected discounted cost criterion. The estimation and control procedure is illustrated in the following figure:

(Estimation and control procedure)

DOI: http://dx.doi.org/10.5772/intechopen.88392


In this case, unlike the standard inventory system, before choosing the order quantity $q\_t$, the IM implements a density estimation method to get an estimate $\rho\_t$ and, possibly, combines this with the history of the system $h\_t = \left(I\_0, q\_0, D\_0, \dots, I\_{t-1}, q\_{t-1}, D\_{t-1}, I\_t\right)$ to select $q\_t = q\_t(h\_t, \rho\_t)$. Specifically, the density of the demand is estimated by the projection of an arbitrary estimator on an appropriate set, and its convergence is stated with respect to a norm which depends on the components of the inventory control model.

In general terms, our approach consists in showing that the inventory system can be studied under the weighted-norm approach, widely studied by several authors in the field of Markov decision processes (see, e.g., [11] and references therein) and in adaptive control (see, e.g., [9, 12–14]). That is, we prove the existence of a weight function $W$ which imposes a growth condition on the cost functions. Then, applying the dynamic programming algorithm, the density estimation method is adapted to such a condition to define an estimation and control procedure for the construction of optimal policies.

The chapter is organized as follows. In Section 2 we describe the inventory model and define the corresponding optimal control problem. In Section 3 we introduce the dynamic programming approach under the true density. Next, in Section 4 we present the density estimation method which will be used to state, in Section 5, an estimation and control procedure for the construction of optimal policies. The proofs of the main results are given in Section 6. Finally, in Section 7, we present some concluding remarks.

### 2. The inventory model


Statistical Methodologies


We consider an inventory system evolving according to the difference equation

$$I\_{t+1} = \left(I\_t + q\_t - D\_t\right)^+, \ t = 0, 1, \ldots \tag{1}$$

where $I\_t$ and $q\_t$ are the inventory level and the order quantity at the beginning of period $t$, taking values in $\mathbb{I} \coloneqq [0, \infty)$ and $\mathbb{Q} \coloneqq [0, \infty)$, respectively, and $D\_t$ represents the random demand during period $t$. We assume that $\{D\_t\}$ is an observable sequence of nonnegative independent and identically distributed (i.i.d.) random variables with a common density $\rho \in L\_1[0, \infty)$, which is unknown to the inventory manager. In addition, we assume finite expectation

$$\overline{D} := E(D\_t) < \infty. \tag{2}$$

Moreover, there exists a measurable function $\overline{\rho} \in L\_1[0, \infty)$ such that

$$
\rho(s) \le \overline{\rho}(s) \tag{3}
$$

almost everywhere with respect to the Lebesgue measure. In addition

$$\int\_0^\infty s^2 \overline{\rho}(s) ds < \infty. \tag{4}$$

For example, if $\overline{\rho}(s) \coloneqq K \min\{1, 1/s^{1+r}\}$, $s \in [0, \infty)$, for some positive constants $K$ and $r$ with $r > 2$ (so that the integral in (4) is finite), then there are plenty of densities that satisfy (3)–(4).
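As a quick sanity check of condition (4) for the envelope $\overline{\rho}(s) = K \min\{1, 1/s^{1+r}\}$, the following sketch evaluates the second moment in closed form, $K\left(1/3 + 1/(r-2)\right)$ for $r > 2$, and compares it with a crude midpoint quadrature. The function names, the truncation point `upper`, and the parameter values are illustrative assumptions and do not come from the chapter.

```python
import math

def envelope(s, K, r):
    """Envelope rho_bar(s) = K * min(1, 1 / s**(1 + r)) on [0, inf)."""
    return K if s <= 1.0 else K / s ** (1.0 + r)

def second_moment_closed_form(K, r):
    """Integral of s^2 * rho_bar(s) over [0, inf); it is finite only if r > 2."""
    if r <= 2.0:
        return math.inf
    # [0, 1] contributes K/3, [1, inf) contributes K/(r - 2)
    return K * (1.0 / 3.0 + 1.0 / (r - 2.0))

def second_moment_numeric(K, r, upper=200.0, n=200_000):
    """Midpoint rule on [0, upper]; the tail beyond `upper` is ignored."""
    step = upper / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * step
        total += s * s * envelope(s, K, r) * step
    return total

# Hypothetical constants, chosen only for illustration.
K_example, r_example = 1.0, 4.0
exact = second_moment_closed_form(K_example, r_example)
approx = second_moment_numeric(K_example, r_example)
```

With these choices the two values should agree up to the quadrature and truncation error, while any $r \le 2$ makes the closed form infinite.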

The one-stage cost function is defined as

$$\tilde{c}(I, q, D) = cq + h(I + q - D)^{+} + b(D - I - q)^{+}, \qquad (I, q) \in \mathbb{I} \times \mathbb{Q}, \tag{5}$$

where $h$, $c$, and $b$ are, respectively, the holding cost per unit, the ordering cost per unit, and the shortage cost per unit, satisfying $b > c$.
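To make the dynamics (1) and the cost (5) concrete, the following minimal sketch evaluates one transition and its one-stage cost for given numbers. The function names and the numerical values are hypothetical and serve only as an illustration.

```python
def positive_part(x):
    """Return x^+ = max(0, x)."""
    return max(0.0, x)

def next_inventory(I, q, D):
    """Inventory recursion (1): I_{t+1} = (I_t + q_t - D_t)^+."""
    return positive_part(I + q - D)

def one_stage_cost(I, q, D, c, h, b):
    """One-stage cost (5): ordering + holding + shortage cost."""
    return c * q + h * positive_part(I + q - D) + b * positive_part(D - I - q)

# Illustrative unit costs with b > c, as required in the text.
c, h, b = 1.0, 0.5, 2.0

# Surplus case: stock 2, order 3, demand 4 leaves 1 unit on hand.
surplus_cost = one_stage_cost(2.0, 3.0, 4.0, c, h, b)
surplus_next = next_inventory(2.0, 3.0, 4.0)

# Shortage case: stock 0, order 1, demand 4 leaves 3 units unmet.
shortage_cost = one_stage_cost(0.0, 1.0, 4.0, c, h, b)
shortage_next = next_inventory(0.0, 1.0, 4.0)
```

In the surplus case only the ordering and holding terms are active, whereas in the shortage case the holding term vanishes and the inventory is emptied.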

The order quantities applied by the IM are selected according to rules known as ordering control policies, defined as follows. Let $\mathbb{H}\_t$ be the space of histories of the inventory system up to time $t$. That is, a typical element of $\mathbb{H}\_t$ is written as

$$h\_t = \left(I\_0, q\_0, D\_0, \dots, I\_{t-1}, q\_{t-1}, D\_{t-1}, I\_t\right).$$

An ordering policy (or simply a policy) $\gamma = \{\gamma\_t\}$ is a sequence of measurable functions $\gamma\_t : \mathbb{H}\_t \to \mathbb{Q}$ such that $\gamma\_t(h\_t) = q\_t$, $t \ge 0$. We denote by $\Gamma$ the set of all policies. A feedback policy or Markov policy is a sequence $\gamma = \{g\_t\}$ of functions $g\_t : \mathbb{I} \to \mathbb{Q}$ such that $g\_t(I\_t) = q\_t$. A feedback policy $\gamma = \{g\_t\}$ is stationary if there exists a function $g : \mathbb{I} \to \mathbb{Q}$ such that $g\_t = g$ for all $t \ge 0$.

When using a policy $\gamma \in \Gamma$, given the initial inventory level $I\_0 = I$, we define the total expected discounted cost as

$$V(\boldsymbol{\gamma}, I) \coloneqq \mathbb{E}\left[\sum\_{t=0}^{\infty} \alpha^t \tilde{c}\left(I\_t, q\_t, D\_t\right)\right],\tag{6}$$


where $\alpha \in (0, 1)$ is the so-called discount factor. The inventory control problem is then to find an optimal feedback policy $\gamma^\*$ such that $V(\gamma^\*, I) = V^\*(I)$ for all $I \in \mathbb{I}$, where

$$V^\*\left(I\right) \coloneqq \inf\_{\gamma \in \Gamma} V(\gamma, I), \quad I \in \mathbb{I}, \tag{7}$$

is the optimal discounted cost, which we call the value function. We define the mean one-stage cost as

$$\begin{split} c(I, q) &= cq + h E (I + q - D)^{+} + b E (D - I - q)^{+} \\ &= cq + h \int\_{0}^{I+q} (I + q - s)^{+} \rho(s) ds + b \int\_{I+q}^{\infty} (s - I - q)^{+} \rho(s) ds, \ (I, q) \in \mathbb{I} \times \mathbb{Q}. \end{split} \tag{8}$$

Then, by using properties of conditional expectation, we can rewrite the total expected discounted cost (6) as

$$V(\gamma, I) = E\_I^{\gamma} \left[ \sum\_{t=0}^{\infty} \alpha^t c(I\_t, q\_t) \right], \tag{9}$$

where $E\_I^{\gamma}$ denotes the expectation operator with respect to the probability $P\_I^{\gamma}$ induced by the policy $\gamma$, given the initial inventory level $I\_0 = I$ (see, e.g., [8, 10]).

The sequence of events in our model is as follows. Since the density $\rho$ is unknown, the one-stage cost (8) is also unknown to the IM. Then, if at stage $t$ the inventory level is $I\_t = I \in \mathbb{I}$, the IM implements a suitable density estimation method to get an estimate $\rho\_t$ of $\rho$. Next, he/she combines this with the history of the system to select an order quantity $q\_t = q = \gamma\_t^{\rho\_t}(h\_t) \in \mathbb{Q}$. Then a cost $c(I, q)$ is incurred, and the system moves to a new inventory level $I\_{t+1} = I' \in \mathbb{I}$ according to the transition law

$$\begin{split} Q(B|I,q) &:= \text{Prob}\left[I\_{t+1} \in B|I\_t = I, q\_t = q\right] \\ &= \int\_0^\infty \mathbf{1}\_B\left(\left(I + q - s\right)^+\right) \rho(s) ds \end{split} \tag{10}$$

where $1\_B(\cdot)$ denotes the indicator function of the set $B \in \mathcal{B}(\mathbb{I})$, and $\mathcal{B}(\mathbb{I})$ is the Borel $\sigma$-algebra on $\mathbb{I}$. Once the transition to the inventory level $I'$ occurs, the process is repeated. Furthermore, the costs are accumulated according to the discounted cost criterion (9).
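The sequence of events described above can be mimicked by a short Monte Carlo simulation. The sketch below is only an illustration under assumptions that are not made in the chapter: the demand sampler is supplied by the user (so the density is treated as known), the policy is a stationary feedback rule $q = g(I)$, and the infinite sum is truncated at a finite `horizon`. The function name `simulate_discounted_cost`, the base-stock rule, and all parameter values are hypothetical.

```python
import random

def simulate_discounted_cost(policy, I0, demand_sampler, c, h, b, alpha,
                             horizon=200, n_paths=2000, seed=0):
    """Monte Carlo estimate of the discounted cost of a stationary policy.

    policy(I) returns the order quantity; demand_sampler(rng) returns one demand.
    The infinite horizon is truncated after `horizon` periods.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        I, discount = I0, 1.0
        for _ in range(horizon):
            q = policy(I)                      # IM observes I_t and orders q_t
            D = demand_sampler(rng)            # demand D_t is realized
            y = I + q
            total += discount * (c * q + h * max(0.0, y - D) + b * max(0.0, D - y))
            I = max(0.0, y - D)                # transition (1)
            discount *= alpha
    return total / n_paths

# Illustration: order up to level 2 under exponential demand with mean 1
# (both the rule and the demand law are assumptions made for this example).
order_up_to_two = lambda I: max(0.0, 2.0 - I)
exponential_demand = lambda rng: rng.expovariate(1.0)
estimate = simulate_discounted_cost(order_up_to_two, 0.0, exponential_demand,
                                    c=1.0, h=0.5, b=4.0, alpha=0.9)
```

The truncation error is of order $\alpha^{\text{horizon}}$, so for $\alpha = 0.9$ and 200 periods it is negligible compared with the sampling error.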

### 3. Dynamic programming equation under the true density $\rho$

The study of the inventory control problem will be done by means of the well-known dynamic programming (DP) approach, which we now introduce in terms of the unknown density $\rho$. In order to state the ideas precisely, we first present some preliminary and useful facts.

The set of order quantities in which we can find the optimal ordering policy should be $\mathbb{Q}^\* = [0, Q^\*] \subset \mathbb{Q}$,

where


$$Q^\* = \frac{b\overline{D}}{c(1-\alpha)}.$$

Thus, we can restrict the range of $q$ so that $q \in \mathbb{Q}^\*$. Specifically, we have the following result.

Lemma 3.1 Let $\gamma^0 \in \Gamma$ be the policy defined as $\gamma^0 = \{0, 0, \dots\}$, and let $\overline{\gamma} = \{\overline{\gamma}\_t\}$ be a policy such that $\overline{\gamma}\_k(h\_k) = \overline{q}\_k > Q^\*$ for at least one $k = 0, 1, \dots$ Then

$$V(\gamma^0, I) \le V(\overline{\gamma}, I), \quad I \in \mathbb{I}. \tag{11}$$

That is, $\gamma^0$ is a better solution than $\overline{\gamma}$.

Proof. Let $I\_t^0$, $t = 0, 1, \dots$, be the inventory levels generated by the application of $\gamma^0$, and let $\{(\overline{I}\_t, \overline{q}\_t)\}$ be the sequence of inventory levels and order quantities generated by $\overline{\gamma}$, where $I\_0^0 = \overline{I}\_0 = I$, $I\_{t+1}^0 = \left(I\_t^0 - D\_t\right)^+$, and $\overline{I}\_{t+1} = \left(\overline{I}\_t + \overline{q}\_t - D\_t\right)^+$, $t \ge 0$. Without loss of generality, we suppose that for a $\overline{q} > Q^\*$ we have $\overline{q}\_0 = \overline{q}$. Note that $I\_t^0 \le \overline{I}\_t$ for all $t \ge 0$. Then, observing that $c\overline{q} > b\overline{D}/(1 - \alpha)$,

$$\begin{split} V(\gamma^{0}, I) &= E\left[\sum\_{t=0}^{\infty} \alpha^t \tilde{c}\left(I\_{t}^{0}, 0, D\_{t}\right)\right] = E\left[\sum\_{t=0}^{\infty} \alpha^t \left(h\left(I\_{t}^{0} - D\_{t}\right)^{+} + b\left(D\_{t} - I\_{t}^{0}\right)^{+}\right)\right] \\ &\leq E\left[\sum\_{t=0}^{\infty} \alpha^t h\left(\overline{I}\_{t} - D\_{t}\right)^{+}\right] + b\sum\_{t=0}^{\infty} \alpha^t E(D\_{t}) \\ &\leq E\left[\sum\_{t=0}^{\infty} \alpha^t \left(h\left(\overline{I}\_{t} + \overline{q}\_{t} - D\_{t}\right)^{+} + b\left(D\_{t} - \overline{I}\_{t} - \overline{q}\_{t}\right)^{+}\right)\right] + \frac{b\overline{D}}{1 - \alpha} \\ &\leq E\left[\sum\_{t=0}^{\infty} \alpha^t \left(h\left(\overline{I}\_{t} + \overline{q}\_{t} - D\_{t}\right)^{+} + b\left(D\_{t} - \overline{I}\_{t} - \overline{q}\_{t}\right)^{+}\right)\right] + c\overline{q} \\ &\leq E\left[\sum\_{t=0}^{\infty} \alpha^t \left(c\overline{q}\_{t} + h\left(\overline{I}\_{t} + \overline{q}\_{t} - D\_{t}\right)^{+} + b\left(D\_{t} - \overline{I}\_{t} - \overline{q}\_{t}\right)^{+}\right)\right] \\ &= V(\overline{\gamma}, I), \quad I \in \mathbb{I}. \qquad \blacksquare \end{split}$$
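The comparison in Lemma 3.1 can be illustrated numerically. The following sketch is a minimal Monte Carlo check under assumptions that are not part of the chapter: exponential demand with mean $\overline{D} = 1$, a truncated horizon, initial inventory $I = 0$, and a competing policy that orders $1.25\,Q^\*$ at time $0$ and nothing afterwards. The helper `discounted_cost` and the parameter values are hypothetical.

```python
import random

def discounted_cost(policy, I0, c, h, b, alpha, mean_demand=1.0,
                    horizon=200, n_paths=2000, seed=1):
    """Monte Carlo estimate of the discounted cost (6) for q_t = policy(t, I_t)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        I, disc = I0, 1.0
        for t in range(horizon):
            q = policy(t, I)
            D = rng.expovariate(1.0 / mean_demand)   # assumed demand law
            total += disc * (c * q + h * max(0.0, I + q - D) + b * max(0.0, D - I - q))
            I = max(0.0, I + q - D)
            disc *= alpha
    return total / n_paths

c, h, b, alpha, mean_demand = 1.0, 0.5, 4.0, 0.9, 1.0
Q_star = b * mean_demand / (c * (1.0 - alpha))       # Q* = b * D_bar / (c * (1 - alpha))

never_order = lambda t, I: 0.0                                   # gamma^0
order_big_once = lambda t, I: 1.25 * Q_star if t == 0 else 0.0   # exceeds Q* at t = 0

# Same seed in both runs, so both policies face the same demand paths.
v0 = discounted_cost(never_order, 0.0, c, h, b, alpha, mean_demand)
v_bar = discounted_cost(order_big_once, 0.0, c, h, b, alpha, mean_demand)
```

For $I = 0$ the policy $\gamma^0$ only pays shortage costs, so its cost is $b\overline{D}/(1-\alpha)$ up to truncation, whereas the competing policy already pays more than that in ordering cost at time $0$.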

Remark 3.2 Observe that for $(I, q) \in \mathbb{I} \times \mathbb{Q}^\*$ we have

$$c(I, q) = cq + L(I + q),$$

where, by writing $y = I + q$,

$$L(y) \coloneqq hE(y - D)^{+} + bE(D - y)^{+}.$$

In addition, observe that for any fixed $s \in [0, \infty)$, the functions $y \mapsto (y - s)^{+}$ and $y \mapsto (s - y)^{+}$ are convex, which implies that $L(y)$ is convex. Moreover,

$$\lim\_{y\to\infty} L(y) = \infty.$$
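The convexity and growth of $L$ in Remark 3.2 can be checked numerically in a special case. The sketch below assumes, purely for illustration, that the demand is exponential with rate `rate`, for which $E(y - D)^{+} = y - (1 - e^{-\text{rate}\, y})/\text{rate}$ and $E(D - y)^{+} = e^{-\text{rate}\, y}/\text{rate}$. The function name and parameter values are hypothetical.

```python
import math

def expected_holding_shortage(y, h, b, rate):
    """L(y) = h*E(y-D)^+ + b*E(D-y)^+ for D ~ Exponential(rate) and y >= 0."""
    surplus = y - (1.0 - math.exp(-rate * y)) / rate    # E(y - D)^+
    shortage = math.exp(-rate * y) / rate               # E(D - y)^+
    return h * surplus + b * shortage

h, b, rate = 0.5, 4.0, 1.0
grid = [0.1 * k for k in range(201)]                    # y in [0, 20]
values = [expected_holding_shortage(y, h, b, rate) for y in grid]

# Discrete convexity check: all second differences should be nonnegative.
second_diffs = [values[k - 1] - 2.0 * values[k] + values[k + 1]
                for k in range(1, len(values) - 1)]
is_convex_on_grid = all(d >= -1e-12 for d in second_diffs)
```

In this special case $L''(y) = (h + b)\,\text{rate}\, e^{-\text{rate}\, y} > 0$, and $L(y)$ grows like $h y$ for large $y$, in line with the remark.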

The following lemma provides a growth property of the one-stage cost function (8).

Lemma 3.3 There exist a number $\beta$ and a function $W : \mathbb{I} \to [1, \infty)$ such that $0 < \alpha\beta < 1$,

$$\sup\_{(I,q,s)\in\mathbb{I}\times\mathbb{Q}^\*\times[0,\infty)} \frac{W\left((I+q-s)^+\right)}{W(I)} =: \varphi < \infty,\tag{12}$$

and for all $(I, q) \in \mathbb{I} \times \mathbb{Q}^\*$

$$c(I, q) \le W(I). \tag{13}$$


In addition, for any density $\mu$ on $[0, \infty)$ such that $\int\_0^\infty s\mu(s) ds < \infty$,

$$\int\_{0}^{\infty} W\left(\left(I+q-s\right)^{+}\right) \mu(s)ds \le \beta W(I), \quad (I,q) \in \mathbb{I} \times \mathbb{Q}^{\*}.\tag{14}$$

The proof of Lemma 3.3 is given in Section 6.

We denote by $B\_W$ the normed linear space of all measurable functions $u : \mathbb{I} \to \Re$ with finite weighted norm ($W$-norm) $\|\cdot\|\_W$ defined as

$$||u||\_{W} \coloneqq \sup\_{I \in \mathbb{I}} \frac{|u(I)|}{W(I)}.\tag{15}$$

Essentially, Lemma 3.3 proves that the inventory system (1) falls within the weighted-norm approach used to study general Markov decision processes (see, e.g., [11]). Hence, we can formulate, on the space $B\_W$, important results such as the existence of solutions of the DP-equation, the convergence of the value iteration algorithm, and the existence of optimal policies, in the context of the inventory system (1). Indeed, let

$$V^{(n)}(\gamma, I) = E\_I^{\gamma} \left[ \sum\_{t=0}^{n-1} \alpha^t c(I\_t, q\_t) \right]$$

be the $n$-stage discounted cost under the policy $\gamma \in \Gamma$ and the initial inventory level $I \in \mathbb{I}$, and

$$V^{(n)}(I) = \inf\_{\gamma \in \Gamma} V^{(n)}(\gamma, I); \quad V^{(0)}(I) = 0, \ I \in \mathbb{I},$$

the corresponding value function. Then, for all $n \ge 1$ and $I \in \mathbb{I}$ (see, e.g., [6, 10, 11]),

$$V^{(n)}(I) = \min\_{q \in \mathbb{Q}^\*} \left\{ c(I, q) + \alpha \int\_0^\infty V^{(n-1)} \left( (I + q - s)^+ \right) \rho(s) ds \right\} \tag{16}$$
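Recursion (16) can be sketched on a finite grid when the density is known. The code below is a rough illustration rather than the procedure of the chapter: it assumes exponential demand with rate `rate`, rounds demand down to a grid of mesh `step`, truncates the inventory levels to `n_states` grid points, and restricts orders to grid points of $[0, Q^\*]$. The function name `value_iteration` and all numerical choices are hypothetical.

```python
import math

def value_iteration(c, h, b, alpha, rate, step=0.5, n_states=25, n_iter=40):
    """Grid version of recursion (16) with a discretized exponential demand."""
    mean_demand = 1.0 / rate
    Q_star = b * mean_demand / (c * (1.0 - alpha))
    n_actions = int(Q_star / step) + 1           # orders q = 0, step, ..., <= Q*
    n_demand = n_states + n_actions              # enough points to empty any stock
    # w[j] = P(D in [j*step, (j+1)*step)); the tail mass is lumped at the last point
    w = [math.exp(-rate * j * step) - math.exp(-rate * (j + 1) * step)
         for j in range(n_demand)]
    w[-1] += math.exp(-rate * n_demand * step)
    V = [0.0] * n_states                         # V^(0) = 0
    policy = [0] * n_states
    for _ in range(n_iter):
        V_new = [0.0] * n_states
        for i in range(n_states):                # inventory level I = i * step
            best, best_a = math.inf, 0
            for a in range(n_actions):           # order quantity q = a * step
                y = i + a                        # post-order level in grid units
                total = c * a * step
                for j, wj in enumerate(w):       # demand s = j * step
                    nxt = min(max(y - j, 0), n_states - 1)   # (I+q-s)^+, clipped
                    stage = h * max(y - j, 0) * step + b * max(j - y, 0) * step
                    total += wj * (stage + alpha * V[nxt])
                if total < best:
                    best, best_a = total, a
            V_new[i], policy[i] = best, best_a
        V = V_new
    return V, [a * step for a in policy]

# Illustrative parameters with b > c; here Q* = 4.
V_approx, orders = value_iteration(c=1.0, h=0.5, b=2.0, alpha=0.5, rate=1.0)
```

Because the discretized operator is a contraction with modulus $\alpha$ in the supremum norm on the grid, successive iterates differ by at most a multiple of $\alpha^n$, which gives a simple stopping rule.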

Moreover, from [11, Theorem 8.3.6], by making the appropriate changes, we have the following result.

Theorem 3.4 (Dynamic programming) (a) The functions $V^{(n)}$ and $V^\*$ belong to $\mathbb{B}\_W$. Moreover,

$$V^{(n)}(I) \le \frac{W(I)}{1 - \alpha\beta}, \quad V^\*(I) \le \frac{W(I)}{1 - \alpha\beta}, \quad I \in \mathbb{I}.\tag{17}$$

(b) As $n \to \infty$, $\|V^{(n)} - V^\*\|\_W \to 0$.

(c) $V^\*$ is convex.


(d) $V^\*$ satisfies the dynamic programming equation:

$$\begin{split} V^\*(I) &= \min\_{q \in \mathbb{Q}^\*} \left\{ c(I, q) + \alpha \int\_0^\infty V^\* \left( (I + q - s)^+ \right) \rho(s) ds \right\} \\ &= \min\_{I \le y \le Q^\* + I} \left\{ cy + L(y) + \alpha \int\_0^\infty V^\* \left( (y - s)^+ \right) \rho(s) ds \right\} - cI, \quad I \in \mathbb{I}. \end{split} \tag{18}$$

(e) There exists a function $g^\* : \mathbb{I} \to \mathbb{Q}$ such that $g^\*(I) \in \mathbb{Q}^\*$ and, for each $I \in \mathbb{I}$,

$$V^\*(I) = c\left(I, g^\*(I)\right) + \alpha \int\_0^\infty V^\*\left(\left(I + g^\*(I) - s\right)^+\right) \rho(s)ds, \quad I \in \mathbb{I}.$$

Moreover, $\gamma^\* = \{g^\*\}$ is an optimal control policy.
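To make the value iteration scheme (16) concrete, the following Python sketch runs it on a truncated integer grid. It is an illustration only and not part of the chapter: the demand density $\rho$ is replaced by an assumed probability mass function `pmf` on $\{0, 1, \ldots\}$, the state space is cut off at a hypothetical level `i_max`, orders are restricted so that $I + q \le$ `i_max`, and all numerical values (`c`, `h`, `b`, `alpha`) are arbitrary choices.

```python
def expected_cost(y, pmf, h, b):
    """L(y) = h*E(y - D)^+ + b*E(D - y)^+ for a demand with mass function pmf."""
    return sum(p * (h * max(y - s, 0) + b * max(s - y, 0)) for s, p in enumerate(pmf))


def value_iteration(pmf, c, h, b, alpha, i_max, q_max, n_iter):
    """Iterate V^(n)(I) = min_q { c*q + L(I+q) + alpha * E V^(n-1)((I+q-D)^+) }."""
    levels = range(i_max + 1)
    L = [expected_cost(y, pmf, h, b) for y in levels]
    V = [0.0] * (i_max + 1)            # V^(0) = 0
    policy = [0] * (i_max + 1)
    for _ in range(n_iter):
        # cont[y] = sum_s V^(n-1)((y - s)^+) * pmf[s], as a function of y = I + q
        cont = [sum(p * V[max(y - s, 0)] for s, p in enumerate(pmf)) for y in levels]
        new_V = []
        for inv in levels:
            best_q, best_val = 0, float("inf")
            # assumption of this sketch: orders are capped so that inv + q <= i_max
            for q in range(min(q_max, i_max - inv) + 1):
                y = inv + q
                val = c * q + L[y] + alpha * cont[y]
                if val < best_val - 1e-12:   # keep the smallest minimizing q
                    best_q, best_val = q, val
            new_V.append(best_val)
            policy[inv] = best_q
        V = new_V
    return V, policy


# Hypothetical instance: demand uniform on {0, 1, 2, 3, 4}.
pmf = [0.2, 0.2, 0.2, 0.2, 0.2]
V, policy = value_iteration(pmf, c=1.0, h=1.0, b=4.0, alpha=0.9,
                            i_max=10, q_max=10, n_iter=200)
print(policy)
```

Given the convexity in Theorem 3.4(c), one would expect `policy` to take the order-up-to form $\max\{K - I, 0\}$ for some level $K$. The sketch does not enforce this, so the printed policy can be inspected to check it on a given instance.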

#### 4. Density estimation

As the density $\rho$ is unknown, the results in Theorem 3.4 cannot be applied directly by the IM. In this section we introduce a suitable density estimation method with which we can obtain an estimated DP-equation. This will allow us to define a scheme for the construction of optimal policies. To this end, let $D\_0, D\_1, \ldots, D\_t, \ldots$ be independent realizations of the demand whose density is $\rho$.

Theorem 4.1 There exists an estimator $\rho\_t(s) \coloneqq \rho\_t(s; D\_0, D\_1, \ldots, D\_{t-1})$, $s \in [0, \infty)$, of $\rho$, such that (see (2) and (3)):

D.1. $\rho\_t \in L\_1[0, \infty)$ is a density.

D.2. $\rho\_t \le \overline{\rho}(\cdot)$ a.e. with respect to the Lebesgue measure.

D.3. $\int\_0^\infty s\rho\_t(s) ds \le \overline{D}$.

D.4. $E \int\_0^\infty |\rho\_t - \rho|(s) ds \to 0$, as $t \to \infty$.

D.5. $E\|\rho\_t - \rho\| \to 0$, as $t \to \infty$, where

$$\|\mu\| \coloneqq \sup\_{(I,q)\in\mathbb{I}\times\mathbb{Q}^\*} \frac{1}{W(I)} \int\_0^\infty W((I+q-s)^+) \mu(s)ds\tag{19}$$

for measurable functions $\mu$ on $[0, \infty)$.

It is worth noting that for any density $\mu$ on $[0, \infty)$ satisfying (14), the norm $\|\mu\|$ is finite. The remainder of the section is devoted to proving Theorem 4.1.

We define the set $\mathcal{D} \subset L\_1([0, \infty))$ as:

$$\mathcal{D} \coloneqq \left\{ \mu : \mu \text{ is a density}, \quad \int\_0^\infty s \mu(s) ds \le \overline{D}, \quad \mu(s) \le \overline{\rho}(s) \text{ a.e.} \right\}.$$

Observe that $\rho \in \mathcal{D}$.

Lemma 4.2 The set $\mathcal{D}$ is closed and convex in $L\_1([0, \infty))$.

Proof. The convexity of $\mathcal{D}$ follows directly from its definition. To prove that $\mathcal{D}$ is closed, let $\{\mu\_t\}$ be a sequence in $\mathcal{D}$ such that $\mu\_t \to \mu$ in $L\_1$, with $\mu \in L\_1([0, \infty))$. First, we prove

$$
\mu(s) \le \overline{\rho}(s) \quad a.e.\tag{20}
$$


Suppose, to the contrary, that there is $A \subset [0, \infty)$ with $m(A) > 0$ such that $\mu(s) > \overline{\rho}(s)$, $s \in A$, $m$ being the Lebesgue measure on $\Re$. Then, for some $\varepsilon > 0$ and $A' \subset A$ with $m(A') > 0$,

$$
\mu(s) > \overline{\rho}(s) + \varepsilon, \quad s \in A'. \tag{21}
$$

Now, since $\mu\_t \in \mathcal{D}$, $t \ge 0$, there exists $B\_t \subset [0, \infty)$ with $m(B\_t) = 0$, such that

$$
\mu\_t(s) \le \overline{\rho}(s), \quad s \in [0, \infty) \backslash B\_t, \ t \ge 0. \tag{22}
$$

Combining (21) and (22) we have

$$|\mu\_t(s) - \mu(s)| \ge \varepsilon, \ s \in A' \cap ([0, \infty) \backslash B\_t), \ t \ge 0.$$

Using the fact that $m(A' \cap ([0, \infty) \backslash B\_t)) = m(A') > 0$, we obtain that $\mu\_t$ does not converge to $\mu$ in measure, which contradicts the convergence in $L\_1$. Therefore $\mu(s) \le \overline{\rho}(s)$ a.e.

On the other hand, applying Hölder's inequality and using the fact that $\overline{\rho} \in L\_1[0, \infty)$, from (20),

$$\begin{aligned} \left| 1 - \int\_0^\infty \mu(s) ds \right| &= \left| \int\_0^\infty \mu\_t(s) ds - \int\_0^\infty \mu(s) ds \right| \le \int\_0^\infty |\mu\_t(s) - \mu(s)|^\frac{1}{2} |\mu\_t(s) - \mu(s)|^\frac{1}{2} ds \\ &\le \left( \int\_0^\infty 2\overline{\rho}(s) ds \right)^{1/2} \left( \int\_0^\infty |\mu\_t(s) - \mu(s)| ds \right)^{1/2} \to 0 \quad \text{as } t \to \infty, \end{aligned} \tag{23}$$

which implies $\int\_0^\infty \mu(s) ds = 1$. Now, as $\mu \ge 0$ a.e., we have that $\mu$ is a density. Similarly, from (4),

$$\begin{split} \int\_{0}^{\infty} s|\mu\_{t}(s) - \mu(s)| ds &= \int\_{0}^{\infty} s|\mu\_{t}(s) - \mu(s)|^{\frac{1}{2}} |\mu\_{t}(s) - \mu(s)|^{\frac{1}{2}} ds \\ &\leq \left( \int\_{0}^{\infty} s^{2} 2\overline{\rho}(s) ds \right)^{1/2} \left( \int\_{0}^{\infty} |\mu\_{t}(s) - \mu(s)| ds \right)^{1/2} \\ &\leq 2^{\frac{1}{2}} M' \left( \int\_{0}^{\infty} |\mu\_{t}(s) - \mu(s)| ds \right)^{1/2}, \end{split} \tag{24}$$


for some constant $M' < \infty$. Letting $t \to \infty$, we obtain

$$\int\_0^\infty s\mu\_t(s)ds \to \int\_0^\infty s\mu(s)ds$$

which, in turn, implies that


$$\int\_0^\infty s\mu(s)ds \le \overline{D}.$$

This proves that D is closed.∎

Let $\hat{\rho}\_t(s) \coloneqq \hat{\rho}\_t(s; D\_0, D\_1, \ldots, D\_t)$, $s \in [0, \infty)$, be an arbitrary estimator of $\rho$ such that

$$E\|\rho - \hat{\rho}\_t\|\_{L\_1} = E \int\_0^\infty |\rho(s) - \hat{\rho}\_t(s)| ds \to 0 \quad \text{as } t \to \infty. \tag{25}$$
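Condition (25) only asks for some estimator $\hat{\rho}\_t$ that is consistent in $L\_1$. As a rough illustration, not taken from the chapter, the sketch below uses a plain histogram as $\hat{\rho}\_t$ and approximates $\|\rho - \hat{\rho}\_t\|\_{L\_1}$ for a small and a large sample. The exponential demand density, the bin width `width`, the truncation point `s_max`, and the helper names are all assumptions made for the example. The projection onto $\mathcal{D}$ is not implemented here.

```python
import math
import random


def histogram_density(samples, width, s_max):
    """Histogram estimate on [0, s_max): returns the height of each bin."""
    n_bins = int(round(s_max / width))
    counts = [0] * n_bins
    for d in samples:
        j = int(d // width)
        if 0 <= j < n_bins:          # samples beyond s_max are dropped
            counts[j] += 1
    t = len(samples)
    return [k / (t * width) for k in counts]


def l1_error(heights, width, rho, tail_mass, sub=20):
    """Midpoint-rule approximation of the integral of |rho - rho_hat| over [0, inf)."""
    step = width / sub
    total = 0.0
    for j, hgt in enumerate(heights):
        for k in range(sub):
            s = j * width + (k + 0.5) * step
            total += abs(rho(s) - hgt) * step
    # rho_hat vanishes beyond s_max, so the tail contributes the mass of rho there
    return total + tail_mass


# Assumed true demand density: exponential with rate lam (illustrative choice).
lam = 1.0
rho = lambda s: lam * math.exp(-lam * s)
width, s_max = 0.25, 8.0
tail = math.exp(-lam * s_max)

rng = random.Random(0)
errors = {}
for t in (50, 20000):
    demands = [rng.expovariate(lam) for _ in range(t)]
    heights = histogram_density(demands, width, s_max)
    errors[t] = l1_error(heights, width, rho, tail)
print(errors)
```

With a fixed bin width the histogram keeps a small bias, so in practice the width would shrink with $t$. The comparison of the two entries of `errors` is only meant to show the error decreasing as more demands are observed.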

Lemma 4.2 ensures the existence of the estimator $\rho\_t$, which is defined by the projection of $\hat{\rho}\_t$ on the set of densities $\mathcal{D}$. That is, the density $\rho\_t \in \mathcal{D}$, expressed as

$$\rho\_t \coloneqq \arg\min\_{\sigma \in \mathcal{D}} \|\sigma - \hat{\rho}\_t\|\_{L\_1},$$

is the "best approximation" of the estimator $\hat{\rho}\_t$ on the set $\mathcal{D}$, that is,

$$\|\rho\_t - \hat{\rho}\_t\|\_{L\_1} = \inf\_{\mu \in \mathcal{D}} \|\mu - \hat{\rho}\_t\|\_{L\_1}.\tag{26}$$

Now observe that $\rho\_t$ satisfies the properties D.1, D.2, and D.3. Hence, Theorem 4.1 will be proved if we show that $\rho\_t$ satisfies D.4 and D.5. To this end, since $\rho \in \mathcal{D}$, (26) gives $\|\rho\_t - \hat{\rho}\_t\|\_{L\_1} \le \|\rho - \hat{\rho}\_t\|\_{L\_1}$, and the triangle inequality yields

$$||\rho\_t - \rho||\_{L\_1} \le ||\rho\_t - \hat{\rho}\_t||\_{L\_1} + ||\hat{\rho}\_t - \rho||\_{L\_1} \le 2||\hat{\rho}\_t - \rho||\_{L\_1}, \ t \ge 0,$$

which implies that, from (25),

$$E\int\_{0}^{\infty} |\rho(s) - \rho\_t(s)| ds \le 2E||\hat{\rho}\_t - \rho||\_{L\_1} \to 0, \text{ as } t \to \infty. \tag{27}$$

That is, $\rho\_t$ satisfies Property D.4. In fact, since $\int\_0^\infty |\rho(s) - \rho\_t(s)| ds \le 2$ a.s., from (27) it is easy to see that

$$E\left(\int\_{0}^{\infty} |\rho(s) - \rho\_t(s)| ds\right)^q \to 0, \text{ as } \ t \to \infty \text{, for any } q > 0. \tag{28}$$

Now, to obtain property D.5, observe that from (12)

$$\|\rho\_t - \rho\| = \sup\_{(I,q)\in\mathbb{I}\times\mathbb{Q}^\*} \frac{1}{W(I)} \int\_0^\infty W((I+q-s)^+) |\rho(s) - \rho\_t(s)| ds \le \varphi \int\_0^\infty |\rho(s) - \rho\_t(s)| ds.\tag{29}$$

Therefore, property D.4 yields

$$E\|\rho\_t - \rho\| \to \mathbf{0}, \text{ as } t \to \infty,\tag{30}$$


which proves the property D.5.

#### 5. Estimation and control

Having defined the estimator $\rho\_t$, we now introduce an estimated dynamic programming procedure with which we can construct optimal policies for the inventory system.

Observe that for each t≥0, from (14),

$$\int\_{0}^{\infty} W(\left(I + q - s\right)^{+}) \rho\_{t}(s) ds \le \beta W(I), \ (I, q) \in \mathbb{I} \times \mathbb{Q}^{\*}. \tag{31}$$

Now, we define the estimated one-stage cost function:

$$\begin{split} c\_t(I, q) &= cq + h \int\_0^{I+q} (I + q - s)^+ \rho\_t(s) ds + b \int\_{I+q}^\infty (s - I - q)^+ \rho\_t(s) ds \\ &= cq + L\_t(I + q), \ (I, q) \in \mathbb{I} \times \mathbb{Q}^\*, \end{split} \tag{32}$$

where (see Remark 3.2) for $y = I + q$,

$$L\_t(y) \coloneqq h \int\_0^y (y - s)^+ \rho\_t(s) ds + b \int\_y^\infty (s - y)^+ \rho\_t(s) ds.$$

In addition, observe that for each $t \ge 0$, $L\_t(y)$ is convex and

$$\lim\_{y\to\infty} L\_t(y) = \infty. \tag{33}$$

We define the sequence of functions $\{V\_t\}$ as $V\_0 \equiv 0$, and for $t \ge 1$

$$\begin{split} V\_t(I) &= \min\_{q \in \mathbb{Q}^\*} \left\{ c\_t(I, q) + \alpha \int\_0^\infty V\_{t-1}((I + q - s)^+) \rho\_t(s) ds \right\} \\ &= \min\_{I \le y \le Q^\* + I} \left\{ c y + L\_t(y) + \alpha \int\_0^\infty V\_{t-1}((y - s)^+) \rho\_t(s) ds \right\} - cI, \quad I \in \mathbb{I}. \end{split} \tag{34}$$

We can state our main results as follows:

Theorem 5.1 (a) For $t \ge 0$ and $I \in \mathbb{I}$,

$$V\_t(I) \le \frac{W(I)}{1 - \alpha\beta}.\tag{35}$$


Therefore, $V\_t \in \mathbb{B}\_W$.


(b) As $t \to \infty$,

$$E\left[\sup\_{(I,q)\in\mathbb{I}\times\mathbb{Q}^\*} \frac{|c\_t(I,q) - c(I,q)|}{W(I)}\right] \to 0.$$

(c) As $t \to \infty$, $E\|V\_t - V^\*\|\_W \to 0$.

(d) For each $t \ge 0$, there exists $K\_t \ge 0$ such that the selector $g\_t : \mathbb{I} \to \mathbb{Q}$ defined as

$$q\_t^\* \coloneqq g\_t(I) \coloneqq \begin{cases} K\_t - I & \text{if } \quad 0 \le I \le K\_t \\ 0 & \text{if } \qquad I > K\_t \end{cases}$$

attains the minimum in (34).
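The recursion (34) together with the selector $g\_t$ suggests an adaptive ordering scheme: at each stage, re-estimate the demand law, update $V\_t$, and order up to $K\_t$. The Python sketch below mimics this on an integer grid. It is a simplified illustration under stated assumptions: demands are integer valued, the empirical mass function stands in for the projected estimator $\rho\_t$, the grid size `grid` and the cost parameters are hypothetical, the bound $Q^\*$ is taken large enough never to bind, and $K\_t$ is taken as the smallest minimizer of the bracketed term in (34).

```python
import random


def empirical_pmf(demands, s_max):
    """Relative frequencies of the observed integer demands on 0..s_max."""
    counts = [0] * (s_max + 1)
    for d in demands:
        counts[min(d, s_max)] += 1
    return [k / len(demands) for k in counts]


def dp_step(V_prev, pmf, c, h, b, alpha):
    """One step of the y-form of (34) on the grid 0..len(V_prev)-1."""
    n = len(V_prev)
    G = []
    for y in range(n):
        # estimated holding/shortage cost L_t(y) and discounted continuation value
        L = sum(p * (h * max(y - s, 0) + b * max(s - y, 0)) for s, p in enumerate(pmf))
        cont = sum(p * V_prev[max(y - s, 0)] for s, p in enumerate(pmf))
        G.append(c * y + L + alpha * cont)
    V = [min(G[inv:]) - c * inv for inv in range(n)]   # min over y >= I, minus c*I
    K = min(range(n), key=G.__getitem__)               # smallest minimizer of G
    return V, K


# Hypothetical instance: true demand uniform on {0,...,4}, unknown to the controller.
rng = random.Random(1)
grid, c, h, b, alpha = 15, 1.0, 1.0, 4.0, 0.9
demands = [rng.randrange(5)]       # initial observation D_0
V = [0.0] * (grid + 1)             # V_0 = 0
inv, levels = 0, []
for t in range(1, 301):
    pmf_t = empirical_pmf(demands, s_max=4)    # built from D_0, ..., D_{t-1}
    V, K_t = dp_step(V, pmf_t, c, h, b, alpha)
    q = max(K_t - inv, 0)                      # order up to K_t
    D = rng.randrange(5)                       # demand realized in period t
    inv = max(inv + q - D, 0)                  # inventory dynamics (1)
    demands.append(D)
    levels.append(K_t)
print(levels[:5], levels[-5:])
```

Taking $K\_t$ as a global minimizer gives a selector of the form in Theorem 5.1(d) only when the bracketed function of $y$ is convex, which is what the theorem provides in the continuous setting. The discretized sketch does not verify it.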

Remark 5.2 From [10, Proposition D.7], for each $I \in \mathbb{I}$, there is an accumulation point $g\_\infty(I) \in \mathbb{Q}^\*$ of the sequence $\{g\_t(I)\}$. Hence, there exists a constant $K^\*$ such that

$$\mathbf{g}\_{\infty}(I) \coloneqq \begin{cases} K^\* - I & \text{if } \quad \mathbf{0} \le I \le K^\* \\ \mathbf{0} & \text{if } \qquad I > K^\* \end{cases} \tag{36}$$

Theorem 5.3 Let $g\_\infty$ be the selector defined in (36). Then the stationary policy $\gamma^\* \coloneqq \{g\_\infty\}$ is an optimal base stock policy for the inventory problem.

#### 6. Proofs

#### 6.1 Proof of Lemma 3.3

Note that, for each $(I, q) \in \mathbb{I} \times \mathbb{Q}^\*$,

$$\begin{split} c(I, q) &\leq cQ^\* + h(I + Q^\*) + b\overline{D} \\ &\leq (c + h)Q^\* + hI + b\overline{D} \leq MG(I), \end{split} \tag{37}$$

where $M \coloneqq \max\{(c + h)Q^\* + b\overline{D}, h\}$ and $G(I) = I + 1$. Moreover, for every density function $\mu$ on $[0, \infty)$ and $(I, q) \in \mathbb{I} \times \mathbb{Q}^\*$,

$$\int\_{0}^{\infty} G\left(\left(I+q-s\right)^{+}\right) \mu(s)ds \le G(I) + Q^{\*}.\tag{38}$$

On the other hand, we define the sequence of functions $\{w\_t\}$, $w\_t : \mathbb{I} \to \Re$, as

$$w\_0(I) \coloneqq 1 + MG(I) \tag{39}$$

and for $t \ge 1$ and any density function $\mu$ on $[0, \infty)$

$$w\_t(I) := \sup\_{q \in \mathbb{Q}^\*} \int\_0^\infty w\_{t-1}((I+q-s)^+) \mu(s)ds.$$

Observe that, for each I ∈I,

$$\begin{aligned} w\_1(I) &= \sup\_{q \in \mathbb{Q}^\*} \int\_0^\infty [1 + MG((I + q - s)^+)] \mu(s) ds \\ &\le 1 + MG(I) + MQ^\*. \end{aligned}$$

Thus,

$$\begin{aligned} w\_2(I) &\le \sup\_{q \in \mathbb{Q}^\*} \int\_0^\infty \left[ 1 + MG\left( (I + q - s)^+ \right) + MQ^\* \right] \mu(s) ds \\ &\le 1 + MG(I) + MQ^\* + MQ^\*, \quad I \in \mathbb{I}. \end{aligned}$$

In general, it is easy to see that for each I ∈ I,

$$w\_t(I) \le MG(I) + 1 + \sum\_{j=0}^{t-1} MQ^\* = MG(I) + 1 + MQ^\*t. \tag{40}$$

Let $\alpha\_0 \in (\alpha, 1)$ be arbitrary, and define

$$W(I) \coloneqq \sum\_{t=0}^{\infty} \alpha\_0^t w\_t(I). \tag{41}$$


Then, from (40),

$$\begin{split} W(I) &\leq \sum\_{t=0}^{\infty} \alpha\_0^t [MG(I) + 1 + MQ^\*t] \\ &= \sum\_{t=0}^{\infty} \alpha\_0^t (MG(I) + 1) + MQ^\* \sum\_{t=0}^{\infty} t \alpha\_0^t \leq \frac{MG(I) + 1}{1 - \alpha\_0} + \frac{MQ^\*\alpha\_0}{(1 - \alpha\_0)^2}. \end{split} \tag{42}$$
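The last step in (42) rests on the closed form of the series $\sum\_{t} t \alpha\_0^t$. As a short worked step, added here for convenience: for $0 < \alpha\_0 < 1$, differentiating the geometric series term by term and multiplying by $\alpha\_0$ gives

$$\sum\_{t=0}^{\infty} \alpha\_0^t = \frac{1}{1 - \alpha\_0} \quad \Longrightarrow \quad \sum\_{t=0}^{\infty} t \alpha\_0^{t-1} = \frac{1}{(1 - \alpha\_0)^2} \quad \Longrightarrow \quad \sum\_{t=0}^{\infty} t \alpha\_0^{t} = \frac{\alpha\_0}{(1 - \alpha\_0)^2},$$

so the final bound in (42) in fact holds with equality.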

Therefore, $W(I) < \infty$ for each $I \in \mathbb{I}$, and since $w\_0 > 1$, from (41),

$$W(I) > 1.\tag{43}$$

Furthermore, using (42) and the fact that $W(\cdot) \ge w\_0(\cdot)$, a straightforward calculation shows that

$$\varphi \coloneqq \sup\_{(I,q,s)\in\mathbb{I}\times\mathbb{Q}^\*\times[0, \infty)} \frac{W\left(\left(I+q-s\right)^+\right)}{W(I)} < \infty. \tag{44}$$

Now, from (37) and (39), $c(I, q) \le w\_0(I)$, which yields, for all $(I, q) \in \mathbb{I} \times \mathbb{Q}^\*$,

$$c(I, q) \le W(I). \tag{45}$$

In addition, for every density function $\mu$ on $[0, \infty)$ and $(I, q) \in \mathbb{I} \times \mathbb{Q}^\*$,

$$\begin{aligned} \int\_0^\infty W((I+q-s)^+) \mu(s)ds &= \int\_0^\infty \sum\_{t=0}^\infty \alpha\_0^t w\_t ((I+q-s)^+) \mu(s)ds \\ &= \sum\_{t=0}^\infty \alpha\_0^t \int\_0^\infty w\_t ((I+q-s)^+) \mu(s)ds \\ &\le \sum\_{t=0}^\infty \alpha\_0^t w\_{t+1}(I) = \alpha\_0^{-1} \left[ \sum\_{t=0}^\infty \alpha\_0^t w\_t(I) - w\_0(I) \right] \\ &= \alpha\_0^{-1} [W(I) - w\_0(I)] \le \alpha\_0^{-1} W(I). \end{aligned}$$

Therefore, defining $\beta \coloneqq \alpha\_0^{-1}$, we have that $0 < \alpha\beta < 1$, and

$$\int\_0^\infty W\left(\left(I+q-s\right)^+\right)\mu(s)ds \le \beta W(I), \ (I,q) \in \mathbb{I} \times \mathbb{Q}^\*,$$

which, together with (43), (44), and (45), proves Lemma 3.3.∎

#### 6.2 Proof of Theorem 5.1


(a) Since $\int\_0^\infty s\rho\_t(s) ds \le \overline{D}$, from (32) (see (37)) $c\_t(I, q) \le MG(I)$ for each $t \ge 0$, $(I, q) \in \mathbb{I} \times \mathbb{Q}^\*$. Hence, it is easy to see that $c\_t(I, q) \le W(I)$ for each $(I, q) \in \mathbb{I} \times \mathbb{Q}^\*$ (see (45)). Then we have $V\_1(I) \le W(I)$, and from (31), by applying induction arguments, we get

$$V\_t(I) \le \frac{W(I)}{1 - \alpha\beta}, \ t \ge 0, \ I \in \mathbb{I}.\tag{46}$$

(b) Observe that from (39), for each $I \in \mathbb{I}$,

$$W(I) \ge w\_0(I) = 1 + MG(I),$$

which implies that (see (43))

$$\frac{MG(I)}{W(I)} \le 1 - \frac{1}{W(I)} < \infty. \tag{47}$$

In addition, from (37),

$$h(I + Q^\*) \le MG(I). \tag{48}$$

On the other hand, similarly to (24), from (4), it is easy to see that

$$\int\_0^\infty s|\rho\_t(s) - \rho(s)| ds \le 2^{\frac{1}{2}} M' \left( \int\_0^\infty |\rho\_t(s) - \rho(s)| ds \right)^{1/2},\tag{49}$$

for some constant $M' < \infty$. Hence, combining (47)–(49), from the definition of $c\_t(I, q)$ and $c(I, q)$, we have

$$\begin{aligned} \frac{|c\_t(I,q) - c(I,q)|}{W(I)} &\le \frac{h}{W(I)} \int\_{0}^{\infty} (I + Q^\*) |\rho\_t(s) - \rho(s)| ds + \frac{b}{W(I)} \int\_{0}^{\infty} s |\rho\_t(s) - \rho(s)| ds \\ &\le \frac{MG(I)}{W(I)} \int\_{0}^{\infty} |\rho\_t(s) - \rho(s)| ds + b 2^{\frac{1}{2}} M' \left( \int\_{0}^{\infty} |\rho\_t(s) - \rho(s)| ds \right)^{1/2}. \end{aligned}$$

Finally, taking expectation, (28) and Property D.4 prove the result.

(c) For each $I \in \mathbb{I}$ and $t \ge 0$, by adding and subtracting the term $\alpha \int\_0^\infty V\_{t-1}\left((I + q - s)^+\right)\rho(s)\,ds$, we have

$$\begin{split} |V\_{t}(I) - V^{\*}(I)| &\leq \sup\_{q \in \mathbb{Q}^{\*}} |c\_{t}(I, q) - c(I, q)| + \sup\_{q \in \mathbb{Q}^{\*}} \alpha \int\_{0}^{\infty} V\_{t-1}\left((I + q - s)^{+}\right) |\rho\_{t}(s) - \rho(s)|\,ds \\ &\quad + \alpha \int\_{0}^{\infty} \left| V\_{t-1}\left((I + q - s)^{+}\right) - V^{\*}\left((I + q - s)^{+}\right) \right| \rho(s)\,ds \\ &\leq \sup\_{q \in \mathbb{Q}^{\*}} |c\_{t}(I, q) - c(I, q)| + \frac{\alpha}{1 - \alpha\beta} \sup\_{q \in \mathbb{Q}^{\*}} \int\_{0}^{\infty} W\left((I + q - s)^{+}\right) |\rho\_{t}(s) - \rho(s)|\,ds \\ &\quad + \alpha\beta \| V\_{t-1} - V^{\*} \|\_{W} W(I), \end{split}$$


where the last inequality is due to (35), (17), (14), and (15). Therefore, from (15) and (19) and by taking expectation,

$$E\|V\_t - V^\*\|\_W \le E \sup\_{q \in \mathbb{Q}^\*} |c\_t(I, q) - c(I, q)| + \frac{\alpha}{1 - \alpha\beta} E \|\rho\_t - \rho\| + \alpha\beta E \|V\_{t-1} - V^\*\|\_W. \tag{50}$$

Finally, from (17) and (35), $\eta := \limsup\_{t \to \infty} E\|V^\* - V\_t\|\_W < \infty$. Hence, taking limsup on both sides of (50), from part (a) and Property D.5 in Theorem 4.1, we get $\eta \le \alpha\beta\eta$, which yields $\eta = 0$ (since $0 < \alpha\beta < 1$). This proves (c).
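The passage from (50) to $\eta = 0$ can be illustrated numerically. The following Python sketch iterates the worst case permitted by (50), $e\_t = \delta\_t + \alpha\beta\, e\_{t-1}$, where $\delta\_t$ stands for the sum of the two estimation terms on the right-hand side of (50). The sequence $\delta\_t = 1/(t+1)$, the value $\alpha\beta = 0.9$, the initial error, and the function name are illustrative assumptions, not quantities taken from the chapter.

```python
def error_bounds(delta, alpha_beta, e0):
    """Iterate e_t = delta_t + alpha_beta * e_{t-1}, the worst case allowed by (50)."""
    errors = [e0]
    for d in delta:
        errors.append(d + alpha_beta * errors[-1])
    return errors


# Hypothetical vanishing estimation error delta_t = 1 / (t + 1).
delta = [1.0 / (t + 1) for t in range(200)]
errs = error_bounds(delta, alpha_beta=0.9, e0=5.0)

# The bound shrinks once delta_t is small, because 0 < alpha_beta < 1.
print(errs[1], errs[50], errs[-1])
```

Any sequence $\delta\_t \to 0$ gives the same qualitative behavior: the contraction factor $\alpha\beta < 1$ damps the accumulated error, which is the content of the inequality $\eta \le \alpha\beta\eta$.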

(d) For each $t \ge 0$, let $H\_t : \mathbb{I} \to \Re$ be the function defined as

$$H\_t(y) := cy + L\_t(y) + \alpha \int\_0^\infty V\_{t-1}\left((y - s)^+\right) \rho\_t(s)\,ds.$$

Hence, (34) is equivalent to

$$V\_t(I) = \min\_{q \in \mathbb{Q}^\*} H\_t(I+q) - cI, \qquad I \in \mathbb{I}.\tag{51}$$

Moreover (see (33)), observe that $H\_t$ is convex and $\lim\_{y \to \infty} H\_t(y) = \infty$. Thus, there exists a constant $K\_t \ge 0$ such that

$$H\_t(K\_t) = \min\_{I \le y \le Q^\* + I} H\_t(y),$$

and

$$g\_t(I) = \begin{cases} K\_t - I & \text{if } 0 \le I \le K\_t \\ 0 & \text{if } I > K\_t \end{cases}$$

attains the minimum in (51).∎
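The threshold rule in part (d) can be checked on a toy example. The sketch below is a minimal illustration under stated assumptions: `H` is a hypothetical convex stand-in for $H\_t$ (a quadratic with minimizer $K = 4$), the order bound is taken large enough that it does not bind, and the minimization over $q$ is a plain grid search. None of these choices come from the chapter.

```python
def order_quantity(H, I, q_max, n_grid=2000):
    """Grid search for the q in [0, q_max] that minimizes H(I + q)."""
    candidates = [q_max * k / n_grid for k in range(n_grid + 1)]
    return min(candidates, key=lambda q: H(I + q))


def base_stock_rule(I, K):
    """Threshold rule from part (d): order up to K when 0 <= I <= K, else order nothing."""
    return K - I if 0 <= I <= K else 0.0


K = 4.0  # hypothetical minimizer of the stand-in function


def H(y):
    # Hypothetical convex stand-in for H_t; it tends to infinity as y grows.
    return (y - K) ** 2 + 1.0


# For each inventory level, the grid minimizer agrees with the threshold rule.
for I in (0.0, 1.5, 4.0, 6.0):
    print(I, order_quantity(H, I, q_max=10.0), base_stock_rule(I, K))
```

Convexity is what makes the two columns agree: below $K$ the function is decreasing, so ordering up to $K$ is best, and above $K$ it is increasing, so any positive order only raises the cost.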

#### 6.3 Proof of Theorem 5.3

We fix an arbitrary $I \in \mathbb{I}$. Since $g\_\infty(I)$ is an accumulation point of $\{g\_t(I)\}$ (see Remark 5.2), there exists a subsequence $\{t\_m(I)\}$ of $\{t\}$ ($t\_m = t\_m(I)$) such that

$$g\_{t\_m}(I) \to g\_{\infty}(I) \quad \text{as } m \to \infty.$$

Moreover, from (34) and Theorem 5.1(d), letting $t\_m = m$, we have

$$V\_m(I) = c\_m(I, g\_m) + \alpha \int\_0^\infty V\_{m-1} \left( \left( I + g\_m - s \right)^+ \right) \rho\_m(s)\,ds. \tag{52}$$

On the other hand, following arguments similar to those in the proof of Theorem 5.1(c), for each $m \ge 0$ and $(I, q) \in \mathbb{I} \times \mathbb{Q}^\*$, we have

$$\begin{aligned} & \left| \alpha \int\_0^\infty V\_{m-1} \left( (I + q - s)^+ \right) \rho\_m(s)\,ds - \alpha \int\_0^\infty V^\* \left( (I + q - s)^+ \right) \rho(s)\,ds \right| \\ & \le \alpha \int\_0^\infty \left| V\_{m-1} \left( (I + q - s)^+ \right) - V^\* \left( (I + q - s)^+ \right) \right| \rho\_m(s)\,ds + \alpha \int\_0^\infty V^\* \left( (I + q - s)^+ \right) |\rho\_m(s) - \rho(s)|\,ds \\ & \le \alpha\beta \| V\_{m-1} - V^\* \|\_{W} W(I) + \frac{\alpha}{1 - \alpha\beta} \| \rho\_m - \rho \| . \end{aligned}$$

Then, for each $I \in \mathbb{I}$,

$$E\sup\_{q\in\mathbb{Q}^\*} \left| \alpha \int\_0^\infty V\_{m-1}((I+q-s)^+) \rho\_m(s)ds - \alpha \int\_0^\infty V^\*\left((I+q-s)^+\right) \rho(s)ds \right| \to 0, \quad \text{as } m \to \infty. \tag{53}$$

Now,


$$\begin{split} \alpha \int\_{0}^{\infty} V\_{m-1} \left( \left( I + g\_{m} - s \right)^{+} \right) \rho\_{m}(s)\,ds &= \left[ \alpha \int\_{0}^{\infty} V\_{m-1} \left( \left( I + g\_{m} - s \right)^{+} \right) \rho\_{m}(s)\,ds - \alpha \int\_{0}^{\infty} V^{\*} \left( \left( I + g\_{m} - s \right)^{+} \right) \rho(s)\,ds \right] \\ &\quad + \alpha \int\_{0}^{\infty} V^{\*} \left( \left( I + g\_{m} - s \right)^{+} \right) \rho(s)\,ds . \end{split} \tag{54}$$

Taking expectation and liminf as $m \to \infty$ on both sides of (54), from (53) we obtain

$$\begin{aligned} \liminf\_{m \to \infty} E\,\alpha \int\_0^\infty V\_{m-1} \left( \left( I + g\_m - s \right)^+ \right) \rho\_m(s)\,ds &= \liminf\_{m \to \infty} E\,\alpha \int\_0^\infty V^\* \left( \left( I + g\_m - s \right)^+ \right) \rho(s)\,ds \\ &\ge \alpha \int\_0^\infty V^\* \left( \left( I + g\_\infty - s \right)^+ \right) \rho(s)\,ds, \end{aligned}$$

where the last inequality follows by applying Fatou's Lemma and because the function $q \mapsto (I + q - s)^+$ is continuous. Hence, taking expectation and liminf in (52), we obtain

$$c\left(I, g\_{\infty}\right) + \alpha \int\_{0}^{\infty} V^\* \left(\left(I + g\_{\infty} - s\right)^+\right) \rho(s)\,ds \le V^\*(I), \quad I \in \mathbb{I}. \tag{55}$$


As $I$ was arbitrary, by (18), the equality holds in (55) for all $I \in \mathbb{I}$. To conclude, standard arguments in the stochastic control literature (see, e.g., [10]) show that the policy $\gamma^\* = \{g\_\infty\}$ is optimal.∎

#### 7. Concluding remarks

In this chapter we have introduced an estimation and control procedure for inventory systems in which the density of the demand is unknown to the inventory manager. Specifically, we have proposed a density estimation method defined by the projection onto a suitable set of densities, which, combined with control schemes for the inventory system, defines a procedure to construct optimal ordering policies.

A point to highlight is that our results include the most general scenarios of an inventory system, e.g., state and control spaces either countable or uncountable, possibly unbounded costs, and finite or infinite inventory capacity. This generality entailed the need to develop new estimation and control techniques, accompanied by a suitable mathematical analysis. For example, the simple fact of considering possibly unbounded costs led us to formulate a density estimation method related to the weight function $W$, which, in turn, defines the normed linear space $\mathbb{B}\_W$ (see (15)), all this through the projection estimator. Observe that if the cost function $c$ is bounded, we can take $W \equiv 1$ and we have $\|\cdot\| = \|\cdot\|\_{L\_1}$ (see (19) and (25)). Thus, any $L\_1$-consistent density estimator $\rho\_t$ can be used for the construction of optimal ordering policies.
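To make the bounded-cost remark concrete, the following Python sketch uses a histogram as one example of an $L\_1$-consistent density estimator. It is not the projection estimator of this chapter. The exponential demand density, the bin width $n^{-1/3}$, the integration cutoff, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def histogram_density(sample, width):
    """Piecewise-constant density estimate with bins [k*width, (k+1)*width)."""
    counts = {}
    for x in sample:
        k = int(x // width)
        counts[k] = counts.get(k, 0) + 1
    n = len(sample)
    return lambda s: counts.get(int(s // width), 0) / (n * width)


def l1_distance(f, g, upper, steps=20000):
    """Midpoint-rule approximation of the integral of |f - g| over [0, upper]."""
    ds = upper / steps
    return sum(abs(f((i + 0.5) * ds) - g((i + 0.5) * ds)) for i in range(steps)) * ds


def rho(s):
    # Hypothetical true demand density (exponential with rate 1).
    return math.exp(-s)


random.seed(0)
for n in (100, 1000, 20000):
    demand = [random.expovariate(1.0) for _ in range(n)]
    # Bin width shrinks with n while n * width still grows.
    rho_hat = histogram_density(demand, width=n ** (-1 / 3))
    print(n, round(l1_distance(rho_hat, rho, upper=20.0), 3))
```

The printed $L\_1$ errors would be expected to decrease as the sample grows; in the unbounded-cost case the chapter instead requires convergence in the weighted norm, which is why the projection estimator is used there.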

Finally, the theory presented in this chapter lays the foundations for developing estimation and control algorithms in inventory systems under other optimality criteria, for instance, the average cost criterion or discounted criteria with random state-action-dependent discount factors (see [14, 15] and references therein).

#### Author details

Jesús Adolfo Minjárez-Sosa

Departamento de Matemáticas, Universidad de Sonora, Hermosillo, Sonora, Mexico

\*Address all correspondence to: aminjare@gauss.mat.uson.mx

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Density Estimation in Inventory Control Systems under a Discounted Optimality Criterion DOI: http://dx.doi.org/10.5772/intechopen.88392

### References


[1] Arrow KJ, Karlin S, Scarf H. Studies in the Mathematical Theory of Inventory and Production. Stanford, CA: Stanford University Press; 1958

[2] Bensoussan A, Çakanyıldırım M, Sethi SP. Partially observed inventory systems: The case of zero balance walk. SIAM Journal on Control and Optimization. 2007;46:176-209

[3] Bensoussan A, Çakanyıldırım M, Minjárez-Sosa JA, Royal A, Sethi SP. Inventory problems with partially observed demands and lost sales. Journal of Optimization Theory and Applications. 2008;136:321-340

[4] Bensoussan A, Çakanyıldırım M, Minjárez-Sosa JA, Sethi SP, Shi R. Partially observed inventory systems: The case of rain checks. SIAM Journal on Control and Optimization. 2008;47(5):2490-2519

[5] Bensoussan A, Çakanyıldırım M, Minjárez-Sosa JA, Sethi SP, Shi R. An incomplete information inventory model with presence of inventories or backorders as only observations. Journal of Optimization Theory and Applications. 2010;146(3):544-580

[6] Bertsekas DP. Dynamic Programming: Deterministic and Stochastic Models. Englewood Cliffs, NJ: Prentice-Hall; 1987

[7] Beyer D, Cheng F, Sethi SP, Taksar MI. Markovian Demand Inventory Models. New York: Springer; 2008

[8] Dynkin EB, Yushkevich AA. Controlled Markov Processes. New York: Springer-Verlag; 1979

[9] Gordienko EI, Minjárez-Sosa JA. Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion. Kybernetika. 1998;34:217-234

[10] Hernández-Lerma O, Lasserre JB. Discrete-Time Markov Control Processes: Basic Optimality Criteria. New York: Springer-Verlag; 1996

[11] Hernández-Lerma O, Lasserre JB. Further Topics on Discrete-Time Markov Control Processes. New York: Springer-Verlag; 1999

[12] Hilgert N, Minjárez-Sosa JA. Adaptive policies for time-varying stochastic systems under discounted criterion. Mathematical Methods of Operations Research. 2001;54(3):491-505

[13] Minjárez-Sosa JA. Approximation and estimation in Markov control processes under discounted criterion. Kybernetika. 2004;40(6):681-690

[14] Minjárez-Sosa JA. Empirical estimation in average Markov control processes. Applied Mathematics Letters. 2008;21:459-464

[15] Minjárez-Sosa JA. Markov control models with unknown random state-action-dependent discount factors. TOP. 2015;23:743-772


### Chapter 6
