Development of Estimation Procedure of Population Mean in Two-Phase Stratified Sampling

Partha Parichha, Kajla Basu and Arnab Bandyopadhyay

#### Abstract

This article addresses the problem of estimating a finite population mean in two-phase stratified random sampling. Using information on two auxiliary variables, a class of product to regression chain type estimators is proposed and its properties are discussed. An unbiased version of the proposed class of estimators is constructed, and the optimality condition for the class is derived. The efficacy of the proposed methodology is justified through empirical investigations carried out on data sets from natural populations as well as an artificially generated population. Survey statisticians may find the proposed strategy suitable for practical use.

Keywords: stratified random sampling, double sampling, auxiliary variables, chain type estimators, bias, mean square error, efficiency

AMS 2000 Mathematics Subject Classification: 62D05

#### 1. Introduction

In the present paper we make use of auxiliary information extracted from variables that are correlated with the study variable. Auxiliary information may be utilized at the planning, design and estimation stages to develop improved estimation procedures in sample surveys. Sometimes, information on an auxiliary variable is readily available for all units of the population; for example, the tonnage (or seat capacity) of each vehicle or ship is known in transportation surveys, and the number of beds available in different hospitals may be known well in advance in health care surveys. If such information is lacking, it is sometimes relatively cheap to take a large preliminary sample in which the auxiliary variable alone is measured; this practice is known as two-phase (or double) sampling. Two-phase stratified sampling is a powerful and cost-effective technique for obtaining, from the first-phase (preliminary) sample, reliable estimates of the unknown parameters of the auxiliary variables. For example, Sukhatme [1] mentioned that in a survey to estimate the production of lime crop based on orchards as sampling units, a comparatively larger sample is drawn to determine the acreage under the crop, while the yield rate is determined from a sub-sample of the orchards selected for determining acreage.

In order to construct an efficient estimator of the population mean of the auxiliary variable in the first-phase (preliminary) sample, Chand [2] introduced a technique of chaining another auxiliary variable with the first auxiliary variable by using the ratio estimator in the first-phase sample. The resulting estimator is known as the chain-type ratio estimator. This work was further extended by Kiregyera [3, 4], Tracy et al. [5], Singh and Espejo [6], Gupta and Shabbir [7], Shukla et al. [8], Choudhury and Singh [9], Parichha et al. [10], among others, who proposed various chain-type ratio and regression estimators.

DOI: http://dx.doi.org/10.5772/intechopen.82850


In practice, the population may often consist of heterogeneous units. For example, in socio-economic surveys, people may live in rural areas, urban localities, ordinary domestic houses, hostels, hospitals, jails, etc. In such a situation one should carefully study the population according to the characteristics of the regions and then apply the sampling scheme independently within each stratum. This procedure is known as stratified random sampling. It may be noted that most developments in two-phase sampling are based on simple random sampling only, while a limited number of attempts have been made to address the problems of two-phase sampling within the framework of stratified random sampling. It may also be noted that most of the research work on two-phase sampling produces biased estimates. However, bias is a serious drawback in sample surveys.

A sampling method is called biased if it systematically favors some outcomes over others. It results in a biased sample of a population (or non-human factors) in which all individuals, or instances, were not equally likely to have been selected. If this is not accounted for, results can be erroneously attributed to the phenomenon under study rather than to the method of sampling. For example, telephone sampling is common in marketing surveys. A simple random sample may be chosen from a sampling frame consisting of a list of telephone numbers of people in the area being surveyed. This method does involve taking a simple random sample, but it is not a simple random sample of the target population (consumers in the area being surveyed). It will miss people who do not have a phone. It may also miss people who only have a cell phone with an area code outside the region being surveyed. It will also miss people who do not wish to be surveyed, including those who monitor calls on an answering machine and do not answer calls from telephone surveyors. Thus the method systematically excludes certain types of consumers in the area. Inferences from a biased sample are not as trustworthy as conclusions from a truly random sample.

Encouraged by the above work, we propose a class of product to regression chain type estimators in stratified sampling using two auxiliary variables under double sampling. An unbiased version of the proposed class of estimators is obtained, which makes the estimation strategy more practicable. The dominance of the proposed estimation strategy over the conventional ones is established through empirical investigations carried out on data sets of natural as well as artificially generated populations.

#### 2. Sampling structures and notations

Consider a finite population $U = \{1, 2, \ldots, N\}$ of $N$ identifiable units divided into $L$ homogeneous strata, with the $h$th stratum ($h = 1, 2, \ldots, L$) having $N_h$ units. Let $y$ and $(x, z)$ be the study variable and two auxiliary variables, taking values $y_{hi}$ and $(x_{hi}, z_{hi})$, respectively, for the unit $i = 1, 2, \ldots, N_h$ of the $h$th stratum. Let $\overline{Y} = \sum_{h=1}^{L} W_h \overline{Y}_h$, $\overline{X} = \sum_{h=1}^{L} W_h \overline{X}_h$ and $\overline{Z} = \sum_{h=1}^{L} W_h \overline{Z}_h$ be the population means of the study and the auxiliary variables, and let $\overline{Y}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hi}$, $\overline{X}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} x_{hi}$ and $\overline{Z}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} z_{hi}$ be the corresponding stratum means. Here $W_h = N_h/N$ is the known stratum weight, denoted $w_h$ in the estimators that follow.
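As a small numerical illustration of these definitions, the following sketch computes the stratum weights and recovers the overall population mean as the weighted sum of stratum means. The toy data and the function names are hypothetical and are not taken from the paper.

```python
# Hypothetical toy population: y-values grouped into three strata.
strata_y = {
    "h1": [12.0, 15.0, 11.0, 14.0],
    "h2": [30.0, 28.0, 35.0, 31.0, 26.0, 30.0],
    "h3": [55.0, 60.0],
}


def stratum_weights(strata):
    """Return W_h = N_h / N for every stratum."""
    total = sum(len(values) for values in strata.values())
    return {h: len(values) / total for h, values in strata.items()}


def stratum_means(strata):
    """Return the stratum mean of every stratum."""
    return {h: sum(values) / len(values) for h, values in strata.items()}


def population_mean(strata):
    """Population mean written as the weighted sum of stratum means."""
    weights = stratum_weights(strata)
    means = stratum_means(strata)
    return sum(weights[h] * means[h] for h in strata)


weights = stratum_weights(strata_y)
overall_mean = population_mean(strata_y)
```

The weighted sum agrees with the plain average over all units, which is the identity the notation relies on.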


Let $C_{yh} = S_{yh}/\overline{Y}_h$, $C_{xh} = S_{xh}/\overline{X}_h$ and $C_{zh} = S_{zh}/\overline{Z}_h$ be the coefficients of variation, where

$$S_{yh} = \sqrt{\frac{\sum_{i=1}^{N_h}\left(y_{hi} - \overline{Y}_h\right)^2}{N_h - 1}}, \quad S_{xh} = \sqrt{\frac{\sum_{i=1}^{N_h}\left(x_{hi} - \overline{X}_h\right)^2}{N_h - 1}}, \quad S_{zh} = \sqrt{\frac{\sum_{i=1}^{N_h}\left(z_{hi} - \overline{Z}_h\right)^2}{N_h - 1}}$$

are the population standard deviations in the $h$th stratum.
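The standard deviation and coefficient of variation of a single stratum can be computed as in the sketch below. It uses the divisor $N_h - 1$ as in the definition above; the data values and function names are hypothetical.

```python
import math


def population_sd(values):
    """Stratum standard deviation S_h with divisor N_h - 1."""
    size = len(values)
    mean = sum(values) / size
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (size - 1))


def coefficient_of_variation(values):
    """C_h = S_h divided by the stratum mean."""
    return population_sd(values) / (sum(values) / len(values))


y_h = [2.0, 4.0, 6.0, 8.0]  # hypothetical values of y in one stratum
S_yh = population_sd(y_h)
C_yh = coefficient_of_variation(y_h)
```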

Let $\rho_{yxh}$, $\rho_{yzh}$ and $\rho_{xzh}$ be the correlation coefficients between $(y, x)$, $(y, z)$ and $(x, z)$, respectively, in the $h$th stratum. Chand [2] and Kiregyera [3, 4] discussed a situation in simple random sampling where information on $x$ is unknown but another auxiliary variable $z$ is easily available. It is assumed that the population mean of one auxiliary variable $z$ is known in advance and the population mean of the other auxiliary variable $x$ is unknown. We seek to estimate the population mean $\overline{Y}$ through a two-phase stratified sampling design. Using simple random sampling without replacement (SRSWOR) at each phase, we adopt the double sampling scheme as follows.

i. In the first phase, a preliminary large sample of size $n'_h$ is drawn from the $h$th stratum of size $N_h$ ($h = 1, 2, \ldots, L$), and information on the auxiliary variables $x$ and $z$ is observed.

ii. In the second phase, a sub-sample of size $n_h$ is drawn from the first-phase sample of $n'_h$ units from the $h$th stratum, and information on both the study variable $y$ and the auxiliary variables $x$ and $z$ is taken.

Let

$$\overline{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}, \quad \overline{x}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} x_{hi}, \quad \overline{z}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} z_{hi}, \quad \overline{x}'_h = \frac{1}{n'_h}\sum_{i=1}^{n'_h} x_{hi}, \quad \overline{z}'_h = \frac{1}{n'_h}\sum_{i=1}^{n'_h} z_{hi}$$

be the corresponding sample means in the $h$th stratum.
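The double sampling scheme within one stratum can be sketched as follows: an SRSWOR first-phase sample is drawn from the stratum, an SRSWOR sub-sample is drawn from it, and the five sample means are computed. The stratum data, sample sizes and function names are hypothetical; units are stored as `(y, x, z)` tuples for convenience, although in practice $y$ would be recorded only for second-phase units.

```python
import random


def two_phase_sample(stratum, n_first, n_second, rng):
    """Draw a first-phase SRSWOR sample and a second-phase SRSWOR sub-sample."""
    first = rng.sample(stratum, n_first)    # phase 1: only x and z are observed
    second = rng.sample(first, n_second)    # phase 2: y, x and z are observed
    return first, second


def mean(values):
    return sum(values) / len(values)


def phase_means(first, second):
    """Sample means of one stratum; primed quantities use the first-phase sample."""
    return {
        "y_bar": mean([unit[0] for unit in second]),
        "x_bar": mean([unit[1] for unit in second]),
        "z_bar": mean([unit[2] for unit in second]),
        "x_bar_prime": mean([unit[1] for unit in first]),
        "z_bar_prime": mean([unit[2] for unit in first]),
    }


rng = random.Random(0)
# Hypothetical stratum of 40 units stored as (y, x, z).
stratum = [(3.0 * i + 1.0, 2.0 * i, float(i)) for i in range(1, 41)]
first, second = two_phase_sample(stratum, n_first=20, n_second=8, rng=rng)
means = phase_means(first, second)
```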

#### 3. Discussion on existing estimation strategies

The usual stratified mean estimator $\overline{y}_{st}$ of the population mean $\overline{Y}$ is given by

$$\overline{y}_{st} = \sum_{h=1}^{L} w_h \overline{y}_h \tag{1}$$

The mean square error (MSE) of $\overline{y}_{st}$ is given by

$$\text{MSE}\left(\overline{y}_{st}\right) = \sum_{h=1}^{L} w_h^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right) s_{yh}^2 \tag{2}$$
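Equations (1) and (2) translate directly into code. The sketch below is a minimal illustration with hypothetical inputs (two strata, invented sample sizes and variances), not a reproduction of the paper's empirical study.

```python
def stratified_mean(weights, sample_means):
    """Eq. (1): weighted sum of the stratum sample means."""
    return sum(w * m for w, m in zip(weights, sample_means))


def mse_stratified_mean(weights, n, N, var_y):
    """Eq. (2): sum over strata of w_h^2 (1/n_h - 1/N_h) s_yh^2."""
    return sum(
        w ** 2 * (1.0 / n_h - 1.0 / N_h) * s2
        for w, n_h, N_h, s2 in zip(weights, n, N, var_y)
    )


# Hypothetical inputs for two strata.
weights = [0.5, 0.5]
estimate = stratified_mean(weights, sample_means=[10.0, 20.0])
mse = mse_stratified_mean(weights, n=[10, 10], N=[100, 100], var_y=[4.0, 9.0])
```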

Motivated by the technique adopted by Chand [2], one may frame the chain ratio-product type estimator in the stratified sampling structure as

$$\overline{y}_{RP}^{(h)} = \sum_{h=1}^{L} w_h \overline{y}_h\left(\frac{\overline{x}'_h}{\overline{x}_h}\right)\left(\frac{\overline{Z}_h}{\overline{z}'_h}\right) \tag{3}$$
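A minimal sketch of the chain-type estimator in Eq. (3) is given below. It reads the two ratios as the first-phase mean of $x$ over the second-phase mean of $x$, and the known stratum mean of $z$ over the first-phase mean of $z$; that reading, the function name and the numbers are assumptions made for illustration.

```python
def chain_ratio_estimate(weights, y_bar, x_bar, x_bar_prime, z_bar_prime, Z_bar):
    """Eq. (3): sum of w_h * y_bar_h * (x_bar'_h / x_bar_h) * (Z_bar_h / z_bar'_h)."""
    return sum(
        w * y * (xp / x) * (Z / zp)
        for w, y, x, xp, zp, Z in zip(
            weights, y_bar, x_bar, x_bar_prime, z_bar_prime, Z_bar
        )
    )


# Hypothetical two-stratum example. When both ratios equal one, the estimator
# collapses to the ordinary stratified mean.
weights = [0.4, 0.6]
y_bar = [10.0, 20.0]
no_adjustment = chain_ratio_estimate(
    weights, y_bar,
    x_bar=[5.0, 8.0], x_bar_prime=[5.0, 8.0],
    z_bar_prime=[2.0, 4.0], Z_bar=[2.0, 4.0],
)
adjusted = chain_ratio_estimate(
    weights, y_bar,
    x_bar=[5.0, 8.0], x_bar_prime=[5.5, 8.0],
    z_bar_prime=[2.0, 4.0], Z_bar=[2.0, 4.0],
)
```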

The bias and MSE of $\overline{y}_{RP}^{(h)}$, to the first order of approximation, are obtained as

$$\text{Bias}\left(\overline{y}_{RP}^{(h)}\right) \cong \sum_{h=1}^{L} w_h \overline{y}_h\left[\left(\frac{1}{n_h} - \frac{1}{n'_h}\right) A_{1h} + \left(\frac{1}{n'_h} - \frac{1}{N_h}\right) A_{2h}\right] \tag{4}$$

$$\text{MSE}\left(\overline{y}_{RP}^{(h)}\right) \cong \sum_{h=1}^{L} w_h^2 s_{yh}^2\left[\left(\frac{1}{n_h} - \frac{1}{n'_h}\right) A_{3h} + \left(\frac{1}{n'_h} - \frac{1}{N_h}\right) A_{4h} + \left(\frac{1}{n_h} - \frac{1}{N_h}\right)\right] \tag{5}$$


Statistical Methodologies


where

$$\mathbf{A\_{1h}} = \mathbf{C\_{xh}^2} - \rho\_{\rm yxh} \mathbf{C\_{yh}} \mathbf{C\_{xh}} \text{ and } \mathbf{A\_{2h}} = \mathbf{C\_{zh}^2} - \rho\_{\rm yzh} \mathbf{C\_{yh}} \mathbf{C\_{zh}}$$

$$\mathbf{A\_{3h}} = \frac{\mathbf{C\_{xh}^2}}{\mathbf{C\_{yh}^2}} - 2\rho\_{\rm yxh} \frac{\mathbf{C\_{xh}}}{\mathbf{C\_{yh}}} \quad \text{and} \quad \mathbf{A\_{4h}} = \frac{\mathbf{C\_{zh}^2}}{\mathbf{C\_{yh}^2}} - 2\rho\_{\rm yzh} \frac{\mathbf{C\_{zh}}}{\mathbf{C\_{yh}}}$$
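The coefficients $A_{1h}$ to $A_{4h}$ and the bracketed factor of Eq. (5) can be evaluated for a single stratum as sketched below. The function names and inputs are hypothetical. The example uses a degenerate case (equal coefficients of variation and perfect correlations) in which the first-order MSE bracket vanishes, which serves as a quick sanity check.

```python
def chain_ratio_terms(C_y, C_x, C_z, rho_yx, rho_yz):
    """Return (A1, A2, A3, A4) for one stratum, as defined above."""
    A1 = C_x ** 2 - rho_yx * C_y * C_x
    A2 = C_z ** 2 - rho_yz * C_y * C_z
    A3 = C_x ** 2 / C_y ** 2 - 2.0 * rho_yx * C_x / C_y
    A4 = C_z ** 2 / C_y ** 2 - 2.0 * rho_yz * C_z / C_y
    return A1, A2, A3, A4


def chain_ratio_mse_bracket(n, n_first, N, A3, A4):
    """Bracketed factor of Eq. (5) for one stratum."""
    return (
        (1.0 / n - 1.0 / n_first) * A3
        + (1.0 / n_first - 1.0 / N) * A4
        + (1.0 / n - 1.0 / N)
    )


# Hypothetical degenerate case: equal CVs and perfect correlations.
A1, A2, A3, A4 = chain_ratio_terms(0.5, 0.5, 0.5, rho_yx=1.0, rho_yz=1.0)
bracket = chain_ratio_mse_bracket(n=10, n_first=40, N=200, A3=A3, A4=A4)
```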

Similarly, inspired by the technique adopted by Choudhury and Singh [9], one may frame the corresponding estimator in two-phase stratified random sampling as

$$\overline{y}_{cs}^{h} = \sum_{h=1}^{L} w_h \overline{y}_h\left[k_h\left(\frac{\overline{x}'_h}{\overline{x}_h}\right)\left(\frac{\overline{Z}_h}{\overline{z}'_h}\right) + (1 - k_h)\left(\frac{\overline{x}_h}{\overline{x}'_h}\right)\left(\frac{\overline{z}'_h}{\overline{Z}_h}\right)\right] \tag{6}$$


where $k_h$ is a constant. The bias of this estimator is

$$\text{Bias}\left(\overline{y}_{cs}^{h}\right) \cong \sum_{h=1}^{L} w_h \overline{y}_h A_{5h}$$

with

$$A_{5h} = (1 - 2k_h) C_{yh}\left[\left(\frac{1}{n_h} - \frac{1}{n'_h}\right)\rho_{yxh} C_{xh} + \left(\frac{1}{n'_h} - \frac{1}{N_h}\right)\rho_{yzh} C_{zh}\right] + k_h\left[\left(\frac{1}{n_h} - \frac{1}{N_h}\right) C_{xh}^2 + \left(\frac{1}{n'_h} - \frac{1}{N_h}\right) C_{zh}^2\right] \tag{7}$$

and its minimum MSE is

$$\text{MSE}\left(\overline{y}_{cs}^{h}\right)_{\min} = \sum_{h=1}^{L} w_h^2 s_{yh}^2\left[\left(\frac{1}{n_h} - \frac{1}{N_h}\right) - \frac{\left\{\left(\frac{1}{n_h} - \frac{1}{n'_h}\right)\rho_{yxh} C_{xh} - \left(\frac{1}{n'_h} - \frac{1}{N_h}\right)\rho_{yzh} C_{zh}\right\}^2}{\left(\frac{1}{n_h} - \frac{1}{N_h}\right) C_{xh}^2 + \left(\frac{1}{n'_h} - \frac{1}{N_h}\right) C_{zh}^2}\right] \tag{8}$$

#### 4. Formulation of proposed estimation strategy

Motivated by the earlier work discussed above, we have constructed a class of product to regression chain type estimators as

$$t_p = \sum_{h=1}^{L} w_h \overline{y}_h\left\{k_h\frac{\overline{x}'_h}{\overline{x}_h} + (1 - k_h)\frac{\overline{x}'_{ldh}}{\overline{x}_h}\right\} \tag{9}$$

where $k_h$ ($h = 1, 2, \ldots, L$) is a real constant which can be suitably determined by minimizing the MSE of the class of estimators $t_p$, and $\overline{x}'_{ldh} = \overline{x}'_h + b_{xzh}(n'_h)\left(\overline{Z}_h - \overline{z}'_h\right)$, where $b_{xzh}(n'_h)$ is the regression coefficient between the variables $x$ and $z$ in the $h$th stratum.
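The proposed class in Eq. (9) can be sketched as below. The regression-adjusted first-phase mean is computed from the first-phase mean of $x$, the regression coefficient and the gap between the known stratum mean of $z$ and its first-phase estimate. Field names, function names and numbers are hypothetical, and the regression coefficient is taken as given rather than estimated.

```python
def regression_adjusted_mean(x_bar_prime, z_bar_prime, Z_bar, b_xz):
    """First-phase mean of x adjusted by regression on z."""
    return x_bar_prime + b_xz * (Z_bar - z_bar_prime)


def proposed_estimate(strata):
    """Eq. (9): sum over strata of w * y_bar * {k * x'/x + (1 - k) * x'_adj/x}."""
    total = 0.0
    for s in strata:
        x_adj = regression_adjusted_mean(
            s["x_bar_prime"], s["z_bar_prime"], s["Z_bar"], s["b_xz"]
        )
        ratio_part = s["x_bar_prime"] / s["x_bar"]
        regression_part = x_adj / s["x_bar"]
        total += s["w"] * s["y_bar"] * (
            s["k"] * ratio_part + (1.0 - s["k"]) * regression_part
        )
    return total


# Hypothetical single-stratum example.
stratum = {
    "w": 1.0, "y_bar": 10.0, "x_bar": 4.0, "x_bar_prime": 5.0,
    "z_bar_prime": 2.5, "Z_bar": 3.0, "b_xz": 2.0, "k": 0.25,
}
t_p = proposed_estimate([stratum])

# If the first-phase mean of z equals the known mean, the adjustment vanishes
# and the value no longer depends on k.
no_gap = dict(stratum, z_bar_prime=3.0)
t_p_no_gap = proposed_estimate([no_gap])
```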

#### 5. Bias and mean square errors of the proposed class of estimator tp

It can easily be seen that the proposed class of estimators $t_p$ defined in Eq. (9) is a chain product and regression type estimator. Therefore, it is a biased estimator of the population mean $\overline{Y}$. We obtain its bias and mean square error under large-sample approximations using the following transformations:

$$\begin{aligned} \overline{y}_h &= \overline{Y}_h(1 + e_1), \quad \overline{x}_h = \overline{X}_h(1 + e_2), \quad \overline{x}'_h = \overline{X}_h(1 + e_3), \quad \overline{z}'_h = \overline{Z}_h(1 + e_4), \\ s'_{xzh} &= S_{xzh}(1 + e_5), \quad s_{zh}^{\prime 2} = S_{zh}^2(1 + e_6) \end{aligned}$$


and $E(e_i) = 0$ for $i = 1, 2, \ldots, 6$, where the $e_i$ are relative error terms. Under the above transformations, the class of estimators $t_p$ may be represented as

$$\begin{aligned} t_p = \sum_{h=1}^{L} w_h \overline{Y}(1 + e_1)\Big[&(1 - k_h)\left\{(1 + e_3)(1 + e_2)^{-1}\right\} \\ &+ k_h\left\{(1 + e_3) - \frac{\overline{Z}_h}{\overline{X}_h}\beta_{xzh}\left(e_4 + e_4 e_5 - e_4 e_6\right)\right\}(1 + e_2)^{-1}\Big] \end{aligned} \tag{10}$$

We have the following expectations of the sample statistics in two-phase stratified sampling:

$$\left.\begin{aligned} &E(e_1^2) = f_1 C_{yh}^2, \quad E(e_2^2) = f_1 C_{xh}^2, \quad E(e_3^2) = f_2 C_{xh}^2, \quad E(e_4^2) = f_2 C_{zh}^2, \\ &E(e_1 e_2) = f_1 \rho_{yxh} C_{yh} C_{xh}, \quad E(e_1 e_3) = f_2 \rho_{yxh} C_{yh} C_{xh}, \quad E(e_2 e_3) = f_2 C_{xh}^2, \\ &E(e_2 e_4) = E(e_3 e_4) = f_2 \rho_{xzh} C_{xh} C_{zh}, \quad E(e_1 e_4) = f_2 \rho_{yzh} C_{yh} C_{zh}, \\ &E(e_4 e_5) = f_2 \frac{\mu_{102}}{\overline{Z}_h S_{xzh}}, \quad E(e_4 e_6) = f_2 \frac{\mu_{003}}{\overline{Z}_h S_{zh}^2}, \\ &E(e_2 e_5) = f_2 \frac{\mu_{201}}{\overline{X}_h S_{xzh}}, \quad E(e_2 e_6) = f_2 \frac{\mu_{102}}{\overline{X}_h S_{zh}^2} \end{aligned}\right\} \tag{11}$$

where


$$f_1 = \frac{1}{n_h} - \frac{1}{N_h}, \quad f_3 = \frac{1}{n_h} - \frac{1}{n'_h}, \quad f_2 = \frac{1}{n'_h} - \frac{1}{N_h},$$

$$\mu_{pqr} = \frac{1}{N_h}\sum_{i=1}^{N_h}\left(x_i - \overline{X}_h\right)^p\left(y_i - \overline{Y}_h\right)^q\left(z_i - \overline{Z}_h\right)^r; \quad (p, q, r \ge 0)$$
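The sampling fractions and the mixed central moments defined above are straightforward to compute. The sketch below is a minimal illustration with hypothetical sizes and data; note that the three factors satisfy $f_1 = f_2 + f_3$, which follows directly from their definitions.

```python
def sampling_factors(n, n_first, N):
    """Return (f1, f2, f3) for one stratum."""
    f1 = 1.0 / n - 1.0 / N
    f2 = 1.0 / n_first - 1.0 / N
    f3 = 1.0 / n - 1.0 / n_first
    return f1, f2, f3


def central_moment(x, y, z, p, q, r):
    """mu_pqr: average of (x - X)^p (y - Y)^q (z - Z)^r over the stratum."""
    size = len(x)
    mx, my, mz = sum(x) / size, sum(y) / size, sum(z) / size
    return sum(
        (xi - mx) ** p * (yi - my) ** q * (zi - mz) ** r
        for xi, yi, zi in zip(x, y, z)
    ) / size


f1, f2, f3 = sampling_factors(n=10, n_first=40, N=200)

# Hypothetical stratum data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
z = [1.0, 2.0, 3.0, 4.0]
mu_200 = central_moment(x, y, z, 2, 0, 0)  # second central moment of x
mu_101 = central_moment(x, y, z, 1, 0, 1)  # mixed moment of x and z
```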

Expanding binomially, using results from Eq. (1) and retaining the terms up to first order of sample size, we have derived the expressions of bias B(.) and mean square error M(.) of the class of estimators tp as

$$\begin{split} \mathbf{B}(\mathbf{t}\_{\rm p}) &= \mathbf{E}(\mathbf{t}\_{\rm p} - \overline{\mathbf{Y}\_{\rm h}}) = \sum\_{\mathbf{h}=1}^{L} \mathbf{w}\_{\rm h} \overline{\mathbf{V}} \left[ (1 - \mathbf{k}\_{\rm h}) \mathbf{b}\_{\rm xx\_{\rm h}} \frac{\overline{\mathbf{Z}\_{\rm h}}}{\overline{\mathbf{X}\_{\rm h}}} \left( \mathbf{f}\_{2} \frac{\mathbf{S}\_{\rm xx\_{\rm h}}}{\overline{\mathbf{X}\_{\rm h}} \overline{\mathbf{Z}\_{\rm h}}} - \mathbf{f}\_{1} \frac{\mathbf{S}\_{\rm yx\_{\rm h}}}{\overline{\mathbf{Y}\_{\rm h}} \overline{\mathbf{Z}\_{\rm h}}} - \mathbf{f}\_{2} \frac{\mu\_{\rm 02}}{\mathbf{S}\_{\rm xx\_{\rm h}} \overline{\mathbf{Z}\_{\rm h}}} - \frac{\mu\_{\rm 03}}{\mathbf{S}\_{\rm x} \overline{\mathbf{Z}\_{\rm h}}} \right) \right] \\ &+ \mathbf{f}\_{3} \left( \frac{\mathbf{S}\_{\rm x}^{2}}{\overline{\mathbf{X}\_{\rm h}}} - \frac{\mathbf{S}\_{\rm yx\_{\rm h}}}{\overline{\mathbf{Y}\_{\rm h} \overline{\mathbf{X}}}} \right) \Bigg] \end{split} \tag{12}$$

$$M(t\_{p}) = E\left[t\_{p} - \overline{Y}\_{h}\right]^{2} = \sum\_{h=1}^{L} w\_{h}^{2} \overline{Y}\_{h}^{2} \left[ f\_{1} C\_{y\_{h}}^{2} + k\_{h}^{2} a + 2 k\_{h} b + c \right] \tag{13}$$

where

$$a = f\_{2} \rho\_{xz\_{h}}^{2} C\_{x\_{h}}^{2}, \qquad b = f\_{2} \rho\_{yz\_{h}} \rho\_{xz\_{h}} C\_{y\_{h}} C\_{x\_{h}} - f\_{2} \rho\_{xz\_{h}}^{2} C\_{x\_{h}}^{2},$$

$$c = f\_{3} C\_{x\_{h}}^{2} - 2 f\_{3} \rho\_{yx\_{h}} C\_{y\_{h}} C\_{x\_{h}} + f\_{2} \rho\_{xz\_{h}}^{2} C\_{x\_{h}}^{2} - 2 f\_{2} \rho\_{yz\_{h}} \rho\_{xz\_{h}} C\_{y\_{h}} C\_{x\_{h}}.$$

#### 6. Bias reduction for the proposed class of estimators

Bias is a serious drawback of an estimator. Therefore, unbiased versions of the proposed class of estimators are more desirable. Motivated by this argument and influenced by the bias correction techniques of Tracy et al. [5] and Bandyopadhyay and Singh [11], we proceed to derive the unbiased version of our proposed class of estimators $t\_p$.


From Eq. (12), we observe that the expression for the bias of the estimator $t\_p$ contains population parameters such as $\mu\_{003}$, $\mu\_{102}$, $S\_{yx\_h}$, $S\_{yz\_h}$, $S\_{xz\_h}$, $S\_{x\_h}^2$, $S\_{y\_h}^2$, $\overline{Y}\_h$, $\overline{X}\_h$ and $S\_{z\_h}^2$. Since $S\_{z\_h}^2$ is known while the remaining parameters are unknown, replacing the unknown parameters by their respective sample estimators (based on the second-phase sample of size m) $m\_{003}$, $m\_{102}$, $s\_{yx\_h}$, $s\_{yz\_h}$, $s\_{xz\_h}$, $s\_{x\_h}^2$, $s\_{y\_h}^2$, $\overline{y}\_h$ and $\overline{x}\_h$, we obtain an estimator of $B(t\_p)$ as

$$\begin{split} b(t\_{p}) = \sum\_{h=1}^{L} w\_{h} \overline{y}\_{h} \Bigg[ & (1 - k\_{h})\, b\_{xz\_{h}} \frac{\overline{z}\_{h}}{\overline{x}\_{h}} \left( f\_{2} \frac{s\_{xz\_{h}}}{\overline{x}\_{h} \overline{z}\_{h}} - f\_{1} \frac{s\_{yz\_{h}}}{\overline{y}\_{h} \overline{z}\_{h}} - f\_{2} \frac{m\_{102}}{s\_{xz\_{h}} \overline{z}\_{h}} - \frac{m\_{003}}{s\_{z\_{h}}^{2} \overline{z}\_{h}} \right) \\ & + f\_{3} \left( \frac{s\_{x\_{h}}^{2}}{\overline{x}\_{h}^{2}} - \frac{s\_{yx\_{h}}}{\overline{y}\_{h} \overline{x}\_{h}} \right) \Bigg] \end{split} \tag{14}$$

where $m\_{pqr} = \frac{1}{m} \sum\_{i=1}^{m} \left(x\_{hi} - \overline{x}\_{h}\right)^{p} \left(y\_{hi} - \overline{y}\_{h}\right)^{q} \left(z\_{hi} - \overline{z}\_{h}\right)^{r}$.
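The sample moments $m\_{102}$ and $m\_{003}$ entering the bias estimator can be computed directly from the second-phase observations of a stratum. The sketch below is only an illustration of the definition of $m\_{pqr}$: the helper name `central_moment` and the toy data are assumptions, not taken from the chapter, and the divisor is m (not m - 1), as in the definition.

```python
def central_moment(x, y, z, p, q, r):
    """Sample central product moment
    m_pqr = (1/m) * sum_i (x_i - xbar)^p (y_i - ybar)^q (z_i - zbar)^r
    computed over the m second-phase units of one stratum.
    """
    m = len(x)
    if m == 0 or not (len(y) == len(z) == m):
        raise ValueError("x, y and z must be non-empty and of equal length")
    xbar, ybar, zbar = sum(x) / m, sum(y) / m, sum(z) / m
    total = sum(
        (xi - xbar) ** p * (yi - ybar) ** q * (zi - zbar) ** r
        for xi, yi, zi in zip(x, y, z)
    )
    return total / m


# Hypothetical toy observations for one stratum (illustrative only).
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
z = [1.0, 2.0, 4.0, 9.0]

m102 = central_moment(x, y, z, 1, 0, 2)  # sum (x - xbar)(z - zbar)^2 / m
m003 = central_moment(x, y, z, 0, 0, 3)  # third central moment of z
print(m102, m003)
```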

Motivated by the bias reduction techniques of Tracy et al. [5] and Bandyopadhyay and Singh [11], we derive the unbiased version of the proposed class of estimators $t\_p$, to the first order of approximation, in two-phase stratified sampling as

$$\mathbf{t'\_p} = \mathbf{t\_p} - \mathbf{b(t\_p)}$$

which becomes

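$$\begin{split} t'\_{p} = \sum\_{h=1}^{L} w\_{h} \Bigg[ & \overline{y}\_{h} \left\{ k\_{h} \frac{\overline{x}'\_{h}}{\overline{x}\_{h}} + (1 - k\_{h}) \frac{\overline{x}'\_{idh}}{\overline{x}\_{h}} \right\} \\ & - \overline{y}\_{h} \Bigg\{ (1 - k\_{h})\, b\_{xz\_{h}} \frac{\overline{z}\_{h}}{\overline{x}\_{h}} \left( f\_{2} \frac{s\_{xz\_{h}}}{\overline{x}\_{h} \overline{z}\_{h}} - f\_{1} \frac{s\_{yz\_{h}}}{\overline{y}\_{h} \overline{z}\_{h}} - f\_{2} \frac{m\_{102}}{s\_{xz\_{h}} \overline{z}\_{h}} - \frac{m\_{003}}{s\_{z\_{h}}^{2} \overline{z}\_{h}} \right) + f\_{3} \left( \frac{s\_{x\_{h}}^{2}}{\overline{x}\_{h}^{2}} - \frac{s\_{yx\_{h}}}{\overline{y}\_{h} \overline{x}\_{h}} \right) \Bigg\} \Bigg] \end{split} \tag{15}$$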

Thus, the variance of $t'\_p$ to the first order of approximation is obtained as

$$V\left(t'\_{p}\right) = M\left(t\_{p}\right) = \sum\_{h=1}^{L} w\_{h}^{2} \overline{Y}\_{h}^{2} \left[ f\_{1} C\_{y\_{h}}^{2} + k\_{h}^{2} a + 2 k\_{h} b + c \right] \tag{16}$$

From Eqs. (10) and (15), it is to be noted that the class of estimators $t'\_p$ is preferable to the class of estimators $t\_p$ in the two-phase sampling set-up, since $t'\_p$ is an unbiased (up to the first order of sample size) class of estimators of $\overline{Y}\_h$, while the class of estimators $t\_p$ is biased.

#### 7. Minimum variance of proposed class of estimators

It is obvious from Eq. (16) that the variance of the proposed class of estimators $t'\_p$ depends on the value of the constant $k\_h$. Therefore, we minimize the variance with respect to $k\_h$. The optimality condition under which the proposed class of estimators $t'\_p$ has minimum variance is obtained as

$$\mathbf{k}\_{\mathbf{h}} = -\frac{\mathbf{b}}{\mathbf{a}}\tag{17}$$

Development of Estimation Procedure of Population Mean in Two-Phase Stratified Sampling DOI: http://dx.doi.org/10.5772/intechopen.82850

Substituting the optimum value of the constant $k\_h$ in Eq. (16), we have the minimum variance of the class of estimators $t'\_p$ as

$$\text{Min.}V\left(t'\_{p}\right) = \sum\_{h=1}^{L} w\_{h}^{2} \overline{Y}\_{h}^{2} \left[ f\_{1} C\_{y\_{h}}^{2} - \frac{b^{2}}{a} + c \right] \tag{18}$$
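The optimum in Eq. (17) comes from setting the derivative of the bracketed quadratic $k\_h^2 a + 2 k\_h b + c$ to zero, which gives $2 k\_h a + 2 b = 0$. The following Python sketch evaluates this for one stratum. It is a sketch under stated assumptions: the function names are hypothetical, the constants a, b and c are coded as read from the definitions following Eq. (13), and the inputs are the Stratum 1 values of Population I from Section 8.1.

```python
def mse_terms(f2, f3, Cy, Cx, rho_yx, rho_yz, rho_xz):
    """Constants a, b, c of the bracket in Eq. (13), as read from the text."""
    a = f2 * rho_xz**2 * Cx**2
    cross = f2 * rho_yz * rho_xz * Cy * Cx
    b = cross - a
    c = f3 * Cx**2 - 2 * f3 * rho_yx * Cy * Cx + a - 2 * cross
    return a, b, c


def bracket(k, f1, Cy, a, b, c):
    """Per-stratum bracket f1*Cy^2 + k^2*a + 2*k*b + c of Eq. (13)."""
    return f1 * Cy**2 + k**2 * a + 2 * k * b + c


# Stratum 1 of Population I (Section 8.1).
N, n_first, n = 19, 11, 5
f1 = 1 / n - 1 / N
f2 = 1 / n_first - 1 / N
f3 = 1 / n - 1 / n_first
Cy, Cx = 0.28363, 0.17153
rho_yx, rho_yz, rho_xz = 0.81381, 0.9364, 0.9044

a, b, c = mse_terms(f2, f3, Cy, Cx, rho_yx, rho_yz, rho_xz)
k_opt = -b / a                               # Eq. (17)
v_min = bracket(k_opt, f1, Cy, a, b, c)      # bracket of Eq. (18)
print(k_opt, v_min)
```

Since a is positive whenever $\rho\_{xz\_h}$ and $C\_{x\_h}$ are non-zero, the quadratic is convex in $k\_h$ and the stationary point is a minimum; the bracket at `k_opt` is therefore never larger than at $k\_h = 0$ or $k\_h = 1$.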

#### 8. Efficiency comparison of the proposed strategy

It is important to investigate the performance of the proposed class of estimators with respect to the existing ones. We use two natural populations and one artificially generated population to demonstrate the merit of the proposed strategy.

#### 8.1 Empirical investigations through natural populations

The data sets of the two natural populations are presented below.

• Population I (Source: Murthy [12], p. 228)

y: Factory output in thousand rupees, x: Number of workers in the factory, and z: Fixed capital of factory in thousand rupees.

The data consist of 80 observations, which are divided into four strata according to the auxiliary variable z: (i) z ≤ 500, (ii) 500 < z ≤ 1000, (iii) 1000 < z ≤ 2000 and (iv) z > 2000. Proportional allocation is used to allocate the sample size to the different strata.

Stratum 1 (z ≤ 500)

$$\begin{aligned} N\_{1} &= 19, \; n\_{1}' = 11, \; n\_{1} = 5, \; \overline{Y}\_{1} = 2669.247, \; \overline{X}\_{1} = 65.15789, \\ \overline{Z}\_{1} &= 349.6842, \; C\_{y\_{1}} = 0.28363, \; C\_{x\_{1}} = 0.17153, \; C\_{z\_{1}} = 0.31299, \\ \rho\_{yx\_{1}} &= 0.81381, \; \rho\_{yz\_{1}} = 0.9364, \; \rho\_{xz\_{1}} = 0.9044 \end{aligned}$$

Stratum 2 (500 < z ≤ 1000)

$$\begin{aligned} N\_{2} &= 32, \; n\_{2}' = 17, \; n\_{2} = 8, \; \overline{Y}\_{2} = 4657.625, \; \overline{X}\_{2} = 139.9668, \\ \overline{Z}\_{2} &= 706.5938, \; C\_{y\_{2}} = 0.14366, \; C\_{x\_{2}} = 0.3169, \; C\_{z\_{2}} = 0.15457, \\ \rho\_{yx\_{2}} &= 0.8883, \; \rho\_{yz\_{2}} = 0.9259, \; \rho\_{xz\_{2}} = 0.8456 \end{aligned}$$

Stratum 3 (1000 < z ≤ 2000)

$$\begin{aligned} N\_{3} &= 14, \; n\_{3}' = 8, \; n\_{3} = 3, \; \overline{Y}\_{3} = 6537.214, \; \overline{X}\_{3} = 403.2143, \\ \overline{Z}\_{3} &= 1539.571, \; C\_{y\_{3}} = 0.06365, \; C\_{x\_{3}} = 0.20117, \; C\_{z\_{3}} = 0.18004, \\ \rho\_{yx\_{3}} &= 0.9295, \; \rho\_{yz\_{3}} = 0.9835, \; \rho\_{xz\_{3}} = 0.9366 \end{aligned}$$

Stratum 4 (z > 2000)

$$\begin{aligned} N\_{4} &= 15, \; n\_{4}' = 9, \; n\_{4} = 4, \; \overline{Y}\_{4} = 7843.667, \; \overline{X}\_{4} = 763.2, \\ \overline{Z}\_{4} &= 2620.533, \; C\_{y\_{4}} = 0.08232, \; C\_{x\_{4}} = 0.22464, \; C\_{z\_{4}} = 0.14156, \\ \rho\_{yx\_{4}} &= 0.9787, \; \rho\_{yz\_{4}} = 0.9692, \; \rho\_{xz\_{4}} = 0.9454 \end{aligned}$$
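To show how the stratum-level summaries of Population I combine, the sketch below computes the overall population means as weighted averages of the stratum means. It assumes the usual stratum weights $N\_h / N$, which this section does not state explicitly; the function name `stratified_mean` is likewise an assumption made for illustration.

```python
# (N_h, Ybar_h, Xbar_h, Zbar_h) for the four strata of Population I.
strata = [
    (19, 2669.247, 65.15789, 349.6842),
    (32, 4657.625, 139.9668, 706.5938),
    (14, 6537.214, 403.2143, 1539.571),
    (15, 7843.667, 763.2, 2620.533),
]


def stratified_mean(sizes, means):
    """Weighted mean sum_h (N_h / N) * mean_h, assuming weights N_h / N."""
    N = sum(sizes)
    return sum(N_h / N * mean_h for N_h, mean_h in zip(sizes, means))


sizes = [s[0] for s in strata]
Y_bar = stratified_mean(sizes, [s[1] for s in strata])
X_bar = stratified_mean(sizes, [s[2] for s in strata])
Z_bar = stratified_mean(sizes, [s[3] for s in strata])
print(sum(sizes), Y_bar, X_bar, Z_bar)
```

The stratum sizes add up to the 80 observations mentioned above, which is a useful consistency check on the tabulated values.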

• Population II (Source: Koyuncu and Kadilar [13]).

y: Number of teachers, x: Number of students in both primary and secondary schools, and z: Number of classes in both primary and secondary schools. There are 923 districts in 6 regions ((i) Marmara, (ii) Aegean, (iii) Mediterranean, (iv) Central Anatolia, (v) Black Sea and (vi) East and Southeast Anatolia) in Turkey in 2007 (source: The Turkish Republic Ministry of Education).

#### Marmara region

$$\begin{aligned} N\_{1} &= 127, \; n\_{1}' = 60, \; n\_{1} = 31, \; \overline{Y}\_{1} = 703.74, \; \overline{X}\_{1} = 20804.59, \\ \overline{Z}\_{1} &= 498.28, \; C\_{y\_{1}} = 1.25591, \; C\_{x\_{1}} = 1.46538, \; C\_{z\_{1}} = 1.115, \\ \rho\_{yx\_{1}} &= 0.936, \; \rho\_{yz\_{1}} = 0.97891, \; \rho\_{xz\_{1}} = 0.93958 \end{aligned}$$

#### Aegean region

$$\begin{aligned} N\_{2} &= 117, \; n\_{2}' = 40, \; n\_{2} = 21, \; \overline{Y}\_{2} = 413, \; \overline{X}\_{2} = 9211.79, \\ \overline{Z}\_{2} &= 318.83, \; C\_{y\_{2}} = 1.56155, \; C\_{x\_{2}} = 1.64797, \; C\_{z\_{2}} = 1.14804, \\ \rho\_{yx\_{2}} &= 0.996, \; \rho\_{yz\_{2}} = 0.97624, \; \rho\_{xz\_{2}} = 0.96958 \end{aligned}$$

#### Mediterranean

$$\begin{aligned} N\_{3} &= 103, \; n\_{3}' = 50, \; n\_{3} = 29, \; \overline{Y}\_{3} = 573.17, \; \overline{X}\_{3} = 14309.3, \\ \overline{Z}\_{3} &= 431.36, \; C\_{y\_{3}} = 1.80307, \; C\_{x\_{3}} = 1.9253, \; C\_{z\_{3}} = 1.42097, \\ \rho\_{yx\_{3}} &= 0.994, \; \rho\_{yz\_{3}} = 0.98351, \; \rho\_{xz\_{3}} = 0.97655 \end{aligned}$$

#### Central Anatolia region

$$\begin{aligned} N\_{4} &= 170, \; n\_{4}' = 75, \; n\_{4} = 38, \; \overline{Y}\_{4} = 424.66, \; \overline{X}\_{4} = 9478.85, \\ \overline{Z}\_{4} &= 311.32, \; C\_{y\_{4}} = 1.90878, \; C\_{x\_{4}} = 1.92206, \; C\_{z\_{4}} = 1.47124, \\ \rho\_{yx\_{4}} &= 0.983, \; \rho\_{yz\_{4}} = 0.98296, \; \rho\_{xz\_{4}} = 0.96362 \end{aligned}$$

#### Black Sea region

$$\begin{aligned} N\_{5} &= 205, \; n\_{5}' = 40, \; n\_{5} = 25, \; \overline{Y}\_{5} = 267.03, \; \overline{X}\_{5} = 5569.95, \\ \overline{Z}\_{5} &= 227.20, \; C\_{y\_{5}} = 1.51162, \; C\_{x\_{5}} = 1.52564, \; C\_{z\_{5}} = 1.14811, \\ \rho\_{yx\_{5}} &= 0.989, \; \rho\_{yz\_{5}} = 0.96434, \; \rho\_{xz\_{5}} = 0.96725 \end{aligned}$$

The percentage relative efficiencies (PRE) of the proposed class of estimators $t'\_p$ with respect to different estimators, under their respective optimum conditions, are shown in Table 1.

| Estimator | PRE (Population I) | PRE (Population II) |
|---|---|---|
| $\bar{y}\_{st}$ | 173.3608 | 192.951 |
| $\bar{y}^{(h)}\_{RP}$ | 101.1429 | 131.5654 |
| $\bar{y}\_{h}^{cs}$ | 118.3215 | 172.226 |

We use the following expression to obtain the percent relative efficiency (PRE) of the proposed estimator $t'\_p$ with respect to different estimators: $\text{PRE} = \frac{V(y)}{\text{Min.}V(t'\_p)} \times 100$.

#### Table 1.

PRE of the proposed estimator $t'\_p$ with respect to different estimators through data set of natural population.

#### 8.2 Empirical investigations through artificially generated population

An important aspect of simulation is that one builds a simulation model to replicate the actual system. Simulation allows comparison of analytical techniques and helps in concluding whether a newly developed technique is better than the existing ones. Motivated by Singh and Deo [14], Singh et al. [15] and Maji et al. [16], who adopted artificial population generation techniques, we have generated five sets of independent random numbers of size N (N = 100), namely $x'\_{1k}$, $y'\_{1k}$, $x'\_{2k}$, $y'\_{2k}$ and $z'\_{k}$ $(k = 1, 2, 3, \ldots, N)$, from a standard normal distribution with the help of R software. By varying the correlation coefficients $\rho\_{yx}$ and $\rho\_{xz}$, we have generated the following transformed variables of the population U with the values $\sigma\_{y}^{2} = 50$, $\mu\_{y} = 10$, $\sigma\_{x}^{2} = 100$, $\mu\_{x} = 50$, $\sigma\_{z}^{2} = 50$ and $\mu\_{z} = 20$ as

$$\begin{split} y\_{1k} &= \mu\_{y} + \sigma\_{y} \left[ \rho\_{yx} x'\_{1k} + \sqrt{1 - \rho\_{yx}^{2}} \; y'\_{1k} \right], \\ x\_{1k} &= \mu\_{x} + \sigma\_{x} x'\_{1k}, \\ z\_{k} &= \mu\_{z} + \sigma\_{z} \left[ \rho\_{xz} x'\_{1k} + \sqrt{1 - \rho\_{xz}^{2}} \; z'\_{k} \right], \\ y\_{2k} &= y\_{1k} \quad \text{and} \quad x\_{2k} = x\_{1k}. \end{split}$$
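The transformation above can be sketched in a few lines of code. The authors report using R; the version below is a Python rendering for illustration only, using the standard library. The function name `generate_population`, the seed, and the correlation values 0.8 and 0.7 are assumptions, since the chapter does not list the correlations it used. Because $y\_{2k} = y\_{1k}$ and $x\_{2k} = x\_{1k}$, only three standard normal draws per unit are needed here.

```python
import math
import random


def generate_population(N, rho_yx, rho_xz, seed=None):
    """Generate N units (y, x, z) with the transformation given above.

    Means and variances follow the text: mu_y = 10, var_y = 50,
    mu_x = 50, var_x = 100, mu_z = 20, var_z = 50.
    """
    if abs(rho_yx) > 1 or abs(rho_xz) > 1:
        raise ValueError("correlation coefficients must lie in [-1, 1]")
    rng = random.Random(seed)
    mu_y, sd_y = 10.0, math.sqrt(50.0)
    mu_x, sd_x = 50.0, math.sqrt(100.0)
    mu_z, sd_z = 20.0, math.sqrt(50.0)
    population = []
    for _ in range(N):
        # Independent standard normal draws x'_1k, y'_1k, z'_k.
        x0 = rng.gauss(0.0, 1.0)
        y0 = rng.gauss(0.0, 1.0)
        z0 = rng.gauss(0.0, 1.0)
        x = mu_x + sd_x * x0
        y = mu_y + sd_y * (rho_yx * x0 + math.sqrt(1 - rho_yx**2) * y0)
        z = mu_z + sd_z * (rho_xz * x0 + math.sqrt(1 - rho_xz**2) * z0)
        population.append((y, x, z))
    return population


# Hypothetical correlation values and seed, chosen only for illustration.
population = generate_population(100, rho_yx=0.8, rho_xz=0.7, seed=1)
print(len(population), population[0])
```

With a finite population of 100 units, the realised means, variances and correlations will differ somewhat from the target values; the construction only fixes them in expectation.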

We have split the total population of size N = 100 into 5 strata, each of size 20 [i.e., $N\_{h} = 20$ $(h = 1, 2, \ldots, 5)$], taking the units sequentially, and consider $n\_{h}' = 12$ and $n\_{h} = 8$ $(h = 1, 2, \ldots, 5)$ for the efficiency comparison of the proposed strategy.
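The sampling design just described can be sketched as follows. This is a minimal illustration, assuming simple random sampling without replacement at both phases, which is the usual two-phase set-up but is not restated in this section; the helper names and the seed are hypothetical, and unit indices stand in for the generated observations.

```python
import random


def sequential_strata(units, L):
    """Split the units into L consecutive strata of equal size."""
    N = len(units)
    if L <= 0 or N % L != 0:
        raise ValueError("number of units must be a positive multiple of L")
    size = N // L
    return [units[h * size:(h + 1) * size] for h in range(L)]


def two_phase_sample(stratum, n_first, n_second, rng):
    """Draw a first-phase sample of size n'_h from the stratum, then a
    second-phase subsample of size n_h from it, both without replacement."""
    if not (0 < n_second <= n_first <= len(stratum)):
        raise ValueError("expected 0 < n_h <= n'_h <= N_h")
    first = rng.sample(stratum, n_first)
    second = rng.sample(first, n_second)
    return first, second


rng = random.Random(2019)        # arbitrary seed, for reproducibility only
units = list(range(100))         # indices of the N = 100 generated units
strata = sequential_strata(units, L=5)
samples = [two_phase_sample(s, 12, 8, rng) for s in strata]
for h, (first, second) in enumerate(samples, start=1):
    print(h, sorted(first), sorted(second))
```

Auxiliary information would be recorded on the first-phase units and the study variable only on the second-phase units of each stratum.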

The percentage relative efficiencies of the proposed class of estimators $t'\_p$ with respect to different estimators (under their respective optimum conditions), derived from the data set of the artificially generated population, are shown in Table 2.

| Estimator | PRE (artificially generated population) |
|---|---|
| $\bar{y}\_{st}$ | 179.623 |
| $\bar{y}^{(h)}\_{RP}$ | 128.256 |
| $\bar{y}\_{h}^{cs}$ | 154.879 |

We use the following expression to obtain the percent relative efficiency (PRE) of the proposed estimator $t'\_p$ with respect to different estimators: $\text{PRE} = \frac{V(y)}{\text{Min.}V(t'\_p)} \times 100$.

#### Table 2.

PRE of the proposed estimator $t'\_p$ with respect to different estimators through data set of artificially generated population.

#### 9. Conclusion

From the construction of the estimation strategy and the efficiency comparison of the proposed methodology, the following points are noted.

1. From Table 1, it is clear that the proposed class of estimators is at least 1% better than the existing ones in estimating the population mean.

2. Similarly, from Table 2 it is found that the new estimator is at least 28% better than the existing ones.

3. It may also be noted from Tables 1 and 2 that the artificially generated population is homogeneous (the mean and variance of the respective variables are almost the same for different strata), whereas the natural populations are heterogeneous (the mean and variance of the respective variables are different for different strata) in nature. Our suggested estimators perform with equal efficiency for both types.

4. The unbiased version of the proposed technique has been obtained, which makes the proposed class of estimators much more practicable.

Thus, it is found that the proposed estimation technique has addressed the problems of estimation through two-phase stratified sampling and may be fruitful for real-life applications where the population is especially heterogeneous in nature and stratification is essential. Due to the benefits achieved by the new estimator, the survey statistician may be suggested to use it.

#### Author details

Partha Parichha<sup>1</sup>, Kajla Basu<sup>2</sup> and Arnab Bandyopadhyay<sup>1</sup>\*

1 Department of Mathematics, Asansol Engineering College, Asansol, India

2 Department of Mathematics, National Institute of Technology, Durgapur, India

\*Address all correspondence to: arnabbandyopadhyay4@gmail.com

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### References

[1] Sukhatme B. Some ratio type estimators in two-phase sampling. Journal of the American Statistical Association. 1962;57:628-632

[2] Chand L. Some ratio type estimators based on two or more auxiliary variables [unpublished PhD thesis]. Ames, Iowa (USA): Iowa State University; 1975

[3] Kiregyera B. A chain ratio type estimators in finite population double sampling using two auxiliary variables. Metrika. 1980;17:217-223

[4] Kiregyera B. Regression type estimators using two auxiliary variables and the model of double sampling from finite populations. Metrika. 1984;31: 215-226

[5] Tracy DS, Singh HP, Singh R. An alternative to the ratio-cum-product estimator in sample surveys. Journal of Statistical Planning and Inference. 1996; 53:375-387

[6] Singh HP, Espejo MR. Double sampling ratio-product estimator of a finite population mean in sampling surveys. Journal of Applied Statistics. 2007;34(1):71-85

[7] Gupta S, Shabbir J. On the use of transformed auxiliary variables in estimating population mean by using two auxiliary variables. Journal of Statistical Planning and Inference. 2007;137:1606-1611

[8] Shukla D, Pathak S, Thakur NS. Estimation of population mean using two auxiliary sources in sample surveys. Statistics in Transition. 2012;13(1):21-36

[9] Choudhury S, Singh BK. A class of chain ratio–product type estimators with two auxiliary variables under double sampling scheme. Journal of the Korean Statistical Society. 2012;41: 247-256

[10] Parichha P, Basu K, Bandyopadhyay A, Mukhopadhyay P. Development of efficient estimation technique for population mean in two phase sampling using fuzzy tools. Journal of Applied Mathematics, Statistics and Informatics. 2017;13(2):5-28. DOI: 10.1515/jamsi-2017-0006

[11] Bandyopadhyay A, Singh GN. Predictive estimation of population mean in two-phase sampling. Communications in Statistics: Theory and Methods. 2016;45(14):4249-4267. DOI: 10.1080/03610926.2014.919396

[12] Murthy MN. Sampling Theory and Methods. Calcutta: Statistical Publishing Society; 1967

[13] Koyuncu N, Kadilar C. Family of estimators of population mean using two auxiliary variables in stratified sampling. Communications in Statistics: Theory and Methods. 2009;38:2398-2417

[14] Singh S, Deo B. Imputation by power transformation. Statistical Papers. 2003;44:555-579

[15] Singh S, Joarder AH, Tracy DS. Median estimation using double sampling. Australian & New Zealand Journal of Statistics. 2001;43(1):33-46

[16] Maji R, Singh GN, Bandyopadhyay A. Estimation of population mean in presence of random non-response in two-stage cluster sampling. Communications in Statistics: Theory and Methods. 2018. DOI: 10.1080/03610926.2018.1478101

Chapter 4

Methods of Russian Patent Analysis

Dmitriy Korobkin, Sergey Vasiliev, Sergey Fomenkov and S.G. Kolesnikov

#### Abstract

The article presents a method for extracting predicate-argument constructions that characterize the composition of the structural elements of inventions and the relationships between them. The extracted structures are converted into a domain ontology and used in prior art patent search and in information support of automated invention. Existing natural language processing (NLP) tools are analyzed with respect to the processing of Russian-language patents. A new method for extracting structured data from patents is proposed; it takes into account the specifics of patent text and is based on shallow parsing and segmentation of sentences. The F1 score of the data extraction is 63% under a rigorous estimate and 79% under a lax estimate. The results obtained suggest that the proposed method is promising.

Keywords: patents, information extraction, SAO, ontology, prior art patent search

#### 1. Introduction

The number of patents and patent applications increases from year to year. The growing flow of applications and a worldwide set of more than 20 million granted patents (from 1980 to 2015) increase the time that patent examiners have to spend examining all incoming applications. The automation of invention development has also been gaining momentum, and computer-aided invention (CAI) systems are used to search for new technical solutions. The success of such systems largely depends on the completeness of the ontologies of the subject areas and of the various knowledge bases that allow new technical solutions to be generated. The task of prior art patent search and information support for the synthesis of new technical solutions can be seen as the task of extracting subject-action-object (SAO) semantic structures. Patent claims are considered to be a direct source of data for retrieval. They express the essence of the invention, and therefore they are of the greatest interest for SAO extraction.

In [1], the authors proposed a methodology for solving the problem of prior art patent search, consisting of a statistical and semantic analysis of patent documents, machine translation of the patent application, and calculation of the semantic similarity between the application and patents on the basis of subject-action-object (SAO) triples. At the semantic analysis step, the authors applied a new method for building a semantic representation of SAO on the basis of meaning-text theory [2]. At the semantic similarity calculation step, the authors compare the SAOs from

### Chapter 4
