278 Current Issues of Water Management

Fig. 4. Size structure of water supply utilities in Germany (ATT et al., 2009, p. 14)

**5. Methodological approach**

To achieve robust modelling results, we follow a three-step approach. First, we cluster the observations by size. In doing so, we implicitly assume that small companies have a different production technology than larger ones. Second, we carry out theoretical and empirical analyses of the potential variables to develop reasonable input-output combinations for the subsequent modelling. The modelling itself is then performed in the third subsection of Section 5.

**5.1 Clustering**

Rödl & Partner, the consultancy that performs the benchmarking for several German *Bundesländer* and that provided the data for calculating the efficiencies, has clustered all participating companies according to their annual volume of accounted water. In workshops with water supply companies, it was agreed to form three groups. The first cluster comprises 38 companies with an annual water delivery of up to 500,000 m³, the second comprises 97 companies with an annual water delivery between 500,000 m³ and 2,500,000 m³, and the third comprises all remaining companies, with an annual water delivery of up to 50,000,000 m³ (61 companies).

Such a differentiation according to company size is extremely important. Our models will later reveal that the production functions of the three groups differ. Thus, a data set should always contain enough observations to allow groups to be formed.

**5.2 Variables and cost driver analysis**

We assume that the objective of a water service provider is not solely to produce drinking water. Rather, the objective is to give clients the option to use as much drinking water as they wish at any time. This implies that the set-up of the network, with its transportation and distribution pipes, tanks, pumps, valves and service areas, should also be treated as an output which, at least in the short run, the company cannot influence.

Correlation analysis provides an initial determination of the statistical relationship between costs, on the one hand, and potential explanatory variables reflecting the specific conditions faced by each water supplier, on the other. Such correlation analyses are the basis for estimating costs as a function of multiple drivers (i.e., regressors), as they help to specify the efficiency models later on. This step ensures that the exogenous variables, such as outputs and cost drivers, sufficiently explain the endogenous variable, costs.<sup>12</sup>
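The screening step described here can be illustrated with a minimal sketch that ranks candidate cost drivers by the absolute value of their Pearson correlation with costs. All figures and variable names below are hypothetical illustration data, not values from the benchmarking data set, and the sketch does not claim to reproduce the exact procedure used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for six utilities (assumption, for illustration only).
costs = [1.2, 2.1, 2.9, 4.2, 5.1, 6.3]  # operational distribution costs
drivers = {
    "distribution_pipes_km": [110, 205, 290, 410, 520, 600],
    "household_connections_k": [4.0, 7.5, 11.0, 15.5, 19.0, 24.0],
    "water_losses_pct": [6.0, 7.1, 5.9, 6.8, 6.2, 6.6],
}

# Rank the candidate drivers by the strength of their correlation with costs.
ranking = sorted(
    ((name, pearson(values, costs)) for name, values in drivers.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name:25s} r = {r:+.3f}")
```

In this made-up sample the two network-size variables rank far above water losses, which mirrors the kind of pattern the correlation step is meant to reveal before any regression is specified.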

Analyses revealed, particularly for the bigger companies, that the five variables of group one in Table 1 are highly correlated, both with operational distribution costs and with one another. Both common technical understanding and the empirical literature stress the explanatory power of these variables.<sup>13</sup> It thus makes sense always to include at least one of these variables in the DEA or SFA specification of a cost or production function model. The variables of groups two to four were tested for additional explanatory value.
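Why it can suffice to include only one of several highly correlated size variables may be seen from the variance inflation factor (VIF). The following sketch uses the standard result that the VIF of regressor j equals the j-th diagonal element of the inverse correlation matrix of the regressors. The data and variable names are hypothetical, and the helper functions are illustrative rather than the routines used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def variance_inflation_factors(columns):
    """VIF per regressor: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[1.0 if a == b else pearson(columns[a], columns[b]) for b in names]
            for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}

# Hypothetical regressors for six utilities (assumption, for illustration only).
regressors = {
    "distribution_pipes_km": [110, 205, 290, 410, 520, 600],
    "household_connections_k": [4.0, 7.5, 11.0, 15.5, 19.0, 24.0],
    "inhabitants_per_km2": [31, 24, 35, 22, 30, 27],
}
vifs = variance_inflation_factors(regressors)
for name, value in vifs.items():
    print(f"{name:25s} VIF = {value:8.2f}")
```

A common rule of thumb treats VIF values above roughly 10 as a sign of problematic multicollinearity. In this made-up sample the two size variables exceed that threshold by a wide margin, while the density variable does not.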

Walter et al. (2010, p. 228) refer to a number of studies that demonstrate the significance of "water losses" as an explanatory variable. For countries like Brazil, Spain or Peru, this may well be important because network quality varies widely. For a country like Germany, however, where water losses average only about 6.5% (ATT et al., 2011, p. 56),<sup>14</sup> water losses cannot serve as a good proxy for network quality or for the associated operational costs.

The two variables "downturn of demand since 1992" and "downturn of demand since 1998" are certainly of interest for explaining the development of prices. Many companies that face a significant decrease in demand, for whatever reason, need to increase prices if they lack appropriate tariff models. Too often, only a minor share of total fixed costs is covered by revenues that are independent of actual demand. For cost benchmarking, however, and particularly for operational distribution costs, these variables are insignificant.
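The pricing mechanism mentioned in this paragraph can be made concrete with a small worked example. The sketch assumes a simple two-part tariff in which a demand-independent base fee covers part of the fixed costs and a volumetric price covers the rest. All figures, names and the 20% demand decline are hypothetical assumptions chosen for illustration.

```python
def cost_covering_price(volume_m3, fixed_costs, fixed_revenue, variable_cost_per_m3):
    """Volumetric price (EUR per m3) at which total revenue equals total cost."""
    uncovered_fixed = fixed_costs - fixed_revenue
    return (uncovered_fixed + variable_cost_per_m3 * volume_m3) / volume_m3

# Hypothetical utility (assumption): 8 million EUR fixed costs, 0.40 EUR/m3 variable cost.
FIXED_COSTS = 8_000_000.0
VARIABLE_COST = 0.40
VOLUME_BEFORE = 5_000_000   # m3 per year
VOLUME_AFTER = 4_000_000    # m3 per year after a 20% decline in demand

for label, base_fee_revenue in [("low base fee", 1_000_000.0), ("high base fee", 6_000_000.0)]:
    before = cost_covering_price(VOLUME_BEFORE, FIXED_COSTS, base_fee_revenue, VARIABLE_COST)
    after = cost_covering_price(VOLUME_AFTER, FIXED_COSTS, base_fee_revenue, VARIABLE_COST)
    change = 100.0 * (after / before - 1.0)
    print(f"{label:14s} {before:.2f} -> {after:.2f} EUR/m3 ({change:+.1f}%)")
```

The smaller the share of fixed costs recovered through demand-independent revenue, the larger the price increase needed to stay cost-covering after the same decline in demand. Costs themselves barely change in this example, which is consistent with the observation that the demand variables matter for prices rather than for cost benchmarking.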

Whereas none of the variables of the fourth group was considered any further, the variables of the third group were tested in DEA and SFA specifications wherever a certain combination of variables made sense from a water engineering perspective. In particular, the client structure ("Household supply relative to accounted water") is quite often used to explain differences in both operational distribution costs and total costs. We, however, found that this criterion did not have any significant explanatory power. The reason might be that the companies within each of our three groups are actually quite homogeneous, which again supports our hypothesis that companies need to be analyzed in groups. Our findings might have been different had we followed the same path as other researchers who did not have such a detailed database, in terms of both quality and quantity, and were therefore unable to cluster their observations.

Table 1. Variables for explaining operational distribution costs

<sup>12</sup> Tests for heteroskedasticity (Breusch-Pagan/Cook-Weisberg test) and multicollinearity (Variance Inflation Factor, VIF) have been applied to verify the general conditions of multivariate regression analysis and, specifically, of Ordinary Least Squares.

<sup>13</sup> Besides the literature discussed in Walter et al. (2010), see also Lin (2005), Picazo-Tadeo et al. (2009) and Coelli & Walding (2006). All of them, however, apply either DEA or SFA only. Owing to rather poor data quality, they were also unable to analyze anything other than total costs.

<sup>14</sup> For more detailed data on German water losses see IGES (2010, p. 30).

Analysis of the Current German Benchmarking Approach and Its Extension with Efficiency Analysis Techniques 281

**5.3 Methods**

Based on the definition of relevant cost drivers, we apply a parametric and a non-parametric benchmarking approach, namely SFA and DEA (compare Section 3). Because DEA is sensitive to extreme values, an outlier analysis is applied in addition. In this analysis, firms that turn out to be most efficient for many of the observations are iteratively taken out of the sample and, hence, out of the efficiency analysis. The process stops when the average efficiency of all water suppliers, including the potential outliers, is not statistically different (at the 95% confidence level) from the average efficiency excluding the potential outliers. A t-test (according to Satterthwaite) is used to compare the expected values. Identified outliers are removed from the sample.

Multiple specifications of SFA models are estimated in order to compare specifications that showed similar correlation coefficients in earlier phases of the analysis. To judge whether one specification fits better than another, Akaike's and Schwarz's information criteria are used, together with a comparison of the log-likelihood values. Where parameters are insignificant, a likelihood ratio test is performed. We also test different functional forms: Cobb-Douglas, Translog, and log-linear models.

**5.4 Results**

The results of the DEA and SFA analyses show that a combination of the variables Distribution pipes, Distribution pipes per household connection and Distribution pipes to accounted water (excl. re-distribution) is particularly well suited to an efficiency evaluation of operational distribution costs in the group of the largest water suppliers (2.5 million m³ up to 50 million m³ annual supply). All three indicators are significant at a confidence level of at least 99%. Moreover, the combination of these three variables explains about 70% of operational distribution costs in this group of firms (R² = 0.706). By comparison, the English water regulation authority OFWAT sometimes uses models with less than 30% explanatory power. Last but not least, the signs of the coefficients for Distribution pipes and Distribution pipes per household connection are positive, as expected: increasing absolute as well as relative grid length each independently increases costs. Only the regressor Distribution pipes to accounted water (excl. re-distribution) is expected to have a negative influence on costs, as it increases with population per km². Because Distribution pipes per household connection and Distribution pipes to accounted water (excl. re-distribution) are modelled simultaneously, the result can additionally be interpreted to mean that the costs of a one-unit increase in grid length overcompensate the grid density advantages.

The best model for the largest companies is displayed in the following table.

| Variable | Coefficient | Standard Deviation |
|---|---|---|
| ln (Distribution pipes) | 1.377\*\*\* | 0.119 |
| ln (Distribution pipes to accounted water (excl. re-distribution)) | 0.861\*\*\* | 0.133 |
| ln (Distribution pipes per household connection) | 1.27\*\*\* | 0.229 |

Table 2. SFA-Model Large Companies (2.5-50 million m³ per year) for operational distribution costs
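As a reading aid for the logged regressors in Table 2, the following minimal sketch assumes a log-log (Cobb-Douglas type) cost function in which ln(cost) is linear in the logged regressors, so that each coefficient can be read as a cost elasticity. Only the coefficient of 1.377 on ln (Distribution pipes) is taken from the table. The function name and the 10% scenario are illustrative assumptions, and the functional form is an assumption about the reported model rather than a confirmed detail.

```python
import math

# Coefficient on ln (Distribution pipes) as reported in Table 2.
BETA_DISTRIBUTION_PIPES = 1.377

def cost_factor(relative_change, beta):
    """Multiplicative change in cost when one logged regressor changes by
    `relative_change` (0.10 means +10%) and all other regressors stay fixed.

    Assumes ln(cost) = const + beta * ln(x) + ..., i.e. a log-log form.
    """
    return math.exp(beta * math.log(1.0 + relative_change))

factor = cost_factor(0.10, BETA_DISTRIBUTION_PIPES)
print(f"+10% distribution pipes -> costs change by {100 * (factor - 1):+.1f}%")
```

Because the other two regressors are ratios that also contain pipe length, holding them fixed implies that household connections and accounted water grow in proportion to the network. Under that reading, an elasticity above one means costs rise more than proportionally with scale.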
