2.12 Modeling spatially correlated error

from the seven initial models were then added in order of decreasing adjusted R<sup>2</sup>

Adjusted R<sup>2</sup> for the eight initial linear regression models. All regressions include an intercept plus the variables

Because the effect of FWI from one river on a given location in PS could change based on the FWI from another river during the same time period, we evaluated the addition of the 6 pair-wise interactions among the four 1wkFWII\_rit, the 6 pair-wise interactions among the four 2moFWII\_rit, and the twelve interactions between the 1wkFWII\_rit and the 2moFWII\_rit, excluding interactions of one river's 1wkFWII\_rit with its own 2moFWII\_rit. Despite a decrease in error degrees of freedom by 24,

Spatial coordinate variables were evaluated last in groups according to their polynomial order, with squared and cubic terms added before interactions. We considered these variables last because we wanted to include them only if they explained additional variability in the response after more interpretable variables

it, and interactions northingit <sup>∗</sup> eastingit, northing<sup>2</sup>

To build the time model, we followed the same procedure described above, selecting for the base of the mean function a set of time period indicator variables because a linear regression of salit on these variables had an adjusted R2 of 0.41 (Table 1). (Note that such a model is equivalent to fitting an ANOVA model using the time periods as groups.) Again, we added other sets of explanatory variables in

timeper\_τ<sup>t</sup> f g , τ ¼ 1, … , 39 , 2moFWII\_rit f g ,r ¼ 1, … , 4 , closest\_inlet\_distit, and

timeper\_τ<sup>t</sup> f g , τ ¼ 1, … , 39 and the 2moFWII\_rit f g ,r ¼ 1, … , 4 were added, the

. The final process model mean func-

; eastingit, easting<sup>2</sup>

. Before evaluating interactions, the mean trend time

it,

it ∗ eastingit, and

Following this procedure, the mean trend model grew to contain 10 variables — 2moFWII\_rit f g ,r ¼ 1, … , 4 , closest\_inlet\_distit, 1wkFWII\_rit f g ,r ¼ 1, … , 4 , and

Explanatory variable or set of explanatory variables Adj R<sup>2</sup> closest\_inlet\_distit 0.34 categoryt 0.049 inverse\_days\_surveyt 0.035 num\_stormst 0.029 montht 0.015 1wkFWII\_rit,r ¼ 1, … , 4 0.27 2moFWII\_rit,r ¼ 1, … , 4 0.38 timeper\_τt, τ ¼ 1, … , 39 0.41

Lagoon Environments Around the World - A Scientific Perspective

—with adjusted R<sup>2</sup> 0.57.

were included. We determined that including all variables except

it increased the adjusted R<sup>2</sup>

<sup>1</sup>wkFWII\_rit <sup>∗</sup> <sup>2</sup>moFWII\_qit,<sup>r</sup> 6¼ <sup>q</sup> ; inverse\_days\_surveyt

model had an adjusted R<sup>2</sup> of 0.78 and contained 48 variables:

1wkFWII\_rit f g ,r ¼ 1, … , 4 . When interactions among the

tion thus had an adjusted R<sup>2</sup> of 0.73 and included the following:

2moFWII\_rit f g ,r ¼ 1, … , 4 ; closest\_inlet\_distit; 1wkFWII\_rit f g ,r ¼ 1, … , 4 ; <sup>1</sup>wkFWII\_rit <sup>∗</sup> <sup>1</sup>wkFWII\_qit,<sup>r</sup> 6¼ <sup>q</sup> ; 2moFWII\_rit <sup>∗</sup> <sup>2</sup>moFWII\_qit,<sup>r</sup> 6¼ <sup>q</sup> ;

adjusted R2 was 0.66, so the set was retained.

inverse\_days\_surveyt

Table 1.

listed.

northing<sup>2</sup>

it ∗ easting<sup>2</sup>

it.

order of decreasing adjusted R<sup>2</sup>

northingit, northing<sup>2</sup>

northingit ∗ easting<sup>2</sup>

2.11 Time model

168

.

The variable selection analyses above used ordinary least squares (OLS) regression to model salinity as a function of explanatory variables. That model can be written as

$$\text{scal}\_{\text{it}} = \boldsymbol{\beta}\_0 + \boldsymbol{\beta}\_1 \mathbf{x}\_{\text{1it}} + \boldsymbol{\beta}\_2 \mathbf{x}\_{2\text{it}} + \dots \\ \boldsymbol{\beta}\_P \mathbf{x}\_{\text{Pi}} + \boldsymbol{\varepsilon}\_{\text{it}}, \ t = 1, \dots, 40, \ i = 1, \dots n\_t \tag{2}$$

where xpit represents the value of the pth explanatory variable at space–time location it, for p ¼ 1, … P, where P is the total number of explanatory variables. β0, β1, … , β<sup>P</sup> represent the intercept and regression coefficients, and deviations from the mean trend εit are assumed to be independent and identically distributed <sup>ε</sup>it � <sup>N</sup> 0, <sup>σ</sup><sup>2</sup> ð Þ with mean 0 and variance <sup>σ</sup>2. The model can be equivalently written as salit <sup>¼</sup> <sup>x</sup><sup>T</sup> itβ þ ειτ, , where xit is the ð Þ� P þ 1 1 vector containing the values of the explanatory variables at space-time location it, and β represents the ð Þ� P þ 1 1vector of regression coefficients. The same model written in matrix form is

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\mathfrak{f}} + \mathbf{e}, \ \mathbf{e} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}),\tag{3}$$

where bold print indicates vectors so that Y, ε, and 0 are N � 1 vectors containing, respectively, all observations of salinity in the space–time domain, all deviations from the mean function, and all zeros. X is the N � ð Þ P þ 1 design matrix whose rows represent space-time locations and whose columns contain the values of the explanatory variables (with a column of ones for the intercept), and I is the N � N identity matrix. Since a histogram of salinity observations is somewhat symmetric and bell-shaped, use of the normal distribution is justified.

Rarely, however, does the assumption of independent and identically distributed errors hold for observations of natural phenomena associated with locations in space and time. While it is intuitive that values of salinity located close together in space should be similar, it is also generally the case that the deviations from the mean function of observations located close together are similar. That similarity is referred to as spatial covariance, and the spatial covariance between deviations from the mean trend at two locations within the same time period can be modeled as a function of the distance separating them. Including in the overall model both a deterministic mean function and a spatial covariance function allowed predictions of salinity at locations where there were no observations.

Σ ¼

Process-Based Statistical Models Predict Dynamic Estuarine Salinity

DOI: http://dx.doi.org/10.5772/intechopen.89911

parameters and allowing a mean function to be fit. Diagonal elements

covariance between sites i and j in time period t.

Yo Yu � �

� N

cross-covariance between observed and unobserved locations. Thus,

Σou

0 Σou

t � �

<sup>Y</sup><sup>u</sup> <sup>Y</sup><sup>o</sup> <sup>j</sup> , <sup>β</sup>, <sup>ψ</sup> � N X<sup>u</sup><sup>β</sup> <sup>þ</sup> <sup>Σ</sup>uo <sup>Σ</sup><sup>o</sup> ð Þ�<sup>1</sup> <sup>Y</sup><sup>o</sup> � <sup>X</sup><sup>o</sup> ð Þ<sup>β</sup> , <sup>Σ</sup><sup>u</sup> � <sup>Σ</sup>uo <sup>Σ</sup><sup>o</sup> ð Þ�<sup>1</sup>

Let ψ represent the vector that contains all spatial covariance parameters for every time period—either 80 or 81 parameters depending on whether a nugget effect is used. Standard normal distribution theory gives the distribution of unobserved salinity Y<sup>u</sup> conditioned on knowing the values of observations Y<sup>o</sup> and

<sup>Σ</sup>ou <sup>¼</sup>

and <sup>Σ</sup>uo <sup>¼</sup> <sup>Σ</sup>ou ð Þ<sup>T</sup>. The elements <sup>Σ</sup>ou

covariance function.

171

all of the parameter values:

salinity, given by

Σ<sup>1</sup> 000 0 Σ<sup>2</sup> 0 0 0 0 ⋱ 0 000 Σ<sup>T</sup>

where zero matrices for off-diagonal elements indicate that deviations in one time period are not correlated with those in another. We make this assumption partially due to the long time span separating June and September, but also because no SAS® procedure has the capacity to model such space-time correlation while at the same time allowing every time period to have different spatial covariance

Σ1, Σ2, … , Σ<sup>t</sup> … , Σ<sup>T</sup> are individual spatial covariance matrices for each time period with dimensions nt � nt, and elements <sup>Σ</sup><sup>t</sup> ½ �ij <sup>¼</sup> Cov <sup>ε</sup>it, <sup>ε</sup>jt � � representing the spatial

Understanding how predictions of salinity and prediction standard errors are generated from this model will make the results and analysis in Sections 6 and 7 easier to understand. To predict salinity at space–time locations where it is not observed, the following results are needed. Superscripts differentiate between locations where salinity is observed and unobserved. Model (4), represents observations of salinity (by virtue of the dimensions of the vectors and matrices), but we model salinity observations and unobserved values of salinity at other space-time locations using a similar model, the joint distribution of unobserved and observed

> Xo β X<sup>u</sup>β

Here, <sup>Y</sup><sup>o</sup> represents the <sup>N</sup> � 1 vector of salinity observations, and, letting <sup>N</sup><sup>u</sup> represent the number of space-time locations at which we want to predict salinity, <sup>Y</sup><sup>u</sup> represents the <sup>N</sup><sup>u</sup> � 1 vector of unknown values of salinity at these locations. All the symbols in (5) have the same meaning as in (4), except for the distinction between observed and unobserved locations. The <sup>N</sup> � <sup>N</sup><sup>u</sup> matrix <sup>Σ</sup>ou contains the

<sup>1</sup> 000

0 0 ⋱ 0 0 00 Σou

ij <sup>¼</sup> Cov <sup>ε</sup><sup>o</sup>

<sup>2</sup> 0 0

T

it, ε<sup>u</sup>

<sup>Σ</sup>ou n o: (6)

jt n o also come from the spatial

� �, <sup>Σ</sup><sup>o</sup> <sup>Σ</sup>ou

Σuo Σ<sup>u</sup>

� � � � : (5)

Figure 3. Sample covariogram for June 1994 calculated from process model residuals (blue dots). The solid red line illustrates a spherical covariance function fit to the covariogram. Covariance is in units of salinity squared.

Valid covariance functions ensure that the covariance matrix will be positive definite, which, in turn, ensures that variances will be non-negative. Each covariance function has a shape defined by a range parameter, a partial sill, and sometimes a nugget effect. Appendix Table A1 gives formulas for determining spatial covariance according to the exponential, Gaussian, and spherical covariance functions, each with and without a nugget effect. Figure 3 shows an example of the spherical covariance function—the solid red line—fit to a sample covariogram—the blue dots—of deviations from the process model for June 1994. The range parameter—θ for the exponential and Gaussian covariance functions, ρ for spherical—is related to the distance that must separate two sites before their deviations are independent, where independence corresponds to a covariance of zero or virtually zero. In Figure 3, the range is approximately 10 km. In the absence of a nugget effect, the partial sill σ<sup>2</sup> is the value of the covariance at distance zero—that is, it is the variance of deviations from the mean—and in Figure 3 this value is approximately 2.5 squared units of salinity. In the presence of the nugget σ<sup>2</sup> <sup>n</sup>, there is a discontinuity in the covariance function at distance zero, so that the intercept is slightly greater than the limit of the smooth part of the function as distance approaches zero. In this case, the variance of the deviations is equal to the sum of the partial sill and nugget: <sup>σ</sup><sup>2</sup> <sup>þ</sup> <sup>σ</sup><sup>2</sup> <sup>n</sup>. It may be the case that variance is higher when values of deviations are higher. Since covariance parameters represent physical quantities that may change over time, we used the capabilities of SAS® Proc Mixed to allow a different partial sill and range parameter for each time period.

Model (3), modified to include spatial correlation, becomes

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\mathfrak{f}} + \mathbf{e}, \ \mathbf{e} \sim N(\mathbf{0}, \boldsymbol{\Sigma}), \tag{4}$$

where Σ represents the N � N block-diagonal covariance matrix

Process-Based Statistical Models Predict Dynamic Estuarine Salinity DOI: http://dx.doi.org/10.5772/intechopen.89911

$$
\boldsymbol{\Sigma} = \begin{bmatrix}
\boldsymbol{\Sigma}\_1 & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \boldsymbol{\Sigma}\_2 & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \ddots & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \boldsymbol{\Sigma}\_T
\end{bmatrix},
$$

,

where zero matrices for off-diagonal elements indicate that deviations in one time period are not correlated with those in another. We make this assumption partially due to the long time span separating June and September, but also because no SAS® procedure has the capacity to model such space-time correlation while at the same time allowing every time period to have different spatial covariance parameters and allowing a mean function to be fit. Diagonal elements Σ1, Σ2, … , Σ<sup>t</sup> … , Σ<sup>T</sup> are individual spatial covariance matrices for each time period with dimensions nt � nt, and elements <sup>Σ</sup><sup>t</sup> ½ �ij <sup>¼</sup> Cov <sup>ε</sup>it, <sup>ε</sup>jt � � representing the spatial covariance between sites i and j in time period t.

Understanding how predictions of salinity and prediction standard errors are generated from this model will make the results and analysis in Sections 6 and 7 easier to understand. To predict salinity at space–time locations where it is not observed, the following results are needed. Superscripts differentiate between locations where salinity is observed and unobserved. Model (4), represents observations of salinity (by virtue of the dimensions of the vectors and matrices), but we model salinity observations and unobserved values of salinity at other space-time locations using a similar model, the joint distribution of unobserved and observed salinity, given by

$$
\begin{pmatrix} \mathbf{Y}^{\flat} \\ \mathbf{Y}^{u} \end{pmatrix} \sim N \left\{ \begin{pmatrix} X^{o} \mathfrak{f} \\ X^{u} \mathfrak{f} \end{pmatrix}, \begin{pmatrix} \Sigma^{o} & \Sigma^{u} \\ \Sigma^{uo} & \Sigma^{u} \end{pmatrix} \right\}. \tag{5}
$$

Here, <sup>Y</sup><sup>o</sup> represents the <sup>N</sup> � 1 vector of salinity observations, and, letting <sup>N</sup><sup>u</sup> represent the number of space-time locations at which we want to predict salinity, <sup>Y</sup><sup>u</sup> represents the <sup>N</sup><sup>u</sup> � 1 vector of unknown values of salinity at these locations. All the symbols in (5) have the same meaning as in (4), except for the distinction between observed and unobserved locations. The <sup>N</sup> � <sup>N</sup><sup>u</sup> matrix <sup>Σ</sup>ou contains the cross-covariance between observed and unobserved locations. Thus,

$$
\boldsymbol{\Sigma}^{\boldsymbol{\alpha}} = \begin{bmatrix}
\boldsymbol{\Sigma}\_1^{\boldsymbol{\alpha}} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \boldsymbol{\Sigma}\_2^{\boldsymbol{\alpha}} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \ddots & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \boldsymbol{\Sigma}\_T^{\boldsymbol{\alpha}}
\end{bmatrix},
$$

and <sup>Σ</sup>uo <sup>¼</sup> <sup>Σ</sup>ou ð Þ<sup>T</sup>. The elements <sup>Σ</sup>ou t � � ij <sup>¼</sup> Cov <sup>ε</sup><sup>o</sup> it, ε<sup>u</sup> jt n o also come from the spatial covariance function.

Let ψ represent the vector that contains all spatial covariance parameters for every time period—either 80 or 81 parameters depending on whether a nugget effect is used. Standard normal distribution theory gives the distribution of unobserved salinity Y<sup>u</sup> conditioned on knowing the values of observations Y<sup>o</sup> and all of the parameter values:

$$\mathbf{Y}^{\mu}|\mathbf{Y}^{o},\boldsymbol{\theta},\boldsymbol{\Psi} \sim N\left\{ \mathbf{X}^{\mu}\boldsymbol{\mathfrak{\boldsymbol{\theta}}} + \boldsymbol{\Sigma}^{\mu\boldsymbol{o}}(\boldsymbol{\Sigma}^{\boldsymbol{o}})^{-1}(\mathbf{Y}^{o} - \mathbf{X}^{o}\boldsymbol{\mathfrak{\boldsymbol{\theta}}}), \ \boldsymbol{\Sigma}^{\boldsymbol{u}} - \boldsymbol{\Sigma}^{\boldsymbol{u}o}(\boldsymbol{\Sigma}^{\boldsymbol{o}})^{-1}\boldsymbol{\Sigma}^{\boldsymbol{o}u} \right\}.\tag{6}$$

Valid covariance functions ensure that the covariance matrix will be positive definite, which, in turn, ensures that variances will be non-negative. Each covariance function has a shape defined by a range parameter, a partial sill, and sometimes a nugget effect. Appendix Table A1 gives formulas for determining spatial covariance according to the exponential, Gaussian, and spherical covariance functions, each with and without a nugget effect. Figure 3 shows an example of the spherical covariance function—the solid red line—fit to a sample covariogram—the blue dots—of deviations from the process model for June 1994. The range parameter—θ for the exponential and Gaussian covariance functions, ρ for spherical—is related to the distance that must separate two sites before their deviations are independent, where independence corresponds to a covariance of zero or virtually zero. In Figure 3, the range is approximately 10 km. In the absence of a nugget effect, the partial sill σ<sup>2</sup> is the value of the covariance at distance zero—that is, it is the variance of deviations from the mean—and in Figure 3 this value is approxi-

Sample covariogram for June 1994 calculated from process model residuals (blue dots). The solid red line illustrates a spherical covariance function fit to the covariogram. Covariance is in units of salinity squared.

Lagoon Environments Around the World - A Scientific Perspective

mately 2.5 squared units of salinity. In the presence of the nugget σ<sup>2</sup>

the partial sill and nugget: <sup>σ</sup><sup>2</sup> <sup>þ</sup> <sup>σ</sup><sup>2</sup>

Figure 3.

170

discontinuity in the covariance function at distance zero, so that the intercept is slightly greater than the limit of the smooth part of the function as distance approaches zero. In this case, the variance of the deviations is equal to the sum of

values of deviations are higher. Since covariance parameters represent physical quantities that may change over time, we used the capabilities of SAS® Proc Mixed

to allow a different partial sill and range parameter for each time period. Model (3), modified to include spatial correlation, becomes

where Σ represents the N � N block-diagonal covariance matrix

<sup>n</sup>, there is a

<sup>n</sup>. It may be the case that variance is higher when

Y ¼ Xβ þ ε, ε � Nð Þ 0, Σ , (4)

The pipe symbol (|) means "given" or "conditioned on knowing the values of" the terms following the pipe symbol. The terms before the comma represent the mean of the multivariate normal distribution, which is used for the salinity prediction, and the terms after the comma represent the variance-covariance matrix, which is used for prediction standard errors. Salinity predictions are the sum of the mean trend, Xuβ, and the spatial interpolation of observation deviations from the mean trend, <sup>Σ</sup>uo <sup>Σ</sup><sup>o</sup> ð Þ�<sup>1</sup> <sup>Y</sup><sup>o</sup> � <sup>X</sup><sup>o</sup> ð Þ<sup>β</sup> . If the deviations <sup>Y</sup><sup>o</sup> � <sup>X</sup><sup>o</sup> ð Þ<sup>β</sup> are large for a given time period, then the partial sill σ<sup>2</sup> <sup>t</sup> will be large for that time period, so that diagonal elements of Σ<sup>o</sup> and Σ<sup>u</sup> will be large. For a given location, the prediction standard error is the diagonal element of the matrix <sup>Σ</sup><sup>u</sup> � <sup>Σ</sup>uo <sup>Σ</sup><sup>o</sup> ð Þ�<sup>1</sup> Σou. If the diagonal elements of <sup>Σ</sup><sup>o</sup> and <sup>Σ</sup><sup>u</sup> are large, then the diagonal elements of <sup>Σ</sup><sup>o</sup> ð Þ�<sup>1</sup> are small, and the prediction standard error is a large number minus a small number. That is, the prediction standard error will be high for time periods in which observation deviations from the mean function are large. When observation deviations from the mean trend are small, the reverse is true, and prediction standard errors tend to be low for that time period.

The salinity predictor

$$(X^{\mu}\mathfrak{P} + \Sigma^{\mu o}(\Sigma^{o})^{-1}(Y^{o} - X^{o}\mathfrak{P}) \tag{7}$$

2.13 Examining freshwater influx scenarios

highlighted as it was chosen as the best model of PS salinity for our modeling context.

Model type 2 log

DOI: http://dx.doi.org/10.5772/intechopen.89911

Exponential Infinite

Exponential + σ<sup>2</sup>

Gaussian + σ<sup>2</sup>

Spherical + σ<sup>2</sup>

Exponential + σ<sup>2</sup>

Gaussian + σ<sup>2</sup>

Spherical + σ<sup>2</sup>

at the α = 0.05 level of significance: H01: σ<sup>2</sup>

The symbol "σ<sup>2</sup>

the base dataset.

Table 2.

173

likelihood

Process-Based Statistical Models Predict Dynamic Estuarine Salinity

AIC BIC RMSE

Exponential 7430.7 7584.7 7714.7 2.0 0.95 1.03 0.89

Gaussian 8198.0 8356.0 8489.5 2.3 0.94 1.37 0.84

Spherical 7570.0 7722.0 7850.4 2.0 0.95 1.07 0.88

Gaussian 6281.0 6433.0 6561.3 2.2 0.90\* 1.98\* 0.86

Spherical 6199.6 6315.6 6479.9 2.2 0.91\* 1.86\* 0.86

<sup>n</sup>" indicates that a nugget was included. Stars (\*) indicate rejection of the appropriate null hypothesis

Process IID 9935.9 9937.9 9943.5 2.9 0.98 0.84 0.74

Time IID 7077.5 7079.5 7084.9 2.6 0.83\* 3.47\* 0.83

Process and time mean functions with no spatial covariance (IID) and with each of six covariance functions were used.

Summary statistics comparing salinity observations in the test dataset to predictions based on fitting models to

(psu)

<sup>n</sup>\* 7424.0 7580.0 7711.7 2.0 0.96 0.96 0.89

<sup>n</sup>\* 7532.0 7686.0 7816.0 2.1 0.94 1.15 0.87

<sup>n</sup>\* 7571.6 7727.6 7859.3 2.0 0.96 0.93 0.89

<sup>n</sup> 6217.1 6367.1 6493.7 2.1 0.92\* 1.55\* 0.87

<sup>n</sup>\* 6214.0 6366.0 6494.4 2.2 0.91\* 1.90\* 0.86

<sup>n</sup> 6201.3 6357.3 6489.1 2.2 0.91\* 1.86\* 0.86

<sup>n</sup> = 0; H02: β<sup>1</sup> = 1; H03: β<sup>0</sup> = 0. The exponential plus nugget process model is

Slope/β<sup>1</sup> Intercept/β<sup>0</sup> R<sup>2</sup>

To examine variations in the spatial distribution of salinity under drought, average, and flood conditions, we classified freshwater influx from each river within each time period (1wkFWI\_rt and 2moFWI\_rt) as LOW if it fell below the 25th percentile of observed FWI across all time periods, MODERATE if it fell between the 25th and 75th percentiles, HIGH if it fell between the 75th and 95th percentiles, and FLOOD if it fell above the 95th percentile. Next, we classified oneweek and two-month FWI for the entire time period as LOW (or HIGH) if at least two rivers exhibited low (or high) inflow, MODERATE if at least three rivers exhibited moderate inflow, and FLOOD if at least one river exhibited extremely high (>95th percentile) inflow. These classifications are mutually exclusive, though some of the 40 time periods did not fall into any of them. The first two columns of Table 3 list the 16 combinations of classifications, and the third column shows the classification and salinity rank for each time period. Time periods were ranked 1–40

by mean predicted salinity (1 = highest mean salinity; 40 = lowest).

Moderate-to-moderate FWI. June 2005 (Figure 4) experienced moderate FWI in

both the 2 months and 1 week prior to the survey in PS with predicted salinity ranked 37th—the lowest of the moderate-to-moderate time periods. Legend colors for model predictions in the left pane and observations in the upper right pane of Figure 4 (as well as Figure 5A and B, 6A and B) are based on percentiles of the distribution of observed salinity across all time periods: minimum to 5%; 5–10%; 10–25%; 25–50%; 50–75%; 75–90%; 90–95%; and 95% to maximum. From the left pane of Figure 4, predicted salinity in June 2005 increased moving east across PS,

is an exact predictor: the prediction of salinity at a site where there is an observation will exactly equal the observation. For this reason, to determine which spatial covariance function to use, we randomly selected 10% of the observations to withhold as a cross-validation dataset, the test dataset; the remaining 90% we term the base dataset. For every combination of the two mean functions—process and time and the six spatial covariance functions in Appendix Table A1, we fit model (4) to the base dataset, and predicted salinity values at the space–time locations of the test dataset using the results given in (5) and (6). When the model predicted salinity to be less than zero, we set the prediction equal to zero before calculating the following statistics. Predictions of negative values could be avoided using a truncated normal distribution, but SAS® Proc Mixed does not permit specification of this distribution. The root mean squared error (RMSE) of predictions—with the same units as salinity—are given in Table 2, along with the slope, intercept, and coefficient of determination (R<sup>2</sup> ) from a regression of actual salinity values in the test dataset on predictions of them. If predictions were perfect, this regression would have slope equal to one, intercept equal to zero, and R<sup>2</sup> equal to 1.

Salinity predictions are better when a spatial covariance function is combined with either mean function. For example, of the time models, the exponential covariance function with a nugget produced predictions with the lowest RMSE (2.1), slope closest to one (0.92), and intercept closest to zero (1.55). Comparing process models, the exponential and spherical, each with and without a nugget, performed equally well, and better than the time models. To select the best model from this group of four, we examined statistics based on how well the model fit the base dataset. The model with an exponential covariance function with a nugget had the lowest AIC (7580.0) and BIC (7711.7) and was thus chosen as the final model. It explained 89% of variability in the test dataset and generated predictions with RMSE 2.0.

Next, we fit this model using the full dataset, and produced retrospective maps of salinity predictions and standard errors at evenly spaced 1 nmi (1.85 km) increments for each time period. Forty-two salinity predictions—less than 0.1% of the total number of predictions—were negative and set to zero.


Process-Based Statistical Models Predict Dynamic Estuarine Salinity DOI: http://dx.doi.org/10.5772/intechopen.89911

Process and time mean functions with no spatial covariance (IID) and with each of six covariance functions were used. The symbol "σ<sup>2</sup> <sup>n</sup>" indicates that a nugget was included. Stars (\*) indicate rejection of the appropriate null hypothesis at the α = 0.05 level of significance: H01: σ<sup>2</sup> <sup>n</sup> = 0; H02: β<sup>1</sup> = 1; H03: β<sup>0</sup> = 0. The exponential plus nugget process model is highlighted as it was chosen as the best model of PS salinity for our modeling context.

#### Table 2.

The pipe symbol (|) means "given" or "conditioned on knowing the values of" the terms following the pipe symbol. The terms before the comma represent the mean of the multivariate normal distribution, which is used for the salinity prediction, and the terms after the comma represent the variance-covariance matrix, which is used for prediction standard errors. Salinity predictions are the sum of the mean trend, Xuβ, and the spatial interpolation of observation deviations from the mean trend, <sup>Σ</sup>uo <sup>Σ</sup><sup>o</sup> ð Þ�<sup>1</sup> <sup>Y</sup><sup>o</sup> � <sup>X</sup><sup>o</sup> ð Þ<sup>β</sup> . If the deviations <sup>Y</sup><sup>o</sup> � <sup>X</sup><sup>o</sup> ð Þ<sup>β</sup> are large for a given

elements of Σ<sup>o</sup> and Σ<sup>u</sup> will be large. For a given location, the prediction standard

ments of <sup>Σ</sup><sup>o</sup> and <sup>Σ</sup><sup>u</sup> are large, then the diagonal elements of <sup>Σ</sup><sup>o</sup> ð Þ�<sup>1</sup> are small, and the prediction standard error is a large number minus a small number. That is, the prediction standard error will be high for time periods in which observation deviations from the mean function are large. When observation deviations from the mean trend are small, the reverse is true, and prediction standard errors tend to be

is an exact predictor: the prediction of salinity at a site where there is an observation will exactly equal the observation. For this reason, to determine which spatial covariance function to use, we randomly selected 10% of the observations to withhold as a cross-validation dataset, the test dataset; the remaining 90% we term the base dataset. For every combination of the two mean functions—process and time and the six spatial covariance functions in Appendix Table A1, we fit model (4) to the base dataset, and predicted salinity values at the space–time locations of the test dataset using the results given in (5) and (6). When the model predicted salinity to be less than zero, we set the prediction equal to zero before calculating the following statistics. Predictions of negative values could be avoided using a truncated normal distribution, but SAS® Proc Mixed does not permit specification of this distribution. The root mean squared error (RMSE) of predictions—with the same units as salinity—are given in Table 2, along with the slope, intercept, and coefficient of deter-

) from a regression of actual salinity values in the test dataset on

predictions of them. If predictions were perfect, this regression would have slope

with either mean function. For example, of the time models, the exponential covariance function with a nugget produced predictions with the lowest RMSE (2.1), slope closest to one (0.92), and intercept closest to zero (1.55). Comparing process models, the exponential and spherical, each with and without a nugget, performed equally well, and better than the time models. To select the best model from this group of four, we examined statistics based on how well the model fit the base dataset. The model with an exponential covariance function with a nugget had the lowest AIC (7580.0) and BIC (7711.7) and was thus chosen as the final model. It explained 89% of variability in the test dataset and generated predictions with

Salinity predictions are better when a spatial covariance function is combined

Next, we fit this model using the full dataset, and produced retrospective maps of salinity predictions and standard errors at evenly spaced 1 nmi (1.85 km) increments for each time period. Forty-two salinity predictions—less than 0.1% of the

equal to one, intercept equal to zero, and R<sup>2</sup> equal to 1.

total number of predictions—were negative and set to zero.

error is the diagonal element of the matrix <sup>Σ</sup><sup>u</sup> � <sup>Σ</sup>uo <sup>Σ</sup><sup>o</sup> ð Þ�<sup>1</sup>

Lagoon Environments Around the World - A Scientific Perspective

<sup>t</sup> will be large for that time period, so that diagonal

<sup>X</sup><sup>u</sup><sup>β</sup> <sup>þ</sup> <sup>Σ</sup>uo <sup>Σ</sup><sup>o</sup> ð Þ�<sup>1</sup> <sup>Y</sup><sup>o</sup> � <sup>X</sup><sup>o</sup> ð Þ<sup>β</sup> (7)

Σou. If the diagonal ele-

time period, then the partial sill σ<sup>2</sup>

low for that time period. The salinity predictor

mination (R<sup>2</sup>

RMSE 2.0.

172

Summary statistics comparing salinity observations in the test dataset to predictions based on fitting models to the base dataset.

## 2.13 Examining freshwater influx scenarios

To examine variations in the spatial distribution of salinity under drought, average, and flood conditions, we classified freshwater influx from each river within each time period (1wkFWI\_rt and 2moFWI\_rt) as LOW if it fell below the 25th percentile of observed FWI across all time periods, MODERATE if it fell between the 25th and 75th percentiles, HIGH if it fell between the 75th and 95th percentiles, and FLOOD if it fell above the 95th percentile. Next, we classified oneweek and two-month FWI for the entire time period as LOW (or HIGH) if at least two rivers exhibited low (or high) inflow, MODERATE if at least three rivers exhibited moderate inflow, and FLOOD if at least one river exhibited extremely high (>95th percentile) inflow. These classifications are mutually exclusive, though some of the 40 time periods did not fall into any of them. The first two columns of Table 3 list the 16 combinations of classifications, and the third column shows the classification and salinity rank for each time period. Time periods were ranked 1–40 by mean predicted salinity (1 = highest mean salinity; 40 = lowest).

Moderate-to-moderate FWI. June 2005 (Figure 4) experienced moderate FWI in both the 2 months and 1 week prior to the survey in PS with predicted salinity ranked 37th—the lowest of the moderate-to-moderate time periods. Legend colors for model predictions in the left pane and observations in the upper right pane of Figure 4 (as well as Figure 5A and B, 6A and B) are based on percentiles of the distribution of observed salinity across all time periods: minimum to 5%; 5–10%; 10–25%; 25–50%; 50–75%; 75–90%; 90–95%; and 95% to maximum. From the left pane of Figure 4, predicted salinity in June 2005 increased moving east across PS,


classified. Boldfaced time periods are examined in Section 6. Stars (\*) indicate time periods in which hurricanes occurred within the 61 days prior to the survey.

#### Table 3.

Sixteen combinations of 2mo\_ and 1wk\_FWIrt classifications; time periods that exhibit each set of conditions; and mean predicted salinity rank (1 = highest).

salinity predictions. In June 2005, as in all other time periods, predictions were generated only for locations within S, which does not extend either to Albemarle

Salinity model predictions (left), prediction standard errors (bottom right), and P195 survey observations (top

be due to (1) the fact that in the fourth river, the Roanoke, 1wkFWI\_rt and 2moFWI\_rt in June 1999 were twice their values in June 2002, or (2) that by June 2002, NC had been experiencing drought conditions for 4 years (186 weeks) as opposed to less than one (30 weeks) and that this cumulative FWI deficit became

ity, the majority of prediction standard errors are less than 1.01. In June 1999, however, SEs fell between 1.01 and 1.81 at all prediction locations except those that were very close to observations. This result shows that the conditions affecting salinity in PS were better represented by the mean function in June 2002 than they

Flood to flood FWI—with and without hurricanes. FWI was extremely high in September 1999 (Figure 5A) as a result of the 500-year floods produced by Hurricanes Dennis and Floyd that occurred 24 and 12 days before the survey, respectively. In June 2003 (Figure 5B), extremely high FWI was due to an eight-month period of above-average precipitation totals prior to the survey. Though these are

more pronounced over time.

were in June 1999.

175

Figure 4.

Low to low FWI in early and late-stage drought. June 1999 (Figure 5A) and June 2002 (Figure 5B)—which mark early and late stages of North Carolina's 1998–2002 drought [42]—experienced low long- and short-term FWI with predicted salinity ranking 12th and 4th, respectively. At every point in PS, predicted salinity in these two time periods was higher than in June 2005, and predicted salinity was much higher in June 2002 than June 1999, though both have similar values for 1wkFWI\_rt and 2moFWI\_rt variables from three of the four tributary rivers. The difference may

Though June 2002 salinity observations have a larger mean and greater variabil-

Sound or to the heads of the Neuse and Pamlico Rivers (Figure 4).

right) for June 2005, classified as moderate-to-moderate FWI.

Process-Based Statistical Models Predict Dynamic Estuarine Salinity

DOI: http://dx.doi.org/10.5772/intechopen.89911

reaching a maximum just south of Oregon Inlet. We note the same east-west salinity gradient when comparing this pane to the June 2005 map of observed salinities (top right pane), indicating that prediction maps typically mirror trends seen in observation maps. The area of highest predicted salinity corresponds to a lone purple observation of 26.5 just south of Oregon Inlet (Figure 4). Plumes of relatively higher salinity are evident in the vicinity of all three ocean inlets (Figure 4).

The lower right pane of Figure 4 (as well as Figure 5A and B, 6A and B) displays prediction standard errors (SE) with the same units as salinity. The same eight percentile groups classify colors on the SE legend, here based on the distribution of prediction standard errors across all time periods. The transition from low SE at sample sites to higher SE moving away from sample sites reflects the fact that the exact predictor (6) reproduces observations, so confidence intervals closer to sample sites are narrower than those further away.

This spatial trend in SEs is further illustrated by comparing locations of high SE in the same time period, which are also consistent over time. High SEs occur between the mouths of the Neuse and Pamlico Rivers and along a margin of varying width following the outline of the Outer Banks, areas within which sampling does not occur (Figure 1). We note here that because SEs increase as distance from sample site increases, we chose to generate only interpolated (and not extrapolated) Process-Based Statistical Models Predict Dynamic Estuarine Salinity DOI: http://dx.doi.org/10.5772/intechopen.89911

#### Figure 4.

reaching a maximum just south of Oregon Inlet. We note the same east-west salinity gradient when comparing this pane to the June 2005 map of observed salinities (top right pane), indicating that prediction maps typically mirror trends seen in observation maps. The area of highest predicted salinity corresponds to a lone purple observation of 26.5 just south of Oregon Inlet (Figure 4). Plumes of relatively higher salinity are evident in the vicinity of all three ocean inlets

Only time periods that fit each scenario as defined in Section 6 are listed; the remaining 7 time periods were not classified. Boldfaced time periods are examined in Section 6. Stars (\*) indicate time periods in which hurricanes

Sixteen combinations of 2mo\_ and 1wk\_FWIrt classifications; time periods that exhibit each set of conditions;

2mo\_FWIrt 1wk\_FWIrt Time periods and mean predicted salinity rank (mmyy, r)

Flood Flood (0603, 40), (0999\*, 30) High none Moderate (0687, 28), (0689, 27) Low None

Lagoon Environments Around the World - A Scientific Perspective

High Flood (0903\*, 39), (0690, 29)

Moderate Flood (0996\*, 33)

Low Flood (0987, 18)

occurred within the 61 days prior to the survey.

and mean predicted salinity rank (1 = highest).

High (0904\*, 32)

Low None

Moderate (0698, 38), (0693, 36), (0697, 35)

High (0696, 26), (0900, 24)

Low (0694, 17)

High (0695, 6) Moderate (0905, 20)

Moderate (0605, 37), (0989, 31), (0601, 26), (0600, 22), (0604, 21), (0688, 16),

Low (0997, 15), (0699, 12), (0901, 8), (0902, 7), (0993, 5), (0602, 4), (0988, 3), (0994, 1)

(0990, 13), (0692, 10)

The lower right pane of Figure 4 (as well as Figure 5A and B, 6A and B) displays prediction standard errors (SE) with the same units as salinity. The same eight percentile groups classify colors on the SE legend, here based on the distribution of prediction standard errors across all time periods. The transition from low SE at sample sites to higher SE moving away from sample sites reflects the fact that the exact predictor (6) reproduces observations, so confidence intervals closer to sam-

This spatial trend in SEs is further illustrated by comparing locations of high SE

in the same time period, which are also consistent over time. High SEs occur between the mouths of the Neuse and Pamlico Rivers and along a margin of varying width following the outline of the Outer Banks, areas within which sampling does not occur (Figure 1). We note here that because SEs increase as distance from sample site increases, we chose to generate only interpolated (and not extrapolated)

ple sites are narrower than those further away.

(Figure 4).

174

Table 3.

Salinity model predictions (left), prediction standard errors (bottom right), and P195 survey observations (top right) for June 2005, classified as moderate-to-moderate FWI.

salinity predictions. In June 2005, as in all other time periods, predictions were generated only for locations within S, which does not extend either to Albemarle Sound or to the heads of the Neuse and Pamlico Rivers (Figure 4).

Low to low FWI in early and late-stage drought. June 1999 (Figure 5A) and June 2002 (Figure 5B)—which mark early and late stages of North Carolina's 1998–2002 drought [42]—experienced low long- and short-term FWI with predicted salinity ranking 12th and 4th, respectively. At every point in PS, predicted salinity in these two time periods was higher than in June 2005, and predicted salinity was much higher in June 2002 than June 1999, though both have similar values for 1wkFWI\_rt and 2moFWI\_rt variables from three of the four tributary rivers. The difference may be due to (1) the fact that in the fourth river, the Roanoke, 1wkFWI\_rt and 2moFWI\_rt in June 1999 were twice their values in June 2002, or (2) that by June 2002, NC had been experiencing drought conditions for 4 years (186 weeks) as opposed to less than one (30 weeks) and that this cumulative FWI deficit became more pronounced over time.

Though June 2002 salinity observations have a larger mean and greater variability, the majority of prediction standard errors are less than 1.01. In June 1999, however, SEs fell between 1.01 and 1.81 at all prediction locations except those that were very close to observations. This result shows that the conditions affecting salinity in PS were better represented by the mean function in June 2002 than they were in June 1999.

Flood to flood FWI—with and without hurricanes. FWI was extremely high in September 1999 (Figure 5A) as a result of the 500-year floods produced by Hurricanes Dennis and Floyd that occurred 24 and 12 days before the survey, respectively. In June 2003 (Figure 5B), extremely high FWI was due to an eight-month period of above-average precipitation totals prior to the survey. Though these are

#### Figure 5.

Salinity model predictions (left), prediction standard errors (bottom right), and P195 survey observations (top right) for June 1999 (A) and June 2002 (B), both classified as low-to-low FWI.

3. Discussion

Figure 6.

177

Because water exchange between lagoonal estuaries and the open ocean can be relatively restricted, there is a relatively high potential in systems like PS for changes in precipitation patterns and storm frequencies associated with global climate change to result in changes in salinity patterns and subsequent ecosystem alterations. Changes in precipitation will affect the amount and timing of river flow, which will impact nutrient cycling, estuarine flushing rates, and salinity. Increased storm activity may open new inlets, which would alter current flow, increase tidal action, and allow a greater influx of seawater that carries with it both different chemical signals and mobile species. Salinity is therefore a practical estuarine characteristic to use to study the impacts of these changes, as both effects mentioned above include enhanced water exchange that impacts overall estuarine salinity content [43, 44].

Salinity model predictions (left), prediction standard errors (bottom right), and P195 survey observations

(top right) for September 1999 (A) and June 2003 (B), both classified as flood-to-flood FWI.

Process-Based Statistical Models Predict Dynamic Estuarine Salinity

DOI: http://dx.doi.org/10.5772/intechopen.89911

the only two time periods categorized as flood-to-flood, predicted salinity in September 1999 ranks a surprisingly high 30th, while in June 2003 it ranks 40th. Observed and predicted salinity for these two time periods are lower than those in the low-FWI time periods of June 1999 and 2002, but in September 1999, salinity was higher at most prediction locations, and more variable, than in moderate-FWI June 2005. Water at locations near the two southerly inlets to PS was more saline in September 1999 than in these same locations during moderate-FWI of June 2005 likely due to storm surge-generated inlet plumes. Salinity at locations near the Neuse and Tar-Pamlico Rivers was similar to that in June 2005. Standard errors were lower sound-wide in June 2003 than in September 1999. SEs in September 1999 were highest sound-wide relative to the other four time periods examined (Figures 5 and 6).

Process-Based Statistical Models Predict Dynamic Estuarine Salinity DOI: http://dx.doi.org/10.5772/intechopen.89911

Figure 6.

the only two time periods categorized as flood-to-flood, predicted salinity in September 1999 ranks a surprisingly high 30th, while in June 2003 it ranks 40th. Observed and predicted salinity for these two time periods are lower than those in the low-FWI time periods of June 1999 and 2002, but in September 1999, salinity was higher at most prediction locations, and more variable, than in moderate-FWI June 2005. Water at locations near the two southerly inlets to PS was more saline in September 1999 than in these same locations during moderate-FWI of June 2005 likely due to storm surge-generated inlet plumes. Salinity at locations near the Neuse and Tar-Pamlico Rivers was similar to that in June 2005. Standard errors were lower sound-wide in June 2003 than in September 1999. SEs in September 1999 were highest sound-wide relative to the other four time periods examined

Salinity model predictions (left), prediction standard errors (bottom right), and P195 survey observations

(top right) for June 1999 (A) and June 2002 (B), both classified as low-to-low FWI.

Lagoon Environments Around the World - A Scientific Perspective

(Figures 5 and 6).

176

Figure 5.

Salinity model predictions (left), prediction standard errors (bottom right), and P195 survey observations (top right) for September 1999 (A) and June 2003 (B), both classified as flood-to-flood FWI.
