**2.2.2 General considerations for statistical analysis**

Evaluating distance decay relationships essentially means plotting the similarity between sites against their geographical distance (Nekola & White, 1999). The first step is therefore to calculate similarities between sample locations. A multitude of coefficients exists for calculating the compositional resemblance of species samples.

Sørensen similarity (Sørensen, 1948) is used throughout the presented study to calculate compositional similarity based on plot inventories of all tree species. Sørensen similarity satisfies the criteria of linearity, homogeneity (the value does not change if all values are multiplied by the same factor), symmetry (independence from the direction of calculation, after Janson & Vegelius, 1981) and scaling between 0 and 1 (Koleff et al., 2003). It is well established and extensively used, especially in vegetation ecology (e.g. Condit et al., 2002; Kluth & Bruelheide, 2004), which supports comparability with other studies. Sørensen is favored over Jaccard because the latter is more important in zoological studies (Koleff et al., 2003). Sørensen differs from Jaccard in that it gives double weight to shared species, which Legendre & Legendre (1998) regard as advantageous since shared species have more explanatory power regarding the processes underlying the observed patterns (Watt, 1947). Geographic distances between plots were obtained by calculating Euclidean distances between the x- and y-coordinates with the function dist() of the R package stats (R Development Core Team, 2005).
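The two ingredients of a distance decay plot described above can be sketched in a few lines. This is a minimal Python illustration, not the R code used in the study: the function names, the `plots` data structure and the species labels in the usage note are hypothetical, and Sørensen similarity is computed in its usual presence/absence form, 2a / (2a + b + c), with a the number of shared species.

```python
import math
from itertools import combinations


def sorensen(species_a, species_b):
    """Sørensen similarity of two species lists (presence/absence)."""
    a, b = set(species_a), set(species_b)
    if not a and not b:
        raise ValueError("similarity is undefined for two empty plots")
    # Shared species count double: 2a / (2a + b + c)
    return 2 * len(a & b) / (len(a) + len(b))


def euclidean(p, q):
    """Euclidean distance between two (x, y) coordinate pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def distance_similarity_pairs(plots):
    """Return (distance, similarity) for every pair of plots.

    `plots` maps a plot id to ((x, y), set_of_species); this layout is
    an assumption made for the example.
    """
    pairs = []
    for i, j in combinations(sorted(plots), 2):
        (xy_i, sp_i), (xy_j, sp_j) = plots[i], plots[j]
        pairs.append((euclidean(xy_i, xy_j), sorensen(sp_i, sp_j)))
    return pairs
```

For two hypothetical plots that share two of three species each, for instance `{"p1": ((0, 0), {"A", "B", "C"}), "p2": ((3, 4), {"B", "C", "D"})}`, the function yields one pair with a distance of 5 and a similarity of 2/3.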

#### **2.2.3 Slope and aspect**

Slope aspect and slope inclination may have a significant effect on species richness (Badano et al., 2005) and species composition, especially in semi-arid vegetation (Sternberg & Shoshany, 2001). To obtain a distance measure that integrates aspect and inclination, we use the model of a unit sphere and calculate great-circle distances between virtual locations. This yields continuous rather than class variables as found, for example, in Kjällgren and Kullman (1998). For each plot a virtual location on the sphere is defined using the value for aspect as longitude and 90° minus inclination as latitude. The virtual points are therefore located in the pole region as long as inclination is low, which leads to small (virtual) distances between them. The underlying idea is that solar radiation, wind or other factors that depend strongly on aspect and inclination (Wilkinson & Humphreys, 2006) do not differ considerably between plots with varying aspects as long as inclination is low. The longitude values on the unit sphere are derived from the directional reference made in the field. The equator of the sphere is thought of as the compass circle. The Prime Meridian of the virtual sphere is the great circle through North and South of the compass. As in geographic terms, longitude counts positive in the eastern and negative in the western direction. With φ (phi) = latitude = 90° − inclination and λ (lambda) = longitude = aspect, the great-circle distance between A and B can be calculated with formula 1. As we use a unit sphere, the maximum distance between two inclination/aspect pairs is half the circumference of a great circle, which is π. To scale the possible distances between 0 and 1, the results of formula 1 are divided by π. A scaled great-circle distance of 1 is therefore rare in the real world; however, two vertical rock walls with opposite aspects would reach it.

$$d_{AB} = \arccos\left(\sin(\phi_A) \cdot \sin(\phi_B) + \cos(\phi_A) \cdot \cos(\phi_B) \cdot \cos(\lambda_B - \lambda_A)\right) \tag{1}$$
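Formula 1 and the subsequent scaling by π can be checked with a short sketch. The Python function below is only an illustration under stated assumptions: its name and argument order are hypothetical, inclination and aspect are assumed to be given in degrees, and aspect is taken as a compass bearing, which is equivalent to a signed longitude here because only the cosine of the longitude difference enters the formula.

```python
import math


def slope_aspect_distance(incl_a, aspect_a, incl_b, aspect_b):
    """Scaled great-circle distance (0..1) between two plots.

    Latitude is 90 degrees minus inclination, longitude is aspect;
    all inputs are in degrees.
    """
    phi_a = math.radians(90.0 - incl_a)
    phi_b = math.radians(90.0 - incl_b)
    d_lambda = math.radians(aspect_b - aspect_a)
    cos_d = (math.sin(phi_a) * math.sin(phi_b)
             + math.cos(phi_a) * math.cos(phi_b) * math.cos(d_lambda))
    # Clamp to [-1, 1] so rounding errors cannot break acos().
    cos_d = max(-1.0, min(1.0, cos_d))
    # Divide by pi, the largest possible distance on a unit sphere.
    return math.acos(cos_d) / math.pi
```

Two vertical walls facing north and south, `slope_aspect_distance(90, 0, 90, 180)`, reach the maximum of 1, whereas two level plots are at distance 0 whatever their recorded aspects.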

#### **2.2.4 Statistical analysis**

All statistical analyses were performed using functions of the packages base, stats, vegan and simba of the R statistics system (R Development Core Team, 2005). For readability, functions are referred to in the form 'function[package]' when the package has not been mentioned before, and as 'function()' when it is clear which package is meant.

#### **2.2.5 Compositional similarity**

A common way to evaluate the structure of compositional similarity within a data set is to use NMDS (non-metric multidimensional scaling) plots. NMDS (Kruskal, 1964) differs from other common ordination methods in that it does not build on a specific distance measure. Whereas PCA (principal components analysis) and RDA (redundancy analysis) rely on Euclidean distance, and DCA (detrended correspondence analysis), CA (correspondence analysis) and CCA (canonical correspondence analysis) rely on chi-square distance, NMDS (as well as PCoA, principal coordinates analysis) is open to any kind of distance measure. It therefore allows the use of measures that have proven adequate for ecological data, such as Bray-Curtis distance (Faith et al., 1987; Minchin, 1989). A good dissimilarity measure for communities has a good rank-order relation to distance along environmental gradients. Because NMDS uses only rank information and maps these ranks non-linearly onto ordination space, it can handle non-linear species responses of any shape and find the underlying gradients effectively and robustly (Oksanen, 2005). It is an iterative ordination method that attempts to minimize a stress function, which measures the difference between the original floristic distances among sampling units and their distances in ordination space. Since NMDS is a non-metric method, it optimizes the rank order of distances rather than their actual values (Legendre & Legendre, 1998).
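The statement that NMDS optimizes rank order rather than actual values can be made concrete with a small stress calculation. The sketch below is an assumption-laden illustration in Python, not the routine used by vegan: it computes Kruskal's stress-1 for a given configuration by fitting a monotone (isotonic) regression of ordination distances on the rank order of the dissimilarities, and the function names are hypothetical. It only evaluates stress and does not perform the iterative optimization.

```python
import math


def pava(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit."""
    blocks = []  # each block holds [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while their means violate monotonicity.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def kruskal_stress(dissimilarities, ordination_distances):
    """Stress-1 of ordination distances against ranked dissimilarities."""
    order = sorted(range(len(dissimilarities)),
                   key=lambda i: dissimilarities[i])
    d = [ordination_distances[i] for i in order]
    d_hat = pava(d)  # monotone fit: only the rank order is used
    num = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    den = sum(a ** 2 for a in d)
    return math.sqrt(num / den)
```

Any configuration whose distances increase monotonically with the dissimilarities has zero stress, even if the relationship is strongly non-linear; stress rises only when the rank order is violated.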

NMDS has been shown to be very robust and reliable even when certain assumptions (such as Gaussian species responses or a particular sampling pattern) are violated (Minchin, 1989). It was able to deal with any kind of response model, which was not the case for its strongest competitor, DCA (ibid.). On the other hand, the scaling of the axis scores does not allow conclusions to be drawn about the position of a site on an axis or its ecological implications. However, the position of sampled sites in an NMDS plot can be interpreted with respect to their neighbors, which are similar in species composition. When sites appear clumped in the NMDS plot, this might be attributable to specific geographic positions or environmental conditions.

DCA is based on CA, which can be seen as a weighted principal coordinates analysis (PCoA) computed on chi-square distances (Faith et al., 1987). It therefore depends on the relationship between chi-square distance and ecological distance. This is the most important difference between the two methods: DCA is based on an underlying model of species distributions, the unimodal model, while NMDS is not. However, not all species exhibit the same response curve (e.g. Gaussian responses) (Minchin, 1989). NMDS is therefore preferable, especially when there is no specific hypothesis regarding species-environment interactions. Furthermore, Legendre & Legendre (1998) state that detrending should be avoided except for the specific purpose of estimating the lengths of gradients. De'ath (1999) distinguishes two classes of ordination methods: 'species composition representation' (e.g. NMDS) and 'gradient analysis' (e.g. the various flavors of CA). NMDS is thus rather a mapping method that projects the multivariate data space onto a two-dimensional map, whereas PCA, CA and their relatives are based on projection and rotation (Oksanen, 2004).

In the present case the focus was on the change of species composition through time and not on the representation of gradients in species composition. NMDS is therefore the method of choice, all the more as it enables the use of Bray-Curtis distance, which is the quantitative one-complement of the Sørensen index. Thus, the results are easily comparable and interpretation is not hindered by the use of different metrics.
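The relationship between the two measures can be verified numerically. The following Python sketch uses hypothetical function names and made-up vectors; it shows that Bray-Curtis distance computed on presence/absence (0/1) data equals one minus the Sørensen similarity, which is what makes results based on the two measures directly comparable.

```python
def bray_curtis(x, y):
    """Bray-Curtis distance between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


def sorensen_binary(x, y):
    """Sørensen similarity from two presence/absence (0/1) vectors."""
    shared = sum(1 for a, b in zip(x, y) if a and b)
    total = sum(1 for a in x if a) + sum(1 for b in y if b)
    return 2 * shared / total


# Two hypothetical plots scored for four species (1 = present).
plot_a = [1, 1, 1, 0]
plot_b = [0, 1, 1, 1]
complement = 1 - sorensen_binary(plot_a, plot_b)
distance = bray_curtis(plot_a, plot_b)
```

For these vectors both `complement` and `distance` equal 1/3. With abundance data, Bray-Curtis additionally takes the quantities into account.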

#### **2.2.6 Distance decay**

154 The Dynamical Processes of Biodiversity – Case Studies of Evolution and Spatial Distribution


Distance decay or spatial autocorrelation of quantitative univariate variables is usually calculated using semi-variograms (Legendre & Legendre, 1998). For multivariate data, Mantel correlograms can be applied (Legendre & Legendre, 1998; Sokal & Rohlf, 1981).

Spatial Patterns of Phytodiversity - Assessing Vegetation Using (Dis) Similarity Measures 157

**3. Results**

**3.1 Phytosociology analysis**

Results on the phytosociological parameters for the random plots are presented in Table 2. A total of 278 plots (zone-1) and 679 plots (zone-2) covering the major vegetation types were enumerated to assess species richness, composition and diversity patterns. In total, 25,621 individuals belonging to 963 species, 512 genera and 133 families were observed in the entire study area. Species richness was higher in zone-2, with 818 species belonging to 124 families, compared to 372 species of 95 families in zone-1. Herb species richness in zone-2 (350) was higher than in zone-1 with 101 species (Table 2). A total of 57 grass species, the majority of which belong to Poaceae followed by Cyperaceae, were observed in this study.

| Site | Vegetation type | No. of plots | Richness: Trees | Richness: Shrubs | Richness: Herbs | Richness: Climbers | Richness: Grass | Richness: Species | No. of tree individuals | Families | Species diversity (H') | Stand density (ha<sup>-1</sup>) | Basal area (m<sup>2</sup>/ha) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Northern Eastern Ghats (Zone-1) | Semievergreen | 18 | 68 | 18 | 31 | 12 | 3 | 148 | 299 | 58 | 5.3 | 415.3 | 40.95 |
| | Moist Deciduous | 77 | 76 | 31 | 66 | 20 | 7 | 227 | 794 | 69 | 5.2 | 257.8 | 15.79 |
| | Dry Deciduous | 108 | 84 | 30 | 70 | 27 | 6 | 240 | 2100 | 75 | 5.0 | 486.1 | 19.04 |
| | Dry Evergreen | 17 | 52 | 17 | 37 | 12 | | 135 | 181 | 57 | 5.2 | 266.2 | 8.65 |
| | Thorn Forest | 10 | 17 | 14 | 27 | 12 | 1 | 100 | 50 | 60 | 3.7 | 156.3 | 2.89 |
| | Degraded Forest | 48 | 64 | 19 | 44 | 18 | 4 | 165 | 863 | 59 | 4.2 | 449.5 | 12.94 |
| | Total | 278 | 137 | 51 | 101 | 44 | 10 | 372 | 4287 | 95 | | | 17.39 |
| Southern Eastern Ghats (Zone-2) | Moist Deciduous | 102 | 167 | 71 | 184 | 61 | 22 | 520 | 1511 | 104 | 6.6 | 370.3 | 13.28 |
| | Dry Deciduous | 325 | 191 | 87 | 281 | 73 | 37 | 680 | 5024 | 117 | 6.6 | 386.5 | 14.67 |
| | Hardwickia Mixed | 85 | 145 | 73 | 166 | 51 | 21 | 466 | 952 | 103 | 6.1 | 290.2 | 11.13 |
| | Red Sanders Mixed | 65 | 118 | 62 | 157 | 52 | 15 | 415 | 937 | 93 | 5.9 | 360.4 | 12.64 |
| | Dry Evergreen | 47 | 125 | 46 | 128 | 39 | 15 | 365 | 629 | 88 | 6.2 | 334.6 | 11.10 |
| | Thorn Forest | 36 | 112 | 55 | 93 | 48 | 11 | 326 | 427 | 86 | 5.7 | 296.5 | 8.61 |
| | Degraded Forest | 19 | 106 | 36 | 66 | 21 | 11 | 247 | 235 | 75 | 6.2 | 309.2 | 10.45 |
| | Total | 679 | 205 | 119 | 339 | 90 | 51 | 818 | 9715 | 124 | | | 13.15 |

Table 2. Species richness in different vegetation types and habits in the Northern and Southern Eastern Ghats of Andhra Pradesh, India

| Parameters | Site 1 | Site 2 | Site 3 | Site 4 | Site 5 | Site 6 |
|---|---|---|---|---|---|---|
| No. of Species | 43 | 65 | 72 | 53 | 61 | 55 |
| No. of Individuals | 308 | 525 | 495 | 407 | 307 | 415 |
| No. of Families | 24 | 33 | 39 | 27 | 30 | 35 |
| No. of Genera | 37 | 59 | 61 | 46 | 54 | 48 |
| Species Diversity | 2.9 | 3.0 | 3.5 | 3.3 | 3.6 | 3.4 |
| No. of Endemics | 2 | 0 | 0 | 1 | 0 | 138 |
| No. of RET | 0 | 0 | 4 | 9 | 2 | 2 |
| Stand Density (ha<sup>-1</sup>) | 616 | 1050 | 990 | 814 | 614 | 830 |
| Basal area (m<sup>2</sup>/ha) | 14.71 | 17.61 | 24.42 | 4.35 | 10.11 | 6.82 |

Table 3. Consolidated results for the six transect sites in the Northern and Southern Eastern Ghats of Andhra Pradesh, India

**3.1.1 Tree species richness and diversity indices** 

For vegetation data, a simple possibility is to regress the similarity in species composition between sampling units against their geographical separation (Nekola & White, 1999; Steinitz et al., 2006). To test for the influence of different vegetation types on patterns of compositional similarity, the data were divided into subsets. As the plots can be assigned to two geographically distinct regions, the distance decay of different vegetation types and of subsets based on other categorical variables (fragmentation, slope, disturbance, etc.) is compared between the two regions.

The similarity values of the subsets are compared with an ANOVA-like function (mrpp[vegan]; Oksanen et al., 2007) and tested for significant differences using a permutation procedure (diffmean[simba]). Standard parametric tests and ANOVA might fail here because the similarities are not independent of each other.

The Multiple Response Permutation Procedure (mrpp) tests whether there is a significant difference between two or more groups of sampling units. The method is similar to ANOVA in that it compares dissimilarities within and among groups. If two groups of sampling units are really different (e.g. in their species composition), then the average within-group compositional dissimilarities ought to be smaller than the average of the dissimilarities between two random collections of sampling units drawn from the entire population. The mrpp statistic delta is the overall weighted mean of the within-group means of the pairwise dissimilarities among sampling units. The mrpp[vegan] algorithm first computes all pairwise distances in the entire data set and then calculates delta. Next, the sampling units and their associated pairwise distances are permuted, and delta is recalculated based on the permuted data. The last steps are repeated N times. N defaults to 1000, so that p-values can be resolved down to about 0.001, as significance is tested against the distribution of the permuted deltas.
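The logic of the procedure can be sketched as follows. This Python code is a simplified illustration and not the vegan implementation: the function names are hypothetical, group weights are assumed to be proportional to group size (n_g / N), the dissimilarities are passed as a full square matrix, and the p-value uses the common (hits + 1) / (N + 1) correction, which is an assumption.

```python
import random
from itertools import combinations


def mrpp_delta(dist, labels):
    """Weighted mean of within-group mean pairwise dissimilarities."""
    n = len(labels)
    delta = 0.0
    for group in sorted(set(labels)):
        members = [i for i in range(n) if labels[i] == group]
        pairs = list(combinations(members, 2))
        if not pairs:
            continue  # a single-member group has no within-group pairs
        mean_within = sum(dist[i][j] for i, j in pairs) / len(pairs)
        delta += len(members) / n * mean_within  # weight n_g / N
    return delta


def mrpp(dist, labels, permutations=1000, seed=0):
    """Return observed delta and a permutation p-value."""
    rng = random.Random(seed)
    observed = mrpp_delta(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign sampling units to groups
        if mrpp_delta(dist, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

A small observed delta relative to the permuted deltas indicates that sampling units within groups are more similar to each other than expected by chance.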

After testing for significant differences between subsets, the differences in mean similarity are tested with a permutation procedure (diffmean[simba]). First, the difference in mean similarity between two sets is calculated. The two sets are then joined, two random sets of the same sizes as the original sets are drawn from the pooled values, and their difference in mean is calculated. This random reassignment and recalculation is repeated N times. N defaults to 1000, so that p-values can be resolved down to about 0.001, as significance is tested against the distribution of the permuted values.
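A permutation test on the difference in means, as described above, might look like the following Python sketch. It is not the simba code: the function name mirrors diffmean only for readability, the two-sided comparison of absolute differences is an assumption, and the p-value is simply the share of permutations that are at least as extreme as the observed difference.

```python
import random


def diffmean(set_a, set_b, permutations=1000, seed=0):
    """Permutation test for the difference in means of two value sets."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(set_a) - mean(set_b)
    pooled = list(set_a) + list(set_b)
    n_a = len(set_a)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)  # join the sets and redraw two random sets
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, hits / permutations
```

Two sets of similarity values with clearly different means return a p-value close to 0, whereas two identical sets return an observed difference of 0 and a p-value of 1.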

To answer the question whether distance decay differs significantly between the various evaluated subsets of the data, the slopes of the distance decay relationship have been calculated for the three subsets and compared. A permutation procedure following Nekola & White (1999) has been implemented as an R function (diffslope[simba]) to test for significance. For each subset, compositional similarity between plots is regressed against their geographical separation. Before the difference in slope between two subsets is calculated, the values of compositional similarity are rescaled to a common mean, so that testing the difference in slope of the distance decay relationship is independent of differences in the mean (Nekola & White, 1999; Steinitz et al., 2006). Linear regression is carried out on both subsets, and the difference in slope is calculated and stored. Then the variable pairs (geographical separation, compositional similarity) are randomly reassigned to the two data subsets. Regression is calculated for each of the random subsets and the difference in slope is obtained again. The last steps are repeated 1000 times. Finally, the difference between the observed slopes is compared to the differences based on random reassignment. The number of randomizations that produce a difference in slope larger than the observed one is divided by the number of permutations to obtain a p-value.
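The steps of this slope test can be sketched in Python as shown below. This is an illustration under explicit assumptions and not the simba implementation: the function names are hypothetical, the rescaling to a common mean is implemented as an additive shift of each subset to the pooled mean, and differences are compared by absolute value. How diffslope[simba] handles these two details is not stated in the text.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def diffslope(x1, y1, x2, y2, permutations=1000, seed=0):
    """Permutation test for the difference in slope of two subsets."""
    rng = random.Random(seed)
    # Assumption: rescale by shifting each subset to the pooled mean.
    grand = (sum(y1) + sum(y2)) / (len(y1) + len(y2))
    m1, m2 = sum(y1) / len(y1), sum(y2) / len(y2)
    y1 = [v - m1 + grand for v in y1]
    y2 = [v - m2 + grand for v in y2]
    observed = ols_slope(x1, y1) - ols_slope(x2, y2)
    pairs = list(zip(x1, y1)) + list(zip(x2, y2))
    n1 = len(x1)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pairs)  # reassign (distance, similarity) pairs
        a, b = pairs[:n1], pairs[n1:]
        diff = (ols_slope([p[0] for p in a], [p[1] for p in a])
                - ols_slope([p[0] for p in b], [p[1] for p in b]))
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, hits / permutations
```

Here `x1` and `x2` hold the geographical separations and `y1` and `y2` the corresponding compositional similarities of the two subsets. A subset whose similarity decays steeply with distance, compared with one that is almost flat, yields a large absolute difference in slope and a small p-value.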
