**2. Data and methodology**

### **2.1 Description of the dataset**

To conduct a study of hemispheric teleconnectivity, it is critical to have high-resolution, global gridded data with a long period of record. The NCEP/NCAR reanalysis (NNRP; Kalnay et al., 1996) dataset, which is formulated on a 2.5° × 2.5° latitude-longitude grid, includes over 200 meteorological variables at up to 17 vertical levels at 6-hour intervals from 1948 to the present. Variables in the NNRP were rated by Kalnay et al. (1996) based on the influence of observations, model-derived data, and climatology in their formulation. As this study is concerned with mid-tropospheric intraseasonal modes of variability, 500 hPa Northern Hemisphere geopotential heights from the NNRP (rated most reliable by Kalnay et al., 1996) were used in the analysis.

To maintain consistency with the methodology of BL87, monthly averages of 500 hPa heights were formulated from 63 years of the NNRP (1948-2010). The domain of interest used by BL87 did not include height data south of 20°N latitude, primarily owing to the lack of variability in mid-level height data in the tropics. Accordingly, the same 20°N threshold for the southernmost latitude was used in the present study. Fig. 1a shows the study domain and the grid spacing associated with the NNRP.
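As an illustration of this preprocessing step, the sketch below forms monthly January means from 6-hourly heights and restricts the domain to 20°N and poleward. The array layout and variable names are assumptions for illustration (and only three synthetic years are generated, instead of 63, to keep the example light):

```python
import numpy as np

# synthetic stand-in for 6-hourly 500 hPa heights: (year, time, lat, lon)
n_jan_steps = 31 * 4                       # 6-hourly samples in one January
lats = np.arange(-90.0, 90.1, 2.5)         # NNRP 2.5-degree latitudes
lons = np.arange(0.0, 360.0, 2.5)
rng = np.random.default_rng(0)

years = 3                                  # 1948-2010 would give 63
z500 = 5500.0 + rng.standard_normal((years, n_jan_steps, lats.size, lons.size))

jan_mean = z500.mean(axis=1)               # monthly average for each January
keep = lats >= 20.0                        # BL87 domain: 20N and poleward
jan_mean_nh = jan_mean[:, keep, :]
```

The same subsetting mask would be applied before any interpolation or correlation step, so the tropics never enter the analysis.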

The NNRP are provided on a latitude-longitude grid; hence, gridpoints converge as the domain extends poleward. This convergence causes artificial inflation of correlation values at higher latitudes. To mitigate this issue, the data were analysed objectively using a Barnes analysis (Barnes, 1964) to place the data on a Fibonacci grid (Swinbank & Purser, 2006) that, by definition, has equal grid spacing. Such a grid will not result in artificial inflation of the correlations between the 2303 gridpoints (Fig. 1b). To ensure no data extrapolation at lower latitudes, the spacing of the Fibonacci grid was kept identical to that of the NNRP at the equator. The interpolation error of the Barnes analysis was calculated at less than 1%. The Fibonacci grid defines the domain and is used in the computation of the correlation matrix.

Fig. 1. The NNRP grid (panel a) and the interpolated Fibonacci grid (panel b).
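A minimal sketch of the two ingredients of this step follows: a near-equal-area Fibonacci grid over the spherical cap poleward of 20°N, and a single-pass Barnes (Gaussian-weight) interpolation onto it. The point count, smoothing parameter `kappa`, and function names are illustrative assumptions; the study's actual analysis may use different settings and multiple correction passes.

```python
import numpy as np

def fibonacci_cap(n, lat_min_deg=20.0):
    """Near-equal-area points on the spherical cap poleward of lat_min_deg."""
    golden = np.pi * (3.0 - np.sqrt(5.0))        # golden angle (radians)
    z0 = np.sin(np.radians(lat_min_deg))
    i = np.arange(n)
    z = z0 + (i + 0.5) * (1.0 - z0) / n          # uniform in sin(latitude)
    lat = np.degrees(np.arcsin(z))
    lon = np.degrees((i * golden) % (2.0 * np.pi))
    return lat, lon

def barnes_interp(lat_s, lon_s, val_s, lat_t, lon_t, kappa=2.0):
    """Single-pass Barnes analysis: Gaussian distance-weighted mean of the
    source values at each target point (kappa sets the smoothing scale)."""
    p1, p2 = np.radians(lat_t)[:, None], np.radians(lat_s)[None, :]
    dlon = np.radians(lon_t[:, None] - lon_s[None, :])
    # great-circle separation in degrees of arc for every target/source pair
    d = np.degrees(np.arccos(np.clip(
        np.sin(p1) * np.sin(p2) + np.cos(p1) * np.cos(p2) * np.cos(dlon),
        -1.0, 1.0)))
    w = np.exp(-d**2 / kappa)
    return (w @ val_s) / w.sum(axis=1)
```

Because every Fibonacci gridpoint owns roughly the same area, correlations computed among these points are not inflated by the poleward convergence of the latitude-longitude grid.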

### **2.2 RPCA methodology**

274 Atmospheric Model Applications

In this work, we extend the eigenvector-based approach by relaxing the orthogonality constraint and test each analysis for the optimal number of eigenvectors to retain and rotate. While BL87 used cutting-edge analyses for the 1980s, the limited availability of data (35 years) and of computational power (which limited the number of gridpoints to 358) are suboptimal by current standards. Innovations in the analysis procedure in recent years, combined with newer datasets, the availability of 63 years of data, and much denser grids with which to document the climate system, are strong motivators to re-examine Northern Hemisphere geopotential modes of variability.

Variation of geopotential height is a function of latitude. To avoid biasing the analyses, and to permit the smaller, but equally important, variations in the southern regions of the domain to be weighted equally, a correlation matrix (**R**), rather than a variance-covariance matrix, was computed. Using the correlation matrix standardizes the data to a mean of zero and a standard deviation of one; hence, all subsequent analyses are of standardized anomalies (**Z**) from the mean. The correlation matrix was formed among the gridpoints by summing over the 63-year sample for January or July, providing a representative month of the cold or warm season, respectively. The correlation matrix for January (July) was decomposed into a square matrix of eigenvectors (**V**) and an associated diagonal matrix of eigenvalues (**D**), given by the equation

$$\mathbf{R} = \mathbf{V} \mathbf{D} \mathbf{V}^{\mathrm{T}} \tag{1}$$

The rank of the eigenvector matrix is equal to the smaller of the number of gridpoints or the number of observations minus 1. Because there were 63 observations, only 62 eigenvalues were nonzero and 62 eigenvectors were extracted. The goal of this stage of the analysis is to create a set of basis vectors that compress the original variability in **R** into a new reference frame. It is possible to plot the elements of each vector of **V** on spatial maps; however, the patterns in **V** do not result in any localization of the spatial variance, nor do they represent well the variability in **R** (Richman, 1986). The eigenvectors were therefore scaled by the square root of the corresponding eigenvalue to create principal component loadings (**A**). Doing so permits the data to be expressed as

$$\mathbf{Z} = \mathbf{F} \mathbf{A}^{\mathrm{T}} \tag{2}$$

where the vectors in **F** represent the new set of basis functions, known as principal component scores, and **A** is the matrix of weights that relates the original standardized data (**Z**) to **F**. The vectors in **A** contain elements that are regression coefficients between **Z** and **F**.
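Equations (1) and (2) can be illustrated with synthetic standardized anomalies; the gridpoint count below is a placeholder, and numpy stands in for whatever linear-algebra package was actually used:

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_points = 63, 150                      # 63 Januaries; gridpoint count illustrative
Z = rng.standard_normal((n_years, n_points))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)         # standardized anomalies

R = (Z.T @ Z) / n_years                          # correlation matrix among gridpoints
evals, V = np.linalg.eigh(R)                     # Eq. (1): R = V D V^T
order = np.argsort(evals)[::-1]                  # sort eigenvalues descending
evals, V = evals[order], V[:, order]

nonzero = evals > 1e-10                          # only n_years - 1 = 62 nonzero roots
A = V[:, nonzero] * np.sqrt(evals[nonzero])      # loadings: eigenvectors scaled by sqrt(eigenvalue)
F = Z @ V[:, nonzero] / np.sqrt(evals[nonzero])  # scores: unit-variance basis functions
```

Here `F @ A.T` recovers **Z** (Eq. 2), and each column of **A** equals the correlations between the gridpoint series in **Z** and the corresponding score series in **F**, consistent with their interpretation as regression coefficients on standardized variables.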

Many of the 62 dimensions represent small-scale (sub-planetary) signals that have variance properties indistinguishable from noise, associated with very small eigenvalues. We truncate the number of principal components to represent only the variance associated with planetary-scale wavetrains. To accomplish this goal, a two-step process is applied. First, the magnitudes of the eigenvalues are examined and those with relatively large eigenvalues are retained to yield a subset of *l* principal component loading vectors. The value *l* is selected by implementing the scree test (Wilks, 2011) to provide a visual estimate of the approximate number of non-degenerate eigenvectors to retain. At this stage, the number of roots, *l*, is selected liberally, intentionally representing more than the ideal number of roots, *k*, associated with the large-scale signal. To assess the coherent signal, the vectors of **A** are linearly transformed to a new set of vectors, **B**, known as rotated principal component loadings, which simplify or localize the hemispheric wavetrains to agree with the correlation structure of the data. This process can be summarized by the equation

$$\mathbf{B} = \mathbf{A}\,\mathbf{T}^{-1} \tag{3}$$

where **T** is an invertible transformation matrix that represents a rotation of the reference frame into a position that results in the maximal simplification the data permit. The rotation algorithm used in this analysis is Promax (Richman, 1986); Promax PC scores allow for correlations between the vectors in **F**. The goal of the rotation is to identify, in the vectors of **B**, height anomaly patterns that recur often during the month of January (July). The spatial properties of each vector in **B** are a function of the number of rotated PC vectors retained (*l*). Therefore, it is critical to select the optimal number of vectors, *k*. To accomplish that goal, the matrix **A** is transformed to **B** for a variable number of retained PC vectors (i.e., 2 to *l*). Each solution yields a different set of patterns. We desire a set of *k* < *l* vectors that captures as much coherent large-scale signal as possible and matches the patterns embedded within **R**. The one set of *k* PC loadings that relates best to the correlation matrix generating them is determined, and the number of PCs retained is set to *k*. The process, outlined in Richman & Lamb (1985) and refined in Richman (1986), selects each vector of **B**, identifies the location or gridpoint with the highest absolute PC loading, and matches the rotated PC loading vector to the corresponding correlation vector using the congruence coefficient (Richman, 1986). The corresponding correlation vector in **R** is a teleconnection pattern. Hence, this method incorporates the logic of the traditional teleconnection approach, providing an objective procedure for selecting *k*. For January, the Promax solution with the optimum match occurred at *k* = 8 and, for July, at *k* = 4. The average absolute congruence match was 0.904 for Promax in January and 0.895 for Promax in July.

Owing to the linear decomposition of **R**, the signs of the coefficients in each vector of **B** can be multiplied by -1 with no loss of interpretation. Because positive values exceeding ~ +0.25 correspond to ridges and negative values below ~ -0.25 correspond to troughs (Richman & Gong, 1999), multiplication by -1 reverses the interpretation of the troughs and ridges. Furthermore, the sign of the PC score is multiplied by the sign of the PC loading to obtain a physical interpretation of any monthly map (Compagnucci & Richman, 2008). For example, a negative anomaly in a PC loading map multiplied by a negative PC score gives the interpretation of a positive height anomaly.

Identification of Intraseasonal Modes of Variability Using Rotated Principal Components 277

the RPC loading vector and the point teleconnection pattern that has a basepoint location coinciding with the largest magnitude RPC loading (hereafter referred to as the pattern/teleconnection congruence) is -0.91, suggesting a close match between the RPC and the correlation structure. Wallace & Gutzler (1981) identified this pattern in the 500 hPa geopotential heights during boreal winter. They found the thickness pattern consistent with a cold-core equivalent barotropic structure. BL87 found WP/NPO in a 700 hPa analysis of 1950-84 monthly winter height anomalies. Hsu & Wallace (1985) investigated the temporal structure of WP variability at subseasonal time scales, relating the anomalies to Rossby wave dispersion. The NPO has been associated with Alaskan blocking events (Renwick & Wallace, 1996) and modulation of the Pacific storm track (Lau, 1988; Rogers, 1990). Linkin & Nigam (2008) claim the WP pattern is a basic analog to the North Atlantic Oscillation and has impact on the weather in the Pacific Northwest, especially in coastal regions, in the south-central Great Plains, and on marginal sea ice zones in the Arctic. Our analysis of the pattern time series (Fig. 3a) suggests no significant trend and no year-to-year persistence in the autocorrelation function (ACF; Fig. 3b). However, the year-to-year January variability in the WP/NPO (Fig. 3a) shows decadal nonstationarity, as there exist several periods where the mean is significantly different from zero (Table 1) and the variance structure undergoes dramatic, rapid year-to-year variability in the first three decades of the analysis, with lower frequency variability in the past three decades (Table 2).

*Pattern 2* – Subtropical zonal winter pattern (SZW; Fig. 4) has a unique morphology with one major anomaly centered over western China (34°N, 90°E). Nearly all regions south of 35°N have negative RPC loadings, suggesting a zonal pattern in the subtropics. There are a number of secondary centers of the same sign extending to the west across northern Africa and in the southwestern US. The pattern accounts for 9.4 percent of the total 500 hPa variance in January. The pattern/teleconnection congruence is -0.92, a close match between the RPC and correlation structure. The time series of PC scores for SZW is nonstationary with a strong inverse linear trend that implies the sign of the anomalies has reversed over the 63-year period (Fig. 5a). There is a sharp discontinuity around 1980. The statistical significance of the linear trend line (Fig. 5a) is *p* = 6×10⁻⁶, which supports the idea of strong height rises in the subtropical regions. The ACF (Fig. 5b) has a unique pattern with multiyear persistence, with 5 of the first 8 lags significant, owing to the trend in subtropical heights. The decadal analysis of the mean RPC scores strengthens the idea of a reversal in the anomaly pattern, as the scores are positive in the first three decades and negative in the latter three (Table 1). The decadal variance is below the overall mean of 1 in every decade, suggesting that this pattern persists substantially from year to year (Table 2).

*Pattern 3* – The Northern Asian pattern (NA; Fig. 6) has its main center of action close to the North Pole at 80°N and 40°E. Secondary centers of opposite sign are situated over Mongolia at 50°N, 100°-120°E and just west of the United Kingdom. Esbensen (1984) identified a similar pattern, although the main center was shifted south of the position shown herein. This mode explains 8.3% of the January height variance. The pattern/teleconnection congruence is 0.95, representing a very close match to the traditional teleconnection pattern. The time series (Fig. 7a) has no discernible trends but has periods of high variance in the 1960s and the 1990s-2000s (Table 2). The ACF (Fig. 7b) shows no year-to-year persistence.

*Pattern 4* – The eastern Atlantic teleconnection (AE; Fig. 8), accounting for 6.5% of the variance, has a dipole at ~ 40°W with one center at longitude 30°W and a second elongated
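The congruence matching at the heart of the selection of *k* can be sketched as follows. The function names are illustrative assumptions, the Promax rotation itself is omitted (rotated loadings **B** are taken as given; third-party packages such as factor_analyzer provide Promax), and the matrices are tiny stand-ins:

```python
import numpy as np

def congruence(x, y):
    """Congruence coefficient between two pattern vectors
    (an uncentered cosine similarity)."""
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

def match_to_teleconnections(B, R):
    """For each rotated loading vector (column of B), find the gridpoint with
    the largest absolute loading, then compare the vector against the
    one-point correlation (teleconnection) pattern in R for that basepoint."""
    scores = []
    for j in range(B.shape[1]):
        base = np.argmax(np.abs(B[:, j]))          # basepoint of this pattern
        scores.append(abs(congruence(B[:, j], R[:, base])))
    return np.array(scores)
```

Among the candidate truncations 2 … *l*, the retained *k* would then be the solution whose mean absolute congruence with its teleconnection patterns is highest.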
