Singular Value Decomposition

Singular value decomposition (SVD) is quite possibly the most widely used multivariate statistical technique in the atmospheric sciences. The technique was first introduced to meteorology in a 1956 paper by Edward Lorenz, in which he referred to the process as empirical orthogonal function (EOF) analysis. Today, it is also commonly known as principal component analysis (PCA). All three names are still used, and within the Data Library they refer to the same set of procedures.

The purpose of singular value decomposition is to reduce a dataset containing a large number of values to a dataset containing significantly fewer values that still retains a large fraction of the variability present in the original data. In the atmospheric and geophysical sciences, data often exhibit large spatial correlations. SVD analysis results in a more compact representation of these correlations, especially with multivariate datasets, and can provide insight into the spatial and temporal variations exhibited in the fields of data being analyzed.

There are a few caveats one should be aware of before computing the SVD of a set of data. First, the data must consist of anomalies, that is, departures from a mean such as the monthly climatology. Second, the data should be de-trended. When trends in the data exist over time, the first structure often captures them. If the purpose of the analysis is to find spatial correlations independent of trends, the data should be de-trended before applying SVD analysis.
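
The two preparation steps above can be sketched outside the Data Library as follows. This is a minimal illustration in Python with NumPy, not the Data Library's own code: the function names monthly_anomalies and detrend, the synthetic field, and the choice of a linear trend are all assumptions made for the example.

```python
import numpy as np

def monthly_anomalies(data, months):
    """Subtract each calendar month's long-term mean; data has shape (time, space)."""
    data = np.asarray(data, dtype=float)
    months = np.asarray(months)
    anomalies = np.empty_like(data)
    for m in np.unique(months):
        sel = months == m
        anomalies[sel] = data[sel] - data[sel].mean(axis=0)
    return anomalies

def detrend(data):
    """Remove a least-squares linear trend in time at every grid point."""
    data = np.asarray(data, dtype=float)
    t = np.arange(data.shape[0], dtype=float)
    design = np.column_stack([t, np.ones_like(t)])
    coeffs, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ coeffs

# Synthetic field (hypothetical): seasonal cycle + slow warming + noise.
rng = np.random.default_rng(0)
n_years, n_points = 30, 5
months = np.tile(np.arange(1, 13), n_years)
t = np.arange(months.size)
seasonal = 10.0 * np.cos(2 * np.pi * (months - 1) / 12)[:, None]
trend = 0.01 * t[:, None]
field = seasonal + trend + rng.normal(size=(t.size, n_points))

prepared = detrend(monthly_anomalies(field, months))
```

After both steps the field has zero time mean and no residual linear trend at any grid point, so the leading structure is not simply a picture of the seasonal cycle or of the trend.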

Analysis of Singular Value Decomposition

The first structure is the single pattern that represents the most variance in the data. The structures are the elements of the eigenvectors of the variance-covariance matrix of the data. In the Data Library, the eigenvectors are also known as EOFs. The first eigenvector (EOF) points in the direction in which the data vectors jointly exhibit the most variability. Essentially, a new coordinate system is created, with each axis aligned along the direction of maximum joint variability.

The second structure is the pattern that describes the second largest amount of variance, calculated the same way as the first structure. A very important property of the second structure is that it is completely uncorrelated with the first structure, as well as all other following structures. The second eigenvector is perpendicular to the first eigenvector, which is perpendicular to the third eigenvector and so on. This property is what led Lorenz to call the technique empirical orthogonal function analysis. All structures are mutually uncorrelated.
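
The orthogonality properties described above can be checked numerically. The sketch below uses NumPy's SVD on a small synthetic anomaly matrix (rows are times, columns are grid points); the data and the variable names are assumptions for illustration, and the scaling differs from the Data Library's normalized output.

```python
import numpy as np

rng = np.random.default_rng(1)
n_time, n_space = 200, 6

# Synthetic, spatially correlated anomalies (hypothetical data).
mixing = rng.normal(size=(n_space, n_space))
X = rng.normal(size=(n_time, n_space)) @ mixing
X -= X.mean(axis=0)

# X = U diag(s) Vt; rows of Vt are the structures (EOFs).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
eofs = Vt
pcs = U * s  # columns are the time series (principal components)

# Eigenvalues of the variance-covariance matrix, largest first.
cov = X.T @ X / (n_time - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]

structure_products = eofs @ eofs.T  # identity: structures are orthonormal
pc_products = pcs.T @ pcs           # diagonal: time series are uncorrelated
```

The off-diagonal entries of both product matrices vanish to rounding error, and s**2 / (n_time - 1) reproduces the eigenvalues of the variance-covariance matrix.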

The variance of the nth principal component is the nth eigenvalue. Therefore, the total variation exhibited by the data is equal to the sum of all eigenvalues. In the Data Library, eigenvalues are normalized such that the sum of all eigenvalues equals 1. A normalized eigenvalue therefore indicates the fraction of total variance explained by its corresponding structure. Structures have also been normalized so that their root mean square equals 1. This way, the structures can be expressed in terms of standard deviation.

Singular values are equal to the square root of the eigenvalues. Since eigenvalues are automatically normalized in the Data Library, they do not directly indicate the total amount of variance they explain. However, you may calculate the total variance explained by each EOF by squaring the singular values.
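
As a small worked illustration of the relationship between singular values, eigenvalues, and normalized eigenvalues, consider the sketch below. The four singular values are made up for the example and do not come from any dataset.

```python
import numpy as np

# Hypothetical singular values, largest first.
singular_values = np.array([4.0, 2.0, 1.0, 0.5])

# Squaring the singular values gives the (unnormalized) eigenvalues.
eigenvalues = singular_values**2

# Normalizing makes the eigenvalues sum to 1, so each one is a fraction of variance.
normalized = eigenvalues / eigenvalues.sum()
percent = 100 * normalized
cumulative = np.cumsum(normalized)
```

Here the first structure would account for 16 / 21.25 of the total variance, and the cumulative sum shows how many structures are needed to reach a chosen fraction.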

In the Data Library there is a time series associated with each structure. These time series are also known as principal components. The first time series is calculated by projecting the data matrix onto the first eigenvector of the variance-covariance matrix of the data, the second time series by projecting onto the second eigenvector, and so on. The time series values indicate the amount of the given structure needed to complete the data field. It follows that the structure (dimensionless) multiplied by the time series value at a single point in time (units of the data), summed over all structures, yields the original data at that point in time.
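
The projection and reconstruction described above can be sketched as follows. This assumes an unweighted analysis and a particular scaling (structures rescaled to unit root mean square, time series carrying the units); the synthetic data and variable names are illustrative, and the Data Library's exact scaling may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
n_time, n_space = 50, 8
X = rng.normal(size=(n_time, n_space))  # hypothetical anomaly matrix
X -= X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rescale so each structure has root mean square 1 over space (dimensionless).
structures = Vt * np.sqrt(n_space)
# The time series absorb the remaining scale and carry the units of the data.
time_series = (U * s) / np.sqrt(n_space)

# Projecting the data onto each structure gives the same time series.
projected = X @ structures.T / n_space

# Structure times time series, summed over all structures, recovers the data.
reconstructed = time_series @ structures
```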

Mathematically, there are as many eigenvectors as there are elements in the vector data set. The first few eigenvectors will point in directions where the data jointly exhibits large variation. The remaining eigenvectors will point to directions where the data jointly exhibits less variation. For this reason, it is often possible to capture most of the variation by considering only the first few eigenvectors. The remaining eigenvectors, along with their corresponding principal components, are truncated. The ability of SVD to eliminate a large proportion of the data is a primary reason for its use.
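
Truncation can be illustrated with a synthetic field built from three patterns plus weak noise. The sizes, amplitudes, and names below are assumptions chosen for the example only.

```python
import numpy as np

rng = np.random.default_rng(3)
n_time, n_space, n_modes = 120, 40, 3

# Three hypothetical spatial patterns with decreasing amplitude, plus noise.
patterns = rng.normal(size=(n_modes, n_space))
amplitudes = rng.normal(size=(n_time, n_modes)) * np.array([5.0, 3.0, 2.0])
X = amplitudes @ patterns + 0.1 * rng.normal(size=(n_time, n_space))
X -= X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 3  # number of structures kept
X_k = (U[:, :k] * s[:k]) @ Vt[:k]  # reconstruction from the first k structures

retained = (s[:k] ** 2).sum() / (s**2).sum()  # fraction of variance kept
stored_values = k * (n_time + n_space)        # numbers needed after truncation
```

In this example the three retained structures and their time series require 480 stored values instead of 4800, while keeping nearly all of the variance.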

Outline of Key Points

Datasets must consist of anomalies.
Better results when applied to de-trended data.
As many eigenvectors as temporal data values in the set.
Eigenvectors point in the direction of maximum joint variability.
Eigenvalues represent the amount of variance explained by the corresponding structure.
First eigenvalue will account for the most variation.
All but first few structures may be truncated in most cases.
All principal components mutually uncorrelated.

Example: SVD Analysis of North Atlantic Sea Surface Temperature Anomalies

Example: Perform a singular value decomposition of reconstructed sea surface temperature anomaly data in the North Atlantic for the months of December, January, and February from 1870 to 2004.

Locate Dataset and Variable	Select the "Datasets by Catagory" link in the blue banner on the Data Library page. Click on the "Air-Sea Interface" link. Select the ERSST dataset. Scroll down the page and select the "version2" link under the Datasets and Variables subheading. Select the "Sea Surface Temperature" link, again under the Datasets and Variables subheading.
Compute Monthly Anomalies	Click on the "Filters" link in the function bar. Choose the anomalies command. This operation calculates the SST anomalies for each month.
Select Temporal and Spatial Domains	Click on the "Data Selection" link in the function bar. Enter the text 10N to 70N, 5W to 80W, and Dec-Feb 1870-2004 in the appropriate text boxes. Press the Restrict Ranges button and then the Stop Selecting button. The time range entered will select only December, January, and February values for each year.
Compute Singular Value Decomposition	Click on the "Expert Mode" link in the function bar. Enter the line {Y cosd} [X Y] [T] svd under the text already there, then press the OK button. The svd function computes the singular value decomposition of the SST dataset weighted by the cosine of the latitude. Spatial data are often weighted by the cosine of the latitude to account for the change in area between meridians at different latitudes. A weight term, however, is not necessary to complete the SVD analysis. Five new variables appear under the Datasets and Variables subheading: normalized eigenvalues, structures, singular values, time series, and weights. While all of the variables are associated with the same new coordinate system generated by the SVD, each contains a different piece of information about the system.
View Normalized Eigenvalues	Click on the "normalized eigenvalues" link under the Datasets and Variables subheading. Select the time series viewer in the function bar. (Figure: Normalized Eigenvalues vs. Eigenvectors of SVD SST Anomalies.) Notice how quickly this function decays. The eigenvalues associated with the first few eigenvectors are much larger than those associated with subsequent eigenvectors. As mentioned earlier, the first few eigenvalues account for most of the variation present in the original data. Click on the right-most link in the blue source bar to exit the viewer. Select the "Tables" link in the function bar. Select the columnar table link. The first normalized eigenvalue is .233, the second is .151, and the third is .139. Recall that normalized eigenvalues represent the fraction of variance explained by the structure associated with that eigenvalue. Therefore, the first structure explains 23% of the variance, the second structure 15%, and so on. The table lists 402 structures, yet the first three account for over 50% of the variance.
Return to Dataset Page	Select the "Additional Information" link at the top of the page to exit the table. In the source bar, click on the { Y cosd } [ X Y ] [ T ] svd link. This will remove the normalized eigenvalues variable selection and return you to the SVD page.
View Structures	Click on the "structures" link under the Datasets and Variables subheading. In the function bar, select the viewer with land shaded in black. (Figure: 1st Structure of SVD SST Anomalies.) This is an image of the 1st structure, which explains 23.2% of the total variance present in the original data. Recall that the structures have been normalized and, as a result, are unitless quantities. Note the large negative values off the coast of West Africa. This variability is caused by an ocean-atmosphere coupling system described in the third example. In the text box above the viewer window, enter the number 2. Press the redraw button. (Figure: 2nd Structure of SVD SST Anomalies.) This is an image of the second structure, which explains 15% of the total variance present in the original data. Notice the large negative values off the east coast of the United States that extend into the Central Atlantic. These large values may be produced, in part, by the Gulf Stream, which causes annual variability of SSTs in the region. The large values in the 2nd EOF structure appear to overlap the vectors that represent the Gulf Stream in the accompanying image (Figure: The Gulf Stream Current. Gyory, Joanna. The Gulf Stream. http://oceancurrents.rsmas.miami.edu/atlantic/gulf-stream.html). This region is also aligned with the jet stream, a narrow area where weather patterns move off the coast and cause additional variability in SSTs. The large values in the 2nd structure may also be caused by an atmospheric circulation pattern known as the North Atlantic Oscillation.
Return to Dataset Page	Click on the right-most link in the blue source bar to exit the viewer. In the source bar, click on the { Y cosd } [ X Y ] [ T ] svd link. This will remove the structures variable selection and return you to the SVD page.
View Time Series	Click on the "time series" link under the Datasets and Variables subheading. Select the time series viewer. (Figure: Time Series of SVD SST Anomalies.) There is a time series associated with each eigenvector/structure. This is the time series corresponding to the 1st eigenvector, but you may change the eigenvector by changing the number in the text box above the viewer. The time series illustrates the amount of the structure present in the data, or in other words, the amount of the structure needed to complete the data field at each time step. These time series can be correlated with time series and/or indices relating to other processes in order to demonstrate a relationship. *NOTE: The singular values variable can be accessed the same way as the other three variables shown above.
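
The cosine-of-latitude weighting used in the svd step of this example can be sketched in Python with NumPy. The sketch follows one common convention, scaling the data by the square root of the weight so that variance is area weighted; whether the Data Library applies its weights in exactly this way is an assumption, and the small grid and random field are invented for the illustration.

```python
import numpy as np

# Hypothetical coarse grid spanning the example's domain.
lats = np.array([10.0, 30.0, 50.0, 70.0])
lons = np.array([-80.0, -55.0, -30.0, -5.0])

rng = np.random.default_rng(4)
n_time = 60
field = rng.normal(size=(n_time, lats.size, lons.size))  # stand-in anomalies
field -= field.mean(axis=0)

# Weight proportional to grid-box area: cosine of latitude (cf. "Y cosd").
weights = np.cos(np.deg2rad(lats))
w2d = np.broadcast_to(weights[:, None], (lats.size, lons.size))

# Scale by sqrt(weight) so that squared values are area weighted, then flatten space.
X = (field * np.sqrt(w2d)).reshape(n_time, -1)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Undo the scaling so the structures are expressed on the original grid.
structures = (Vt / np.sqrt(w2d).ravel()).reshape(-1, lats.size, lons.size)
```

With this convention the structures are orthonormal under the area-weighted inner product rather than the plain one, so grid points at high latitudes do not dominate simply because they are more closely spaced.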

Example: SVD Analysis of North Atlantic Mean Sea Level Pressure Anomalies and Their Relation to the NAO

Example: Perform a singular value decomposition analysis of mean sea level pressure anomaly data in the North Atlantic for the months of December, January, and February from 1950 to 2004.

Locate Dataset and Variable	Select the "Datasets by Catagory" link in the blue banner on the Data Library page. Click on the "Historical Model Simulations" link. Select the NOAA NCEP-NCAR CDAS-1 dataset. Scroll down the page and select the "MONTHLY" link under the Datasets and Variables subheading. Then, under the same subheading on each following page, select the "Intrinsic" link, the "Mean Sea Level" link, and the "Pressure" link in turn.
Compute Monthly Anomalies	Click on the "Filters" link in the function bar. Choose the anomalies command. This operation calculates the mean sea level pressure anomalies for each month.
Select Temporal and Spatial Domains	Click on the "Data Selection" link in the function bar. Enter the text 5W to 80W, 10N to 70N, and Dec-Feb 1950-2004 in the appropriate text boxes. Press the Restrict Ranges button and then the Stop Selecting button. The time range entered will select only December, January, and February values for each year.
Compute Singular Value Decomposition	Again in Expert Mode, enter the line {Y cosd} [X Y] [T] svd under the text already there, then press the OK button. The svd function computes the singular value decomposition of the mean sea level pressure dataset weighted by the cosine of the latitude.
Find Eigenvalue of 1st Structure	Click on the "normalized eigenvalues" link under the Datasets and Variables subheading. Select the "Tables" link in the function bar. Select the columnar table link. The first normalized eigenvalue is .402, the second is .278, and the third is .100. Normalized eigenvalues represent the fraction of variance explained by the structure associated with that eigenvalue. In this example, we will only be concerned with the first eigenvalue, which explains 40.2% of the total variance.
Return to Dataset Page	Select the "Additional Information" link at the top of the page to exit the table. In the source bar, click on the { Y cosd } [ X Y ] [ T ] svd link. This will remove the normalized eigenvalues variable selection and return you to the SVD page.
View 1st Structure	Click on the "structures" link under the Datasets and Variables subheading. In the function bar, select the viewer with land shaded in black. (Figure: 1st Structure of SVD MSLP Anomalies.) This is an image of the first structure, which explains 40.2% of the total variance present in the original data. The large positive values centered around 45° N and the large negative values centered around 65° N are indicative of two regions whose mean sea level pressures are generally inversely related. This system is a well-known low-frequency atmospheric circulation pattern called the North Atlantic Oscillation (NAO). The NAO is characterized by large-scale MSLP variability associated with a subtropical high / polar low system over the North Atlantic. During a positive NAO, the subtropical high is stronger than usual and the polar low is deeper than usual. The increased pressure gradient causes stronger winter storms to cross the Atlantic. During a negative NAO, the subtropical high and polar low are both weaker than usual, resulting in fewer and less severe storms crossing the Atlantic.

Example: Correlation of an SVD Time Series of Mean Sea Level Pressure Anomalies with an SVD Time Series of SST Anomalies in the North Atlantic

Example: Correlate an SVD time series of mean sea level pressure anomalies with an SVD time series of SST anomalies in the North Atlantic for the months of December, January, and February.

Select Dataset, Variable, and Domains	*NOTE: Datasets used in this example are similar to those used in the previous two examples. Return to the NOAA NCEP-NCAR CDAS-1 MSLP anomaly dataset with the previously selected domains by following the steps listed in the previous example. Alternatively, enter the following commands into Expert Mode: SOURCES .NOAA .NCEP-NCAR .CDAS-1 .MONTHLY .Intrinsic .MSL .pressure yearly-anomalies Y (10N) (70N) RANGEEDGES T (Dec-Feb 1950-2004) VALUES X (5W) (80W) RANGEEDGES Then press the OK button.
Compute Singular Value Decomposition	In Expert Mode, enter the line {Y cosd} [X Y] [T] svd under the text already there, then press the OK button. The svd function computes the singular value decomposition of the mean sea level pressure dataset weighted by the cosine of the latitude.
Select Time Series Variable and 1st Eigenvector	Click the "Time Series" variable under the Datasets and Variables subheading. Click on the "Data Selection" link in the function bar. Enter the number 1 in the ev text box. Press the Restrict Ranges button and then the Stop Selecting button. You have selected the first eigenvector and its associated time series.
Add the SVD Time Series of the Reconstructed SST Anomaly Data	Enter Expert Mode by clicking the "Expert Mode" link in the function bar, if you are not already there. Enter the following lines under the text already there: SOURCES .NOAA .NCDC .ERSST .version2 .SST yearly-anomalies X (5W) (80W) RANGEEDGES Y (10N) (70N) RANGEEDGES T (Dec-Feb 1870-2004) VALUES {Y cosd}[X Y][T]svd .Ts ev (1) VALUE Then press the OK button. The above commands add the SST anomaly data to the interface. The singular value decomposition of this data has already been performed, and the 1st eigenvector has been selected.
Correlate Datasets	In Expert Mode, enter the line [T] correlate under the text already there, then press the OK button. The above command correlates the two sets of data. The correlation coefficient is located under the Expert Mode text box in bold: 0.249616. We can conclude there is a slight correlation between MSLP anomalies and SST anomalies in the North Atlantic. The correlation coefficient is not very high because the variability associated with the 1st SST anomaly structure, for example, may be correlated with several MSLP anomaly structures. SVD analyses of the MSLP and SST datasets are independent of each other. There is no guarantee that the maximum amount of association between two variables will be found in two distinct principal component analysis time series. However, it has been proven that there is a relationship between these two datasets, specifically between these two structures. Atmospheric anomalies do cause SST anomalies, and vice versa. In this example, changes in MSLP sometimes cause an anomalous atmospheric cyclonic circulation centered around 40° W and 30° N. The cyclone weakens the normal northerly winds off the west coast of Africa. As a result, coastal upwelling is reduced and positive SST anomalies occur. Scroll up the page to the first EOF structure in the first example. Notice the extremely low values off the coast of West Africa. This SST variability is associated with variations in MSLP that produce the anomalous low.
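
The correlation step in this example can be sketched as a Pearson correlation over the time steps the two series share. The helper name correlate_common, the yearly time stamps (the real series hold three months per year), and the synthetic series are all assumptions for illustration; this is not the Data Library's correlate function, and the resulting coefficient is unrelated to the value quoted in the table.

```python
import numpy as np

def correlate_common(times_a, a, times_b, b):
    """Pearson correlation of two series over their common time stamps."""
    common, ia, ib = np.intersect1d(times_a, times_b, return_indices=True)
    r = np.corrcoef(np.asarray(a)[ia], np.asarray(b)[ib])[0, 1]
    return r, common.size

# Hypothetical leading time series with different record lengths.
rng = np.random.default_rng(5)
years_sst = np.arange(1870, 2005)
years_mslp = np.arange(1950, 2005)
sst_pc = rng.normal(size=years_sst.size)
mslp_pc = 0.5 * sst_pc[years_sst >= 1950] + rng.normal(size=years_mslp.size)

r, n_common = correlate_common(years_mslp, mslp_pc, years_sst, sst_pc)
```

Only the 55 overlapping years enter the calculation, which mirrors the fact that the MSLP record in the example starts in 1950 while the SST record starts in 1870.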

Disadvantages of Unrotated Singular Value Decomposition

Unrotated empirical orthogonal functions (EOFs) are often very useful for describing natural modes of variability in a data field, due to their spatial and temporal orthogonality, ability to extract the maximum variance from a field, and relative simplicity. Yet unrotated EOFs generally do a poor job of isolating individual modes of variation. This weakness is largely due to four inherent characteristics of unrotated EOFs: domain shape dependence, subdomain instability, sensitivity to sampling, and an inaccurate portrayal of the physical relationships embedded within the input data (Richman 1986).

Domain Shape Dependence

Unrotated EOFs can be primarily determined by the shape of the domain rather than by the covariation of the data. In these cases, structures of the unrotated EOF analysis do not resemble any of the single input patterns, but rather, they represent combinations of the input patterns.

Subdomain Instability

Unrotated EOFs usually exhibit poor subdomain stability; that is, the modal patterns change when the analysis is repeated on sub-portions of the domain. Richman and Lamb (1985) performed unrotated EOF analyses on the same set of data, once over an entire domain and once over the northern and southern halves of the domain separately. The results for each half of the domain did not correspond with the results for the entire domain, which leads to the question: How robust are the results from an unrotated EOF analysis?

Sensitivity to Sampling

When eigenvalues are close together, they may be dominated by noise and the corresponding EOFs may not be well defined.

Lack of Physical Meaning

Unrotated EOFs sometimes produce results that are not physically meaningful.

Rotated Singular Value Decomposition

In a rotated EOF analysis, the eigenvectors are weighted by the square root of their corresponding eigenvalues, so that the weights (i.e., loadings) represent the correlations between each variable and principal component. Most rotation methods are mathematical algorithms that approximate a simple structure by redistributing the PC loadings such that the dispersion of the loadings is maximized.

Varimax rotation is the most widely accepted method for analytical rotation. The Varimax method reduces variances of the projection of the data onto the rotated basis, where the projection is the principal component time series. This improves the alignment of the basis with the actual data and improves the relationship between their spatial and temporal patterns and known physical mechanisms. Varimax is a method for rotating the axes of a plot such that the eigenvectors remain orthogonal as they are rotated. These rotations are used in principal component analysis so that the axes are rotated to a position in which the sum of the variances of the loadings is the maximum possible (Oilfield Glossary). In the Data Library, the varimax function requires the user to specify the number of eigenvectors to use in the rotation. The matrix of loadings is determined by the truncated eigenvectors.
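
To make the idea of maximizing the variance of the loadings concrete, here is a minimal sketch of a raw varimax rotation using Kaiser's classical pairwise planar rotations. It is an illustration under stated assumptions, not the Data Library's varimax function: the function names are invented for the example, no Kaiser row normalization is applied, and the input is a toy loading matrix rather than truncated, eigenvalue-weighted eigenvectors.

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    return float(np.sum(np.var(loadings**2, axis=0)))

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax: rotate pairs of columns until the rotation angles vanish."""
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    for _ in range(max_sweeps):
        max_angle = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = L[:, i].copy(), L[:, j].copy()
                u = x**2 - y**2
                v = 2.0 * x * y
                A, B = u.sum(), v.sum()
                C = (u**2 - v**2).sum()
                D = 2.0 * (u * v).sum()
                # Kaiser's closed-form angle for the optimal planar rotation.
                phi = 0.25 * np.arctan2(D - 2.0 * A * B / p, C - (A**2 - B**2) / p)
                c, s = np.cos(phi), np.sin(phi)
                L[:, i] = c * x + s * y
                L[:, j] = -s * x + c * y
                max_angle = max(max_angle, abs(phi))
        if max_angle < tol:
            break
    return L

# Toy loadings: two clean patterns mixed by a 30 degree rotation.
simple = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), np.sin(theta)], [-np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix
rotated = varimax(unrotated)
```

In this toy case the rotation recovers the two clean patterns and raises the criterion, while the row sums of squared loadings (the communalities) are unchanged because the rotation is orthogonal.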

Many atmospheric scientists argue that rotated EOF analysis is a more effective tool than unrotated EOF analysis for the study of atmospheric circulation patterns. While EOF rotation is often very useful, it is not meant to be a default operation after every EOF analysis. The decision to rotate should be guided by the specific analysis.

Advantages / Disadvantages of Varimax Rotation

Advantages

Less affected by domain dependence than unrotated EOF analyses.
Varimax analyses of subdomains more stable than unrotated EOF analyses.
When neighboring eigenvalues are similar in value, patterns not present in the unrotated EOFs may become present after rotation.
Eigenvectors still remain orthogonal as they are rotated.
Generally exhibits a stronger relationship between components and known physical mechanisms than unrotated EOFs.
Rotated EOFs often in better agreement with physical patterns than unrotated EOFs.

Disadvantages

More complex than unrotated EOFs.
Sometimes difficult to determine when rotation is useful.
Not applicable to cases where sole purpose of EOF analysis is data reduction.
In some cases, will not increase the physical explainability of the data (may cause more harm than good).

Varimax Rotation of East Pacific Sea Surface Temperature Data

Example: Perform a varimax rotation of an SVD analysis of East Pacific sea surface temperatures.

Locate Dataset and Variable	Select the "Datasets by Catagory" link in the blue banner on the Data Library page. Click on the "Air-Sea Interface" link. Scroll down the page and select the NOAA NCEP EMC CMB GLOBAL Reyn_Smith dataset. Click on the "Reyn_SmithOIv2" link. Click on the "monthly" link. Click on the "Sea Surface Temperature Anomaly" link under the Datasets and Variables subheading.
Select Temporal and Spatial Domains	Click on the "Data Selection" link in the function bar. Enter the text 180W to 70W and 35S to 35N in the appropriate text boxes. Press the Restrict Ranges button and then the Stop Selecting button.
Compute Singular Value Decomposition	Click on the "Expert Mode" link in the function bar. Enter the line {Y cosd} [X Y] [T] svd under the text already there, then press the OK button. The svd function computes the singular value decomposition of the SST dataset weighted by the cosine of the latitude. Five new variables appear under the Datasets and Variables subheading: normalized eigenvalues, structures, singular values, time series, and weights.
View Structures	Click on the "structures" link under the Datasets and Variables subheading. In the function bar, select the viewer with land shaded in black. (Figure: 1st Structure of SVD SST Anomalies.) The first structure is representative of the El Niño-Southern Oscillation (ENSO). Recall that the first structure is the pattern that explains the most variability in the original set of data. The relatively large positive values located immediately off the west coast of South America correspond to the variability in SSTs caused by upwelling during La Niña years and the lack of upwelling during El Niño years. Notice that these values extend westward in a narrow line and, as a result, do not cover much surface area in the Pacific. However, ENSO generally affects a greater area than depicted by this first structure. One explanation is that part of the ENSO pattern might be contained in another structure, or in multiple structures.
Return to Dataset Page	Click on the right-most link in the blue source bar to exit the viewer. In the source bar, click on the { Y cosd } [ X Y ] [ T ] svd link. This will remove the structures variable selection and return you to the SVD page.
Perform Varimax Rotation	Click on the "Expert Mode" link in the function bar. Enter the line 3 varimax under the text already there, then press the OK button. The varimax function above performs a varimax rotation using the first three eigenvectors. Changing the number before the varimax command will change the number of eigenvectors entered into the function. Seven new variables appear under the Datasets and Variables subheading: varimax rotation, communalities, energy, rotated structures, singular values, time series, and weights.
Select Rotated Structures Variable	Click on the "rotated structures" link under the Datasets and Variables subheading.
View Structures	In the function bar, select the viewer with land shaded in black. (Figure: 1st Structure of SVD Varimax Rotated SST Anomalies.) Notice that the colorscale is not centered around 0°. To enhance the interpretability of the image, the colormap can be adjusted so that the scale is centered around 0°.
Return to Dataset Page	Click on the right-most link in the blue source bar to exit the viewer.
Generate Colormap	In the Expert Mode text box, enter the following lines below the text already there: startcolormap -1.5 1.5 RANGE white DarkViolet DarkViolet -1.5 VALUE cyan 0 VALUE white 0 bandmax yellow orange 0.5 VALUE red 1.5 VALUE firebrick endcolormap Then press the OK button. The colorscale is depicted at the bottom of the dataset page. Values less than -1.5° are assigned the color DarkViolet and values greater than 1.5° are assigned the color firebrick. Values of 0° are white. Missing values are also white. For more information on colorscales, see the Data Library Tutorial.
View Structures	In the function bar, select the viewer with land shaded in black. (Figure: 1st Structure of SVD Varimax Rotated SST Anomalies.) By rotating the first three eigenvectors via the varimax method, the resulting structure is more representative of the physical pattern (ENSO) than the unrotated EOF structure illustrated earlier in the example. Pieces of the ENSO pattern contained in the multiple unrotated principal components have been incorporated into one rotated component. The negative values now extend farther north and south, as well as to the west. Many times, rotating the EOFs / PCs will result in a solution that better explains the underlying physical patterns in the input data.