Measures of Dispersion
While measures of central tendency are used to estimate "normal" values of
a dataset, measures of dispersion are important for describing the spread of the
data, or its variation around a central value. Two distinct samples may have the
same mean or median but completely different levels of variability, or vice versa.
A proper description of a set of data should include both of these characteristics.
There are various methods that can be used to measure the dispersion of a dataset,
each with its own set of advantages and disadvantages.
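As a small illustration of that point, the Python sketch below uses two invented samples (hypothetical numbers, not drawn from any dataset) that share the same mean and median but differ greatly in spread. It relies only on the standard-library `statistics` module.

```python
import statistics

# Two hypothetical samples (illustrative numbers only)
calm = [9, 10, 10, 10, 11]
volatile = [0, 5, 10, 15, 20]

for name, sample in [("calm", calm), ("volatile", volatile)]:
    # Same central tendency, very different sample standard deviation
    print(name,
          "mean =", statistics.mean(sample),
          "median =", statistics.median(sample),
          "stdev =", round(statistics.stdev(sample), 2))
```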
Standard Deviation
- The standard deviation is the square root of the sample variance.
- Uses a divisor of n - 1 (rather than n), which makes the sample variance an unbiased estimator of the population variance and so suitable for making inferences about it.
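- Calculated using the formula: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x}$ is the mean, $x_i$ is each data value, and $n$ is the number of observations.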
- The values computed in the squared term, $x_i - \bar{x}$, are anomalies, which are discussed in another section.
- Unlike the root mean square anomaly discussed later in this section, it is not restricted to large sample datasets.
- When the data are approximately normally distributed, the standard deviation provides significant information about how the data are spread around the mean:
- The mean ± one standard deviation contains approximately 68% of the measurements in the series.
- The mean ± two standard deviations contains approximately 95% of the measurements in the series.
- The mean ± three standard deviations contains approximately 99.7% of the measurements in the series.
- Climatologists often use standard deviations to help classify abnormal climatic conditions.
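To make the formula and the 68/95/99.7 rule concrete, here is a minimal Python sketch. `sample_std` is a hypothetical helper that applies the n - 1 formula directly and is cross-checked against `statistics.stdev`; the empirical-rule check uses synthetic normal data from `random.gauss`, not climate data.

```python
import math
import random
import statistics


def sample_std(values):
    """Sample standard deviation using the n - 1 divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = sum(values) / n
    # Sum of squared anomalies, (x_i - xbar)^2
    ss = sum((x - xbar) ** 2 for x in values)
    return math.sqrt(ss / (n - 1))


# Cross-check against the standard library on a small hypothetical sample
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_std(data), statistics.stdev(data))

# Empirical 68-95-99.7 check on synthetic normally distributed data
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]
m, s = statistics.fmean(sample), sample_std(sample)
fractions = {}
for k in (1, 2, 3):
    # Fraction of values within k standard deviations of the mean
    fractions[k] = sum(abs(x - m) <= k * s for x in sample) / len(sample)
    print(f"fraction within {k} sd: {fractions[k]:.3f}")
```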
The table below classifies the abnormality of a data value by how many
standard deviations it lies from the mean. The probabilities in the third column assume the data are normally distributed.
Standard Deviations Away From Mean | Abnormality | Probability of Occurrence
beyond -3 sd | extremely subnormal | 0.15%
-3 to -2 sd | greatly subnormal | 2.35%
-2 to -1 sd | subnormal | 13.5%
-1 to +1 sd | normal | 68.0%
+1 to +2 sd | above normal | 13.5%
+2 to +3 sd | greatly above normal | 2.35%
beyond +3 sd | extremely above normal | 0.15%
Oliver, John E. Climatology: Selected Applications. p 45.
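The classification from Oliver's table can be sketched as a function of the z-score, $z = (x - \bar{x})/s$. This is an illustrative Python sketch: the function name `classify` is hypothetical, and the table does not say which class a value lying exactly on a boundary (such as exactly 2 sd) belongs to, so the code assumes boundaries go to the class nearer the mean.

```python
def classify(value, mean, sd):
    """Return the abnormality class for a value, given the mean and standard deviation.

    Assumption: values exactly at +/-1, 2, or 3 sd are assigned to the class
    nearer the mean (the source table does not specify how ties are handled).
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (value - mean) / sd  # distance from the mean in standard deviations
    if z < -3:
        return "extremely subnormal"
    if z < -2:
        return "greatly subnormal"
    if z < -1:
        return "subnormal"
    if z <= 1:
        return "normal"
    if z <= 2:
        return "above normal"
    if z <= 3:
        return "greatly above normal"
    return "extremely above normal"


# Hypothetical station: mean 100 mm, standard deviation 10 mm
for value in (65, 85, 105, 125):
    print(value, classify(value, 100, 10))
```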
Example: Calculate the standard deviation of monthly cloud cover over Equatorial Africa for January 1960 to December 1962.
Locate Dataset and Variable
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Atmosphere" link.
- Scroll down the page and select the UEA CRU New CRU05 dataset.
- Click on the "monthly" link.
- Select the "cloud cover" link under the Datasets and Variables subheading.
Select Temporal and Spatial Domains
- Click on the "Data Selection" link in the function bar.
- Enter the text 25W to 50E, 40S to 38N, and Jan 1960 to Dec 1962 in the appropriate text boxes.
- Press the Restrict Ranges button and then the Stop Selecting button.
Calculate Standard Deviation Values
View Standard Deviation Values
- To see the results of this operation, choose the viewer window with coasts outlined.
Standard Deviation of Monthly Cloud Cover
Equatorial Africa exhibits low standard deviation values of monthly cloud cover
compared to regions to its north and south. High standard deviation values
correspond to areas with large interannual cloud cover variability.
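Conceptually, the map above holds one standard deviation per grid cell, computed over the time axis. The Python sketch below is only a rough, hypothetical analogue of that reduction, not the Data Library's implementation: the grid cells, their coordinates, and the cloud cover values are all invented for illustration.

```python
import statistics

# Hypothetical gridded series: (lon, lat) -> monthly cloud cover values (%)
# The numbers are invented for illustration only.
grid = {
    (20.0, 0.0): [70, 72, 71, 69, 70, 73, 72, 71, 70, 69, 71, 72],
    (20.0, 30.0): [10, 25, 5, 40, 15, 30, 8, 35, 12, 28, 6, 38],
}


def std_over_t(series_by_cell):
    """Reduce the time axis: one sample standard deviation per grid cell."""
    return {cell: statistics.stdev(series) for cell, series in series_by_cell.items()}


for cell, sd in std_over_t(grid).items():
    print(cell, round(sd, 2))
```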
Note that the root mean square anomaly can be substituted for the standard deviation if the sample size is sufficiently large.
(Devore, Jay L. Probability and Statistics for Engineering and the Sciences. pp. 38-39, 259.)
Root Mean Square Anomaly / Root Mean Square
Root Mean Square Anomaly
- Also known as root mean square deviation.
- Very similar to the standard deviation, except that the divisor is n instead of n - 1; for this reason it is used with large sample sizes (Devore).
- RMSA is calculated using the formula: $\mathrm{RMSA} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x}$ is the mean, $x_i$ is each data value, and $n$ is the number of observations.
- The term $x_i - \bar{x}$ is an anomaly, which is discussed in another section.
- Provides similar information about the dispersion of data as the standard deviation.
- Often used as a measurement of error.
- More commonly used than the standard deviation function in the statistical
analysis of climate data, because climate-related datasets generally contain a large number of data points.
Root Mean Square
- Calculated using the formula: $\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}x_i^2}$
- Unlike the RMSA or standard deviation, the mean is not removed in the calculation.
- Acceptable to use only when dealing with large sample datasets (Devore).
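The relationships among the three quantities can be checked numerically. In this Python sketch (hypothetical data; `rmsa` and `rms` are illustrative helpers), the RMSA equals the standard library's population standard deviation `statistics.pstdev`, differs from the sample standard deviation only by the factor $\sqrt{(n-1)/n}$, which approaches 1 for large n, and the RMS satisfies $\mathrm{RMS}^2 = \mathrm{RMSA}^2 + \bar{x}^2$, so the two diverge whenever the mean is far from zero.

```python
import math
import statistics


def rmsa(values):
    """Root mean square anomaly: divisor n, mean removed."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / n)


def rms(values):
    """Root mean square: divisor n, mean NOT removed."""
    return math.sqrt(sum(x * x for x in values) / len(values))


data = [62.0, 58.0, 71.0, 65.0, 60.0, 68.0]  # hypothetical values
n = len(data)
s = statistics.stdev(data)

# RMSA and the sample standard deviation differ only by sqrt((n - 1) / n)
print(rmsa(data), s * math.sqrt((n - 1) / n))
# RMS also carries the mean: RMS^2 = RMSA^2 + mean^2
print(rms(data) ** 2, rmsa(data) ** 2 + statistics.fmean(data) ** 2)
```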
Example: Calculate the root mean square anomaly of monthly cloud cover over Africa for January 1960 to December 1979.
Locate Dataset and Variable
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Atmosphere" link.
- Scroll down the page and select the UEA CRU New CRU05 dataset.
- Click on the "monthly" link.
- Click on the "cloud cover" link under the Datasets and Variables subheading.
Select Temporal and Spatial Domains
- Click on the "Data Selection" link in the function bar.
- Enter the text 25W to 50E, 40S to 38N, and Jan 1960 to Dec 1979 in the appropriate text boxes.
- Press the Restrict Ranges button and then the Stop Selecting button.
Calculate Root Mean Square Anomaly
- Click on the "Filters" link in the function bar.
- Select the RMSA over "T" command.
The result is a set of root mean square anomaly values (i.e., the root mean square with the mean removed).
Higher (lower) values represent a wider (narrower) spread of monthly cloud cover about the mean.
*Note: Choosing the Root Mean Square over T instead of the Root Mean Square Anomaly over T will produce very different results.
After completing the example, try going back and selecting the RMS over "T" command to see the difference between the two functions.
View Root Mean Square Anomaly Values
- To see the results of this operation, choose the viewer window with coasts outlined.
Root Mean Square Anomaly of Monthly Cloud Cover
Relatively low root mean square anomaly (RMSA) values are found
in Equatorial Africa while regions to the north and south possess higher values.
High RMSA values correspond to areas with large interannual cloud cover variability.
Interquartile Range (IQR)
- Calculated by taking the difference between the upper and lower quartiles (the 25th percentile subtracted from the 75th percentile).
- A good indicator of the spread in the center region of the data.
- Relatively easy to compute.
- More resistant to extreme values than the range.
- Unlike the median absolute deviation discussed later in this section, it does not incorporate all of the data in the sample.
- Also called the fourth-spread.
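A minimal Python sketch of the IQR using `statistics.quantiles`, with hypothetical values. Quartile conventions differ between software packages; this sketch assumes Python's default "exclusive" interpolation, which may not match the Data Library's convention. Making one value extreme changes the range dramatically but leaves the IQR unchanged here, illustrating its resistance to extreme values.

```python
import statistics


def iqr(values, method="exclusive"):
    """Interquartile range (fourth-spread): 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    return q3 - q1


monthly = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]        # hypothetical values
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 120]   # last value made extreme

for name, vals in [("monthly", monthly), ("with_outlier", with_outlier)]:
    print(name, "range =", max(vals) - min(vals), "IQR =", iqr(vals))
```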
Example: Find the interquartile range of climatological monthly
precipitation in South America for January 1970 to December 2003.
Locate Dataset and Variable
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Atmosphere" link.
- Select the NASA GPCP V2 dataset.
- Select the "multi-satellite" link under the Datasets and Variables subheading.
- Select the "precipitation" link again under the Datasets and Variables subheading.
Select Temporal and Spatial Domains
- Click on the "Data Selection" link in the function bar.
- Enter the text 90W to 30W, 60S to 10N, and Jan 1970 to Dec 2003 in the appropriate text boxes.
- Press the Restrict Ranges button and then the Stop Selecting button.
Compute Monthly Climatologies
- Select the "Filters" link in the function bar.
- Choose the Monthly Climatology command.
This command computes the average precipitation over all years for each month, January through December (i.e., climatological monthly precipitation).
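The idea behind a monthly climatology, averaging each calendar month over all years, can be sketched in a few lines of Python. This is a hypothetical helper (`monthly_climatology`) operating on a plain list of consecutive monthly values, not the Data Library command itself; the two years of data are invented.

```python
def monthly_climatology(values, start_month=1):
    """Average each calendar month over all years.

    values: consecutive monthly values; start_month: calendar month (1-12)
    of values[0]. Returns 12 means, January first (NaN if a month is absent).
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for i, v in enumerate(values):
        month = (start_month - 1 + i) % 12  # 0 = January, 11 = December
        sums[month] += v
        counts[month] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Two hypothetical years of monthly precipitation, starting in January
series = list(range(1, 13)) + list(range(3, 15))
print(monthly_climatology(series))
```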
Calculate Interquartile Range
View Interquartile Range
- To see the results of this operation, choose the viewer window with coasts outlined.
Interquartile Range of Climatological Monthly Precipitation
The higher the interquartile range, the more variability in the data. The Amazon Basin exhibits high intraannual precipitation variability, while areas to the north and south exhibit lower precipitation variability.
Median Absolute Deviation (MAD)
- A more comprehensive alternative to the IQR, because it incorporates all of the data in the sample.
- $\mathrm{MAD} = \operatorname{median}\lvert X_i - q_{0.5}\rvert$, where $X_i$ represents each value and $q_{0.5}$ represents the median.
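The formula translates directly into Python. This sketch (hypothetical values, illustrative helper `mad`) returns the unscaled MAD as defined above; some references multiply it by a constant (about 1.4826) to make it comparable with the standard deviation for normal data, which is not done here. As with the IQR, replacing one value with an extreme one leaves the result unchanged.

```python
import statistics


def mad(values):
    """Median absolute deviation: median of |x_i - q_0.5| (no scale factor)."""
    q50 = statistics.median(values)
    return statistics.median([abs(x - q50) for x in values])


monthly = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]        # hypothetical values
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 120]
print(mad(monthly), mad(with_outlier))
```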
Example: Find the median absolute deviation of climatological monthly precipitation in South America for January 1970 to December 2003.
Locate Dataset and Variable
*NOTE: This example uses the same dataset and variable as the previous example.
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Atmosphere" link.
- Select the NASA GPCP V2 dataset.
- Select the "multi-satellite" link under the Datasets and Variables subheading.
- Select the "precipitation" link again under the Datasets and Variables subheading.
Select Temporal and Spatial Domains
- Click on the "Data Selection" link in the function bar.
- Enter the text 90W to 30W, 60S to 10N, and Jan 1970 to Dec 2003 in the appropriate text boxes.
- Press the Restrict Ranges button and then the Stop Selecting button.
Compute Monthly Climatologies
- Select the "Filters" link in the function bar.
- Choose the Monthly Climatology command.
Calculate Median Absolute Deviation
View Median Absolute Deviation
- To see the results of this operation, choose the viewer window with coasts outlined.
Median Absolute Deviation of Climatological Monthly Precipitation
The higher the median absolute deviation, the more variability in the data.
Similar to the IQR example, the Amazon Basin exhibits high intraannual precipitation variability, while areas to the north and south exhibit lower precipitation variability.
Trimmed Variance
- Similar to the variance, except that a proportion of the largest and smallest values in the dataset
are omitted before it is calculated.
- Less affected by outliers since the largest x% and the smallest x%
of the sample are eliminated.
- Typical range for x% is 5% to 25%.
- Sometimes multiplied by an adjustment factor to
make it more consistent with the ordinary sample variance (Wilks, Daniel S. Statistical Methods in the Atmospheric Sciences. p 26).
- Analogous to the trimmed mean.
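A minimal Python sketch of a trimmed variance, assuming the number of values cut from each end is the trimming proportion times n, rounded down. The helper name `trimmed_variance` and the sample data are hypothetical, and the optional adjustment factor mentioned by Wilks is deliberately not applied.

```python
import math
import statistics


def trimmed_variance(values, proportion=0.1):
    """Sample variance after dropping the lowest and highest `proportion` of values.

    No adjustment factor is applied, so the result is not rescaled to be
    comparable with the ordinary sample variance.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = math.floor(proportion * len(ordered))  # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    if len(kept) < 2:
        raise ValueError("too few values left after trimming")
    return statistics.variance(kept)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]  # hypothetical sample with one outlier
print(statistics.variance(data), trimmed_variance(data, 0.1))
```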
Example: Find the trimmed variance of average OLR values in the eastern United States for January 1980 to December 1999.
Locate Dataset and Variable
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Cloud Characteristics and Radiation Budget" link.
- Select the NOAA NCEP CPC GLOBAL dataset.
- Click on the "monthly" link.
- Select the "outgoing longwave radiation" link under the Datasets and Variables subheading.
Select Temporal and Spatial Domains
- Click on the "Data Selection" link in the function bar.
- Enter the text 70W to 90W, 20N to 60N, and Jan 1980 to Dec 1999 in the appropriate text boxes.
- Press the Restrict Ranges button and then the Stop Selecting button.
Calculate Spatial Average
- Click on the "Filters" link in the function bar.
- Select the Average over "XY" link.
This command takes a spatial average of the data.
Find Trimmed Variance
Since this dataset is large, we can assume that the root mean square is an acceptable estimate of the standard deviation.
The value of the root mean square is 7.811205 W/m².
Calculate the trimmed variance by squaring the value above.
- In Expert Mode, enter the command:
dup mul
- Click the OK button.
The trimmed variance should be 61.01493 W²/m⁴ (equivalently, kg² s⁻⁶).
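The Expert Mode command is written in a stack-based, PostScript-style syntax. Assuming `dup` copies the top value on the stack and `mul` replaces the top two values with their product (as in PostScript), `dup mul` squares the number on the stack. The toy evaluator below is a hypothetical illustration of that arithmetic, not the Data Library's interpreter.

```python
def run(program, stack):
    """Evaluate a tiny PostScript-style program on a list used as a stack."""
    for token in program.split():
        if token == "dup":
            stack.append(stack[-1])  # copy the top value
        elif token == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)  # replace the top two values with their product
        else:
            raise ValueError(f"unsupported token: {token}")
    return stack


result = run("dup mul", [7.811205])[-1]
print(round(result, 4))
```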