Interpolation Techniques
Interpolation is the process of using known data values to estimate unknown data values.
Various interpolation techniques are often used in the atmospheric sciences. One of the simplest methods, linear interpolation, requires knowledge of two points and the constant rate of change
between them. With this information, you may interpolate values anywhere between those two points.
More sophisticated interpolations are also available in the Data Library. They are often applied to station datasets with irregular spacing between stations.
The Cressman and Weaver analysis interpolation techniques are covered in this tutorial section.
Both methods are primarily used to estimate equally-spaced latitude / longitude grid data from station data or gridded data with non-constant spacing.
Linear Interpolation
Linear interpolation is a simple technique used to estimate unknown values that lie between known values.
The concept of linear interpolation relies on the assumption that the rate of change between the known values is constant and can be calculated from these values using a simple slope formula.
Then, an unknown value between the two known points can be calculated using one of the
points and the rate of change. Linear interpolation is a relatively straightforward method, but is often not sophisticated enough to effectively interpolate station data to an even grid. Linear interpolation is often used to regrid
evenly-spaced data, such as longitude / latitude gridded data, to a higher or lower resolution.
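The slope-based calculation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Data Library's own routine: the helper name `lerp` and all sample values are made up, and the regridding step uses NumPy's `numpy.interp` on a one-dimensional transect rather than a full longitude / latitude grid.

```python
import numpy as np


def lerp(x0, y0, x1, y1, x):
    """Estimate y at x from two known points, assuming a constant rate of change."""
    slope = (y1 - y0) / (x1 - x0)   # constant rate of change between the points
    return y0 + slope * (x - x0)    # start at one known point and apply the slope


# Two known values at 10 and 15 degrees; estimate the value at 12 degrees.
value_at_12 = lerp(10.0, 2.0, 15.0, 7.0, 12.0)

# Regrid a coarse (5-degree) transect to a 1-degree grid.
coarse_lon = np.array([0.0, 5.0, 10.0, 15.0])
coarse_val = np.array([1.0, 3.0, 2.0, 6.0])      # made-up values
fine_lon = np.arange(0.0, 16.0, 1.0)             # 0, 1, ..., 15 degrees
fine_val = np.interp(fine_lon, coarse_lon, coarse_val)
```

The regridded values pass exactly through the original 5° values, and every new 1° point lies on the straight line between its two coarse neighbors. No new information is created; the field only looks smoother.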
Example: Regrid NOAA NCDC GCPS Monthly Gridded Precipitation Anomalies for Europe from a 5° x 5° resolution to a 1° x 1° resolution.
Locate Dataset and Variable |
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Atmosphere" link.
- Select the NOAA NCDC GCPS MONTHLY GRIDDED dataset.
- Click on the "precipitation" link under the Datasets and Variables subheading.
- Click on the "anomalies" link, again under the Datasets and Variables subheading.
|
Select Temporal and Spatial Domains |
- Click on the "Data Selection" link in the function bar.
- Enter the text 13W to 32E, 35N to 60N, and Oct 1993 in the appropriate text boxes.
- Press the Restrict Ranges button and then the Stop Selecting button.
|
View Gridded Data at a 5° x 5° Resolution |
- To see the results of this operation, choose the viewer window with coasts drawn.
October 1993 Precipitation Anomalies in Europe at 5°x 5° Resolution
The resolution of this dataset is relatively low, which makes the image appear fairly discontinuous.
Linear interpolation can be used to help smooth the data by changing the grid to a higher resolution (e.g., 1° x 1°).
|
Perform Linear Interpolation |
|
View Results at a 1° x 1° Resolution |
- To see the results of this operation, choose the viewer window with coasts drawn.
October 1993 Precipitation Anomalies in Europe at 1°x 1° Resolution
The data appears more continuous at a higher resolution.
Above average precipitation amounts are found over the Alps in Northern Italy and Southern France and precipitation deficits are located over the Northern United Kingdom and Ireland.
White-colored grid boxes represent regions of missing data.
|
Cressman Analysis
George Cressman developed the Cressman interpolation technique in 1959.
The technique interpolates station data to a user-defined latitude-longitude grid.
Multiple passes are made through the grid at consecutively smaller radii of influence to
increase precision.
The radius of influence is the maximum distance from a grid point within which a
station's observed value may be weighted to estimate the value at that grid
point. Stations beyond the radius of influence have no bearing on a grid point value. At
each pass, a new value is calculated for each grid point based on its correction factor.
This correction factor is determined by analyzing each station within the radius of
influence.
For each such station, an error is defined as the difference between the station value
and a value arrived at by interpolation from the grid to that station.
A distance-weighted formula (shown below) is then applied to all such errors within the
radius of influence of the grid point to arrive at a correction value for that grid point.
The correction factors are applied to all grid points before the next pass is made.
Observations nearest the grid point carry the most weight.
As the distance increases, the observations carry less weight. The cressman function in Ingrid calculates the weights as follows:
$$W = \frac{R^2 - r^2}{R^2 + r^2}$$
where R = influence radius and r = distance between the station and the gridpoint. The weighting function is pictured below.
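Because the weight depends only on the ratio of r to R, it is easy to tabulate. The following Python sketch evaluates the formula for a few distances. It is not the Ingrid implementation: the function name `cressman_weight`, the sample radius, and the sample distances are illustrative assumptions. Stations at or beyond the radius of influence are given a weight of zero, consistent with the statement that they have no bearing on the grid point value.

```python
import numpy as np


def cressman_weight(r, R):
    """W = (R^2 - r^2) / (R^2 + r^2) inside the radius of influence, 0 outside."""
    r = np.asarray(r, dtype=float)
    w = (R**2 - r**2) / (R**2 + r**2)
    return np.where(r < R, w, 0.0)   # stations at or beyond R carry no weight


R = 4.0                                           # hypothetical influence radius
distances = np.array([0.0, 1.0, 2.0, 4.0, 6.0])   # hypothetical station distances
weights = cressman_weight(distances, R)
```

A station located exactly at the grid point receives a weight of 1, and the weight falls smoothly to 0 as the distance approaches R.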
As the radius of influence is tightened, results become more representative of the observed data.
At each gridpoint, the analysis value equals the analysis value from the previous pass plus a correction. The correction is the sum, over the stations within the radius of influence, of each weight multiplied by the difference between the actual station value
and the interpolated background value at that station, divided by the sum of the weights. The Data Library performs three passes by default, with radii of influence of 4, 2.5, and 1.5. These numbers are multiples of the average minimum station distance calculated in the function.
These parameters may be changed when entering the command in Ingrid. There is also a minimum station number parameter, which requires that a certain number of stations lie within the radius of influence
for an analysis value to be calculated for that gridpoint. If the minimum station number requirement is not met for a given gridpoint, a missing value will be assigned.
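The multi-pass procedure can be sketched in Python as follows. This is a simplified illustration under several stated assumptions, not the code behind the Ingrid cressman function: the first guess is zero everywhere, the background value at a station is taken from the nearest grid point rather than interpolated, "average minimum station distance" is read as the mean nearest-neighbor distance between stations, distances are computed in plain coordinate units, and a grid point is marked missing only if no pass finds enough stations. The function and argument names are hypothetical.

```python
import numpy as np


def cressman(sx, sy, sv, gx, gy, factors=(4.0, 2.5, 1.5), min_stations=1):
    """Successive-correction sketch; returns a (len(gy), len(gx)) grid, NaN = missing."""
    sx, sy, sv = (np.asarray(a, dtype=float) for a in (sx, sy, sv))
    gx, gy = np.asarray(gx, dtype=float), np.asarray(gy, dtype=float)

    # Assumed reading of "average minimum station distance":
    # the mean distance from each station to its nearest neighbor.
    d = np.hypot(sx[:, None] - sx[None, :], sy[:, None] - sy[None, :])
    np.fill_diagonal(d, np.inf)
    base = d.min(axis=1).mean()

    X, Y = np.meshgrid(gx, gy)                              # shape (ny, nx)
    dist = np.hypot(X[..., None] - sx, Y[..., None] - sy)   # shape (ny, nx, nstations)

    # Nearest grid point to each station (assumed background lookup).
    ix = np.abs(gx[:, None] - sx).argmin(axis=0)
    iy = np.abs(gy[:, None] - sy).argmin(axis=0)

    analysis = np.zeros(X.shape)            # assumed first guess of zero
    valid = np.zeros(X.shape, dtype=bool)
    for f in factors:                       # one pass per radius, largest first
        R = f * base
        inside = dist < R
        w = np.where(inside, (R**2 - dist**2) / (R**2 + dist**2), 0.0)
        error = sv - analysis[iy, ix]       # station value minus background value
        wsum = w.sum(axis=-1)
        ok = (inside.sum(axis=-1) >= min_stations) & (wsum > 0)
        correction = np.zeros(X.shape)
        correction[ok] = (w * error).sum(axis=-1)[ok] / wsum[ok]
        analysis = analysis + correction    # previous pass plus weighted correction
        valid |= ok
    return np.where(valid, analysis, np.nan)


# Made-up example: four stations reporting the same anomaly.
grid = cressman([0, 2, 0, 2], [0, 0, 2, 2], [5.0, 5.0, 5.0, 5.0],
                gx=np.arange(0.0, 3.0), gy=np.arange(0.0, 3.0))
```

The sketch needs at least two stations at distinct locations so that the base distance is finite and nonzero. Raising `min_stations` above the number of stations available near a grid point turns that grid point into a missing value.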
Cressman schemes may be used with data assimilation as well. Data assimilation is the analytical process of incorporating an estimation of the state of the atmosphere into a numerical model.
By the early 1960s, weather centers within the United States began using data assimilation methods to improve forecasting techniques. They used interpolation techniques, such as the Cressman analysis, to interpolate
current atmospheric conditions onto an evenly spaced grid.
The Cressman analysis assigns weighted values of the observed stations to the model initialization, similar to the interpolation technique described above.
However, Cressman suggests that persistence (climatology) values can be assigned if there is an insufficient number of stations in the area.
Advantages
- Simple and computationally fast (speed depends upon the number of scans).
- Generally more accurate than other simple methods such as linear interpolation.
Disadvantages
- Can be unstable if grid density is higher than station density (i.e., more grid points than station data points).
- Sensitive to observational errors (random observation errors can generate unphysical features in analysis).
- Analysis may produce unrealistic extrema in the grid values, especially near the edges of the spatial domain.
- Does not account for the distribution of observations relative to each other.
- Consistency of the result with observations varies with observation (station) density.
- Optimum radii of influence have to be determined by trial and error.
Example: Perform a Cressman analysis of monthly surface temperature anomaly data over Australia for December 2000.
Locate Dataset and Variable |
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Atmosphere" link.
- Select the NOAA NCEP CPC CAMS dataset.
- Click on the "station" link under the Datasets and Variables subheading.
- Click on the "temperature" link, again under the Datasets and Variables subheading.
|
Select Stations In Australia and Surrounding Islands |
- Click on the "searches" link to the right of the map.
- In the lat text boxes under the Searches subheading, enter the values -40 and -10.
- In the lon text boxes under the Searches subheading, enter the values 112 and 155.
- Click the Search NOAA NCEP CPC CAMS station temperature button.
- Click on the link immediately below the Searches subheading that says "Dataset (and map) with all data found in search."
You have selected all of the stations within 40° S to 10° S and 112° E to 155° E. This region encompasses Australia and a few surrounding islands. For more information on finding station IDs, see the Data Library tutorial: How to Find A Station ID
|
Select Stations With 30-Year Climatology |
|
Compute Anomalies |
|
Select Temporal Domain |
- In the Expert Mode text box, enter the following line below the text already there:
T (Dec 2000) VALUE
- Press the OK button.
The command selects the month of December in the year 2000.
*NOTE: Do not select ranges via the Data Selection link in the function bar. If you do, some commands entered earlier will be erased.
|
Perform Cressman Analysis |
|
View Results |
- To see the results of this operation, choose the viewer window with coasts drawn.
Australia Surface Temperature Anomaly for December 2000 at 1°x 1° Resolution
The default color scheme is neither ideal nor intuitive for this field.
Changing the color scheme may greatly improve the visual representation of the data.
|
Generate Colormap |
|
View Results with New Color Scheme |
- To see the results of this operation, choose the viewer window with coasts drawn.
Australia Surface Temperature Anomaly for December 2000 at 1°x 1° Resolution
Northern Australia experienced below average temperatures while Southern Australia generally experienced above average temperatures for December 2000.
The more extreme anomalies appear to line up along a constant meridian near 138° E.
|
Weaver Analysis
The Weaver analysis is another type of interpolation scheme, developed at the Climate Prediction Center in the 1970s.
In the Data Library, a simplified weaver function can be used to perform unweighted interpolation, examples of which are shown below.
Weaver analysis is different from the Cressman analysis in many ways.
First, the Weaver analysis does not use radii of influence. When the
weaver function is used to interpolate to an equally spaced
longitude/latitude grid, only the observations located within each
grid box are used to calculate the interpolated value for that grid
box. Any observations located outside the grid box, regardless of
their proximity to the boundary of the grid box, will not affect the
interpolated value of the grid box. Second, the weaver
function does not weight the values of the observations. The value of
each grid box is found by computing a simple arithmetic average of the
observations.
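The box-averaging behavior described above can be sketched in Python with `numpy.histogram2d`. This is an illustration of unweighted averaging within grid boxes, not the Ingrid weaver function itself; the function name `box_average` and the sample stations are made up.

```python
import numpy as np


def box_average(lon, lat, values, lon_edges, lat_edges):
    """Average the observations inside each grid box; NaN where a box is empty."""
    lon, lat, values = (np.asarray(a, dtype=float) for a in (lon, lat, values))
    bins = [np.asarray(lat_edges, dtype=float), np.asarray(lon_edges, dtype=float)]
    # Sum of the observed values and number of observations in each box.
    sums, _, _ = np.histogram2d(lat, lon, bins=bins, weights=values)
    counts, _, _ = np.histogram2d(lat, lon, bins=bins)
    out = np.full(counts.shape, np.nan)        # rows = latitude, columns = longitude
    filled = counts > 0
    out[filled] = sums[filled] / counts[filled]   # simple arithmetic average
    return out


# Three made-up stations on a 2 x 2 grid of 1-degree boxes.
station_lon = [0.2, 0.8, 1.5]
station_lat = [0.2, 0.6, 1.5]
station_val = [1.0, 3.0, 10.0]
boxes = box_average(station_lon, station_lat, station_val,
                    lon_edges=[0.0, 1.0, 2.0], lat_edges=[0.0, 1.0, 2.0])
```

The two stations in the first box are averaged, the third station fills its own box, and the two boxes without stations stay missing, no matter how close a station lies to their boundaries.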
Advantages
- Simple and fast (speed depends upon the resolution of the grid).
- Generally more accurate than other simpler methods such as linear interpolation.
- Avoids unrealistic, extreme values at the edges of the domain.
Disadvantages
- Spatial resolution is increased at the expense of missing data, and vice versa.
- As the resolution decreases (i.e., number of stations per grid box increases), extreme observations generally have less impact on the interpolated values.
- As the resolution increases (i.e., number of stations per grid box decreases), more grid boxes will not contain at least one station, and will therefore be identified as missing.
- Ignores the influence of stations outside each grid box.
- Potential errors are enhanced at very low spatial resolutions and when using datasets with low station density.
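The trade-off between resolution and missing data can be illustrated with a short Python sketch that counts empty grid boxes at two resolutions. The station positions here are random, made-up points, not CAMS stations, and the function name `missing_fraction` and the default domain are assumptions chosen for illustration.

```python
import numpy as np


def missing_fraction(lon, lat, step, lon_range=(100.0, 180.0), lat_range=(-60.0, 0.0)):
    """Fraction of grid boxes of width `step` degrees that contain no station."""
    lon_edges = np.arange(lon_range[0], lon_range[1] + step, step)
    lat_edges = np.arange(lat_range[0], lat_range[1] + step, step)
    counts, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges])
    return float((counts == 0).mean())


rng = np.random.default_rng(0)               # fixed seed for repeatability
lon = rng.uniform(100.0, 180.0, size=150)    # 150 hypothetical stations
lat = rng.uniform(-60.0, 0.0, size=150)

frac_1deg = missing_fraction(lon, lat, 1.0)  # fine grid: most boxes are empty
frac_5deg = missing_fraction(lon, lat, 5.0)  # coarse grid: fewer empty boxes
```

Because each 5° box here is made up of 25 whole 1° boxes, the fraction of empty boxes on the coarse grid can never exceed the fraction on the fine grid.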
Example: Perform a Weaver analysis of monthly surface temperature anomaly data over Australia for December 2000.
Locate Dataset and Variable |
*NOTE: This example uses the same dataset and variable as the previous example.
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Atmosphere" link.
- Select the NOAA NCEP CPC CAMS dataset.
- Click on the "station" link under the Datasets and Variables subheading.
- Click on the "temperature" link, again under the Datasets and Variables subheading.
|
Select Stations In Australia and Surrounding Islands |
- Click on the "searches" link to the right of the map.
- In the lat text boxes under the Searches subheading, enter the values -40 and -10.
- In the lon text boxes under the Searches subheading, enter the values 112 and 155.
- Click the Search NOAA NCEP CPC CAMS station temperature button.
- Click on the link immediately below the Searches subheading that says "Dataset (and map) with all data found in search."
You have selected all of the stations within 40° S to 10° S and 112° E to 155° E. This region encompasses Australia and a few surrounding islands. For general information on finding station IDs, see the tutorial: How to Find A Station ID
|
Select Stations With 30-Year Climatology |
|
Compute Anomalies |
|
Select Temporal Domain |
- In the Expert Mode text box, enter the following line below the text already there:
T (Dec 2000) VALUE
- Press the OK button.
The command selects the month of December in the year 2000.
*NOTE: Do not select ranges via the Data Selection link in the function bar. If you do, some commands entered earlier will be erased.
|
Perform Weaver Analysis |
|
Generate Colormap |
|
View Results |
- To see the results of this operation, choose the viewer window with coasts drawn.
Australia Surface Temperature Anomaly for December 2000 at 1° x 1° Resolution
The effect of choosing such a high resolution (1° x 1°) is that most of the grid boxes contain missing values.
Yet, it is still possible to make out the below average temperatures in Northern Australia and the above average temperatures in Southern Australia.
However, the primary purpose of interpolation is to estimate values where there are no observations. We should therefore try using a lower resolution.
|
Perform Weaver Analysis at a 5° Resolution |
- Click on the right-most link in the blue source bar to exit the viewer.
- In the Expert Mode text box, change the following text:
- Change the number 1 in the X 100 180 1 RANGEEDGESTEP command to the number 5.
- Change the number 1 in the Y -60 0 1 RANGEEDGESTEP command to the number 5.
These changes will adjust the resolution of the spatial grid from 1° x 1° to 5° x 5°. As a result, more grid boxes
will contain observations. Missing values should occupy less area in the spatial grid.
|
View Results |
- To see the results of this operation, choose the viewer window with coasts drawn.
Australia Surface Temperature Anomaly for December 2000 at 5° x 5° Resolution
At a lower resolution, there are fewer grid boxes with missing data. However, the accuracy of the
interpolated values generally declines as the resolution decreases.
Observations are averaged over larger distances, which smooths out local variability within each grid box.
|
Example: Perform a Weaver interpolation of the Normalized Difference Vegetation Index (NDVI) in South America to an even longitude / latitude grid at a 0.2° resolution.
Locate Dataset and Variable |
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Topographic and Land Characteristics" link.
- Select the NASA GES-DAAC PAL vegetation dataset.
- Select the "south america" link under the Datasets and Variables subheading.
|
Select Domains |
- No ranges will be adjusted in this example.
The dataset will be analyzed over its entire temporal and spatial grids.
|
Perform Weaver Interpolation |
|
View Results |
- To see the results of this operation, choose the viewer with coasts drawn.
- You may adjust the time period represented by the image by editing the range in the text box above the viewer.
South America Regridded NDVI Data for July 13-20, 1981 at a 0.2° x 0.2° Resolution
The most common measurement of plant growth density is the Normalized Difference Vegetation Index (NDVI).
The NDVI, like most other vegetative indices, is calculated as a ratio between measured reflectivity in the red and near infrared sections of the electromagnetic spectrum.
These spectral bands are chosen because they are most affected by the absorption of chlorophyll in leafy green vegetation. Very low values of NDVI (0.1 and below) correspond to barren
areas of rock, sand, or snow. Moderate values represent shrub and grassland (0.2 to 0.3), while high values indicate temperate and tropical rainforests (0.6 to 0.8). The weaver function regrids the NDVI data to an equal latitude /
longitude grid. It appears from the image that in July of 1981, dense vegetation would be found within the Amazon River Basin, while less dense areas would be found to the south and east. If you change the time period above the image to a more recent date,
you will observe that a similar vegetative pattern in South America still exists today.
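For reference, the ratio mentioned above is commonly written as NDVI = (NIR - Red) / (NIR + Red), where Red and NIR are the reflectivities measured in the red and near-infrared bands. The Python sketch below applies this standard formula to a few made-up reflectance pairs. The function name `ndvi` and the sample values are illustrative only and are not taken from the NASA GES-DAAC PAL dataset.

```python
import numpy as np


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where both reflectances are zero."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)   # skip 0/0 cells
    return out


# Made-up reflectances for three hypothetical surfaces.
red = [0.08, 0.10, 0.05]
nir = [0.09, 0.17, 0.45]
index = ndvi(red, nir)
```

With these sample values, the first surface falls in the barren range (0.1 and below), the second in the shrub and grassland range (0.2 to 0.3), and the third at the upper end of the rainforest range (0.6 to 0.8).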
|