What
There are two types of analysis available. The first covers a 24-hour period ending at 6 AM in winter or 7 AM in summer on the morning of the day indicated. The second covers a range of days and is referred to as the "climate" surface; climate surfaces are discussed separately.
For periods when rain occurred, two surfaces and a drop list are provided. The "Core Area" map shows the central portion of the CHARM network, with precipitation scaled in inches. The "Extended Area" map shows the entire region covered by the current network and is scaled in mm. Each map gives the number of stations used to make the surface, some simple statistics about the data, the number of stations that reported and were accepted for analysis, how many of the accepted stations were dropped from the analysis, and how many of the accepted stations reported a nonzero amount. The gages used are mapped as blue triangles, and the stations that were not accepted after analysis of the reported value are shown in red. All stations that reported anything at all but were dropped or not accepted are listed in the "Drop List". The title of each map gives the time when the map was made. A time of 23:60 means the surface was generated after the midnight following the close of the 24-hour period.
If sufficient data are not available to make a surface, a "narrative" file is provided instead. There are two possible reasons a surface is not made, and the narrative states which one applies.
When
When the software runs in an automated loop, the CHARM database is queried at 7 AM for reports from the previous reporting period. If possible, surfaces are made and sent to the server. This is repeated on a 5-minute cycle until late in the morning, when the cycle changes to 15 minutes. At noon the cycle stops, and a final analysis is done at 3 PM.
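A minimal sketch of the query schedule described above, in Python. The source does not say exactly when "late in the morning" the 5-minute cycle becomes a 15-minute cycle, so `switch` is a hypothetical parameter defaulting to 10:00; the function name `run_times` is also ours.

```python
from datetime import datetime, time, timedelta


def run_times(day, switch=time(10, 0)):
    """Query times for one day of the automated loop (sketch).

    switch is a hypothetical stand-in for "late in the morning", when the
    5-minute cycle becomes a 15-minute cycle.
    """
    t = datetime.combine(day, time(7, 0))          # first query at 7 AM
    change = datetime.combine(day, switch)
    stop = datetime.combine(day, time(12, 0))      # loop stops at noon
    times = []
    while t < stop:
        times.append(t)
        t += timedelta(minutes=5 if t < change else 15)
    times.append(datetime.combine(day, time(15, 0)))  # final analysis at 3 PM
    return times
```

With the default switch time this yields 36 five-minute runs, 8 fifteen-minute runs ending at 11:45, and the 3 PM final run.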
A map is made each time more than 10 accepted stations have reported and either at least 5 stations report more than zero or the analysis software has detected a probable localized extreme event. If no map is generated, a "Narrative" text explains why.
How
In technical terms, the CHARM Rain Network Surface Analysis software recursively fits a first-degree B-spline to the CHARM region using successively smaller knot spacings. To do this, the algorithm must handle several difficult aspects of the CHARM data.
First, the database values receive very little checking before the analysis software gets them. Therefore, each station's report goes through a series of initial checks before it is accepted. For example, dates are checked to eliminate impossible dates such as Feb. 31, and the observation times must span 24 hours ±30 minutes. Each report that fails one of these checks is removed from the analysis and listed in the "Drop List" with a "#" at the start of the line. (The reason for the "#" symbol has to do with how the maps are made in gnuplot.) The most common reason a report is not accepted for further analysis is that it covers more than one day.
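A minimal sketch of the initial per-report checks, assuming a hypothetical record that gives the start and end of the observation period as (year, month, day, hour, minute) tuples. Python's `datetime` constructor raises `ValueError` for impossible dates such as Feb. 31, which covers the date check, and a multi-day report fails the 24 hours ±30 minutes span test. The drop-list wording is illustrative, not the actual file format.

```python
from datetime import datetime, timedelta

DAY = timedelta(hours=24)
TOLERANCE = timedelta(minutes=30)


def check_report(station, start, end):
    """Return None if a report passes the initial checks, else a drop-list line.

    start and end are hypothetical (year, month, day, hour, minute) tuples for
    the beginning and end of the observation period.
    """
    try:
        t0, t1 = datetime(*start), datetime(*end)
    except ValueError:                 # impossible date or time, e.g. Feb. 31
        return f"# {station} invalid date"
    span = t1 - t0
    if abs(span - DAY) > TOLERANCE:    # must span 24 hours +/- 30 minutes
        return f"# {station} period of {span / DAY:.2f} days"
    return None
```

For example, a report running from 7 AM June 1 to 7 AM June 3 is rejected with "period of 2.00 days", the multi-day case the text describes as most common.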
Gages that pass these individual tests are "accepted" for the next stage of checking, which is a three-step process. In the first step, a "neighborhood" is defined for each gage. It consists of the 5 nearest accepted stations that are more than 100 meters away and do not coincide with a station already in the neighborhood. Note that the 5 nearest stations depend on which stations have reported. If a gage reports more than 3 times the precipitation reported by every one of the 5 other gages in its neighborhood, the analysis software assumes the report is probably an erroneous high value, and the report is flagged. Similarly, if a gage reports less than one third of the amount reported by every other gage in the neighborhood, an erroneous low value is assumed and the report is flagged. In the second step, a new set of neighborhoods is computed without the gages that are probably wrong. Again, each gage is checked against its neighborhood and probable errors are noted. In the third step, the original neighborhood of each gage flagged in the first step is checked to see whether any other station in it was also flagged as a probable error in the same sense (high versus low). We think it is more likely that 2 gages alone will catch a local, high-magnitude event than that two independent observers, separated by at least 100 meters but in the same neighborhood, will make wild errors in the same sense. Therefore, if such a condition is found, all gages of the same sense in that neighborhood are assumed to be correct. Otherwise, all the flagged stations are added to the "Drop List", but without a leading "#". On the maps these stations are shown as red triangles.
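The following Python sketch illustrates the three-step neighborhood screen. Several details are our assumptions rather than documented behavior: a candidate within 100 m of an already chosen neighbor is treated as "coinciding" and skipped; a gage counts as suspect if either the first or the second step flags it; and rescued high reports are taken to signal the local extreme event used later in the surface decision. The function names are hypothetical.

```python
import math


def neighborhood(i, pts, pool, k=5, min_dist=100.0):
    """The k nearest stations in `pool` to station i (excluding i itself).

    Candidates within min_dist of station i are skipped, as are candidates
    within min_dist of a station already chosen (our reading of "coincide").
    """
    cands = sorted((math.dist(pts[i], pts[j]), j) for j in pool if j != i)
    chosen = []
    for d, j in cands:
        if d <= min_dist:
            continue
        if any(math.dist(pts[j], pts[m]) <= min_dist for m in chosen):
            continue
        chosen.append(j)
        if len(chosen) == k:
            break
    return chosen


def sense(i, nbrs, z, ratio=3.0):
    """+1 if z[i] > ratio * every neighbor, -1 if z[i] < every neighbor / ratio, else 0."""
    others = [z[j] for j in nbrs]
    if not others:
        return 0
    if all(z[i] > ratio * v for v in others):
        return 1
    if all(z[i] < v / ratio for v in others):
        return -1
    return 0


def screen(pts, z, accepted, ratio=3.0):
    """Three-step neighborhood check; returns (dropped set, extreme-event flag)."""
    # Step 1: neighborhoods built from every accepted station.
    nb1 = {i: neighborhood(i, pts, accepted) for i in accepted}
    s1 = {i: sense(i, nb1[i], z, ratio) for i in accepted}
    # Step 2: rebuild neighborhoods without the step-1 suspects and re-check.
    clean = [i for i in accepted if s1[i] == 0]
    nb2 = {i: neighborhood(i, pts, clean) for i in accepted}
    s2 = {i: sense(i, nb2[i], z, ratio) for i in accepted}
    # Assumption: a gage is suspect if either step flags it.
    flag = {i: s1[i] or s2[i] for i in accepted}
    # Step 3: a step-1 suspect whose original neighborhood holds another
    # suspect of the same sense is treated as a real local event.
    rescued = set()
    for i in accepted:
        if s1[i]:
            same = [j for j in nb1[i] if flag[j] == flag[i]]
            if same:
                rescued.update([i, *same])
    dropped = {i for i in accepted if flag[i] and i not in rescued}
    extreme = any(flag[i] > 0 for i in rescued)
    return dropped, extreme
```

Because a gage is flagged only when it exceeds (or falls below) every neighbor by the ratio, two nearby high gages in the same first-step neighborhood do not flag each other. In this sketch the third-step rescue matters when the second high gage is exposed only after the first-step suspects are removed in step 2.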
After this, the number of remaining reports is checked. Building a surface requires at least 10 valid reports, of which at least 5 must be greater than zero. Alternatively, if there are at least 10 valid reports and a local, high-magnitude event has been detected, a surface is still made.
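The decision rule can be written as a small function. This sketch uses the thresholds stated in the paragraph above (at least 10 valid reports, with at least 5 of them nonzero unless an extreme event was detected); the function name and the narrative strings are placeholders, not the text the software actually writes.

```python
def surface_decision(n_valid, n_nonzero, extreme_event,
                     min_valid=10, min_nonzero=5):
    """Return None if a surface should be made, else a short narrative reason."""
    if n_valid < min_valid:
        return f"only {n_valid} valid reports; at least {min_valid} are needed"
    if n_nonzero < min_nonzero and not extreme_event:
        return (f"only {n_nonzero} reports greater than zero and no local "
                f"extreme event detected")
    return None
```

The two non-None branches correspond to the two possible reasons a surface is not made.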
Surfacing the data is difficult for several reasons. The distance from a station to its nearest neighbor typically varies by more than 2 orders of magnitude and can reach 3 orders of magnitude. The distribution of reporting stations varies with time. Not all of the gages are identical, and although most are quite similar, there is no standardization of their site environments. Thus, even after the preceding checks, considerable local variation remains that is unlikely to reflect actual variations in precipitation. The spatial wavelength of significant precipitation variation can range from many kilometers down to meters. In some cases one or more storm cells may traverse the CHARM area, introducing spatial wavelengths that are wildly dissimilar in the two directions.
The surfacing of the CHARM data must be extremely robust to handle all of these characteristics. The algorithm we have chosen is based on a recursive B-spline of degree 1 in both axes. Once there are enough reports to proceed, the software fits a plane through all the points. This plane is used to estimate a grid of "model" values. The CHARM area is then recursively subdivided into smaller tiles, and at each iteration a fit is made to the actual data together with the model values produced by the previous iteration. The fit uses a B-spline of degree 1 in X and 1 in Y. The subdivision is done so that a single actual data value in a tile has the same significance as all of the model values in that tile combined. If a tile contains more than 1 observation, the model values have relatively little significance in that area.
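A compact NumPy sketch of the recursive fit follows. It is an illustration under stated assumptions, not the CHARM code: the area is taken to be a square of side `width`, the model grid spacing (`grid_step`, 500 m) is our choice, tiles are halved at each level, and each level is solved as an ordinary weighted least-squares problem. Each observation gets weight 1 and each model value gets weight 1/(number of model values in its tile), so the model values in a tile together count as one observation.

```python
import numpy as np


def hat(t):
    """Degree-1 B-spline basis with unit knot spacing, centred at 0."""
    return np.clip(1.0 - np.abs(t), 0.0, None)


def basis(x, y, x0, y0, h, n):
    """Tensor-product degree-1 B-spline basis on an (n+1) x (n+1) knot grid."""
    k = np.arange(n + 1)
    bx = hat((x - x0)[:, None] / h - k[None, :])   # shape (m, n+1)
    by = hat((y - y0)[:, None] / h - k[None, :])   # shape (m, n+1)
    return (bx[:, :, None] * by[:, None, :]).reshape(len(x), -1)


def recursive_surface(xo, yo, zo, x0, y0, width, final_tile=2500.0, grid_step=500.0):
    """Recursive degree-1 B-spline fit over the square [x0, x0+width] x [y0, y0+width]."""
    xo, yo, zo = (np.asarray(v, dtype=float) for v in (xo, yo, zo))
    g = np.arange(0.0, width + grid_step / 2, grid_step)
    gx, gy = (v.ravel() for v in np.meshgrid(x0 + g, y0 + g, indexing="ij"))
    # Start: least-squares plane z = c0 + cx*x + cy*y through all observations.
    A = np.column_stack([np.ones_like(xo), xo, yo])
    (c0, cx, cy), *_ = np.linalg.lstsq(A, zo, rcond=None)
    model = c0 + cx * gx + cy * gy
    n = 1
    while width / n > final_tile:
        n *= 2                                     # halve tile and knot spacing
        h = width / n
        # Tile of each model point; points on the far edges join the last tile.
        ti = np.minimum(((gx - x0) // h).astype(int), n - 1)
        tj = np.minimum(((gy - y0) // h).astype(int), n - 1)
        tile = ti * n + tj
        counts = np.bincount(tile, minlength=n * n)
        # Observations weigh 1; a tile's model values together weigh 1.
        w = np.concatenate([np.ones_like(zo), 1.0 / counts[tile]])
        xs, ys = np.concatenate([xo, gx]), np.concatenate([yo, gy])
        zs = np.concatenate([zo, model])
        sw = np.sqrt(w)
        B = basis(xs, ys, x0, y0, h, n)
        coef, *_ = np.linalg.lstsq(B * sw[:, None], zs * sw, rcond=None)
        model = basis(gx, gy, x0, y0, h, n) @ coef
    return gx, gy, model
```

Scaling each row by the square root of its weight turns the weighted problem into a plain `numpy.linalg.lstsq` call. Because degree-1 B-splines reproduce planes exactly, observations lying on a plane come back unchanged, which makes a useful sanity check.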
The subdivision continues until the tiles are 2.5 km on a side. This size was chosen based on the current spatial distribution of the CHARM stations. The mean and median distances to the nearest station are 2930 and 1870 meters, respectively. The second-nearest station has a mean distance of 3270 meters and a median of 1990 meters. All other things being equal, the average of multiple measurements is likely to be superior to any individual measurement. Therefore, the modeling process should be halted when most tiles that contain any observed data have approximately 2 reports.
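The spacing statistics quoted above, and the "about 2 reports per occupied tile" criterion, can be checked with a short NumPy sketch. The station coordinates are assumed to be planar (for example, UTM eastings and northings in meters), and the function names are hypothetical. The all-pairs distance matrix is fine for a network of this size but grows quadratically with the number of stations.

```python
import numpy as np


def spacing_stats(x, y):
    """(mean, median) distance to the nearest and to the second-nearest station."""
    pts = np.column_stack([x, y]).astype(float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # ignore each station's distance to itself
    d.sort(axis=1)                     # row i: distances from station i, ascending
    first, second = d[:, 0], d[:, 1]
    return (first.mean(), np.median(first)), (second.mean(), np.median(second))


def reports_per_occupied_tile(x, y, x0, y0, tile=2500.0):
    """Number of reporting stations in each tile that holds at least one."""
    ti = np.floor((np.asarray(x, dtype=float) - x0) / tile).astype(int)
    tj = np.floor((np.asarray(y, dtype=float) - y0) / tile).astype(int)
    _, counts = np.unique(np.stack([ti, tj], axis=1), axis=0, return_counts=True)
    return counts
```

Comparing the median of `reports_per_occupied_tile` against 2 for a candidate tile size is one simple way to apply the stopping rule to a given day's set of reports.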
Ongoing Analysis
In the neighborhood analysis, the software currently uses a fixed ratio of 3 to decide whether a value is likely to be an error. We would like to eliminate this fixed value and let a heuristic algorithm determine it. As part of this effort we currently report the coefficient of variation (standard deviation divided by the mean) for the neighborhood of each dropped gage, as well as the average and maximum coefficients of variation over all of the final neighborhoods.
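A sketch of the coefficient-of-variation summary, using Python's `statistics` module. The source does not state whether the population or sample standard deviation is used, or whether a gage is included in its own neighborhood; this version uses the population form (`pstdev`) on whatever values are passed in and skips all-zero neighborhoods, for which the ratio is undefined. The function names are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (NaN if the mean is 0)."""
    mean = statistics.fmean(values)
    if mean == 0:
        return math.nan
    return statistics.pstdev(values) / mean


def cv_summary(neighborhoods, z):
    """Average and maximum CV over neighborhoods given as lists of station indices."""
    cvs = [coefficient_of_variation([z[j] for j in nb]) for nb in neighborhoods]
    cvs = [c for c in cvs if not math.isnan(c)]    # skip all-zero neighborhoods
    return statistics.fmean(cvs), max(cvs)
```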
Other Products Currently Available
There are several other products currently being made by the analysis software. These are available to qualified researchers upon request.
The distance between all pairs of gages, based on the UTM projection, NAD83.
For each date a table of the stations used to make the surface, their coordinates and precipitation reported.
For each date, the observed minimum and maximum values in X, Y, and Z, the average and standard deviation for the entire data set, and the average and maximum coefficients of variation for the neighborhoods.
The grid of surface values computed for each date.
There are also various products that are coded but not produced by default. For example, the stations that make up each neighborhood can be listed, along with the distance and azimuth to each. Residuals of the surface versus the observations can also be produced.
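For the optional neighborhood listing, distance and azimuth can be computed directly from UTM coordinates. This sketch returns grid azimuth (clockwise from grid north), which differs slightly from true azimuth by the grid convergence angle. The residual sign convention (observed minus surface) is our assumption, as is the availability of surface values evaluated at the station locations; the function names are hypothetical.

```python
import math


def distance_azimuth(p, q):
    """Distance (m) and grid azimuth (degrees clockwise from grid north) from p to q.

    p and q are (easting, northing) UTM coordinates in meters.
    """
    de, dn = q[0] - p[0], q[1] - p[1]
    dist = math.hypot(de, dn)
    az = math.degrees(math.atan2(de, dn)) % 360.0   # atan2(east, north) -> from north
    return dist, az


def residuals(observed, surface_at_stations):
    """Observation minus surface value at each station."""
    return [o - s for o, s in zip(observed, surface_at_stations)]
```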
Remarks
The methodology used to make the surfaces is objective and automatic; reported values are not even validated manually. Implicit and explicit assumptions are few. The implicit assumptions are:
the use of a B-spline of degree 1 in X and Y,
a final tile size of 2.5 km, and
the use of ratios of 3 and 1/3 to detect probably erroneous values within a neighborhood.
The sole explicit assumption is that all of the model points used in a tile together have a significance equal to a single observation. Both the implicit and explicit assumptions are in fact driven by observed characteristics of the data. The observations retain significant variation that is due either to very high-frequency spatial variations in the precipitation or to extraneous variables. Lacking any conceptual framework for distinguishing between these possibilities, we judged averaging over multiple gages to be the best way to remove this variation. The value of 3 was chosen by trial and error: values of 2 or 2.5 removed too many reports believed to be correct. To the extent that the code makes no provision for varying these attributes of the algorithm according to the input data, the process should not be considered perfectly objective.
Page Curator: Paul J. Meyer (paul.meyer@msfc.nasa.gov)
Responsible Official: Dr. James E. Arnold (jim.arnold@msfc.nasa.gov)