**7. Quality control of the NCDC dataset to create a serially complete dataset.**

Development of continuous and high-quality climate datasets is essential to populate Web-distributed databases (17) and to serve as input to Decision Support Systems (e.g., 27).

Serially complete data are necessary as input to many risk assessments related to human endeavor, including frequency analyses of heavy rain, severe heat, severe cold, and drought. Continuous data are also needed to understand climate impacts on crop yield and ecosystem production.

The National Drought Mitigation Center (NDMC) and the High Plains Regional Climate Center (HPRCC) at the University of Nebraska are developing a new drought atlas. The previous drought atlas (1994) was produced with data from 1119 stations with records ending in 1992. The forthcoming drought atlas will include additional stations and will update the analyses, maps, and figures to cover the period from 1994 to the present.

A list of stations with at least 40 years of observations for all three variables was compiled from the Applied Climate Information System (ACIS). The three variables are precipitation (PRCP), maximum temperature (Tmax), and minimum temperature (Tmin). Paper records were scrutinized to find data that had been reported but never digitized, reducing the number of missing values as far as possible. In total, 2144 stations met the criterion of at least 40 years of data with no continuous gap of two months or more for at least one of the three variables. The remaining missing values were filled with estimates derived from measurements at nearby stations. The spatial regression test (SRT) and the inverse distance weighted (IDW) method were adopted in a dynamic data-filling procedure to provide these estimates. Missing values are replaced through a reproducible process that uses robust estimation procedures. The result is a serially complete dataset (SCD) for 2144 stations that provides a firm basis for climate analysis.

Some scientists have used more qualitative or less sophisticated quantitative QC techniques. They may wish to use this dataset so that their results can be compared directly with other studies that used this SCD. Such comparisons would not be affected by differences in missing-data procedures. A drought atlas based on data from the SCD will give decision makers better support for their risk-management needs.
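The following is a minimal sketch of the station-selection criterion described above, under stated assumptions. Each variable's daily record is assumed to be a list covering the full period of record, with `None` for missing days. "Two months" is interpreted as 60 consecutive days, and 40 years is measured as the list length divided by 365.25 days. All function and constant names are hypothetical and are not taken from the authors' software.

```python
from typing import Mapping, Optional, Sequence

MIN_YEARS = 40        # minimum record length stated in the text
MAX_GAP_DAYS = 60     # assumption: "two months" interpreted as 60 consecutive days
DAYS_PER_YEAR = 365.25

def longest_missing_run(values: Sequence[Optional[float]]) -> int:
    """Length of the longest run of consecutive missing (None) daily values."""
    longest = current = 0
    for v in values:
        if v is None:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

def series_meets_criterion(values: Sequence[Optional[float]]) -> bool:
    """True if the daily series spans >= MIN_YEARS and no gap reaches MAX_GAP_DAYS."""
    years_covered = len(values) / DAYS_PER_YEAR
    return years_covered >= MIN_YEARS and longest_missing_run(values) < MAX_GAP_DAYS

def station_qualifies(series_by_variable: Mapping[str, Sequence[Optional[float]]]) -> bool:
    """A station qualifies if at least one variable (e.g. PRCP, Tmax, Tmin) meets the criterion."""
    return any(series_meets_criterion(s) for s in series_by_variable.values())

# Usage: a 41-year Tmax record with a 30-day gap passes; a PRCP record with a 70-day gap fails.
tmax = [70.0] * (41 * 365)
tmax[1000:1030] = [None] * 30
prcp = [0.0] * (41 * 365)
prcp[5000:5070] = [None] * 70
print(series_meets_criterion(tmax), series_meets_criterion(prcp))  # True False
print(station_qualifies({"TMAX": tmax, "PRCP": prcp}))             # True
```

Because the criterion only needs to hold for one of the three variables, `station_qualifies` uses `any`. Requiring all three would be a one-word change to `all`.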

**8. Issues relating QC to gridded datasets**

Gridded datasets are sometimes used in QC, but we caution against this for the following reasons. New datasets created with inverse distance weighted methods or kriging carry uncertainties. The values at a grid point are usually not "true" measurements but values interpolated from measurements at nearby stations in the weather network. Thus, the values at the grid points are susceptible to bias. When a further interpolation is made to a given location within the grid, bias will again arise at that location, between the gridded values. Fig. 10 provides an example of potential bias. Without a gridded dataset, an estimate at the target location would give a large weight to the value at station 5. However, if the search radius used for the gridded data is as shown in Fig. 10, then station 5 will not be included in the grid-based estimation, even though it is the closest station to the target.

**Figure 10.** An example of station distribution used in the grid method.

**9. Quality control of high temporal resolution datasets**

The Oklahoma Mesonet (http://www.mesonet.org/) measures and archives weather conditions at 5-minute intervals (Shafer et al., 2000). For these high temporal resolution data, the network's quality control system starts from the raw measurements. A set of QC tools was developed to maintain the Mesonet data routinely. These tools depend on the status of the hardware and on the measurement flag sets built into the climate data sys‐

Toward a Better Quality Control of Weather Data

http://dx.doi.org/10.5772/51632


First, 2144 stations were identified with a long-term (at least 40 years) and continuous (no data gaps longer than two months) record of Tmax, Tmin, and/or PRCP. The missing values in the original dataset retrieved from ACIS were then filled as far as possible, first with data keyed from paper records and then with estimates from the SRT and IDW methods. Two implementations of SRT were applied in this study. The short-window (60-day) implementation provides the best estimates because it builds the regression from the most recent information available. The second implementation of SRT fills long gaps (e.g., gaps longer than one month) using the data available on a yearly basis. The IDW method was then used to fill any missing data that remained after the two SRT implementations.
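The following is a minimal sketch of the filling order described above, SRT first and IDW as the fallback. It is an illustration under assumptions, not the authors' implementation. For each neighbor, the target's values in the window are regressed on that neighbor's values, and the fitted line is applied to the neighbor's value on the missing day. The per-neighbor estimates are then combined. The weighting, the inverse of each regression's squared RMSE, is an assumption; the published SRT may combine neighbors differently. `MIN_PAIRS`, the IDW exponent `power=2`, and all function names are hypothetical. The caller is assumed to supply the 60 days preceding the missing day as the window. For the long-gap implementation, it could instead supply matching days drawn from other years; this is also an assumption.

```python
import math
from typing import Optional, Sequence

MIN_PAIRS = 10  # assumption: minimum overlapping days needed to fit a regression

def fit_line(x: Sequence[float], y: Sequence[float]) -> Optional[tuple[float, float, float]]:
    """Least-squares fit y = a + b*x; returns (a, b, rmse), or None if x is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return None
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rmse = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / n)
    return a, b, rmse

def srt_estimate(target_window: Sequence[Optional[float]],
                 neighbor_windows: Sequence[Sequence[Optional[float]]],
                 neighbor_today: Sequence[Optional[float]]) -> Optional[float]:
    """Regression-based estimate combined across neighbors (assumed 1/RMSE^2 weights)."""
    num = den = 0.0
    for hist, today in zip(neighbor_windows, neighbor_today):
        if today is None:
            continue
        # Keep only days where both the neighbor and the target were observed.
        pairs = [(n, t) for n, t in zip(hist, target_window) if n is not None and t is not None]
        if len(pairs) < MIN_PAIRS:
            continue
        xs, ys = zip(*pairs)
        fit = fit_line(xs, ys)
        if fit is None:
            continue
        a, b, rmse = fit
        weight = 1.0 / max(rmse, 1e-6) ** 2  # floor avoids division by zero for perfect fits
        num += weight * (a + b * today)
        den += weight
    return num / den if den > 0 else None

def idw_estimate(values: Sequence[Optional[float]], distances_km: Sequence[float],
                 power: float = 2.0) -> Optional[float]:
    """Inverse distance weighted mean of the available neighbor values."""
    pairs = [(v, d) for v, d in zip(values, distances_km) if v is not None]
    if not pairs:
        return None
    for v, d in pairs:
        if d == 0:
            return v  # co-located station: use its value directly
    weights = [1.0 / d ** power for _, d in pairs]
    return sum(w * v for w, (v, _) in zip(weights, pairs)) / sum(weights)

def fill_value(target_window, neighbor_windows, neighbor_today, distances_km) -> Optional[float]:
    """Try SRT first; fall back to IDW when no neighbor supports a regression."""
    est = srt_estimate(target_window, neighbor_windows, neighbor_today)
    return est if est is not None else idw_estimate(neighbor_today, distances_km)
```

The design mirrors the order in the text. IDW is used only when no neighbor has enough overlapping data to support a regression. With IDW, the error of the filled value cannot be stated from regression statistics, which is why the RMSE-based confidence statement below applies to the SRT estimates.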

This is the first serially complete dataset in which a statement of confidence can be attached to many of the estimates, namely the SRT estimates. The RMSE is less than 1°F in most cases. Thus, we are 95% confident that the true value, had it been observed, would lie within ±2°F of the estimate. This dataset is available<sup>1</sup> to interested parties and can be used in crop models and in assessments of severe heat, cold, and dryness. Probabilities of extreme rainfall relevant to flooding and erosion potential can be derived, along with indices that reflect impacts on livestock production. The dataset is offered as an alternative to distributing raw data. It suits users who need this level of spatial and temporal coverage but are not well positioned to spend the time and resources needed to fill gaps with acceptable estimates.
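The ±2°F statement follows if the estimation errors are assumed to be approximately normal, with zero mean and a standard deviation equal to the RMSE. That distributional assumption is not stated explicitly above. Here $x$ is the unobserved value and $\hat{x}$ the SRT estimate:

$$
P\left(|x - \hat{x}| \le 1.96\,\mathrm{RMSE}\right) \approx 0.95,
\qquad
\mathrm{RMSE} < 1\,^{\circ}\mathrm{F} \;\Longrightarrow\; 1.96\,\mathrm{RMSE} < 2\,^{\circ}\mathrm{F}.
$$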

Analyses based on this long-term dataset will best reveal regional and large-scale climatic variability in the continental U.S. This makes the SCD an ideal dataset for developing a new drought atlas and the associated drought index calculations. Future observations can easily be appended to this SCD using the dynamic data-filling procedures described herein.

<sup>1</sup> Contact the High Plains Regional Climate Center at 402-472-6709
