
VERIFICATION SYSTEMS FOR LONG-RANGE FORECASTS

Preamble

CBS-Ext.(98) adopted procedures that defined the Core Standardized Verification System (SVS) for long-range forecasts, as proposed jointly by CAS, CCl and CBS experts. The Core SVS was designed to provide a straightforward assessment system for all predictions on medium-range and longer timescales, although it can also be used at the short range. Objectives of the SVS are covered in detail in Annex 1. The two prime objectives are:

1.    To provide on-going standardized verification statistics on real-time forecasts for exchange between GDPS centres and for annual submission to CBS;

2.    To provide standardized methods of verification that can be attached to any real-time prediction in order that information concerning the inherent skill of the forecast system is passed to the recipient.

Proposed Principles

Verification histories may be produced through a combination of hindcasts and real-time forecasts. However, the forecast method should remain consistent throughout the entire history period, with hindcasts using no information that would not have been available for a real-time forecast produced at that time. If real-time forecasts are used within the verification history then they should not also be included in the verification record of real-time forecasts.

Climatologies should be calculated consistently within the verification history. Data set statistics, such as means and standard deviations, should be calculated across the period of the verification history and should be applied to verification of subsequent real-time forecasts.
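As an illustration of this principle, a minimal sketch in Python follows; the variable names and sample values are purely illustrative and not part of the SVS. The climatological statistics are computed once, over the verification history only, and then reused unchanged when verifying subsequent real-time forecasts.

    # Minimal sketch: climatology fixed over the verification history.
    # Sample values are illustrative only.
    import numpy as np

    hindcast_obs = np.array([21.9, 22.4, 21.1, 23.0, 20.8, 22.2])  # history period

    clim_mean = hindcast_obs.mean()       # computed once over the history
    clim_std = hindcast_obs.std(ddof=1)   # sample standard deviation

    def standardized_anomaly(value):
        # Anomaly of a new (real-time) value relative to the history
        # climatology - not recomputed as real-time data accumulate.
        return (value - clim_mean) / clim_std

    print(standardized_anomaly(23.5))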

Where bias correction, statistical post-processing, or other forms of intervention result in differences in forecast production methodology between the verification history and real-time forecast periods, an attempt may be made to verify the unmodified forecast system in addition to the real-time system, with results presented for both.

Formulation

The SVS is formulated in four parts:

  1. Diagnostics. Two diagnostics are included and are closely defined. Additional diagnostics are suggested but are not incorporated into the Core SVS as yet. Use of the additional diagnostics is optional.
  2. Parameters. Key variables for initial inclusion are proposed. However the list is flexible to ensure that all producers can contribute regardless of the structure of individual forecast systems.
  3. Verification data sets. Key data sets of observations against which forecasts may be verified are proposed. This list is also flexible to ensure that all producers can contribute regardless of the structure of individual forecast systems.
  4. System details. Details of forecast systems employed.

Diagnostics

Two diagnostics are incorporated in the Core SVS - Relative Operating Characteristics and Root Mean Square Skill Scores. Both provide standardized values permitting direct intercomparison of results across different predicted variables, geographical regions, forecast ranges, etc. Both may be applied in verification of most forecasts and it is proposed that, except where inappropriate, both diagnostics are used on all occasions.

  1. Relative Operating Characteristics. Calculation details are discussed in Annex 2. For deterministic forecasts, the full contingency table should be provided, together with values of the Hit Rate and the False Alarm Rate. Other contingency measures, as listed in Annex 4, may be added. For probabilistic forecasts, the standardized area under the curve (such that perfect forecasts give an area of 1 and a curve lying along the diagonal gives 0.5) should be provided: as a map for gridded data, or as a curve for single point/region predictions. Probability values should be labelled on any Relative Operating Characteristics curves.

A number of contingency table-based diagnostics are listed within Annex 4 in addition to Hit and False Alarm Rates, including the Kuiper Score and Percent Correct (both used in assessing deterministic forecasts), and these provide valuable, readily-assimilable information for developers, producers and users of long-range forecasts. They may be considered for inclusion within information supplied to users.

  2. Root Mean Square Skill Scores. Calculation details are discussed in Annex 3; root mean square skill scores are appropriate only for deterministic forecasts. In general two skill scores should be provided: against persisted anomalies and against climatology. Where persistence is not relevant as a predictor (such as for some seasonal rainfall regimes), only skill against climatology should be assessed. Additionally, persistence may not be relevant once the skill of a persistence forecast is exceeded by that of a climate forecast; in this circumstance use of persistence is optional. A further three individual values (two when persistence is not used) should be provided: the RMS error values for the forecast, for persistence and for climatology.

Root Mean Square Skill Scores provide useful data to the developer and producer but are thought to carry less information to the user, particularly those served by the NMHS. Hence provision of Root Mean Square Skill Scores to users is optional.

Parameters

The key list of parameters in the Core SVS is provided below. Verification for these key parameters, whether for the verification history or for real-time forecasts, should use both Core SVS diagnostics wherever possible (given the exceptions noted above). Many long-range forecasts are produced which do not include parameters in the key list (for example, there are numerous empirical systems that predict seasonal rainfall over part of, or over an entire, country). The Core SVS diagnostics should be used to assess these forecasts also, but full details of the predictions will need to be provided.

1.    Sea Surface Temperature. Predictions for:

NINO1+2

NINO3

NINO3.4

NINO4

Pacific Warm Pool (4N to 0N; 130E to 150E)

Tropical Indian Ocean ()

Tropical Atlantic Ocean ()

2.    Atmospheric parameters. Predictions for:

T2m Screen Temperature

with standard regions: Tropics (30N to 30S)
                       Northern Extratropics (>=30N)
                       Southern Extratropics (<=30S)
                       (both Extratropical regions are also split into
                        separate land and oceanic regions)
                       Tropical Africa (10N to 10S; 15W to 45E)
                       Tropical South America (10N to 10S; 80W to 35W)
                       Tropical South East Asia (10N to 10S; 95E to 150E)
                       NINO3 region

Precipitation

with standard regions: Tropics (30N to 30S)
                       Northern Extratropics (>=30N)
                       Southern Extratropics (<=30S)
                       (both Extratropical regions are also split into
                        separate land and oceanic regions)
                       Tropical Africa (10N to 10S; 15W to 45E)
                       Tropical South America (10N to 10S; 80W to 35W)
                       Tropical South East Asia (10N to 10S; 95E to 150E)
                       Southern Asia (30N to 5N; 70E to 90E)
                       NINO3 region

500 hPa Geopotential Height

with standard regions: Northern Extratropics >=30N
                                        Southern Extratropics <=30S

850 hPa Temperature

with standard regions: Tropics (30N to 30S)
                       Northern Extratropics (>=30N)
                       Southern Extratropics (<=30S)
                       (both Extratropical regions are also split into
                        separate land and oceanic regions)
                       Tropical Africa (10N to 10S; 15W to 45E)
                       Tropical South America (10N to 10S; 80W to 35W)
                       Tropical South East Asia (10N to 10S; 95E to 150E)
                       NINO3 region

Mean sea level pressure

with standard regions: Northern Extratropics >=30N
                                        Southern Extratropics <=30S

Southern Oscillation Index

                                        Tahiti-Darwin index (complete definition)

In using Relative Operating Characteristics a definition of the binary 'event' being predicted is required. While flexibility in defining the event is proposed, the recommendation is that each event be defined either as above/below normal or as a tercile of the climatological distribution.
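As an illustration, the following minimal sketch (Python; the names and sample values are illustrative, not part of the SVS) defines tercile-based binary events from a climatological sample:

    # Minimal sketch: tercile-based binary 'events' for ROC verification.
    import numpy as np

    climatology = np.array([21.3, 22.1, 20.8, 23.0, 21.9, 22.4,
                            20.5, 21.7, 22.8, 21.1, 22.2, 21.5])

    # Tercile boundaries of the climatological distribution.
    lower, upper = np.percentile(climatology, [100 / 3, 200 / 3])

    def event_below_normal(value):
        # Binary event: value falls in the lowest tercile.
        return value < lower

    def event_above_normal(value):
        # Binary event: value falls in the highest tercile.
        return value > upper

    print(event_below_normal(20.6), event_above_normal(23.1))  # True True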

Additional diagnostics that might aid centres in verification of long-range forecasts are listed in Annex 4.

Verification Data Sets

The key list of data sets to be used in the Core SVS for both climatological and verification information is provided below. The same data should be used for both climatology and verification, although the centre’s analysis (where available) and the ECMWF and NCEP/NCAR Reanalyses and subsequent analyses may be used when other data are not available. Many seasonal forecasts are produced that may not use the data in either the key climatology or verification data sets (for example, there are numerous systems which predict seasonal rainfall over part of, or over an entire, country). Appropriate data sets should then be used with full details provided.

  1. Sea Surface Temperature

Reynolds OI, with option for additional use of GISST

  2. Precipitation

Xie-Arkin; GPCP data; GCOS Network once data readily available; ECMWF and NCEP/NCAR Reanalyses and operational analysis data

  3. T2m Screen Temperature

GCOS Network once data readily available; ECMWF and NCEP/NCAR Reanalyses and operational analysis data; UKMO/CRU T2m data set

  4. 500 hPa Geopotential Height

ECMWF and NCEP/NCAR Reanalyses and operational analysis data; own centre operational analysis data if available; GUAN data once available; UKMO RS data set

  5. 850 hPa Temperature

ECMWF and NCEP/NCAR Reanalyses and operational analysis data; own centre operational analysis data if available; GUAN data once available; UKMO RS data set

  6. Mean Sea Level Pressure

ECMWF and NCEP/NCAR Reanalyses and operational analysis data; own centre operational analysis data if available; UKMO GMSLP data set

When gridded data sets are used, a 2.5° latitude by 2.5° longitude grid is recommended.

System Details

Information will be requested, for the exchange of scores, concerning the following details of the forecast system; items labelled * should also be attached to information supplied to users:

  1. Is the system numerical/hybrid/empirical*?
  2. Do the results relate to the verification history or to real-time forecasts*?
  3. Is the system deterministic/probabilistic*?
  4. List of parameters being assessed*
  5. List of regions for each parameter*
  6. List of forecast ranges (lead times) and periods (e.g. seasonal average) for each parameter*
  7. The number of hindcasts/predictions incorporated in the assessment and the dates of these hindcasts/predictions
  8. Details of climatological and verification data sets used (with details of quality controls when these are not published)
  9. If appropriate, resolution of fields used for climatologies and verification
  10. The period over which data are averaged to produce persisted anomalies
  11. Results of significance tests (Monte Carlo tests are recommended) on the historical verification period*
  12. Details of any bias correction applied

Annexes: 4


ANNEX 1

OBJECTIVES OF THE STANDARDIZED VERIFICATION SYSTEM

The Standardized Verification System has two major objectives:

1. To provide a standardized method whereby forecast producers can exchange information on the quality of longer-range predictions on a regular basis and can also report results to WMO annually as part of a consolidated annual summary;

2. To provide a standardized method whereby forecast producers can add information on the inherent qualities of their forecasts for the information and advice of recipients.

In order to achieve the first major objective, the SVS incorporates two diagnostics, a series of recommended forecast parameters, and verification and climatological data sets against which to assess the forecasts. These can be applied to real-time forecasts, either individually or, preferably, accumulated over a sequence of predictions.

The second major objective is achieved using the same diagnostics, forecast parameters and verification and climatological statistics, but applied to historical tests of the system. It must be made clear whether or not the historical tests are based on methods that would have represented a true forecast had the tests been run in real time. Producers will be requested to add this information to issued predictions; recommendations for methods by which this might be done may be formulated later.

Other objectives of the Standardized Verification System are:

3. To encourage both regular verification of forecasts and verification according to international standards;

4. To encourage information on inherent forecast quality to be added to all predictions as a matter of course and to encourage forecast recipients to expect receipt of the information;

5. To encourage producers to use consistent data sets and to encourage production of these data sets;

6. To provide verifications that permit direct intercomparison of forecast quality regardless of predicted variable, method, forecast range, geographical region, or any other consideration;

7. To encourage producers to work towards a common method for presenting forecasts.


ANNEX 2

RELATIVE OPERATING CHARACTERISTICS

The derivation of Relative Operating Characteristics is given below. For purposes of reporting forecast quality for exchange between centres and for annual submission to WMO the following will be required:

1. For deterministic forecasts, Hit Rates and False Alarm Rates, together with essential details of the forecast parameter and verification data sets;

2. For probabilistic forecasts, Hit Rates and False Alarm Rates for each probability interval used. Frequent practice, as illustrated below, is for probability intervals of 10 per cent to be used; however, other intervals may be used as appropriate (for example, for nine-member ensembles an interval of 11.1% could be more realistic). Additionally, the area under the curve should be calculated.

Relative Operating Characteristics (ROC), derived from signal detection theory, are intended to provide information on the characteristics of systems upon which management decisions can be taken. In the case of weather forecasts, the decision might relate to the most appropriate manner in which to use a forecast system for a given purpose. ROCs are useful in contrasting the characteristics of deterministic and probabilistic systems.

Take the following 2x2 contingency table for any yes/no forecast for a specific binary event:

   

                                FORECASTS
                       YES                   NO

OBSERVED   YES   Hits (H)              Misses (M)                H + M
           NO    False Alarms (FA)     Correct Rejections (CR)   FA + CR

                 H + FA                M + CR

The binary 'event' can be defined quite flexibly, e.g. as positive/negative anomalies, anomalies greater/less than a specific amount, values between two limits, etc. If terciles are used then the binary event can be defined in terms of predictions of one tercile against the remaining two.

Using stratification by observed outcome (rather than by forecast), the following can be defined:

Hit Rate = H/(H + M)

False Alarm Rate = FA/(FA + CR)

For deterministic forecasts only the Hit Rate and False Alarm Rate need be calculated; for probabilistic forecasts the procedure outlined below should be followed.
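A minimal sketch of these two rates, assuming illustrative counts for the four cells of the contingency table:

    # Minimal sketch: rates stratified by observed outcome.
    def hit_rate(hits, misses):
        return hits / (hits + misses)

    def false_alarm_rate(false_alarms, correct_rejections):
        return false_alarms / (false_alarms + correct_rejections)

    # Illustrative contingency table: H=42, M=18, FA=25, CR=65.
    print(hit_rate(42, 18))          # 0.70
    print(false_alarm_rate(25, 65))  # 0.2777...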

A probabilistic forecast can be converted into a 2x2 table as follows. Tabulate probabilities in, say, 10% ranges stratified against observations, i.e.:

Probability Range    Number of Observed        Number of Non-Observed
                     events for each           events for each
                     probability range         probability range

90-100%              O10                       NO10
80-90%               O9                        NO9
70-80%               O8                        NO8
60-70%               O7                        NO7
50-60%               O6                        NO6
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
40-50%               O5                        NO5
30-40%               O4                        NO4
20-30%               O3                        NO3
10-20%               O2                        NO2
0-10%                O1                        NO1

Totals               Σ Oi                      Σ NOi

For any threshold, such as 50% (indicated by the dashed line in the table), the Hit Rate (False Alarm Rate) can be calculated as the sum of the O's (NO's) at and above the threshold value divided by Σ Oi (Σ NOi) - in other words, for a threshold of 50% the calculation is as if the event is predicted given any forecast probability of 50% or more. So for the above case:

Hit Rate = (O10 + O9 + O8 + O7 + O6) / Σ Oi

False Alarm Rate = (NO10 + NO9 + NO8 + NO7 + NO6) / Σ NOi

This calculation can be repeated at each threshold and the points plotted to produce the ROC curve, which, by definition, must pass through the points (0,0) and (100,100) (for events being predicted only for 100% probabilities and for all probabilities exceeding 0% respectively). The further the curve lies towards the upper left-hand corner the better; no-skill forecasts are indicated by a diagonal line.
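The construction described above can be sketched as follows (Python; the bin counts are illustrative). The table is accumulated from the highest probability bin downwards, one threshold at a time:

    # Minimal sketch: ROC points from the binned probability table.
    # o[i] / no[i] are counts of observed / non-observed events in the
    # i-th range, ordered from the 0-10% bin up to the 90-100% bin.
    import numpy as np

    o = np.array([3, 5, 6, 8, 10, 12, 14, 15, 13, 9])     # O1 .. O10
    no = np.array([30, 24, 20, 15, 12, 9, 6, 4, 2, 1])    # NO1 .. NO10

    # Cumulative sums from the highest bin downwards: at each threshold
    # the event counts as 'predicted' for any probability at or above it.
    hr = np.cumsum(o[::-1]) / o.sum()     # Hit Rate at each threshold
    far = np.cumsum(no[::-1]) / no.sum()  # False Alarm Rate at each threshold

    # Prepend (0,0); the last cumulative point is (1,1) by construction.
    hr = np.concatenate(([0.0], hr))
    far = np.concatenate(([0.0], far))
    for f, h in zip(far, hr):
        print(f"FAR={f:.2f}  HR={h:.2f}")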

Areas under ROC curves can be calculated using the trapezium rule. Areas should be standardized against the total area of the figure such that a perfect forecast system (i.e. one that has a curve through the top-left-hand corner of the figure) has an area of one and a curve lying along the diagonal (no information) has an area of 0.5. Alternatively, but not recommended in the Standard, the 0.5 to 1.0 range can be rescaled to 0 to 1 (thus allowing negative values to be allocated to cases with the curve lying below the diagonal - such curves can be generated). Not only can the areas be used to contrast different curves but they are also a basis for Monte Carlo significance tests. Monte Carlo testing should be done within the forecast data set itself.
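A sketch of the standardized area and of one possible Monte Carlo test follows (Python; the counts repeat the illustrative ones above). The permutation scheme - shuffling the event labels among the forecasts while keeping the number of forecasts in each probability bin fixed - is one reasonable reading of testing 'within the forecast data set itself', not a prescription of the Standard:

    # Minimal sketch: trapezium-rule ROC area and a permutation test.
    import numpy as np

    def roc_points(o, no):
        # Hit/false-alarm rates swept from the highest bin downwards.
        hr = np.concatenate(([0.0], np.cumsum(o[::-1]) / o.sum()))
        far = np.concatenate(([0.0], np.cumsum(no[::-1]) / no.sum()))
        return far, hr

    def roc_area(o, no):
        # Trapezium rule on 0-1 axes: perfect = 1.0, no-skill diagonal = 0.5.
        far, hr = roc_points(o, no)
        return float(np.sum((far[1:] - far[:-1]) * (hr[1:] + hr[:-1]) / 2.0))

    def monte_carlo_p(o, no, trials=1000, seed=0):
        # Fraction of random relabellings whose area reaches the actual area.
        rng = np.random.default_rng(seed)
        bins = np.repeat(np.arange(len(o)), o + no)      # bin of each forecast
        labels = np.repeat([1, 0], [o.sum(), no.sum()])  # 1 = event observed
        actual = roc_area(o, no)
        exceed = 0
        for _ in range(trials):
            lab = rng.permutation(labels)
            o_s = np.bincount(bins[lab == 1], minlength=len(o))
            no_s = np.bincount(bins[lab == 0], minlength=len(o))
            if roc_area(o_s, no_s) >= actual:
                exceed += 1
        return exceed / trials

    o = np.array([3, 5, 6, 8, 10, 12, 14, 15, 13, 9])
    no = np.array([30, 24, 20, 15, 12, 9, 6, 4, 2, 1])
    print(roc_area(o, no), monte_carlo_p(o, no))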

In order to handle spatial forecasts, predictions for each point within the grid should be treated as individual forecasts but with all results combined into the final outcome. Categorical predictions can be treated for each category separately.

 


ANNEX 3

ROOT MEAN SQUARE SKILL SCORES

Root Mean Square Skill Scores are calculated from:

RMSSS = [1 - RMS(forecast) / RMS(standard)] * 100

RMS(forecast) refers to the RMS error of the forecast. RMS(standard) refers to the RMS error of the standard when verified against the same observations as the forecast - the standard can be either climatology or persistence. When persistence is used, it should be defined in a manner appropriate to the time-scale of the prediction, although it is left to the producer to determine whether persistence over, perhaps, one month or an entire season is used in assessing a seasonal prediction. No portion of the persistence period should overlap the forecast period, and the forecast range should be measured from no earlier than the time at which any observed information (i.e. information that could not have been known at the time of a real forecast) ceases to be included. Both requirements ensure that all forecasts and test predictions use only data that were available at the time of the prediction, or that would have been available had a prediction been made (in the case of historical tests).
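A minimal sketch of the calculation, assuming illustrative anomaly values and climatology (zero anomaly) as the standard:

    # Minimal sketch: Root Mean Square Skill Score against a standard.
    import numpy as np

    def rms_error(predicted, observed):
        predicted, observed = np.asarray(predicted), np.asarray(observed)
        return float(np.sqrt(np.mean((predicted - observed) ** 2)))

    def rms_skill_score(forecast, standard, observed):
        # [1 - RMS(forecast)/RMS(standard)] * 100: positive values beat
        # the standard (climatology or persistence); 100 is perfect.
        return (1.0 - rms_error(forecast, observed)
                / rms_error(standard, observed)) * 100.0

    observed = [0.4, -0.2, 0.9, 1.3, -0.5]    # e.g. seasonal anomalies
    forecast = [0.3, 0.1, 0.7, 1.0, -0.2]
    climatology = [0.0, 0.0, 0.0, 0.0, 0.0]   # zero-anomaly standard

    print(rms_skill_score(forecast, climatology, observed))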

 


ANNEX 4

ADDITIONAL DIAGNOSTICS

1. Categorical forecasts

Linear Error in Categorical Space for Categorical Forecasts (LEPSCAT)

Bias

Post Agreement

Percent Correct

Kuiper Score

2.    Probability Forecasts of Binary Predictands

Brier Score

Brier Skill Score with respect to Climatology

Reliability

Sharpness (measure to be decided)

Continuous Rank Probability Score
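As an illustration of the first two diagnostics in this list, a minimal sketch follows (Python; the probabilities, outcomes and climatological probability are illustrative only):

    # Minimal sketch: Brier Score and Brier Skill Score vs climatology.
    import numpy as np

    def brier(prob, outcome):
        # Mean squared difference between forecast probability and the
        # binary outcome (1 = event occurred, 0 = it did not).
        prob, outcome = np.asarray(prob, float), np.asarray(outcome, float)
        return float(np.mean((prob - outcome) ** 2))

    def brier_skill_score(prob, outcome, clim_prob):
        # 1 - BS/BS_clim: positive values beat the climatological probability.
        ref = brier(np.full(len(outcome), clim_prob), outcome)
        return 1.0 - brier(prob, outcome) / ref

    p = [0.8, 0.1, 0.6, 0.3, 0.9]
    y = [1, 0, 1, 0, 1]
    print(brier(p, y), brier_skill_score(p, y, clim_prob=1 / 3))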

3.     Probability of Multiple-Category Predictands

Ranked Probability Score

Ranked Probability Skill Score with respect to Climatology

4.    Continuous Forecasts in Space

Murphy-Epstein Decomposition (phase error, amplitude error, bias error)

Anomaly Correlation

5. Continuous Forecasts in Time

Mean Square Error

Correlation

Bias

Anomaly Correlation

 
