Proposal for a verification intercomparison of European limited area models

Revised _proposal_DRAFT_1

Received: 4th October 2007

SRNWP Project

A staged Verification and Model Intercomparison for European models

Table of contents

1. Background

2. Technical details of the verification to be performed to realise the operational model intercomparison

2.1 Models to be compared

2.2 Forecast delivery

2.3 Verification domain

2.4 Types of observations to be used

2.5 Verification methods

2.6 Deliverables

2.7 Dissemination of results

2.8 Adding more operational models

3. Evaluation and shared implementation of new methods

4. Collecting non-GTS observation data

5. Duties of the Responsible Member

6. Reporting

7. Start and length of the Programme ---- needs redefinition

8. Costs per year ---- needs redefinition

Appendix 1: European Mesoscale model Intercomparison of Precipitation (EMIP)

Appendix 2: New methods of verification for precipitations of km-scale models

1. Background

At the meeting "A Vision for Numerical Weather Prediction in Europe", jointly organised by the Met Office and the ECMWF and held at the ECMWF on the 15th-17th of March 2006, several recommendations were made at the request of the EUMETNET Council in order to increase cooperation and efficiency in NWP in Europe (see EUMETNET Document EMN/C27/Doc12).

Under the theme "Improved Framework for Collaboration", one of the recommendations reads: "Initiate the reorganisation of the SRNWP Programme" in order to enlarge its scope and strengthen its activity. This recommendation also proposes "the definition of specific projects in a similar way to the ECSN Programme".

The same Document also stipulates the following: "SRNWP Programme will define at its next general meeting a draft programme proposal for spring 2007".

The SRNWP general meeting took place on the 12th of October 2006 in Zurich, and three specific projects were defined (see the minutes of the meeting under http://srnwp.cscs.ch/Annual_Meetings/2006/Report2006.htm).

One of these three specific projects was the development of a common verification package and the realization of an operational model intercomparison. After consideration by a redaction committee, the proposal was not submitted to the EUMETNET Council. It was felt that the construction of a new verification package was not required. Many consortia had or would shortly have suitable packages, and so there was little enthusiasm among NMSs to take responsibility for the package. Also, the operational adoption and use of such a common verification package was likely to be slow. This, together with the delay in finding a responsible member and sufficient effort to develop a common verification package, would make the delivery of common intercomparison results in the near future unlikely.

Recognising these practical difficulties, this is a revised proposal in which the objectives are staged. The first stage is to establish a method of comparing verification results for the four reference consortia models run at ~10km resolution, and to evaluate whether this approach works or has too many problems in interpretation. The second stage would expand the procedure to include more model configurations and domains from other national/consortia centres, e.g. LACE. Stage 3 would add higher resolution models, currently ~2-5km. The first 2 stages are the most easily achievable and should deliver results relatively quickly. The other aims of the original project would be the subject of later developments.

In the last five years a new generation of non-hydrostatic high-resolution models, with horizontal resolutions of only a few kilometres, has evolved. New methods of verification are needed and are in development in several centres. This is therefore a historic opportunity to develop a common verification package for these models. It would be in the spirit of the "NWP Vision" if one or two of these new verification methods - particularly those for the verification of precipitation - could be incorporated into a common package, so that the participating NMSs can benefit from them, particularly when they do not have the resources to develop them themselves.

Stage 3 would involve evaluation of the new methods to identify which are most suitable. The working group on verification could organise this activity and hold a workshop devoted to this topic alone. As part of this, common code could be developed and shared. For precipitation, access to quality-controlled radar composites, such as the OPERA radar hub product, is also necessary; this could be arranged at this stage as well.

Stage 4 is the collection of non-GTS data and development of a hub to archive and access these. This requires dedicated resources and much greater financial support.

1.1 First aim of this project: A practical approach to an intercomparison of operational models

The operational exchange of WMO scores of global models is a good way for a national Weather Service active on the global scale to monitor the quality of its own model in comparison to the others. It makes it possible to separate what comes from its own modifications from what comes from a more predictable atmosphere during a given period.

Thanks to the existence of Consortia, we have in Europe only four basic operational regional models: Aladin, Hirlam, Cosmo and the limited area version of the Unified Model. The first aim is to deliver a meaningful intercomparison of operational deterministic forecasts from each of the 4 consortia models.

For limited area models, score exchanges are not systematically organized because the way scores are computed differs between NMSs and because the simulation domains are not the same. One exception is provided by the Met Office, which runs a routine comparison over the British Isles of the operational precipitation forecasts computed by the four major European regional models. The details are given in Appendix 1.

Every NMS today has its own verification package, usually calculating the same scores for similar quantities. Differences arise from the selection of stations, quality control and thresholds for exclusion such as height differences between model and observation. Whilst the use of a single common verification package could ensure that scores are all calculated in exactly the same way using an agreed station list, the interface to the observational data will inevitably introduce different quality control decisions. Most QC schemes involve comparison against the models themselves and so the selection will differ. Common verification has also been hampered by the lack of interoperability between our models. If this latter problem can be solved - a Project called "Interoperability between the European Models" has also been proposed[1] - a new effort in comparison of verification results is highly desirable.

A multiple centre verification:

A quick win in verification comparison would be to attempt a limited area version of the global WMO CBS scheme. The four regional model forecasts, one from each consortium, could be verified by a few NMSs using their current verification systems. This would ensure that the same observations and quality control are used for all the forecasts of all the models at each verifying centre, although there would still be differences between centres. Unlike the WMO CBS exchange of global scores, where each centre only scores its own forecasts, this proposal envisages forecasts being exchanged and scores applied consistently by each verifying NMS. The exchange format for the Met Office precipitation comparison is currently GRIB1. When the interoperability project is defined, its exchange format would be adopted. A possible way of ensuring that exactly the same forecasts are used at each verifying centre would be to obtain these via the PEPS data collection at DWD, rather than through direct bilateral exchanges.

A responsible member would collect the results from the participating centres and make available both individual results and a “pooled” result on a password protected web site. A consensus view would be attempted although it is recognised that some contradictory results are likely and may not be adequately resolved. However, as with the global CBS exchange it is anticipated that some common conclusions will be possible.

The opportunity to perform a thorough comparison of our four regional models should highlight their respective strengths and weaknesses. This would foster a general improvement of our models - they all have weaknesses! - by re-design or replacement of some parts.

2. Technical details of the verification to be performed to realise the operational model intercomparison

2.1 Models to be compared

The verification and the comparison are in principle open to all the operational versions of the Aladin, Hirlam, Cosmo and Unified Model run by the Participating Members. However, in order to test the approach and get some early results, as a first step, the model intercomparison will be limited to:

The North Atlantic - Europe (NAE) version of the Unified Model run by the Met Office
The Hirlam reference version, as run by the Finnish Meteorological Institute
The Aladin-France model run by Meteo-France
The European area version of the Cosmo model as run by Deutscher Wetterdienst (DWD).

Only the 00 UTC forecasts will be verified. The forecast range will be 48 hours.

2.2 Forecast delivery

In order to avoid any problem related to the commercialisation of weather forecasts, the fields of the parameters to be verified will be sent to the participating verification centres with a 48-hour lag time. To ensure that exactly the same number of forecasts is used, it is proposed that these are obtained via the PEPS project at DWD. Forecast fields of the parameters to be verified will be in the current format for that exchange. Later, however, they will be delivered in the format of the common model output that the EUMETNET Interoperability Project will define. The interface programmes will be developed by the Interoperability Project.

The model outputs for all the parameters to be verified (except precipitation) are requested from T+0 to T+48h, at 6h intervals.

For precipitation, the accumulated amounts are needed for the 8 time intervals (+0/+6), ... , (+42/+48).
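The requested lead times and the 8 accumulation intervals can be sketched as follows. This is a minimal illustration only: it assumes that a model writes precipitation as a running total since the start of the forecast (a common convention, but not stated in this proposal), and the function name and sample values are hypothetical.

```python
# Lead times requested for the non-precipitation parameters: T+0, T+6, ..., T+48.
LEAD_HOURS = list(range(0, 49, 6))


def interval_accumulations(running_totals_mm):
    """Turn running totals at LEAD_HOURS into the 8 six-hour accumulations.

    ASSUMPTION: running_totals_mm[i] is the precipitation accumulated from
    T+0 up to LEAD_HOURS[i]; this convention is not defined in the proposal.
    """
    if len(running_totals_mm) != len(LEAD_HOURS):
        raise ValueError("expected one running total per lead time")
    intervals = {}
    for i in range(1, len(LEAD_HOURS)):
        start, end = LEAD_HOURS[i - 1], LEAD_HOURS[i]
        intervals[(start, end)] = running_totals_mm[i] - running_totals_mm[i - 1]
    return intervals


# Invented sample values (mm), for illustration only.
totals_mm = [0.0, 1.5, 1.5, 4.0, 4.0, 4.0, 6.0, 6.0, 10.0]
six_hourly = interval_accumulations(totals_mm)
```

The dictionary keys are the (start, end) lead times in hours, so the first key is (0, 6) and the last is (42, 48).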

2.3 Verification domain

For the model intercomparison, the verification will be made over the largest possible common domain of the participating models, excluding lateral boundary and extension zones.
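As a rough sketch of how such a common domain could be derived, the example below intersects the model domains after trimming a margin from each side. It assumes, for simplicity, that every domain can be described as a box in one shared latitude-longitude system and that a single margin width stands in for the lateral boundary and extension zones; real models use different (often rotated) grids. The class, the function and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    west: float
    south: float
    east: float
    north: float

    def shrink(self, margin_deg):
        """Trim the same margin (degrees) from every side of the box."""
        return Box(self.west + margin_deg, self.south + margin_deg,
                   self.east - margin_deg, self.north - margin_deg)


def common_domain(boxes, margin_deg):
    """Largest box contained in every trimmed model domain."""
    inner = [b.shrink(margin_deg) for b in boxes]
    box = Box(max(b.west for b in inner), max(b.south for b in inner),
              min(b.east for b in inner), min(b.north for b in inner))
    if box.west >= box.east or box.south >= box.north:
        raise ValueError("the model domains have no common area")
    return box


# Invented domains, NOT the real domains of any of the models named above.
domains = [
    Box(-30.0, 30.0, 40.0, 70.0),
    Box(-20.0, 35.0, 35.0, 72.0),
    Box(-15.0, 28.0, 45.0, 65.0),
]
common = common_domain(domains, margin_deg=2.0)
```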

2.4 Types of observations to be used

The forecasts will be compared against SYNOP station reports. In addition, radar estimates of surface precipitation will be used as an alternative or complement to SYNOP station reports of precipitation totals, where possible. The use of the radar composites developed for operational production at the Met Office in the frame of the OPERA Programme will be the preferred choice, where this is possible.

2.5 Verification methods

The model intercomparison will be realized using the operational verification packages at each verifying centre. The preferred method to project model forecasts (apart from precipitation) onto the synoptic stations’ locations is bilinear interpolation. It is recognised that some centres may only have other methods, such as nearest grid point, available in their package. The methods used should be fully documented and stated in publication of the results.
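The difference between the two projection methods mentioned above can be illustrated with a short sketch. It assumes a regular grid with constant spacing and a station position already expressed in grid coordinates; corrections such as height adjustment are not shown. The function names and the sample field are hypothetical and do not represent any centre's package.

```python
import math


def bilinear(grid, x0, y0, dx, dy, x, y):
    """Bilinear interpolation; grid[j][i] sits at (x0 + i*dx, y0 + j*dy)."""
    fx = (x - x0) / dx
    fy = (y - y0) / dy
    nx, ny = len(grid[0]), len(grid)
    if not (0 <= fx <= nx - 1 and 0 <= fy <= ny - 1):
        raise ValueError("station lies outside the grid")
    # Index of the lower-left corner of the enclosing cell; the min() keeps
    # points on the upper edge inside the last cell.
    i = min(int(math.floor(fx)), nx - 2)
    j = min(int(math.floor(fy)), ny - 2)
    tx, ty = fx - i, fy - j
    lower = grid[j][i] * (1 - tx) + grid[j][i + 1] * tx
    upper = grid[j + 1][i] * (1 - tx) + grid[j + 1][i + 1] * tx
    return lower * (1 - ty) + upper * ty


def nearest(grid, x0, y0, dx, dy, x, y):
    """Value at the grid point closest to the station."""
    i = int(math.floor((x - x0) / dx + 0.5))
    j = int(math.floor((y - y0) / dy + 0.5))
    if not (0 <= i < len(grid[0]) and 0 <= j < len(grid)):
        raise ValueError("station lies outside the grid")
    return grid[j][i]


# Invented 2 x 2 field and station position, for illustration only.
field = [[10.0, 12.0],
         [14.0, 16.0]]
value_bilinear = bilinear(field, 0.0, 0.0, 1.0, 1.0, 0.25, 0.75)
value_nearest = nearest(field, 0.0, 0.0, 1.0, 1.0, 0.25, 0.75)
```

For this sample the two methods give different station values, which is why the method used should be documented with the results.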

Precipitation forecasts, radar estimates and precipitation reported in SYNOPs should be area-meaned to a common coarse rotated latitude-longitude grid.
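A minimal sketch of area-meaning is given below. It assumes the simplest possible case, in which the fine grid nests exactly in the coarse grid by an integer factor; regridding between differently rotated grids needs proper area weighting and is not shown. The function name and the sample field are hypothetical.

```python
def block_mean(fine, factor):
    """Average a fine grid onto a coarse grid, factor x factor cells at a time.

    ASSUMPTION: the fine grid nests exactly in the coarse grid.
    """
    rows, cols = len(fine), len(fine[0])
    if rows % factor or cols % factor:
        raise ValueError("grid size must be a multiple of the factor")
    coarse = []
    for j in range(0, rows, factor):
        coarse_row = []
        for i in range(0, cols, factor):
            cell = [fine[jj][ii]
                    for jj in range(j, j + factor)
                    for ii in range(i, i + factor)]
            coarse_row.append(sum(cell) / len(cell))
        coarse.append(coarse_row)
    return coarse


# Invented precipitation amounts (mm) on a 4 x 4 fine grid.
fine_mm = [[0.0, 2.0, 4.0, 4.0],
           [2.0, 4.0, 4.0, 4.0],
           [1.0, 1.0, 0.0, 0.0],
           [1.0, 1.0, 0.0, 8.0]]
coarse_mm = block_mean(fine_mm, 2)
```

Note how the isolated 8 mm value in the last block is smoothed to a 2 mm area mean, which is the intended effect of comparing all models at a common coarse scale.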

2.6 Deliverables

For the variables mean sea level pressure, temperature and wind speed, the scores to be produced are bias, root mean square error and skill score (with respect to persistence).
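These three scores can be sketched as follows. The skill score is written here as one minus the ratio of the mean square errors of the forecast and of persistence; this is one common definition and is an assumption, since the proposal does not fix the formula. The function names and sample values are hypothetical.

```python
import math


def bias(forecasts, observations):
    """Mean error (forecast minus observation)."""
    return sum(f - o for f, o in zip(forecasts, observations)) / len(observations)


def mse(forecasts, observations):
    """Mean square error."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / len(observations)


def rmse(forecasts, observations):
    """Root mean square error."""
    return math.sqrt(mse(forecasts, observations))


def skill_vs_persistence(forecasts, persistence, observations):
    """ASSUMED definition: 1 - MSE(forecast) / MSE(persistence).

    1 is a perfect forecast, 0 is no better than persistence.
    Undefined (division by zero) if persistence is itself perfect.
    """
    return 1.0 - mse(forecasts, observations) / mse(persistence, observations)


# Invented temperatures (deg C) at four stations.
obs = [10.0, 12.0, 14.0, 16.0]
fcst = [11.0, 12.0, 15.0, 16.0]
pers = [8.0, 12.0, 12.0, 16.0]  # initial-time value held constant

sample_bias = bias(fcst, obs)
sample_rmse = rmse(fcst, obs)
sample_skill = skill_vs_persistence(fcst, pers, obs)
```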

For the norm of the wind vector difference, the scores to be produced are the root mean square error and the skill score (with respect to persistence).
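For the wind vector, the root mean square error is taken over the norm of the difference between forecast and observed wind. A minimal sketch, assuming winds are given as (u, v) components in the same units; the function name and sample values are hypothetical.

```python
import math


def vector_wind_rmse(forecast_uv, observed_uv):
    """Root mean square of |forecast wind - observed wind|."""
    squared_norms = [
        (uf - uo) ** 2 + (vf - vo) ** 2
        for (uf, vf), (uo, vo) in zip(forecast_uv, observed_uv)
    ]
    return math.sqrt(sum(squared_norms) / len(squared_norms))


# Invented (u, v) winds in m/s at two stations.
fcst_uv = [(3.0, 4.0), (6.0, 8.0)]
obs_uv = [(0.0, 0.0), (3.0, 4.0)]
sample_rmsve = vector_wind_rmse(fcst_uv, obs_uv)
```

Unlike the RMSE of wind speed, this measure also penalises errors in wind direction.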

For all the above variables, the scores will be computed every 6 hours, i.e. for +00, +06, ... , +42, +48.

The ECMWF high-resolution analyses should be used for persistence in order to allow a fair common reference.

For the precipitations, the scores to be produced are frequency bias, equitable threat score (ETS), log-odds ratio and the Peirce (Hanssen-Kuipers) skill score against persistence.
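The four precipitation scores are all derived from the 2 x 2 contingency table of forecast and observed threshold exceedances. The sketch below uses the standard textbook definitions with a = hits, b = false alarms, c = misses and d = correct rejections; the function names and sample counts are hypothetical, and no guard is included for empty table cells, which make some scores undefined.

```python
import math


def contingency(forecasts, observations, threshold):
    """Count hits, false alarms, misses and correct rejections."""
    a = b = c = d = 0
    for f, o in zip(forecasts, observations):
        forecast_yes, observed_yes = f >= threshold, o >= threshold
        if forecast_yes and observed_yes:
            a += 1
        elif forecast_yes:
            b += 1
        elif observed_yes:
            c += 1
        else:
            d += 1
    return a, b, c, d


def scores(a, b, c, d):
    """Scores from a 2 x 2 table; undefined if a denominator is zero."""
    n = a + b + c + d
    hits_random = (a + b) * (a + c) / n  # hits expected by chance
    return {
        "frequency_bias": (a + b) / (a + c),
        "ets": (a - hits_random) / (a + b + c - hits_random),
        "log_odds_ratio": math.log((a * d) / (b * c)),
        "peirce": a / (a + c) - b / (b + d),
    }


# Invented counts, for illustration only.
sample = scores(a=30, b=10, c=20, d=40)
sample_table = contingency([0.0, 1.2, 3.0, 0.2], [0.0, 0.5, 2.0, 1.5], 1.0)
```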

6-, 12- and 24-hourly accumulated total precipitation will be verified.
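The 12- and 24-hourly totals can be built from the 8 six-hourly accumulations requested in Section 2.2. The sketch below assumes non-overlapping windows starting at T+0, which is an assumption; the proposal does not say whether overlapping windows are also wanted. The function name and sample values are hypothetical.

```python
def aggregate(six_hourly_mm, window_hours):
    """Sum consecutive 6 h accumulations into non-overlapping windows."""
    if window_hours % 6 != 0:
        raise ValueError("window must be a multiple of 6 hours")
    step = window_hours // 6
    if len(six_hourly_mm) % step != 0:
        raise ValueError("forecast range is not a multiple of the window")
    return [sum(six_hourly_mm[i:i + step])
            for i in range(0, len(six_hourly_mm), step)]


# Invented accumulations (mm) for (+0/+6), ..., (+42/+48).
six_mm = [1.5, 0.0, 2.5, 0.0, 0.0, 2.0, 0.0, 4.0]
twelve_mm = aggregate(six_mm, 12)
daily_mm = aggregate(six_mm, 24)
```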

Geographical distributions and time-series, as well as monthly, seasonal and yearly means will be produced for all the scores of all the parameters verified.

Additionally, for the variables mean sea level pressure, temperature, wind speed and norm of the wind vector difference, the monthly means of the bias and root mean square error will be computed at each individual station for the hours +36 and +48.

2.7 Dissemination of results

All the verification results of the model intercomparison will be published on the web site of the Responsible Member under password protection.

Participating Members to this Project will be entitled to receive the password.

2.8 Adding more operational models

If the approach is demonstrated to be useful and allows consistent conclusions to be made, other model configurations can be added and different common areas defined and used. Not all centres will wish to verify every model, but some common subsets could be defined between participating centres. In addition, higher resolution versions could also be added at this stage, although standard verification scores and measures against synoptic stations alone are less suitable for them. A proper intercomparison is likely to depend on the maturing of the newer “fuzzy” or neighbourhood methods.

3. Evaluation and development of newer methods

It is now a well-established fact that the verification of the precipitations of models with km-scale resolution cannot be done at that scale with the traditional scores. If it is done, the results are often systematically inferior to the results of models with coarser, even much coarser horizontal resolutions.

The development of new verification methods for precipitation forecasts suitable for km-scale models is today a very active field of research. It is necessary to incorporate in the common verification approach – in addition to the classical scores given above - some of these new methods that NMS could use for their model version with the highest resolution.

It is quite a new field of research. This implies that the interpretation of the results of these methods is still the object of discussions and, as for the traditional scores, each of them shows only one characteristic of the precipitation behaviour.

It would not be meaningful in this Proposal to dictate which method NMSs should use. In Appendix 2, a few hints are given on some of these methods.

It is envisaged that the SRNWP working group for verification should coordinate its efforts in testing, developing and evaluating these methods. A workshop will be organised specifically aimed at identifying the most useful methods and coordinating code sharing for their practical implementation.

4. Collecting non-GTS observation data

Standard observation data like SYNOP, TEMP, METAR, etc. are collected in many databases in Europe, not only at each NMS and at the ECMWF, but also by projects such as the EU Project EUROGRID or the EUMETNET Programme "European Climate Assessment & Dataset".

Next to these GTS-data, there are in Europe large amounts of meteorological observations that do not circulate: they come from stations that are not registered by WMO, which often belong to counties or provinces. Their data remain local or inside national borders.

The high-density rain gauge networks are the best example. Although they normally belong to National Meteorological Services, no large-scale exchange of their data takes place in Europe.

High-resolution observing networks - particularly the rain gauge networks - were established in the past for climatic purposes: to better know the climate of a region or of a country. Modern high-resolution observing networks made of automatic observing stations primarily serve the monitoring of present weather, but also climatology.

We are presently witnessing a tremendous increase in spatial resolution of the NWP models: models of some 4 km resolution are already operational; models with resolution between 1 and 3 km are in a pre-operational stage. It is thus easy to understand that these high-resolution data become very important for the verification of the results of these high-resolution models.

However, it is generally very difficult to access these data for stations outside national borders.

Some efforts have already been made to collect these data, particularly the high-density precipitation measurements. Examples:

- DWD collects for the needs of the Consortium COSMO the non-GTS precipitation data of Germany, Switzerland and Northern Italy

- ECMWF collects the non-GTS precipitation data of its Members

- The EUMETNET Programme "European Climate Assessment & Dataset" collects non-GTS precipitation data, but not with a high spatial resolution.

Missing in Europe is a hub that would centralise all the non-GTS meteorological data, as we already have today in the frame of EUMETNET a hub for radar data and a hub for wind profiler data.

The absence of such a hub is evident in Europe today: when a project is submitted, the first foreseen task is often "to establish a data base for high resolution meteorological data". And when the project is accepted, several NMS Directors receive a letter from the project leader asking for non-GTS data (cf. Project ELDAS, Project ENSEMBLES, Programme ECA&D, etc.).

In this Programme, it is planned to have a data hub for non-GTS meteorological data, but it is not planned to create a new data centre. The idea is to supplement an already existing observation database with as many verified non-GTS observation data as possible. This hub would be an extension of the observation database of an NMS or of the ECMWF.

5. Duties of the Responsible Member

The Responsible Member shall

= Concerning the Model Intercomparison

- collect the verification scores from participating verification centres and produce the graphics in a common format

- keep the model intercomparison pages on its web site up to date

- store on its computer system all the verification results

= Concerning the hub of the non-GTS observing data

- find an NMS (or ECMWF) ready to extend its database to non-GTS data

- remain in close contact with the NMS responsible for the non-GTS data hub

- motivate the NMS to deliver all their non-GTS observation data to the hub

6. Reporting

The Responsible Member shall send quarterly reports and annual reports to the Programme Manager of the SRNWP Programme reflecting the state of

- the advancement of the model intercomparison

- the non-GTS data hub, especially the incoming rate of the observation data.

The quarterly reports must highlight the difficulties encountered and make proposals on how these difficulties could be solved or their effects mitigated.

7. Start and length of the Programme

The Programme should start the ???? and end the ???.

The work must start with the identification of the centres which will perform the verification to be included in the intercomparison. The operational model intercomparison should start as soon as a few scores are ready.

Discussions with NMS and ECMWF have to take place for the preparation of a non-GTS observation data hub.

8. Costs per year – this needs reworking……

Costs of the Responsible Member for

- the development and maintenance of the common verification package

- the set up and operation of the model intercomparison facility, inclusive computing and archiving costs

- the maintenance of the web pages of the model intercomparison results

Full time equivalent scientist: € 90'000.-

Travel expenses of the full time equivalent scientist: € 2'000.-

Programme Manager: 25% of his/her working time: € 17'500.-

Travel expenses of the Programme Manager: € 3'000.-

Cost for the maintenance of the database for the non-GTS observation data with upload and download functions:

30% of the time of a scientist € 27'000.-

(not necessarily by the Responsible Member)

Total cost per year: € 139'500.-

Clive Wilson for the redaction Committee for

the Verification Programme

References

Ebert, E E, 2007: Fuzzy Verification of High Resolution Gridded Forecasts: A Review and Proposed Framework. Submitted to Meteorological Applications.

Casati, B, Ross, G and Stephenson, D B, 2004: A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteorological Applications 11, 141-154.

Mittermaier, M P, 2006: Using an intensity-scale technique to assess the added benefit of high-resolution model precipitation forecasts. Atmos. Sci. Letters 7, 36-42.

Appendix 1

European Mesoscale model Intercomparison of Precipitation (EMIP)

The current precipitation intercomparison done at the Met Office is described here.

Models and forecasts verified

The current models included in the precipitation verification are:

The 12km UK mesoscale model run at the Met Office
The 22km reference Hirlam model run by FMI
The 9km Aladin France model run by Meteo France
The 7km Cosmo model run by Deutscher Wetterdienst

The intercomparison aims to verify the various models against the UK NIMROD radar-rainfall composite over a large part of the UK. The forecasts from 00UTC are verified. Only daily (24h) precipitation is compared. The intercomparison has data from January 2004 to present. At present the comparison is made at the coarsest model resolution, that of HIRLAM at 22 km, with the finer resolution models area-meaned to the coarser grid.

At present, four scores derived from contingency tables are displayed: the frequency bias, Equitable Threat Score (ETS), the log-odds ratio and a new experimental score called the Extreme Dependency Score.

Mean scores from January 2004 to present are plotted against precipitation thresholds.

The thresholds are 0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 12.0, 16.0, 20.0, 24.0, 32.0 and 48.0 mm. Time series of the scores are also produced, and monthly contingency tables are available to download for users’ own use. The results are displayed on the Met Office External web, under password protection.
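A sketch of how one contingency table per threshold might be built, together with the Extreme Dependency Score, is given below. The EDS formula used, 2 ln((a+c)/n) / ln(a/n) - 1 with a = hits and c = misses, is the form commonly given in the literature and is an assumption here, since this document does not define the score. The function names and sample data are hypothetical and do not describe the Met Office implementation.

```python
import math

THRESHOLDS_MM = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0,
                 12.0, 16.0, 20.0, 24.0, 32.0, 48.0]


def table(forecasts, observations, threshold):
    """(hits, false alarms, misses, correct rejections) for one threshold."""
    a = b = c = d = 0
    for f, o in zip(forecasts, observations):
        if f >= threshold and o >= threshold:
            a += 1
        elif f >= threshold:
            b += 1
        elif o >= threshold:
            c += 1
        else:
            d += 1
    return a, b, c, d


def tables_by_threshold(forecasts, observations):
    """One contingency table per threshold in THRESHOLDS_MM."""
    return {t: table(forecasts, observations, t) for t in THRESHOLDS_MM}


def extreme_dependency_score(a, c, n):
    """ASSUMED form: EDS = 2 ln((a + c) / n) / ln(a / n) - 1.

    Returns None where the score is undefined (no hits, or all hits).
    """
    if a == 0 or a == n:
        return None
    return 2.0 * math.log((a + c) / n) / math.log(a / n) - 1.0


# Invented 24 h totals (mm), for illustration only.
fcst_mm = [0.0, 0.3, 5.0, 30.0]
obs_mm = [0.1, 0.6, 3.0, 50.0]
tables = tables_by_threshold(fcst_mm, obs_mm)
sample_eds = extreme_dependency_score(a=1, c=9, n=100)
```

In the sample call the hit rate equals the base rate (1 hit out of 10 observed events, base rate 0.1), so the score evaluates to zero.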

Appendix 2

New methods of verification for precipitations of km-scale models

Among the different methods presently in development or put recently into operation, two should be seriously considered by the Responsible Member, as they are recognised by specialists as particularly interesting.

Fuzzy verification Methods

There are several “fuzzy” approaches to the verification of high-resolution models, some detailed in published papers, some outlined in conference and workshop proceedings. Ebert (2007) has recently reviewed these approaches, emphasising a framework in which they can be compared and assessed. “Fuzzy” verification can be applied in many different forms, some of them very simple. However, there is as yet no consensus on the most useful or appropriate form. A consensus may emerge from research and development of the methods and from activities such as the Intercomparison of methods applied to WRF forecasts (see http://www.ral.ucar.edu/projects/icp/index.html).

The intensity-scale method

The intensity-scale method of Casati et al. (2004), used by Mittermaier (2006) at the Met Office, is also a very powerful method, and temporal aggregation of the results is possible.
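To give a flavour of a very simple neighbourhood approach, the sketch below computes a fractions-based skill score: both fields are converted to the fraction of grid points exceeding a threshold within a square neighbourhood, and the two fraction fields are then compared. The choice of this particular measure is illustrative only and is not a recommendation of this proposal; the function names, the treatment of the grid edges (windows are simply truncated) and the sample fields are all assumptions.

```python
def fractions(field, threshold, half_width):
    """Fraction of points >= threshold in a square window around each point.

    ASSUMPTION: windows are truncated at the edges of the grid.
    """
    rows, cols = len(field), len(field[0])
    result = []
    for j in range(rows):
        row = []
        for i in range(cols):
            hits = total = 0
            for jj in range(max(0, j - half_width), min(rows, j + half_width + 1)):
                for ii in range(max(0, i - half_width), min(cols, i + half_width + 1)):
                    total += 1
                    if field[jj][ii] >= threshold:
                        hits += 1
            row.append(hits / total)
        result.append(row)
    return result


def fractions_skill(forecast, observed, threshold, half_width):
    """1 - sum((Pf - Po)^2) / sum(Pf^2 + Po^2); None if no event anywhere."""
    pf = fractions(forecast, threshold, half_width)
    po = fractions(observed, threshold, half_width)
    pairs = [(f, o) for rf, ro in zip(pf, po) for f, o in zip(rf, ro)]
    numerator = sum((f - o) ** 2 for f, o in pairs)
    denominator = sum(f ** 2 + o ** 2 for f, o in pairs)
    if denominator == 0:
        return None
    return 1.0 - numerator / denominator


# Invented 3 x 3 fields (mm): the forecast rain cell is displaced by one point.
fcst_field = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
obs_field = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
skill_gridscale = fractions_skill(fcst_field, obs_field, 1.0, 0)
skill_neighbourhood = fractions_skill(fcst_field, obs_field, 1.0, 1)
```

At grid scale the displaced cell scores no skill at all, whereas with a one-point neighbourhood the forecast is given partial credit, which illustrates why traditional point-wise scores penalise km-scale models.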

2.3.1 New scores to develop

[1] "Interoperability between the European Models" is also one of the 3 specific projects decided at the last SRNWP general meeting to accompany the new SRNWP Programme scheduled to start January the 1st, 2008.