Title Life beyond MSE and R2 — improving validation of predictive models with observations
Author(s) Andreas Papritz, Madlene Nussbaum
Conference EGU General Assembly 2017
Media type Article
Language en
Digital document PDF
Published In: GRA - Volume 19 (2017)
Record number 250144113
Publication (No.) Full-text document available: EGU/EGU2017-7899.pdf
 
Abstract
Machine learning and statistical predictive methods are evaluated by the closeness of predictions to the observations of a test dataset. Common criteria for rating predictive methods are bias and mean square error (MSE), which characterize systematic and random prediction errors, respectively. Many studies also report R2 values, but their meaning is not always clear (the squared correlation between observations and predictions, or an MSE skill score; Wilks, 2011). The same criteria are also used for choosing tuning parameters of predictive procedures by cross-validation and bagging (e.g. Hastie et al., 2009). For evident reasons, the atmospheric sciences have developed a rich toolbox for forecast verification. Specific criteria have been proposed for evaluating deterministic and probabilistic predictions of binary, multinomial, ordinal and continuous responses (see the reviews by Wilks, 2011; Jolliffe and Stephenson, 2012; and Gneiting et al., 2007). It appears that these techniques are not well known in the part of the geosciences community interested in machine learning. In our presentation we review techniques that offer more insight into the agreement between data and predictions than bias, MSE and R2 alone. We mention here only two examples: (i) Graphing observations vs. predictions is usually more appropriate than the reverse (Piñeiro et al., 2008). (ii) The decomposition of the Brier score (= MSE for probabilistic predictions of binary yes/no data) into reliability and resolution reveals (conditional) bias and the capability of the predictions to discriminate yes/no observations. We illustrate the approaches by applications from digital soil mapping studies. (Code sketches illustrating the two readings of R2 and examples (i) and (ii) follow the reference list.)

References

Gneiting, T., Balabdaoui, F., and Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B, 69, 243–268.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York, second edition.
Jolliffe, I. T. and Stephenson, D. B., editors (2012). Forecast Verification: A Practitioner's Guide in Atmospheric Science. Wiley-Blackwell, second edition.
Piñeiro, G., Perelman, S., Guerschman, J., and Paruelo, J. (2008). How to evaluate models: Observed vs. predicted or predicted vs. observed? Ecological Modelling, 216, 316–322.
Wilks, D. S. (2011). Statistical Methods in the Atmospheric Sciences. Academic Press, third edition.
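
The ambiguity of R2 noted in the abstract can be made concrete. Below is a minimal sketch in Python with NumPy of the two readings: the squared correlation between observations and predictions, and the MSE skill score relative to always predicting the mean of the observations. The function names are illustrative, not taken from the presentation.

import numpy as np

def r2_correlation(y_obs, y_pred):
    # R2 read as the squared Pearson correlation between
    # observations and predictions
    return np.corrcoef(y_obs, y_pred)[0, 1] ** 2

def mse_skill_score(y_obs, y_pred):
    # R2 read as an MSE skill score: improvement of the MSE over
    # always predicting the mean of the observations (climatology)
    mse = np.mean((y_obs - y_pred) ** 2)
    mse_ref = np.mean((y_obs - np.mean(y_obs)) ** 2)
    return 1.0 - mse / mse_ref

The two readings agree only for predictions that are calibrated against the observations: the squared correlation is unchanged by any linear rescaling of the predictions, whereas a biased or rescaled prediction drives the skill score down, possibly below zero.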
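For example (i), Piñeiro et al. (2008) argue for regressing observations on predictions rather than the reverse, because the fitted slope and intercept then describe the conditional bias of the predictions (slope near 1 and intercept near 0 for unbiased predictions). A minimal sketch, again with an illustrative function name:

import numpy as np

def obs_vs_pred_line(y_obs, y_pred):
    # least-squares line for the plot of observations (y-axis)
    # against predictions (x-axis); slope near 1 and intercept
    # near 0 indicate absence of conditional bias
    slope, intercept = np.polyfit(y_pred, y_obs, deg=1)
    return slope, intercept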
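For example (ii), the Brier score of probabilistic predictions of binary data can be split into reliability, resolution and uncertainty components (the Murphy decomposition; Wilks, 2011). The following sketch assumes the forecasts are grouped into equal-width probability bins; the decomposition is exact only when forecasts are constant within each bin, so for continuous forecast probabilities it is approximate.

import numpy as np

def brier_decomposition(p, y, n_bins=10):
    # p: forecast probabilities in [0, 1]; y: binary outcomes (0/1)
    p, y = np.asarray(p, float), np.asarray(y, float)
    n = len(p)
    brier = np.mean((p - y) ** 2)
    o_bar = y.mean()                      # climatological base rate
    uncertainty = o_bar * (1.0 - o_bar)
    # assign each forecast to one of n_bins equal-width bins
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)
    reliability = resolution = 0.0
    for k in range(n_bins):
        mask = idx == k
        n_k = mask.sum()
        if n_k == 0:
            continue
        p_k = p[mask].mean()              # mean forecast in bin k
        o_k = y[mask].mean()              # observed frequency in bin k
        reliability += n_k * (p_k - o_k) ** 2   # conditional bias
        resolution += n_k * (o_k - o_bar) ** 2  # discrimination
    # brier is approximately reliability - resolution + uncertainty
    return brier, reliability / n, resolution / n, uncertainty

Small reliability means the predicted probabilities match the observed frequencies in each bin (little conditional bias); large resolution means the predictions discriminate well between yes and no observations.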