Title |
Mean absolute error and root mean square error: which is the better metric for assessing model performance? |
Author |
Gary Brassington |
Conference |
EGU General Assembly 2017
|
Media type |
Article
|
Language |
en
|
Digital document |
PDF |
Published |
In: GRA - Volume 19 (2017) |
Record number |
250140218
|
Publication (No.) |
EGU/EGU2017-3574.pdf |
|
|
|
Abstract |
The mean absolute error (MAE) and root mean square error (RMSE) are two metrics that are
often used interchangeably as measures of ocean forecast accuracy. Recent literature has
debated which of these should be preferred, though the conclusions have largely been based
on empirical arguments. We note that in general,
$$\mathrm{RMSE}^2 = \mathrm{MAE}^2 + \mathrm{VAR}_k[|\epsilon|]$$
such that RMSE includes the MAE as well as additional information related to the
variance (biased estimator) of the errors ɛ with sample size k. The greater sensitivity of
RMSE to a small number of outliers is directly attributable to the variance of absolute error.
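The decomposition above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names and the error sample are hypothetical and chosen only for illustration. The variance uses the biased (divide-by-k) estimator, as the abstract specifies.

```python
import numpy as np

def mae(errors):
    """Mean absolute error of a 1-D array of errors."""
    return np.mean(np.abs(errors))

def rmse(errors):
    """Root mean square error of a 1-D array of errors."""
    return np.sqrt(np.mean(np.square(errors)))

def var_abs(errors):
    """Biased (divide-by-k) sample variance of the absolute errors."""
    return np.var(np.abs(errors), ddof=0)

# Hypothetical error sample: mostly small errors plus a single outlier.
errors = np.array([0.1, -0.2, 0.15, -0.05, 3.0])

lhs = rmse(errors) ** 2
rhs = mae(errors) ** 2 + var_abs(errors)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```

With the unbiased estimator (`ddof=1`) the identity would no longer hold exactly, which is why the biased form is used here.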
Further statistical properties for both metrics are derived and compared based on the
assumption that the errors are Gaussian. For an unbiased (or bias-corrected) model, both MAE
and RMSE are shown to estimate the total error standard deviation to within a constant
coefficient such that
$$\mathrm{MAE} \approx \sqrt{2/\pi}\,\mathrm{RMSE}.$$
Both metrics have comparable behaviour in response to model bias and asymptote to the
model bias as the bias increases. MAE is shown to be an unbiased estimator while RMSE is a
biased estimator. MAE also has a lower sample variance than RMSE, indicating that
MAE is the more robust choice. For real-time applications where there is a likelihood of
“bad” observations we recommend
$$\mathrm{TESD} = \sqrt{\frac{\pi}{2}}\,\mathrm{MAE} \pm \frac{1}{\sqrt{k}}\sqrt{\frac{\pi}{2}-1}\,\sqrt{\frac{\pi}{2}}\,\mathrm{MAE}$$
as an unbiased estimator of the total error standard deviation with error estimates (one
standard deviation) based on the sample variance and defined as a scaling of the MAE itself.
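As an illustration of the recommended estimator, here is a minimal sketch assuming NumPy and unbiased Gaussian errors, as the abstract does. The function names, the random seed, and the synthetic standard deviation of 2.0 are hypothetical choices for this example only.

```python
import numpy as np

def tesd_from_mae(errors):
    """Return (estimate, one_sigma) for the total error standard deviation.

    Uses the abstract's formula, which assumes unbiased Gaussian errors:
    TESD = sqrt(pi/2) * MAE +/- sqrt(pi/2 - 1) * sqrt(pi/2) * MAE / sqrt(k)
    """
    errors = np.asarray(errors, dtype=float)
    k = errors.size
    mae = np.mean(np.abs(errors))
    estimate = np.sqrt(np.pi / 2) * mae
    one_sigma = np.sqrt(np.pi / 2 - 1) * estimate / np.sqrt(k)
    return estimate, one_sigma

def relative_scaling(k):
    """Error estimate expressed as a fraction of the MAE."""
    return np.sqrt(np.pi / 2 - 1) * np.sqrt(np.pi / 2) / np.sqrt(k)

# Synthetic check with a hypothetical true standard deviation of 2.0.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=2.0, size=9000)
estimate, one_sigma = tesd_from_mae(sample)
print(estimate, one_sigma)
print(relative_scaling(90), relative_scaling(9000))  # roughly 0.10 and 0.01
```

The last line evaluates the coefficient multiplying the MAE in the error term for two sample sizes.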
A sample size (k) on the order of 90 and 9000 provides an error scaling of 10% and 1%
respectively. Nonetheless if the model performance is being analysed using a large sample of
delayed-mode quality controlled observations then RMSE might be preferred where the
second moment sensitivity to large model errors is important. Alternatively for model
intercomparisons the information might be compactly represented by a graph with axes of
$\mathrm{MAE}$ and $\sqrt{\mathrm{VAR}_k[|\epsilon|]}$, where radials from the origin
represent $\mathrm{RMSE}$. |
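The coordinates for such a graph can be sketched as follows, assuming NumPy. The function name, model names, and error samples are hypothetical. Because RMSE² equals MAE² plus the biased variance of the absolute errors, each model's distance from the origin in this plane equals its RMSE.

```python
import numpy as np

def diagram_point(errors):
    """Return (x, y, radius) = (MAE, sqrt(VAR_k[|e|]), RMSE) for one model."""
    abs_err = np.abs(np.asarray(errors, dtype=float))
    x = abs_err.mean()
    y = abs_err.std(ddof=0)   # biased estimator, matching VAR_k
    radius = np.hypot(x, y)   # distance from the origin
    return x, y, radius

# Hypothetical error samples for two models with the same MAE
# but a different spread of absolute errors.
models = {
    "model_a": [0.5, -0.4, 0.6, -0.5],
    "model_b": [0.1, -0.1, 0.1, -1.7],
}
for name, errs in models.items():
    x, y, radius = diagram_point(errs)
    rmse = np.sqrt(np.mean(np.square(errs)))
    print(name, x, y, radius, rmse)  # radius and rmse agree up to rounding
```

In this example the two models share the same horizontal coordinate, and the one containing the single large error sits further from the origin.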
|
|
|
|