|
Title |
Sample size matters: investigating the effect of sample size on a logistic regression susceptibility model for debris flows |
Author(s) |
T. Heckmann, K. Gegg, A. Gegg, M. Becht |
Media type |
Article
|
Language |
English
|
ISSN |
1561-8633
|
Digital document |
URL |
Published |
In: Natural Hazards and Earth System Sciences ; 14, no. 2 (2014-02-17), pp. 259-278 |
Record number |
250118274
|
Publication (no.) |
copernicus.org/nhess-14-259-2014.pdf |
|
Abstract |
Predictive spatial modelling is an important task in natural hazard
assessment and regionalisation of geomorphic processes or landforms. Logistic
regression is a multivariate statistical approach frequently used in
predictive modelling; it can be conducted stepwise in order to select from a
number of candidate independent variables those that lead to the best model.
In our case study on a debris flow susceptibility model, we investigate the
sensitivity of model selection and quality to different sample sizes in light
of the following problem: on the one hand, a sample has to be large enough to
cover the variability of geofactors within the study area, and to yield
stable and reproducible results; on the other hand, the sample must not be
too large, because a large sample is likely to violate the assumption of
independent observations due to spatial autocorrelation. Using stepwise model
selection with 1000 random samples for a number of sample sizes between
n = 50 and n = 5000, we investigate the inclusion and exclusion of geofactors
and the diversity of the resulting models as a function of sample size; the
multiplicity of different models is assessed using numerical indices borrowed
from information theory and biodiversity research. Model diversity decreases
with increasing sample size and reaches either a local minimum or a plateau;
even larger sample sizes do not further reduce it, and they approach the upper
limit of sample size given, in this study, by the autocorrelation range of
the spatial data sets. In this way, an optimised sample size can be derived
from an exploratory analysis. Model uncertainty due to sampling and model
selection, and the models' predictive ability, are explored statistically and
spatially through the example of 100 models estimated in one study area and
validated in a neighbouring area: depending on the study area and on sample
size, the predicted probabilities for debris flow release differed, on
average, by 7 to 23 percentage points. In view of these results, we argue
that researchers applying model selection should explore the behaviour of the
model selection for different sample sizes, and that consensus models created
from a number of random samples should be given preference over models
relying on a single sample. |
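
The abstract mentions measuring the multiplicity of selected models with indices borrowed from information theory, without naming them. As a minimal sketch, assuming the Shannon index is one such measure, the snippet below computes model diversity and per-variable inclusion frequency from a list of selected models. The function names, the geofactor names, and the toy selection results are hypothetical and not taken from the study.

```python
import math
from collections import Counter


def shannon_diversity(models):
    """Shannon index H = -sum(p_i * ln p_i) over the distinct models.

    Each model is an iterable of variable names; two models count as
    identical when they contain the same set of variables.
    """
    counts = Counter(frozenset(m) for m in models)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def inclusion_frequency(models):
    """Fraction of models in which each variable was selected."""
    counts = Counter(v for m in models for v in set(m))
    return {v: c / len(models) for v, c in counts.items()}


# Hypothetical outcomes of repeated stepwise selection (illustrative only):
# four random samples at a small sample size, four at a large one.
small_n = [
    {"slope"},
    {"slope", "curvature"},
    {"curvature", "geology"},
    {"slope", "geology"},
]
large_n = [{"slope", "curvature"}] * 3 + [{"slope", "curvature", "geology"}]

h_small = shannon_diversity(small_n)  # four distinct models, each seen once
h_large = shannon_diversity(large_n)  # one dominant model plus one variant
freq_large = inclusion_frequency(large_n)

print(f"H (small n) = {h_small:.3f}")
print(f"H (large n) = {h_large:.3f}")
print("inclusion frequency (large n):", freq_large)
```

In this toy input, the lower index for the larger sample size mirrors the decrease in model diversity reported in the abstract; with real data, the lists would hold the variable sets returned by stepwise selection on each random sample.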