Online Katalog der GeoSphere Austria (Standort Neulinggasse)

Home

Login

Katalogkarte GBA

Katalogkarte ISBD

Suche präzisieren

Drucken

Download RIS

Zum vorherigen Treffer

Datensatz 1 von 1

Zum nächsten Treffer


Titel	Gaussian Process Regression as a machine learning tool for predicting organic carbon from soil spectra - a machine learning comparison study
VerfasserIn	Andreas Schmidt, Angela Lausch, Hans-Jörg Vogel
Konferenz	EGU General Assembly 2016
Medientyp	Artikel
Sprache	en
Digitales Dokument	PDF
Erschienen	In: GRA - Volume 18 (2016)
Datensatznummer	250126537
Publikation (Nr.)	EGU/EGU2016-6273.pdf



Zusammenfassung
Diﬀuse reﬂectance spectroscopy as a soil analytical tool is spreading more and more. There is a wide range of possible applications ranging from the point scale (e.g. simple soil samples, drill cores, vertical proﬁle scans) through the ﬁeld scale to the regional and even global scale (UAV, airborne and space borne instruments, soil reﬂectance databases). The basic idea is that the soil’s reﬂectance spectrum holds information about its properties (like organic matter content or mineral composition). The relation between soil properties and the observable spectrum is usually not exactly know and is typically derived from statistical methods. Nowadays these methods are classiﬁed in the term machine learning, which comprises a vast pool of algorithms and methods for learning the relationship between pairs if input - output data (training data set). Within this pool of methods a Gaussian Process Regression (GPR) is newly emerging method (originating from Bayesian statistics) which is increasingly applied to applications in diﬀerent ﬁelds. For example, it was successfully used to predict vegetation parameters from hyperspectral remote sensing data. In this study we apply GPR to predict soil organic carbon from soil spectroscopy data (400 - 2500 nm). We compare it to more traditional and widely used methods such as Partitial Least Squares Regression (PLSR), Random Forest (RF) and Gradient Boosted Regression Trees (GBRT). All these methods have the common ability to calculate a measure for the variable importance (wavelengths importance). The main advantage of GPR is its ability to also predict the variance of the target parameter. This makes it easy to see whether a prediction is reliable or not. The ability to choose from various covariance functions makes GPR a ﬂexible method. This allows for including diﬀerent assumptions or a priori knowledge about the data. For this study we use samples from three diﬀerent locations to test the prediction accuracies. One location is a ﬁrst order catchment in agricultural use in the Harz mountains, central Germany (91 samples); another site as an agricultural site in the northeastern lowlands of Germany (Demmin site, 69 samples); and the third location is a Brazilian bamboo plantation site in the very east of Brazil (78 samples). For having robust validation metrics (RMSE, R2) we repeated the test/training split 100 times and show its resulting distributions. We also show the residual plots to check for non-linear behavior. The results show that GPR is performing best in 2 of the three study sites (Schäfertal: R2 = 0.85, Demmin: R2 = 0.78), only for the more diverse Brazilian samples PLSR scored higher (R2 = 0.74). With the additional remark: Two diﬀerent covariance functions were giving the best scores the Schäfertal and Demmin sites. This demonstrates the advantage of being ﬂexible with the choosing of the covariance function.