Title Gaussian Process Regression as a machine learning tool for predicting organic carbon from soil spectra - a machine learning comparison study
Author(s) Andreas Schmidt, Angela Lausch, Hans-Jörg Vogel
Conference EGU General Assembly 2016
Media type Article
Language en
Digital document PDF
Published in: GRA - Volume 18 (2016)
Record number 250126537
Publication (No.) Full-text document available: EGU/EGU2016-6273.pdf
 
Abstract
Diffuse reflectance spectroscopy is increasingly used as a soil analytical tool. Its possible applications range from the point scale (e.g. simple soil samples, drill cores, vertical profile scans) through the field scale to the regional and even global scale (UAV, airborne and spaceborne instruments, soil reflectance databases). The basic idea is that the soil’s reflectance spectrum holds information about its properties (such as organic matter content or mineral composition). The relation between soil properties and the observable spectrum is usually not exactly known and is typically derived with statistical methods. Nowadays these methods are grouped under the term machine learning, which comprises a vast pool of algorithms and methods for learning the relationship between pairs of input and output data (the training data set). Within this pool, Gaussian Process Regression (GPR) is a newly emerging method (originating from Bayesian statistics) that is increasingly applied in different fields. For example, it has been used successfully to predict vegetation parameters from hyperspectral remote sensing data. In this study we apply GPR to predict soil organic carbon from soil spectroscopy data (400 - 2500 nm). We compare it to more traditional and widely used methods such as Partial Least Squares Regression (PLSR), Random Forest (RF) and Gradient Boosted Regression Trees (GBRT). All these methods share the ability to calculate a measure of variable importance (wavelength importance). The main advantage of GPR is its ability to also predict the variance of the target parameter, which makes it easy to see whether a prediction is reliable or not. The ability to choose from various covariance functions makes GPR a flexible method and allows different assumptions or a priori knowledge about the data to be included. For this study we use samples from three different locations to test the prediction accuracies.
One location is a first-order catchment in agricultural use in the Harz mountains, central Germany (91 samples); another is an agricultural site in the northeastern lowlands of Germany (Demmin site, 69 samples); and the third is a bamboo plantation in the far east of Brazil (78 samples). To obtain robust validation metrics (RMSE, R2), we repeated the test/training split 100 times and show the resulting distributions. We also show the residual plots to check for non-linear behavior. The results show that GPR performs best at two of the three study sites (Schäfertal: R2 = 0.85, Demmin: R2 = 0.78); only for the more diverse Brazilian samples did PLSR score higher (R2 = 0.74). In addition, two different covariance functions gave the best scores for the Schäfertal and Demmin sites. This demonstrates the advantage of being flexible in the choice of covariance function.
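
The workflow described in the abstract (GPR with a selectable covariance function, a predicted variance for each target value, and validation metrics collected over 100 random test/training splits) can be illustrated with a minimal sketch. It is not the authors' implementation. The data are synthetic stand-ins for spectra, the two covariance functions (squared exponential and Matern 3/2) are chosen only as examples, the hyperparameters (`length_scale`, `noise`) are fixed rather than optimized, and the 30 % test fraction and all function names are assumptions.

```python
import numpy as np


def _sq_dists(a, b):
    # Pairwise squared Euclidean distances, clipped at zero for stability.
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d2, 0.0)


def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential covariance (assumed example, fixed length scale).
    return np.exp(-0.5 * _sq_dists(a, b) / length_scale**2)


def matern32_kernel(a, b, length_scale=1.0):
    # Matern 3/2 covariance (assumed example, fixed length scale).
    s = np.sqrt(3.0) * np.sqrt(_sq_dists(a, b)) / length_scale
    return (1.0 + s) * np.exp(-s)


def gpr_predict(x_train, y_train, x_test, kernel, noise=1e-2):
    # Standard GP regression equations with a constant mean (training average).
    y_mean = y_train.mean()
    k_tt = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_st = kernel(x_test, x_train)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = y_mean + k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    # Predictive variance of the target, including the noise term.
    var = np.diag(kernel(x_test, x_test)) - np.sum(v**2, axis=0) + noise
    return mean, np.maximum(var, 0.0)


def repeated_split_scores(x, y, kernel, n_repeats=100, test_fraction=0.3, seed=0):
    # Repeat the random test/training split and collect RMSE and R2 each time.
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * len(y)))
    rmse, r2 = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        test, train = idx[:n_test], idx[n_test:]
        mean, _ = gpr_predict(x[train], y[train], x[test], kernel)
        resid = y[test] - mean
        rmse.append(np.sqrt(np.mean(resid**2)))
        r2.append(1.0 - np.sum(resid**2) / np.sum((y[test] - y[test].mean()) ** 2))
    return np.array(rmse), np.array(r2)


# Synthetic stand-in data: 90 "samples" with 5 "bands" (not real soil spectra).
rng = np.random.default_rng(42)
x = rng.normal(size=(90, 5))
y = np.sin(x[:, 0]) + 0.5 * x[:, 1] + 0.1 * rng.normal(size=90)

for name, kernel in [("RBF", rbf_kernel), ("Matern 3/2", matern32_kernel)]:
    rmse, r2 = repeated_split_scores(x, y, kernel)
    print(f"{name}: median RMSE = {np.median(rmse):.3f}, median R2 = {np.median(r2):.3f}")

# The predicted variance grows for inputs far away from the training data.
_, var_near = gpr_predict(x, y, x[:1], rbf_kernel)
_, var_far = gpr_predict(x, y, np.full((1, 5), 10.0), rbf_kernel)
print("variance near training data:", var_near[0], "far away:", var_far[0])
```

Comparing the score distributions for the two kernels mirrors the comparison of covariance functions mentioned above, and the last two lines show how the predicted variance flags an input that lies outside the range of the training data. In practice the hyperparameters would be fitted, for example by maximizing the marginal likelihood, rather than fixed as here.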