|
Title |
Regression models tolerant to massively missing data: a case study in solar-radiation nowcasting |
Authors |
I. Žliobaitė, J. Hollmén, H. Junninen |
Media type |
Article
|
Language |
English
|
ISSN |
1867-1381
|
Digital document |
URL |
Published |
In: Atmospheric Measurement Techniques, vol. 7, no. 12 (2014-12-11), pp. 4387-4399 |
Record number |
250115991
|
Publication (no.) |
copernicus.org/amt-7-4387-2014.pdf |
|
Abstract |
Statistical models for environmental monitoring rely strongly on automatic
data acquisition systems that use various physical sensors. Sensor readings
are often missing for extended periods of time, while model outputs need
to be continuously available in real time. With a case study in
solar-radiation nowcasting, we investigate how to deal with massively missing
data (around 50% of the time some data are unavailable) in such situations.
Our goal is to analyze the characteristics of missing data and to recommend
a strategy for deploying regression models that remain robust when data are
massively missing. We seek a single model that performs well at all times,
with and without data gaps. Because outputs must be provided instantaneously
and with minimum energy consumption for computing in the data-streaming
setting, we dismiss computationally demanding data imputation methods and
resort to mean replacement, accompanied by a robust regression model. We use
an established strategy for assessing different regression models and for
determining how many missing sensor readings can be tolerated before model
outputs become obsolete. We experimentally analyze the accuracy and
robustness to missing data of seven linear regression models. We recommend
using regularized PCA regression, together with our established guideline for
training regression models that are themselves robust to missing data. |
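
The combination named in the abstract, mean replacement followed by a regularized regression on principal components, can be illustrated with a minimal sketch. This is not the authors' implementation: the ridge penalty `alpha`, the number of components, the synthetic data, and the helper names `fit_pca_ridge` and `predict` are all assumptions made here for illustration.

```python
import numpy as np


def fit_pca_ridge(X, y, n_components=2, alpha=1.0):
    """Fit ridge regression on the leading principal components of X.

    Assumption: X is a complete (gap-free) training matrix of shape
    (n_samples, n_features); alpha is a hypothetical ridge penalty.
    """
    mean_x = X.mean(axis=0)
    mean_y = y.mean()
    Xc = X - mean_x
    # Principal directions are the right singular vectors of the centered data.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    W = vt[:n_components].T          # shape (n_features, n_components)
    Z = Xc @ W                       # component scores
    # Closed-form ridge solution in component space.
    A = Z.T @ Z + alpha * np.eye(n_components)
    beta = np.linalg.solve(A, Z.T @ (y - mean_y))
    return {"mean_x": mean_x, "mean_y": mean_y, "W": W, "beta": beta}


def predict(model, x):
    """Predict for one reading; NaN entries are replaced by training means."""
    x = np.asarray(x, dtype=float)
    filled = np.where(np.isnan(x), model["mean_x"], x)
    centered = filled - model["mean_x"]
    return float(model["mean_y"] + centered @ model["W"] @ model["beta"])


# Synthetic stand-in for sensor data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + 3.0
model = fit_pca_ridge(X, y, n_components=3, alpha=1.0)

# Two of four sensors are missing; the model still returns an output.
partial = predict(model, [0.1, np.nan, 0.3, np.nan])
```

In this sketch a missing reading contributes nothing after centering, so a prediction with every sensor missing falls back to the training mean of the target. Mean replacement needs no computation at prediction time beyond a lookup, which is the property the abstract cites for the streaming setting.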