Title Quest for Value in Big Earth Data
Author(s) Kwo-Sen Kuo, Amidu O. Oloso, Mike L Rilee, Khoa Doan, Thomas L. Clune, Hongfeng Yu
Conference EGU General Assembly 2017
Media type Article
Language en
Digital document PDF
Published in: GRA - Volume 19 (2017)
Record number 250144571
Publication (No.) Full-text document available: EGU/EGU2017-8413.pdf
 
Abstract
Among all the V’s of Big Data challenges, such as Volume, Variety, Velocity, Veracity, etc., we believe Value is the ultimate determinant, because a system delivering better value has a competitive edge over others. Although it is not straightforward to assess the value of scientific endeavors, we believe the ratio of scientific productivity increase to investment is a reasonable measure. Our research in Big Data approaches to data-intensive analysis for Earth Science has yielded some insights, as well as evidence, as to how optimal value might be attained. The first insight is that we should avoid, as much as possible, moving data through connections with relatively low bandwidth. That is, we recognize that moving data is expensive, albeit inevitable. Data must at least be moved from the storage device into computer main memory and then to CPU registers for computation. When data must be moved, it is better to move them via relatively high-bandwidth connections and avoid low-bandwidth ones. For this reason, a technology that can best exploit data locality will have an advantage over others. Data locality is easy to achieve and exploit with only one dataset. With multiple datasets, data co-location becomes important in addition to data locality. However, a given organization of the datasets can co-locate them only for certain types of analyses; no single organization can co-locate them for all analyses. Therefore, our second insight is that we need to co-locate the datasets for the most commonly used analyses. In Earth Science, we believe the most common analysis requirement is “spatiotemporal coincidence”. For example, when we analyze precipitation systems, we often would like to know the environmental conditions “where and when” (i.e., at the same location and time) there is precipitation. This “where and when” indicates the “spatiotemporal coincidence” requirement. Thus, an associated insight is that datasets need to be partitioned per the physical dimensions, i.e., space and time, rather than their array index dimensions to achieve co-location for spatiotemporal coincidence. This leads further to the insight that, in terms of optimizing Value, achieving good scalability in Variety is more crucial than good scalability in Volume. Therefore, we will discuss our innovative approach to improving productivity by homogenizing the daunting varieties in Earth Science data to enable data co-location systematically. In addition, a Big Data system incorporating the capabilities described above has the potential to drastically shorten the data preparation period of machine learning, better facilitate automated machine learning operations, and further boost scientific productivity.
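The idea of partitioning by physical dimensions rather than array indices can be illustrated with a minimal Python sketch. It is not the authors' system: the cell size, time window, function names, and sample values below are all hypothetical, chosen only to show how two datasets with unrelated internal layouts end up in the same partition when they are keyed by latitude, longitude, and time.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Illustrative assumptions, not values from the abstract.
CELL_DEG = 10.0   # spatial cell size in degrees
WINDOW_S = 3600   # temporal window in seconds


def partition_key(lat, lon, t):
    """Map a physical (lat, lon, time) coordinate to a partition key."""
    row = int((lat + 90.0) // CELL_DEG)
    col = int((lon + 180.0) // CELL_DEG)
    slot = int(t.timestamp() // WINDOW_S)
    return (row, col, slot)


def partition(records):
    """Group (lat, lon, time, value) records by their physical partition key."""
    parts = defaultdict(list)
    for lat, lon, t, value in records:
        parts[partition_key(lat, lon, t)].append(value)
    return parts


def coincident(parts_a, parts_b):
    """Return the partitions in which both datasets have observations."""
    shared = parts_a.keys() & parts_b.keys()
    return {k: (parts_a[k], parts_b[k]) for k in shared}


# Two made-up datasets whose storage order has nothing in common.
noon = datetime(2017, 4, 23, 12, 10, tzinfo=timezone.utc)
later = datetime(2017, 4, 23, 12, 40, tzinfo=timezone.utc)
rain = [(35.2, -97.4, noon, 4.2)]
humidity = [(33.9, -95.1, later, 0.81), (-10.0, 20.0, later, 0.35)]

matches = coincident(partition(rain), partition(humidity))
```

Because both datasets are keyed on the same physical grid, the "where and when" lookup reduces to a key intersection, and in a distributed setting each shared key could be served by a single node without moving data between nodes.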