Title Ibmdbpy-spatial : An Open-source implementation of in-database geospatial analytics in Python
Author(s) Avipsa Roy, Edouard Fouché, Rafael Rodriguez Morales, Gregor Moehler
Conference EGU General Assembly 2017
Media type Article
Language en
Digital document PDF
Published in: GRA - Volume 19 (2017)
Record number 250146195
Publication (No.) Full-text document available: EGU/EGU2017-10205.pdf
 
Abstract
\begin{document} As the volume of spatial data acquired from geodetic sources has grown over the years and data infrastructure has become more powerful, the need for in-database analytic technology within the geosciences has grown rapidly. In-database analytics on spatial data stored in a traditional enterprise data warehouse enables much faster retrieval and analysis for making better predictions about risks and opportunities, identifying trends, and spotting anomalies. Although a number of open-source spatial analysis libraries such as \textit{geopandas} and \textit{shapely} are available today, most of them are restricted to the manipulation and analysis of geometric objects and depend on GEOS and similar libraries. We present an open-source software package, written in Python, that fills the gap between spatial analysis and in-database analytics. \textit{Ibmdbpy-spatial} provides a geospatial extension to the \textit{ibmdbpy} package, implemented in 2015. It provides an interface for spatial data manipulation and access to in-database algorithms in \textit{IBM dashDB}, a data warehouse platform with a spatial extender that runs as a service on IBM's cloud platform, \textit{Bluemix}. Working in-database reduces network load, because the complete dataset need not be replicated to the user's local system; only a subset of it is fetched into memory at any one time. \textit{Ibmdbpy-spatial} accelerates Python analytics by pushing operations written in Python into the underlying database, where the \textit{dashDB} spatial extender executes them, and thereby benefits from in-database performance features such as columnar storage and parallel processing. The package is currently supported on Python versions from 2.7 up to 3.4.
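The push-down idea described above can be illustrated with a minimal sketch. It is not the \textit{ibmdbpy} API: the class \texttt{LazyTable}, the table \texttt{incidents} and its synthetic rows are hypothetical, and an in-memory SQLite database stands in for the remote warehouse. The point is only that the query runs in the database and just the small result set reaches Python.

```python
import sqlite3

# Stand-in for a remote warehouse: an in-memory SQLite database.
# (Assumption for illustration; the package itself targets dashDB.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (id INTEGER, borough TEXT)")
# Synthetic rows, invented purely for this sketch.
rows = [(i, "Bronx" if i % 3 == 0 else "Queens") for i in range(9000)]
conn.executemany("INSERT INTO incidents VALUES (?, ?)", rows)


class LazyTable:
    """Hypothetical helper: issues SQL instead of loading rows locally."""

    def __init__(self, connection, name):
        self.connection = connection
        self.name = name

    def head(self, n=5):
        # Only n rows cross the connection, not the whole table.
        sql = f"SELECT * FROM {self.name} ORDER BY id LIMIT ?"
        return self.connection.execute(sql, (n,)).fetchall()

    def count_by(self, column):
        # The aggregation runs inside the database; only the
        # per-group totals are returned to Python.
        sql = (f"SELECT {column}, COUNT(*) FROM {self.name} "
               f"GROUP BY {column} ORDER BY {column}")
        return self.connection.execute(sql).fetchall()


incidents = LazyTable(conn, "incidents")
preview = incidents.head(3)             # three rows fetched
totals = incidents.count_by("borough")  # two rows fetched
```

Here 9000 rows stay in the database while Python receives three preview rows and two aggregate rows, which is the kind of traffic reduction the abstract attributes to working in-database.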
The basic architecture of the package consists of three main components: (1) a connection to dashDB, represented by an \textit{IdaDataBase} instance, which uses a middleware API, either \textit{pypyodbc} or \textit{jaydebeapi}, to establish the database connection via ODBC or JDBC respectively; (2) the \textit{IdaGeoDataFrame}, which represents spatial data stored in the database as a dataframe in Python and has a geometry attribute that identifies a planar geometry column in \textit{dashDB}; and (3) Python wrappers for the spatial functions that \textit{dashDB} currently supports, such as \textit{within}, \textit{distance}, \textit{area} and \textit{buffer}, which make querying from Python much simpler for users. The spatial functions translate familiar \textit{geopandas}-like syntax into SQL queries and use the database connection to perform spatial operations in-database; they can operate on a single geometry as well as on two geometries from different \textit{IdaGeoDataFrame}s. The in-database queries strictly follow the OpenGIS Implementation Specification for Geographic information - Simple feature access for SQL. The results can be accessed dynamically via interactive Jupyter notebooks from any system that supports Python, without additional dependencies, and can be combined with other open-source libraries such as \textit{matplotlib} and \textit{folium} within Jupyter notebooks for visualization. To validate our implementation, we built a use case that analyses crime hotspots in New York City and visualized the results as a choropleth map for each borough. \end{document}
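The translation step performed by the spatial wrappers might look roughly like the sketch below. It is an illustration under stated assumptions, not the package's implementation: the class \texttt{GeoFrameSketch} and the table and column names are hypothetical, and the function names \texttt{ST\_Area}, \texttt{ST\_Distance} and \texttt{ST\_Within} follow the naming used in the Simple feature access SQL standard (a concrete database may qualify them with a schema). The sketch only builds SQL text; it does not connect to a database.

```python
class GeoFrameSketch:
    """Hypothetical stand-in for a database-backed geo dataframe."""

    def __init__(self, table, geometry):
        self.table = table        # table name in the database (assumed)
        self.geometry = geometry  # name of the geometry column (assumed)

    def _col(self):
        # Fully qualified geometry column, e.g. BOROUGHS.SHAPE.
        return f"{self.table}.{self.geometry}"

    def area(self):
        # Unary operation: one geometry column of one table.
        return f"SELECT ST_Area({self._col()}) FROM {self.table}"

    def _binary(self, func, other):
        # Binary operation: geometry columns of two tables.
        return (f"SELECT {func}({self._col()}, {other._col()}) "
                f"FROM {self.table}, {other.table}")

    def distance(self, other):
        return self._binary("ST_Distance", other)

    def within(self, other):
        return self._binary("ST_Within", other)


# Hypothetical tables, loosely modelled on the crime use case.
crimes = GeoFrameSketch("CRIMES", "LOCATION")
boroughs = GeoFrameSketch("BOROUGHS", "SHAPE")

area_sql = boroughs.area()
within_sql = crimes.within(boroughs)
```

In this sketch a method call such as \texttt{crimes.within(boroughs)} yields a query string that a database connection could execute, so the geometry comparison itself would never run in the Python process.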