Learning Big Data on Spark for the Optimal IDW-Based Spatiotemporal Interpolation

Document Type

Conference Proceeding

Publication Date

4-9-2017

Publication Title

Proceedings of the Association of American Geographers Annual Meeting

Abstract

To better assess the relationships between environmental exposures and health outcomes, an appropriate spatiotemporal interpolation method is critical. Air pollution data are usually collected at a limited number of monitoring locations and at non-continuous times. Traditional spatiotemporal methods treat space and time separately when interpolating pollution data over the continuous space-time domain, and the results can be unsatisfactory. Li et al. (2004) proposed the extension approach, which incorporates the spatial and temporal dimensions simultaneously by treating time as an additional spatial dimension. However, recent work on spatiotemporal interpolation has relied on simplistic methods to scale the range of the time dimension. In addition, because the data sets are large, experiments are usually very expensive in running time. Building on recent work by Li et al. (2014), we develop an IDW (Inverse Distance Weighting)-based spatiotemporal interpolation that stores data in an efficient k-d tree and combines the extension approach with machine learning methods, such as k-fold cross validation and bootstrap aggregating, to learn optimal parameters. Furthermore, we implement our method on Apache Spark, a fast cluster computing framework at the forefront of big data processing tools. Our experimental results demonstrate the computational power of our method and show that it significantly outperforms the previous work in terms of both speed and accuracy.
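The core idea of the extension approach with IDW can be sketched as follows. Each observation (x, y, t) is mapped to the point (x, y, c·t), where c is a time-scaling factor. The value at an unsampled point is then the inverse-distance-weighted average of its n nearest neighbours in that 3-D space, with weights 1/d^p. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The names `idw_predict`, `kfold_rmse`, and `best_scale`, the parameters `c`, `n`, and `p`, and the toy data are all hypothetical. A brute-force neighbour search stands in for the k-d tree, plain Python stands in for Spark, and bootstrap aggregating is omitted. The sketch selects `c` from a list of candidates by k-fold cross-validation RMSE.

```python
import math
import random


def idw_predict(train, query, c, n=3, p=2.0):
    """IDW estimate at query (x, y, t) from train [(x, y, t, value), ...].

    Time is scaled by the (hypothetical) factor c, so (x, y, c*t) is treated
    as a 3-D point (extension approach). Brute-force search replaces a k-d tree.
    """
    qx, qy, qt = query
    dists = []
    for x, y, t, v in train:
        d = math.sqrt((x - qx) ** 2 + (y - qy) ** 2 + (c * (t - qt)) ** 2)
        dists.append((d, v))
    dists.sort(key=lambda dv: dv[0])
    nearest = dists[:n]
    # A query that coincides with a sampled point returns the observed value.
    for d, v in nearest:
        if d == 0.0:
            return v
    weights = [1.0 / d ** p for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


def kfold_rmse(data, c, k=5, n=3, p=2.0, seed=0):
    """Root-mean-square error of IDW under k-fold cross validation for a given c."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err, count = 0.0, 0
    for fold in folds:
        held = set(fold)
        train = [data[i] for i in idx if i not in held]
        for i in fold:
            x, y, t, v = data[i]
            pred = idw_predict(train, (x, y, t), c, n, p)
            sq_err += (pred - v) ** 2
            count += 1
    return math.sqrt(sq_err / count)


def best_scale(data, candidates, k=5, n=3, p=2.0):
    """Return the candidate time scale with the lowest CV error, plus all scores."""
    scores = {c: kfold_rmse(data, c, k, n, p) for c in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical toy field on a 4 x 4 grid over 5 time steps.
    data = [(x, y, t, x + y + 0.5 * t)
            for x in range(4) for y in range(4) for t in range(5)]
    c, scores = best_scale(data, [0.1, 0.5, 1.0, 2.0, 5.0])
    print("chosen time scale c =", c)
    for cand, rmse in sorted(scores.items()):
        print(f"c={cand}: RMSE={rmse:.4f}")
```

Each candidate `c` (and each fold) is evaluated independently. This independence is what makes the parameter search a natural fit for a cluster framework. For example, the candidate list could be distributed with Spark's `SparkContext.parallelize`, though how the original work partitions the computation is not described here.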

Recommended Citation

Tong, Weitian, Xiaolu Zhou, Lixin Li, Gina Besenyi, Jason Franklin, and Heather Yates. 2017. "Learning Big Data on Spark for the Optimal IDW-Based Spatiotemporal Interpolation." Proceedings of the Association of American Geographers Annual Meeting. Boston, MA: Association of American Geographers. Source: http://app.core-apps.com/aagam2017/abstract/dcb181785f7fd70d2b1bd16d5acf7d5a
https://digitalcommons.georgiasouthern.edu/compsci-facpubs/97

Copyright

This work is archived and distributed under the repository's Standard Copyright and Reuse License. End users may copy, store, and distribute this work without restriction. For all other uses, permission must be obtained from the copyright owners or their authorized agents.

Computer Science: Faculty Publications