Computer Science: Faculty Presentations

Learning Big Data on Spark for the Optimal IDW-Based Spatiotemporal Interpolation

Weitian Tong, Georgia Southern University
Xiaolu Zhou, Georgia Southern University
Lixin Li, Georgia Southern University
Gina Besenyi, Augusta University
Heather Yates, Augusta University

Document Type

Presentation

Presentation Date

4-9-2017

Abstract or Description

To better assess the relationships between environmental exposures and health outcomes, an appropriate spatiotemporal interpolation is critical. Air pollution data are usually collected at a limited number of monitoring locations and at non-continuous times. Traditional spatiotemporal methods treat space and time separately when interpolating pollution data in the continuous space-time domain, and the resulting interpolations may be far from satisfactory. Li et al. (2004) proposed the extension approach, which incorporates the spatial and temporal dimensions simultaneously by treating time as another dimension in space. Unfortunately, recent work on spatiotemporal interpolation has used simplistic methods to scale the range of the time dimension. In addition, because the data sets are large, experiments are usually very expensive in running time. Building on recent work by Li et al. (2014), we develop an IDW (Inverse Distance Weighting)-based spatiotemporal interpolation method, employ the efficient k-d tree structure to store the data, and combine the extension approach with machine learning methods, such as k-fold cross validation and bootstrap aggregating, to learn optimal parameters. Furthermore, we implement our method on Apache Spark, a fast cluster computing framework for big data processing. Our experimental results demonstrate the computational power and improved performance of our method, which significantly outperforms the previous work in terms of both speed and accuracy.
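As an illustration of the ideas named in the abstract, the following is a minimal Python sketch, not the presenters' implementation. It assumes the extension approach can be expressed as a Euclidean distance over (x, y, c·t), where the time scale factor `c` is the parameter to be learned, and it picks `c` by k-fold cross validation over a candidate list. All function names, parameters, and defaults (`idw_predict`, `cv_rmse`, `best_time_scale`, `p`, `k`, `folds`) are hypothetical. For brevity it uses a brute-force neighbor search in place of the k-d tree and omits bootstrap aggregating and Spark.

```python
import math
import random


def idw_predict(samples, query, c=1.0, p=2.0, k=3):
    """Inverse-distance-weighted estimate at query = (x, y, t).

    samples: list of (x, y, t, value) tuples.
    c: assumed time scale factor; time is treated as a third spatial
       axis after multiplying by c (the "extension approach" idea).
    p: IDW power; k: number of nearest neighbors used.
    """
    qx, qy, qt = query
    dists = []
    for x, y, t, v in samples:
        d = math.sqrt((x - qx) ** 2 + (y - qy) ** 2 + (c * (t - qt)) ** 2)
        if d == 0.0:
            return v  # query coincides with a sample
        dists.append((d, v))
    # Brute-force neighbor search; a k-d tree would replace this step.
    dists.sort(key=lambda dv: dv[0])
    nearest = dists[:k]
    weights = [1.0 / d ** p for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


def cv_rmse(samples, c, p=2.0, k=3, folds=5, seed=0):
    """Root-mean-square error of idw_predict under k-fold cross validation."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    sq_err = 0.0
    for f in range(folds):
        test = idx[f::folds]          # every folds-th shuffled index
        held_out = set(test)
        train = [samples[i] for i in idx if i not in held_out]
        for i in test:
            x, y, t, v = samples[i]
            sq_err += (idw_predict(train, (x, y, t), c, p, k) - v) ** 2
    return math.sqrt(sq_err / len(samples))


def best_time_scale(samples, candidates, **kwargs):
    """Return the candidate c with the lowest cross-validated RMSE."""
    return min(candidates, key=lambda c: cv_rmse(samples, c, **kwargs))
```

A call such as `best_time_scale(samples, [0.1, 1.0, 10.0])` would return whichever candidate scale gives the lowest held-out error on `samples`. In a distributed setting, the candidate evaluations are independent of one another and could be run in parallel.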

Sponsorship/Conference/Institution

American Association of Geographers Annual Meeting (AAG)

Location

Boston, MA

Recommended Citation

Tong, Weitian, Xiaolu Zhou, Lixin Li, Gina Besenyi, Heather Yates. 2017. "Learning Big Data on Spark for the Optimal IDW-Based Spatiotemporal Interpolation." Computer Science: Faculty Presentations. Presentation 5.
https://digitalcommons.georgiasouthern.edu/compsci-facpres/5
