Programming Case: A Methodology for Programmatic Web Data Extraction
Document Type
Article
Publication Date
2017
Publication Title
Journal of Technology Research
ISSN
1941-3416
Abstract
Web scraping is a programmatic technique for extracting data from websites, using software to simulate human navigation of webpages and automatically harvest their content. While many websites provide web services that allow users to consume their data programmatically, others offer no such service, and it is incumbent on the user to write or use existing software to acquire the data. The purpose of this paper is to provide a methodology for developing a relatively simple program, using the Microsoft Excel Web Query tool and Visual Basic for Applications (VBA), that programmatically extracts webpage data that are not readily transferable or available in other electronic forms. The case presents an overview of web scraping with an application to extracting historical stock price data from Yahoo's Finance® website. The case is suitable for students who have completed an object-oriented programming course, and it further exposes students to using Excel and VBA, along with knowledge of basic webpage structure, to harvest data from the web. It is hoped that this paper can serve as a teaching and learning tool, as well as a basic template, for academicians, students, and practitioners who need to consume website data when data extraction web services are not readily available. The paper can also add value to students' programming experience in the context of programming for a purpose.
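The abstract describes the approach only at a high level; the following is a minimal, illustrative VBA sketch of the general Excel Web Query technique it refers to, not the paper's actual program. The procedure name, ticker symbol, and URL are assumptions for illustration, and whether the target page still serves plain HTML tables depends on the current state of the Yahoo Finance site.

' Minimal sketch of a VBA-driven Excel Web Query (illustrative only).
' The ticker and URL below are assumptions, not necessarily those used in the paper.
Sub PullHistoricalPrices()
    Dim ws As Worksheet
    Dim qt As QueryTable
    Dim sTicker As String
    Dim sUrl As String

    sTicker = "IBM"                               ' example ticker symbol (assumption)
    sUrl = "https://finance.yahoo.com/quote/" & _
           sTicker & "/history"                   ' illustrative URL (assumption)

    Set ws = ThisWorkbook.Worksheets.Add          ' new sheet to receive the data
    Set qt = ws.QueryTables.Add( _
        Connection:="URL;" & sUrl, _
        Destination:=ws.Range("A1"))              ' web query anchored at A1

    With qt
        .WebSelectionType = xlAllTables           ' import every HTML table on the page
        .WebFormatting = xlWebFormattingNone      ' plain values, no web formatting
        .BackgroundQuery = False                  ' wait for the query to finish
        .Refresh                                  ' execute the web query
    End With
End Sub

Running PullHistoricalPrices from the VBA editor pulls whatever HTML tables the page returns into a new worksheet, where further VBA can parse or clean the rows; the paper's full methodology builds on this same Web Query mechanism.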
Recommended Citation
Dyer, John N. 2017. "Programming Case: A Methodology for Programmatic Web Data Extraction." Journal of Technology Research, 7: 1-27. https://digitalcommons.georgiasouthern.edu/info-sys-facpubs/106