Programming Case: A Methodology for Programmatic Web Data Extraction
Document Type
Article
Publication Date
2017
Publication Title
Journal of Technology Research
ISSN
1941-3416
Abstract
Web scraping is a programmatic technique for extracting data from websites, using software to simulate human navigation of webpages and automatically harvest their content. While many websites provide web services that allow users to consume their data programmatically, others offer no such service, and it is incumbent on the user to write or use existing software to acquire the data. The purpose of this paper is to provide a methodology for developing a relatively simple program, using the Microsoft Excel Web Query tool and Visual Basic for Applications (VBA), that programmatically extracts webpage data that are not readily transferable or available in other electronic forms. The case presents an overview of web scraping with an application to extracting historical stock price data from Yahoo's Finance® website. The case is suitable for students who have completed an object-oriented programming course, and it further exposes students to using Excel and VBA, along with knowledge of basic webpage structure, to harvest data from the web. It is hoped that this paper can serve as a teaching and learning tool, as well as a basic template, for academicians, students, and practitioners who need to consume website data when data extraction web services are not readily available. The paper can also add value to students' programming experience in the context of programming for a purpose.
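The abstract describes the approach only at a high level; the following is a minimal, illustrative VBA sketch of the general Excel Web Query technique it refers to, not the paper's actual program. The procedure name, ticker symbol, and URL are assumptions for illustration, and whether the target page still serves plain HTML tables depends on the current state of the Yahoo Finance site.

' Minimal sketch of a VBA-driven Excel Web Query (illustrative only).
' The ticker and URL below are assumptions, not necessarily those used in the paper.
Sub PullHistoricalPrices()
    Dim ws As Worksheet
    Dim qt As QueryTable
    Dim sTicker As String
    Dim sUrl As String

    sTicker = "IBM"                               ' example ticker symbol (assumption)
    sUrl = "https://finance.yahoo.com/quote/" & _
           sTicker & "/history"                   ' illustrative URL (assumption)

    Set ws = ThisWorkbook.Worksheets.Add          ' new sheet to receive the data
    Set qt = ws.QueryTables.Add( _
        Connection:="URL;" & sUrl, _
        Destination:=ws.Range("A1"))              ' web query anchored at A1

    With qt
        .WebSelectionType = xlAllTables           ' import every HTML table on the page
        .WebFormatting = xlWebFormattingNone      ' plain values, no web formatting
        .BackgroundQuery = False                  ' wait for the query to finish
        .Refresh                                  ' execute the web query
    End With
End Sub

Running PullHistoricalPrices from the VBA editor pulls whatever HTML tables the page returns into a new worksheet, where further VBA can parse or clean the rows; the paper's full methodology builds on this same Web Query mechanism.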
Recommended Citation
Dyer, John N. 2017. "Programming Case: A Methodology for Programmatic Web Data Extraction." Journal of Technology Research, 7: 1-27. https://digitalcommons.georgiasouthern.edu/info-sys-facpubs/106