Honors College Theses
Publication Date
2016
Major
Information Systems (BBA)
Document Type and Release Option
Thesis (open access)
Faculty Mentor
Dr. John N. Dyer
Abstract
Web scraping refers to the use of a software program that mimics human web-surfing behavior by visiting a website and collecting large amounts of data that would otherwise be difficult for a human to extract. A typical program will extract both unstructured and semi-structured data, as well as images, and convert the data into a structured format. Web scraping is commonly used to facilitate online price comparisons, aggregate contact information, extract online product catalog data, extract economic, demographic, and statistical data, and create web mashups, among other uses. Additionally, in the era of big data, semantic analysis, and business intelligence, web scraping is often the only practical option for data extraction, as many individuals and organizations need to consume large amounts of data that reside on the web. Although many users and organizations program their own web scrapers, there are scores of freely available programs and web-browser add-ins that can facilitate web scraping. This paper demonstrates web scraping using a free program named Data Toolbar® to extract data from Amazon.com. It is hoped that the paper will expose academicians, students, and practitioners not only to the concept and necessity of web scraping but also to the available software.
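As a rough illustration of the step the abstract describes, converting semi-structured web data into a structured format, the following is a minimal sketch using only the Python standard library. It is not the Data Toolbar® workflow demonstrated in the thesis, and it does not contact Amazon.com. The markup in SAMPLE_HTML, the class names "product", "title", and "price", and the helper names ProductParser, scrape, and to_csv are all hypothetical and chosen only for this example.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical product listing markup (an assumption for this sketch;
# not taken from Amazon.com or from the thesis).
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="title">USB Cable</span><span class="price">$5.99</span></li>
  <li class="product"><span class="title">Wireless Mouse</span><span class="price">$17.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Builds one {'title': ..., 'price': ...} record per <li class="product">."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being built, or None outside a product
        self._field = None    # "title" or "price" while inside that span

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self._current = {}
        elif (tag == "span" and self._current is not None
              and css_class in ("title", "price")):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a title or price span.
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def scrape(html):
    """Return the list of product records found in an HTML string."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.records


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = scrape(SAMPLE_HTML)
print(to_csv(records))
```

A real scraper would fetch pages over the network and would have to cope with markup that changes over time, as well as with a site's terms of use; the sketch leaves those concerns out and shows only the extraction and structuring step.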
Recommended Citation
Neil, Yolande, "Web Scraping the Easy Way" (2016). Honors College Theses. 201.
https://digitalcommons.georgiasouthern.edu/honors-theses/201
Included in
Business Administration, Management, and Operations Commons, Databases and Information Systems Commons, Management Information Systems Commons