Honors College Theses

Publication Date

2016

Major

Information Systems (BBA)

Document Type and Release Option

Thesis (open access)

Faculty Mentor

Dr. John N. Dyer

Abstract

Web scraping is the use of a software program that mimics human web-surfing behavior by pointing to a website and collecting large amounts of data that would otherwise be difficult for a person to extract manually. A typical program extracts unstructured and semi-structured data, as well as images, and converts the data into a structured format. Web scraping is commonly used to facilitate online price comparisons, aggregate contact information, extract online product catalog data, extract economic, demographic, and statistical data, and create web mashups, among other uses. Additionally, in the era of big data, semantic analysis, and business intelligence, web scraping is often the only practical option for data extraction, because many individuals and organizations need to consume large amounts of data that reside on the web. Although many users and organizations program their own web scrapers, there are scores of freely available programs and web-browser add-ins that can facilitate web scraping. This paper demonstrates web scraping using a free program named Data Toolbar® to extract data from Amazon.com. It is hoped that the paper will expose academicians, students, and practitioners not only to the concept and necessity of web scraping, but also to the available software.
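As a rough illustration of the conversion the abstract describes, from semi-structured markup to a structured format, the following minimal Python sketch parses a small HTML fragment into rows and writes them out as CSV. It does not represent how Data Toolbar® works, and it does not touch Amazon.com: the markup, the class names (`product`, `title`, `price`), and the helper names (`ProductParser`, `scrape`, `to_csv`) are all assumptions invented for this example.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical product-listing markup, invented for illustration only.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="title">USB Cable</span><span class="price">$5.99</span></li>
  <li class="product"><span class="title">Wireless Mouse</span><span class="price">$19.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of title/price spans into one dict per product."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # name of the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self.rows.append({})  # start a new structured record
        elif tag == "span" and css_class in ("title", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a recognized span.
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape(html):
    """Return a list of {'title': ..., 'price': ...} dicts found in html."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Serialize the scraped rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape(SAMPLE_HTML)
print(to_csv(rows))
```

A real scraper would also have to fetch pages over the network, respect the site's terms of use, and cope with markup that changes over time; the sketch covers only the extraction-and-structuring step.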
