Publication Date

2016

Major

Information Systems (BBA)

Release Option

Open Access

Faculty Mentor

Dr. John N. Dyer

Abstract

Web scraping refers to the use of a software program that mimics human web-surfing behavior by pointing to a website and collecting large amounts of data that would otherwise be difficult for a human to extract. A typical program extracts unstructured and semi-structured data, as well as images, and converts the data into a structured format. Web scraping is commonly used to facilitate online price comparisons, aggregate contact information, extract online product catalog data, extract economic/demographic/statistical data, and create web mashups, among other uses. Additionally, in the era of big data, semantic analysis, and business intelligence, web scraping is often the only practical option for data extraction, because many individuals and organizations need to consume large amounts of data that reside on the web. Although many users and organizations program their own web scrapers, there are scores of freely available programs and web-browser add-ins that can facilitate web scraping. This paper demonstrates web scraping using a free program named Data Toolbar® to extract data from Amazon.com. It is hoped that the paper will expose academicians, students, and practitioners not only to the concept and necessity of web scraping but also to the available software.
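
The core step described above, turning semi-structured HTML into a structured format, can be sketched in a few lines of hand-written code. This is not how Data Toolbar® works internally; it is a minimal illustration using only Python's standard library. The markup, the `item`/`title`/`price` class names, and the `ProductParser` and `to_csv` names are hypothetical assumptions, and the page is an inline string rather than a live download from Amazon.com, whose real markup differs.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical product-listing markup standing in for a downloaded page.
SAMPLE_HTML = """
<div class="item"><span class="title">USB Cable</span><span class="price">$7.99</span></div>
<div class="item"><span class="title">Desk Lamp</span><span class="price">$24.50</span></div>
"""


class ProductParser(HTMLParser):
    """Collects one dict per <div class="item"> from its title and price spans."""

    def __init__(self):
        super().__init__()
        self.rows = []       # structured output: one dict per item
        self._field = None   # field name of the <span> currently open, if any
        self._current = {}   # fields gathered for the item being parsed

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "item":
            self._current = {}
        elif tag == "span" and cls in ("title", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only text inside a recognized <span> is kept; whitespace between tags is ignored.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current:
            self.rows.append(self._current)
            self._current = {}


def to_csv(rows):
    """Serializes the extracted rows as CSV text (the 'structured format')."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


parser = ProductParser()
parser.feed(SAMPLE_HTML)
print(to_csv(parser.rows))
```

A GUI tool such as Data Toolbar® spares the user from writing and maintaining this kind of markup-specific parsing logic, which breaks whenever a site changes its page layout.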

Recommended Citation

Neil, Yolande, "Web Scraping the Easy Way" (2016). Honors College Theses. 201.
https://digitalcommons.georgiasouthern.edu/honors-theses/201

Included in

Business Administration, Management, and Operations Commons, Databases and Information Systems Commons, Management Information Systems Commons

Honors College Theses

Web Scraping the Easy Way
