Department of Biostatistics, Epidemiology, and Environmental Health Sciences Faculty Publications

Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014

Isaac Fung, Georgia Southern University, Jiann-Ping Hsu
Jingjing Yin, Georgia Southern University, Jiann-Ping Hsu College of Public Health
Keisha D. Pressley, Georgia Southern University, Jiann-Ping Hsu College of Public Health
Carmen Duke, Georgia Southern University, Jiann-Ping Hsu College of Public Health
Chen Mo, Georgia Southern University, Jiann-Ping Hsu College of Public Health
Hai Liang, The University of Hong Kong
King-Wa Fu, The University of Hong Kong
Zion Tsz Ho Tse, University of Georgia
Su-I Hou, University of Central Florida

Document Type

Article

Publication Date

6-10-2019

Publication Title

Data (MDPI)

DOI

10.3390/data4020084

ISSN

2306-5729

Abstract

As a pedagogical demonstration of Twitter data analysis, this study presents a case study of HIV/AIDS-related tweets posted around World AIDS Day, 2014. It examined whether Twitter users from countries with different income levels responded differently to World AIDS Day, and it evaluated the performance of support vector machine (SVM) models as classifiers of relevant tweets. A random sample of 1826 original HIV/AIDS-related tweets posted from November 30 through December 2, 2014 was manually coded. Logistic regression was applied to analyze the association between the World Bank-designated income level of each user's self-reported country and tweet content. To identify the optimal SVM model, 1278 (70%) of the 1826 sampled tweets were randomly selected as the training set, and the remaining 548 (30%) served as the test set. Another 180 tweets were sampled separately and coded as a held-out dataset. Compared with tweets from low-income countries, tweets from Organization for Economic Cooperation and Development (OECD) countries had 60% lower odds of mentioning epidemiology (adjusted odds ratio, aOR = 0.404; 95% CI: 0.166, 0.981) and about three times the odds of mentioning compassion/support (aOR = 3.080; 95% CI: 1.179, 8.047). Tweets from lower-middle-income countries had 79% lower odds than tweets from low-income countries of mentioning HIV-affected sub-populations (aOR = 0.213; 95% CI: 0.068, 0.664). On the held-out dataset of 180 tweets, the optimal SVM model identified relevant tweets with an F1 score of 0.72. This study demonstrates how students can be taught to analyze Twitter data using manual coding, regression models, and SVM models.
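The quantitative steps named in the abstract (a 70/30 random train/test split, reading adjusted odds ratios as percent changes in odds, and scoring a classifier with F1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the random seed are hypothetical, only the standard library is used, and no logistic regression or SVM is actually fitted here (in practice a library such as statsmodels or scikit-learn would typically be used for that).

```python
import random


def train_test_split_indices(n, train_frac=0.7, seed=2014):
    """Randomly partition item indices 0..n-1 into training and test sets.

    Hypothetical helper; the seed is an arbitrary illustrative choice,
    not the one used in the study.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(n * train_frac)  # 1826 * 0.7 rounds to 1278
    return idx[:n_train], idx[n_train:]


def pct_change_in_odds(odds_ratio):
    """Percent change in odds implied by an (adjusted) odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    train, test = train_test_split_indices(1826)
    print(len(train), len(test))  # 1278 548

    # aOR values reported in the abstract
    for label, aor in [("epidemiology", 0.404),
                       ("compassion/support", 3.080),
                       ("sub-populations", 0.213)]:
        print(f"{label}: aOR={aor} -> {pct_change_in_odds(aor):+.1f}% change in odds")

    # Toy labels, not study data: 1 = relevant tweet, 0 = irrelevant
    print(round(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]), 3))  # 0.667
```

The percent change in odds is (aOR - 1) x 100, so aOR = 0.404 corresponds to about 60% lower odds and aOR = 3.080 to roughly three times the odds (about +208%). F1 ignores true negatives, so unlike plain accuracy it is not inflated when irrelevant tweets make up most of the data.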

Recommended Citation

Fung, Isaac, Jingjing Yin, Keisha D. Pressley, Carmen Duke, Chen Mo, Hai Liang, King-Wa Fu, Zion Tsz Ho Tse, and Su-I Hou. 2019. "Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014." Data 4 (2): 84. doi: 10.3390/data4020084. Source: https://www.mdpi.com/2306-5729/4/2/84
https://digitalcommons.georgiasouthern.edu/bee-facpubs/134
