Biostatistics, Epidemiology & Environmental Health Sciences: Faculty Publications

Aggregating Twitter Text through Generalized Linear Regression Models for Tweet Popularity Prediction and Automatic Topic Classification

Chen Mo, Georgia Southern University, Jiann-Ping Hsu College of Public Health
Jingjing Yin, Georgia Southern University, Jiann-Ping Hsu College of Public Health
Isaac Chun-Hai Fung, Georgia Southern University, Jiann-Ping Hsu College of Public Health
Zion Tse, University of York

Document Type

Article

Publication Date

11-26-2021

Publication Title

European Journal of Investigation in Health, Psychology and Education

DOI

10.3390/ejihpe11040109

ISSN

2254-9625

Abstract

Social media platforms have become accessible resources for health data analysis. However, the advanced computational techniques involved in big data text mining and analysis are challenging for public health data analysts to apply. This study proposes, and explores the feasibility of, a novel yet straightforward method based on generalized linear models: the outcome of interest is regressed on aggregated influence scores for association and/or classification analyses. The method transforms the text data in the document-term matrix into a continuous summary score, thereby reducing the data dimension substantially and easing the sparsity of the term matrix. To illustrate the proposed method step by step, we used three Twitter datasets on different topics: autism spectrum disorder, influenza, and violence against women. We found that our results were generally consistent with the critical factors that the existing literature associates with each public health topic. The proposed method could also classify tweets into topic groups appropriately from tweet contents, with performance consistent with that of existing text mining methods for automatic classification.
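The idea of collapsing a sparse term matrix into a single score and then fitting a generalized linear model can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the influence scores are computed, so the per-term weight used here (a smoothed log odds ratio), the toy tweets, and every function name are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenizer; keeps alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]

def term_weights(docs, labels, smoothing=1.0):
    # ASSUMPTION: the per-term "influence" is a smoothed log odds ratio of a
    # term appearing in label-1 versus label-0 documents. The source does not
    # define the weighting; this is a stand-in for illustration.
    pos, neg = Counter(), Counter()
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for doc, y in zip(docs, labels):
        for term in set(tokenize(doc)):
            (pos if y == 1 else neg)[term] += 1
    weights = {}
    for term in set(pos) | set(neg):
        p = (pos[term] + smoothing) / (n_pos + 2 * smoothing)
        q = (neg[term] + smoothing) / (n_neg + 2 * smoothing)
        weights[term] = math.log(p / (1 - p)) - math.log(q / (1 - q))
    return weights

def aggregate_score(doc, weights):
    # One continuous summary score per document replaces its sparse row
    # of the document-term matrix.
    return sum(weights.get(t, 0.0) for t in set(tokenize(doc)))

def fit_logistic(x, y, steps=2000, lr=0.1):
    # Binomial GLM with logit link and a single covariate, fitted by
    # plain gradient ascent on the mean log-likelihood.
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical toy tweets: label 1 = influenza topic, 0 = autism topic.
docs = [
    "flu shot clinic open today",
    "fever and cough from flu",
    "flu season vaccine reminder",
    "autism awareness walk today",
    "support autism research funding",
    "autism community event reminder",
]
labels = [1, 1, 1, 0, 0, 0]

weights = term_weights(docs, labels)
scores = [aggregate_score(d, weights) for d in docs]
b0, b1 = fit_logistic(scores, labels)

def predict(score):
    # Estimated probability that a tweet belongs to the label-1 topic.
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * score)))

for doc, s in zip(docs, scores):
    print(f"{s:+.2f}  {predict(s):.3f}  {doc}")
```

In this sketch the weights and the regression are fitted on the same six tweets, so the fit is optimistic; in practice the term weights would be estimated on a training split and the scores evaluated on held-out tweets. For a popularity outcome such as a retweet count, a count-data GLM would take the place of the logistic model, with the score entering as the covariate in the same way.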

Comments

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Recommended Citation

Mo, Chen, Jingjing Yin, Isaac Chun-Hai Fung, and Zion Tse. 2021. "Aggregating Twitter Text through Generalized Linear Regression Models for Tweet Popularity Prediction and Automatic Topic Classification." European Journal of Investigation in Health, Psychology and Education 11 (4): 1537-1554. MDPI. doi: 10.3390/ejihpe11040109
https://digitalcommons.georgiasouthern.edu/bee-facpubs/341

Included in

Biostatistics Commons, Epidemiology Commons
