Application of Regression Analysis on Text-Mining Data Associated with Autism Spectrum Disorder from Twitter: A Pilot Study

Document Type

Presentation

Presentation Date

3-26-2018

Abstract or Description

Social media has become a popular resource for health data analysis. However, the mathematical and computational techniques needed to analyze massive social media data are challenging for public health practitioners, and results from traditional machine learning techniques are difficult to interpret. This study proposes a simple new solution: regressing the primary outcome of interest (e.g., the number of retweets of a tweet, or whether a tweet contains certain keywords) on the frequency of common terms appearing in the tweet. The method reduces the term matrix based on the fitted regression scores, such as relative risks or odds ratios. It also addresses the data sparsity issue and transforms text data into continuous summary scores. The proposed scores make it easier to analyze social media data and to interpret the results. We applied the method to a Twitter dataset on Autism Spectrum Disorder (ASD), fitting Poisson, hurdle, and logistic regression models, with model selection based on the Youden index. We found that the terms with significant results generally represent the key factors associated with ASD in the existing literature.
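The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it simplifies in several ways that are assumptions of this example: it uses term presence rather than term frequency, it scores one term at a time from a 2x2 table (which matches the slope of a univariate logistic regression on a binary predictor, apart from the 0.5 continuity correction added here), and the tweets, outcomes, and function names are all hypothetical.

```python
import math
from collections import Counter


def term_log_odds_ratios(tweets, outcomes, min_count=2):
    """Log odds ratio of a binary outcome for each common term.

    Each term gets a 2x2 table (term present/absent vs. outcome 1/0).
    Adding 0.5 to every cell (an assumption of this sketch) keeps the
    estimate finite when a cell is empty, which is common in sparse text.
    """
    docs = [set(t.lower().split()) for t in tweets]
    doc_freq = Counter(term for d in docs for term in d)
    scores = {}
    for term, n in doc_freq.items():
        if n < min_count:  # drop rare terms to shrink the term matrix
            continue
        a = sum(1 for d, y in zip(docs, outcomes) if term in d and y == 1)
        b = sum(1 for d, y in zip(docs, outcomes) if term in d and y == 0)
        c = sum(1 for d, y in zip(docs, outcomes) if term not in d and y == 1)
        e = sum(1 for d, y in zip(docs, outcomes) if term not in d and y == 0)
        scores[term] = math.log(((a + 0.5) * (e + 0.5)) / ((b + 0.5) * (c + 0.5)))
    return scores


def summary_score(tweet, scores):
    """Continuous summary score: sum of log odds ratios of scored terms present."""
    return sum(scores.get(term, 0.0) for term in set(tweet.lower().split()))


def youden_index(values, outcomes, threshold):
    """Youden's J = sensitivity + specificity - 1 for the rule value > threshold."""
    tp = sum(1 for v, y in zip(values, outcomes) if v > threshold and y == 1)
    fn = sum(1 for v, y in zip(values, outcomes) if v <= threshold and y == 1)
    tn = sum(1 for v, y in zip(values, outcomes) if v <= threshold and y == 0)
    fp = sum(1 for v, y in zip(values, outcomes) if v > threshold and y == 0)
    return tp / (tp + fn) + tn / (tn + fp) - 1


# Hypothetical toy data; 1 = "highly retweeted", 0 = otherwise.
tweets = [
    "autism vaccine study shared",
    "autism vaccine debate again",
    "autism awareness walk today",
    "autism awareness event tomorrow",
    "vaccine news update",
    "awareness month begins",
]
outcomes = [1, 1, 0, 0, 1, 0]

scores = term_log_odds_ratios(tweets, outcomes)
tweet_scores = [summary_score(t, scores) for t in tweets]
j = youden_index(tweet_scores, outcomes, threshold=0.0)
```

In this toy example only three terms survive the rare-term filter, so each tweet collapses from a sparse row of the term matrix to a single number. A fuller analysis along the lines of the abstract would fit Poisson, hurdle, or logistic models to term frequencies with a statistics package; this sketch only shows the shape of the reduction.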

Sponsorship/Conference/Institution

Eastern North American Region International Biometric Society (ENAR)

Location

Atlanta, GA

Source

https://www.enar.org/meetings/spring2018/program/Final_Program.pdf
