Application of Regression Analysis on Text-Mining Data Associated with Autism Spectrum Disorder from Twitter: A Pilot Study
Document Type
Presentation
Presentation Date
3-26-2018
Abstract or Description
Social media has become a popular resource for health data analysis. However, the mathematical and computational techniques needed to analyze massive social media data can be challenging for public health practitioners, and results from traditional machine learning techniques are difficult to interpret. This study proposes a simple new solution: regressing the primary outcome of interest (e.g., the number of retweets of a tweet, or whether a tweet contains certain keywords) on the frequencies of common terms appearing in the tweet. The method reduces the term matrix based on the fitted regression scores, such as relative risks or odds ratios. It also solves the data sparsity issue and transforms text data into continuous summary scores, which make social media data easier to analyze and the results easier to interpret. We applied this approach to Twitter data on Autism Spectrum Disorder (ASD) using Poisson, hurdle, and logistic regression models, with model selection based on the Youden index. We found that the terms with significant results generally represent the key factors associated with ASD in the existing literature.
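The abstract does not give implementation details, so the following is only a minimal pure-Python sketch of how such a pipeline might look. It rests on several assumptions not stated in the source: whitespace tokenization, a binary outcome, one univariate logistic regression per common term (fitted by Newton-Raphson), a cutoff on the magnitude of the log odds ratio standing in for the study's significance-based term selection, and a per-tweet summary score defined as the sum of term count times log odds ratio. The Youden index J = sensitivity + specificity - 1 is used here to pick a score cutoff rather than to compare models. All function names, thresholds, and the toy tweets are hypothetical. The Poisson and hurdle variants are not shown.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating the idea; NOT the authors' code.


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(x, y, iters=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson and return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                # Fisher information entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:            # near-singular, e.g. under separation
            break
        # Newton step: beta <- beta + I^{-1} * score (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def term_odds_ratios(tweets, y, min_docs=2):
    """Odds ratio per common term from a univariate logistic fit on term frequency."""
    docs = [Counter(t.lower().split()) for t in tweets]   # assumed: naive tokens
    doc_freq = Counter(term for d in docs for term in d)
    vocab = sorted(t for t, c in doc_freq.items() if c >= min_docs)
    ors = {}
    for term in vocab:
        x = [d[term] for d in docs]                       # term frequency per tweet
        _, b1 = fit_logistic_1d(x, y)
        ors[term] = math.exp(b1)
    return docs, ors


def summary_scores(docs, ors, min_abs_log_or=0.5):
    """Reduce the term set by |log OR|, then score each tweet (assumed scoring rule)."""
    kept = {t: math.log(o) for t, o in ors.items() if abs(math.log(o)) >= min_abs_log_or}
    scores = [sum(c * kept[t] for t, c in d.items() if t in kept) for d in docs]
    return kept, scores


def youden_best_threshold(scores, y):
    """Return (J, cutoff) maximizing J = sens + spec - 1 for the rule score >= cutoff."""
    pos = sum(y)
    neg = len(y) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both outcome classes")
    best_j, best_thr = -1.0, None
    for thr in sorted(set(scores)):
        tp = sum(1 for s, yi in zip(scores, y) if s >= thr and yi == 1)
        tn = sum(1 for s, yi in zip(scores, y) if s < thr and yi == 0)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_j, best_thr = j, thr
    return best_j, best_thr


# Made-up toy tweets with a hypothetical binary outcome (1 = retweeted).
tweets = [
    "Autism awareness month support families",
    "autism diagnosis early intervention helps",
    "vaccines cause ASD claims debunked",
    "support autism research today",
    "great support at the game",
    "coffee and research today",
    "early morning run today",
    "autism book sale today",
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

docs, ors = term_odds_ratios(tweets, y)
kept, scores = summary_scores(docs, ors)
best_j, best_thr = youden_best_threshold(scores, y)
print(sorted(kept))  # ['autism', 'support', 'today']
print(best_j, best_thr)
```

For a count outcome such as the number of retweets, replacing the logistic fit with a Poisson log-linear fit would make exp(b1) a rate ratio (relative risk) instead of an odds ratio. The terms "early" and "research" are dropped in this toy run because their fitted odds ratio is 1.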
Sponsorship/Conference/Institution
Eastern North American Region International Biometric Society (ENAR)
Location
Atlanta, GA
Source
https://www.enar.org/meetings/spring2018/program/Final_Program.pdf
Recommended Citation
Mo, Chen, Jingjing Yin, Isaac Chun-Hai Fung, Zion Tsz Ho Tse. 2018. "Application of Regression Analysis on Text-Mining Data Associated with Autism Spectrum Disorder from Twitter: A Pilot Study." Biostatistics Faculty Presentations. Presentation 112.
https://digitalcommons.georgiasouthern.edu/biostat-facpres/112