Term of Award
Spring 2024
Degree Name
Doctor of Public Health in Biostatistics (Dr.P.H.)
Document Type and Release Option
Dissertation (open access)
Copyright Statement / License for Reuse
This work is licensed under a Creative Commons Attribution 4.0 License.
Department
Department of Biostatistics, Epidemiology, and Environmental Health Sciences
Committee Chair
Haresh Rochani
Committee Member 1
Hani Samawi
Committee Member 2
Jing Kersey
Abstract
Missing data is a common issue in research studies in the life and social sciences. An analysis that does not account for it may produce biased or unreliable results and therefore raise concerns about the study's validity. Multiple Imputation (MI) has been proposed as a solution and has shown optimal performance; however, its application to high-dimensional data (where the number of variables far exceeds the sample size, p >> n) is challenging. Zhao and Long (2016) evaluated different procedures for incorporating regularization for model selection in MI and found that Bayesian LASSO had superior and optimal performance under the Missing at Random (MAR) mechanism. This dissertation further extends the use of Bayesian methods for MI in high-dimensional data. More specifically, the simulation studies show that a novel Bayesian procedure introduced by Posch et al. (2020), Bayesian Best Subset Selection (BBSS), has optimal performance under MAR and is comparable to the Bayesian LASSO used by Zhao and Long (2016). Furthermore, applying MI through BBSS allows the analyst to incorporate beliefs about the importance of specific variables for MI and about the shift in distribution between the observed and missing data. This makes BBSS an effective solution for addressing some Missing Not at Random (MNAR) mechanisms and for implementing Sensitivity Analysis. This is confirmed by the simulation study with MNAR data, where BBSS achieved superior and optimal performance while the other methods did not. Data from a genomic study of Prostate Cancer are used to demonstrate Sensitivity Analysis with BBSS under different assumptions about the missing-data mechanism.
OCLC Number
1433096274
Catalog Permalink
https://galileo-georgiasouthern.primo.exlibrisgroup.com/permalink/01GALI_GASOUTH/1r4bu70/alma9916570847502950
Recommended Citation
Keko, Mario, "Multiple Imputation in High-Dimensional Data Under Nonignorable Missing Mechanisms Using Bayesian Best Subset Selection" (2024). Electronic Theses and Dissertations. 2771.
https://digitalcommons.georgiasouthern.edu/etd/2771
Research Data and Supplementary Material
No