Term of Award

Spring 2024

Degree Name

Doctor of Public Health in Biostatistics (Dr.P.H.)

Document Type and Release Option

Dissertation (open access)

Copyright Statement / License for Reuse

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


Department of Biostatistics, Epidemiology, and Environmental Health Sciences

Committee Chair

Haresh Rochani

Committee Member 1

Hani Samawi

Committee Member 2

Jing Kersey


Missing data are a common problem in research studies in the life and social sciences. An analysis that ignores missingness may produce biased or unreliable results and therefore raise concerns about the study's validity. Multiple Imputation (MI) has been proposed as a solution and has been shown to perform well; however, applying it to high-dimensional data (where the number of variables far exceeds the sample size, p >> n) is challenging. Zhao and Long (2016) evaluated several procedures for incorporating regularization for model selection into MI and found that the Bayesian Lasso performed best under the Missing at Random (MAR) mechanism. This dissertation extends the use of Bayesian methods for MI in high-dimensional data. Specifically, simulation studies show that a novel Bayesian procedure, Bayesian Best Subset Selection (BBSS), introduced by Posch et al. (2020), performs well under MAR and is comparable to the Bayesian Lasso used by Zhao and Long (2016). Furthermore, MI through BBSS allows the analyst to incorporate beliefs about the importance of specific variables for imputation and about the shift in distribution between the observed and missing data. This makes BBSS an effective tool for addressing some Missing Not at Random (MNAR) mechanisms and for implementing sensitivity analysis. A simulation study with MNAR data confirms this: BBSS performed well, while the other methods did not. Data from a genomic study of prostate cancer are used to illustrate sensitivity analysis with BBSS under different assumptions about the missingness mechanism.
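As a generic illustration of two ideas the abstract relies on, the sketch below shows how estimates from several imputed datasets are commonly pooled (Rubin's rules) and how a simple shift can be applied to imputed values for a sensitivity analysis. It is not the dissertation's BBSS procedure; the function names `pool_rubin` and `delta_adjust` and the shift parameter `delta` are hypothetical and chosen only for this example.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M completed-data results with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def delta_adjust(imputed_values, delta):
    """Shift values imputed under MAR by a fixed amount (assumed MNAR offset)."""
    return [value + delta for value in imputed_values]


if __name__ == "__main__":
    pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(pooled, total_var)
    print(delta_adjust([1.0, 2.0], 0.5))
```

Repeating the pooling step over a range of `delta` values is one simple way to see how conclusions change as the assumed departure from MAR grows.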

Research Data and Supplementary Material


Available for download on Monday, April 23, 2029