Term of Award

Spring 2021

Degree Name

Doctor of Public Health in Biostatistics (Dr.P.H.)

Document Type and Release Option

Dissertation (restricted to Georgia Southern)

Copyright Statement / License for Reuse

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


College of Public Health

Committee Chair

Haresh Rochani

Committee Member 1

Hani Samawi

Committee Member 2

JingJing Yin


The area under the ROC curve (AUC) and the Youden index are standard measures of a biomarker's accuracy in medical diagnostics. The literature suggests that a joint confidence region for the AUC and the Youden index provides a better estimate of a biomarker's accuracy than the individual confidence intervals for the given data. This dissertation introduced a closed-form solution (CFS) for the joint confidence region of the AUC and Youden index estimates for a single continuous biomarker and a binary disease variable, and compared it with the bootstrap method. Ties in the observed biomarker values were also analyzed with the CFS and bootstrap methods for various sample sizes. The second part of the dissertation introduced missing data in the "gold standard" test. It explored a range of missing-data proportions (10% to 70%), assuming an ignorable missingness mechanism (missing at random) for the disease variable. It further evaluated complete case analysis and three multiple imputation methods, logistic regression (LR), predictive mean matching (PMM), and multivariate normal (MVN), in terms of the coverage probability and joint area for the AUC and the Youden index. Intensive simulation across these settings suggested that logistic regression imputation is recommended for smaller sample sizes and higher values of the AUC and the Youden index. In contrast, for larger sample sizes, lower disease rates, and smaller values of the AUC and the Youden index, either predictive mean matching or logistic regression imputation can be applied.
Finally, secondary data from the National Health and Nutrition Examination Survey (NHANES) on diabetes (binary disease variable: diabetes; continuous biomarker: glycated hemoglobin; covariates: age, gender, and race) were used to evaluate the methods in a real-world setting, applying logistic regression multiple imputation together with the closed-form solution (CFS) method to impute the missing binary disease variable and infer the joint results.
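
For readers unfamiliar with the two accuracy measures named in the abstract, the following is a minimal sketch of their standard empirical point estimates, not the dissertation's closed-form joint confidence region or its imputation procedure. It assumes that higher biomarker values indicate disease, counts tied diseased/healthy pairs as one half in the AUC, and classifies a subject as positive when the value is at or above the cutoff. The function names are hypothetical and chosen only for illustration.

```python
from itertools import product


def empirical_auc(diseased, healthy):
    """Mann-Whitney estimate of P(diseased value > healthy value).

    Assumption: tied pairs contribute 1/2, the usual convention
    when the biomarker has ties.
    """
    score = 0.0
    for d, h in product(diseased, healthy):
        if d > h:
            score += 1.0
        elif d == h:
            score += 0.5
    return score / (len(diseased) * len(healthy))


def empirical_youden(diseased, healthy):
    """Return (J, cutoff) maximizing sensitivity + specificity - 1.

    Assumption: a subject is test-positive when value >= cutoff,
    and candidate cutoffs are the observed biomarker values.
    """
    best_j, best_cutoff = 0.0, None
    for c in sorted(set(diseased) | set(healthy)):
        sensitivity = sum(d >= c for d in diseased) / len(diseased)
        specificity = sum(h < c for h in healthy) / len(healthy)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_cutoff = j, c
    return best_j, best_cutoff
```

Calling both functions on the same two samples (biomarker values for diseased and non-diseased subjects) gives the pair of estimates around which a joint confidence region would be built.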

Research Data and Supplementary Material


Available for download on Saturday, April 18, 2026