College of Graduate Studies: Theses & Dissertations
Term of Award
Spring 2026
Degree Name
Master of Science, Information Technology
Document Type and Release Option
Thesis (open access)
Copyright Statement / License for Reuse
Digital Commons@Georgia Southern License
Department
Department of Information Technology
Committee Chair
Hayden Wimmer
Committee Member 1
Kim Jongyeop
Committee Member 2
Atef Mohamed
Abstract
This thesis investigates the development of automated, explainable artificial intelligence systems that improve automated scientific discovery, clinical diagnostic support, and public health decision-making. Three independent yet thematically aligned studies were conducted across cheminformatics, radiology, and population health.

Study A introduces an automated and explainable machine learning pipeline for Quantitative Structure–Activity Relationship (QSAR) modeling using RDKit-derived molecular descriptors. The pipeline performs systematic preprocessing, applies consensus-based feature selection (mutual information, random forest importance, and RFECV), and benchmarks six supervised models (Random Forest, XGBoost, Support Vector Machine, Balanced Random Forest, EasyEnsembleClassifier, and a blended meta-ensemble), achieving AUC = 0.90 and accuracy = 0.83 while ensuring interpretability through SHAP, LIME, permutation importance, partial dependence plots, and applicability domain assessment via the Williams plot.

Study B develops an automated multimodal radiology assistant that integrates chest X-ray images, radiologist eye-gaze sequences, and dictation audio. The framework consists of a gaze imitation module for attention prediction, a multimodal fusion transformer for report generation, and a disease classification model for cardiopulmonary findings. Experimental evaluation demonstrated improved diagnostic alignment over image-only models, achieving F1 = 0.89 and higher ROUGE scores for report generation, confirming that multimodal cues enhance perceptual reasoning and narrative accuracy in chest X-ray interpretation.

Study C addresses premature mortality as a county-level prediction problem using data from County Health Rankings and Roadmaps (3,196 counties; 770 variables reduced to 286 after processing). Benchmark models, including Random Forest, XGBoost, SVM, MLP, and Logistic Regression, achieved F1 = 0.96 with 28 features and maintained performance (F1 = 0.95–0.96) with compact Top-10, Top-5, and Top-3 subsets. A domain-restricted confirmatory evaluation attained F1 ≈ 0.99, and a tiered risk evaluation showed precision = 1.00 in the top 10% and ≈ 0.99 in the top 20% of predicted risk, supporting deployable, workload-aware screening.

Collectively, the findings demonstrate that AI systems designed with interpretability, compactness, and multimodal alignment can enhance reliability and trust across molecular prediction, clinical diagnostic assistance, and population health risk assessment. The studies provide evidence that advances in automated, explainable AI not only improve the computational performance of predictive systems but also enhance their suitability for real-world decision-making in science, medicine, and policy.
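As a concrete illustration of the consensus-based feature selection described in Study A, the sketch below combines mutual information, random forest importance, and RFECV by a 2-of-3 vote over RDKit descriptors. It is a minimal reconstruction for illustration only; the top_k cutoff, the voting rule, and the estimator settings are assumptions, not the thesis's actual implementation.

# Illustrative sketch (not the thesis implementation) of 2-of-3 consensus
# feature selection over RDKit descriptors, as outlined in Study A.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, mutual_info_classif

def consensus_select(X, y, top_k=50):
    """Return indices of descriptors retained by at least two of three selectors."""
    # Selector 1: mutual information between each descriptor and the activity label.
    mi_scores = mutual_info_classif(X, y, random_state=0)
    mi_keep = set(np.argsort(mi_scores)[::-1][:top_k])

    # Selector 2: impurity-based random forest feature importance.
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
    rf_keep = set(np.argsort(rf.feature_importances_)[::-1][:top_k])

    # Selector 3: recursive feature elimination with cross-validation (RFECV).
    rfecv = RFECV(RandomForestClassifier(n_estimators=100, random_state=0),
                  step=0.1, cv=5, scoring="roc_auc").fit(X, y)
    rfecv_keep = set(np.flatnonzero(rfecv.support_))

    # Keep a descriptor only if at least two of the three selectors agree.
    votes = np.zeros(X.shape[1], dtype=int)
    for keep in (mi_keep, rf_keep, rfecv_keep):
        votes[list(keep)] += 1
    return np.flatnonzero(votes >= 2)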
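The Williams-plot applicability domain assessment mentioned in Study A is conventionally built on the leverage h_i = x_i^T (X^T X)^{-1} x_i of each compound, compared against the warning threshold h* = 3(p + 1)/n for p descriptors and n training compounds. The sketch below computes those quantities under that standard convention; it is not drawn from the thesis.

# Minimal sketch (assumed convention, not the thesis code) of the leverage
# values used for a Williams-plot applicability domain assessment.
import numpy as np

def leverages(X_train, X_query):
    """Leverage h_i = x_i^T (X^T X)^{-1} x_i for each query compound."""
    XtX_inv = np.linalg.pinv(X_train.T @ X_train)   # pseudo-inverse for numerical stability
    return np.einsum("ij,jk,ik->i", X_query, XtX_inv, X_query)

def warning_threshold(X_train):
    """Customary cutoff h* = 3 (p + 1) / n for p descriptors and n training compounds."""
    n, p = X_train.shape
    return 3.0 * (p + 1) / n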
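Study C's tiered risk evaluation reports precision within the highest-risk 10% and 20% of counties. The helper below shows one straightforward way to compute such a workload-aware precision figure; the function name and interface are illustrative rather than taken from the thesis.

# Illustrative helper (not the thesis code) for precision within a top risk tier,
# e.g. the 10% or 20% of counties with the highest predicted risk.
import numpy as np

def precision_at_top_fraction(y_true, risk_scores, fraction=0.10):
    """Fraction of true high-mortality counties among the top `fraction` ranked by risk."""
    y_true = np.asarray(y_true)
    risk_scores = np.asarray(risk_scores)
    n_top = max(1, int(round(fraction * len(risk_scores))))
    top_idx = np.argsort(risk_scores)[::-1][:n_top]   # highest-risk counties first
    return float(y_true[top_idx].mean())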
Recommended Citation
Coffie, Lord, "A Unified Automated and Explainable AI Framework for Drug Discovery, Clinical Diagnostics, and Public Health Data Prediction" (2026). College of Graduate Studies: Theses & Dissertations. 3081.
https://digitalcommons.georgiasouthern.edu/etd/3081
Research Data and Supplementary Material
Yes