College of Graduate Studies: Theses & Dissertations

Term of Award

Spring 2026

Degree Name

Master of Science, Information Technology

Document Type and Release Option

Thesis (open access)

Copyright Statement / License for Reuse

Digital Commons@Georgia Southern License

Department

Department of Information Technology

Committee Chair

Hayden Wimmer

Committee Member 1

Kim Jongyeop

Committee Member 2

Atef Mohamed

Abstract

This thesis investigates the development of automated, explainable artificial intelligence (AI) systems to improve scientific discovery, clinical diagnostic support, and public health decision-making. Three independent yet thematically aligned studies were conducted across cheminformatics, radiology, and population health.

Study A introduces an automated, explainable machine learning pipeline for Quantitative Structure–Activity Relationship (QSAR) modeling using RDKit-derived molecular descriptors. The pipeline performs systematic preprocessing, applies consensus-based feature selection (mutual information, random forest importance, and RFECV), and benchmarks six supervised models: Random Forest, XGBoost, Support Vector Machine, Balanced Random Forest, EasyEnsembleClassifier, and a blended meta-ensemble. The pipeline achieves AUC = 0.90 and accuracy = 0.83 while ensuring interpretability through SHAP, LIME, permutation importance, partial dependence plots, and applicability-domain assessment via the Williams plot.

Study B develops an automated multimodal radiology assistant that integrates chest X-ray images, radiologist eye-gaze sequences, and dictation audio. The framework comprises a gaze-imitation module for attention prediction, a multimodal fusion transformer for report generation, and a disease classification model for cardiopulmonary findings. Experimental evaluation demonstrated improved diagnostic alignment over image-only models, achieving F1 = 0.89 and higher ROUGE scores for report generation, confirming that multimodal cues enhance perceptual reasoning and narrative accuracy in chest X-ray interpretation.

Study C addresses premature mortality as a county-level prediction problem using data from County Health Rankings and Roadmaps (3,196 counties; 770 variables reduced to 286 after preprocessing). Benchmark models, including Random Forest, XGBoost, SVM, MLP, and Logistic Regression, achieved F1 = 0.96 with 28 features and maintained comparable performance (F1 = 0.95–0.96) with compact feature subsets (Top-10, Top-5, and Top-3). A domain-restricted confirmatory evaluation attained F1 ≈ 0.99, while a tiered risk evaluation showed precision = 1.00 in the top 10% and ≈ 0.99 in the top 20%, supporting deployable, workload-aware screening.

Collectively, the findings demonstrate that AI systems designed with interpretability, compactness, and multimodal alignment can enhance reliability and trust across molecular prediction, clinical diagnostic assistance, and population health risk assessment. The studies provide evidence that advances in automated, explainable AI not only improve the computational performance of predictive systems but also enhance their suitability for real-world decision-making in science, medicine, and policy.
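The consensus-based feature selection described above can be illustrated with a minimal sketch: rank features by mutual information, random-forest importance, and RFECV, then keep features chosen by a majority of the three methods. The dataset, the cutoff k, and the two-of-three voting rule below are illustrative assumptions, not details taken from the thesis.

```python
# Sketch of majority-vote consensus feature selection (assumed details:
# synthetic data, k = 8 per ranker, "selected by >= 2 of 3 methods" rule).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, mutual_info_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           random_state=0)
k = 8  # hypothetical number of features kept per ranking method

# Method 1: top-k features by mutual information with the target
mi = mutual_info_classif(X, y, random_state=0)
mi_sel = set(np.argsort(mi)[-k:])

# Method 2: top-k features by random-forest impurity importance
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
rf_sel = set(np.argsort(rf.feature_importances_)[-k:])

# Method 3: recursive feature elimination with cross-validation
rfecv = RFECV(LogisticRegression(max_iter=1000), cv=3).fit(X, y)
rfecv_sel = set(np.where(rfecv.support_)[0])

# Consensus: keep features selected by at least two of the three methods
votes = {}
for sel in (mi_sel, rf_sel, rfecv_sel):
    for f in sel:
        votes[f] = votes.get(f, 0) + 1
consensus = sorted(f for f, v in votes.items() if v >= 2)
print("consensus features:", consensus)
```

The voting threshold trades stability against coverage: requiring agreement from all three rankers yields fewer, more robust features, while any-one-of-three approaches a simple union.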
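The tiered risk evaluation reported for Study C (precision within the top 10% and top 20% of predicted risk) amounts to precision-at-top-k%. A minimal sketch, using synthetic labels and scores rather than the thesis data:

```python
# Precision among the highest-ranked fraction of units, as in a tiered
# (workload-aware) screening evaluation. Labels/scores here are synthetic.
import numpy as np

def precision_at_top(y_true, scores, frac):
    """Precision among the top `frac` fraction of units ranked by score."""
    n = max(1, int(len(scores) * frac))
    top = np.argsort(scores)[::-1][:n]  # indices of highest predicted risk
    return float(np.mean(np.asarray(y_true)[top]))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)        # synthetic outcome labels
s = y + rng.normal(0, 0.5, size=1000)    # scores correlated with labels

p10 = precision_at_top(y, s, 0.10)
p20 = precision_at_top(y, s, 0.20)
print(p10, p20)
```

Evaluating precision only within the top tier mirrors how a screening program with limited capacity would act on model output: only the highest-risk counties receive follow-up, so ranking quality at the top matters more than global accuracy.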

Research Data and Supplementary Material

Yes
