College of Graduate Studies: Theses & Dissertations
Term of Award
Spring 2026
Degree Name
Master of Science, Information Technology
Document Type and Release Option
Thesis (open access)
Copyright Statement / License for Reuse
Digital Commons@Georgia Southern License
Department
Department of Information Technology
Committee Chair
Hayden Wimmer
Committee Member 1
Kim Jongyeop
Committee Member 2
Atef Mohamed
Abstract
This thesis investigates the development of automated, explainable artificial intelligence systems that improve automated scientific discovery, clinical diagnostic support, and public health decision-making. Three independent yet thematically aligned studies were conducted across cheminformatics, radiology, and population health.

Study A introduces an automated and explainable machine learning pipeline for Quantitative Structure–Activity Relationship (QSAR) modeling using RDKit-derived molecular descriptors. The pipeline performs systematic preprocessing, applies consensus-based feature selection (mutual information, random forest importance, and RFECV), and benchmarks six supervised models (Random Forest, XGBoost, Support Vector Machine, Balanced Random Forest, EasyEnsembleClassifier, and a blended meta-ensemble), achieving AUC = 0.90 and accuracy = 0.83 while ensuring interpretability through SHAP, LIME, permutation importance, partial dependence plots, and applicability domain assessment via the Williams plot.

Study B develops an automated multimodal radiology assistant that integrates chest X-ray images, radiologist eye-gaze sequences, and dictation audio. The framework consists of a gaze imitation module for attention prediction, a multimodal fusion transformer for report generation, and a disease classification model for cardiopulmonary findings. Experimental evaluation demonstrated improved diagnostic alignment over image-only models, achieving F1 = 0.89 and higher ROUGE scores for report generation, confirming that multimodal cues enhance perceptual reasoning and narrative accuracy in chest X-ray interpretation.

Study C addresses premature mortality as a county-level prediction problem using data from County Health Rankings and Roadmaps (3,196 counties; 770 variables reduced to 286 after processing). Benchmark models, including Random Forest, XGBoost, SVM, MLP, and Logistic Regression, achieved F1 = 0.96 with 28 features and maintained performance (F1 = 0.95–0.96) with compact Top-10, Top-5, and Top-3 subsets. A domain-restricted confirmatory evaluation attained F1 ≈ 0.99, and a tiered risk evaluation showed precision = 1.00 in the top 10% and ≈ 0.99 in the top 20% of predicted risk, supporting deployable, workload-aware screening.

Collectively, the findings demonstrate that AI systems designed with interpretability, compactness, and multimodal alignment can enhance reliability and trust across molecular prediction, clinical diagnostic assistance, and population health risk assessment. The studies provide evidence that advances in automated, explainable AI not only improve the computational performance of predictive systems but also enhance their suitability for real-world decision-making in science, medicine, and policy.
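As a concrete illustration of the consensus-based feature selection described in Study A, the sketch below combines mutual information, random forest importance, and RFECV by a 2-of-3 vote over RDKit descriptors. It is a minimal reconstruction for illustration only; the top_k cutoff, the voting rule, and the estimator settings are assumptions, not the thesis's actual implementation.

# Illustrative sketch (not the thesis implementation) of 2-of-3 consensus
# feature selection over RDKit descriptors, as outlined in Study A.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, mutual_info_classif

def consensus_select(X, y, top_k=50):
    """Return indices of descriptors retained by at least two of three selectors."""
    # Selector 1: mutual information between each descriptor and the activity label.
    mi_scores = mutual_info_classif(X, y, random_state=0)
    mi_keep = set(np.argsort(mi_scores)[::-1][:top_k])

    # Selector 2: impurity-based random forest feature importance.
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
    rf_keep = set(np.argsort(rf.feature_importances_)[::-1][:top_k])

    # Selector 3: recursive feature elimination with cross-validation (RFECV).
    rfecv = RFECV(RandomForestClassifier(n_estimators=100, random_state=0),
                  step=0.1, cv=5, scoring="roc_auc").fit(X, y)
    rfecv_keep = set(np.flatnonzero(rfecv.support_))

    # Keep a descriptor only if at least two of the three selectors agree.
    votes = np.zeros(X.shape[1], dtype=int)
    for keep in (mi_keep, rf_keep, rfecv_keep):
        votes[list(keep)] += 1
    return np.flatnonzero(votes >= 2)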
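The Williams-plot applicability domain assessment mentioned in Study A is conventionally built on the leverage h_i = x_i^T (X^T X)^{-1} x_i of each compound, compared against the warning threshold h* = 3(p + 1)/n for p descriptors and n training compounds. The sketch below computes those quantities under that standard convention; it is not drawn from the thesis.

# Minimal sketch (assumed convention, not the thesis code) of the leverage
# values used for a Williams-plot applicability domain assessment.
import numpy as np

def leverages(X_train, X_query):
    """Leverage h_i = x_i^T (X^T X)^{-1} x_i for each query compound."""
    XtX_inv = np.linalg.pinv(X_train.T @ X_train)   # pseudo-inverse for numerical stability
    return np.einsum("ij,jk,ik->i", X_query, XtX_inv, X_query)

def warning_threshold(X_train):
    """Customary cutoff h* = 3 (p + 1) / n for p descriptors and n training compounds."""
    n, p = X_train.shape
    return 3.0 * (p + 1) / n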
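Study C's tiered risk evaluation reports precision within the highest-risk 10% and 20% of counties. The helper below shows one straightforward way to compute such a workload-aware precision figure; the function name and interface are illustrative rather than taken from the thesis.

# Illustrative helper (not the thesis code) for precision within a top risk tier,
# e.g. the 10% or 20% of counties with the highest predicted risk.
import numpy as np

def precision_at_top_fraction(y_true, risk_scores, fraction=0.10):
    """Fraction of true high-mortality counties among the top `fraction` ranked by risk."""
    y_true = np.asarray(y_true)
    risk_scores = np.asarray(risk_scores)
    n_top = max(1, int(round(fraction * len(risk_scores))))
    top_idx = np.argsort(risk_scores)[::-1][:n_top]   # highest-risk counties first
    return float(y_true[top_idx].mean())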
Recommended Citation
Coffie, Lord, "A Unified Automated and Explainable AI Framework for Drug Discovery, Clinical Diagnostics, and Public Health Data Prediction" (2026). College of Graduate Studies: Theses & Dissertations. 3081.
https://digitalcommons.georgiasouthern.edu/etd/3081
Research Data and Supplementary Material
Yes