Term of Award
Spring 2025
Degree Name
Master of Science, Information Technology
Document Type and Release Option
Thesis (open access)
Copyright Statement / License for Reuse

This work is licensed under a Creative Commons Attribution 4.0 License.
Department
Department of Information Technology
Committee Chair
Hayden Wimmer
Committee Member 1
Jongyeop Kim
Committee Member 2
Atef Mohamed
Abstract
This study examines the use of machine learning (ML) and large language models (LLMs) in healthcare to enhance disease prediction, clinical decision-making, and information management. Five supervised ML models, Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Naïve Bayes (NB), were applied to disease classification on three computing platforms: Google Colab, Databricks, and Snowflake. Data preprocessing included handling missing values, encoding categorical variables with one-hot encoding, scaling features where needed, and addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE) before an 80-20 train-test split. Models were built with Scikit-learn (Google Colab), Spark MLlib (Databricks), and Snowpark (Snowflake), and their performance was measured with classification metrics (accuracy, precision, recall, F1-score, and AUC-ROC) and regression metrics (Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, and R²). The study also explores whether LLMs can generate concise summaries of oncology reports (HTML) to reduce information overload and inform clinical decision-making. The summaries were generated using pre-trained transformer models such as BART, T5, and Pegasus and evaluated using BLEU, ROUGE, and BERTScore. Additionally, performance was compared against recursive (summary-of-summaries) and direct summarization techniques, as well as outputs from conversational AI models (e.g., ChatGPT, Google NotebookLM).
OCLC Number
1521193846
Catalog Permalink
https://galileo-georgiasouthern.primo.exlibrisgroup.com/permalink/01GALI_GASOUTH/1r4bu70/alma9916621325502950
Recommended Citation
Izuchukwu, Chiazam Chisom, "Application of Machine Learning and Large Language Models in Healthcare for Data Prediction and Summarization" (2025). Theses & Dissertations. 2919.
https://digitalcommons.georgiasouthern.edu/etd/2919
Research Data and Supplementary Material
No
Included in
Artificial Intelligence and Robotics Commons, Data Science Commons, Health Information Technology Commons