Term of Award

Spring 2025

Degree Name

Master of Science, Information Technology

Document Type and Release Option

Thesis (open access)

Copyright Statement / License for Reuse

This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Department of Information Technology

Committee Chair

Hayden Wimmer

Committee Member 1

Jongyeop Kim

Committee Member 2

Atef Mohamed

Abstract

This study examines the use of machine learning (ML) and large language models (LLMs) in healthcare to enhance disease prediction, clinical decision-making, and information management. Five supervised ML models were employed for disease classification: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Naïve Bayes (NB). The models were run on three computing platforms: Google Colab, Databricks, and Snowflake. Data preprocessing included handling missing values, encoding categorical variables with one-hot encoding, scaling features when needed, and addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE) before an 80-20 train-test split. Models were built with Scikit-learn (Google Colab), Spark MLlib (Databricks), and Snowpark (Snowflake), and their performance was measured with classification metrics (accuracy, precision, recall, F1-score, and AUC-ROC) and regression metrics (Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, and R²). The study also explores whether LLMs can generate concise summaries of oncology reports (HTML) to further curb information overload and inform clinical decision-making. The summaries were generated using pre-trained transformer models such as BART, T5, and Pegasus and evaluated using BLEU, ROUGE, and BERTScore. Additionally, performance was compared against recursive (summary of summaries) and direct summarization techniques, as well as outputs from conversational AI models (e.g., ChatGPT, Google NotebookLM).

OCLC Number

1521193846

Research Data and Supplementary Material

No
