College of Graduate Studies: Theses & Dissertations
Term of Award
Spring 2026
Degree Name
Master of Science, Information Technology
Document Type and Release Option
Thesis (open access)
Copyright Statement / License for Reuse

This work is licensed under a Creative Commons Attribution 4.0 License.
Department
Department of Information Technology
Committee Chair
Atef Mohamed (Shalan)
Committee Member 1
Lei Chen
Committee Member 2
Hayden Wimmer
Abstract
Intensive Care Unit (ICU) patients do not follow a single uniform physiological pattern. Patients admitted with the same diagnosis show different clinical trajectories over time, making standardized classification and treatment approaches insufficient. The increasing availability of large-scale electronic health records in MIMIC-IV makes it possible to investigate such heterogeneity through a data-driven approach that captures how physiology evolves during the early phase of ICU admission. This thesis compares two analytical pipelines designed to identify physiological subtypes from the first 48 hours of ICU time-series data, and then assesses how well these subtypes predict in-hospital mortality. The first pipeline, referred to as Study 1, employs a compact temporal representation of 94 features derived from a selected set of patient vital signs and laboratory variables. The second pipeline, referred to as Study 2, expands this representation to 358 features by incorporating a broader set of physiological variables (15 vital signs and 22 laboratory variables) across the general ICU population, without restricting the cohort to a specific diagnosis.
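As an illustration of what a temporal feature representation can look like, the following is a minimal sketch that summarizes one variable's measurements within a 48-hour window. The function name `summarize_series`, the chosen statistics (mean, standard deviation, minimum, maximum, least-squares slope), and the lactate readings are assumptions for illustration only; the abstract does not specify how the thesis computes its 94 or 358 features.

```python
from statistics import mean, pstdev


def summarize_series(hours, values, window_hours=48.0):
    """Summarize one variable's measurements within the first `window_hours`.

    `hours` are times since ICU admission; `values` are the measurements.
    Hypothetical helper, not the thesis's actual feature extraction.
    """
    pts = [(t, v) for t, v in zip(hours, values) if 0.0 <= t <= window_hours]
    if not pts:
        return None  # no measurements in the window
    ts = [t for t, _ in pts]
    vs = [v for _, v in pts]
    t_bar, v_bar = mean(ts), mean(vs)
    # Least-squares slope (units per hour); 0.0 when all timestamps coincide.
    denom = sum((t - t_bar) ** 2 for t in ts)
    slope = sum((t - t_bar) * (v - v_bar) for t, v in pts) / denom if denom else 0.0
    return {
        "mean": v_bar,
        "std": pstdev(vs),  # one simple notion of "variability"
        "min": min(vs),
        "max": max(vs),
        "slope": slope,
    }


# Made-up lactate readings (mmol/L); the reading at hour 60 falls outside the window.
lactate = summarize_series([0, 12, 24, 36, 60], [4.0, 3.0, 2.0, 1.0, 9.0])
print(lactate)
```

Applying such a helper to every vital sign and laboratory variable, then concatenating the results, would yield one fixed-length feature vector per ICU stay.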
Both pipelines employ Principal Component Analysis (PCA) to reduce the dimensionality of the feature representation. The reduced representation is then clustered with K-Means (K = 3) and a Bayesian Gaussian Mixture Model (BGMM) to discover the underlying patient subtypes. Logistic Regression and XGBoost classifiers are then trained to predict in-hospital mortality risk.
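The unsupervised stage described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the data are synthetic rather than MIMIC-IV, and the standardization step, the number of principal components (5), and the BGMM upper bound of 6 components are all assumed values. Only K = 3 for K-Means comes from the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.mixture import BayesianGaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for a (patients x features) matrix: three shifted groups.
X = np.vstack(
    [rng.normal(loc=c, scale=1.0, size=(100, 20)) for c in (-3.0, 0.0, 3.0)]
)

# Standardize features so PCA is not dominated by variables with large units.
X_scaled = StandardScaler().fit_transform(X)

# Reduce dimensionality; 5 components is an assumed value.
X_pca = PCA(n_components=5, random_state=0).fit_transform(X_scaled)

# K-Means with K = 3, as stated in the abstract.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pca)

# For a BGMM, n_components is an upper bound; components the data do not
# support receive small mixture weights.
bgmm = BayesianGaussianMixture(n_components=6, max_iter=500, random_state=0)
bgmm_labels = bgmm.fit_predict(X_pca)

print(np.bincount(kmeans_labels))
print(np.round(bgmm.weights_, 3))
```

One design difference worth noting: K-Means requires the number of clusters up front, while the BGMM's weights indicate how many components the data actually support.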
Study 1 achieves strong performance, with an XGBoost AUROC of 0.85 and an AUPRC of 0.63, along with good calibration reflected by a low Brier score. Study 2 achieves slightly lower performance, with an XGBoost AUROC of 0.828 and an AUPRC of 0.443, but a significantly improved Brier score of 0.0785. Study 2 also introduces SHAP (SHapley Additive exPlanations) analysis of the top features, which identifies oxygen saturation (SpO2) variability, lactate slope, blood pH slope, creatinine, and lactate levels as the key predictors of in-hospital mortality. The identified subtypes show clear clinical separation, with the high-severity group having an in-hospital mortality rate of 21.5% compared to 5.4% in the more stable group.
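For readers less familiar with the three reported metrics, the sketch below computes them from their definitions on a tiny made-up example. The labels and probabilities are invented for illustration and are unrelated to the thesis results. AUPRC is computed here as average precision, which is one common estimator; the abstract does not say which estimator the thesis used. In practice, scikit-learn's `roc_auc_score`, `average_precision_score`, and `brier_score_loss` provide equivalent functionality.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Average of precision at each positive's rank (assumes no tied scores)."""
    ranked = sorted(zip(y_prob, y_true), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up outcomes (1 = died in hospital) and predicted risks.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.05, 0.10, 0.35, 0.40, 0.80, 0.20, 0.15, 0.60]

print(auroc(y_true, y_prob))
print(average_precision(y_true, y_prob))
print(brier_score(y_true, y_prob))
```

AUROC and AUPRC measure how well the model ranks patients by risk, whereas the Brier score also reflects calibration, so a model can rank slightly worse yet produce better-calibrated probabilities.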
Overall, this comparative analysis suggests that increasing feature complexity does not necessarily improve the predictive performance of the model. A smaller, carefully selected feature set can achieve better classification, while a larger feature set provides deeper clinical insights and supports interpretability.
OCLC Number
1588663753
Catalog Permalink
https://galileo-georgiasouthern.primo.exlibrisgroup.com/permalink/01GALI_GASOUTH/c9nn09/alma9916659742002950
Recommended Citation
Khadka, Rojeena, "Early ICU Physiological Subtyping From Time-Series Data: A Comparative Study of Feature Complexity, Predictive Performance, and Model Interpretability" (2026). College of Graduate Studies: Theses & Dissertations. 3134.
https://digitalcommons.georgiasouthern.edu/etd/3134
Research Data and Supplementary Material
No
Included in
Data Science Commons, Health Information Technology Commons, Theory and Algorithms Commons