College of Graduate Studies: Theses & Dissertations

Term of Award

Spring 2026

Degree Name

Master of Science, Computer Science (M.S.C.S.)

Document Type and Release Option

Thesis (open access)

Copyright Statement / License for Reuse

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Department of Computer Science

Committee Chair

Yao Xu

Committee Member 1

Lixin Li

Committee Member 2

Hong Zhang

Abstract

Flight delays pose persistent challenges to the efficiency and reliability of air transportation systems, affecting airlines, airports, regulators, and passengers alike. As air traffic demand grows and operational environments become increasingly interconnected, accurately predicting both departure and arrival delays has become crucial for effective planning and mitigation. This study presents a network-aware, airline-specific framework for predicting flight delays in the U.S. domestic air transportation system using tree-based ensemble machine learning models. A large-scale dataset of 1.98 million flights, enriched with weather information, is used to develop predictive models for both departure and arrival delays. To capture the structural and operational complexity of the aviation network, the study integrates temporal features, historical delay patterns, and network centrality measures derived from directed origin–destination graphs. Models are trained on both the full dataset and airline-specific subsets representing the five largest U.S. carriers, enabling a direct comparison between system-wide and airline-level modeling approaches. A novel structured feature selection framework based on mutual information and correlation is applied to reduce redundancy and improve model robustness. Experimental results show that tree-based ensemble methods, particularly Random Forest and Extra Trees, achieve the strongest performance across all datasets. Airline-specific models consistently outperform system-wide models, with improved accuracy, recall, and overall predictive stability. Feature importance analysis reveals that delay outcomes are driven primarily by seasonal patterns, schedule timing, and historical delay propagation, while the influence of network connectivity and weather conditions varies systematically across airlines. Overall, the findings highlight the importance of tailored, interpretable machine learning frameworks for flight delay prediction.
By combining predictive accuracy with operational insight, this study contributes to a more nuanced understanding of delay dynamics and offers practical implications for airline operations and air traffic management.
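The abstract mentions network centrality measures derived from directed origin–destination graphs but does not name the specific measures used. As an illustration only, the sketch below assumes simple normalized in-degree and out-degree centrality on an unweighted graph, where each airport is a node and each observed origin–destination route is one directed edge. The function name degree_centrality, the input format, and the example airports are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def degree_centrality(flights):
    """Hypothetical helper: normalized in/out-degree centrality per airport.

    flights: iterable of (origin, destination) airport-code pairs.
    Repeated pairs collapse into a single directed edge (unweighted graph),
    and self-loops are ignored.
    """
    # Deduplicate routes so each origin -> destination pair counts once.
    edges = {(o, d) for o, d in flights if o != d}
    nodes = {airport for edge in edges for airport in edge}
    n = len(nodes)

    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for origin, dest in edges:
        out_deg[origin] += 1
        in_deg[dest] += 1

    # Normalize by the maximum possible number of neighbors (n - 1).
    scale = 1.0 / (n - 1) if n > 1 else 0.0
    return {
        airport: {"in": in_deg[airport] * scale, "out": out_deg[airport] * scale}
        for airport in sorted(nodes)
    }


# Example with made-up flights; ATL -> ORD appears twice but counts once.
flights = [("ATL", "ORD"), ("ATL", "DFW"), ("ORD", "ATL"), ("DFW", "ORD"), ("ATL", "ORD")]
for airport, scores in degree_centrality(flights).items():
    print(airport, scores)
```

In this toy network ATL has out-degree centrality 1.0 because it sends flights to both other airports, while ORD has in-degree centrality 1.0 because it receives flights from both. Weighted variants (for example, counting flight frequency per route) or other measures such as betweenness or PageRank could be substituted; which ones the thesis uses is not stated in the abstract.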

Research Data and Supplementary Material

No