Predicting Breast Cancer Diagnosis Using Logistic Regression

Faculty Mentor

Dr. Ibrahim Alliu

Location

Russell Union Ballroom

Type of Research

On-going

Session Format

Poster Presentation

College

Jiann-Ping Hsu College of Public Health

Department

Epidemiology Biostatistics and Environmental Sciences

Abstract

Early detection of breast cancer remains a cornerstone of improving patient survival, yet accurately distinguishing malignant from benign tumors continues to challenge clinicians and data scientists alike. This study develops and evaluates a logistic regression model that predicts breast cancer status from key tumor characteristics, including mean radius, texture, perimeter, area, and smoothness, derived from a well-established clinical dataset. Using stepwise model selection, goodness-of-fit diagnostics, variance inflation factor (VIF) assessment, and effect visualization, the analysis arrives at a final model that retains the most influential predictors while addressing multicollinearity.
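Several of the listed predictors are geometrically related (perimeter and area both grow with radius), which is why a VIF check matters here. The following is a minimal sketch of that idea, not the study's code: it uses the two-predictor special case, where VIF reduces to 1 / (1 - r^2), and every number in it is hypothetical rather than taken from the study's dataset. The function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x, y):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

# Hypothetical values for illustration only (not the study's data).
radius = [12.0, 14.5, 11.2, 17.9, 20.1, 13.3]
offsets = [0.8, -0.5, 0.3, -0.9, 0.6, -0.2]
# Perimeter is assumed to be close to 2 * pi * radius, with small deviations.
perimeter = [2 * math.pi * r + d for r, d in zip(radius, offsets)]
texture = [17.2, 21.0, 19.4, 15.8, 20.3, 18.1]

vif_radius_perimeter = vif_two_predictors(radius, perimeter)
vif_radius_texture = vif_two_predictors(radius, texture)

print(f"VIF (radius vs. perimeter): {vif_radius_perimeter:.1f}")
print(f"VIF (radius vs. texture):   {vif_radius_texture:.2f}")
```

In this toy data the radius and perimeter pair produces a very large VIF while the radius and texture pair stays near 1. A commonly cited rule of thumb treats VIF values above roughly 5 to 10 as a sign that one of the correlated predictors may need to be dropped or combined.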

The model demonstrates strong explanatory power, with a McFadden’s pseudo R^2 of 0.76, and strong predictive performance, with accuracy exceeding 93% on both the training and testing datasets. ROC curve analysis yields an AUC of 0.99, indicating near-perfect discrimination between malignant and non-malignant cases. Confusion matrix assessments show balanced sensitivity and specificity, underscoring the model’s clinical relevance for early detection.
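For readers less familiar with these measures, the sketch below shows how each one can be computed from observed labels and predicted probabilities. It is illustrative only: the labels and probabilities are hypothetical, the 0.5 classification threshold is an assumption, and the function names are not from the study.

```python
import math

def log_likelihood(y_true, p_hat):
    """Bernoulli log-likelihood of labels given predicted probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(y_true, p_hat))

def mcfadden_r2(y_true, p_hat):
    """McFadden's pseudo R^2: 1 - LL(model) / LL(intercept-only model)."""
    base_rate = sum(y_true) / len(y_true)
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    return 1.0 - log_likelihood(y_true, p_hat) / ll_null

def auc(y_true, p_hat):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(y_true, p_hat, threshold=0.5):
    """Sensitivity and specificity from the confusion matrix at a threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, p_hat):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels (1 = malignant) and predicted probabilities.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p = [0.95, 0.90, 0.80, 0.40, 0.60, 0.20, 0.10, 0.05, 0.15, 0.30]

r2 = mcfadden_r2(y, p)
auc_value = auc(y, p)
sens, spec = sensitivity_specificity(y, p)

print(f"McFadden R^2: {r2:.3f}")
print(f"AUC:          {auc_value:.3f}")
print(f"Sensitivity:  {sens:.3f}  Specificity: {spec:.3f}")
```

The pairwise AUC shown here is quadratic in the sample size, which is fine for a small illustration; a rank-based computation would be the usual choice for a larger dataset.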

By integrating statistical rigor with interpretable modeling, this research provides a transparent and highly accurate framework for breast cancer prediction. The findings reinforce the potential of classical statistical models, when carefully optimized, to support precision diagnostics and strengthen decision-making in oncological practice.

Start Date

4-23-2026 2:00 PM

End Date

4-23-2026 4:00 PM
