Predictors of Diabetes Among Adult Pima Indian Women: A Multivariable and Stepwise Logistic Regression Analysis
Faculty Mentor
Dr Aliu Ibrahim
Location
Russell Union Ballroom
Type of Research
Completed
Session Format
Poster Presentation
College
Jiann-Ping Hsu College of Public Health
Department
Epidemiology and Biostatistics
Abstract
Introduction: Globally, diabetes affects an estimated 589 million adults and causes approximately 3.4 million deaths annually, roughly one death every nine seconds, with type 2 diabetes accounting for most cases. The Pima Indians of Arizona have one of the highest documented prevalence rates of type 2 diabetes in the world, making this population central to diabetes research. Although the Pima Diabetes Dataset has been widely used in predictive and machine-learning studies, few investigations have applied rigorous multivariable epidemiologic methods to quantify the independent contributions of key demographic and metabolic risk factors. This study aimed to characterize diabetes risk factors and quantify independent predictors among Pima Indian women using multivariable and stepwise logistic regression.
Methods: A cross-sectional analysis was conducted using data from 768 adult Pima Indian women in the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Pima Diabetes Dataset. After data cleaning, descriptive statistics, correlation analyses, and multicollinearity diagnostics were performed. A full multivariable logistic regression model including eight predictors was fitted. Stepwise logistic regression (SLENTRY = 0.15; SLSTAY = 0.15) was then applied to derive a parsimonious model. Model performance was evaluated using the Akaike Information Criterion (AIC), Hosmer–Lemeshow goodness-of-fit testing, and the area under the receiver operating characteristic curve (AUC).
Results: Five variables were retained in the final stepwise model: glucose, BMI, diabetes pedigree function, age, and pregnancies. Glucose, BMI, and diabetes pedigree function were independently associated with diabetes, with glucose showing the strongest association (aOR = 1.037; 95% CI: 1.027–1.047). The reduced model demonstrated excellent discrimination (AUC = 0.863), comparable to the full multivariable model (AUC = 0.862), while achieving improved model parsimony (AIC = 356.9).
Conclusion: The study shows that glucose, BMI, and family history (captured by the diabetes pedigree function) are the strongest predictors of diabetes, and that a reduced model can effectively support targeted screening in high-risk populations.
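The glucose odds ratio above can look small because it is presumably expressed per single unit of glucose (an assumption; the abstract does not state the scale). As a minimal sketch, and not the authors' analysis code, the Python snippet below back-calculates the log-odds coefficient and an approximate standard error from the published aOR and 95% CI, then rescales them to a hypothetical 10-unit increase. The function names, the 10-unit increment, and the assumption that the interval is a Wald interval are illustrative choices, not details from the study.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959963984540054


def coef_from_odds_ratio(odds_ratio, ci_low, ci_high):
    """Recover the log-odds coefficient and its standard error.

    Assumes the published interval is a Wald 95% CI, i.e. symmetric
    around the coefficient on the log-odds scale.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def rescale_odds_ratio(beta, se, increment):
    """Odds ratio and Wald 95% CI for an `increment`-unit increase."""
    point = math.exp(beta * increment)
    low = math.exp((beta - Z_95 * se) * increment)
    high = math.exp((beta + Z_95 * se) * increment)
    return point, low, high


# Values reported in the abstract for glucose.
beta, se = coef_from_odds_ratio(1.037, 1.027, 1.047)

# Hypothetical 10-unit increment, chosen only for illustration.
or10, low10, high10 = rescale_odds_ratio(beta, se, increment=10)
print(f"OR per 10 units: {or10:.2f} (95% CI {low10:.2f} to {high10:.2f})")
```

Under these assumptions, a 10-unit difference in glucose corresponds to an adjusted odds ratio of about 1.44, which may be easier to interpret for screening purposes than the per-unit figure. Because the published values are rounded, the recovered standard error is only approximate.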
Start Date
4-23-2026 10:00 AM
End Date
4-23-2026 12:00 PM
Recommended Citation
Andrew, Caroline and Aliu, Ibrahim, "Predictors of Diabetes Among Adult Pima Indian Women: A Multivariable and Stepwise Logistic Regression Analysis" (2026). GS4 Student Scholars Symposium. 80.
https://digitalcommons.georgiasouthern.edu/research_symposium/2026/2026/80