Explainable Machine Learning Analysis of Malaria–Anemia Co-Occurrence among Under-Five Children Using Nationally Representative Survey Data
Faculty Mentor
Dr. Jing Kersey
Location
Russell Union Ballroom
Type of Research
Ongoing
Session Format
Poster Presentation
College
Jiann-Ping Hsu College of Public Health
Department
Biostatistics, Epidemiology, and Environmental Health Sciences
Abstract
Background: Malaria and anemia remain major contributors to morbidity and mortality among under-five children in sub-Saharan Africa. Although the association between malaria infection and anemia is well documented, most studies examine the two conditions independently, using traditional regression approaches that assume linear relationships and accommodate few interaction effects. The joint occurrence of malaria and anemia likely reflects complex, nonlinear interactions among nutritional deprivation, household environment, and socioeconomic vulnerability. This study applies explainable machine learning methods to characterize the co-burden of malaria and anemia and to identify high-risk household profiles.
Methods: We analyzed nationally representative data from the Nigeria Malaria Indicator Survey, including laboratory-confirmed malaria test results and hemoglobin-based anemia classification among under-five children. The primary outcome was a four-category co-occurrence variable: (1) neither malaria nor anemia, (2) malaria only, (3) anemia only, and (4) both malaria and anemia. Predictor variables included severe food insecurity probability, water and sanitation indicators, housing materials, indoor air pollution exposure, household crowding, wealth index, and geographic region. After preprocessing (imputation, categorical encoding, and feature scaling), class imbalance was addressed using synthetic minority oversampling. Five classifiers (K-Nearest Neighbors, Logistic Regression, Support Vector Machine, Random Forest, and XGBoost) were trained using a 70:30 stratified train-test split. Model performance was evaluated using accuracy, precision, recall, F1-score, and multiclass area under the curve (AUC). SHapley Additive exPlanations (SHAP) were used to assess global feature importance and nonlinear interactions.
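As an illustration only, the four-category outcome and the 70:30 stratified split described in the Methods could be sketched as follows. This is a minimal sketch, not the authors' code: the function names, the example records, and the random seed are hypothetical, and the 11.0 g/dL hemoglobin cutoff is an assumption (a commonly used threshold for young children) because the abstract does not state the cutoff applied.

```python
import random
from collections import defaultdict

# Category codes follow the four-level outcome listed in the Methods.
NEITHER, MALARIA_ONLY, ANEMIA_ONLY, BOTH = 1, 2, 3, 4


def co_occurrence(malaria_positive, hemoglobin_g_dl, anemia_cutoff=11.0):
    """Return the four-category outcome for one child.

    anemia_cutoff (g/dL) is an assumed threshold, not taken from the abstract.
    """
    anemic = hemoglobin_g_dl < anemia_cutoff
    if malaria_positive and anemic:
        return BOTH
    if malaria_positive:
        return MALARIA_ONLY
    if anemic:
        return ANEMIA_ONLY
    return NEITHER


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Split row indices so each class sends about test_fraction to the test set."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical records: (malaria test positive, hemoglobin in g/dL).
children = [(False, 12.1), (True, 11.4), (False, 9.8), (True, 8.7)] * 10
labels = [co_occurrence(malaria, hb) for malaria, hb in children]
train_idx, test_idx = stratified_split(labels)
```

Stratifying by the outcome keeps the rarer categories represented in both partitions, which matters when the classes are imbalanced. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would typically replace the hand-written helper.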
Expected Findings and Implications: We hypothesize that severe food insecurity, poor housing quality, unimproved sanitation, indoor biomass fuel use, and geographic clustering will emerge as key drivers of the malaria–anemia co-burden. By identifying nonlinear interaction patterns and high-risk household profiles, this study aims to inform integrated, targeted interventions addressing infectious and nutritional vulnerabilities in under-five populations.
Start Date
4-23-2026 2:00 PM
End Date
4-23-2026 4:00 PM
Recommended Citation
Asifat, Olamide; Soladoye, Afeez; Alliu, Ibrahim; Adebile, Tolulope; Azu, Emmanuel; Das, Keya; and Kersey, Jing, "Explainable Machine Learning Analysis of Malaria–Anemia Co-Occurrence among Under-Five Children Using Nationally Representative Survey Data" (2026). GS4 Student Scholars Symposium. 199.
https://digitalcommons.georgiasouthern.edu/research_symposium/2026/2026/199