Explainable Machine Learning Analysis of Malaria–Anemia Co-Occurrence among Under-Five Children Using Nationally Representative Survey Data

Faculty Mentor

Dr. Jing Kersey

Location

Russell Union Ballroom

Type of Research

On-going

Session Format

Poster Presentation

College

Jiann-Ping Hsu College of Public Health

Department

Biostatistics, Epidemiology, and Environmental Health Sciences

Abstract

Background: Malaria and anemia remain major contributors to morbidity and mortality among under-five children in sub-Saharan Africa. Although the association between malaria infection and anemia is well documented, most studies examine the two conditions independently, using traditional regression approaches that assume linear relationships and accommodate only limited interaction effects. The joint occurrence of malaria and anemia likely reflects complex, nonlinear interactions among nutritional deprivation, household environment, and socioeconomic vulnerability. This study applies explainable machine learning methods to characterize the co-burden of malaria and anemia and to identify high-risk household profiles.

Methods: We analyzed nationally representative data from the Nigeria Malaria Indicator Survey, including laboratory-confirmed malaria test results and hemoglobin-based anemia classification among under-five children. The primary outcome was a four-category co-occurrence variable: (1) neither malaria nor anemia, (2) malaria only, (3) anemia only, and (4) both malaria and anemia. Predictor variables included severe food insecurity probability, water and sanitation indicators, housing materials, indoor air pollution exposure, household crowding, wealth index, and geographic region. After preprocessing (imputation, categorical encoding, and feature scaling), class imbalance was addressed using synthetic minority oversampling. Five classifiers (K-Nearest Neighbors, Logistic Regression, Support Vector Machine, Random Forest, and XGBoost) were trained using a 70:30 stratified train-test split. Model performance was evaluated using accuracy, precision, recall, F1-score, and multiclass area under the receiver operating characteristic curve (AUC). SHapley Additive exPlanations (SHAP) were used to assess global feature importance and nonlinear interactions.
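The outcome coding, the 70:30 stratified split, and a macro-averaged F1-score described above can be sketched in plain Python. This is a minimal illustration, not the study's code: the function names (co_occurrence, stratified_split, macro_f1), the random seed, and the toy records are hypothetical, and the two binary inputs are assumed to come from the survey's malaria test result and anemia classification.

```python
import random
from collections import defaultdict

def co_occurrence(malaria: bool, anemia: bool) -> int:
    """Map two binary indicators to the four-category outcome (1-4)."""
    if malaria and anemia:
        return 4  # both malaria and anemia
    if malaria:
        return 2  # malaria only
    if anemia:
        return 3  # anemia only
    return 1      # neither

def stratified_split(labels, test_fraction=0.30, seed=0):
    """Return (train_idx, test_idx), splitting each class roughly 70:30."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)  # seed is an arbitrary illustrative choice
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy records (hypothetical, not survey data): (malaria_positive, anemic)
records = (
    [(False, False)] * 40 + [(True, False)] * 20
    + [(False, True)] * 30 + [(True, True)] * 10
)
y = [co_occurrence(m, a) for m, a in records]
train_idx, test_idx = stratified_split(y)
```

In a full analysis, library routines such as scikit-learn's train_test_split with the stratify argument and f1_score with average="macro" would normally replace these helpers. Oversampling is usually applied to the training portion only, so that the held-out 30% keeps the original class distribution.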

Expected Findings and Implications: We hypothesize that severe food insecurity, poor housing quality, unimproved sanitation, indoor biomass fuel use, and geographic clustering will emerge as key drivers of the malaria–anemia co-burden. By identifying nonlinear interaction patterns and high-risk household profiles, this study aims to inform integrated, targeted interventions that address both infectious and nutritional vulnerabilities in under-five populations.


Start Date

4-23-2026 2:00 PM

End Date

4-23-2026 4:00 PM

