Hot Deck Imputation for Mixed Typed Datasets using Model Based Clustering


Multiple imputation is a commonly used method for addressing missing values. Hot deck imputation differs from other methods in that it replaces unobserved values with observed values from similar units, or cells, which helps keep the estimated variance of the regression coefficients close to the true variance. These cells are determined by the closeness of the observations under various distance measures. However, most distance measures can only be applied to continuous variables, which poses a distinct problem when the dataset contains categorical covariates. We propose a model-based clustering procedure that uses a parsimonious covariance structure for the latent variable, which follows a mixture of Gaussian distributions, to generate the imputation cells for mixed-type datasets (i.e., datasets with both continuous and categorical variables). Results on simulated data demonstrated lower variance in the estimated regression coefficients than the complete-case analysis.
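As a minimal illustration of the within-cell replacement step described above, the following Python sketch fills each missing value with a randomly drawn observed value from the same cell. It assumes the cell labels are already available (for example, from a clustering step); the function name `hot_deck_impute` and the toy data are hypothetical and not taken from the presentation, and the model-based clustering procedure itself is not shown.

```python
import numpy as np

def hot_deck_impute(values, cells, rng=None):
    """Fill NaNs in `values` with a random observed value (donor) from the same cell.

    `cells` holds one label per observation; how the labels are produced
    (e.g. by a clustering step) is assumed and outside this sketch.
    """
    rng = np.random.default_rng(rng)
    values = np.asarray(values, dtype=float)
    cells = np.asarray(cells)
    imputed = values.copy()
    for cell in np.unique(cells):
        in_cell = cells == cell
        missing = in_cell & np.isnan(values)
        donors = values[in_cell & ~np.isnan(values)]
        # Cells with no observed donors are left unchanged.
        if missing.any() and donors.size > 0:
            imputed[missing] = rng.choice(donors, size=missing.sum(), replace=True)
    return imputed

# Toy data: two cells, each with one missing value.
y = np.array([1.0, np.nan, 1.2, 5.0, np.nan, 5.3])
cells = np.array([0, 0, 0, 1, 1, 1])
y_imputed = hot_deck_impute(y, cells, rng=0)
```

Because donors are drawn only from within a cell, each imputed value is an actually observed value from a similar unit; repeating the draw with different seeds would give the multiple completed datasets used in multiple imputation.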


Eastern North American Region International Biometric Society Spring Meeting (ENAR)


Washington, DC