Principle Component Analysis for Feature Reduction and Data Preprocessing in Data Science
Proceedings of the Conference on Information Systems Applied Research
Medical datasets are large and complex. Because medical data contain so many variables, machine learning algorithms may fail to induce patterns from the data, or may overfit the learned model to the data, thereby reducing the generalizability of the model. Feature reduction seeks to limit the number of input variables by establishing correlations between variables and reducing the overall feature set to the minimum number of variables needed to describe the data. This research examines the effects of principal component analysis (PCA) for feature reduction when applied to decision trees. Results indicate that PCA may be employed to reduce the number of features; however, the results suffer minor degradation.
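The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' experimental setup: the use of scikit-learn, its bundled breast cancer dataset, the 95% explained-variance threshold, and 5-fold cross-validation are all assumptions chosen for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumed stand-in for a medical dataset: 569 samples, 30 numeric features.
X, y = load_breast_cancer(return_X_y=True)

# Baseline: a decision tree trained on the full feature set.
baseline = DecisionTreeClassifier(random_state=0)

# Reduced: standardize, keep the fewest principal components that explain
# at least 95% of the variance (assumed threshold), then train the tree.
reduced = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
    DecisionTreeClassifier(random_state=0),
)

# Mean accuracy over 5 cross-validation folds for each model.
baseline_acc = cross_val_score(baseline, X, y, cv=5).mean()
reduced_acc = cross_val_score(reduced, X, y, cv=5).mean()

# Fit once on all data to inspect how many components PCA retained.
reduced.fit(X, y)
n_components = reduced.named_steps["pca"].n_components_

print(f"original features: {X.shape[1]}")
print(f"retained components: {n_components}")
print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"PCA + tree accuracy: {reduced_acc:.3f}")
```

Placing the scaler and PCA inside the pipeline means both are refit on each training fold, so the held-out fold does not leak into the projection. Whether accuracy drops, and by how much, depends on the dataset and the variance threshold.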
Wimmer, Hayden, and Loreen Powell.
"Principle Component Analysis for Feature Reduction and Data Preprocessing in Data Science."
Proceedings of the Conference on Information Systems Applied Research, 9 (4257): 1-6. Las Vegas, NV: Information Systems and Computing Academic Professionals.