Principle Component Analysis for Feature Reduction and Data Preprocessing in Data Science
Document Type
Contribution to Book
Publication Date
2016
Publication Title
Proceedings of the Conference on Information Systems Applied Research
ISSN
2167-1508
Abstract
Medical datasets are large and complex. Because medical data contain so many variables, machine learning algorithms may fail to induce patterns from the data or may overfit the learned model to the data, thereby reducing the generalizability of the model. Feature reduction seeks to limit the number of input variables by identifying correlations among them and reducing the overall feature set to the minimum number of variables needed to describe the data. This research examines the effects of principal component analysis (PCA) for feature reduction when applied to decision trees. Results indicate that PCA may be employed to reduce the number of features; however, the results suffer minor degradation.
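The feature reduction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic (three features, two of them strongly correlated), the function names `center`, `covariance`, `top_component`, and `pca` are hypothetical, and the eigenvectors are found by simple power iteration with deflation rather than by a library routine. The decision tree step is omitted; in practice the reduced features would be passed to a classifier, and a library implementation such as scikit-learn's `PCA` would normally replace the hand-written code.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=500):
    """Power iteration: approximate the largest eigenpair of a symmetric matrix."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue estimate for unit vector v.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def pca(rows, k):
    """Project rows onto the first k principal components.

    Returns the projected rows and the share of total variance that each
    retained component explains.
    """
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_var = sum(cov[i][i] for i in range(d))
    components, ratios = [], []
    for _ in range(k):
        eigval, v = top_component(cov)
        components.append(v)
        ratios.append(eigval / total_var)
        # Deflation: remove the component just found before the next search.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return projected, ratios


# Synthetic data (an assumption for illustration): the second feature is
# almost a multiple of the first, the third is low-variance noise.
random.seed(0)
data = []
for _ in range(200):
    t = random.gauss(0.0, 1.0)
    data.append([t, 2.0 * t + random.gauss(0.0, 0.1), random.gauss(0.0, 0.1)])

reduced, ratios = pca(data, k=1)
print("features before:", len(data[0]), "after:", len(reduced[0]))
print("variance explained by the retained component:", round(ratios[0], 4))
```

Because two of the three synthetic features carry nearly the same information, a single component retains most of the variance. On real data the retained share is usually lower, which is consistent with the minor degradation the abstract reports when features are reduced.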
Recommended Citation
Wimmer, Hayden, and Loreen Powell. 2016. "Principle Component Analysis for Feature Reduction and Data Preprocessing in Data Science." Proceedings of the Conference on Information Systems Applied Research, 9 (4257): 1-6. Las Vegas, NV: Information Systems and Computing Academic Professionals.
source: http://proc.conisar.org/2016/pdf/4257.pdf
https://digitalcommons.georgiasouthern.edu/information-tech-facpubs/50