Open vs. Close Source Decision Tree Algorithms: Comparing Performance Measures of Accuracy, Sensitivity and Specificity

Publication Title

Proceedings of the CONISAR




Data Science research is trending due to the abundance of publicly available data and of open source and closed source (proprietary) tools. Currently, abundant research exists on various data science techniques and tools and on the mining of medical data and big data. However, little to no research actually compares closed and open source algorithms. This research compared the performance of a closed source algorithm (Microsoft Decision Tree) with that of open source algorithms (CART and C4.5) on accuracy, sensitivity, and specificity, using data from the U.S. government’s Surveillance, Epidemiology, and End Results (SEER) Program. The data were downloaded, converted from raw data to structured data using a custom-designed Python script, and transformed by removing missing data, irrelevant data, and outliers. Predictive modeling results for accuracy, sensitivity, and specificity indicated that the closed source algorithm had the best accuracy and specificity.
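For readers unfamiliar with the three performance measures, the following is a minimal sketch of how accuracy, sensitivity, and specificity are conventionally computed from a binary classifier's predictions. It is not the authors' evaluation code: the function names `confusion_counts` and `performance_measures` are hypothetical, and the labels in the usage example are made-up illustrative values, not SEER data or results from this study.

```python
# Illustrative sketch only: standard definitions of the three measures.
# Function names and the sample labels are hypothetical, not from the paper.

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


def performance_measures(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / total if total else nan,
        # Share of actual positives that were detected (true positive rate).
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Share of actual negatives that were rejected (true negative rate).
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


if __name__ == "__main__":
    # Made-up labels for illustration (1 = positive class, 0 = negative class).
    actual = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 0, 1, 0, 0, 1, 0, 1]
    print(performance_measures(actual, predicted))
```

Applying the same function to the predictions of each decision tree model on a shared held-out test set would give directly comparable figures, which is the kind of comparison the abstract describes.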