Big Cyber Security Data Analysis with Apache Mahout

Document Type

Conference Proceeding

Publication Date


Publication Title

IEEE/ACIS 20th International Conference on Software Engineering Research, Management and Applications (SERA) Proceedings




Machine learning classifiers are well-established algorithms for network intrusion detection. Due to the drastic growth of data, new tools are required to handle such large volumes of data within a short time frame. In this paper, we present a model that uses the Apache Mahout framework to train the machine learning classifiers Random Forest (RF), Logistic Regression (LR), and Naïve Bayes (NB) on the CSE-CIC-IDS2018 dataset, using the Chi-Square and ANOVA F-test filter-based feature selection techniques on an Apache Hadoop framework. The performance of the classifiers is measured in terms of Accuracy, Kappa, Precision, Recall, and F1-Score for a comparative analysis of the various machine learning classifiers.
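
For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of how Accuracy, Cohen's Kappa, Precision, Recall, and F1-Score can be derived from a binary confusion matrix (attack vs. benign). It is not the authors' Mahout implementation. The function name `binary_metrics`, the `BinaryMetrics` container, and the counts in the usage line are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    kappa: float
    precision: float
    recall: float
    f1: float


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> BinaryMetrics:
    """Compute the five measures from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for the positive (e.g. "attack") class.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement between predictions and labels.
    accuracy = (tp + tn) / total

    # Agreement expected by chance, from the row and column marginals.
    chance_pos = ((tp + fp) / total) * ((tp + fn) / total)
    chance_neg = ((tn + fn) / total) * ((tn + fp) / total)
    expected = chance_pos + chance_neg

    # Cohen's Kappa: accuracy corrected for chance agreement.
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return BinaryMetrics(accuracy, kappa, precision, recall, f1)


# Hypothetical counts, not results from the paper.
example = binary_metrics(tp=40, fp=10, fn=20, tn=30)
```

Kappa is reported alongside Accuracy because intrusion detection datasets are often class-imbalanced, and Kappa discounts the agreement a classifier would reach by chance alone.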


Georgia Southern University faculty members Hayden Wimmer and Jongyeop Kim co-authored Big Cyber Security Data Analysis with Apache Mahout.