Term of Award

Fall 2022

Degree Name

Master of Science, Information Technology

Document Type and Release Option

Thesis (open access)

Copyright Statement / License for Reuse

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Department of Information Technology

Committee Chair

Hayden Wimmer

Committee Member 1

Atef Mohamed

Committee Member 2

Dejarvis Oliver

Abstract

Machine learning and cloud computing have become staples for businesses and educational institutions in recent years. As two forefronts of big data solutions, they have driven technology giants to race for the superior implementation of both. The objective of this thesis is to test and use AWS SageMaker in three applications: time-series forecasting with sentiment analysis, automated machine learning (AutoML), and anomaly detection.

The first study is a sentiment-based long short-term memory (LSTM) network for stock price prediction. The LSTM was built with two methods: SQL Server Data Tools, and an implementation using the Keras library. The models were evaluated using accuracy, precision, recall, F1 score, mean absolute error (MAE), root mean squared error (RMSE), and symmetric mean absolute percentage error (SMAPE). All of the sentiment models outperformed the control LSTM. The public model for Facebook on SQL Server Data Tools performed best overall, with 0.9743 accuracy and 0.9940 precision.

The second study is an application of AWS SageMaker Autopilot, an AutoML platform designed to make machine learning more accessible to those without programming backgrounds. The methodology follows the application of AWS Data Wrangler and Autopilot from the beginning of the process to completion. The results were evaluated using accuracy, precision, recall, and F1 score. The LightGBM model on the AI4I Maintenance dataset achieved the best accuracy, 0.983, and also scored best on precision, recall, and F1 score.

The final study is an anomaly detection system for cyber security intrusion detection system (IDS) data.
Rule-based intrusion detection systems catch most of the cyber threats prevalent in network traffic; however, the copious alerts they generate are nearly impossible for human analysts to keep up with. The methodology follows a typical taxonomy of data collection, data processing, model creation, and model evaluation. Both Random Cut Forest and XGBoost are implemented using AWS SageMaker. The supervised learning algorithm XGBoost achieved the highest accuracy of all models, with Model 2 reaching an accuracy of 0.6183, a precision of 0.5902, a recall of 0.9649, and an F1 score of 0.7324.
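The evaluation metrics named in the abstract can be made concrete with a short sketch. The Python below is an illustration, not the thesis's own code: the function names are hypothetical, and the SMAPE variant (mean of |forecast - actual| divided by the average of |actual| and |forecast|, in percent) is an assumption, since several SMAPE definitions are in use. The final line checks that the reported F1 score for XGBoost Model 2 follows from its reported precision and recall.

```python
import math

# Illustrative metric helpers; names are hypothetical, not from the thesis code.


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def smape(actual, predicted):
    """SMAPE in percent, ASSUMING the (|a| + |p|) / 2 denominator variant."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Treat a 0/0 term (both values zero) as a perfect forecast.
        total += 0.0 if denom == 0 else abs(p - a) / denom
    return 100 * total / len(actual)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1 with respect to the positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Consistency check on the figures reported for XGBoost Model 2.
print(round(f1_from(0.5902, 0.9649), 4))  # 0.7324
```

With toy inputs actual = [100, 102, 101] and predicted = [98, 103, 101], mae returns 1.0 and rmse returns the square root of 5/3 (about 1.291).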

OCLC Number

1361745661

Research Data and Supplementary Material

No
