2022 Symposium Archive

Early-Stage Diabetes Prediction using Apache Spark and MLlib

Chukwudi Nwachukwu, Georgia Southern University

Faculty Mentor

Dr. Hayden Wimmer

Location

Poster 210

Session Format

Poster Presentation

Academic Unit

Department of Information Technology

Background

We develop a diabetes prediction system on Apache Spark.
We store our dataset in the Hadoop Distributed File System (HDFS) and load it into Spark from there. We chose PySpark so that we can write Spark commands in Python.
We use the ‘Early-stage diabetes risk prediction dataset’ from the UCI Machine Learning Repository.
To develop our prediction models, we use four machine learning algorithms: Decision Trees, Random Forest, Gradient Boosted Trees, and Naïve Bayes.
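
The workflow described above (load the dataset from HDFS, then train the four classifiers) could be sketched in PySpark roughly as follows. This is a minimal illustration, not the authors' code: the HDFS path, the label column name `class`, the 80/20 split, the seed, and the helper names `split_columns`, `build_classifiers`, and `evaluate_all` are all assumptions made for the example. The Spark imports sit inside the functions so that the column-splitting helper can be used without a Spark installation.

```python
# Hypothetical HDFS location; the source does not state the real path.
HDFS_PATH = "hdfs:///data/diabetes_data_upload.csv"
# Assumed name of the target column in the CSV file.
LABEL_COL = "class"


def split_columns(dtypes, label_col):
    """Split (name, type) pairs, as returned by DataFrame.dtypes,
    into categorical (string) and numeric feature columns."""
    categorical = [c for c, t in dtypes if t == "string" and c != label_col]
    numeric = [c for c, t in dtypes if t != "string" and c != label_col]
    return categorical, numeric


def build_classifiers():
    """Return the four MLlib classifiers named in the abstract."""
    from pyspark.ml.classification import (
        DecisionTreeClassifier,
        GBTClassifier,
        NaiveBayes,
        RandomForestClassifier,
    )

    common = {"labelCol": "label", "featuresCol": "features"}
    return {
        "Decision Tree": DecisionTreeClassifier(**common),
        "Random Forest": RandomForestClassifier(**common),
        "Gradient Boosted Trees": GBTClassifier(**common),
        "Naive Bayes": NaiveBayes(**common),
    }


def evaluate_all(spark, path=HDFS_PATH, seed=42):
    """Train each classifier and return its test-set accuracy by name."""
    from pyspark.ml import Pipeline
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    # Read the CSV from HDFS, letting Spark infer column types.
    df = spark.read.csv(path, header=True, inferSchema=True)
    categorical, numeric = split_columns(df.dtypes, LABEL_COL)

    # Encode string-valued columns as numeric indices.
    indexers = [
        StringIndexer(inputCol=c, outputCol=c + "_idx", handleInvalid="keep")
        for c in categorical
    ]
    label_indexer = StringIndexer(inputCol=LABEL_COL, outputCol="label")
    assembler = VectorAssembler(
        inputCols=[c + "_idx" for c in categorical] + numeric,
        outputCol="features",
    )

    # Assumed 80/20 train/test split; the source does not give the ratio.
    train, test = df.randomSplit([0.8, 0.2], seed=seed)
    evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy"
    )

    scores = {}
    for name, clf in build_classifiers().items():
        stages = indexers + [label_indexer, assembler, clf]
        model = Pipeline(stages=stages).fit(train)
        scores[name] = evaluator.evaluate(model.transform(test))
    return scores
```

With a running cluster, this might be called as `evaluate_all(SparkSession.builder.appName("diabetes").getOrCreate())`, where `SparkSession` comes from `pyspark.sql`. Accuracy is used here only as an example metric; the source does not say how the models were evaluated.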

Keywords

Allen E. Paulson College of Engineering and Computing Student Research Symposium, Hadoop Distributed File System, HDFS

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Presentation Type and Release Option

Presentation (File Not Available for Download)

Start Date

January 1, 2022, 12:00 AM

Early-Stage Diabetes Prediction using Apache Spark and MLlib

The rapid growth in the volume of data being generated has led to larger and more complex datasets compiled from multiple sources.
Conventional data processing technology cannot manage the size and complexity of these datasets. This has driven the need for big data processing tools that can handle such workloads more efficiently.
