Term of Award

Spring 2019

Degree Name

Master of Science, Computer Science (M.S.C.S.)

Document Type and Release Option

Thesis (open access)

Copyright Statement / License for Reuse

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


Department of Computer Science

Committee Chair

Ashraf Saad

Committee Member 1

Hong Zhang

Committee Member 2

Amar Rasheed

Committee Member 3

Murali Medidi

Conventional classification of data involves a training set and a test set. Classifiers such as Naïve Bayes, Artificial Neural Networks, and Support Vector Machines each use the whole training set to train themselves. This thesis explores whether a condensed form of the training set can achieve comparable classification accuracy. The technique uses a clustering algorithm to determine which data records can be labeled as exemplars, that is, records that capture the qualities of multiple records. For example, is it possible to compress, say, 50 records into a single record? Can that single record represent all 50 records and train a classifier similarly? The thesis examines what qualifies a data record as an exemplar, which concepts extract the qualities of a dataset, and how to compare the information gain of one set of compressed data against another. It covers Affinity Propagation, categorical data, entropy within cluster sets, and testing the compressed data using Cosine Similarity as a classifier.
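The two measurable pieces named in the abstract, entropy within a cluster and classification by Cosine Similarity against exemplars, can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes the exemplars have already been selected (for instance by Affinity Propagation), that categorical records are one-hot encoded, and that an unseen record takes the label of its most similar exemplar. All function names and the toy weather-style data are hypothetical.

```python
import math
from collections import Counter


def one_hot(record, vocab):
    """Encode a categorical record as a 0/1 vector over (column, value) pairs."""
    present = set(enumerate(record))
    return [1.0 if key in present else 0.0 for key in vocab]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_entropy(labels):
    """Shannon entropy, in bits, of the class labels inside one cluster."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def classify(record, exemplars, vocab):
    """Return the label of the exemplar most cosine-similar to the record."""
    vec = one_hot(record, vocab)
    best = max(exemplars, key=lambda ex: cosine(vec, one_hot(ex[0], vocab)))
    return best[1]


# Hypothetical exemplars: (categorical record, class label). In the setting the
# abstract describes, these would come from the clustering step.
exemplars = [
    (("sunny", "hot", "high"), "no"),
    (("rainy", "cool", "normal"), "yes"),
]

# Vocabulary of (column index, value) pairs seen in the exemplars.
vocab = sorted({(i, v) for rec, _ in exemplars for i, v in enumerate(rec)})

predicted = classify(("sunny", "hot", "normal"), exemplars, vocab)
mixed_cluster_entropy = label_entropy(["yes", "yes", "no", "no"])
```

A cluster whose members all share one class label has zero entropy, while an evenly mixed two-class cluster has one bit. Under these assumptions, lower entropy suggests that a single exemplar stands in for its cluster with less loss of class information.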

Research Data and Supplementary Material