Property Preservation in Reduction of Data Volume for Mining: A Neighborhood System Approach


This presentation was given at the Third International Conference on Data Analytics (DATA ANALYTICS 2014).

The sheer volume of very large datasets is the major obstacle to mining them, because their size exceeds the handling capabilities of traditional methodologies. Overcoming this problem requires a considerable vertical reduction beyond the reduction already achieved by pre-mining processes. However, the reduced version of the dataset must preserve the intrinsic properties of the original dataset with respect to a specific mining goal (a robust reduction). Otherwise, the reduction is useless. This research effort introduces and investigates the neighborhood system as a robust data volume reduction methodology for the mining goal of “prediction”. Two well-known prediction algorithms, ID3 and Rough Sets, are used to determine whether the intrinsic properties are preserved in the reduced datasets. The results obtained from 10 pairs of training and test sets show that the proposed reduction methodology is robust. They also show that it reduces noise in the data, which in turn improves the prediction outcomes. On average, (i) correct predictions increase by 26%, (ii) false positives decrease by 36%, (iii) false negatives decrease by 89%, and (iv) unpredictable objects increase by 136%. This is indicative of a reliable system, because predicting no decision for an object is always preferable to predicting a false positive or a false negative decision. The neighborhood-based reduction system also increases the granularity of the dataset, and this differs from the increase in granularity obtained through a generalization process.
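The four outcome categories above can be made concrete with a small tally. The following Python sketch is not the authors' evaluation code and makes several assumptions. It assumes a binary decision attribute. It represents "no decision" as `None`. It computes relative change as 100 × (after − before) / before. The names `outcome_counts`, `percent_change` and `OUTCOMES` are hypothetical. The abstract does not say how the reported averages were aggregated over the 10 training/test pairs, so the sketch compares a single pair only.

```python
from collections import Counter
from typing import Optional, Sequence

# Hypothetical labels for the four outcome categories named in the abstract.
OUTCOMES = ("correct", "false_positive", "false_negative", "unpredictable")


def outcome_counts(actual: Sequence[bool], predicted: Sequence[Optional[bool]]) -> Counter:
    """Tally test objects into the four outcome categories.

    Assumes a binary decision (True = positive class); a prediction of None
    means the classifier returned no decision for that object.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter({name: 0 for name in OUTCOMES})
    for truth, guess in zip(actual, predicted):
        if guess is None:
            counts["unpredictable"] += 1
        elif guess == truth:
            counts["correct"] += 1
        elif guess:  # predicted positive, actual negative
            counts["false_positive"] += 1
        else:  # predicted negative, actual positive
            counts["false_negative"] += 1
    return counts


def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent (assumed definition)."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return 100.0 * (after - before) / before


# Toy example with made-up labels (not data from the study).
actual = [True, True, False, False, True]
on_original = [True, False, True, False, None]  # predictions from the full dataset
on_reduced = [True, True, None, False, None]    # predictions from the reduced dataset

before = outcome_counts(actual, on_original)
after = outcome_counts(actual, on_reduced)
for name in OUTCOMES:
    if before[name]:  # skip categories with a zero baseline
        change = percent_change(before[name], after[name])
        print(f"{name}: {before[name]} -> {after[name]} ({change:+.0f}%)")
```

With these toy labels, the reduced-data predictions trade one false positive and one false negative for one extra correct prediction and one extra abstention. This mirrors the direction, though not the magnitude, of the changes reported above.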


Third International Conference on Data Analytics (DATA ANALYTICS 2014)


Rome, Italy