Manifold Learning: Dimensionality Reduction and High Dimensional Data Reconstruction via Dictionary Learning

Document Type

Article

Publication Date

12-5-2016

Publication Title

Neurocomputing

DOI

10.1016/j.neucom.2016.07.045

ISSN

0925-2312

Abstract

Nonlinear dimensionality reduction (DR) algorithms can reveal the intrinsic characteristics of high dimensional data in a succinct way. However, most of these methods suffer from two problems. The first is the incremental dimensionality reduction problem: the algorithms cannot compute the embedding of newly added data incrementally. The second is the high dimensional data reconstruction problem: the algorithms cannot recover the original high dimensional data from the embeddings. Both problems limit the application of existing DR algorithms. In this paper, a dictionary-based algorithm for manifold learning is proposed to address both problems. The algorithm trains two dictionaries: one for the manifold in the high dimensional space, and one for the embeddings in the low dimensional space, which can be computed by any existing DR method. When new data is added, dimensionality reduction or data reconstruction is carried out by coding the input over one dictionary and then using this code to recover the output via the other dictionary. The proposed algorithm provides a general framework for manifold learning: it can be integrated into many existing DR algorithms so that they support both incremental dimensionality reduction and high dimensional data reconstruction. The algorithm is efficient because sparse coding and dictionary updating both have closed-form solutions. Furthermore, it is space-saving because it only needs to store two dictionaries instead of the entire training set. Experiments on synthetic and real-world datasets show that the proposed algorithm is accurate and efficient for both incremental dimensionality reduction and high dimensional data reconstruction.
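The two-dictionary idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the coding objective, so the sketch assumes an l2-regularised (ridge) code, which has a closed form but is not sparse, and it omits any locality constraint that a curved manifold would likely need. The function names (`ridge_code`, `update_dictionary`, `train_pair`, `transfer`), the parameter values, and the linear toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def ridge_code(D, X, lam=1e-6):
    """Closed-form codes A = argmin ||X - D A||^2 + lam ||A||^2 (assumed objective)."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X)

def update_dictionary(X, A, lam=1e-6):
    """Closed-form dictionary D = X A^T (A A^T + lam I)^-1 for fixed codes A."""
    k = A.shape[0]
    return np.linalg.solve(A @ A.T + lam * np.eye(k), A @ X.T).T

def train_pair(X_high, Y_low, n_atoms=20, n_iter=3, lam=1e-6, seed=0):
    """Fit two dictionaries that share one set of codes.

    X_high: (m, N) training samples; Y_low: (d, N) embeddings of the same
    samples, produced beforehand by any DR method.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X_high.shape[1], size=n_atoms, replace=False)
    D_high = X_high[:, idx].copy()           # initialise atoms from samples
    D_low = Y_low[:, idx].copy()
    for _ in range(n_iter):
        A = ridge_code(D_high, X_high, lam)  # code the training data
        D_high = update_dictionary(X_high, A, lam)
        D_low = update_dictionary(Y_low, A, lam)  # same codes, other space
    return D_high, D_low

def transfer(x, D_src, D_dst, lam=1e-6):
    """Code x over the source dictionary, then decode with the other one."""
    return D_dst @ ridge_code(D_src, x, lam)

if __name__ == "__main__":
    # Toy data: a 2-D linear subspace of R^10, so the embedding is exactly linear.
    rng = np.random.default_rng(1)
    B, _ = np.linalg.qr(rng.normal(size=(10, 2)))
    Z = rng.normal(size=(2, 200))
    D_high, D_low = train_pair(B @ Z, Z)

    z_new = rng.normal(size=(2, 1))
    x_new = B @ z_new
    print("embedding error:", np.abs(transfer(x_new, D_high, D_low) - z_new).max())
    print("reconstruction error:", np.abs(transfer(z_new, D_low, D_high) - x_new).max())
```

Only `D_high` and `D_low` are kept after training, which mirrors the space-saving point in the abstract. The same `transfer` call serves both directions: high to low for incremental dimensionality reduction, and low to high for reconstruction.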
