Learning of low-dimensional geometric structure in high-dimensional data
Professor David Dunson
Arts and Sciences Distinguished Professor of Statistical Science,
Mathematics and Electrical & Computer Engineering
Duke University
ABSTRACT: High-dimensional data are collected in a wide variety of application areas. As the sample size is often modest relative to the dimension of the data, dimensionality reduction is needed. For example, PCA reduces p-dimensional data to coordinates on a d-dimensional subspace, with d ≪ p. There is a literature on non-linear extensions of PCA; such approaches are commonly referred to as “manifold learning”. Manifold learning usually relies on local linearity, so a large number of pieces is needed to approximate highly curved manifolds accurately. We propose a broad new dictionary for approximating manifolds based on pieces of spheres, or spherelets. We provide theory showing dramatic reductions, relative to locally linear methods, in the covering numbers needed to achieve a given small MSE. We develop a simple spherical PCA algorithm for implementation, and show very substantial gains in performance on toy and non-toy examples. A novel supervised nearest neighbor algorithm exploiting spherelets obtains state-of-the-art classification performance, including in comparisons with deep learning. We additionally develop a Bayesian model that characterizes uncertainty in subspace learning, along with an MCMC algorithm for implementation.
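To make the contrast between a linear piece and a spherical piece concrete, here is a minimal Python sketch. It is not the spherical PCA algorithm from the talk: the sphere is fitted with a standard algebraic least-squares method, swapped in purely for illustration, and the function names (`pca_project`, `fit_sphere`, `project_to_sphere`, `mse`) and the toy data (a noisy quarter circle) are assumptions of this example. It compares the reconstruction MSE of a one-dimensional PCA subspace with that of a single fitted circle.

```python
import numpy as np

def pca_project(X, d):
    """Project the rows of X (n x p) onto the top-d principal subspace."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Right singular vectors of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:d].T                   # p x d orthonormal basis
    coords = Xc @ V                # n x d low-dimensional coordinates
    X_hat = mean + coords @ V.T    # reconstruction in the original space
    return coords, X_hat

def fit_sphere(X):
    """Algebraic least-squares sphere fit (illustrative stand-in).

    Solves ||x||^2 = 2 c.x + k for center c and k = r^2 - ||c||^2.
    """
    A = np.hstack([2.0 * X, np.ones((X.shape[0], 1))])
    b = (X ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    c = sol[:-1]
    r = np.sqrt(sol[-1] + c @ c)
    return c, r

def project_to_sphere(X, c, r):
    """Move each point radially onto the sphere with center c and radius r."""
    diff = X - c
    return c + r * diff / np.linalg.norm(diff, axis=1, keepdims=True)

def mse(X, X_hat):
    """Mean squared Euclidean distance between points and reconstructions."""
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))

# Toy data (assumed for this sketch): a noisy quarter of the unit circle.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, np.pi / 2, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)])
X = X + 0.01 * rng.standard_normal(X.shape)

_, X_line = pca_project(X, d=1)            # one linear piece
c, r = fit_sphere(X)                       # one spherical piece
X_circle = project_to_sphere(X, c, r)

mse_pca = mse(X, X_line)
mse_sphere = mse(X, X_circle)
print(f"linear piece MSE:    {mse_pca:.5f}")
print(f"spherical piece MSE: {mse_sphere:.5f}")
```

On curved data like this, a single spherical piece leaves mostly the noise as residual, whereas a single linear piece also leaves the curvature unexplained; a locally linear method would need several pieces to close that gap.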