Unsupervised Manifold Linearizing and Clustering
- URL: http://arxiv.org/abs/2301.01805v2
- Date: Thu, 24 Aug 2023 06:28:02 GMT
- Title: Unsupervised Manifold Linearizing and Clustering
- Authors: Tianjiao Ding, Shengbang Tong, Kwan Ho Ryan Chan, Xili Dai, Yi Ma,
Benjamin D. Haeffele
- Abstract summary: We propose to optimize the Maximal Coding Rate Reduction metric with respect to both the data representation and a novel doubly stochastic cluster membership.
Experiments on CIFAR-10, -20, -100, and TinyImageNet-200 datasets show that the proposed method is much more accurate and scalable than state-of-the-art deep clustering methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of simultaneously clustering and learning a linear
representation of data lying close to a union of low-dimensional manifolds, a
fundamental task in machine learning and computer vision. When the manifolds
are assumed to be linear subspaces, this reduces to the classical problem of
subspace clustering, which has been studied extensively over the past two
decades. Unfortunately, many real-world datasets such as natural images cannot
be well approximated by linear subspaces. On the other hand, numerous works
have attempted to learn an appropriate transformation of the data, such that
data is mapped from a union of general non-linear manifolds to a union of
linear subspaces (with points from the same manifold being mapped to the same
subspace). However, many existing works have limitations such as assuming
knowledge of the membership of samples to clusters, requiring high sampling
density, or being shown theoretically to learn trivial representations. In this
paper, we propose to optimize the Maximal Coding Rate Reduction metric with
respect to both the data representation and a novel doubly stochastic cluster
membership, inspired by state-of-the-art subspace clustering results. We give a
parameterization of such a representation and membership, allowing efficient
mini-batching and one-shot initialization. Experiments on CIFAR-10, -20, -100,
and TinyImageNet-200 datasets show that the proposed method is much more
accurate and scalable than state-of-the-art deep clustering methods, and
further learns a latent linear representation of the data.
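To make the objective concrete, below is a minimal NumPy sketch of the coding-rate quantities involved, following the standard MCR^2 definitions. This is an illustration, not the authors' implementation: the function names are made up, and the row-stochastic soft-assignment matrix `Pi` is a simplification of the paper's doubly stochastic membership.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z): rate needed to code the columns of Z up to distortion eps."""
    d, n = Z.shape
    I = np.eye(d)
    return 0.5 * np.linalg.slogdet(I + (d / (n * eps**2)) * Z @ Z.T)[1]

def coding_rate_per_cluster(Z, Pi, eps=0.5):
    """R_c(Z, Pi): average rate when points are softly grouped by Pi.

    Pi is an (n, k) nonnegative soft-assignment matrix whose rows sum to one.
    """
    d, n = Z.shape
    I = np.eye(d)
    rate = 0.0
    for j in range(Pi.shape[1]):
        w = Pi[:, j]                  # soft membership of cluster j
        tr = w.sum()
        if tr < 1e-8:                 # skip empty clusters
            continue
        Sigma = Z @ np.diag(w) @ Z.T  # membership-weighted scatter
        rate += (tr / (2 * n)) * np.linalg.slogdet(
            I + (d / (tr * eps**2)) * Sigma)[1]
    return rate

def mcr2(Z, Pi, eps=0.5):
    """Delta R = R(Z) - R_c(Z, Pi); larger is better."""
    return coding_rate(Z, eps) - coding_rate_per_cluster(Z, Pi, eps)

# Example: random unit-norm features and a random soft membership.
rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 100)); Z /= np.linalg.norm(Z, axis=0)
Pi = rng.dirichlet(np.ones(3), size=100)  # rows sum to 1
print(mcr2(Z, Pi))
```

Maximizing this quantity jointly over the representation Z (e.g., the output of a network) and the membership encourages features from different clusters to jointly span a large volume while each cluster's features remain compressed to a low-dimensional subspace.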
Related papers
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Adaptively-weighted Integral Space for Fast Multiview Clustering [54.177846260063966]
We propose an Adaptively-weighted Integral Space for Fast Multiview Clustering (AIMC) with nearly linear complexity.
Specifically, view generation models are designed to reconstruct the view observations from the latent integral space.
Experiments conducted on several real-world datasets confirm the superiority of the proposed AIMC method.
arXiv Detail & Related papers (2022-08-25T05:47:39Z)
- Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders [45.29194877564103]
This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels.
We discuss the approximation power of such networks and derive a bound that essentially depends on the intrinsic dimension of the data manifold rather than the dimension of ambient space.
arXiv Detail & Related papers (2022-08-22T19:58:03Z)
- Revisiting data augmentation for subspace clustering [21.737226432466496]
Subspace clustering is the classical problem of clustering a collection of data samples around several low-dimensional subspaces.
We argue that data distribution within each subspace plays a critical role in the success of self-expressive models.
We propose two subspace clustering frameworks for both unsupervised and semi-supervised settings.
arXiv Detail & Related papers (2022-07-20T08:13:08Z)
- Enriched Robust Multi-View Kernel Subspace Clustering [5.770309971945476]
Subspace clustering aims to find the underlying low-dimensional subspaces and cluster the data points correctly.
Most existing methods suffer from two critical issues.
We propose a novel multi-view subspace clustering method.
arXiv Detail & Related papers (2022-05-21T03:06:24Z)
- Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general-purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of the data matrix and define the context of data points.
Experiments on data classification and clustering over eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing.
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- A Critique of Self-Expressive Deep Subspace Clustering [23.971512395191308]
Subspace clustering is an unsupervised clustering technique designed to cluster data that is supported on a union of linear subspaces.
We show that there are a number of potential flaws with this approach which have not been adequately addressed in prior work.
arXiv Detail & Related papers (2020-10-08T00:14:59Z)
- Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
However, many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method built on a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
- Stochastic Sparse Subspace Clustering [20.30051592270384]
State-of-the-art subspace clustering methods are based on the self-expressive model, which represents each data point as a linear combination of the other data points.
To address the issue of over-segmentation, we introduce a dropout mechanism based on randomly dropping out data points in the self-expressive model.
This leads to a scalable and flexible subspace clustering approach, termed Stochastic Sparse Subspace Clustering.
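For context, the self-expressive model mentioned above is typically formulated as follows; this is the standard sparse subspace clustering program from the literature, stated here for background rather than taken from this abstract:

\min_{C} \|C\|_1 \quad \text{s.t.} \quad X = XC,\ \operatorname{diag}(C) = 0,

where the columns of $X$ are the data points; spectral clustering is then applied to the affinity $|C| + |C|^{\top}$.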
arXiv Detail & Related papers (2020-05-04T13:09:17Z)
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function that partitions the high-dimensional ambient space into the underlying low-dimensional subspaces.
To the best of our knowledge, this is the first subspace clustering method able to efficiently cluster millions of data points.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)