Stochastic Sparse Subspace Clustering
- URL: http://arxiv.org/abs/2005.01449v1
- Date: Mon, 4 May 2020 13:09:17 GMT
- Title: Stochastic Sparse Subspace Clustering
- Authors: Ying Chen, Chun-Guang Li, and Chong You
- Abstract summary: State-of-the-art subspace clustering methods are based on the self-expressive model, which represents each data point as a linear combination of the other data points.
We introduce dropout, based on randomly dropping out data points, to address the issue of over-segmentation.
This leads to a scalable and flexible sparse subspace clustering approach, termed Stochastic Sparse Subspace Clustering.
- Score: 20.30051592270384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art subspace clustering methods are based on the
self-expressive model, which represents each data point as a linear
combination of the other data points. By enforcing such representations to be
sparse, sparse subspace clustering is guaranteed to produce a
subspace-preserving data affinity in which two points are connected only if
they lie in the same subspace. On the other hand, however, data points from
the same subspace may not be well connected, leading to the issue of
over-segmentation. We introduce dropout to address this issue, based on
randomly dropping out data points in the self-expressive model. In
particular, we show that dropout is equivalent to adding a squared $\ell_2$
norm regularization on the representation coefficients and therefore induces
denser solutions. We then reformulate the optimization problem as a consensus
problem over a set of small-scale subproblems. This leads to a scalable and
flexible sparse subspace clustering approach, termed Stochastic Sparse
Subspace Clustering, which can effectively handle large-scale datasets.
Extensive experiments on synthetic data and real-world datasets validate the
efficiency and effectiveness of our proposal.
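To make the pipeline concrete, below is a minimal, illustrative sketch in NumPy/scikit-learn; it is not the authors' implementation. It relies on the stated dropout equivalence: for Bernoulli dropout with keep probability $1-p$ and rescaling by $1/(1-p)$, a standard computation gives $\mathbb{E}\,\|x_j - \tfrac{1}{1-p} X \mathrm{diag}(\delta)\, c\|_2^2 = \|x_j - Xc\|_2^2 + \tfrac{p}{1-p}\sum_i \|x_i\|_2^2\, c_i^2$, i.e. a squared $\ell_2$ penalty on the coefficients. The sketch therefore solves an elastic-net self-expressive program for each point over a few random subsets of the remaining points and averages the solutions, a simplified stand-in for the paper's consensus formulation; names such as `stochastic_ssc` and parameters such as `n_subsets` are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import ElasticNet

def stochastic_ssc(X, n_clusters, n_subsets=5, subset_size=100,
                   alpha=0.05, l1_ratio=0.9, seed=0):
    """Illustrative sketch: elastic-net self-expression over random subsets.

    X has one data point per row. The l1 part of the elastic net keeps the
    representation sparse; the squared-l2 part plays the role of dropout,
    per the equivalence stated in the abstract. Averaging coefficients over
    subsets is a crude stand-in for the paper's consensus reformulation.
    """
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm points
    n = X.shape[0]
    C = np.zeros((n, n))  # self-expressive coefficient matrix
    for j in range(n):
        others = np.delete(np.arange(n), j)
        for _ in range(n_subsets):
            # "drop out" data points: keep only a small random dictionary
            keep = rng.choice(others, size=min(subset_size, n - 1),
                              replace=False)
            model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                               fit_intercept=False, max_iter=5000)
            model.fit(X[keep].T, X[j])            # x_j ~ sum_i c_i x_i
            C[j, keep] += model.coef_ / n_subsets  # consensus averaging
    A = np.abs(C) + np.abs(C).T  # symmetric affinity matrix
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              assign_labels="discretize",
                              random_state=seed).fit_predict(A)

# Toy usage: points drawn from two 2-dimensional subspaces of R^10.
rng = np.random.default_rng(1)
B1, B2 = rng.standard_normal((10, 2)), rng.standard_normal((10, 2))
X = np.vstack([(B1 @ rng.standard_normal((2, 60))).T,
               (B2 @ rng.standard_normal((2, 60))).T])
labels = stochastic_ssc(X, n_clusters=2)
```

Solving each point's program over small random subsets, rather than over the full dataset, is what makes the approach scale: each subproblem touches only `subset_size` dictionary points.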
Related papers
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z) - Inv-SENnet: Invariant Self Expression Network for clustering under
biased data [17.25929452126843]
We propose a novel framework for jointly removing unwanted attributes (biases) while learning to cluster data points in individual subspaces.
Our experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-11-13T01:19:06Z) - Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z) - Revisiting data augmentation for subspace clustering [21.737226432466496]
Subspace clustering is the classical problem of clustering a collection of data samples around several low-dimensional subspaces.
We argue that data distribution within each subspace plays a critical role in the success of self-expressive models.
We propose two subspace clustering frameworks for both unsupervised and semi-supervised settings.
arXiv Detail & Related papers (2022-07-20T08:13:08Z) - Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z) - Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with
Coherent Embeddings [1.7188280334580195]
This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved.
The proposed algorithm has the same complexity as the original $t$-SNE to embed new items, and a lower one when considering the embedding of a dataset sliced into sub-pieces.
arXiv Detail & Related papers (2021-09-22T06:45:37Z) - Local versions of sum-of-norms clustering [77.34726150561087]
We show that our method can separate arbitrarily close balls in the ball model.
We prove a quantitative bound on the error incurred in the clustering of disjoint connected sets.
arXiv Detail & Related papers (2021-09-20T14:45:29Z) - Overcomplete Deep Subspace Clustering Networks [80.16644725886968]
Experimental results on four benchmark datasets show the effectiveness of the proposed method over DSC and other clustering methods in terms of clustering error.
Our method is also less dependent than DSC on where pre-training is stopped to obtain the best performance, and it is more robust to noise.
arXiv Detail & Related papers (2020-11-16T22:07:18Z) - A Critique of Self-Expressive Deep Subspace Clustering [23.971512395191308]
Subspace clustering is an unsupervised clustering technique designed to cluster data that is supported on a union of linear subspaces.
We show that there are a number of potential flaws with this approach which have not been adequately addressed in prior work.
arXiv Detail & Related papers (2020-10-08T00:14:59Z) - Self-Representation Based Unsupervised Exemplar Selection in a Union of
Subspaces [27.22427926657327]
We present a new exemplar selection model that searches for a subset that best reconstructs all data points, as measured by the $\ell_1$ norm of the representation coefficients (a hedged sketch of this style of selection appears after this list).
When the dataset is drawn from a union of independent subspaces, our method is able to select sufficiently many representatives from each subspace.
We also develop an exemplar-based subspace clustering method that is robust to imbalanced data and efficient for large-scale data.
arXiv Detail & Related papers (2020-06-07T19:43:33Z) - Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this is the first subspace clustering work to efficiently cluster millions of data points.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
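For the exemplar-selection entry referenced above, here is a hedged sketch of that style of $\ell_1$-cost subset selection: grow the exemplar set by repeatedly adding the point that the current exemplars reconstruct worst. The greedy rule, the `greedy_exemplars` name, and the exact cost form are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

def greedy_exemplars(X, n_exemplars, alpha=0.05, seed=0):
    """Hedged sketch of l1-based exemplar selection (illustrative only).

    Start from a random point, then repeatedly add the point that the
    current exemplar set reconstructs worst, where the reconstruction
    cost of x_j is an l1-regularized least-squares fit on the exemplars.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    chosen = [int(rng.integers(n))]

    def cost(j):
        if j in chosen:
            return 0.0          # never re-pick an existing exemplar
        D = X[chosen].T         # dictionary: exemplars as columns
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        model.fit(D, X[j])
        resid = X[j] - D @ model.coef_
        # illustrative cost: coefficient l1 norm plus residual energy
        return np.abs(model.coef_).sum() + 0.5 * (resid ** 2).sum()

    while len(chosen) < n_exemplars:
        chosen.append(max(range(n), key=cost))
    return chosen

# Usage with the toy X from the earlier sketch:
# exemplars = greedy_exemplars(X, n_exemplars=10)
```

On data drawn from a union of independent subspaces, a selection rule of this flavor tends to pull representatives from every subspace, which is the property the entry highlights.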
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.