Learnable Subspace Clustering
- URL: http://arxiv.org/abs/2004.04520v1
- Date: Thu, 9 Apr 2020 12:53:28 GMT
- Title: Learnable Subspace Clustering
- Authors: Jun Li, Hongfu Liu, Zhiqiang Tao, Handong Zhao, and Yun Fu
- Abstract summary: We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
- Score: 76.2352740039615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the large-scale subspace clustering (LSSC) problem with
millions of data points. Many popular subspace clustering methods cannot directly
handle the LSSC problem, although they are considered state-of-the-art
methods for small-scale data. A basic reason is that these methods often
use all data points as one big dictionary to build huge coding models, which
results in high time and space complexity. In this paper, we develop a
learnable subspace clustering paradigm to efficiently solve the LSSC problem.
The key idea is to learn a parametric function to partition the
high-dimensional subspaces into their underlying low-dimensional subspaces
instead of paying the expensive costs of the classical coding models. Moreover, we
propose a unified robust predictive coding machine (RPCM) to learn the
parametric function, which can be solved by an alternating minimization
algorithm. In addition, we provide a bounded contraction analysis of the
parametric function. To the best of our knowledge, this paper is the first work
to efficiently cluster millions of data points among the subspace clustering
methods. Experiments on million-scale datasets verify that our paradigm
outperforms the related state-of-the-art methods in both efficiency and
effectiveness.
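The "learn a parametric function, then apply it to every point" idea can be illustrated with a minimal sketch. This is not the paper's RPCM: as a stand-in, it uses the classical k-subspaces heuristic, where the parameters are one orthonormal basis per cluster, fitted by alternating minimization on a small random sample and then used to label all points in a single linear pass. The function names (predict, fit_subspaces), the synthetic data, and all sizes are hypothetical choices made for illustration only.

```python
import numpy as np

def residuals(X, bases):
    # Squared distance from each point to each subspace: ||x||^2 - ||U^T x||^2.
    sq_norm = (X ** 2).sum(axis=1, keepdims=True)
    proj = np.stack([((X @ U) ** 2).sum(axis=1) for U in bases], axis=1)
    return sq_norm - proj

def predict(X, bases):
    # The learned "parametric function": assign each point to its nearest subspace.
    return residuals(X, bases).argmin(axis=1)

def fit_subspaces(X, k, dim, n_iter=30, seed=0):
    # Alternating minimization (k-subspaces), run only on the small sample X.
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    # Random orthonormal (D, dim) basis per cluster as initialization.
    bases = [np.linalg.qr(rng.standard_normal((D, dim)))[0] for _ in range(k)]
    for _ in range(n_iter):
        labels = predict(X, bases)          # fix bases, update assignments
        for j in range(k):                  # fix assignments, update bases
            pts = X[labels == j]
            if len(pts) >= dim:             # keep the old basis for tiny clusters
                _, _, vt = np.linalg.svd(pts, full_matrices=False)
                bases[j] = vt[:dim].T
    return bases

# Synthetic data (assumption): 3 subspaces of dimension 2 in R^10, light noise.
rng = np.random.default_rng(0)
k, dim, D, n_per = 3, 2, 10, 5000
true_bases = [np.linalg.qr(rng.standard_normal((D, dim)))[0] for _ in range(k)]
X = np.vstack([rng.standard_normal((n_per, dim)) @ U.T for U in true_bases])
X += 0.01 * rng.standard_normal(X.shape)

# Fit on 500 sampled points, then label all 15000 points without any n-by-n coding matrix.
sample = X[rng.choice(len(X), size=500, replace=False)]
bases = fit_subspaces(sample, k, dim)
labels = predict(X, bases)
print(labels.shape, np.bincount(labels, minlength=k))
```

The cost of the labeling pass grows linearly with the number of points, which is the efficiency argument the abstract makes against dictionary-based coding models. Because k-subspaces depends on its random initialization, this sketch may settle in a poor local minimum; it only shows the structure of the approach, not its reported accuracy.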
Related papers
- Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS).
We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution.
To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Clustering based on Mixtures of Sparse Gaussian Processes [6.939768185086753]
How to cluster data using their low dimensional embedded space is still a challenging problem in machine learning.
In this article, we focus on proposing a joint formulation for both clustering and dimensionality reduction.
Our algorithm, called Sparse Gaussian Process Mixture Clustering (SGP-MIC), is based on a mixture of sparse Gaussian processes.
arXiv Detail & Related papers (2023-03-23T20:44:36Z)
- GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing [78.92205914422925]
We propose a clustering algorithm that combines multi-granularity Granular-Ball and minimum spanning tree (MST).
We construct coarse-grained granular-balls, and then use granular-balls and MST to implement the clustering method based on "large-scale priority".
Experimental results on several data sets demonstrate the power of the algorithm.
arXiv Detail & Related papers (2023-03-02T09:04:35Z)
- An Instance Selection Algorithm for Big Data in High imbalanced datasets based on LSH [0.0]
Training machine learning models in real contexts often involves big data sets and imbalanced samples in which the class of interest is underrepresented.
This work proposes three new methods for instance selection (IS) to be able to deal with large and imbalanced data sets.
Algorithms were developed in the Apache Spark framework, guaranteeing their scalability.
arXiv Detail & Related papers (2022-10-09T17:38:41Z)
- Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful method for solving the large-scale multimedia retrieval problem.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z)
- Enriched Robust Multi-View Kernel Subspace Clustering [5.770309971945476]
Subspace clustering aims to find the underlying low-dimensional subspaces and cluster the data points correctly.
Most existing methods suffer from two critical issues.
We propose a novel multi-view subspace clustering method.
arXiv Detail & Related papers (2022-05-21T03:06:24Z)
- A Deep Learning Object Detection Method for an Efficient Clusters Initialization [6.365889364810239]
Clustering has been used in numerous applications such as banking customers profiling, document retrieval, image segmentation, and e-commerce recommendation engines.
Existing clustering techniques have significant limitations, among them the dependence of their stability on the initialization parameters.
This paper proposes a solution that can provide near-optimal clustering parameters with low computational and resource overhead.
arXiv Detail & Related papers (2021-04-28T08:34:25Z)
- Overcomplete Deep Subspace Clustering Networks [80.16644725886968]
Experimental results on four benchmark datasets show the effectiveness of the proposed method over DSC and other clustering methods in terms of clustering error.
Our method is also not as dependent as DSC is on where pre-training should be stopped to get the best performance and is also more robust to noise.
arXiv Detail & Related papers (2020-11-16T22:07:18Z)
- An Efficient Smoothing Proximal Gradient Algorithm for Convex Clustering [2.5182813818441945]
The recently introduced convex clustering approach formulates clustering as a convex optimization problem.
State-of-the-art convex clustering algorithms require large amounts of computation and memory.
In this paper, we develop a very efficient smoothing gradient algorithm (Sproga) for convex clustering.
arXiv Detail & Related papers (2020-06-22T20:02:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.