Related papers: Learnable Subspace Clustering

URL: http://arxiv.org/abs/2004.04520v1
Date: Thu, 9 Apr 2020 12:53:28 GMT
Title: Learnable Subspace Clustering
Authors: Jun Li, Hongfu Liu, Zhiqiang Tao, Handong Zhao, and Yun Fu
Abstract summary: We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem. The key idea is to learn a parametric function that partitions the high-dimensional subspaces into their underlying low-dimensional subspaces. To the best of our knowledge, this paper is the first subspace clustering work to efficiently cluster millions of data points.
Score: 76.2352740039615
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies the large-scale subspace clustering (LSSC) problem with millions of data points. Many popular subspace clustering methods cannot directly handle the LSSC problem, although they are considered state-of-the-art for small-scale data. A basic reason is that these methods often use all data points as one large dictionary to build huge coding models, which results in high time and space complexity. In this paper, we develop a learnable subspace clustering paradigm to efficiently solve the LSSC problem. The key idea is to learn a parametric function that partitions the high-dimensional subspaces into their underlying low-dimensional subspaces, avoiding the expensive costs of classical coding models. Moreover, we propose a unified robust predictive coding machine (RPCM) to learn the parametric function, which can be solved by an alternating minimization algorithm. In addition, we provide a bounded contraction analysis of the parametric function. To the best of our knowledge, this paper is the first subspace clustering work to efficiently cluster millions of data points. Experiments on million-scale datasets verify that our paradigm outperforms related state-of-the-art methods in both efficiency and effectiveness.
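
To make the complexity argument concrete, here is a minimal sketch of why evaluating a fixed function per point scales linearly in n. It assumes the subspace bases are already known and uses classical K-subspaces-style nearest-subspace assignment; it is NOT the paper's RPCM, and the function name `assign_to_subspaces` is hypothetical. By contrast, a self-expressive coding model that uses all n points as its dictionary stores an n x n coefficient matrix, which for n = 10^6 in float64 is about 8 TB.

```python
import numpy as np

def assign_to_subspaces(X, bases):
    """Assign each row of X (n x d) to the subspace whose orthonormal
    basis U_k (d x r_k) leaves the smallest projection residual.

    Illustrative K-subspaces-style assignment (an assumption, not RPCM).
    Cost is O(n * d * sum(r_k)): linear in n, no n x n matrix is built.
    """
    residuals = []
    for U in bases:
        # Project every point onto span(U) and measure what is left over.
        proj = X @ U @ U.T
        residuals.append(np.linalg.norm(X - proj, axis=1))
    # Column k holds each point's residual to subspace k; pick the smallest.
    return np.argmin(np.stack(residuals, axis=1), axis=1)
```

For example, with two 1-D subspaces spanned by the first and second coordinate axes of R^3, the point (2, 0.1, 0) is assigned to subspace 0 and (0.1, 3, 0) to subspace 1.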

Related papers

An Enhanced Model-based Approach for Short Text Clustering [58.60681789677676]
Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook. Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep representation learning-based approaches. We propose a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM), which effectively handles the sparsity and high dimensionality of short texts. Based on several aspects of GSDMM that warrant further refinement, we propose an improved approach, GSDMM+, designed to further optimize its performance.
arXiv Detail & Related papers (2025-07-18T10:07:42Z)
Scalable Context-Preserving Model-Aware Deep Clustering for Hyperspectral Images [51.95768218975529]
Subspace clustering has become widely adopted for the unsupervised analysis of hyperspectral images (HSIs). Recent model-aware deep subspace clustering methods often use a two-stage framework, involving the calculation of a self-representation matrix with complexity of O(n^2), followed by spectral clustering. We propose a scalable, context-preserving deep clustering method based on basis representation, which jointly captures local and non-local structures for efficient HSI clustering.
arXiv Detail & Related papers (2025-06-12T16:43:09Z)
Line Space Clustering (LSC): Feature-Based Clustering using K-medians and Dynamic Time Warping for Versatility [0.0]
Line Space Clustering (LSC) is a representation that transforms data points into lines in a newly defined feature space. LSC employs a combined distance metric that uses Euclidean and Dynamic Time Warping (DTW) distances, weighted by a parameter alpha. Experiments demonstrate the efficacy of LSC on synthetic and real-world datasets.
arXiv Detail & Related papers (2025-03-20T01:27:10Z)
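
A minimal sketch of a combined Euclidean/DTW distance like the one LSC describes. The summary only says the two distances are "weighted by a parameter alpha"; the convex combination alpha * Euclidean + (1 - alpha) * DTW used here is an assumption, and all function names are hypothetical. The DTW routine is the textbook dynamic-programming version with absolute-difference cost.

```python
import math

def dtw_distance(a, b):
    """Textbook dynamic time warping, O(len(a) * len(b)),
    using |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def euclidean_distance(a, b):
    """Plain Euclidean distance; requires equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length inputs")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def combined_distance(a, b, alpha=0.5):
    """ASSUMED weighting: alpha * Euclidean + (1 - alpha) * DTW.
    The exact form used by LSC may differ."""
    return alpha * euclidean_distance(a, b) + (1 - alpha) * dtw_distance(a, b)
```

DTW tolerates local time shifts (for instance, [1, 2, 3] and [1, 2, 2, 3] have DTW distance 0), while the Euclidean term penalizes pointwise differences; alpha trades one off against the other.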
Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS). We focus on two significant issues in the state of the art: foreground leakage and sparse point distribution. To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z)
A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data. We develop a novel factor-3 approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
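
For context on the k-center objective mentioned above, here is a sketch of the classical greedy farthest-first heuristic for unweighted k-center, which is a 2-approximation. This is NOT the paper's weighted factor-3 algorithm (it ignores the uncertainty-sampling term entirely); the function name and the choice of starting point are hypothetical.

```python
import numpy as np

def greedy_k_center(X, k, first=0):
    """Farthest-first traversal for unweighted k-center.

    Returns the indices of the chosen centers and the covering radius
    (the largest distance from any point to its nearest center).
    """
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    centers = [first]
    # Distance from every point to its nearest chosen center so far.
    dist = np.linalg.norm(X - X[first], axis=1)
    for _ in range(1, k):
        # Pick the point that is currently worst covered.
        nxt = int(np.argmax(dist))
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return centers, float(dist.max())
```

On the 1-D points 0, 1, 10, 11 with k = 2 and starting from index 0, the traversal picks indices [0, 3] and reaches a covering radius of 1.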
Clustering based on Mixtures of Sparse Gaussian Processes [6.939768185086753]
How to cluster data using their low-dimensional embedded space is still a challenging problem in machine learning. In this article, we focus on proposing a joint formulation for both clustering and dimensionality reduction. Our algorithm is based on a mixture of sparse Gaussian processes, which is called Sparse Gaussian Process Mixture Clustering (SGP-MIC).
arXiv Detail & Related papers (2023-03-23T20:44:36Z)
GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing [78.92205914422925]
We propose a clustering algorithm that combines multi-granularity Granular-Ball and minimum spanning tree (MST). We construct coarse-grained granular-balls, and then use granular-balls and MST to implement the clustering method based on "large-scale priority". Experimental results on several data sets demonstrate the power of the algorithm.
arXiv Detail & Related papers (2023-03-02T09:04:35Z)
An Instance Selection Algorithm for Big Data in High imbalanced datasets based on LSH [0.0]
Training machine learning models in real contexts often involves big data sets and imbalanced samples in which the class of interest is underrepresented. This work proposes three new methods for instance selection (IS) to deal with large and imbalanced data sets. The algorithms were developed in the Apache Spark framework, guaranteeing their scalability.
arXiv Detail & Related papers (2022-10-09T17:38:41Z)
Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful approach to large-scale multimedia retrieval. We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) method to address these issues. ASCMH outperforms state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z)
Enriched Robust Multi-View Kernel Subspace Clustering [5.770309971945476]
Subspace clustering aims to find the underlying low-dimensional subspaces and cluster the data points correctly. Most existing methods suffer from two critical issues. We propose a novel multi-view subspace clustering method.
arXiv Detail & Related papers (2022-05-21T03:06:24Z)
A Deep Learning Object Detection Method for an Efficient Clusters Initialization [6.365889364810239]
Clustering has been used in numerous applications such as banking customer profiling, document retrieval, image segmentation, and e-commerce recommendation engines. Existing clustering techniques have significant limitations, among them the dependence of their stability on the initialization parameters. This paper proposes a solution that can provide near-optimal clustering parameters with low computational and resource overhead.
arXiv Detail & Related papers (2021-04-28T08:34:25Z)
Overcomplete Deep Subspace Clustering Networks [80.16644725886968]
Experimental results on four benchmark datasets show the effectiveness of the proposed method over DSC and other clustering methods in terms of clustering error. Our method is also less dependent than DSC on the point at which pre-training is stopped to get the best performance, and it is more robust to noise.
arXiv Detail & Related papers (2020-11-16T22:07:18Z)
An Efficient Smoothing Proximal Gradient Algorithm for Convex Clustering [2.5182813818441945]
The recently introduced convex clustering approach formulates clustering as a convex optimization problem. State-of-the-art convex clustering algorithms require substantial computation and memory. In this paper, we develop a very efficient smoothing proximal gradient algorithm (Sproga) for convex clustering.
arXiv Detail & Related papers (2020-06-22T20:02:59Z)
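
For reference, the convex clustering problem is commonly written as below. The symbols are our own, and the Sproga paper's exact notation and choice of norm may differ: x_i are the n data points, u_i is a centroid attached to each point, w_ij >= 0 are pairwise weights, gamma >= 0 controls the strength of fusion, and q is typically 1, 2, or infinity.

```latex
\min_{u_1,\dots,u_n}\;
  \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
  \;+\; \gamma \sum_{i<j} w_{ij}\, \lVert u_i - u_j \rVert_q
```

The first term keeps each centroid close to its own point; the fusion penalty pulls centroids together, and points whose centroids coincide at the optimum form one cluster. As gamma grows, more centroids merge, so the number of clusters decreases. Because the objective is convex, the solution does not depend on initialization.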

This list is automatically generated from the titles and abstracts of the papers on this site.