Provable Noisy Sparse Subspace Clustering using Greedy Neighbor
Selection: A Coherence-Based Perspective
- URL: http://arxiv.org/abs/2002.00401v1
- Date: Sun, 2 Feb 2020 14:28:35 GMT
- Title: Provable Noisy Sparse Subspace Clustering using Greedy Neighbor
Selection: A Coherence-Based Perspective
- Authors: Jwo-Yuh Wu, Wen-Hsuan Li, Liang-Chi Huang, Yen-Ping Lin, Chun-Hung Liu
and Rung-Hung Gau
- Abstract summary: We derive coherence-based sufficient conditions guaranteeing correct neighbor identification using MP/OMP.
A striking finding is that, when the ground truth subspaces are well-separated from each other and noise is not large, MP-based iterations, while enjoying lower algorithmic complexity, yield smaller perturbation of residuals.
- Score: 18.888312436971187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse subspace clustering (SSC) using greedy-based neighbor selection, such
as matching pursuit (MP) and orthogonal matching pursuit (OMP), has been known
as a popular computationally-efficient alternative to the conventional
L1-minimization based methods. Under deterministic bounded noise corruption, in
this paper we derive coherence-based sufficient conditions guaranteeing correct
neighbor identification using MP/OMP. Our analyses exploit the maximum/minimum
inner product between two noisy data points subject to a known upper bound on
the noise level. The obtained sufficient condition clearly reveals the impact
of noise on greedy-based neighbor recovery. Specifically, it asserts that, as
long as noise is sufficiently small so that the resultant perturbed residual
vectors stay close to the desired subspace, both MP and OMP succeed in
returning a correct neighbor subset. A striking finding is that, when the
ground truth subspaces are well-separated from each other and noise is not
large, MP-based iterations, while enjoying lower algorithmic complexity, yield
smaller residual perturbations and are therefore better able to identify
correct neighbors, in turn achieving higher global data clustering accuracy.
Extensive numerical experiments are used to corroborate our theoretical study.
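As a concrete illustration of the greedy neighbor-selection step the abstract analyzes, the sketch below implements OMP-based neighbor identification for a single data point (a minimal NumPy illustration under standard SSC conventions, not the authors' code; MP would differ only in subtracting the projection onto the newly chosen column instead of re-projecting onto the whole support):

```python
import numpy as np

def omp_neighbors(X, i, k, tol=1e-6):
    """Select up to k neighbors of column X[:, i] via orthogonal
    matching pursuit over the remaining columns.

    X   : (d, n) data matrix, columns assumed unit-norm
    i   : index of the query point
    k   : maximum number of neighbors (e.g., the subspace dimension)
    tol : stop early once the residual norm falls below tol
    """
    d, n = X.shape
    x = X[:, i]
    candidates = [j for j in range(n) if j != i]
    support = []
    residual = x.copy()
    for _ in range(k):
        if np.linalg.norm(residual) < tol:
            break
        # pick the candidate most correlated with the current residual
        corr = np.abs(X[:, candidates].T @ residual)
        best = candidates[int(np.argmax(corr))]
        support.append(best)
        candidates.remove(best)
        # OMP step: re-project x onto the span of ALL selected columns
        # (plain MP would instead subtract only the new projection)
        A = X[:, support]
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        residual = x - A @ coef
    return support

# Toy check: points on two 1-D subspaces; the neighbor of point 0
# should come from its own subspace {0, 1}, not from {2, 3}.
u, v = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
X = np.column_stack([u, -u, v, -v])
print(omp_neighbors(X, 0, k=1))  # -> [1]
```

Running this per data point and feeding the resulting neighbor sets into spectral clustering is the standard greedy-SSC pipeline the paper's guarantees address.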
Related papers
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z)
- Sketches-based join size estimation under local differential privacy [3.0945730947183203]
Join size estimation on sensitive data poses a risk of privacy leakage.
Local differential privacy (LDP) is a solution to preserve privacy while collecting sensitive data.
We introduce a novel algorithm called LDPJoinSketch for sketch-based join size estimation under LDP.
arXiv Detail & Related papers (2024-05-19T01:21:54Z)
- DenMune: Density peak based clustering using mutual nearest neighbors [0.0]
Many clustering algorithms fail when clusters are of arbitrary shapes, of varying densities, or the data classes are unbalanced and close to each other.
A novel clustering algorithm, DenMune is presented to meet this challenge.
It is based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user.
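The mutual-K-NN idea behind DenMune can be sketched as follows (an illustrative brute-force version of the mutual-neighborhood construction only; the published algorithm builds density peaks and full clusters on top of this):

```python
import numpy as np

def mutual_knn(X, K):
    """Return, for each point, the set of its *mutual* K-nearest
    neighbors: j is kept for i only if i and j appear in each
    other's K-NN lists (brute-force, Euclidean distance)."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)            # a point is not its own neighbor
    knn = [set(np.argsort(D[i])[:K]) for i in range(n)]
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)]

pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # a tight cluster
                [5.0, 5.0]])                          # an outlier
mutual = mutual_knn(pts, K=2)
# clustered points retain mutual neighbors; the outlier ends up with none
```

Points with many mutual neighbors mark dense regions, which is the signal the clustering step exploits.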
arXiv Detail & Related papers (2023-09-23T16:18:00Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- Multiple Kernel Clustering with Dual Noise Minimization [56.009011016367744]
Multiple kernel clustering (MKC) aims to group data by integrating complementary information from base kernels.
In this paper, we rigorously define dual noise and propose a novel parameter-free MKC algorithm by minimizing them.
We observe that dual noise will pollute the block diagonal structures and incur the degeneration of clustering performance, and C-noise exhibits stronger destruction than N-noise.
arXiv Detail & Related papers (2022-07-13T08:37:42Z)
- Greedier is Better: Selecting Multiple Neighbors per Iteration for Sparse Subspace Clustering [18.888312436971187]
This paper proposes a new SSC scheme using generalized OMP (GOMP).
GOMP involves fewer iterations, thereby enjoying lower algorithmic complexity.
The proposed stopping rule is free from off-line estimation of subspace dimension and noise power.
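The "greedier" GOMP step, selecting several neighbors per iteration, can be sketched as a small variant of OMP (illustrative only; the paper's data-driven stopping rule is not reproduced here, and a fixed iteration budget is used instead):

```python
import numpy as np

def gomp_neighbors(X, i, L, max_iter, tol=1e-6):
    """Generalized OMP: pick the L most-correlated candidates per
    iteration instead of one, so fewer iterations are needed.
    X is a (d, n) matrix with unit-norm columns; i is the query index."""
    x = X[:, i]
    candidates = [j for j in range(X.shape[1]) if j != i]
    support, residual = [], x.copy()
    for _ in range(max_iter):
        if np.linalg.norm(residual) < tol or not candidates:
            break
        corr = np.abs(X[:, candidates].T @ residual)
        # take the top-L candidates in one shot (the "greedier" step)
        top = np.argsort(corr)[::-1][:L]
        for t in sorted(top, reverse=True):
            support.append(candidates.pop(int(t)))
        # re-project x onto the span of the enlarged support, as in OMP
        A = X[:, support]
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        residual = x - A @ coef
    return support

# Toy check: with two selections per iteration, one pass suffices.
X = np.column_stack([[1.0, 0, 0], [-1.0, 0, 0], [0, 1.0, 0], [0, -1.0, 0]])
sel = gomp_neighbors(X, 0, L=2, max_iter=1)
```

With L = 1 this reduces to plain OMP, which is why GOMP is described as trading a few extra selections per round for fewer rounds overall.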
arXiv Detail & Related papers (2022-04-06T04:20:35Z)
- Multi-granularity Relabeled Under-sampling Algorithm for Imbalanced Data [15.030895782548576]
The imbalanced classification problem turns out to be one of the important and challenging problems in data mining and machine learning.
The Tomek-Link sampling algorithm can effectively reduce the class overlap on data, remove the majority instances that are difficult to distinguish, and improve the algorithm classification accuracy.
However, the Tomek-Links under-sampling algorithm only considers the boundary instances that are the nearest neighbors to each other globally and ignores the potential local overlapping instances.
This paper proposes a multi-granularity relabeled under-sampling algorithm (MGRU) which fully considers the local information of the data set.
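A minimal version of the classic Tomek-Link under-sampling rule that MGRU extends reads as follows (a brute-force sketch of the standard rule, not of the multi-granularity extension itself):

```python
import numpy as np

def tomek_link_undersample(X, y, majority):
    """Drop majority-class points that form Tomek links: pairs of
    opposite-class points that are each other's 1-nearest neighbor."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    nn = D.argmin(axis=1)                  # 1-NN index of every point
    drop = {i for i in range(n)
            if y[i] == majority            # majority-class member...
            and y[nn[i]] != majority       # ...whose 1-NN is minority...
            and nn[nn[i]] == i}            # ...and the pair is mutual
    keep = [i for i in range(n) if i not in drop]
    return X[keep], y[keep]

# Toy check: the majority point at 1.0 forms a Tomek link with the
# minority point at 1.1 and is removed.
X = np.array([[0.0], [1.0], [1.1], [3.0]])
y = np.array([1, 1, 0, 1])                 # class 1 is the majority
X2, y2 = tomek_link_undersample(X, y, majority=1)
```

As the blurb notes, this global mutual-1-NN criterion is exactly what MGRU refines with local, multi-granularity information.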
arXiv Detail & Related papers (2022-01-11T14:07:55Z)
- Non-Local Spatial Propagation Network for Depth Completion [82.60915972250706]
We propose a robust and efficient end-to-end non-local spatial propagation network for depth completion.
The proposed network takes RGB and sparse depth images as inputs and estimates non-local neighbors and their affinities of each pixel.
We show that the proposed algorithm is superior to conventional algorithms in terms of depth completion accuracy and robustness to the mixed-depth problem.
arXiv Detail & Related papers (2020-07-20T12:26:51Z)
- Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
- A Weighted Mutual k-Nearest Neighbour for Classification Mining [4.538870924201896]
kNN is a very effective Instance based learning method, and it is easy to implement.
In this paper, we propose a new learning algorithm which performs the task of anomaly detection and removal of pseudo neighbours from the dataset.
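One way to realize the pseudo-neighbour removal step with mutual k-NN is sketched below (an illustrative rule of our own for exposition; the paper's exact anomaly criterion and weighting scheme may differ):

```python
import numpy as np

def prune_pseudo_neighbors(X, y, k):
    """Keep only training points that share at least one *mutual*
    k-nearest neighbor of their own class; the rest are treated as
    anomalous 'pseudo neighbours' and removed (illustrative rule)."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    knn = [set(np.argsort(D[i])[:k]) for i in range(n)]
    keep = [i for i in range(n)
            if any(i in knn[j] and y[j] == y[i] for j in knn[i])]
    return X[keep], y[keep]

# Toy check: the isolated point at 5.0 has no mutual neighbor of its
# own class and is pruned before kNN classification would run.
X = np.array([[0.0], [0.1], [0.2], [5.0]])
y = np.array([0, 0, 0, 1])
Xp, yp = prune_pseudo_neighbors(X, y, k=2)
```

A standard kNN classifier trained on the pruned set is then less exposed to mislabeled or anomalous instances.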
arXiv Detail & Related papers (2020-05-14T18:11:30Z)
- Residual-driven Fuzzy C-Means Clustering for Image Segmentation [152.609322951917]
We elaborate on residual-driven Fuzzy C-Means (FCM) for image segmentation.
Built on this framework, we present a weighted $\ell_2$-norm fidelity term by weighting mixed noise distribution.
The results demonstrate the superior effectiveness and efficiency of the proposed algorithm over existing FCM-related algorithms.
arXiv Detail & Related papers (2020-04-15T15:46:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.