Self-Representation Based Unsupervised Exemplar Selection in a Union of Subspaces
- URL: http://arxiv.org/abs/2006.04246v1
- Date: Sun, 7 Jun 2020 19:43:33 GMT
- Title: Self-Representation Based Unsupervised Exemplar Selection in a Union of Subspaces
- Authors: Chong You, Chi Li, Daniel P. Robinson, Rene Vidal
- Abstract summary: We present a new exemplar selection model that searches for a subset that best reconstructs all data points as measured by the $\ell_1$ norm of the representation coefficients.
When the dataset is drawn from a union of independent subspaces, our method is able to select sufficiently many representatives from each subspace.
We also develop an exemplar based subspace clustering method that is robust to imbalanced data and efficient for large scale data.
- Score: 27.22427926657327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding a small set of representatives from an unlabeled dataset is a core
problem in a broad range of applications such as dataset summarization and
information extraction. Classical exemplar selection methods such as
$k$-medoids work under the assumption that the data points are close to a few
cluster centroids, and cannot handle the case where data lie close to a union
of subspaces. This paper proposes a new exemplar selection model that searches
for a subset that best reconstructs all data points as measured by the $\ell_1$
norm of the representation coefficients. Geometrically, this subset best covers
all the data points as measured by the Minkowski functional of the subset. To
solve our model efficiently, we introduce a farthest first search algorithm
that iteratively selects the worst represented point as an exemplar. When the
dataset is drawn from a union of independent subspaces, our method is able to
select sufficiently many representatives from each subspace. We further develop
an exemplar based subspace clustering method that is robust to imbalanced data
and efficient for large scale data. Moreover, we show that a classifier trained
on the selected exemplars (when they are labeled) can correctly classify the
rest of the data points.
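As an illustration of the farthest first search idea, the following is a minimal sketch, not the authors' implementation: it scores each point with a lasso relaxation of the $\ell_1$ self-representation cost, and the initialization rule, residual weighting, and `alpha` parameter are assumptions of this example.

```python
import numpy as np
from sklearn.linear_model import Lasso

def self_representation_cost(x, exemplars, alpha=0.1):
    """Score how well x is represented by the rows of `exemplars`.

    Lasso relaxation of  min ||c||_1  s.t.  x ~ exemplars.T @ c;
    larger values mean x is worse represented (assumed scoring rule).
    """
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    lasso.fit(exemplars.T, x)                  # design (d, k), target (d,)
    residual = x - exemplars.T @ lasso.coef_
    return np.abs(lasso.coef_).sum() + np.linalg.norm(residual)

def farthest_first_search(X, n_exemplars, alpha=0.1):
    """Greedy selection: repeatedly add the worst represented point.

    X has shape (n_samples, n_features); starting from the largest-norm
    point is a convention of this sketch, not necessarily the paper's.
    """
    n = X.shape[0]
    selected = [int(np.argmax(np.linalg.norm(X, axis=1)))]
    while len(selected) < n_exemplars:
        exemplars = X[selected]
        costs = [-np.inf if i in selected
                 else self_representation_cost(X[i], exemplars, alpha)
                 for i in range(n)]
        selected.append(int(np.argmax(costs)))
    return selected
```

On unit-normalized data, a point far from the span of the current exemplars needs large coefficients or incurs a large residual, so the greedy step tends to pull in representatives from subspaces that are not yet covered, consistent with the guarantee stated above.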
Related papers
- Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
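As a hedged sketch of the BMU-labeling idea, using the third-party `minisom` package (the grid size, iteration count, and nearest-labeled-BMU prediction rule are choices of this example, not necessarily the paper's):

```python
import numpy as np
from minisom import MiniSom  # pip install minisom

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(500, 8))   # plentiful unlabeled data
X_few = rng.normal(size=(10, 8))          # a handful of labeled points
y_few = rng.normal(size=10)               # their regression targets

# 1. Train the SOM on unlabeled data only.
som = MiniSom(10, 10, input_len=8, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X_unlabeled, 2000)

# 2. Attach the few available targets to their best matching units (BMUs).
labeled_bmus = {som.winner(x): y for x, y in zip(X_few, y_few)}

# 3. Predict for a new point via the labeled BMU nearest on the map grid.
def predict(x):
    i, j = som.winner(x)
    return min(labeled_bmus.items(),
               key=lambda kv: (kv[0][0] - i) ** 2 + (kv[0][1] - j) ** 2)[1]
```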
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor-3 approximation algorithm to compute subsets based on the weighted sum of both $k$-center and uncertainty sampling objective functions.
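For context, the plain greedy farthest-point heuristic for $k$-center, which such weighted variants build on, can be sketched as follows (a generic textbook routine, not the paper's factor-3 algorithm):

```python
import numpy as np

def greedy_k_center(X, k, seed=0):
    """Gonzalez's greedy 2-approximation for unweighted k-center.

    Repeatedly picks the point farthest from the chosen centers; a
    weighted / uncertainty-aware variant would modify this score.
    """
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]
    # distance from every point to its nearest chosen center so far
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return centers
```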
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- A Consistent and Scalable Algorithm for Best Subset Selection in Single Index Models [1.3236116985407258]
Best subset selection in high-dimensional models is known to be computationally intractable.
We propose the first provably scalable algorithm for best subset selection in high-dimensional SIMs.
Our algorithm enjoys subset selection consistency and has the oracle property with high probability.
arXiv Detail & Related papers (2023-09-12T13:48:06Z)
- Inv-SENnet: Invariant Self Expression Network for clustering under biased data [17.25929452126843]
We propose a novel framework for jointly removing unwanted attributes (biases) while learning to cluster data points in individual subspaces.
Our experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-11-13T01:19:06Z)
- Revisiting data augmentation for subspace clustering [21.737226432466496]
Subspace clustering is the classical problem of clustering a collection of data samples around several low-dimensional subspaces.
We argue that data distribution within each subspace plays a critical role in the success of self-expressive models.
We propose two subspace clustering frameworks for both unsupervised and semi-supervised settings.
arXiv Detail & Related papers (2022-07-20T08:13:08Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
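A plausible shape for such an objective, hedged since the paper's exact formulation is not shown here, is

$$\min_{W}\; \|X - X W W^{\top}\|_F^2 + \lambda \|W\|_{2,p}^{p}, \qquad \|W\|_{2,p}^{p} = \sum_{i=1}^{d}\Big(\sum_{j} W_{ij}^{2}\Big)^{p/2},$$

where rows of $W$ driven to zero by the row-sparse penalty correspond to discarded features.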
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Adaptive Graph-based Generalized Regression Model for Unsupervised Feature Selection [11.214334712819396]
Selecting uncorrelated and discriminative features is the key problem of unsupervised feature selection.
We present a novel generalized regression model with an uncorrelatedness constraint and $\ell_{2,1}$-norm regularization.
It can simultaneously select uncorrelated and discriminative features and reduce the variance of data points belonging to the same neighborhood.
arXiv Detail & Related papers (2020-12-27T09:07:26Z)
- Adversarial Examples for $k$-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams [69.4411417775822]
Adversarial examples are a widely studied phenomenon in machine learning models.
We propose an algorithm for evaluating the adversarial robustness of $k$-nearest neighbor classification.
arXiv Detail & Related papers (2020-11-19T08:49:10Z)
- Stochastic Sparse Subspace Clustering [20.30051592270384]
State-of-the-art subspace clustering methods are based on the self-expressive model, which represents each data point as a linear combination of other data points.
We introduce dropout, based on randomly dropping out data points in the self-expressive model, to address the issue of over-segmentation.
This leads to a scalable and flexible sparse subspace clustering approach, termed Stochastic Sparse Subspace Clustering.
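The dropout idea can be sketched on top of a single self-expressive lasso step (a minimal illustration; the solver, dropout rate, and penalty weight are assumptions of this example, and the authors' method may aggregate dropout patterns differently):

```python
import numpy as np
from sklearn.linear_model import Lasso

def self_expressive_row(X, i, drop_rate=0.5, alpha=0.05, seed=0):
    """Represent point i as a sparse combination of the *kept* other points.

    Randomly dropping candidate points acts as a regularizer that spreads
    the coefficients over more neighbors, which combats over-segmentation.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    keep = np.flatnonzero((rng.random(n) > drop_rate) & (np.arange(n) != i))
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    lasso.fit(X[keep].T, X[i])           # design (d, |keep|), target (d,)
    coeffs = np.zeros(n)
    coeffs[keep] = lasso.coef_
    return coeffs                        # one row of the affinity matrix
```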
arXiv Detail & Related papers (2020-05-04T13:09:17Z)
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function that partitions high-dimensional data into its underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.