Learning Self-Expression Metrics for Scalable and Inductive Subspace
Clustering
- URL: http://arxiv.org/abs/2009.12875v2
- Date: Fri, 18 Dec 2020 00:06:37 GMT
- Title: Learning Self-Expression Metrics for Scalable and Inductive Subspace
Clustering
- Authors: Julian Busch, Evgeniy Faerman, Matthias Schubert and Thomas Seidl
- Abstract summary: Subspace clustering has established itself as a state-of-the-art approach to clustering high-dimensional data.
We propose a novel metric learning approach that instead learns a subspace affinity function using a siamese neural network architecture.
Our model benefits from a constant number of parameters and a constant-size memory footprint, allowing it to scale to considerably larger datasets.
- Score: 5.587290026368626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Subspace clustering has established itself as a state-of-the-art approach to
clustering high-dimensional data. In particular, methods relying on the
self-expressiveness property have recently proved especially successful.
However, they suffer from two major shortcomings: First, a quadratic-size
coefficient matrix is learned directly, preventing these methods from scaling
beyond small datasets. Secondly, the trained models are transductive and thus
cannot be used to cluster out-of-sample data unseen during training. Instead of
learning self-expression coefficients directly, we propose a novel metric
learning approach that learns a subspace affinity function using a siamese
neural network architecture. Consequently, our model benefits from a constant
number of parameters and a constant-size memory footprint, allowing it to scale
to considerably larger datasets. In addition, we can formally show that our
model is still able to exactly recover subspace clusters given an independence
assumption. The siamese architecture in combination with a novel geometric
classifier further makes our model inductive, allowing it to cluster
out-of-sample data. Additionally, non-linear clusters can be detected by simply
adding an auto-encoder module to the architecture. The whole model can then be
trained end-to-end in a self-supervised manner. This work in progress reports
promising preliminary results on the MNIST dataset. In the spirit of
reproducible research, we make all code publicly available. In future work we
plan to investigate several extensions of our model and to expand experimental
evaluation.
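As a rough illustration of the abstract's core idea, the numpy sketch below shows a siamese affinity function: both inputs pass through the same small embedding network (shared weights), and the affinity is the cosine similarity of the two embeddings, which makes it symmetric and keeps the parameter count and memory footprint independent of dataset size. The layer sizes and random weights are purely illustrative assumptions, not the paper's trained architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shared weights of the siamese branch (hypothetical sizes).
W1 = 0.1 * rng.normal(size=(16, 8))
W2 = 0.1 * rng.normal(size=(8, 4))

def embed(x):
    """One branch of the siamese network: a tiny MLP with ReLU.
    Both inputs are mapped by this same function (weight sharing)."""
    return np.maximum(x @ W1, 0.0) @ W2

def affinity(x, y):
    """Learned subspace affinity: cosine similarity of the shared embeddings.
    Symmetric by construction, with a constant number of parameters."""
    ex, ey = embed(x), embed(y)
    return float(ex @ ey / (np.linalg.norm(ex) * np.linalg.norm(ey) + 1e-12))

x, y = rng.normal(size=16), rng.normal(size=16)
assert abs(affinity(x, y) - affinity(y, x)) < 1e-12  # symmetry from weight sharing
```

How the learned affinities are used downstream (e.g., the paper's geometric classifier, or a spectral step on the affinity matrix) is beyond this sketch.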
Related papers
- Generative Expansion of Small Datasets: An Expansive Graph Approach [13.053285552524052]
We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from minimal samples.
An autoencoder with self-attention layers and optimal transport refines distributional consistency.
Results show comparable performance, demonstrating the model's potential to augment training data effectively.
arXiv Detail & Related papers (2024-06-25T02:59:02Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence demonstrating that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
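The gradient-projection idea behind PGU can be sketched minimally: the unlearning update is projected onto the orthogonal complement of directions deemed important for the retained data, so the update cannot interfere with them. The orthonormal basis below is random and purely illustrative; PGU's actual construction of the retained-knowledge subspace is not reproduced here.

```python
import numpy as np

def project_out(grad, basis):
    """Remove the components of `grad` lying in span(basis) (orthonormal
    columns), so the unlearning step leaves retained directions untouched."""
    return grad - basis @ (basis.T @ grad)

rng = np.random.default_rng(1)
# Hypothetical orthonormal basis of directions important to retained data.
B, _ = np.linalg.qr(rng.normal(size=(10, 3)))
g = rng.normal(size=10)
g_proj = project_out(g, B)
# The projected gradient is orthogonal to every retained direction.
assert np.allclose(B.T @ g_proj, 0.0)
```

Projection is idempotent, so repeated unlearning steps stay confined to the complement of the protected subspace.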
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws [24.356906682593532]
We study the compute-optimal trade-off between model and training data set sizes for large neural networks.
Our result suggests a linear relation similar to that supported by the empirical analysis of Chinchilla.
arXiv Detail & Related papers (2022-12-02T18:46:41Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
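The upper-model idea above can be caricatured with one round of mean-aggregation message passing over a feature-data incidence graph: sample embeddings are averaged from their observed features, and a new feature's embedding is then extrapolated from the samples it appears in, without retraining the backbone. The incidence matrix and embeddings below are illustrative stand-ins, not the paper's GNN.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative binary incidence: sample i has feature j (5 samples, 4 features).
X = (rng.random((5, 4)) > 0.5).astype(float)
F = rng.normal(size=(4, 3))  # embeddings of the known features

# Message passing, samples <- features: each sample is the mean of its features.
S = (X @ F) / (X.sum(axis=1, keepdims=True) + 1e-12)

# Extrapolation, new feature <- samples: mean of the samples it occurs in.
new_col = np.array([1.0, 0.0, 1.0, 0.0, 1.0])  # which samples have the new feature
f_new = (new_col @ S) / (new_col.sum() + 1e-12)
assert f_new.shape == (3,)  # embedded without retraining the lower model
```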
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Learning a Self-Expressive Network for Subspace Clustering [15.096251922264281]
We propose a novel framework for subspace clustering, termed Self-Expressive Network (SENet), which employs a properly designed neural network to learn a self-expressive representation of the data.
Our SENet can not only learn the self-expressive coefficients with desired properties on the training data, but also handle out-of-sample data.
In particular, SENet yields highly competitive performance on MNIST, Fashion MNIST and Extended MNIST and state-of-the-art performance on CIFAR-10.
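A minimal sketch of the self-expressive idea behind SENet: a small network (random, illustrative weights here, not SENet's trained architecture) maps sample pairs to coefficients c_ij, the diagonal is zeroed so no sample explains itself, and each sample is reconstructed from the others as sum_j c_ij x_j. Because the coefficients come from a function rather than a learned quadratic-size matrix, out-of-sample points can be handled the same way.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical shared weights of a tiny pairwise-coefficient network.
Wq = 0.1 * rng.normal(size=(8, 4))
Wk = 0.1 * rng.normal(size=(8, 4))

def self_expressive_coeffs(X):
    """c_ij is a learned function of (x_i, x_j); the diagonal is zeroed so no
    sample explains itself, and rows are normalized to keep C @ X bounded."""
    Q, K = X @ Wq, X @ Wk
    C = Q @ K.T
    np.fill_diagonal(C, 0.0)
    return C / (np.abs(C).sum(axis=1, keepdims=True) + 1e-12)

X = rng.normal(size=(6, 8))
C = self_expressive_coeffs(X)
X_hat = C @ X  # each row reconstructed from the *other* samples
assert np.allclose(np.diag(C), 0.0)
```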
arXiv Detail & Related papers (2021-10-08T18:06:06Z)
- UCSL: A Machine Learning Expectation-Maximization framework for Unsupervised Clustering driven by Supervised Learning [2.133032470368051]
Subtype Discovery consists in finding interpretable and consistent sub-parts of a dataset, which are also relevant to a certain supervised task.
We propose a general Expectation-Maximization ensemble framework entitled UCSL (Unsupervised Clustering driven by Supervised Learning).
Our method is generic: it can integrate any clustering method and can be driven by both binary classification and regression.
arXiv Detail & Related papers (2021-07-05T12:55:13Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
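The Canonical/Polyadic constraint can be illustrated in a few lines: instead of vectorizing a matrix-shaped input and storing a dense weight for it, a hidden unit keeps rank-R factor vectors per mode and computes its response directly from the factors. Sizes and weights below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, R = 5, 6, 3  # input modes and CP rank (illustrative)

# CP factors replacing a dense I*J weight matrix for one hidden unit.
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))

def rank_r_response(X):
    """Hidden-unit pre-activation under the rank-R CP constraint:
    <X, sum_r a_r (outer) b_r>, computed without forming the full weight."""
    return float(sum(A[:, r] @ X @ B[:, r] for r in range(R)))

X = rng.normal(size=(I, J))
# The equivalent dense weight, to show both forms agree on the inner product.
W = sum(np.outer(A[:, r], B[:, r]) for r in range(R))
assert np.isclose(rank_r_response(X), np.sum(W * X))
```

The CP form needs R*(I+J) parameters instead of I*J, and it consumes the input as a multilinear array, preserving structure along each mode.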
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into subspaces, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Robust Self-Supervised Convolutional Neural Network for Subspace Clustering and Classification [0.10152838128195464]
This paper proposes the robust formulation of the self-supervised convolutional subspace clustering network ($S2$ConvSCN).
In a truly unsupervised training environment, Robust $S2$ConvSCN outperforms its baseline version by a significant amount for both seen and unseen data on four well-known datasets.
arXiv Detail & Related papers (2020-04-03T16:07:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.