Interpolating between Clustering and Dimensionality Reduction with
Gromov-Wasserstein
- URL: http://arxiv.org/abs/2310.03398v1
- Date: Thu, 5 Oct 2023 09:04:53 GMT
- Title: Interpolating between Clustering and Dimensionality Reduction with
Gromov-Wasserstein
- Authors: Hugues Van Assel, Cédric Vincent-Cuaz, Titouan Vayer, Rémi Flamary, Nicolas Courty
- Abstract summary: Correspondences between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport problem.
When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering.
- Score: 13.656958543737211
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a versatile adaptation of existing dimensionality reduction (DR)
objectives, enabling the simultaneous reduction of both sample and feature
sizes. Correspondences between input and embedding samples are computed through
a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the
embedding sample size matches that of the input, our model recovers popular
classical DR models. When the embedding's dimensionality is unconstrained, we
show that the OT plan delivers a competitive hard clustering. We emphasize the
importance of intermediate stages that blend DR and clustering for summarizing
real data and apply our method to visualize datasets of images.
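To make the abstract concrete, here is a minimal sketch, not the authors' released code, of reading a hard clustering off a semi-relaxed Gromov-Wasserstein plan. It assumes POT >= 0.9 (pip install pot), which exposes ot.gromov.semirelaxed_gromov_wasserstein; the toy data, the number of prototypes k, and the fixed random embedding Z are illustrative choices, whereas the paper optimizes the embedding jointly with the plan.

```python
# Minimal sketch, assuming POT >= 0.9. Not the authors' code: Z is
# fixed at random here, while the paper optimizes Z jointly with T.
import numpy as np
import ot

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))   # input samples (n=100, d=10)
k = 5                                # number of embedding points (prototypes)
Z = rng.standard_normal((k, 2))      # low-dimensional embedding, random init

C1 = ot.dist(X, X)                   # pairwise (squared Euclidean) costs, input
C2 = ot.dist(Z, Z)                   # pairwise costs, embedding
p = ot.unif(len(X))                  # uniform weights on inputs; the second
                                     # marginal is left free (semi-relaxed)

T = ot.gromov.semirelaxed_gromov_wasserstein(C1, C2, p, loss_fun='square_loss')

labels = T.argmax(axis=1)            # hard clustering read off the OT plan
print(np.bincount(labels, minlength=k))
```

Leaving the second marginal unconstrained is what lets transport mass concentrate on a few embedding points, which is why the plan behaves like a cluster assignment when k is small.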
Related papers
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
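For reference, the semi-relaxed Gromov-Wasserstein problem that both papers build on can be written as follows; the notation (C^X and C^Z for pairwise similarity matrices of input and embedding, p for the input weights) is generic rather than either paper's exact formulation:

```latex
\min_{T \ge 0,\; T\mathbf{1} = p}\;
\sum_{i,j,k,l} \bigl( C^X_{ij} - C^Z_{kl} \bigr)^2\, T_{ik}\, T_{jl}
```

Only the first marginal of T is constrained; relaxing the second marginal is what allows the embedding sample size to differ from the input's.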
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
- Ensemble Modeling for Multimodal Visual Action Recognition [50.38638300332429]
We propose an ensemble modeling approach for multimodal action recognition.
We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
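The entry above leans on focal loss for the long tail; below is a minimal sketch of the standard focal loss (Lin et al., 2017) in PyTorch. The paper uses an unspecified variant, so the modulating factor and the optional per-class weights alpha here are assumptions, not the authors' formulation.

```python
# Standard focal loss sketch (Lin et al., 2017); not the MECCANO variant.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Cross-entropy with well-classified examples down-weighted by (1 - p_t)^gamma."""
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # log-prob of true class
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt   # modulating factor damps easy examples
    if alpha is not None:                    # optional per-class weighting (assumption)
        loss = loss * alpha[targets]
    return loss.mean()
```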
arXiv Detail & Related papers (2023-08-10T08:43:20Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
Because the generator is nonlinear, the latent space is a distorted projection of the data space, which results in poor representation learning.
We show that geodesics, computed accurately, can substantially improve the performance of deep generative models.
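"Geodesics" here means shortest paths under the metric a generator g pulls back onto its latent space. A standard way to write the length of a latent curve gamma, with J_g the Jacobian of g, is the following; this is the usual formulation in this line of work, not a quote from the paper:

```latex
L(\gamma) \;=\; \int_0^1 \bigl\lVert J_g(\gamma(t))\, \dot{\gamma}(t) \bigr\rVert \, dt
```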
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
- Revisiting data augmentation for subspace clustering [21.737226432466496]
Subspace clustering is the classical problem of clustering a collection of data samples around several low-dimensional subspaces.
We argue that the data distribution within each subspace plays a critical role in the success of self-expressive models.
We propose two subspace clustering frameworks for both unsupervised and semi-supervised settings.
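For context, the "self-expressive" models mentioned above typically solve a program of the following form, here the classical sparse subspace clustering objective, given as background rather than as this paper's contribution: each sample is reconstructed from the other samples, and the coefficient matrix C then defines an affinity for spectral clustering.

```latex
\min_{C}\; \lVert C \rVert_1
\quad \text{s.t.} \quad X = XC,\;\; \operatorname{diag}(C) = 0
```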
arXiv Detail & Related papers (2022-07-20T08:13:08Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
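As background, a plain finite mixture regression models a target y given features x as a weighted sum of K regression components. The Gaussian form below is the textbook version; the paper's model extends this to multiple incomplete, mixed-type targets:

```latex
p(y \mid x) \;=\; \sum_{k=1}^{K} \pi_k\,
\mathcal{N}\!\bigl(y \mid x^{\top}\beta_k,\; \sigma_k^{2}\bigr),
\qquad \pi_k \ge 0,\;\; \sum_{k=1}^{K} \pi_k = 1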
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
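The summary does not describe the architecture, so the following is only a generic sketch of end-to-end learned subsampling; the scorer network, Gumbel top-k relaxation, and straight-through trick are my assumptions, not necessarily the authors' design.

```python
# Generic learned-subsampling sketch; architecture details are assumptions,
# not taken from the paper.
import torch
import torch.nn as nn

class LearnedSubsampler(nn.Module):
    def __init__(self, dim, k):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.k = k

    def forward(self, x, tau=0.5):
        # x: (batch, n, dim) -> per-element scores: (batch, n)
        scores = self.scorer(x).squeeze(-1)
        # Gumbel noise makes the selection stochastic yet trainable
        gumbel = -torch.log(-torch.log(torch.rand_like(scores).clamp_min(1e-9)))
        weights = torch.softmax((scores + gumbel) / tau, dim=-1)   # soft selection
        idx = weights.topk(self.k, dim=-1).indices
        hard = torch.zeros_like(weights).scatter(-1, idx, 1.0)     # hard subset mask
        mask = hard + weights - weights.detach()                   # straight-through
        return x * mask.unsqueeze(-1)  # masked set feeds the downstream task network
```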
arXiv Detail & Related papers (2020-06-25T07:36:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of the information presented and is not responsible for any consequences of its use.