GroupEnc: encoder with group loss for global structure preservation
- URL: http://arxiv.org/abs/2309.02917v1
- Date: Wed, 6 Sep 2023 11:22:21 GMT
- Title: GroupEnc: encoder with group loss for global structure preservation
- Authors: David Novak, Sofie Van Gassen, Yvan Saeys
- Abstract summary: We use the notion of structure preservation at both local and global levels to create a deep learning model.
Our model, called GroupEnc, uses a 'group loss' function to create embeddings with less global structure distortion than VAEs.
We validate our approach using publicly available biological single-cell transcriptomic datasets.
- Score: 1.8523441396284195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in dimensionality reduction have achieved more accurate
lower-dimensional embeddings of high-dimensional data. In addition to
visualisation purposes, these embeddings can be used for downstream processing,
including batch effect normalisation, clustering, community detection or
trajectory inference. We use the notion of structure preservation at both local
and global levels to create a deep learning model, based on a variational
autoencoder (VAE) and the stochastic quartet loss from the SQuadMDS algorithm.
Our encoder model, called GroupEnc, uses a 'group loss' function to create
embeddings with less global structure distortion than VAEs do, while keeping
the model parametric and the architecture flexible. We validate our approach
using publicly available biological single-cell transcriptomic datasets,
employing R_NX curves for evaluation.
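The core mechanism described above, a VAE trained with an additional SQuadMDS-style group loss, can be illustrated with a short sketch. The following is a hypothetical Python/PyTorch rendering, not the authors' code: it assumes the 'group loss' compares normalised within-group pairwise distances between input space and latent space, with group size 4 recovering the stochastic quartet case; all names, the sampling scheme, and the loss weighting are illustrative.

```python
import torch

def group_loss(x: torch.Tensor, z: torch.Tensor,
               n_groups: int = 256, group_size: int = 4) -> torch.Tensor:
    """x: (N, D) high-dimensional inputs; z: (N, d) latent embeddings.

    Hypothetical SQuadMDS-style group loss: for each randomly sampled group,
    compare the normalised pairwise distances in input and latent space.
    """
    n = x.shape[0]
    # Sample random groups of points (quartets when group_size == 4).
    idx = torch.stack([torch.randperm(n)[:group_size] for _ in range(n_groups)])
    total = x.new_zeros(())
    for g in idx:
        d_x = torch.pdist(x[g])           # pairwise distances in input space
        d_z = torch.pdist(z[g])           # pairwise distances in latent space
        d_x = d_x / (d_x.sum() + 1e-8)    # normalise distances within the group
        d_z = d_z / (d_z.sum() + 1e-8)
        total = total + ((d_x - d_z) ** 2).sum()
    return total / n_groups

# Assumed usage: added to the usual VAE objective with an illustrative weight,
# e.g. loss = reconstruction_loss + kl_loss + lam * group_loss(x, z).
```

For evaluation, R_NX(K) curves rescale the K-ary neighbourhood preservation Q_NX(K) so that 0 corresponds to a random embedding and 1 to perfect neighbourhood preservation: R_NX(K) = (Q_NX(K) - K/(N-1)) / (1 - K/(N-1)).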
Related papers
- Autoencoded UMAP-Enhanced Clustering for Unsupervised Learning [49.1574468325115]
We propose a novel approach to unsupervised learning by constructing a non-linear embedding of the data into a low-dimensional space followed by any conventional clustering algorithm.
The embedding promotes clusterability of the data and comprises two mappings: the encoder of an autoencoder neural network followed by the UMAP algorithm (a sketch of this pipeline follows the entry).
When applied to MNIST data, AUEC significantly outperforms the state-of-the-art techniques in terms of clustering accuracy.
arXiv Detail & Related papers (2025-01-13T22:30:38Z)
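The two-stage embedding referenced in the entry above can be outlined as follows. This is a hypothetical sketch, not the paper's code: `encode` stands in for a trained autoencoder's encoder, and the choice of K-means as the conventional clustering step is illustrative.

```python
import numpy as np
import umap                              # umap-learn package
from sklearn.cluster import KMeans

def auec_embed_and_cluster(x: np.ndarray, encode, n_clusters: int = 10) -> np.ndarray:
    """Compose an autoencoder's encoder with UMAP, then cluster the result."""
    latent = encode(x)                                     # bottleneck features
    emb = umap.UMAP(n_components=2).fit_transform(latent)  # non-linear embedding
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(emb)
```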
- ClusterGraph: a new tool for visualization and compression of multidimensional data [0.0]
This paper provides an additional layer on top of the output of any clustering algorithm.
It describes the global layout of the clusters obtained from the considered clustering algorithm.
arXiv Detail & Related papers (2024-11-08T09:40:54Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches through the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Scalable manifold learning by uniform landmark sampling and constrained locally linear embedding [0.6144680854063939]
We propose a scalable manifold learning (scML) method that can handle large-scale, high-dimensional data efficiently.
We empirically validated the effectiveness of scML on synthetic datasets and real-world benchmarks of different types.
scML scales well with increasing data sizes and embedding dimensions, and exhibits promising performance in preserving the global structure.
arXiv Detail & Related papers (2024-01-02T08:43:06Z)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to combine the strengths of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where each group is handled with tailored learning objectives.
We further align the source-like domain with the target-specific samples using a memory-bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch (a generic MMD sketch follows this entry).
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
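The MMD alignment mentioned in the entry above can be illustrated generically. Below is a standard biased RBF-kernel MMD^2 estimator in PyTorch; the paper's memory-bank bookkeeping is not reproduced, and the bandwidth choice is illustrative.

```python
import torch

def mmd_rbf(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased MMD^2 estimate between feature batches a: (n, d) and b: (m, d)."""
    def k(u, v):
        # RBF (Gaussian) kernel matrix between rows of u and v.
        return torch.exp(-torch.cdist(u, v).pow(2) / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()
```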
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without sharing data, for privacy reasons.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
A promising solution is to leverage generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
- On the convergence of group-sparse autoencoders [9.393652136001732]
We introduce and study a group-sparse autoencoder that accounts for a variety of generative models.
For clustering models, inputs that result in the same group of active units belong to the same cluster.
In this setting, we theoretically prove the convergence of the network parameters to a neighborhood of the generating matrix.
arXiv Detail & Related papers (2021-02-13T21:17:07Z)
- Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z)
- Joint Optimization of an Autoencoder for Clustering and Embedding [22.16059261437617]
We present an alternative where the autoencoder and the clustering are learned simultaneously.
This simple neural network, referred to as the clustering module, can be integrated into a deep autoencoder, resulting in a deep clustering model.
arXiv Detail & Related papers (2020-12-07T14:38:10Z)
- Self-grouping Convolutional Neural Networks [30.732298624941738]
We propose a novel method of designing self-grouping convolutional neural networks, called SG-CNN.
For each filter, we first evaluate the importance values of its input channels to obtain importance vectors.
Using the resulting data-dependent centroids, we prune the less important connections, which implicitly minimizes the accuracy loss caused by pruning.
arXiv Detail & Related papers (2020-09-29T06:24:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.