GroupEnc: encoder with group loss for global structure preservation
- URL: http://arxiv.org/abs/2309.02917v1
- Date: Wed, 6 Sep 2023 11:22:21 GMT
- Title: GroupEnc: encoder with group loss for global structure preservation
- Authors: David Novak, Sofie Van Gassen, Yvan Saeys
- Abstract summary: We use the notion of structure preservation at both local and global levels to create a deep learning model.
Our model, called GroupEnc, uses a 'group loss' function to create embeddings with less global structure distortion than VAEs.
We validate our approach using publicly available biological single-cell transcriptomic datasets.
- Score: 1.8523441396284195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in dimensionality reduction have achieved more accurate
lower-dimensional embeddings of high-dimensional data. In addition to
visualisation purposes, these embeddings can be used for downstream processing,
including batch effect normalisation, clustering, community detection or
trajectory inference. We use the notion of structure preservation at both local
and global levels to create a deep learning model, based on a variational
autoencoder (VAE) and the stochastic quartet loss from the SQuadMDS algorithm.
Our encoder model, called GroupEnc, uses a 'group loss' function to create
embeddings with less global structure distortion than VAEs do, while keeping
the model parametric and the architecture flexible. We validate our approach
using publicly available biological single-cell transcriptomic datasets,
employing R_NX curves for evaluation.
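The core mechanism described above, a VAE trained with an additional SQuadMDS-style group loss, can be illustrated with a short sketch. The following is a hypothetical Python/PyTorch rendering, not the authors' code: it assumes the 'group loss' compares normalised within-group pairwise distances between input space and latent space, with group size 4 recovering the stochastic quartet case; all names, the sampling scheme, and the loss weighting are illustrative.

```python
import torch

def group_loss(x: torch.Tensor, z: torch.Tensor,
               n_groups: int = 256, group_size: int = 4) -> torch.Tensor:
    """x: (N, D) high-dimensional inputs; z: (N, d) latent embeddings.

    Hypothetical SQuadMDS-style group loss: for each randomly sampled group,
    compare the normalised pairwise distances in input and latent space.
    """
    n = x.shape[0]
    # Sample random groups of points (quartets when group_size == 4).
    idx = torch.stack([torch.randperm(n)[:group_size] for _ in range(n_groups)])
    total = x.new_zeros(())
    for g in idx:
        d_x = torch.pdist(x[g])           # pairwise distances in input space
        d_z = torch.pdist(z[g])           # pairwise distances in latent space
        d_x = d_x / (d_x.sum() + 1e-8)    # normalise distances within the group
        d_z = d_z / (d_z.sum() + 1e-8)
        total = total + ((d_x - d_z) ** 2).sum()
    return total / n_groups

# Assumed usage: added to the usual VAE objective with an illustrative weight,
# e.g. loss = reconstruction_loss + kl_loss + lam * group_loss(x, z).
```

For evaluation, R_NX(K) curves rescale the K-ary neighbourhood preservation Q_NX(K) so that 0 corresponds to a random embedding and 1 to perfect neighbourhood preservation: R_NX(K) = (Q_NX(K) - K/(N-1)) / (1 - K/(N-1)).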
Related papers
- Autoencoded UMAP-Enhanced Clustering for Unsupervised Learning [49.1574468325115]
We propose a novel approach to unsupervised learning by constructing a non-linear embedding of the data into a low-dimensional space followed by any conventional clustering algorithm.
The embedding promotes clusterability of the data and comprises two mappings: the encoder of an autoencoder neural network followed by the UMAP algorithm (a sketch of this pipeline follows the entry).
When applied to MNIST data, AUEC significantly outperforms the state-of-the-art techniques in terms of clustering accuracy.
arXiv Detail & Related papers (2025-01-13T22:30:38Z)
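The two-stage embedding referenced in the entry above can be outlined as follows. This is a hypothetical sketch, not the paper's code: `encode` stands in for a trained autoencoder's encoder, and the choice of K-means as the conventional clustering step is illustrative.

```python
import numpy as np
import umap                              # umap-learn package
from sklearn.cluster import KMeans

def auec_embed_and_cluster(x: np.ndarray, encode, n_clusters: int = 10) -> np.ndarray:
    """Compose an autoencoder's encoder with UMAP, then cluster the result."""
    latent = encode(x)                                     # bottleneck features
    emb = umap.UMAP(n_components=2).fit_transform(latent)  # non-linear embedding
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(emb)
```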
- ClusterGraph: a new tool for visualization and compression of multidimensional data [0.0]
This paper provides an additional layer on top of the output of any clustering algorithm.
It describes the global layout of the clusters obtained from the considered clustering algorithm.
arXiv Detail & Related papers (2024-11-08T09:40:54Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches through the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Scalable manifold learning by uniform landmark sampling and constrained locally linear embedding [0.6144680854063939]
We propose a scalable manifold learning (scML) method that can handle large-scale, high-dimensional data efficiently.
We empirically validated the effectiveness of scML on synthetic datasets and real-world benchmarks of different types.
scML scales well with increasing data sizes and embedding dimensions, and exhibits promising performance in preserving the global structure.
arXiv Detail & Related papers (2024-01-02T08:43:06Z)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to combine the strengths of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where each group is handled with tailored learning objectives.
We further align the source-like domain with the target-specific samples using a memory-bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch (a generic MMD sketch follows this entry).
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
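The MMD alignment mentioned in the entry above can be illustrated generically. Below is a standard biased RBF-kernel MMD^2 estimator in PyTorch; the paper's memory-bank bookkeeping is not reproduced, and the bandwidth choice is illustrative.

```python
import torch

def mmd_rbf(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased MMD^2 estimate between feature batches a: (n, d) and b: (m, d)."""
    def k(u, v):
        # RBF (Gaussian) kernel matrix between rows of u and v.
        return torch.exp(-torch.cdist(u, v).pow(2) / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()
```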
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without sharing data, for privacy reasons.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
A promising solution is to leverage generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
- On the convergence of group-sparse autoencoders [9.393652136001732]
We introduce and study a group-sparse autoencoder that accounts for a variety of generative models.
For clustering models, inputs that result in the same group of active units belong to the same cluster.
In this setting, we theoretically prove the convergence of the network parameters to a neighborhood of the generating matrix.
arXiv Detail & Related papers (2021-02-13T21:17:07Z)
- Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z)
- Joint Optimization of an Autoencoder for Clustering and Embedding [22.16059261437617]
We present an alternative where the autoencoder and the clustering are learned simultaneously.
This simple neural network, referred to as the clustering module, can be integrated into a deep autoencoder, resulting in a deep clustering model.
arXiv Detail & Related papers (2020-12-07T14:38:10Z)
- Self-grouping Convolutional Neural Networks [30.732298624941738]
We propose a novel method of designing self-grouping convolutional neural networks, called SG-CNN.
For each filter, we first evaluate the importance values of its input channels to obtain importance vectors.
Using the resulting data-dependent centroids, we prune the less important connections, which implicitly minimizes the accuracy loss caused by pruning.
arXiv Detail & Related papers (2020-09-29T06:24:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.