Shrink the longest: improving latent space isotropy with simplicial geometry
- URL: http://arxiv.org/abs/2501.05502v1
- Date: Thu, 09 Jan 2025 18:44:10 GMT
- Title: Shrink the longest: improving latent space isotropy with simplicial geometry
- Authors: Sergei Kudriashov, Olesya Karpik, Eduard Klyshinsky
- Abstract summary: We propose a novel regularization technique based on simplicial geometry to improve the isotropy of latent representations. We demonstrate that the method leads to an increase in downstream performance while significantly lowering the anisotropy during fine-tuning.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although transformer-based models have been dominating the field of deep learning, various studies of their embedding space have shown that they suffer from the "representation degeneration problem": embeddings tend to be distributed in a narrow cone, making the latent space highly anisotropic. Increasing the isotropy has been shown to improve performance in downstream tasks in both static and contextual language models. However, most approaches either add inference overhead or require a substantial amount of data for model reparametrization. We propose a novel regularization technique based on simplicial geometry that improves the isotropy of latent representations. The core idea of our method is to maximize the persistent entropy of barcodes obtained via a Vietoris-Rips filtration of the contextual embeddings in the underlying latent space. We demonstrate that the method increases downstream performance while significantly lowering anisotropy during fine-tuning, exploiting existing geometric structure instead of reparametrizing the model.
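The sketch below illustrates the quantity at the heart of the method: the persistent entropy of a barcode computed from a Vietoris-Rips filtration over a batch of embeddings. It is a minimal illustration, not the authors' code: it assumes the `gudhi` library, restricts to H0 bars, and uses a hypothetical `max_edge` cutoff; the actual regularizer additionally requires a differentiable persistence computation so the entropy can be maximized during fine-tuning.

```python
# Minimal sketch (assumed, not the authors' implementation): persistent entropy
# of the H0 barcode from a Vietoris-Rips filtration over contextual embeddings.
import numpy as np
import gudhi  # assumed TDA backend

def persistent_entropy(embeddings: np.ndarray, max_edge: float = 2.0) -> float:
    """embeddings: (n_tokens, dim). Returns the Shannon entropy of the barcode."""
    rips = gudhi.RipsComplex(points=embeddings, max_edge_length=max_edge)
    st = rips.create_simplex_tree(max_dimension=1)
    diag = st.persistence()
    # Lifetimes of finite H0 bars: death - birth.
    lifetimes = np.array([d - b for dim, (b, d) in diag
                          if dim == 0 and np.isfinite(d)])
    lifetimes = lifetimes[lifetimes > 0]
    p = lifetimes / lifetimes.sum()        # normalized bar lengths
    return float(-(p * np.log(p)).sum())   # persistent (Shannon) entropy
```

Maximizing this entropy pushes bar lengths toward uniformity, i.e., it shrinks the longest bars relative to the rest, which is what the title alludes to.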
Related papers
- Entropy-Based Block Pruning for Efficient Large Language Models [81.18339597023187]
We propose an entropy-based pruning strategy to enhance efficiency while maintaining performance.
Empirical analysis reveals that the entropy of hidden representations decreases in the early blocks but progressively increases across most subsequent blocks.
arXiv Detail & Related papers (2025-04-04T03:42:34Z)
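As a rough illustration of the entropy signal described above (my own approximation, not the paper's estimator), the sketch below scores one transformer block by the Gaussian differential entropy of its hidden states, which reduces to half the log-determinant of their covariance up to an additive constant; low-entropy blocks would then be pruning candidates.

```python
# Hedged sketch: per-block entropy of hidden representations via a Gaussian proxy.
import torch

def block_entropy(hidden: torch.Tensor, eps: float = 1e-5) -> float:
    """hidden: (tokens, dim) activations emitted by one transformer block."""
    x = hidden - hidden.mean(dim=0, keepdim=True)
    cov = x.T @ x / max(x.shape[0] - 1, 1)
    cov = cov + eps * torch.eye(cov.shape[1])  # regularize near-singular covariance
    # Gaussian differential entropy, up to an additive constant.
    return 0.5 * torch.logdet(cov).item()
```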
- Point Cloud Resampling with Learnable Heat Diffusion [58.050130177241186]
We propose a learnable heat diffusion framework for point cloud resampling.
Unlike previous diffusion models with a fixed prior, the adaptive conditional prior selectively preserves geometric features of the point cloud.
arXiv Detail & Related papers (2024-11-21T13:44:18Z)
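To make the mechanism concrete, here is a hedged sketch of one (non-learnable) heat-diffusion step on a point cloud via a k-NN graph Laplacian; the paper's contribution is to make the diffusion prior adaptive and learnable, which this fixed-`t` toy does not attempt.

```python
# Hedged sketch: one explicit heat-diffusion step on a point cloud (fixed prior).
import torch

def heat_diffusion_step(points: torch.Tensor, k: int = 8, t: float = 0.1) -> torch.Tensor:
    """points: (N, 3). Returns the cloud after one explicit Euler heat step."""
    dists = torch.cdist(points, points)                    # pairwise distances
    knn = dists.topk(k + 1, largest=False).indices[:, 1:]  # drop self-neighbour
    n = points.shape[0]
    adj = torch.zeros(n, n)
    adj.scatter_(1, knn, 1.0)
    adj = torch.maximum(adj, adj.T)                        # symmetrize k-NN graph
    deg = adj.sum(dim=1, keepdim=True)
    lap = torch.diag(deg.squeeze(1)) - adj                 # combinatorial Laplacian
    # Explicit Euler step of the heat equation dx/dt = -L x (degree-normalized).
    return points - t * (lap @ points) / deg.clamp(min=1.0)
```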
- On Probabilistic Pullback Metrics on Latent Hyperbolic Manifolds [5.724027955589408]
This paper focuses on the hyperbolic manifold, a particularly suitable choice for modeling hierarchical relationships.
We propose augmenting the hyperbolic metric with a pullback metric to account for distortions introduced by the model's nonlinear mapping.
Through various experiments, we demonstrate that geodesics on the pullback metric not only respect the geometry of the hyperbolic latent space but also align with the underlying data distribution.
arXiv Detail & Related papers (2024-10-28T09:13:00Z)
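The pullback construction the summary refers to can be sketched in a few lines: the metric induced in latent space by a decoder f is G(z) = J^T J, with J the Jacobian of f at z. The code below is an assumed illustration with a Euclidean ambient space; the paper's actual construction augments the hyperbolic latent metric with such a term, and `decoder` here is a hypothetical stand-in for the model's nonlinear mapping.

```python
# Hedged sketch: Euclidean pullback metric of a decoder at a latent point.
import torch
from torch.autograd.functional import jacobian

def pullback_metric(decoder, z: torch.Tensor) -> torch.Tensor:
    """decoder: latent (d,) -> ambient (D,). Returns the (d, d) metric G(z)."""
    J = jacobian(decoder, z)  # (D, d) Jacobian of the mapping at z
    return J.T @ J            # curve lengths under G match ambient curve lengths
```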
- Variational autoencoders with latent high-dimensional steady geometric flows for dynamics [0.0]
We develop approaches to variational autoencoders (VAEs) for PDE-type ambient data with regularizing geometric latent dynamics.
We redevelop the VAE framework so that manifold geometries, subject to our geometric flow, are learned in the intermediary latent space developed by the encoders and decoders.
We demonstrate that, on our datasets of interest, our methods perform at least as well as the traditional VAE, and oftentimes better.
arXiv Detail & Related papers (2024-10-14T04:07:45Z)
- A Slices Perspective for Incremental Nonparametric Inference in High Dimensional State Spaces [25.16567521220103]
We introduce an innovative method for incremental nonparametric probabilistic inference in high-dimensional state spaces.
Our approach leverages slices from high-dimensional surfaces to efficiently approximate posterior distributions of any shape.
arXiv Detail & Related papers (2024-05-26T06:52:56Z)
- Hyperbolic Geometric Latent Diffusion Model for Graph Generation [27.567428462212455]
Diffusion models have made significant contributions to computer vision, recently sparking growing interest in the community in applying them to graph generation.
In this paper, we propose HypDiff, a novel geometric latent diffusion framework.
Specifically, we first establish a geometric latent space with interpretability measures based on hyperbolic geometry, in which we define anisotropic latent diffusion processes for graphs.
Then, we propose a geometric latent diffusion process constrained by both radial and angular geometric properties, thereby preserving the original topological properties of the generated graphs.
arXiv Detail & Related papers (2024-05-06T06:28:44Z)
- Scaling Riemannian Diffusion Models [68.52820280448991]
We show that our method enables us to scale to high-dimensional tasks on nontrivial manifolds.
We model QCD densities on $SU(n)$ lattices and contrastively learned embeddings on high dimensional hyperspheres.
arXiv Detail & Related papers (2023-10-30T21:27:53Z)
- Generative Modeling on Manifolds Through Mixture of Riemannian Diffusion Processes [57.396578974401734]
We introduce a principled framework for building a generative diffusion process on general manifolds.
Instead of following the denoising approach of previous diffusion models, we construct a diffusion process using a mixture of bridge processes.
We develop a geometric understanding of the mixture process, deriving the drift as a weighted mean of tangent directions to the data points.
arXiv Detail & Related papers (2023-10-11T06:04:40Z)
- Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information within a content-conditioned range, aiding the transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
arXiv Detail & Related papers (2023-08-17T01:34:51Z)
- Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimensional modelling.
We show that, under these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z)
- Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems [64.29491112653905]
We propose a novel and efficient diffusion sampling strategy that synergistically combines the diffusion sampling and Krylov subspace methods.
Specifically, we prove that if the tangent space at a sample denoised via Tweedie's formula forms a Krylov subspace, then conjugate gradient (CG) iterations initialized with the denoised data keep the data-consistency update within that tangent space.
Our proposed method achieves inference more than 80 times faster than the previous state-of-the-art method.
arXiv Detail & Related papers (2023-03-10T07:42:49Z)
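The summary above hinges on a standard conjugate-gradient update; the sketch below is a generic CG solver (not the authors' code) whose iterates, started from a denoised sample `x0`, stay in the Krylov subspace generated at that point, which is the property the paper exploits for data consistency.

```python
# Generic conjugate gradient for A x = b, with A a symmetric PSD matvec callable.
import numpy as np

def cg(A, b: np.ndarray, x0: np.ndarray, iters: int = 10, tol: float = 1e-8) -> np.ndarray:
    x = x0.copy()
    r = b - A(x)              # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p        # iterate stays in x0 + Krylov subspace of (A, r)
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```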
- Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z)
- How Does Fine-tuning Affect the Geometry of Embedding Space: A Case Study on Isotropy [18.490856440975996]
We analyze the extent to which the isotropy of the embedding space changes after fine-tuning.
Local structures in pre-trained contextual word representations (CWRs) undergo a massive change during fine-tuning.
arXiv Detail & Related papers (2021-09-10T08:58:59Z)
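A common way to probe the isotropy change this paper studies (my sketch; the paper's exact measures may differ) is the mean pairwise cosine similarity of the embedding cloud: values near 0 indicate isotropy, while values near 1 indicate the narrow-cone degeneration discussed in the main abstract above.

```python
# Hedged sketch: average off-diagonal cosine similarity as an anisotropy probe.
import numpy as np

def mean_cosine_similarity(emb: np.ndarray) -> float:
    """emb: (n, d) matrix of contextual embeddings."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = emb.shape[0]
    return float(sim[~np.eye(n, dtype=bool)].mean())  # exclude self-similarity
```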
- Intermediate Layer Optimization for Inverse Problems using Deep Generative Models [86.29330440222199]
ILO is a novel optimization algorithm for solving inverse problems with deep generative models.
We empirically show that our approach outperforms state-of-the-art methods introduced in StyleGAN-2 and PULSE for a wide range of inverse problems.
arXiv Detail & Related papers (2021-02-15T06:52:22Z)
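To ground the last entry, here is a minimal sketch in the spirit of intermediate-layer optimization (assumed structure, not the released ILO code): an intermediate activation of a pretrained generator is optimized by gradient descent so that the generated output matches the observed measurements. Here `tail` and `forward_op` are hypothetical stand-ins for the generator's remaining layers and the measurement operator.

```python
# Hedged sketch of intermediate-layer optimization for an inverse problem.
import torch

def ilo_step(tail, z_mid: torch.Tensor, y: torch.Tensor, forward_op,
             lr: float = 0.05, steps: int = 200) -> torch.Tensor:
    """tail: generator layers after the chosen intermediate layer.
    z_mid: intermediate activation to optimize; y: observed measurements."""
    z = z_mid.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((forward_op(tail(z)) - y) ** 2).sum()  # measurement fit
        loss.backward()
        opt.step()
    return z.detach()
```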