Related papers: Understanding Variational Autoencoders with Intrinsic Dimension and Information Imbalance

URL: http://arxiv.org/abs/2411.01978v1
Date: Mon, 04 Nov 2024 10:58:41 GMT
Title: Understanding Variational Autoencoders with Intrinsic Dimension and Information Imbalance
Authors: Charles Camboulin, Diego Doimo, Aldo Glielmo
Abstract summary: This work presents an analysis of the hidden representations of Variational Autoencoders (VAEs) using the Intrinsic Dimension (ID) and the Information Imbalance (II). We show that VAEs undergo a transition in behaviour once the bottleneck size is larger than the ID of the data, manifesting in a double hunchback ID profile and a qualitative shift in information processing as captured by the II.
Score: 2.7446241148152257
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This work presents an analysis of the hidden representations of Variational Autoencoders (VAEs) using the Intrinsic Dimension (ID) and the Information Imbalance (II). We show that VAEs undergo a transition in behaviour once the bottleneck size is larger than the ID of the data, manifesting in a double hunchback ID profile and a qualitative shift in information processing as captured by the II. Our results also highlight two distinct training phases for architectures with sufficiently large bottleneck sizes, consisting of a rapid fit and a slower generalisation, as assessed by a differentiated behaviour of ID, II, and KL loss. These insights demonstrate that II and ID could be valuable tools for aiding architecture search, for diagnosing underfitting in VAEs, and, more broadly, they contribute to advancing a unified understanding of deep generative models through geometric analysis.

Related papers

Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence [0.0]
CRID is a cross-modal framework combining Large Vision-Language Models, Graph Attention Networks, and representation learning. Our approach focuses on identifying and leveraging interpretable features, enabling the detection of semantically meaningful PII beyond low-level appearance cues. Our experiments show improved performance in practical cross-dataset Re-ID scenarios.
arXiv Detail & Related papers (2025-07-02T09:10:33Z)
Comparing the Effects of Persistence Barcodes Aggregation and Feature Concatenation on Medical Imaging [0.9524546889479364]
In medical image analysis, feature engineering plays an important role in the design and performance of machine learning models. We store persistent topological and geometrical features in the form of the persistence barcode. We compare the effects of two approaches on the performance of classification models.
arXiv Detail & Related papers (2025-05-29T16:45:33Z)
Measuring Intrinsic Dimension of Token Embeddings [0.13108652488669734]
We estimate the ID of token embeddings in small-scale language models and also modern large language models. We observe an increase in redundancy rates as the model scale grows. When LoRA is applied to the embedding layers, we observe a sudden drop in perplexity around the estimated IDs.
arXiv Detail & Related papers (2025-03-04T00:19:01Z)
Explainable AI for Multivariate Time Series Pattern Exploration: Latent Space Visual Analytics with Temporal Fusion Transformer and Variational Autoencoders in Power Grid Event Diagnosis [1.170167705525779]
This paper proposes a novel visual analytics framework that integrates two generative AI models, Temporal Fusion Transformer (TFT) and Variational Autoencoders (VAEs). It reduces complex patterns into lower-dimensional latent spaces and visualizes them in 2D using dimensionality reduction techniques such as PCA, t-SNE, and UMAP with DBSCAN. The framework is demonstrated through a case study on power grid signal data, where it identifies multi-label grid event signatures, including faults and anomalies with diverse root causes.
arXiv Detail & Related papers (2024-12-20T17:41:11Z)
It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment [72.75844404617959]
This paper proposes a novel cross-granularity alignment gait recognition method, named XGait. To achieve this goal, XGait first contains two branches of backbone encoders to map the silhouette sequences and the parsing sequences into two latent spaces. Comprehensive experiments on two large-scale gait datasets show that XGait achieves Rank-1 accuracy of 80.5% on Gait3D and 88.3% on CCPG.
arXiv Detail & Related papers (2024-11-16T08:54:27Z)
Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification. In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction. Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture. We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
Diffusion Bridge AutoEncoders for Unsupervised Representation Learning [10.74555302283403]
We introduce Diffusion Bridge AutoEncoders (DBAE), which enable z-dependent endpoint x_T inference through a feed-forward architecture. We propose an objective function for DBAE to enable both reconstruction and generative modeling, with their theoretical justification.
arXiv Detail & Related papers (2024-05-27T12:28:17Z)
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising. We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z)
DAGnosis: Localized Identification of Data Inconsistencies using Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models. We use directed acyclic graphs (DAGs) to encode the training set's features probability distribution and independencies as a structure. Our method, called DAGnosis, leverages these structural interactions to bring valuable and insightful data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z)
UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders. We first develop an adaptive feature mask generator to account for the unique significance of nodes. We then design a ranking-based structure reconstruction objective, used jointly with feature reconstruction, to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
Supervision Adaptation Balancing In-distribution Generalization and Out-of-distribution Detection [36.66825830101456]
In-distribution (ID) and out-of-distribution (OOD) samples can lead to distributional vulnerability in deep neural networks. We introduce a novel supervision adaptation approach to generate adaptive supervision information for OOD samples, making them more compatible with ID samples.
arXiv Detail & Related papers (2022-06-19T11:16:44Z)
Image-based Automated Species Identification: Can Virtual Data Augmentation Overcome Problems of Insufficient Sampling? [0.0]
We present a two-level data augmentation approach to automated visual species identification. The first level of data augmentation applies classic approaches of data augmentation and generation of faked images. The second level of data augmentation employs synthetic additional sampling in feature space by an oversampling algorithm in vector space.
arXiv Detail & Related papers (2020-10-18T15:44:45Z)
Longitudinal Variational Autoencoder [1.4680035572775534]
A common approach to analyse high-dimensional data that contains missing values is to learn a low-dimensional representation using variational autoencoders (VAEs). Standard VAEs assume that the learnt representations are i.i.d., and fail to capture the correlations between the data samples. We propose the Longitudinal VAE (L-VAE), which uses a multi-output additive Gaussian process (GP) prior to extend the VAE's capability to learn structured low-dimensional representations. Our approach can simultaneously accommodate both time-varying shared and random effects, produce structured low-dimensional representations
arXiv Detail & Related papers (2020-06-17T10:30:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.