Disentangled representations via score-based variational autoencoders
- URL: http://arxiv.org/abs/2512.17127v1
- Date: Thu, 18 Dec 2025 23:42:10 GMT
- Title: Disentangled representations via score-based variational autoencoders
- Authors: Benjamin S. H. Lyo, Eero P. Simoncelli, Cristina Savin
- Abstract summary: We present the Score-based Autoencoder for Multiscale Inference (SAMI). SAMI formulates a principled objective that learns representations through score-based guidance of the underlying diffusion process. It can extract useful representations from pre-trained diffusion models with minimal additional training.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present the Score-based Autoencoder for Multiscale Inference (SAMI), a method for unsupervised representation learning that combines the theoretical frameworks of diffusion models and VAEs. By unifying their respective evidence lower bounds, SAMI formulates a principled objective that learns representations through score-based guidance of the underlying diffusion process. The resulting representations automatically capture meaningful structure in the data: they recover ground truth generative factors in our synthetic dataset, learn factorized, semantic latent dimensions from complex natural images, and encode video sequences into latent trajectories that are straighter than those of alternative encoders, despite being trained exclusively on static images. Furthermore, SAMI can extract useful representations from pre-trained diffusion models with minimal additional training. Finally, the explicitly probabilistic formulation provides new ways to identify semantically meaningful axes in the absence of supervised labels, and its mathematical exactness allows us to make formal statements about the nature of the learned representation. Overall, these results indicate that implicit structural information in diffusion models can be made explicit and interpretable through synergistic combination with a variational autoencoder.
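The abstract describes a joint objective: a VAE-style encoder produces a latent that conditions the diffusion model's score network, and both are trained together under a unified evidence lower bound. A minimal toy sketch of such a score-guided VAE loss is below; all names (`encode`, `score_net`, `sami_loss`), the linear networks, and the noise schedule are illustrative assumptions, not the paper's actual architecture or released code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and random linear "networks" (illustrative only).
D, Z = 8, 2
W_enc = rng.normal(size=(2 * Z, D)) * 0.1   # encoder weights -> (mu, logvar)
W_dec = rng.normal(size=(D, D + Z)) * 0.1   # conditional score-network weights

def encode(x):
    # Amortized posterior q(z|x): returns mean and log-variance.
    h = W_enc @ x
    return h[:Z], h[Z:]

def score_net(x_t, z, t):
    # Predicts the noise added at time t, conditioned on the latent z.
    return W_dec @ np.concatenate([x_t, z]) * (1.0 + t)

def sami_loss(x, t=0.5):
    mu, logvar = encode(x)
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=Z)    # reparameterization
    eps = rng.normal(size=D)
    alpha = np.exp(-t)                                    # toy noise schedule
    x_t = np.sqrt(alpha) * x + np.sqrt(1 - alpha) * eps   # diffused input
    # Latent-conditional denoising score matching term ...
    denoise = np.mean((score_net(x_t, z, t) - eps) ** 2)
    # ... plus the VAE KL term KL(q(z|x) || N(0, I)).
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return denoise + kl

x = rng.normal(size=D)
print(sami_loss(x))
```

In this reading, the diffusion ELBO contributes the conditional denoising term and the VAE ELBO contributes the KL regularizer; the encoder is trained because the denoising loss improves when `z` carries information about `x`.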
Related papers
- Expanding the Role of Diffusion Models for Robust Classifier Training [14.409018677904571]
We show that diffusion models offer representations that are both diverse and partially robust. Our representation analysis indicates that incorporating diffusion models into adversarial training encourages more disentangled features. Experiments on CIFAR-10, CIFAR-100, and ImageNet validate these findings.
arXiv Detail & Related papers (2026-02-23T15:06:52Z) - Training-Free Representation Guidance for Diffusion Models with a Representation Alignment Projector [14.027059904924135]
We introduce a representation alignment projector that injects representations predicted by a projector into intermediate sampling steps. Experiments on SiTs and REPAs show notable improvements in class-conditional ImageNet synthesis. The proposed method outperforms representative guidance when applied to SiT models.
arXiv Detail & Related papers (2026-01-30T02:29:54Z) - Generalization of Diffusion Models Arises with a Balanced Representation Space [32.68561555837436]
We analyze the distinctions between memorization and generalization in diffusion models through the lens of representation learning. We show that memorization corresponds to the model storing raw training samples in the learned weights for encoding and decoding, yielding localized "spiky" representations. We propose a representation-based method for detecting memorization and a training-free editing technique that allows precise control via representation steering.
arXiv Detail & Related papers (2025-12-24T05:40:40Z) - Automated Learning of Semantic Embedding Representations for Diffusion Models [1.688134675717698]
We employ a multi-level denoising autoencoder framework to expand the representation capacity of denoising diffusion models. Our work justifies that DDMs are not only suitable for generative tasks, but also potentially advantageous for general-purpose deep learning applications.
arXiv Detail & Related papers (2025-05-09T02:10:46Z) - MASCOTS: Model-Agnostic Symbolic COunterfactual explanations for Time Series [4.664512594743523]
We introduce MASCOTS, a method that generates meaningful and diverse counterfactual observations in a model-agnostic manner. By operating in a symbolic feature space, MASCOTS enhances interpretability while preserving fidelity to the original data and model.
arXiv Detail & Related papers (2025-03-28T12:48:12Z) - Graph Representation Learning with Diffusion Generative Models [0.0]
We train a discrete diffusion model within an autoencoder framework to learn meaningful embeddings for graph data. We extract the representation from the combination of the encoder's output and the decoder's first time step hidden embedding. Our approach demonstrates the potential of discrete diffusion models to be used for graph representation learning.
arXiv Detail & Related papers (2025-01-22T07:12:10Z) - Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z) - Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z) - DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z) - Information Symmetry Matters: A Modal-Alternating Propagation Network for Few-Shot Learning [118.45388912229494]
We propose a Modal-Alternating Propagation Network (MAP-Net) to supplement the absent semantic information of unlabeled samples.
We design a Relation Guidance (RG) strategy to guide the visual relation vectors via semantics so that the propagated information is more beneficial.
Our proposed method achieves promising performance and outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2021-09-03T03:43:53Z) - Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal. The introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective. Using the same approach, we learn an infinite-dimensional latent code that improves state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z) - Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
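The last entry's closed-form factorization idea admits a compact sketch: in the spirit of that approach, candidate semantic directions are taken as the top eigenvectors of A^T A, where A is the weight matrix of a generator's first affine layer z -> A z + b. The matrix below is random and purely illustrative; the function name `factorize` is an assumption, not the paper's released API.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical first-layer generator weight: latent (16-d) -> features (64-d).
latent_dim, feature_dim, k = 16, 64, 3
A = rng.normal(size=(feature_dim, latent_dim))

def factorize(A, k):
    # Eigen-decomposition of A^T A; columns of `vecs` are candidate
    # semantic directions, ordered by how strongly A amplifies them.
    vals, vecs = np.linalg.eigh(A.T @ A)
    order = np.argsort(vals)[::-1]       # eigh returns ascending order
    return vecs[:, order[:k]]

directions = factorize(A, k)
print(directions.shape)  # (16, 3)
```

Because A^T A is symmetric, the recovered directions are orthonormal, so editing along one direction does not move the latent along the others.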
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.