Generalization of Diffusion Models Arises with a Balanced Representation Space
- URL: http://arxiv.org/abs/2512.20963v1
- Date: Wed, 24 Dec 2025 05:40:40 GMT
- Title: Generalization of Diffusion Models Arises with a Balanced Representation Space
- Authors: Zekai Zhang, Xiao Li, Xiang Li, Lianghe Shi, Meng Wu, Molei Tao, Qing Qu,
- Abstract summary: We analyze the distinctions between memorization and generalization in diffusion models through the lens of representation learning. We show that memorization corresponds to the model storing raw training samples in the learned weights for encoding and decoding, yielding localized "spiky" representations. We propose a representation-based method for detecting memorization and a training-free editing technique that allows precise control via representation steering.
- Score: 32.68561555837436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models excel at generating high-quality, diverse samples, yet they risk memorizing training data when overfit to the training objective. We analyze the distinctions between memorization and generalization in diffusion models through the lens of representation learning. By investigating a two-layer ReLU denoising autoencoder (DAE), we prove that (i) memorization corresponds to the model storing raw training samples in the learned weights for encoding and decoding, yielding localized "spiky" representations, whereas (ii) generalization arises when the model captures local data statistics, producing "balanced" representations. Furthermore, we validate these theoretical findings on real-world unconditional and text-to-image diffusion models, demonstrating that the same representation structures emerge in deep generative models with significant practical implications. Building on these insights, we propose a representation-based method for detecting memorization and a training-free editing technique that allows precise control via representation steering. Together, our results highlight that learning good representations is central to novel and meaningful generative modeling.
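The setting the abstract analyzes can be made concrete with a small sketch. The following is our own illustration, not the paper's code: a two-layer ReLU denoising autoencoder (DAE) f(z) = W2 · relu(W1 · z), trained with plain gradient descent on the denoising objective E‖f(x + noise) − x‖². All dimensions, the learning rate, and the noise level are illustrative assumptions.

```python
import numpy as np

# Hypothetical minimal sketch (not the paper's code): a two-layer ReLU
# denoising autoencoder f(z) = W2 @ relu(W1 @ z), trained on the
# denoising objective E || f(x + noise) - x ||^2 by gradient descent.
# Dimensions, learning rate, and noise level are illustrative choices.

rng = np.random.default_rng(0)
d, h, n = 8, 16, 64                      # data dim, hidden width, #samples
X = rng.standard_normal((d, n))          # clean training data (columns)
W1 = 0.1 * rng.standard_normal((h, d))   # encoder weights
W2 = 0.1 * rng.standard_normal((d, h))   # decoder weights
sigma, lr, steps = 0.1, 0.5, 1000

losses = []
for _ in range(steps):
    Z = X + sigma * rng.standard_normal(X.shape)  # noisy input
    H = np.maximum(W1 @ Z, 0.0)                   # ReLU "representation"
    R = W2 @ H - X                                # residual vs clean target
    losses.append(float(np.mean(R ** 2)))
    scale = 2.0 / R.size                          # d(mean sq. error)/d(output)
    G2 = scale * R @ H.T                          # gradient w.r.t. W2
    G1 = ((W2.T @ R) * (H > 0)) @ Z.T * scale     # gradient w.r.t. W1
    W2 -= lr * G2
    W1 -= lr * G1

print(f"denoising loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Inspecting the hidden activations H of such a model is the kind of probe the paper uses to distinguish "spiky" (memorizing) from "balanced" (generalizing) representations.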
Related papers
- Disentangled representations via score-based variational autoencoders [21.955536401578616]
We present the Score-based Autoencoder for Multiscale Inference (SAMI). SAMI formulates a principled objective that learns representations through score-based guidance of the underlying diffusion process. It can extract useful representations from pre-trained diffusion models with minimal additional training.
arXiv Detail & Related papers (2025-12-18T23:42:10Z)
- GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation [51.95701097588426]
We introduce a Global Perspective Tokenizer (GloTok) to model a more uniform semantic distribution of tokenized features. A residual learning module is proposed to recover the fine-grained details and minimize the reconstruction error caused by quantization. Experiments on the standard ImageNet-1k benchmark show that our proposed method achieves state-of-the-art reconstruction performance and generation quality.
arXiv Detail & Related papers (2025-11-18T06:40:26Z)
- Learning Diffusion Models with Flexible Representation Guidance [49.26046407886349]
We present a systematic framework for incorporating representation guidance into diffusion models. We introduce two new strategies for enhancing representation alignment in diffusion models. Experiments across image, protein sequence, and molecule generation tasks demonstrate superior performance as well as accelerated training.
arXiv Detail & Related papers (2025-07-11T19:29:02Z)
- Diffusion models under low-noise regime [3.729242965449096]
We show that diffusion models are effective denoisers when the corruption level is small. We quantify how training set size, data geometry, and model objective choice shape denoising trajectories. This work starts to address gaps in our understanding of generative model reliability in practical applications.
arXiv Detail & Related papers (2025-06-09T15:07:16Z)
- Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in suboptimal predictive ability. We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z)
- Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling [29.612011138019255]
We study the emergence of unimodal representation dynamics in diffusion models. The unimodality arises from an interplay between denoising strength and class confidence across noise scales. In classification tasks, the presence of unimodal dynamics reliably reflects the generalization of the diffusion model.
arXiv Detail & Related papers (2025-02-09T01:58:28Z)
- On the Feature Learning in Diffusion Models [26.53807235141923]
We propose a feature learning framework aimed at analyzing and comparing the training dynamics of diffusion models with those of traditional classification models. Our theoretical analysis demonstrates that diffusion models, due to the denoising objective, are encouraged to learn more balanced and comprehensive representations of the data. In contrast, neural networks with a similar architecture trained for classification tend to prioritize learning specific patterns in the data, often focusing on easy-to-learn components.
arXiv Detail & Related papers (2024-12-02T00:41:25Z)
- Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering [15.326641037243006]
Diffusion models can effectively learn the image distribution and generate new samples. We provide theoretical insights into this phenomenon by leveraging key empirical observations. We show that the minimal number of samples required to learn the underlying distribution scales linearly with the intrinsic dimension.
arXiv Detail & Related papers (2024-09-04T04:14:02Z)
- The Journey, Not the Destination: How Data Guides Diffusion Models [75.19694584942623]
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity.
We propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions.
arXiv Detail & Related papers (2023-12-11T08:39:43Z)
- Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z)
- Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.