How compositional generalization and creativity improve as diffusion models are trained
- URL: http://arxiv.org/abs/2502.12089v1
- Date: Mon, 17 Feb 2025 18:06:33 GMT
- Title: How compositional generalization and creativity improve as diffusion models are trained
- Authors: Alessandro Favero, Antonio Sclocchi, Francesco Cagnetta, Pascal Frossard, Matthieu Wyart
- Abstract summary: How many samples do generative models need to learn the composition rules, so as to produce a combinatorial number of novel data?
We consider diffusion models trained on simple probabilistic context-free grammars - tree-like graphical models used to represent the structure of data such as language and images.
We demonstrate that diffusion models learn compositional rules with the sample complexity required for clustering features with statistically similar context, a process similar to the word2vec algorithm.
- Score: 82.08869888944324
- Abstract: Natural data is often organized as a hierarchical composition of features. How many samples do generative models need to learn the composition rules, so as to produce a combinatorial number of novel data? What signal in the data is exploited to learn? We investigate these questions both theoretically and empirically. Theoretically, we consider diffusion models trained on simple probabilistic context-free grammars - tree-like graphical models used to represent the structure of data such as language and images. We demonstrate that diffusion models learn compositional rules with the sample complexity required for clustering features with statistically similar context, a process similar to the word2vec algorithm. However, this clustering emerges hierarchically: higher-level, more abstract features associated with longer contexts require more data to be identified. This mechanism leads to a sample complexity that scales polynomially with the said context size. As a result, diffusion models trained on intermediate dataset size generate data coherent up to a certain scale, but that lacks global coherence. We test these predictions in different domains, and find remarkable agreement: both generated texts and images achieve progressively larger coherence lengths as the training time or dataset size grows. We discuss connections between the hierarchical clustering mechanism we introduce here and the renormalization group in physics.
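The clustering idea in the abstract can be illustrated with a toy example. The sketch below is not the paper's model or experiment: it uses a hypothetical two-level grammar (the symbols `A`, `B`, the token pairs, and all probabilities are made up for illustration) and computes context distributions exactly by enumeration instead of estimating them from samples or training a diffusion model. It only shows why token pairs that encode the same hidden symbol ("synonyms") can be grouped by the statistics of their context.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical two-level grammar (illustrative, not taken from the paper):
# the root emits a pair of hidden symbols, and each hidden symbol is
# rewritten as one of two "synonym" token pairs.
ROOT_RULES = {
    ("A", "B"): Fraction(5, 10),
    ("B", "A"): Fraction(3, 10),
    ("A", "A"): Fraction(2, 10),
}
LEAF_RULES = {
    "A": {("a", "b"): Fraction(1, 2), ("c", "d"): Fraction(1, 2)},
    "B": {("e", "f"): Fraction(1, 2), ("g", "h"): Fraction(1, 2)},
}


def context_distributions():
    """Return the exact P(right pair | left pair) for every left token pair."""
    joint = defaultdict(lambda: defaultdict(Fraction))
    for (left_sym, right_sym), p_root in ROOT_RULES.items():
        for left, p_left in LEAF_RULES[left_sym].items():
            for right, p_right in LEAF_RULES[right_sym].items():
                joint[left][right] += p_root * p_left * p_right
    cond = {}
    for left, row in joint.items():
        total = sum(row.values())
        # Sorted tuple so that equal distributions compare and hash equal.
        cond[left] = tuple(sorted((right, p / total) for right, p in row.items()))
    return cond


def cluster_by_context(cond):
    """Group left token pairs whose context distributions are identical."""
    groups = defaultdict(list)
    for left, dist in cond.items():
        groups[dist].append(left)
    return sorted(sorted(group) for group in groups.values())


clusters = cluster_by_context(context_distributions())
print(clusters)
```

In this toy grammar the two pairs produced by `A` end up in one cluster and the two pairs produced by `B` in another, because synonyms share the same context distribution. With a finite training set these distributions would only be estimated, so the grouping becomes reliable only once enough samples are available, which is the sample-complexity question the abstract addresses.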
Related papers
- Dissecting embedding method: learning higher-order structures from data [0.0]
Geometric deep learning methods for data learning often include a set of assumptions on the geometry of the feature space.
These assumptions, together with the data being discrete and finite, can cause some generalisations that are likely to create wrong interpretations of the data and model outputs.
arXiv Detail & Related papers (2024-10-14T08:19:39Z)
- IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation [70.8833857249951]
IterComp is a novel framework that aggregates composition-aware model preferences from multiple models.
We propose an iterative feedback learning method to enhance compositionality in a closed-loop manner.
IterComp opens new research avenues in reward feedback learning for diffusion models and compositional generation.
arXiv Detail & Related papers (2024-10-09T17:59:13Z)
- ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models [65.82630283336051]
We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by existing training schemes of diffusion generative models.
We present a simple fix to this problem by constructing processes that fully exploit the structures, hence the name ComboStoc.
arXiv Detail & Related papers (2024-05-22T15:23:10Z)
- A Phase Transition in Diffusion Models Reveals the Hierarchical Nature of Data [51.03144354630136]
Recent advancements show that diffusion models can generate high-quality images.
We study this phenomenon in a hierarchical generative model of data.
We find that the backward diffusion process acting after a time $t$ is governed by a phase transition.
arXiv Detail & Related papers (2024-02-26T19:52:33Z)
- Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z)
- How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model [47.617093812158366]
We introduce the Random Hierarchy Model: a family of synthetic tasks inspired by the hierarchical structure of language and images.
We find that deep networks learn the task by developing internal representations invariant to exchanging equivalent groups.
Our results indicate how deep networks overcome the curse of dimensionality by building invariant representations.
arXiv Detail & Related papers (2023-07-05T09:11:09Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
arXiv Detail & Related papers (2022-01-08T00:47:50Z)