Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias
- URL: http://arxiv.org/abs/2503.03595v1
- Date: Wed, 05 Mar 2025 15:28:50 GMT
- Title: Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias
- Authors: Rui Lu, Runzhe Wang, Kaifeng Lyu, Xitai Jiang, Gao Huang, Mengdi Wang
- Abstract summary: This paper focuses on textual hallucinations, where diffusion models correctly generate individual symbols but assemble them in a nonsensical manner. We observe that this phenomenon is attributable to the network's local generation bias. We also theoretically analyze the training dynamics for a specific case involving a two-layer MLP learning parity points on a hypercube.
- Score: 76.85949078144098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Score-based diffusion models have achieved incredible performance in generating realistic images, audio, and video data. While these models produce high-quality samples with impressive details, they often introduce unrealistic artifacts, such as distorted fingers or hallucinated texts with no meaning. This paper focuses on textual hallucinations, where diffusion models correctly generate individual symbols but assemble them in a nonsensical manner. Through experimental probing, we consistently find that this phenomenon is attributable to the network's local generation bias. Denoising networks tend to produce outputs that rely heavily on highly correlated local regions, particularly when different dimensions of the data distribution are nearly pairwise independent. This behavior leads to a generation process that decomposes the global distribution into separate, independent distributions for each symbol, ultimately failing to capture the global structure, including the underlying grammar. Intriguingly, this bias persists across various denoising network architectures, including MLPs and transformers, which have the capacity to model global dependencies. These findings also provide insight into other types of hallucinations, beyond text, that arise from implicit biases in the denoising models. Additionally, we theoretically analyze the training dynamics for a specific case involving a two-layer MLP learning parity points on a hypercube, offering an explanation of its underlying mechanism.
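The abstract's central claim, that the learned generation process factorizes into independent per-symbol distributions and therefore misses global constraints, can be illustrated with a small numerical sketch. The snippet below is not the paper's code; it simply samples parity-constrained points on the hypercube {-1, +1}^d (the setting named in the theoretical analysis) and compares them against samples drawn coordinate-wise from the empirical marginals, standing in for a generator with local generation bias. The dimension d, sample count n, and the marginal sampler are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (assumed setup, not the authors' code): parity-constrained points
# on the hypercube {-1, +1}^d have uniform per-coordinate marginals, so a sampler
# that models each coordinate independently reproduces individual "symbols" well
# yet satisfies the global parity constraint only about half the time.

import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 20_000

# True data: uniform points on {-1, +1}^d conditioned on even parity (product of coords = +1).
x = rng.choice([-1, 1], size=(n, d))
x[:, -1] = np.prod(x[:, :-1], axis=1)           # fix last coordinate so parity holds

# "Locally biased" stand-in generator: sample each coordinate independently from its marginal.
marginal_p = (x == 1).mean(axis=0)               # each marginal is ~0.5, carrying no parity information
y = np.where(rng.random((n, d)) < marginal_p, 1, -1)

print("parity satisfied, true data:       ", np.mean(np.prod(x, axis=1) == 1))  # 1.0
print("parity satisfied, factorized model:", np.mean(np.prod(y, axis=1) == 1))  # ~0.5
```

Because every per-coordinate marginal is uniform, the factorized sampler matches the individual "symbols" perfectly while violating the global parity constraint about half the time, mirroring the hallucination pattern of valid symbols assembled into invalid combinations.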
Related papers
- Generalization through variance: how noise shapes inductive biases in diffusion models [0.0]
We develop a mathematical theory that partly explains the 'generalization through variance' phenomenon.
We find that the distributions diffusion models effectively learn to sample from resemble their training distributions.
We also characterize how this inductive bias interacts with feature-related inductive biases.
arXiv Detail & Related papers (2025-04-16T23:41:10Z) - Generalized Interpolating Discrete Diffusion [65.74168524007484]
Masked diffusion is a popular choice due to its simplicity and effectiveness.
We derive the theoretical backbone of a family of general interpolating discrete diffusion processes.
Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise.
arXiv Detail & Related papers (2025-03-06T14:30:55Z) - Compositional Generalization Requires More Than Disentangled Representations [5.762286612061953]
Compositional generalization remains a key challenge for deep learning.
Many generative models fail to recognize and compose factors to generate out-of-distribution (OOD) samples.
We show that models that are guided, through architectural modifications with regularization or curated training data, can be highly data-efficient and effective at learning to compose in OOD regions.
arXiv Detail & Related papers (2025-01-30T23:20:41Z) - Generalized Diffusion Model with Adjusted Offset Noise [1.7767466724342067]
We propose a generalized diffusion model that naturally incorporates additional noise within a rigorous probabilistic framework. We derive a loss function based on the evidence lower bound, establishing its theoretical equivalence to offset noise with certain adjustments. Experiments on synthetic datasets demonstrate that our model effectively addresses brightness-related challenges and outperforms conventional methods in high-dimensional scenarios.
arXiv Detail & Related papers (2024-12-04T08:57:03Z) - Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure [8.320632531909682]
We study the generalizability of diffusion models by looking into the hidden properties of the learned score functions. As diffusion models transition from memorization to generalization, their corresponding nonlinear diffusion denoisers exhibit increasing linearity.
arXiv Detail & Related papers (2024-10-31T15:57:04Z) - Understanding Hallucinations in Diffusion Models through Mode Interpolation [89.10226585746848]
We study a particular failure mode in diffusion models, which we term mode interpolation.
We find that diffusion models smoothly "interpolate" between nearby data modes in the training set, to generate samples that are completely outside the support of the original training distribution.
We show how hallucination leads to the generation of combinations of shapes that never existed.
arXiv Detail & Related papers (2024-06-13T17:43:41Z) - Analyzing Bias in Diffusion-based Face Generation Models [75.80072686374564]
Diffusion models are increasingly popular in synthetic data generation and image editing applications.
We investigate the presence of bias in diffusion-based face generation models with respect to attributes such as gender, race, and age.
We examine how dataset size affects the attribute composition and perceptual quality of both diffusion and Generative Adversarial Network (GAN) based face generation models.
arXiv Detail & Related papers (2023-05-10T18:22:31Z) - A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features in languages.
Specifically, we design a linguistically informed forward process which adds corruptions to the text through strategic soft-masking to better noise the textual data.
We demonstrate that our Masked-Diffuse LM can achieve better generation quality than the state-of-the-art diffusion models with better efficiency.
arXiv Detail & Related papers (2023-04-10T17:58:42Z) - Image Embedding for Denoising Generative Models [0.0]
We focus on Denoising Diffusion Implicit Models due to the deterministic nature of their reverse diffusion process.
As a side result of our investigation, we gain a deeper insight into the structure of the latent space of diffusion models.
arXiv Detail & Related papers (2022-12-30T17:56:07Z) - Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z) - Bridging the Gap Between Clean Data Training and Real-World Inference
for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, that is, it produces high-quality results in a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)