Locality in Image Diffusion Models Emerges from Data Statistics
- URL: http://arxiv.org/abs/2509.09672v2
- Date: Thu, 30 Oct 2025 17:40:53 GMT
- Title: Locality in Image Diffusion Models Emerges from Data Statistics
- Authors: Artem Lukoianov, Chenyang Yuan, Justin Solomon, Vincent Sitzmann
- Abstract summary: Recent work has shown that the generalization ability of image diffusion models arises from the locality properties of the trained neural network. We present evidence that the locality in deep diffusion models emerges as a statistical property of the image dataset. We show, both theoretically and experimentally, that this locality arises directly from pixel correlations present in the image datasets.
- Score: 19.257597016636844
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent work has shown that the generalization ability of image diffusion models arises from the locality properties of the trained neural network. In particular, when denoising a particular pixel, the model relies on a limited neighborhood of the input image around that pixel, which, according to the previous work, is tightly related to the ability of these models to produce novel images. Since locality is central to generalization, it is crucial to understand why diffusion models learn local behavior in the first place, as well as the factors that govern the properties of locality patterns. In this work, we present evidence that the locality in deep diffusion models emerges as a statistical property of the image dataset and is not due to the inductive bias of convolutional neural networks, as suggested in previous work. Specifically, we demonstrate that an optimal parametric linear denoiser exhibits similar locality properties to deep neural denoisers. We show, both theoretically and experimentally, that this locality arises directly from pixel correlations present in the image datasets. Moreover, locality patterns are drastically different on specialized datasets, approximating principal components of the data's covariance. We use these insights to craft an analytical denoiser that better matches scores predicted by a deep diffusion model than prior expert-crafted alternatives. Our key takeaway is that while neural network architectures influence generation quality, their primary role is to capture locality patterns inherent in the data.
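The optimal linear denoiser the abstract mentions can be illustrated with a small toy sketch (an assumption, not the paper's actual implementation): for zero-mean data with pixel covariance Sigma, the MMSE linear denoiser of y = x + sigma * eps is x_hat = Sigma (Sigma + sigma^2 I)^{-1} y, and when pixel correlations are short-range, each row of that matrix acts as a local receptive field. The exponential covariance below is a hypothetical stand-in for natural-image statistics.

```python
import numpy as np

# Toy illustration: for zero-mean data with pixel covariance Sigma, the
# optimal *linear* denoiser of y = x + sigma * eps is
#   x_hat = W @ y,   with   W = Sigma @ inv(Sigma + sigma^2 * I).
# With short-range pixel correlations, each row of W concentrates around
# its own pixel, i.e. the optimal linear denoiser is local.

d = 64        # pixels in a 1-D "image"
ell = 2.0     # assumed correlation length of the data (hypothetical)
sigma = 0.5   # diffusion noise level

# Stand-in covariance with exponentially decaying pixel correlations.
idx = np.arange(d)
Sigma = np.exp(-np.abs(idx[:, None] - idx[None, :]) / ell)

W = Sigma @ np.linalg.inv(Sigma + sigma**2 * np.eye(d))

# Row d//2 of W is the effective receptive field for the middle pixel:
# it peaks at that pixel and decays quickly with distance.
row = np.abs(W[d // 2])
print(row.argmax())  # prints 32, the middle pixel itself
```

Changing `ell` changes how wide the receptive field is, which mirrors the paper's claim that locality patterns are governed by the dataset's pixel correlations rather than by the network architecture.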
Related papers
- Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias [76.85949078144098]
This paper focuses on textual hallucinations, where diffusion models correctly generate individual symbols but assemble them in a nonsensical manner. We observe that this phenomenon is attributable to the network's local generation bias. We also theoretically analyze the training dynamics for a specific case involving a two-layer network learning parity points on a hypercube.
arXiv Detail & Related papers (2025-03-05T15:28:50Z) - Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering [15.326641037243006]
Diffusion models can effectively learn the image distribution and generate new samples. We provide theoretical insights into this phenomenon by leveraging key empirical observations. We show that the minimal number of samples required to learn the underlying distribution scales linearly with the intrinsic dimension.
arXiv Detail & Related papers (2024-09-04T04:14:02Z) - Mitigating Bias Using Model-Agnostic Data Attribution [1.477005743355395]
Mitigating bias in machine learning models is a critical endeavor for ensuring fairness and equity. We propose a novel approach to address bias by leveraging pixel image attributions to identify and regularize regions of images containing bias attributes.
arXiv Detail & Related papers (2024-05-08T13:00:56Z) - A Phase Transition in Diffusion Models Reveals the Hierarchical Nature of Data [51.03144354630136]
Recent advancements show that diffusion models can generate high-quality images. We study this phenomenon in a hierarchical generative model of data. We find that the backward diffusion process acting after a time $t$ is governed by a phase transition.
arXiv Detail & Related papers (2024-02-26T19:52:33Z) - Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization [45.72323731094864]
We present a theoretical framework to analyze two-layer neural network-based diffusion models.
We prove that training shallow neural networks for score prediction can be done by solving a single convex program.
Our results provide a precise characterization of what neural network-based diffusion models learn in non-asymptotic settings.
arXiv Detail & Related papers (2024-02-03T00:20:25Z) - Soft Mixture Denoising: Beyond the Expressive Bottleneck of Diffusion Models [76.46246743508651]
We show that current diffusion models actually have an expressive bottleneck in backward denoising.
We introduce soft mixture denoising (SMD), an expressive and efficient model for backward denoising.
arXiv Detail & Related papers (2023-09-25T12:03:32Z) - Diffusion Models are Minimax Optimal Distribution Estimators [49.47503258639454]
We provide the first rigorous analysis on approximation and generalization abilities of diffusion modeling.
We show that when the true density function belongs to the Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves the nearly minimax optimal estimation rates.
arXiv Detail & Related papers (2023-03-03T11:31:55Z) - CRADL: Contrastive Representations for Unsupervised Anomaly Detection and Localization [2.8659934481869715]
Unsupervised anomaly detection in medical imaging aims to detect and localize arbitrary anomalies without requiring anomalous data during training.
Most current state-of-the-art methods use latent variable generative models operating directly on the images.
We propose CRADL whose core idea is to model the distribution of normal samples directly in the low-dimensional representation space of an encoder trained with a contrastive pretext-task.
arXiv Detail & Related papers (2023-01-05T16:07:49Z) - Image Embedding for Denoising Generative Models [0.0]
We focus on Denoising Diffusion Implicit Models due to the deterministic nature of their reverse diffusion process.
As a side result of our investigation, we gain a deeper insight into the structure of the latent space of diffusion models.
arXiv Detail & Related papers (2022-12-30T17:56:07Z) - DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models.
We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features.
We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z) - Sampling Based On Natural Image Statistics Improves Local Surrogate Explainers [111.31448606885672]
Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a prediction.
We propose two approaches to do so, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
arXiv Detail & Related papers (2022-08-08T08:10:13Z) - How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Multi-Branch Deep Radial Basis Function Networks for Facial Emotion Recognition [80.35852245488043]
We propose a CNN based architecture enhanced with multiple branches formed by radial basis function (RBF) units.
RBF units capture local patterns shared by similar instances using an intermediate representation.
We show that it is the incorporation of local information that makes the proposed model competitive.
arXiv Detail & Related papers (2021-09-07T21:05:56Z) - Anomaly localization by modeling perceptual features [3.04585143845864]
Feature-Augmented VAE is trained by reconstructing the input image in pixel space, and also in several different feature spaces.
It achieves clear improvement over state-of-the-art methods on the MVTec anomaly detection and localization datasets.
arXiv Detail & Related papers (2020-08-12T15:09:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.