Related papers: Elucidating the representation of images within an unconditional diffusion model denoiser

Elucidating the representation of images within an unconditional diffusion model denoiser

URL: http://arxiv.org/abs/2506.01912v1
Date: Mon, 02 Jun 2025 17:33:34 GMT
Title: Elucidating the representation of images within an unconditional diffusion model denoiser
Authors: Zahra Kadkhodaie, Stéphane Mallat, Eero Simoncelli,
Abstract summary: Generative diffusion models learn probability densities over diverse image datasets by estimating the score with a neural network trained to remove noise.<n>Here, we examine a UNet trained for denoising on the ImageNet dataset, to better understand its internal representation and computation of the score.<n>We show that the middle block of the UNet decomposes individual images into sparse subsets of active channels, and that the vector of spatial averages of these channels can provide a nonlinear representation of the underlying clean images.
Score: 10.853652149844999
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generative diffusion models learn probability densities over diverse image datasets by estimating the score with a neural network trained to remove noise. Despite their remarkable success in generating high-quality images, the internal mechanisms of the underlying score networks are not well understood. Here, we examine a UNet trained for denoising on the ImageNet dataset, to better understand its internal representation and computation of the score. We show that the middle block of the UNet decomposes individual images into sparse subsets of active channels, and that the vector of spatial averages of these channels can provide a nonlinear representation of the underlying clean images. We develop a novel algorithm for stochastic reconstruction of images from this representation and demonstrate that it recovers a sample from a set of images defined by a target image representation. We then study the properties of the representation and demonstrate that Euclidean distances in the latent space correspond to distances between conditional densities induced by representations as well as semantic similarities in the image space. Applying a clustering algorithm in the representation space yields groups of images that share both fine details (e.g., specialized features, textured regions, small objects), as well as global structure, but are only partially aligned with object identities. Thus, we show for the first time that a network trained solely on denoising contains a rich and accessible sparse representation of images.

Related papers

UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation [64.01742988773745]
An increasing privacy concern exists regarding training large-scale image segmentation models on unauthorized private data. We exploit the concept of unlearnable examples to make images unusable to model training by generating and adding unlearnable noise into the original images. We empirically verify the effectiveness of UnSeg across 6 mainstream image segmentation tasks, 10 widely used datasets, and 7 different network architectures.
arXiv Detail & Related papers (2024-10-13T16:34:46Z)
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models [52.3015009878545]
We develop an image segmentor capable of generating fine-grained segmentation maps without any additional training. Our framework identifies semantic correspondences between image pixels and spatial locations of low-dimensional feature maps. In extensive experiments, the produced segmentation maps are demonstrated to be well delineated and capture detailed parts of the images.
arXiv Detail & Related papers (2024-01-22T07:34:06Z)
Image segmentation with traveling waves in an exactly solvable recurrent neural network [71.74150501418039]
We show that a recurrent neural network can effectively divide an image into groups according to a scene's structural characteristics. We present a precise description of the mechanism underlying object segmentation in this network. We then demonstrate a simple algorithm for object segmentation that generalizes across inputs ranging from simple geometric objects in grayscale images to natural images.
arXiv Detail & Related papers (2023-11-28T16:46:44Z)
Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation [24.436957604430678]
We develop a neural network architecture which, trained in an unsupervised manner as a denoising diffusion model, simultaneously learns to both generate and segment images. Experiments demonstrate that our model achieves accurate unsupervised image segmentation and high-quality synthetic image generation across multiple datasets.
arXiv Detail & Related papers (2023-09-27T15:32:46Z)
Learning to Annotate Part Segmentation with Gradient Matching [58.100715754135685]
This paper focuses on tackling semi-supervised part segmentation tasks by generating high-quality images with a pre-trained GAN. In particular, we formulate the annotator learning as a learning-to-learn problem. We show that our method can learn annotators from a broad range of labelled images including real images, generated images, and even analytically rendered images.
arXiv Detail & Related papers (2022-11-06T01:29:22Z)
Deep Semantic Statistics Matching (D2SM) Denoising Network [70.01091467628068]
We introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network. It exploits semantic features of pretrained classification networks, then it implicitly matches the probabilistic distribution of clear images at the semantic feature space. By learning to preserve the semantic distribution of denoised images, we empirically find our method significantly improves the denoising capabilities of networks.
arXiv Detail & Related papers (2022-07-19T14:35:42Z)
Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets [18.19216557948184]
Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset. Their main drawback remains the proportion of incorrect (noisy) samples retrieved. We propose a two stage algorithm starting with a detection step where we use unsupervised contrastive feature learning. We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere.
arXiv Detail & Related papers (2022-07-04T16:51:56Z)
Self-supervised Contrastive Learning for Cross-domain Hyperspectral Image Representation [26.610588734000316]
This paper introduces a self-supervised learning framework suitable for hyperspectral images that are inherently challenging to annotate. The proposed framework architecture leverages cross-domain CNN, allowing for learning representations from different hyperspectral images. The experimental results demonstrate the advantage of the proposed self-supervised representation over models trained from scratch or other transfer learning methods.
arXiv Detail & Related papers (2022-02-08T16:16:45Z)
Spatially Consistent Representation Learning [12.120041613482558]
We propose a spatially consistent representation learning algorithm (SCRL) for multi-object and location-specific tasks. We devise a novel self-supervised objective that tries to produce coherent spatial representations of a randomly cropped local region. On various downstream localization tasks with benchmark datasets, the proposed SCRL shows significant performance improvements.
arXiv Detail & Related papers (2021-03-10T15:23:45Z)
Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images [98.82804259905478]
We present Neighbor2Neighbor to train an effective image denoising model with only noisy images. In detail, input and target used to train a network are images sub-sampled from the same noisy image. A denoising network is trained on sub-sampled training pairs generated in the first stage, with a proposed regularizer as additional loss for better performance.
arXiv Detail & Related papers (2021-01-08T02:03:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.