Scaling Properties of Diffusion Models for Perceptual Tasks
- URL: http://arxiv.org/abs/2411.08034v3
- Date: Sun, 17 Nov 2024 03:45:07 GMT
- Title: Scaling Properties of Diffusion Models for Perceptual Tasks
- Authors: Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik
- Abstract summary: We show how diffusion models benefit from scaling training and test-time compute for perceptual tasks.
Our models achieve performance competitive with state-of-the-art methods using significantly less data and compute.
- Score: 39.89262997776814
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve performance competitive with state-of-the-art methods using significantly less data and compute. To access our code and models, see https://scaling-diffusion-perception.github.io.
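As a rough illustration of the test-time scaling idea in the abstract, the sketch below treats depth estimation as image-conditioned diffusion and spends extra inference compute by using more denoising steps and averaging several independent samples; the denoiser interface, noise schedule, and function names are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch (not the authors' code): scaling test-time compute for an
# image-to-image perception task by using more denoising steps and averaging
# several independent diffusion samples. `denoiser` is a hypothetical
# conditional network eps_theta(x_t, t, image).
import torch

@torch.no_grad()
def sample_depth(denoiser, image, steps=50):
    """Plain DDPM-style ancestral sampling in depth space (illustrative schedule)."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(image.shape[0], 1, *image.shape[-2:])
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.full((image.shape[0],), t), image)
        mean = (x - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise
    return x

@torch.no_grad()
def ensembled_depth(denoiser, image, steps=50, num_samples=8):
    """More test-time compute = more steps and/or more samples; average the results."""
    preds = [sample_depth(denoiser, image, steps) for _ in range(num_samples)]
    return torch.stack(preds).mean(dim=0)
```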
Related papers
- DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks [79.50756148780928]
This paper studies the problem of leveraging pretrained diffusion models for performing discriminative tasks.
We extend the discriminative capability of pretrained frozen generative diffusion models from the classification task to the more complex object detection task, by "inverting" a pretrained layout-to-image diffusion model.
arXiv Detail & Related papers (2025-04-24T05:13:27Z) - Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering [15.326641037243006]
Diffusion models can effectively learn the image distribution and generate new samples.
We provide theoretical insights into this phenomenon by leveraging key empirical observations.
We show that the minimal number of samples required to learn the underlying distribution scales linearly with the intrinsic dimension.
arXiv Detail & Related papers (2024-09-04T04:14:02Z) - An Expectation-Maximization Algorithm for Training Clean Diffusion Models from Corrupted Observations [21.411327264448058]
We propose an expectation-maximization (EM) approach to train diffusion models from corrupted observations.
Our method alternates between reconstructing clean images from corrupted data using a known diffusion model (E-step) and refining diffusion model weights based on these reconstructions (M-step).
This iterative process leads the learned diffusion model to gradually converge to the true clean data distribution.
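A schematic of the alternation described in this summary, with the E- and M-step procedures left as caller-supplied callables since the listing does not specify them (a sketch, not the paper's implementation):

```python
# Illustrative EM-style training loop for learning a diffusion model from
# corrupted observations. `e_step` and `m_step` are placeholders: an E-step
# could, for example, run diffusion posterior sampling under the known
# degradation, and an M-step could fine-tune on the reconstructions with the
# usual noise-prediction loss.
def em_train_diffusion(model, corrupted_images, e_step, m_step, num_rounds=10):
    for _ in range(num_rounds):
        # E-step: reconstruct clean images from corrupted data using the
        # current diffusion model as the prior.
        pseudo_clean = [e_step(model, y) for y in corrupted_images]
        # M-step: refine the diffusion model weights on these reconstructions.
        model = m_step(model, pseudo_clean)
    return model
```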
arXiv Detail & Related papers (2024-07-01T07:00:17Z) - Plug-and-Play Diffusion Distillation [14.359953671470242]
We propose a new distillation approach for guided diffusion models.
An external lightweight guide model is trained while the original text-to-image model remains frozen.
We show that our method reduces the inference cost of classifier-free guided latent-space diffusion models by almost half.
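One plausible reading of how an external guide can roughly halve guided inference (an assumption about the mechanism, not the paper's stated design): standard classifier-free guidance runs the large model twice per step, while a lightweight guide that predicts the guidance correction replaces the second pass.

```python
def cfg_epsilon(big_model, x_t, t, cond, uncond, w):
    """Standard classifier-free guidance: two passes of the large model per step."""
    eps_cond = big_model(x_t, t, cond)
    eps_uncond = big_model(x_t, t, uncond)
    return eps_uncond + w * (eps_cond - eps_uncond)

def guided_epsilon_with_guide(big_model, guide, x_t, t, cond, w):
    """Hypothetical distilled variant: one pass of the frozen large model plus a
    cheap guide network that approximates the guidance correction, roughly
    halving per-step compute."""
    eps_cond = big_model(x_t, t, cond)
    correction = guide(x_t, t, cond, eps_cond)  # lightweight, trained while big_model stays frozen
    return eps_cond + w * correction
```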
arXiv Detail & Related papers (2024-06-04T04:22:47Z) - Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) into the pre-trained diffusion model to extract features suited to perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on CityScapes.
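A minimal sketch of the general idea of learnable prompt embeddings that query features from a frozen pre-trained backbone; the cross-attention readout and all names here are assumptions for illustration and may differ from the paper's actual meta-prompt design.

```python
import torch
import torch.nn as nn

class MetaPromptHead(nn.Module):
    """Learnable prompt embeddings that attend over features from a frozen
    diffusion backbone; only the prompts and this head are trained."""
    def __init__(self, num_prompts=64, dim=768, out_channels=1):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.decode = nn.Linear(dim, out_channels)

    def forward(self, backbone_tokens):
        # backbone_tokens: (B, N, dim) features from the frozen diffusion backbone.
        b = backbone_tokens.shape[0]
        q = self.prompts.unsqueeze(0).expand(b, -1, -1)
        attended, _ = self.attn(q, backbone_tokens, backbone_tokens)
        return self.decode(attended)  # per-prompt predictions, e.g. depth tokens
```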
arXiv Detail & Related papers (2023-12-22T14:40:55Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - The Journey, Not the Destination: How Data Guides Diffusion Models [75.19694584942623]
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity.
We propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions.
arXiv Detail & Related papers (2023-12-11T08:39:43Z) - The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation [42.48819460873482]
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity.
We show that they also excel in estimating optical flow and monocular depth, surprisingly, without task-specific architectures and loss functions.
arXiv Detail & Related papers (2023-06-02T21:26:20Z) - Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate them as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
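A rough sketch of the masked-denoising objective as summarized above: noise is applied to the masked patches, the model is conditioned on the clean visible patches, and the reconstruction loss is taken only over the masked positions (the patchify helper, noise schedule, and model interface are assumptions, not DiffMAE's exact formulation).

```python
import torch

def masked_denoising_loss(denoiser, images, patchify, mask_ratio=0.75):
    """Illustrative DiffMAE-style objective: denoise noised masked patches while
    conditioning on clean visible patches (helper names are placeholders)."""
    patches = patchify(images)                                   # (B, N, D)
    b, n, _ = patches.shape
    mask = torch.rand(b, n, device=patches.device) < mask_ratio  # True = masked
    t = torch.rand(b, 1, 1, device=patches.device)               # noise level in [0, 1]
    noise = torch.randn_like(patches)
    # Simple interpolation schedule (an assumption, not the paper's schedule).
    noised = (1.0 - t).sqrt() * patches + t.sqrt() * noise
    # Masked patches enter noised; visible patches stay clean as conditioning.
    model_in = torch.where(mask.unsqueeze(-1), noised, patches)
    pred = denoiser(model_in, t.squeeze(-1).squeeze(-1), mask)   # predicts clean patches
    # Masked-autoencoder-style loss: only masked positions contribute.
    return ((pred - patches) ** 2)[mask].mean()
```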
arXiv Detail & Related papers (2023-04-06T17:59:56Z) - Exploring Continual Learning of Diffusion Models [24.061072903897664]
We evaluate the continual learning (CL) properties of diffusion models.
We provide insights into the dynamics of forgetting, which exhibit diverse behavior across diffusion timesteps.
arXiv Detail & Related papers (2023-03-27T15:52:14Z) - How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.