Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2503.08434v3
- Date: Mon, 24 Mar 2025 09:33:20 GMT
- Title: Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models
- Authors: Armando Fortes, Tianyi Wei, Shangchen Zhou, Xingang Pan
- Abstract summary: Current diffusion models typically rely on prompt engineering to mimic such effects. We propose Bokeh Diffusion, a scene-consistent bokeh control framework. Our approach achieves flexible, lens-like blur control and supports applications such as real image editing via inversion.
- Score: 26.79219274697864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large-scale text-to-image models have revolutionized creative fields by generating visually captivating outputs from textual prompts; however, while traditional photography offers precise control over camera settings to shape visual aesthetics -- such as depth-of-field -- current diffusion models typically rely on prompt engineering to mimic such effects. This approach often results in crude approximations and inadvertently alters the scene content. In this work, we propose Bokeh Diffusion, a scene-consistent bokeh control framework that explicitly conditions a diffusion model on a physical defocus blur parameter. By grounding depth-of-field adjustments, our method preserves the underlying scene structure as the level of blur is varied. To overcome the scarcity of paired real-world images captured under different camera settings, we introduce a hybrid training pipeline that aligns in-the-wild images with synthetic blur augmentations. Extensive experiments demonstrate that our approach not only achieves flexible, lens-like blur control but also supports applications such as real image editing via inversion.
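The abstract states that the model is explicitly conditioned on a physical defocus blur parameter, but does not detail the mechanism. As a rough illustration only, the sketch below shows one common way to inject a scalar condition into a diffusion backbone: embed the scalar like a timestep and add it to the timestep embedding. The module names, dimensions, and the sinusoidal-embedding choice are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch, NOT the paper's implementation: condition a diffusion model on
# a scalar defocus-blur parameter by embedding it like a timestep and adding it
# to the timestep embedding. All names and sizes here are illustrative assumptions.
import math
import torch
import torch.nn as nn


def sinusoidal_embedding(x: torch.Tensor, dim: int) -> torch.Tensor:
    """Standard sinusoidal embedding of a scalar batch (timesteps or blur levels)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    angles = x.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


class BlurConditioning(nn.Module):
    """Maps a physical defocus-blur value to an embedding that is added to the
    diffusion model's timestep embedding before it reaches the UNet blocks."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, embed_dim),
            nn.SiLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, t_emb: torch.Tensor, blur: torch.Tensor) -> torch.Tensor:
        # blur: (B,) blur strength per image; t_emb: (B, embed_dim)
        return t_emb + self.mlp(sinusoidal_embedding(blur, t_emb.shape[-1]))


# Toy usage: four timestep embeddings conditioned on four blur levels.
t_emb = torch.randn(4, 256)                 # would come from the base model's time MLP
blur = torch.tensor([0.0, 2.5, 5.0, 10.0])  # hypothetical defocus-blur strengths
print(BlurConditioning(256)(t_emb, blur).shape)  # torch.Size([4, 256])
```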
Related papers
- Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes [34.19578921335553]
Reconstructing 3D scenes from a single image is a fundamentally ill-posed task due to the severely under-constrained nature of the problem.
In this work, we address these inherent limitations in existing single image-to-3D scene feedforward networks.
To alleviate the poor performance due to insufficient information beyond the input image's view, we leverage a strong generative prior in the form of a pre-trained latent video diffusion model.
arXiv Detail & Related papers (2025-03-19T23:14:27Z) - ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring [61.82010103478833]
We develop a context-based local blur detection module that incorporates additional contextual information to improve the identification of blurry regions.
Considering that modern smartphones are equipped with cameras capable of providing short-exposure images, we develop a blur-aware guided image restoration method.
We formulate the above components into a simple yet effective network, named ExpRDiff.
arXiv Detail & Related papers (2024-12-12T11:42:39Z) - Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering [56.68286440268329]
The correct insertion of virtual objects in images of real-world scenes requires a deep understanding of the scene's lighting, geometry and materials.
We propose using a personalized large diffusion model as guidance to a physically based inverse rendering process.
Our method recovers scene lighting and tone-mapping parameters, allowing the photorealistic composition of arbitrary virtual objects in single frames or videos of indoor or outdoor scenes.
arXiv Detail & Related papers (2024-08-19T05:15:45Z) - Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators [19.853978560075305]
Motion guidance is a technique that allows a user to specify dense, complex motion fields that indicate where each pixel in an image should move.
We demonstrate that our technique works on complex motions and produces high quality edits of real and generated images.
arXiv Detail & Related papers (2024-01-31T18:59:59Z) - ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation [45.582704677784825]
Implicit Diffusion-based reBLurring AUgmentation (ID-Blau) is proposed to generate diverse blurred images by simulating motion trajectories in a continuous space.
By sampling diverse blur conditions, ID-Blau can generate various blurred images unseen in the training set.
Results demonstrate that ID-Blau can produce realistic blurred images for training and thus significantly improve performance for state-of-the-art deblurring models.
arXiv Detail & Related papers (2023-12-18T07:47:43Z) - DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators [56.994967294931286]
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating flythrough scenes from textual prompts.
We advocate explicitly warping the intermediate latent code of the pre-trained text-to-image diffusion model for high-quality image generation and unbounded generalization ability.
arXiv Detail & Related papers (2023-12-14T08:42:26Z) - Curved Diffusion: A Generative Model With Optical Geometry Control [56.24220665691974]
The influence of different optical systems on the final scene appearance is frequently overlooked.
This study introduces a framework that intimately integrates a text-to-image diffusion model with the particular lens used in image rendering.
arXiv Detail & Related papers (2023-11-29T13:06:48Z) - DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z) - Defocus to focus: Photo-realistic bokeh rendering by fusing defocus and radiance priors [26.38833313692807]
Bokeh rendering mimics aesthetic shallow depth-of-field (DoF) in professional photography.
Existing methods suffer from simple flat background blur and blurred in-focus regions.
We present a Defocus to Focus (D2F) framework to learn realistic bokeh rendering.
arXiv Detail & Related papers (2023-06-07T15:15:13Z) - Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time [101.91824315554682]
In this work, we aim ambitiously for a more realistic and challenging task - joint video multi-frame interpolation and deblurring under unknown exposure time.
We first adopt a variant of supervised contrastive learning to construct an exposure-aware representation from input blurred frames.
We then build our video reconstruction network upon the exposure and motion representation by progressive exposure-adaptive convolution and motion refinement.
arXiv Detail & Related papers (2023-03-27T09:43:42Z) - Bokeh-Loss GAN: Multi-Stage Adversarial Training for Realistic Edge-Aware Bokeh [3.8811606213997587]
We tackle the problem of monocular bokeh synthesis, where we attempt to render a shallow depth of field image from a single all-in-focus image.
Unlike in DSLR cameras, this effect cannot be captured directly in mobile cameras due to the physical constraints of the mobile aperture.
We propose a network-based approach that is capable of rendering realistic monocular bokeh from single image inputs (see the depth-based blur sketch after this list).
arXiv Detail & Related papers (2022-08-25T20:57:07Z)
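Several entries above render defocus blur from a single sharp image plus a depth cue (Defocus to Focus, Bokeh-Loss GAN), and the main abstract's hybrid training pipeline likewise relies on synthetic blur augmentations. The following is a minimal, hypothetical layered-blur sketch of that general idea; the disc-kernel circle-of-confusion model, the hard layer blending, and parameters such as max_radius are illustrative assumptions, not any of these papers' actual pipelines.

```python
# Hypothetical sketch of depth-dependent synthetic defocus blur: blur radius grows
# with distance from the focal plane, approximated by blending a stack of disc-blurred
# copies of the image. Not taken from any specific paper above.
import torch
import torch.nn.functional as F


def disc_kernel(radius: int) -> torch.Tensor:
    """Normalized disc (circle-of-confusion) kernel with the given pixel radius."""
    size = 2 * radius + 1
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    mask = ((ys - radius) ** 2 + (xs - radius) ** 2 <= radius ** 2).float()
    return mask / mask.sum()


def synthetic_defocus(image: torch.Tensor, depth: torch.Tensor,
                      focus_depth: float, max_radius: int = 8) -> torch.Tensor:
    """Blur each pixel by an amount proportional to |depth - focus_depth|.

    image: (3, H, W) in [0, 1]; depth: (H, W) normalized to [0, 1].
    Simple layered approximation: pick, per pixel, the blurred copy whose
    radius matches that pixel's circle of confusion.
    """
    coc = (depth - focus_depth).abs().clamp(0, 1) * max_radius  # blur-radius map
    out = image.clone()                                          # in-focus pixels stay sharp
    for r in range(1, max_radius + 1):
        k = disc_kernel(r)[None, None].repeat(3, 1, 1, 1)        # depthwise conv weights
        blurred = F.conv2d(image[None], k, padding=r, groups=3)[0]
        weight = ((coc >= r - 0.5) & (coc < r + 0.5)).float()    # pixels assigned radius r
        out = out * (1 - weight) + blurred * weight
    return out


# Toy usage: sharp image + depth ramp -> shallow depth-of-field rendering.
img = torch.rand(3, 128, 128)
depth = torch.linspace(0, 1, 128).repeat(128, 1)
bokeh = synthetic_defocus(img, depth, focus_depth=0.3)
print(bokeh.shape)  # torch.Size([3, 128, 128])
```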