Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2503.08434v4
- Date: Mon, 16 Jun 2025 13:29:51 GMT
- Title: Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models
- Authors: Armando Fortes, Tianyi Wei, Shangchen Zhou, Xingang Pan
- Abstract summary: Bokeh Diffusion is a scene-consistent bokeh control framework. We introduce a hybrid training pipeline that aligns in-the-wild images with synthetic blur augmentations. Our approach enables flexible, lens-like blur control and supports downstream applications such as real image editing via inversion.
- Score: 26.79219274697864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large-scale text-to-image models have revolutionized creative fields by generating visually captivating outputs from textual prompts; however, while traditional photography offers precise control over camera settings to shape visual aesthetics - such as depth-of-field via aperture - current diffusion models typically rely on prompt engineering to mimic such effects. This approach often results in crude approximations and inadvertently alters the scene content. In this work, we propose Bokeh Diffusion, a scene-consistent bokeh control framework that explicitly conditions a diffusion model on a physical defocus blur parameter. To overcome the scarcity of paired real-world images captured under different camera settings, we introduce a hybrid training pipeline that aligns in-the-wild images with synthetic blur augmentations, providing diverse scenes and subjects as well as supervision to learn the separation of image content from lens blur. Central to our framework is our grounded self-attention mechanism, trained on image pairs with different bokeh levels of the same scene, which enables blur strength to be adjusted in both directions while preserving the underlying scene. Extensive experiments demonstrate that our approach enables flexible, lens-like blur control, supports downstream applications such as real image editing via inversion, and generalizes effectively across both Stable Diffusion and FLUX architectures.
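To make the conditioning idea concrete, below is a minimal PyTorch sketch of one plausible way to feed a scalar defocus-blur parameter into a diffusion backbone: the scalar is encoded with Fourier features and added to the timestep embedding. The class name `DefocusCondition`, the dimensions, and the injection point are illustrative assumptions, not the paper's actual implementation.

```python
import math
import torch
import torch.nn as nn

class DefocusCondition(nn.Module):
    """Hypothetical conditioning module: embeds a scalar defocus-blur level with
    Fourier features and adds it to the diffusion model's timestep embedding.
    This mirrors a common conditioning pattern; the paper's exact design may differ."""

    def __init__(self, embed_dim: int = 320, n_freqs: int = 64):
        super().__init__()
        # Log-spaced frequencies for the sinusoidal encoding of the blur scalar.
        self.register_buffer(
            "freqs", torch.exp(torch.linspace(0.0, math.log(1000.0), n_freqs))
        )
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_freqs, embed_dim),
            nn.SiLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, blur_level: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # blur_level: (B,) physical defocus parameter, e.g. a normalized blur radius.
        # t_emb: (B, embed_dim) timestep embedding produced by the backbone.
        angles = blur_level[:, None] * self.freqs[None, :]
        fourier = torch.cat([angles.sin(), angles.cos()], dim=-1)  # (B, 2 * n_freqs)
        return t_emb + self.mlp(fourier)  # inject wherever t_emb is consumed


# Example: condition a batch on a weak vs. a strong defocus level.
cond = DefocusCondition(embed_dim=320)
t_emb = torch.randn(2, 320)          # stand-in timestep embedding
blur = torch.tensor([0.1, 0.8])      # weak vs. strong defocus
conditioned = cond(blur, t_emb)      # shape (2, 320)
```

Routing the blur embedding through the timestep pathway keeps the conditioning global and lightweight, which is consistent with the stated goal of steering blur strength without rewriting prompt-driven scene content; in the paper, scene preservation is additionally handled by the grounded self-attention mechanism.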
Related papers
- BokehDiff: Neural Lens Blur with One-Step Diffusion [53.11429878683807]
We introduce BokehDiff, a lens blur rendering method that achieves physically accurate and visually appealing outcomes. Our method employs a physics-inspired self-attention module that aligns with the image formation process. We adapt the diffusion model to a one-step inference scheme without introducing additional noise, achieving results of high quality and fidelity.
arXiv Detail & Related papers (2025-07-24T03:23:19Z) - Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion [27.488654753644692]
We propose a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh effects. By conditioning a single-step video diffusion model on multi-plane image (MPI) layers, our approach achieves realistic and consistent bokeh effects across diverse scenes.
arXiv Detail & Related papers (2025-05-27T14:33:54Z) - Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes [34.19578921335553]
Reconstructing 3D scenes from a single image is a fundamentally ill-posed task due to the severely under-constrained nature of the problem.
In this work, we address these inherent limitations in existing single image-to-3D scene feedforward networks.
To alleviate the poor performance due to insufficient information beyond the input image's view, we leverage a strong generative prior in the form of a pre-trained latent video diffusion model.
arXiv Detail & Related papers (2025-03-19T23:14:27Z) - ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring [61.82010103478833]
We develop a context-based local blur detection module that incorporates additional contextual information to improve the identification of blurry regions.
Considering that modern smartphones are equipped with cameras capable of providing short-exposure images, we develop a blur-aware guided image restoration method.
We formulate the above components into a simple yet effective network, named ExpRDiff.
arXiv Detail & Related papers (2024-12-12T11:42:39Z) - Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering [56.68286440268329]
Correct insertion of virtual objects into images of real-world scenes requires a deep understanding of the scene's lighting, geometry, and materials.
We propose using a personalized large diffusion model as guidance to a physically based inverse rendering process.
Our method recovers scene lighting and tone-mapping parameters, allowing the photorealistic composition of arbitrary virtual objects in single frames or videos of indoor or outdoor scenes.
arXiv Detail & Related papers (2024-08-19T05:15:45Z) - Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators [19.853978560075305]
Motion guidance is a technique that allows a user to specify dense, complex motion fields that indicate where each pixel in an image should move.
We demonstrate that our technique works on complex motions and produces high quality edits of real and generated images.
arXiv Detail & Related papers (2024-01-31T18:59:59Z) - ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation [45.582704677784825]
Implicit Diffusion-based reBLurring AUgmentation (ID-Blau) is proposed to generate diverse blurred images by simulating motion trajectories in a continuous space.
By sampling diverse blur conditions, ID-Blau can generate various blurred images unseen in the training set.
Results demonstrate that ID-Blau can produce realistic blurred images for training and thus significantly improve performance for state-of-the-art deblurring models.
arXiv Detail & Related papers (2023-12-18T07:47:43Z) - DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators [56.994967294931286]
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating flythrough scenes from textual prompts.
We advocate explicitly warping the intermediate latent code of the pre-trained text-to-image diffusion model for high-quality image generation and unbounded generalization ability.
arXiv Detail & Related papers (2023-12-14T08:42:26Z) - Curved Diffusion: A Generative Model With Optical Geometry Control [56.24220665691974]
The influence of different optical systems on the final scene appearance is frequently overlooked.
This study introduces a framework that intimately integrates a text-to-image diffusion model with the particular lens used in image rendering.
arXiv Detail & Related papers (2023-11-29T13:06:48Z) - DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z) - GBSD: Generative Bokeh with Stage Diffusion [16.189787907983106]
The bokeh effect is an artistic technique that blurs out-of-focus areas in a photograph.
We present GBSD, the first generative text-to-image model that synthesizes photorealistic images with a bokeh style.
arXiv Detail & Related papers (2023-06-14T05:34:02Z) - Defocus to focus: Photo-realistic bokeh rendering by fusing defocus and radiance priors [26.38833313692807]
Bokeh rendering mimics aesthetic shallow depth-of-field (DoF) in professional photography.
Existing methods suffer from simple flat background blur and blurred in-focus regions.
We present a Defocus to Focus (D2F) framework to learn realistic bokeh rendering.
arXiv Detail & Related papers (2023-06-07T15:15:13Z) - Inverting the Imaging Process by Learning an Implicit Camera Model [73.81635386829846]
This paper proposes a novel implicit camera model which represents the physical imaging process of a camera as a deep neural network.
We demonstrate the power of this new implicit camera model on two inverse imaging tasks.
arXiv Detail & Related papers (2023-04-25T11:55:03Z) - Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time [101.91824315554682]
In this work, we aim ambitiously for a more realistic and challenging task - joint video multi-frame interpolation and deblurring under unknown exposure time.
We first adopt a variant of supervised contrastive learning to construct an exposure-aware representation from input blurred frames.
We then build our video reconstruction network upon the exposure and motion representation by progressive exposure-adaptive convolution and motion refinement.
arXiv Detail & Related papers (2023-03-27T09:43:42Z) - Bokeh-Loss GAN: Multi-Stage Adversarial Training for Realistic Edge-Aware Bokeh [3.8811606213997587]
We tackle the problem of monocular bokeh synthesis, where we attempt to render a shallow depth of field image from a single all-in-focus image.
Unlike in DSLR cameras, this effect cannot be captured directly in mobile cameras due to the physical constraints of the mobile aperture.
We propose a network-based approach that is capable of rendering realistic monocular bokeh from single image inputs.
arXiv Detail & Related papers (2022-08-25T20:57:07Z)