Fine-grained Defocus Blur Control for Generative Image Models
- URL: http://arxiv.org/abs/2510.06215v1
- Date: Tue, 07 Oct 2025 17:59:15 GMT
- Title: Fine-grained Defocus Blur Control for Generative Image Models
- Authors: Ayush Shrivastava, Connelly Barnes, Xuaner Zhang, Lingzhi Zhang, Andrew Owens, Sohrab Amirghodsi, Eli Shechtman
- Abstract summary: Current text-to-image diffusion models excel at generating diverse, high-quality images. We introduce a novel text-to-image diffusion framework that leverages camera metadata. Our model enables superior fine-grained control without altering the depicted scene.
- Score: 66.30016220484394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current text-to-image diffusion models excel at generating diverse, high-quality images, yet they struggle to incorporate fine-grained camera metadata such as precise aperture settings. In this work, we introduce a novel text-to-image diffusion framework that leverages camera metadata, or EXIF data, which is often embedded in image files, with an emphasis on generating controllable lens blur. Our method mimics the physical image formation process by first generating an all-in-focus image, estimating its monocular depth, predicting a plausible focus distance with a novel focus distance transformer, and then forming a defocused image with an existing differentiable lens blur model. Gradients flow backwards through this whole process, allowing us to learn without explicit supervision to generate defocus effects based on content elements and the provided EXIF data. At inference time, this enables precise interactive user control over defocus effects while preserving scene contents, which is not achievable with existing diffusion models. Experimental results demonstrate that our model enables superior fine-grained control without altering the depicted scene.
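To make the pipeline concrete, here is a minimal sketch of its final, differentiable stage: a thin-lens circle-of-confusion (CoC) computed from depth, focus distance, focal length, and f-number, driving a soft blend over a stack of Gaussian-blurred copies of the all-in-focus image. This is an illustration under stated assumptions, not the paper's actual lens blur model; the function names, the Gaussian approximation, the blur-level stack, and the pixel-scale constant are all hypothetical, and the units are illustrative rather than physically consistent.

    import torch
    import torchvision.transforms.functional as TF

    def circle_of_confusion(depth, focus_dist, focal_len, f_number):
        # Thin-lens CoC diameter: c = f^2 / (N * (s - f)) * |d - s| / d,
        # with depth d, focus distance s, focal length f, f-number N.
        k = focal_len ** 2 / (f_number * (focus_dist - focal_len))
        return k * (depth - focus_dist).abs() / depth.clamp(min=1e-3)

    def render_defocus(image, depth, focus_dist, focal_len, f_number,
                       px_per_unit=500.0):  # hypothetical sensor-to-pixel scale
        """Crude differentiable defocus: the per-pixel CoC softly selects
        among a small stack of pre-blurred copies of the image.
        image: (B, 3, H, W), depth: (B, 1, H, W)."""
        coc_px = circle_of_confusion(depth, focus_dist,
                                     focal_len, f_number) * px_per_unit
        sigmas = torch.tensor([0.5, 2.0, 4.0, 8.0])  # blur levels, in pixels
        stack = torch.stack([TF.gaussian_blur(image, 21, [float(s)])
                             for s in sigmas])
        # Soft, differentiable assignment of each pixel to its nearest blur level.
        w = torch.softmax(-(coc_px.unsqueeze(0)
                            - sigmas.view(-1, 1, 1, 1, 1)) ** 2, dim=0)
        return (w * stack).sum(dim=0)  # (B, 3, H, W)

Because every step is differentiable, gradients from a loss on the defocused output flow back into the image, the depth estimate, and the predicted focus distance, which is the property the abstract describes exploiting to learn without explicit defocus supervision. At inference time, sweeping focus_dist or f_number through such a module corresponds to the interactive focus and aperture control the paper claims.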
Related papers
- DMAligner: Enhancing Image Alignment via Diffusion Model Based View Synthesis [63.59932602411222]
DMAligner is a diffusion-based framework for image alignment through alignment-oriented view synthesis. We propose a Dynamics-aware Diffusion Training approach for learning conditional image generation. We develop the Dynamic Scene Image Alignment (DSIA) dataset using Blender, which includes 1,033 indoor and outdoor scenes with over 30K image pairs tailored for image alignment.
arXiv Detail & Related papers (2026-02-26T14:00:07Z) - Learning to Refocus with Video Diffusion Models [10.749713029715226]
We introduce a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, our approach generates a perceptually accurate focal stack, represented as a video sequence. Our method consistently outperforms existing approaches in both perceptual quality and robustness across challenging scenarios.
arXiv Detail & Related papers (2025-12-22T19:29:57Z) - DiffCamera: Arbitrary Refocusing on Images [55.948229011478304]
We propose DiffCamera, a model that enables flexible refocusing of a created image conditioned on an arbitrary new focus point and a blur level. Experiments demonstrate that DiffCamera supports stable refocusing across a wide range of scenes, providing unprecedented control over DoF adjustments for photography and generative AI applications.
arXiv Detail & Related papers (2025-09-30T17:48:23Z) - BokehDiff: Neural Lens Blur with One-Step Diffusion [53.11429878683807]
We introduce BokehDiff, a lens blur rendering method that achieves physically accurate and visually appealing outcomes. Our method employs a physics-inspired self-attention module that aligns with the image formation process. We adapt the diffusion model to the one-step inference scheme without introducing additional noise, and achieve results of high quality and fidelity.
arXiv Detail & Related papers (2025-07-24T03:23:19Z) - Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models [26.79219274697864]
Bokeh Diffusion is a scene-consistent bokeh control framework. We introduce a hybrid training pipeline that aligns in-the-wild images with synthetic blur augmentations. Our approach enables flexible, lens-like blur control and supports downstream applications such as real image editing via inversion.
arXiv Detail & Related papers (2025-03-11T13:49:12Z) - Curved Diffusion: A Generative Model With Optical Geometry Control [56.24220665691974]
The influence of different optical systems on the final scene appearance is frequently overlooked.
This study introduces a framework that intimately integrates a text-to-image diffusion model with the particular lens used in image rendering.
arXiv Detail & Related papers (2023-11-29T13:06:48Z) - DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z) - Single Image Optical Flow Estimation with an Event Camera [38.92408855196647]
Event cameras are bio-inspired sensors that report intensity changes at microsecond resolution.
We propose an optical flow estimation approach based on a single (potentially blurred) image and its events.
arXiv Detail & Related papers (2020-04-01T11:28:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.