Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion
- URL: http://arxiv.org/abs/2505.21593v1
- Date: Tue, 27 May 2025 14:33:54 GMT
- Title: Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion
- Authors: Yang Yang, Siming Zheng, Jinwei Chen, Boxi Wu, Xiaofei He, Deng Cai, Bo Li, Peng-Tao Jiang
- Abstract summary: We propose a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh effects. By conditioning a single-step video diffusion model on MPI layers, our approach achieves realistic and consistent bokeh effects across diverse scenes.
- Score: 27.488654753644692
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in diffusion based editing models have enabled realistic camera simulation and image-based bokeh, but video bokeh remains largely unexplored. Existing video editing models cannot explicitly control focus planes or adjust bokeh intensity, limiting their applicability for controllable optical effects. Moreover, naively extending image-based bokeh methods to video often results in temporal flickering and unsatisfactory edge blur transitions due to the lack of temporal modeling and generalization capability. To address these challenges, we propose a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh effects. Our method leverages a multi-plane image (MPI) representation constructed through a progressively widening depth sampling function, providing explicit geometric guidance for depth-dependent blur synthesis. By conditioning a single-step video diffusion model on MPI layers and utilizing the strong 3D priors from pre-trained models such as Stable Video Diffusion, our approach achieves realistic and consistent bokeh effects across diverse scenes. Additionally, we introduce a progressive training strategy to enhance temporal consistency, depth robustness, and detail preservation. Extensive experiments demonstrate that our method produces high-quality, controllable bokeh effects and achieves state-of-the-art performance on multiple evaluation benchmarks.
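The abstract names two geometric ingredients: a multi-plane image (MPI) built with a progressively widening depth sampling function, and depth-dependent blur guided by those layers. The sketch below is a rough, hedged illustration of that idea only, not the authors' implementation: it builds depth bins that grow geometrically wider away from the focal plane and composites per-layer Gaussian blur on a single frame. The function names, layer count, widening schedule, and blur scaling are illustrative assumptions.

```python
# Illustrative sketch only: the layer count, geometric widening schedule, and
# linear sigma scaling are assumptions, not the paper's exact formulation.
import numpy as np
from scipy.ndimage import gaussian_filter

def mpi_depth_planes(focus_depth, n_planes=8, base_width=0.02, growth=1.6):
    """Build depth-bin edges that widen progressively away from the focal plane,
    so the in-focus region is sampled finely and defocused regions coarsely."""
    edges = [focus_depth]
    width = base_width
    for _ in range(n_planes // 2):
        edges = [edges[0] - width] + edges + [edges[-1] + width]
        width *= growth
    edges = np.clip(np.array(edges), 0.0, 1.0)
    edges[0], edges[-1] = 0.0, 1.0  # cover the full normalized depth range
    return edges

def mpi_bokeh_frame(rgb, depth, focus_depth, max_sigma=8.0):
    """Slice one frame into MPI layers and composite per-layer Gaussian blur,
    with blur strength growing with each layer's distance from the focal plane.

    rgb:   (H, W, 3) float image in [0, 1]
    depth: (H, W)    normalized depth in [0, 1]
    """
    edges = mpi_depth_planes(focus_depth)
    out = np.zeros_like(rgb)
    weight = np.zeros(depth.shape)
    for near, far in zip(edges[:-1], edges[1:]):
        mask = ((depth >= near) & (depth <= far)).astype(np.float64)
        if mask.sum() == 0:
            continue  # empty bin (edges collapsed by clipping)
        center = 0.5 * (near + far)
        sigma = max_sigma * abs(center - focus_depth)  # defocus grows away from focus
        layer = np.stack(
            [gaussian_filter(rgb[..., c] * mask, sigma) for c in range(3)], axis=-1
        )
        alpha = gaussian_filter(mask, sigma)
        out += layer
        weight += alpha
    return out / np.clip(weight, 1e-6, None)[..., None]
```

In the paper, this geometric guidance conditions a one-step video diffusion model rather than driving an explicit compositor, so the blur synthesis and layer blending are learned; the sketch only conveys how a progressively widening depth sampling concentrates resolution near the focal plane.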
Related papers
- BokehDiff: Neural Lens Blur with One-Step Diffusion [53.11429878683807]
We introduce BokehDiff, a lens blur rendering method that achieves physically accurate and visually appealing outcomes. Our method employs a physics-inspired self-attention module that aligns with the image formation process. We adapt the diffusion model to the one-step inference scheme without introducing additional noise, and achieve results of high quality and fidelity.
arXiv Detail & Related papers (2025-07-24T03:23:19Z) - Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models [26.79219274697864]
Current diffusion models typically rely on prompt engineering to mimic such effects. We propose Bokeh Diffusion, a scene-consistent bokeh control framework. Our approach achieves flexible, lens-like blur control and supports applications such as real image editing via inversion.
arXiv Detail & Related papers (2025-03-11T13:49:12Z) - Video Depth Anything: Consistent Depth Estimation for Super-Long Videos [60.857723250653976]
We propose Video Depth Anything for high-quality, consistent depth estimation in super-long videos. Our model is trained on a joint dataset of video depth and unlabeled images, similar to Depth Anything V2. Our approach sets a new state-of-the-art in zero-shot video depth estimation.
arXiv Detail & Related papers (2025-01-21T18:53:30Z) - Optical-Flow Guided Prompt Optimization for Coherent Video Generation [51.430833518070145]
We propose a framework called MotionPrompt that guides the video generation process via optical flow. We optimize learnable token embeddings during reverse sampling steps by using gradients from a trained discriminator applied to random frame pairs. This approach allows our method to generate visually coherent video sequences that closely reflect natural motion dynamics, without compromising the fidelity of the generated content.
arXiv Detail & Related papers (2024-11-23T12:26:52Z) - Variable Aperture Bokeh Rendering via Customized Focal Plane Guidance [18.390543681127976]
The proposed method has achieved competitive state-of-the-art performance with only 4.4M parameters, which is much lighter than mainstream computational bokeh models.
arXiv Detail & Related papers (2024-10-18T12:04:23Z) - GBSD: Generative Bokeh with Stage Diffusion [16.189787907983106]
The bokeh effect is an artistic technique that blurs out-of-focus areas in a photograph.
We present GBSD, the first generative text-to-image model that synthesizes photorealistic images with a bokeh style.
arXiv Detail & Related papers (2023-06-14T05:34:02Z) - BokehOrNot: Transforming Bokeh Effect with Image Transformer and Lens Metadata Embedding [2.3784282912975345]
The bokeh effect is an optical phenomenon that offers a pleasing visual experience, typically produced by high-end cameras with wide-aperture lenses.
We propose a novel universal method for embedding lens metadata into the model and introduce a loss calculation method using alpha masks.
Based on the above techniques, we propose the BokehOrNot model, which is capable of producing both blur-to-sharp and sharp-to-blur bokeh effects.
arXiv Detail & Related papers (2023-06-06T21:49:56Z) - Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time [101.91824315554682]
In this work, we aim ambitiously for a more realistic and challenging task - joint video multi-frame interpolation and deblurring under unknown exposure time.
We first adopt a variant of supervised contrastive learning to construct an exposure-aware representation from input blurred frames.
We then build our video reconstruction network upon the exposure and motion representation by progressive exposure-adaptive convolution and motion refinement.
arXiv Detail & Related papers (2023-03-27T09:43:42Z) - MC-Blur: A Comprehensive Benchmark for Image Deblurring [127.6301230023318]
In most real-world images, blur is caused by different factors, e.g., motion and defocus.
We construct a new large-scale multi-cause image deblurring dataset (called MC-Blur).
Based on the MC-Blur dataset, we conduct extensive benchmarking studies to compare SOTA methods in different scenarios.
arXiv Detail & Related papers (2021-12-01T02:10:42Z) - Motion-blurred Video Interpolation and Extrapolation [72.3254384191509]
We present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner.
To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule.
arXiv Detail & Related papers (2021-03-04T12:18:25Z) - AIM 2020 Challenge on Rendering Realistic Bokeh [95.87775182820518]
This paper reviews the second AIM realistic bokeh effect rendering challenge.
The goal was to learn a realistic shallow focus technique using a large-scale EBB! bokeh dataset.
The participants had to render the bokeh effect based on a single frame, without any additional data from other cameras or sensors.
arXiv Detail & Related papers (2020-11-10T09:15:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.