Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion
- URL: http://arxiv.org/abs/2505.21593v1
- Date: Tue, 27 May 2025 14:33:54 GMT
- Title: Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion
- Authors: Yang Yang, Siming Zheng, Jinwei Chen, Boxi Wu, Xiaofei He, Deng Cai, Bo Li, Peng-Tao Jiang
- Abstract summary: We propose a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh effects. By conditioning a single-step video diffusion model on MPI layers, our approach achieves realistic and consistent bokeh effects across diverse scenes.
- Score: 27.488654753644692
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in diffusion-based editing models have enabled realistic camera simulation and image-based bokeh, but video bokeh remains largely unexplored. Existing video editing models cannot explicitly control focus planes or adjust bokeh intensity, which limits their applicability for controllable optical effects. Moreover, naively extending image-based bokeh methods to video often results in temporal flickering and unsatisfactory edge-blur transitions due to the lack of temporal modeling and generalization capability. To address these challenges, we propose a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh effects. Our method leverages a multi-plane image (MPI) representation constructed through a progressively widening depth sampling function, providing explicit geometric guidance for depth-dependent blur synthesis. By conditioning a single-step video diffusion model on MPI layers and utilizing the strong 3D priors of pre-trained models such as Stable Video Diffusion, our approach achieves realistic and consistent bokeh effects across diverse scenes. Additionally, we introduce a progressive training strategy to enhance temporal consistency, depth robustness, and detail preservation. Extensive experiments demonstrate that our method produces high-quality, controllable bokeh effects and achieves state-of-the-art performance on multiple evaluation benchmarks.
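To make the MPI construction concrete, here is a minimal sketch of what a progressively widening depth sampling function and layer slicing could look like. This is illustrative only: the function names, the geometric `growth` schedule, and the nearest-plane assignment are assumptions, since the abstract does not specify the exact sampling function.

```python
import numpy as np

def widening_depth_planes(focus_depth, d_min, d_max, n_side=6, growth=1.6):
    """Place plane depths so spacing widens geometrically away from focus.

    Hypothetical schedule: offsets from the focal plane form a geometric
    series, so planes are dense near focus and sparse far from it.
    """
    span = max(focus_depth - d_min, d_max - focus_depth)
    base = span * (growth - 1.0) / (growth ** n_side - 1.0)  # geometric-series sum = span
    offsets = np.cumsum(base * growth ** np.arange(n_side))
    depths = np.concatenate([(focus_depth - offsets)[::-1],
                             [focus_depth],
                             focus_depth + offsets])
    return np.clip(depths, d_min, d_max)

def build_mpi(frame, depth, plane_depths):
    """Slice a frame into MPI colour/alpha layers by nearest plane depth.

    frame: (H, W, 3) float image; depth: (H, W) depth map.
    """
    idx = np.abs(depth[..., None] - plane_depths[None, None, :]).argmin(-1)
    layers = np.zeros((len(plane_depths),) + frame.shape, frame.dtype)
    alphas = np.zeros((len(plane_depths),) + depth.shape, frame.dtype)
    for k in range(len(plane_depths)):
        mask = idx == k
        layers[k][mask] = frame[mask]
        alphas[k][mask] = 1.0
    return layers, alphas
```

In the paper, such per-frame layers serve only as conditioning for the one-step video diffusion model; a complementary sketch of how MPI layers can be composited into depth-dependent blur follows the related-papers list below.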
Related papers
- Light-X: Generative 4D Video Rendering with Camera and Illumination Control [52.87059646145144]
Light-X is a video generation framework that enables controllable rendering from monocular videos with both viewpoint and illumination control. To address the lack of paired multi-view and multi-illumination videos, we introduce Light-Syn, a degradation-based pipeline with inverse-mapping.
arXiv Detail & Related papers (2025-12-04T18:59:57Z) - ReLumix: Extending Image Relighting to Video via Video Diffusion Models [5.890782804843724]
Controlling illumination during video post-production is a crucial yet elusive goal in computational photography. This paper introduces ReLumix, a novel framework that decouples relighting from temporal synthesis. Although trained on synthetic data, ReLumix shows competitive generalization to real-world videos.
arXiv Detail & Related papers (2025-09-28T09:35:33Z) - Stable Video-Driven Portraits [52.008400639227034]
Portrait animation aims to generate photo-realistic videos from a single source image by reenacting the expression and pose from a driving video. Recent advances using diffusion models have demonstrated improved quality but remain constrained by weak control signals and architectural limitations. We propose a novel diffusion-based framework that leverages masked facial regions, specifically the eyes, nose, and mouth, from the driving video as strong motion control cues.
arXiv Detail & Related papers (2025-09-22T08:11:08Z) - BokehDiff: Neural Lens Blur with One-Step Diffusion [53.11429878683807]
We introduce BokehDiff, a lens blur rendering method that achieves physically accurate and visually appealing outcomes. Our method employs a physics-inspired self-attention module that aligns with the image formation process. We adapt the diffusion model to the one-step inference scheme without introducing additional noise, and achieve results of high quality and fidelity.
arXiv Detail & Related papers (2025-07-24T03:23:19Z) - Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models [26.79219274697864]
Current diffusion models typically rely on prompt engineering to mimic defocus-blur effects. We propose Bokeh Diffusion, a scene-consistent bokeh control framework. Our approach achieves flexible, lens-like blur control and supports applications such as real-image editing via inversion.
arXiv Detail & Related papers (2025-03-11T13:49:12Z) - DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models [83.28670336340608]
We introduce DiffusionRenderer, a neural approach that addresses the dual problem of inverse and forward rendering. Our model enables practical applications from a single video input, including relighting, material editing, and realistic object insertion.
arXiv Detail & Related papers (2025-01-30T18:59:11Z) - Video Depth Anything: Consistent Depth Estimation for Super-Long Videos [60.857723250653976]
We propose Video Depth Anything for high-quality, consistent depth estimation in super-long videos. Our model is trained on a joint dataset of video depth and unlabeled images, similar to Depth Anything V2. Our approach sets a new state of the art in zero-shot video depth estimation.
arXiv Detail & Related papers (2025-01-21T18:53:30Z) - DiffuEraser: A Diffusion Model for Video Inpainting [13.292164408616257]
We introduce DiffuEraser, a video inpainting model based on stable diffusion, to fill masked regions with greater detail and more coherent structure. We also expand the temporal receptive fields of both the prior model and DiffuEraser, and further enhance consistency by leveraging the temporal smoothing property of video diffusion models.
arXiv Detail & Related papers (2025-01-17T08:03:02Z) - Optical-Flow Guided Prompt Optimization for Coherent Video Generation [51.430833518070145]
We propose a framework called MotionPrompt that guides the video generation process via optical flow. We optimize learnable token embeddings during reverse sampling steps by using gradients from a trained discriminator applied to random frame pairs. This approach allows our method to generate visually coherent video sequences that closely reflect natural motion dynamics, without compromising the fidelity of the generated content.
arXiv Detail & Related papers (2024-11-23T12:26:52Z) - Variable Aperture Bokeh Rendering via Customized Focal Plane Guidance [18.390543681127976]
The proposed method achieves competitive, state-of-the-art performance with only 4.4M parameters, making it much lighter than mainstream computational bokeh models.
arXiv Detail & Related papers (2024-10-18T12:04:23Z) - DaBiT: Depth and Blur informed Transformer for Video Focal Deblurring [4.332534893042983]
In many real-world scenarios, recorded videos suffer from accidental focus blur. This paper introduces a framework optimized for the as-yet-unattempted task of video focal deblurring (refocusing). We achieve state-of-the-art results, with an average PSNR over 1.9 dB higher than comparable existing video restoration methods.
arXiv Detail & Related papers (2024-07-01T12:22:16Z) - GBSD: Generative Bokeh with Stage Diffusion [16.189787907983106]
The bokeh effect is an artistic technique that blurs out-of-focus areas in a photograph.
We present GBSD, the first generative text-to-image model that synthesizes photorealistic images with a bokeh style.
arXiv Detail & Related papers (2023-06-14T05:34:02Z) - BokehOrNot: Transforming Bokeh Effect with Image Transformer and Lens Metadata Embedding [2.3784282912975345]
The bokeh effect is an optical phenomenon that offers a pleasant visual experience, typically generated by high-end cameras with wide-aperture lenses.
We propose a novel universal method for embedding lens metadata into the model and introduce a loss calculation method using alpha masks.
Based on the above techniques, we propose the BokehOrNot model, which is capable of producing both blur-to-sharp and sharp-to-blur bokeh effects.
arXiv Detail & Related papers (2023-06-06T21:49:56Z) - Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time [101.91824315554682]
In this work, we take aim at a more realistic and challenging task: joint video multi-frame interpolation and deblurring under unknown exposure time.
We first adopt a variant of supervised contrastive learning to construct an exposure-aware representation from input blurred frames.
We then build our video reconstruction network upon the exposure and motion representation by progressive exposure-adaptive convolution and motion refinement.
arXiv Detail & Related papers (2023-03-27T09:43:42Z) - MC-Blur: A Comprehensive Benchmark for Image Deblurring [127.6301230023318]
In most real-world images, blur is caused by different factors, e.g., motion and defocus.
We construct a new large-scale multi-cause image deblurring dataset, called MC-Blur.
Based on the MC-Blur dataset, we conduct extensive benchmarking studies to compare SOTA methods in different scenarios.
arXiv Detail & Related papers (2021-12-01T02:10:42Z) - Motion-blurred Video Interpolation and Extrapolation [72.3254384191509]
We present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner.
To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule.
arXiv Detail & Related papers (2021-03-04T12:18:25Z) - AIM 2020 Challenge on Rendering Realistic Bokeh [95.87775182820518]
This paper reviews the second AIM realistic bokeh effect rendering challenge.
The goal was to learn a realistic shallow focus technique using a large-scale EBB! bokeh dataset.
The participants had to render the bokeh effect based on a single frame, without any additional data from other cameras or sensors.
arXiv Detail & Related papers (2020-11-10T09:15:38Z)
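As referenced after the main abstract, the following sketch illustrates the classical idea behind MPI-guided, depth-dependent blur: blur each plane by a blur size derived from its depth relative to the focal plane, then alpha-composite back to front. This is a hand-written baseline for intuition only, not the paper's method (which synthesizes the blur with a conditioned diffusion model); `coc_sigma` and its `strength` constant are illustrative, and the Gaussian kernel stands in for a disc-shaped bokeh kernel.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def coc_sigma(plane_depth, focus_depth, strength=30.0):
    """Illustrative thin-lens-style blur size (in pixels) for one plane."""
    return strength * abs(1.0 / plane_depth - 1.0 / focus_depth)

def composite_bokeh(layers, alphas, plane_depths, focus_depth):
    """Blur each MPI layer by its plane's blur size, composite back to front.

    layers: (K, H, W, 3) premultiplied colours; alphas: (K, H, W) in [0, 1].
    """
    out = np.zeros_like(layers[0])
    for k in np.argsort(plane_depths)[::-1]:      # farthest plane first
        s = max(coc_sigma(plane_depths[k], focus_depth), 1e-3)
        rgb = gaussian_filter(layers[k], sigma=(s, s, 0))   # blur colour
        a = gaussian_filter(alphas[k], sigma=s)[..., None]  # blur coverage
        out = rgb + out * (1.0 - a)               # premultiplied "over"
    return out
```

Applying such a composite per frame is exactly where the temporal flickering described in the abstract comes from: each frame's depth map, and hence its layer assignment, varies independently, which is what conditioning a video diffusion model on the MPI layers is meant to smooth out.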
This list is automatically generated from the titles and abstracts of the papers on this site.