Physics-Informed Video Flare Synthesis and Removal Leveraging Motion Independence between Flare and Scene
- URL: http://arxiv.org/abs/2512.11327v1
- Date: Fri, 12 Dec 2025 07:01:44 GMT
- Title: Physics-Informed Video Flare Synthesis and Removal Leveraging Motion Independence between Flare and Scene
- Authors: Junqiao Wang, Yuanfei Huang, Hua Huang
- Abstract summary: Video flare synthesis and removal pose significantly greater challenges than in images, owing to the complex and mutually independent motion of flares, light sources, and scene content. We propose a physics-informed dynamic flare synthesis pipeline, which simulates light source motion using optical flow and models the temporal behaviors of both scattering and reflective flares. Our method consistently outperforms existing video-based restoration and image-based flare removal methods on both real and synthetic videos.
- Score: 17.04814273488001
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Lens flare is a degradation phenomenon caused by strong light sources. Existing research on flare removal has mainly focused on images, while the spatiotemporal characteristics of video flare remain largely unexplored. Video flare synthesis and removal pose significantly greater challenges than in images, owing to the complex and mutually independent motion of flares, light sources, and scene content. This motion independence further affects restoration performance, often resulting in flicker and artifacts. To address this issue, we propose a physics-informed dynamic flare synthesis pipeline, which simulates light source motion using optical flow and models the temporal behaviors of both scattering and reflective flares. Meanwhile, we design a video flare removal network that employs an attention module to spatially suppress flare regions and incorporates a Mamba-based temporal modeling component to capture long-range spatio-temporal dependencies. This motion-independent spatiotemporal representation effectively eliminates the need for multi-frame alignment, alleviating temporal aliasing between flares and scene content and thereby improving video flare removal performance. Building upon this, we construct the first video flare dataset to comprehensively evaluate our method, which includes a large set of synthetic paired videos and additional real-world videos collected from the Internet to assess generalization capability. Extensive experiments demonstrate that our method consistently outperforms existing video-based restoration and image-based flare removal methods on both real and synthetic videos, effectively removing dynamic flares while preserving light source integrity and maintaining the spatiotemporal consistency of the scene.
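No code accompanies this listing, so the following is only a rough sketch of the synthesis idea as we read it: the light source is tracked with dense optical flow, and a flare template is composited so that it follows the source independently of scene motion. The function name, the Farneback tracker, and the additive-compositing assumption are ours, not the authors' pipeline.

```python
import cv2
import numpy as np

def synthesize_flare_video(frames, flare_template, src_xy0):
    """Composite a flare that follows a moving light source (sketch only).

    frames:         list of HxWx3 float32 frames in [0, 1].
    flare_template: hxwx3 float32 flare patch in [0, 1].
    src_xy0:        (x, y) light source position in the first frame.

    Assumes additive compositing in linear intensity, a common
    approximation for scattering flare.
    """
    out, (x, y) = [], src_xy0
    prev_gray = cv2.cvtColor((frames[0] * 255).astype(np.uint8),
                             cv2.COLOR_BGR2GRAY)
    for t, frame in enumerate(frames):
        gray = cv2.cvtColor((frame * 255).astype(np.uint8),
                            cv2.COLOR_BGR2GRAY)
        if t > 0:
            # Dense optical flow advects the light source position.
            flow = cv2.calcOpticalFlowFarneback(
                prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
            dx, dy = flow[int(y), int(x)]
            x = float(np.clip(x + dx, 0, frame.shape[1] - 1))
            y = float(np.clip(y + dy, 0, frame.shape[0] - 1))
        prev_gray = gray

        # Paste the flare template centered on the tracked source.
        layer = np.zeros_like(frame)
        h, w = flare_template.shape[:2]
        top, left = int(y) - h // 2, int(x) - w // 2
        t0, l0 = max(top, 0), max(left, 0)
        t1, l1 = min(top + h, frame.shape[0]), min(left + w, frame.shape[1])
        layer[t0:t1, l0:l1] = flare_template[t0 - top:t1 - top,
                                             l0 - left:l1 - left]
        out.append(np.clip(frame + layer, 0.0, 1.0))  # additive composite
    return out
```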
Related papers
- SLCFormer: Spectral-Local Context Transformer with Physics-Grounded Flare Synthesis for Nighttime Flare Removal [12.135723445465551]
Lens flare is a common nighttime artifact caused by strong light sources scattering within camera lenses. We propose SLCFormer, a novel spectral-local context transformer framework for effective nighttime lens flare removal. Our method achieves state-of-the-art performance, outperforming existing approaches in both quantitative metrics and perceptual visual quality.
arXiv Detail & Related papers (2025-12-17T09:16:59Z) - BurstDeflicker: A Benchmark Dataset for Flicker Removal in Dynamic Scenes [36.35784556196341]
We present BurstDeflicker, a scalable benchmark constructed using three complementary data acquisition strategies. First, we develop a Retinex-based synthesis pipeline that redefines the goal of flicker removal. Second, we capture 4,000 real-world flicker images from different scenes, which help the model better understand the spatial and temporal characteristics of real flicker artifacts.
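The summary above only names a Retinex-based synthesis pipeline; as a hedged illustration of the underlying idea (ours, not the paper's code), Retinex models an image as reflectance times illumination, I = R · L, so flicker can be injected into the illumination layer alone while reflectance is preserved:

```python
import cv2
import numpy as np

def retinex_band_flicker(frame, amp=0.3, freq=6, phase=0.0, ksize=31):
    """Toy Retinex-style flicker synthesis (illustrative only).

    Retinex models I = R * L (reflectance times smooth illumination).
    Flicker physically lives in L, so we estimate L by heavy blurring,
    impose a rolling-shutter-like banding gain on L alone, and recombine
    with the untouched reflectance R. A crude stand-in for the paper's
    pipeline, whose actual formulation is not spelled out in the blurb.
    """
    img = frame.astype(np.float32) + 1e-6
    illum = cv2.GaussianBlur(img, (ksize, ksize), 0)      # estimate of L
    refl = img / illum                                    # R = I / L
    rows = np.arange(frame.shape[0], dtype=np.float32)
    band = 1.0 + amp * np.sin(2 * np.pi * freq * rows / frame.shape[0]
                              + phase)                    # per-row gain
    illum_flick = illum * band[:, None, None]             # perturb L only
    return np.clip(refl * illum_flick, 0, 255).astype(np.uint8)
```

Varying `phase` per frame animates the banding across a burst, e.g. mimicking 100 Hz AC lighting sampled at a mismatched shutter rate.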
arXiv Detail & Related papers (2025-10-11T03:46:34Z) - DeflareMamba: Hierarchical Vision Mamba for Contextually Consistent Lens Flare Removal [14.87987455441087]
We present DeflareMamba, a sequence model for lens flare removal. We show that our method effectively removes various types of flare artifacts, including scattering and reflective flares. Further downstream applications demonstrate the capacity of our method to improve visual object recognition and cross-modal semantic understanding.
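Mamba-style sequence models (used both here and in the main paper's temporal module) build on a linear state-space recurrence. A stripped-down sketch of that core scan, our simplification omitting Mamba's input-dependent ("selective") parameters and hardware-aware parallel implementation:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal linear state-space scan: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    x: (T, d_in) input sequence, e.g. per-frame features along time.
    A: (d_state, d_state), B: (d_state, d_in), C: (d_out, d_state).
    Mamba additionally makes the parameters input-dependent and replaces
    this sequential loop with a parallel scan; only the recurrence is
    shown here.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t   # state carries long-range temporal context
        ys.append(C @ h)
    return np.stack(ys)
```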
arXiv Detail & Related papers (2025-08-04T06:49:48Z) - Video Forgery Detection with Optical Flow Residuals and Spatial-Temporal Consistency [1.7061868168035932]
We propose a detection framework that leverages spatial-temporal consistency by combining RGB appearance features with optical flow residuals. By integrating these complementary features, the proposed method effectively detects a wide range of forged videos.
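A flow residual of the kind described above can be computed by warping one frame toward the next along estimated flow and taking the difference; a minimal OpenCV sketch (our illustration, not the paper's implementation):

```python
import cv2
import numpy as np

def flow_residual(frame_t, frame_t1):
    """Optical-flow residual between consecutive frames (sketch only).

    Backward-warps frame_t to time t+1 along estimated flow; genuine
    content should warp cleanly, while tampered regions tend to violate
    brightness constancy and leave a larger residual.
    """
    g0 = cv2.cvtColor(frame_t, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_t1, cv2.COLOR_BGR2GRAY)
    # Flow from frame_t1 back to frame_t, as needed for backward warping.
    flow = cv2.calcOpticalFlowFarneback(
        g1, g0, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    h, w = g0.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    pred_t1 = cv2.remap(frame_t, map_x, map_y, cv2.INTER_LINEAR)
    return cv2.absdiff(frame_t1, pred_t1)
```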
arXiv Detail & Related papers (2025-08-01T07:51:35Z) - UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting [85.27994475113056]
We introduce a general-purpose approach that jointly estimates albedo and synthesizes relit outputs in a single pass. Our model demonstrates strong generalization across diverse domains and surpasses previous methods in both visual fidelity and temporal consistency.
arXiv Detail & Related papers (2025-06-18T17:56:45Z) - RelightVid: Temporal-Consistent Diffusion Model for Video Relighting [95.10341081549129]
RelightVid is a flexible framework for video relighting. It can accept background video, text prompts, or environment maps as relighting conditions. It achieves arbitrary video relighting with high temporal consistency without intrinsic decomposition.
arXiv Detail & Related papers (2025-01-27T18:59:57Z) - Flying with Photons: Rendering Novel Views of Propagating Light [37.06220870989172]
We present an imaging and neural rendering technique that seeks to synthesize videos of light propagating through a scene from novel, moving camera viewpoints.
Our approach relies on a new ultrafast imaging setup to capture a first-of-its-kind, multi-viewpoint video dataset with picosecond-level temporal resolution.
arXiv Detail & Related papers (2024-04-09T17:48:52Z) - Lumiere: A Space-Time Diffusion Model for Video Generation [75.54967294846686]
We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once.
This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution.
By deploying both spatial and (importantly) temporal down- and up-sampling, our model learns to directly generate a full-frame-rate, low-resolution video.
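A sketch of the kind of joint space-time downsampling the blurb describes, in our simplified reading rather than Lumiere's actual architecture:

```python
import torch
import torch.nn as nn

class SpaceTimeDown(nn.Module):
    """Factorized space-time downsampling block (illustrative sketch only).

    Halves both spatial resolution and temporal length, so deeper U-Net
    stages process the whole clip at once at coarser space-time scales,
    rather than generating keyframes and temporally super-resolving them.
    """
    def __init__(self, c_in, c_out):
        super().__init__()
        self.spatial = nn.Conv3d(c_in, c_out, kernel_size=(1, 3, 3),
                                 stride=(1, 2, 2), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(c_out, c_out, kernel_size=(3, 1, 1),
                                  stride=(2, 1, 1), padding=(1, 0, 0))
        self.act = nn.SiLU()

    def forward(self, x):                     # x: (B, C, T, H, W)
        return self.act(self.temporal(self.act(self.spatial(x))))

# video = torch.randn(1, 16, 16, 64, 64)     # (B, C, T, H, W)
# out = SpaceTimeDown(16, 32)(video)         # -> (1, 32, 8, 32, 32)
```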
arXiv Detail & Related papers (2024-01-23T18:05:25Z) - MoVideo: Motion-Aware Video Generation with Diffusion Models [97.03352319694795]
We propose a novel motion-aware generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow.
MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
arXiv Detail & Related papers (2023-11-19T13:36:03Z) - LEO: Generative Latent Image Animator for Human Video Synthesis [38.99490968487773]
We propose a novel framework for human video synthesis, placing emphasis on spatio-temporal coherency.
Our key idea is to represent motion as a sequence of flow maps in the generation process, which inherently isolate motion from appearance.
We implement this idea via a flow-based image animator and a Latent Motion Diffusion Model (LMDM).
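The motion-as-flow idea above can be made concrete with a few lines: a single appearance frame is repeatedly warped by a sequence of flow maps, so motion lives entirely in the flows. A minimal sketch (ours, not LEO's animator):

```python
import torch
import torch.nn.functional as F

def animate_by_flows(ref, flows):
    """Warp one reference frame by a sequence of flow maps (sketch only).

    ref:   (1, C, H, W) appearance frame.
    flows: (T, 2, H, W) dense displacement maps in pixels, one per output
           frame; the flow sequence carries all motion while appearance
           stays fixed in a single image.
    """
    _, _, h, w = ref.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float()           # (2, H, W)
    frames = []
    for flow in flows:
        pos = base + flow                                 # sample positions
        # Normalize to [-1, 1] for grid_sample's coordinate convention.
        grid = torch.stack((2 * pos[0] / (w - 1) - 1,
                            2 * pos[1] / (h - 1) - 1), dim=-1)  # (H, W, 2)
        frames.append(F.grid_sample(ref, grid.unsqueeze(0),
                                    align_corners=True))
    return torch.cat(frames, dim=0)                       # (T, C, H, W)
```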
arXiv Detail & Related papers (2023-05-06T09:29:12Z) - Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation and performs favorably against state-of-the-art approaches.
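Co-attention of the kind named above fuses low- and high-level features by letting each stream reweight the other through a shared affinity matrix; a generic sketch (our simplified reading, not necessarily the paper's exact formulation):

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    """Generic co-attention over low- and high-level features (sketch)."""
    def __init__(self, c):
        super().__init__()
        self.proj_low = nn.Conv2d(c, c, 1)
        self.proj_high = nn.Conv2d(c, c, 1)
        self.fuse = nn.Conv2d(2 * c, c, 1)

    def forward(self, low, high):               # both (B, C, H, W)
        b, c, h, w = low.shape
        fl = self.proj_low(low).flatten(2)       # (B, C, HW)
        fh = self.proj_high(high).flatten(2)     # (B, C, HW)
        aff = torch.bmm(fl.transpose(1, 2), fh)  # (B, HW, HW) affinities
        # Each high-level position aggregates low-level features, and
        # vice versa, before the two attended maps are fused.
        att_high = torch.bmm(fl, aff.softmax(dim=1))                 # (B, C, HW)
        att_low = torch.bmm(fh, aff.softmax(dim=2).transpose(1, 2))  # (B, C, HW)
        merged = torch.cat((att_low.view(b, c, h, w),
                            att_high.view(b, c, h, w)), dim=1)
        return self.fuse(merged)
```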
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Intrinsic Temporal Regularization for High-resolution Human Video Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regularization to a single-image generator, leading to a powerful "INTERnet" capable of generating $512\times512$ resolution human action videos.
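One way to read the scheme above (our hedged interpretation): a per-pixel confidence map gates the temporal-consistency penalty so that unreliable motion estimates do not enforce wrong correspondences, e.g.

```python
import torch

def confidence_weighted_temporal_loss(prev_warped, curr, confidence):
    """Temporal-consistency loss gated by a confidence map (sketch only).

    prev_warped: previous generated frame warped to the current time step
                 by the estimated motion, (B, C, H, W).
    curr:        current generated frame, (B, C, H, W).
    confidence:  (B, 1, H, W) in [0, 1]; low values mark pixels whose
                 motion estimate is unreliable, so their temporal penalty
                 is suppressed rather than enforced.
    """
    per_pixel = (prev_warped - curr).abs().mean(dim=1, keepdim=True)
    return (confidence * per_pixel).mean()
```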
arXiv Detail & Related papers (2020-12-11T05:29:45Z)