ELVIS: Enhance Low-Light for Video Instance Segmentation in the Dark
- URL: http://arxiv.org/abs/2512.01495v1
- Date: Mon, 01 Dec 2025 10:17:07 GMT
- Title: ELVIS: Enhance Low-Light for Video Instance Segmentation in the Dark
- Authors: Joanne Lin, Ruirui Lin, Yini Li, David Bull, Nantheera Anantrasirichai
- Abstract summary: ELVIS (Enhance Low-light for Video Instance Segmentation) is a novel framework that enables effective domain adaptation of state-of-the-art VIS models to low-light scenarios. It improves performance by up to +3.7 AP on the synthetic low-light YouTube-VIS 2019 dataset.
- Score: 6.743827417653301
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video instance segmentation (VIS) for low-light content remains highly challenging for humans and machines alike, due to adverse imaging conditions including noise, blur, and low contrast. The lack of large-scale annotated datasets and the limitations of current synthetic pipelines, particularly in modeling temporal degradations, further hinder progress. Moreover, existing VIS methods are not robust to the degradations found in low-light videos and, as a result, perform poorly even when finetuned on low-light data. In this paper, we introduce ELVIS (Enhance Low-light for Video Instance Segmentation), a novel framework that enables effective domain adaptation of state-of-the-art VIS models to low-light scenarios. ELVIS comprises an unsupervised synthetic low-light video pipeline that models both spatial and temporal degradations, a calibration-free degradation profile synthesis network (VDP-Net), and an enhancement decoder head that disentangles degradations from content features. ELVIS improves performance by up to +3.7 AP on the synthetic low-light YouTube-VIS 2019 dataset. Code will be released upon acceptance.
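The abstract outlines, but does not detail, a synthetic pipeline that applies both spatial and temporal degradations to clean video. As a rough illustration only (the function name, parameter values, and noise models below are assumptions, not taken from the paper), such a pipeline could look like:

```python
import numpy as np

def degrade_video(frames, gamma=2.5, gain=0.15, shot_scale=0.01,
                  read_std=0.02, flicker_std=0.02, seed=0):
    """Toy low-light degradation for a clip of shape (T, H, W, 3) in [0, 1].

    Spatial degradations: exposure reduction (gamma curve plus gain) and
    signal-dependent shot noise with signal-independent read noise.
    Temporal degradation: a random-walk gain so brightness and noise
    drift across frames instead of being i.i.d. per frame.
    """
    rng = np.random.default_rng(seed)
    g, out = 1.0, []
    for frame in frames:
        g = max(0.5, g + flicker_std * rng.standard_normal())  # temporal drift
        dark = gain * g * np.power(frame, gamma)                # darken frame
        sigma = np.sqrt(shot_scale * dark + read_std ** 2)      # noise level
        out.append(np.clip(rng.normal(dark, sigma), 0.0, 1.0))
    return np.stack(out).astype(np.float32)
```

Pairs of (degraded clip, original annotations) generated this way from YouTube-VIS would then serve as finetuning data, which is presumably the role the paper's unsupervised pipeline plays.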
Related papers
- FRBNet: Revisiting Low-Light Vision through Frequency-Domain Radial Basis Network [7.386546521017689]
We revisit low-light image formation and extend the classical Lambertian model to better characterize low-light conditions. We propose a novel, end-to-end trainable module named the Frequency-domain Radial Basis Network (FRBNet). As a plug-and-play module, FRBNet can be integrated into existing networks for low-light downstream tasks without modifying loss functions; a hypothetical sketch of such a frequency-domain filter appears after this list.
arXiv Detail & Related papers (2025-10-27T15:46:07Z) - Dynamic Weight-based Temporal Aggregation for Low-light Video Enhancement [6.8703489542630445]
Low-light video enhancement is challenging due to noise, low contrast, and color degradations. We present DWTA-Net, a novel framework that exploits short- and long-term temporal cues. We show that DWTA-Net effectively suppresses noise and artifacts, delivering superior visual quality compared with state-of-the-art methods.
arXiv Detail & Related papers (2025-10-10T15:00:31Z) - Towards Realistic Low-Light Image Enhancement via ISP Driven Data Modeling [61.95831392879045]
Deep neural networks (DNNs) have recently become the leading method for low-light image enhancement (LLIE). Despite significant progress, their outputs may still exhibit issues such as amplified noise, incorrect white balance, or unnatural enhancements when deployed in real-world applications. A key challenge is the lack of diverse, large-scale training data that captures the complexities of low-light conditions and imaging pipelines. We propose a novel image signal processing (ISP) driven data synthesis pipeline that addresses these challenges by generating unlimited paired training data; a toy sketch of this idea is shown after this list.
arXiv Detail & Related papers (2025-04-16T15:53:53Z) - STORM: Token-Efficient Long Video Understanding for Multimodal LLMs [116.4479155699528]
STORM is a novel architecture incorporating a dedicated temporal encoder between the image encoder and the LLM. We show that STORM achieves state-of-the-art results across various long video understanding benchmarks.
arXiv Detail & Related papers (2025-03-06T06:17:38Z) - LLVD: LSTM-based Explicit Motion Modeling in Latent Space for Blind Video Denoising [1.9253333342733672]
This paper introduces a novel algorithm designed for scenarios where noise is introduced during video capture. We propose the Latent space LSTM Video Denoiser (LLVD), an end-to-end blind denoising model. Experiments reveal that LLVD demonstrates excellent performance on both synthetic and captured noise.
arXiv Detail & Related papers (2025-01-10T06:20:27Z) - Rethinking High-speed Image Reconstruction Framework with Spike Camera [48.627095354244204]
Spike cameras generate continuous spike streams to capture high-speed scenes with lower bandwidth and higher dynamic range than traditional RGB cameras. We introduce SpikeCLIP, a novel spike-to-image reconstruction framework that goes beyond traditional training paradigms. Our experiments on real-world low-light datasets demonstrate that SpikeCLIP significantly enhances texture details and the luminance balance of recovered images.
arXiv Detail & Related papers (2025-01-08T13:00:17Z) - Event-guided Low-light Video Semantic Segmentation [6.938849566816958]
Event cameras can capture motion dynamics, filter out temporally redundant information, and are robust to lighting conditions.
We propose EVSNet, a lightweight framework that leverages event modality to guide the learning of a unified illumination-invariant representation.
Specifically, we use a Motion Extraction Module to extract short- and long-term temporal motions from the event modality and a Motion Fusion Module to adaptively integrate image and motion features.
arXiv Detail & Related papers (2024-11-01T14:54:34Z) - BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement [56.97766265018334]
This paper introduces a low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions.
We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels.
Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE) and the comprehensive evaluation shows that the models trained with our dataset outperform those trained with the existing datasets.
arXiv Detail & Related papers (2024-07-03T22:41:49Z) - Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition [52.89441679581216]
Low-Light Video Enhancement (LLVE) seeks to restore dynamic or static scenes plagued by severely degraded visibility and noise. We present an innovative video decomposition strategy that incorporates view-independent and view-dependent components. Our framework consistently outperforms existing methods, establishing new state-of-the-art performance.
arXiv Detail & Related papers (2024-05-24T15:56:40Z) - A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement [44.1973928137492]
The STA-SUNet model is trained on a novel, fully registered dataset (BVI).
It is compared against various other models on three test datasets.
It is particularly effective in extreme low-light conditions, yielding good visual results.
arXiv Detail & Related papers (2024-03-04T19:06:13Z) - LEDNet: Joint Low-light Enhancement and Deblurring in the Dark [100.24389251273611]
We present the first large-scale dataset for joint low-light enhancement and deblurring.
LOL-Blur contains 12,000 low-blur/normal-sharp pairs with diverse darkness and motion blurs in different scenarios.
We also present an effective network, named LEDNet, to perform joint low-light enhancement and deblurring.
arXiv Detail & Related papers (2022-02-07T17:44:05Z)
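Regarding the FRBNet entry above: the paper's implementation is not reproduced here, but a hypothetical sketch of a plug-and-play frequency-domain radial-basis filter (all names and design choices are assumptions) might look like:

```python
import torch
import torch.nn as nn

class RadialFrequencyFilter(nn.Module):
    """Hypothetical frequency-domain radial-basis filter.

    Moves a feature map to the Fourier domain and reweights each
    frequency by a learned function of its radius, expressed as a
    Gaussian radial-basis expansion. Input and output shapes match,
    so the module can wrap any existing convolutional stage.
    """
    def __init__(self, num_bases: int = 8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(0.0, 1.0, num_bases))
        self.log_width = nn.Parameter(torch.zeros(num_bases))
        self.weights = nn.Parameter(torch.ones(num_bases) / num_bases)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (B, C, H, W)
        _, _, H, W = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")
        # Normalized radial frequency coordinate in [0, ~1].
        fy = torch.fft.fftfreq(H, device=x.device)       # (H,)
        fx = torch.fft.rfftfreq(W, device=x.device)      # (W // 2 + 1,)
        r = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2) / 0.7071
        # Gaussian RBF expansion of the radius -> scalar gain per frequency.
        d = r[..., None] - self.centers                  # (H, W', K)
        phi = torch.exp(-(d ** 2) / (2 * torch.exp(self.log_width) ** 2))
        gain = (phi * self.weights).sum(-1)              # (H, W')
        return torch.fft.irfft2(spec * gain, s=(H, W), norm="ortho")
```

Because it only rescales frequency magnitudes, such a module can be dropped into an existing low-light network without changing its loss functions, which matches the plug-and-play claim.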
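Similarly, for the ISP-driven data modeling entry: a toy version of the unprocess-degrade-reprocess idea (the stages and parameter ranges below are illustrative assumptions, not the paper's pipeline):

```python
import numpy as np

def synthesize_low_light_pair(rgb, rng):
    """Generate a (low-light, normal-light) training pair from one
    clean sRGB image (float32, shape (H, W, 3), values in [0, 1]).

    Hypothetical inverse-ISP -> degrade -> forward-ISP chain:
    unprocess to pseudo-RAW, reduce exposure and add sensor noise
    there (where noise is physically simpler), then re-render.
    """
    # Inverse ISP: undo gamma and white balance to get pseudo-RAW.
    wb = rng.uniform(0.7, 1.3, size=3)            # random per-channel gains
    raw = np.power(rgb, 2.2) / wb
    # Degrade in RAW space: exposure drop plus shot/read noise.
    raw = raw * rng.uniform(0.05, 0.3)
    raw = rng.normal(raw, np.sqrt(1e-3 * np.clip(raw, 0, None) + 1e-4))
    # Forward ISP: reapply white balance and gamma.
    low = np.power(np.clip(raw * wb, 0.0, 1.0), 1 / 2.2)
    return low.astype(np.float32), rgb
```

Calling `synthesize_low_light_pair(img, np.random.default_rng(0))` on any clean image yields one paired sample, so an unlimited paired training set can be drawn from unpaired normal-light data.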
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.