FeatEnHancer: Enhancing Hierarchical Features for Object Detection and
Beyond Under Low-Light Vision
- URL: http://arxiv.org/abs/2308.03594v1
- Date: Mon, 7 Aug 2023 13:52:21 GMT
- Title: FeatEnHancer: Enhancing Hierarchical Features for Object Detection and
Beyond Under Low-Light Vision
- Authors: Khurram Azeem Hashmi, Goutham Kallempudi, Didier Stricker, Muhammad
Zeshan Afzal
- Abstract summary: FeatEnHancer is a general-purpose plug-and-play module that can be incorporated into any low-light vision pipeline.
We show with extensive experimentation that the enhanced representation produced with FeatEnHancer significantly and consistently improves results in several low-light vision tasks.
- Score: 11.255962936937744
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Extracting useful visual cues for the downstream tasks is especially
challenging under low-light vision. Prior works create enhanced representations
by either correlating visual quality with machine perception or designing
illumination-degrading transformation methods that require pre-training on
synthetic datasets. We argue that optimizing enhanced image representation
pertaining to the loss of the downstream task can result in more expressive
representations. Therefore, in this work, we propose a novel module,
FeatEnHancer, that hierarchically combines multiscale features using
multiheaded attention guided by task-related loss function to create suitable
representations. Furthermore, our intra-scale enhancement improves the quality
of features extracted at each scale or level, as well as combines features from
different scales in a way that reflects their relative importance for the task
at hand. FeatEnHancer is a general-purpose plug-and-play module and can be
incorporated into any low-light vision pipeline. We show with extensive
experimentation that the enhanced representation produced with FeatEnHancer
significantly and consistently improves results in several low-light vision
tasks, including dark object detection (+5.7 mAP on ExDark), face detection
(+1.5 mAP on DARK FACE), nighttime semantic segmentation (+5.1 mIoU on ACDC),
and video object detection (+1.8 mAP on DarkVision), highlighting the
effectiveness of enhancing hierarchical features under low-light vision.
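The abstract describes hierarchically combining multiscale features with weights that reflect each scale's relative importance for the downstream task. The exact architecture is not given here, so the following is only a minimal numpy sketch of the scale-weighted fusion idea: feature maps from several scales are upsampled to a common resolution and merged with softmax weights (in the actual method these weights would be attention-derived and trained through the task loss; all function names here are hypothetical).

```python
import numpy as np

def upsample_nearest(feat, size):
    """Nearest-neighbor upsampling of a (C, H, W) feature map to (C, size, size)."""
    c, h, w = feat.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return feat[:, ys][:, :, xs]

def fuse_hierarchical_features(feats, scale_logits):
    """Weighted sum of multiscale features; the softmax weights stand in
    for the learned, task-loss-guided scale importance described above."""
    target = max(f.shape[1] for f in feats)
    ups = [upsample_nearest(f, target) for f in feats]
    w = np.exp(scale_logits - scale_logits.max())
    w = w / w.sum()
    return sum(wi * fi for wi, fi in zip(w, ups))

# Three constant feature maps at different scales, equal importance logits.
feats = [np.ones((8, 32, 32)), 2 * np.ones((8, 16, 16)), 3 * np.ones((8, 8, 8))]
fused = fuse_hierarchical_features(feats, np.array([0.0, 0.0, 0.0]))
print(fused.shape)  # (8, 32, 32)
```

With equal logits each scale contributes 1/3, so the toy output is uniformly (1+2+3)/3 = 2.0; in training, the logits would shift to favor the scales most useful for the task.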
Related papers
- SAIGFormer: A Spatially-Adaptive Illumination-Guided Network for Low-Light Image Enhancement [58.79901582809091]
Recent Transformer-based low-light enhancement methods have made promising progress in recovering global illumination. We present a Spatially-Adaptive Illumination-Guided Transformer framework that enables accurate illumination restoration.
arXiv Detail & Related papers (2025-07-21T11:38:56Z) - Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models [50.98559225639266]
We investigate the contributions of visual features from different encoder layers using 18 benchmarks spanning 6 task categories.
Our findings reveal that multilayer features provide complementary strengths with varying task dependencies, and uniform fusion leads to suboptimal performance.
We propose the instruction-guided vision aggregator, a module that dynamically integrates multi-layer visual features based on textual instructions.
arXiv Detail & Related papers (2024-12-26T05:41:31Z) - HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning [62.264673293638175]
Existing underwater image enhancement methods primarily focus on improving visual quality while overlooking practical implications.
We propose an invertible network for underwater perception enhancement, dubbed HUPE, which enhances visual quality and demonstrates flexibility in handling other downstream tasks.
arXiv Detail & Related papers (2024-11-27T12:37:03Z) - Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning [102.18178065928426]
We propose an efficient fine-tuning framework with two novel approaches: Vision Cue Enhancement (VCE) and Dual Low-Rank Adaptation (Dual-LoRA).
VCE enhances the vision projector by integrating multi-level visual cues, improving the model's ability to capture fine-grained visual features.
Dual-LoRA introduces a dual low-rank structure for instruction tuning, decoupling learning into skill and task spaces to enable precise control and efficient adaptation across diverse tasks.
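The Dual-LoRA summary describes a dual low-rank structure that decouples learning into separate spaces while the base weights stay frozen. The paper's exact formulation is not given here, so this is only a generic numpy sketch of two additive low-rank branches alongside a frozen weight matrix (the "skill"/"task" naming follows the summary; dimensions, rank, and initialization are assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 16, 4

W = rng.normal(size=(d_out, d_in))            # frozen base weight
# LoRA convention: one factor starts at zero so each adapter begins as a no-op.
A_skill = np.zeros((r, d_in)); B_skill = rng.normal(size=(d_out, r)) * 0.01
A_task = np.zeros((r, d_in)); B_task = rng.normal(size=(d_out, r)) * 0.01

def forward(x):
    """Base projection plus two additive low-rank updates; during tuning
    only the A/B factors would be trained, W stays frozen."""
    return W @ x + B_skill @ (A_skill @ x) + B_task @ (A_task @ x)

x = rng.normal(size=d_in)
print(np.allclose(forward(x), W @ x))  # True: zero-init adapters are a no-op
```

Training only the two low-rank branches keeps the parameter count at 2·r·(d_in + d_out) per pair instead of d_in·d_out, which is the usual efficiency argument for LoRA-style adaptation.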
arXiv Detail & Related papers (2024-11-19T11:03:09Z) - VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use [74.39058448757645]
We present VipAct, an agent framework that enhances vision-language models (VLMs)
VipAct consists of an orchestrator agent, which manages task requirement analysis, planning, and coordination, along with specialized agents that handle specific tasks.
We evaluate VipAct on benchmarks featuring a diverse set of visual perception tasks, with experimental results demonstrating significant performance improvements.
arXiv Detail & Related papers (2024-10-21T18:10:26Z) - MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training [57.18758272617101]
MaeFuse is a novel autoencoder model designed for infrared and visible image fusion (IVIF)
Our model utilizes a pretrained encoder from Masked Autoencoders (MAE), which facilitates omni-feature extraction for low-level reconstruction and high-level vision tasks.
MaeFuse not only introduces a novel perspective in the realm of fusion techniques but also stands out with impressive performance across various public datasets.
arXiv Detail & Related papers (2024-04-17T02:47:39Z) - ReViT: Enhancing Vision Transformers Feature Diversity with Attention Residual Connections [8.372189962601077]
Vision Transformer (ViT) self-attention mechanism is characterized by feature collapse in deeper layers.
We propose a novel residual attention learning method for improving ViT-based architectures.
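The ReViT summary mentions attention residual connections to counter feature collapse in deeper ViT layers. One plausible form, sketched below in numpy, mixes the previous layer's pre-softmax attention scores into the current layer's (the mixing weight `alpha` and this exact formulation are assumptions, not taken from the paper).

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def residual_attention(q, k, v, prev_scores=None, alpha=0.5):
    """Self-attention whose pre-softmax scores blend in the previous
    layer's scores, keeping attention maps from collapsing to the
    same pattern layer after layer."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if prev_scores is not None:
        scores = scores + alpha * prev_scores
    return softmax(scores) @ v, scores

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out1, s1 = residual_attention(q, k, v)                      # first layer
out2, s2 = residual_attention(q, k, out1, prev_scores=s1)   # second layer reuses s1
print(out2.shape)  # (4, 8)
```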
arXiv Detail & Related papers (2024-02-17T14:44:10Z) - A Non-Uniform Low-Light Image Enhancement Method with Multi-Scale
Attention Transformer and Luminance Consistency Loss [11.585269110131659]
Low-light image enhancement aims to improve the perception of images collected in dim environments.
Existing methods cannot adaptively extract the differentiated luminance information, which will easily cause over-exposure and under-exposure.
We propose a multi-scale attention Transformer named MSATr, which sufficiently extracts local and global features for light balance to improve the visual quality.
arXiv Detail & Related papers (2023-12-27T10:07:11Z) - Top-Down Visual Attention from Analysis by Synthesis [87.47527557366593]
We consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
We propose Analysis-by-Synthesis Vision Transformer (AbSViT), which is a top-down modulated ViT model that variationally approximates AbS and achieves controllable top-down attention.
arXiv Detail & Related papers (2023-03-23T05:17:05Z) - Semantic-aware Texture-Structure Feature Collaboration for Underwater
Image Enhancement [58.075720488942125]
Underwater image enhancement has become an attractive topic as a significant technology in marine engineering and aquatic robotics.
We develop an efficient and compact enhancement network in collaboration with a high-level semantic-aware pretrained model.
We also apply the proposed algorithm to the underwater salient object detection task to reveal the favorable semantic-aware ability for high-level vision tasks.
arXiv Detail & Related papers (2022-11-19T07:50:34Z) - Self-Aligned Concave Curve: Illumination Enhancement for Unsupervised
Adaptation [36.050270650417325]
We propose a learnable illumination enhancement model for high-level vision.
Inspired by real camera response functions, we assume that the illumination enhancement function should be a concave curve.
Our model architecture and training designs mutually benefit each other, forming a powerful unsupervised normal-to-low light adaptation framework.
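The entry above assumes the illumination enhancement function is a concave curve, inspired by real camera response functions. A simple concrete instance, sketched here in numpy, is a gamma curve f(x) = x^γ with 0 < γ < 1, which is concave, monotonically increasing, and fixes f(0) = 0 and f(1) = 1 (the paper's actual curve is learnable; the gamma form here is only an illustrative stand-in).

```python
import numpy as np

def concave_enhance(img, gamma=0.5):
    """Brighten with f(x) = x**gamma. For 0 < gamma < 1 the curve is
    concave and increasing, so dark pixels are lifted more than bright
    ones, matching the concave-curve assumption above."""
    assert 0.0 < gamma < 1.0
    return np.clip(img, 0.0, 1.0) ** gamma

img = np.array([0.01, 0.25, 1.0])   # dark, mid, bright intensities in [0, 1]
out = concave_enhance(img, gamma=0.5)
print(out)  # [0.1 0.5 1. ]
```

Note how the darkest pixel gains a factor of 10 while the brightest is unchanged; in the paper the curve's shape would be learned rather than fixed at γ = 0.5.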
arXiv Detail & Related papers (2022-10-07T19:32:55Z) - Single Image Deraining via Scale-space Invariant Attention Neural
Network [58.5284246878277]
We tackle the notion of scale that deals with visual changes in the appearance of rain streaks with respect to the camera.
We propose to represent the multi-scale correlation in convolutional feature domain, which is more compact and robust than that in pixel domain.
In this way, we summarize the most activated presence of feature maps as the salient features.
arXiv Detail & Related papers (2020-06-09T04:59:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.