Feedback RoI Features Improve Aerial Object Detection
- URL: http://arxiv.org/abs/2311.17129v1
- Date: Tue, 28 Nov 2023 16:09:09 GMT
- Authors: Botao Ren, Botian Xu, Tengyu Liu, Jingyi Wang, Zhidong Deng
- Abstract summary: Neuroscience studies have shown that the human visual system utilizes high-level feedback information to guide lower-level perception.
We propose Feedback multi-Level feature Extractor (Flex) to incorporate a similar mechanism for object detection.
Flex refines feature selection based on image-wise and instance-level feedback information in response to image quality variation and classification uncertainty.
- Score: 9.554951222327443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neuroscience studies have shown that the human visual system utilizes
high-level feedback information to guide lower-level perception, enabling
adaptation to signals of different characteristics. In light of this, we
propose Feedback multi-Level feature Extractor (Flex) to incorporate a similar
mechanism for object detection. Flex refines feature selection based on
image-wise and instance-level feedback information in response to image quality
variation and classification uncertainty. Experimental results show that Flex
offers consistent improvement to a range of existing SOTA methods on the
challenging aerial object detection datasets including DOTA-v1.0, DOTA-v1.5,
and HRSC2016. Although the design originates in aerial image detection, further
experiments on MS COCO also reveal our module's efficacy in general detection
models. Quantitative and qualitative analyses indicate that the improvements
are closely related to image qualities, which match our motivation.
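The abstract describes Flex as refining feature selection using instance-level feedback such as classification uncertainty. As a minimal, hypothetical sketch (not the paper's implementation), one way such a mechanism could work is to soften each RoI's preference over feature-pyramid levels in proportion to the classification entropy fed back from a first-pass head, so uncertain RoIs pool features from more levels. The function names and the temperature-scaling scheme below are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def feedback_level_weights(level_logits, entropy, temperature=1.0):
    """Blend per-RoI pyramid-level preferences with a feedback signal.

    level_logits: (n_rois, n_levels) initial preference for each feature level
    entropy:      (n_rois,) classification uncertainty from a first pass;
                  higher entropy flattens the distribution, so uncertain
                  RoIs draw features from more levels.
    Returns (n_rois, n_levels) weights that sum to 1 per RoI.
    """
    # Scale the softmax temperature by (1 + entropy): confident RoIs keep a
    # sharp level selection, uncertain RoIs get a softer one.
    t = temperature * (1.0 + entropy)[:, None]
    return softmax(level_logits / t, axis=1)
```

For example, two RoIs with the same level logits but different feedback entropies receive different weight sharpness; the final RoI feature would then be the weight-averaged combination of per-level features.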
Related papers
- Evaluating the Impact of Underwater Image Enhancement on Object Detection Performance: A Comprehensive Study [1.7933377464816112]
This work aims to evaluate state-of-the-art image enhancement models, investigate their impact on underwater object detection, and explore their potential to improve detection performance.
arXiv Detail & Related papers (2024-11-21T22:59:15Z)
- Integrated Dynamic Phenological Feature for Remote Sensing Image Land Cover Change Detection [5.109855690325439]
We introduce the InPhea model, which integrates phenological features into a remote sensing image CD framework.
A constrainer with four constraint modules and a multi-stage contrastive learning approach is employed to aid in the model's understanding of phenological characteristics.
Experiments on the HRSCD, SECD, and PSCD-Wuhan datasets reveal that InPhea outperforms other models.
arXiv Detail & Related papers (2024-08-08T01:07:28Z)
- AssemAI: Interpretable Image-Based Anomaly Detection for Manufacturing Pipelines [0.0]
Anomaly detection in manufacturing pipelines remains a critical challenge, intensified by the complexity and variability of industrial environments.
This paper introduces AssemAI, an interpretable image-based anomaly detection system tailored for smart manufacturing pipelines.
arXiv Detail & Related papers (2024-08-05T01:50:09Z)
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
- Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment [82.13830107682232]
We employ a state-of-the-art (SOTA) generative model, the diffusion model, which is capable of modeling intricate relationships.
We devise a new diffusion restoration network that leverages the produced enhanced image and noise-containing images.
Two visual evaluation branches are designed to comprehensively analyze the obtained high-level feature information.
arXiv Detail & Related papers (2024-02-22T09:39:46Z)
- ReViT: Enhancing Vision Transformers Feature Diversity with Attention Residual Connections [8.372189962601077]
The self-attention mechanism of the Vision Transformer (ViT) suffers from feature collapse in deeper layers.
We propose a novel residual attention learning method to improve ViT-based architectures.
arXiv Detail & Related papers (2024-02-17T14:44:10Z)
- Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity [55.399230250413986]
We propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) to remove harmful semantic noise features from the upstream task.
Our approach achieves superior performance to the state-of-the-art NR-IQA methods on eight standard IQA datasets.
arXiv Detail & Related papers (2023-12-11T06:50:27Z)
- Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (mAP) of approximately 45.7%, a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
- Physics Inspired Hybrid Attention for SAR Target Recognition [61.01086031364307]
We propose a physics-inspired hybrid attention (PIHA) mechanism and the once-for-all (OFA) evaluation protocol to address these issues.
PIHA leverages the high-level semantics of physical information to activate and guide feature groups that are aware of the local semantics of the target.
Our method outperforms other state-of-the-art approaches in 12 test scenarios with the same ASC parameters.
arXiv Detail & Related papers (2023-09-27T14:39:41Z)
- Controllable Mind Visual Diffusion Model [58.83896307930354]
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as the Controllable Mind Visual Diffusion Model (CMVDM).
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
arXiv Detail & Related papers (2023-05-17T11:36:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.