Attention-guided Context Feature Pyramid Network for Object Detection
- URL: http://arxiv.org/abs/2005.11475v1
- Date: Sat, 23 May 2020 05:24:50 GMT
- Title: Attention-guided Context Feature Pyramid Network for Object Detection
- Authors: Junxu Cao, Qi Chen, Jun Guo, and Ruichao Shi
- Abstract summary: We build a novel architecture, called Attention-guided Context Feature Pyramid Network (AC-FPN)
AC-FPN exploits discriminative information from various large receptive fields via integrating attention-guided multi-path features.
Our AC-FPN can be readily plugged into existing FPN-based models.
- Score: 10.30536638944019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For object detection, how to address the contradictory requirement between
feature map resolution and receptive field on high-resolution inputs still
remains an open question. In this paper, to tackle this issue, we build a novel
architecture, called Attention-guided Context Feature Pyramid Network (AC-FPN),
that exploits discriminative information from various large receptive fields
via integrating attention-guided multi-path features. The model contains two
modules. The first one is Context Extraction Module (CEM) that explores large
contextual information from multiple receptive fields. As redundant contextual
relations may mislead localization and recognition, we also design the second
module named Attention-guided Module (AM), which can adaptively capture the
salient dependencies over objects by using the attention mechanism. AM consists
of two sub-modules, i.e., Context Attention Module (CxAM) and Content Attention
Module (CnAM), which focus on capturing discriminative semantics and locating
precise positions, respectively. Most importantly, our AC-FPN can be readily
plugged into existing FPN-based models. Extensive experiments on object
detection and instance segmentation show that existing models with our proposed
CEM and AM significantly surpass their counterparts without them, and our model
successfully obtains state-of-the-art results. We have released the source code
at https://github.com/Caojunxu/AC-FPN.
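The Attention-guided Module described above captures salient dependencies over objects with an attention mechanism. As a rough illustration of that idea (not the authors' implementation; see their repository for the real code), the sketch below applies scaled dot-product self-attention across the spatial positions of a flattened feature map, with random projection matrices standing in for learned weights:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def context_attention(feat):
    """Self-attention over spatial positions of a feature map.

    feat: (H*W, C) array of flattened spatial positions with C channels.
    Returns context-aggregated features of the same shape.
    """
    n, c = feat.shape
    # Query/key/value projections; random here, learned in a real network.
    rng = np.random.default_rng(0)
    wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
    q, k, v = feat @ wq, feat @ wk, feat @ wv
    attn = softmax(q @ k.T / np.sqrt(c))  # (N, N) pairwise dependencies
    return attn @ v                       # each position mixes in salient context

feat = np.random.default_rng(1).standard_normal((16, 8))  # 4x4 map, 8 channels
out = context_attention(feat)
print(out.shape)  # (16, 8)
```

Each output position is a weighted mixture of all positions, which is how an attention module can suppress redundant contextual relations while keeping discriminative ones; the paper's CxAM and CnAM refine this basic pattern for semantics and localization respectively.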
Related papers
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z) - STF: Spatio-Temporal Fusion Module for Improving Video Object Detection [7.213855322671065]
Consecutive frames in a video contain redundancy, but they may also contain complementary information for the detection task.
We propose a spatio-temporal fusion framework (STF) to leverage this complementary information.
The proposed spatio-temporal fusion module leads to improved detection performance compared to baseline object detectors.
arXiv Detail & Related papers (2024-02-16T15:19:39Z) - Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector [72.05791402494727]
This paper studies the challenging cross-domain few-shot object detection (CD-FSOD)
It aims to develop an accurate object detector for novel domains with minimal labeled examples.
arXiv Detail & Related papers (2024-02-05T15:25:32Z) - Context-Enhanced Detector For Building Detection From Remote Sensing Images [41.3238458718635]
We propose a novel approach called Context-Enhanced Detector (CEDet)
Our approach utilizes a three-stage cascade structure to enhance the extraction of contextual information and improve building detection accuracy.
Our method achieves state-of-the-art performance on three building detection benchmarks, including CNBuilding-9P, CNBuilding-23P, and SpaceNet.
arXiv Detail & Related papers (2023-10-11T16:33:30Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - LOIS: Looking Out of Instance Semantics for Visual Question Answering [17.076621453814926]
We propose a model framework without bounding boxes to understand the causal nexus of object semantics in images.
We implement a mutual relation attention module to model sophisticated and deeper visual semantic relations between instance objects and background information.
Our proposed attention model can further analyze salient image regions by focusing on important word-related questions.
arXiv Detail & Related papers (2023-07-26T12:13:00Z) - Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction [13.454953507205278]
Multi-Modal Relation Extraction aims at identifying the relation between two entities in texts that contain visual clues.
We propose a novel MMRE framework to better capture the deeper correlations of text, entity pair, and image/objects.
Our approach achieves excellent performance compared to strong competitors, even in the few-shot situation.
arXiv Detail & Related papers (2023-06-19T15:31:34Z) - Semantic Feature Integration network for Fine-grained Visual Classification [5.182627302449368]
We propose the Semantic Feature Integration network (SFI-Net) to address the above difficulties.
By eliminating unnecessary features and reconstructing the semantic relations among discriminative features, our SFI-Net has achieved satisfying performance.
arXiv Detail & Related papers (2023-02-13T07:32:25Z) - SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into transformer for enhancing discriminative representation learning.
The proposed two modules are light-weighted and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z) - BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning [88.82371069668147]
BatchFormerV2 is a more general batch Transformer module, which enables exploring sample relationships for dense representation learning.
BatchFormerV2 consistently improves current DETR-based detection methods by over 1.3%.
arXiv Detail & Related papers (2022-04-04T05:53:42Z) - Online Multiple Object Tracking with Cross-Task Synergy [120.70085565030628]
We propose a novel unified model with synergy between position prediction and embedding association.
The two tasks are linked by temporal-aware target attention and distractor attention, as well as an identity-aware memory aggregation model.
arXiv Detail & Related papers (2021-04-01T10:19:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.