Semantic-Aware Transformation-Invariant RoI Align
- URL: http://arxiv.org/abs/2312.09609v1
- Date: Fri, 15 Dec 2023 08:50:00 GMT
- Title: Semantic-Aware Transformation-Invariant RoI Align
- Authors: Guo-Ye Yang, George Kiyohiro Nakayama, Zi-Kai Xiao, Tai-Jiang Mu,
Xiaolei Huang, Shi-Min Hu
- Abstract summary: Two-stage detectors often have higher detection accuracy than one-stage detectors.
We propose a novel RoI feature extractor, termed Semantic RoI Align (SRA).
SRA is capable of extracting invariant RoI features under a variety of transformations for two-stage detectors.
- Score: 26.823382081015055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Great progress has been made in learning-based object detection methods in
the last decade. Two-stage detectors often have higher detection accuracy than
one-stage detectors, due to the use of region of interest (RoI) feature
extractors which extract transformation-invariant RoI features for different
RoI proposals, making refinement of bounding boxes and prediction of object
categories more robust and accurate. However, previous RoI feature extractors
can only extract invariant features under limited transformations. In this
paper, we propose a novel RoI feature extractor, termed Semantic RoI Align
(SRA), which is capable of extracting invariant RoI features under a variety of
transformations for two-stage detectors. Specifically, we propose a semantic
attention module to adaptively determine different sampling areas by leveraging
the global and local semantic relationship within the RoI. We also propose a
Dynamic Feature Sampler which dynamically samples features based on the RoI
aspect ratio to enhance the efficiency of SRA, and a new position embedding,
i.e., Area Embedding, to provide more accurate position information for SRA
through an improved sampling area representation. Experiments show that our
model significantly outperforms baseline models with slight computational
overhead. In addition, it shows excellent generalization ability and can be
used to improve performance with various state-of-the-art backbones and
detection methods.
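For illustration, the snippet below is a minimal PyTorch sketch of the semantic-attention idea described in the abstract: candidate points are sampled inside each RoI and re-weighted by their semantic affinity with a global RoI descriptor. The module name SemanticRoIAttention, the fixed 7x7 candidate grid, and the mean-pooled global descriptor are assumptions made for this sketch only; it does not reproduce the authors' SRA, Dynamic Feature Sampler, or Area Embedding.

```python
# Hedged sketch of semantic-attention RoI pooling (not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticRoIAttention(nn.Module):
    """Pools RoI features by attending over dense sampling points, weighting each
    point by its semantic affinity with a global RoI descriptor."""

    def __init__(self, channels: int, num_samples: int = 49):
        super().__init__()
        self.num_samples = num_samples  # e.g. a 7x7 grid of candidate points
        self.query_proj = nn.Linear(channels, channels)
        self.key_proj = nn.Linear(channels, channels)

    def forward(self, feat: torch.Tensor, rois: torch.Tensor) -> torch.Tensor:
        # feat: (C, H, W) single-image feature map; rois: (N, 4) as x1, y1, x2, y2
        # normalized to [0, 1]. Returns (N, C) pooled RoI descriptors.
        C, H, W = feat.shape
        N = rois.shape[0]
        g = int(self.num_samples ** 0.5)

        # Regular g x g sampling grid inside each RoI (grid_sample expects [-1, 1]).
        ys = torch.linspace(0, 1, g, device=feat.device)
        xs = torch.linspace(0, 1, g, device=feat.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        x1, y1, x2, y2 = rois.unbind(dim=1)
        px = x1[:, None, None] + gx[None] * (x2 - x1)[:, None, None]   # (N, g, g)
        py = y1[:, None, None] + gy[None] * (y2 - y1)[:, None, None]
        grid = torch.stack((px * 2 - 1, py * 2 - 1), dim=-1)           # (N, g, g, 2)

        # Bilinearly sample local features at every candidate point.
        sampled = F.grid_sample(feat[None].expand(N, -1, -1, -1), grid,
                                align_corners=False)                   # (N, C, g, g)
        local = sampled.flatten(2).transpose(1, 2)                     # (N, S, C)

        # Global RoI descriptor = mean of local features; attention weights come
        # from its similarity with each local feature (the semantic relationship).
        query = self.query_proj(local.mean(dim=1, keepdim=True))       # (N, 1, C)
        keys = self.key_proj(local)                                    # (N, S, C)
        attn = torch.softmax((query * keys).sum(-1) / C ** 0.5, dim=1) # (N, S)
        return (attn.unsqueeze(-1) * local).sum(dim=1)                 # (N, C)
```

In the paper, the sampling areas, their number, and the position encoding are determined adaptively (via the Dynamic Feature Sampler and Area Embedding) rather than fixed as in this sketch.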
Related papers
- Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM).
arXiv Detail & Related papers (2024-08-05T08:35:59Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- R2FD2: Fast and Robust Matching of Multimodal Remote Sensing Image via Repeatable Feature Detector and Rotation-invariant Feature Descriptor [3.395266574804949]
We propose a novel feature matching method (named R2FD2) that is robust to radiation and rotation differences.
The proposed R2FD2 outperforms five state-of-the-art feature matching methods and offers superior universality and adaptability.
Our R2FD2 achieves matching accuracy within two pixels and is substantially more efficient in matching than other state-of-the-art methods.
arXiv Detail & Related papers (2022-12-05T13:55:02Z)
- Detecting Rotated Objects as Gaussian Distributions and Its 3-D Generalization [81.29406957201458]
Existing detection methods commonly use a parameterized bounding box (BBox) to model and detect (horizontal) objects.
We argue that such a mechanism has fundamental limitations in building an effective regression loss for rotation detection.
We propose to model the rotated objects as Gaussian distributions; a sketch of the standard box-to-Gaussian conversion is given after this list.
We extend our approach from 2-D to 3-D with a tailored algorithm design to handle the heading estimation.
arXiv Detail & Related papers (2022-09-22T07:50:48Z)
- AO2-DETR: Arbitrary-Oriented Object Detection Transformer [17.287517988299925]
We propose an Arbitrary-Oriented Object DEtection TRansformer framework, termed AO2-DETR.
An oriented proposal generation mechanism is proposed to explicitly generate oriented proposals.
A rotation-aware set matching loss ensures the one-to-one matching process for direct set prediction.
arXiv Detail & Related papers (2022-05-25T13:57:13Z)
- Recurrent Glimpse-based Decoder for Detection with Transformer [85.64521612986456]
We introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper.
In particular, the REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects.
REGO consistently boosts the performance of different DETR detectors by up to 7% relative gain at the same setting of 50 training epochs.
arXiv Detail & Related papers (2021-12-09T00:29:19Z)
- I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors [64.93963042395976]
Implicit Instance-Invariant Network (I3Net) is tailored for adapting one-stage detectors.
I3Net implicitly learns instance-invariant features via exploiting the natural characteristics of deep features in different layers.
Experiments reveal that I3Net exceeds the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2021-03-25T11:14:36Z)
- Generalizing Face Forgery Detection with High-frequency Features [63.33397573649408]
Current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize.
We propose to utilize high-frequency noises for face forgery detection through dedicated modules.
The first is a multi-scale high-frequency feature extraction module that extracts high-frequency noises at multiple scales.
The second is a residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate more on forgery traces from a new perspective.
arXiv Detail & Related papers (2021-03-23T08:19:21Z)
- RoRD: Rotation-Robust Descriptors and Orthographic Views for Local Feature Matching [32.10261486751993]
We present a novel framework that combines learning of invariant descriptors through data augmentation and viewpoint projection.
We evaluate the effectiveness of the proposed approach on key tasks including pose estimation and visual place recognition.
arXiv Detail & Related papers (2021-03-15T17:40:25Z)
- AFD-Net: Adaptive Fully-Dual Network for Few-Shot Object Detection [8.39479809973967]
Few-shot object detection (FSOD) aims at learning a detector that can fast adapt to previously unseen objects with scarce examples.
Existing methods solve this problem by performing subtasks of classification and localization utilizing a shared component.
We argue that a general few-shot detector should explicitly decompose the two subtasks and leverage information from both of them to enhance feature representations.
arXiv Detail & Related papers (2020-11-30T10:21:32Z)
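As referenced in the "Detecting Rotated Objects as Gaussian Distributions" entry above, the key step is converting a rotated box (cx, cy, w, h, theta) into a 2-D Gaussian with mean (cx, cy) and covariance Sigma = R diag(w^2/4, h^2/4) R^T, so that a regression loss can be defined as a distance between predicted and target Gaussians. The snippet below is a hedged sketch of that conversion only; the function name and the half-extent standard-deviation convention are assumptions, and the paper's actual loss and 3-D generalization are not reproduced.

```python
# Hedged sketch: convert rotated boxes (cx, cy, w, h, theta) into 2-D Gaussians
# (mean, covariance), the representation used by Gaussian-based rotation losses.
import torch

def rbox_to_gaussian(rboxes: torch.Tensor):
    """rboxes: (N, 5) tensor of (cx, cy, w, h, theta), theta in radians.
    Returns mean (N, 2) and covariance (N, 2, 2) with Sigma = R diag(w^2/4, h^2/4) R^T."""
    cx, cy, w, h, theta = rboxes.unbind(dim=-1)
    mean = torch.stack((cx, cy), dim=-1)

    cos_t, sin_t = torch.cos(theta), torch.sin(theta)
    # Per-box rotation matrix R, shape (N, 2, 2).
    R = torch.stack((torch.stack((cos_t, -sin_t), dim=-1),
                     torch.stack((sin_t,  cos_t), dim=-1)), dim=-2)
    # Diagonal matrix of squared half-extents, shape (N, 2, 2).
    S = torch.diag_embed(torch.stack((w, h), dim=-1) ** 2 / 4.0)
    cov = R @ S @ R.transpose(-1, -2)
    return mean, cov
```

A regression loss can then be built from a distance between Gaussians (e.g., a Wasserstein or Kullback-Leibler divergence term), which is the general idea behind this family of rotation-detection losses.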
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.