Rethinking the Aligned and Misaligned Features in One-stage Object
Detection
- URL: http://arxiv.org/abs/2108.12176v1
- Date: Fri, 27 Aug 2021 08:40:37 GMT
- Title: Rethinking the Aligned and Misaligned Features in One-stage Object
Detection
- Authors: Yang Yang, Min Li, Bo Meng, Junxing Ren, Degang Sun, Zihao Huang
- Abstract summary: One-stage object detectors rely on the point feature to predict the detection results.
We propose a simple, plug-in operator that generates aligned and disentangled features for each task.
Based on the object-aligned and task-disentangled operator (OAT), we propose OAT-Net, which explicitly exploits point-set features for more accurate detection results.
- Score: 9.270523894683278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-stage object detectors rely on the point feature to predict the detection
results. However, the point feature may lack the information of the whole
object and lead to a misalignment between the object and the point feature.
Meanwhile, the classification and regression tasks are sensitive to different
object regions, but their features are spatially aligned. In this paper, we
propose a simple and plug-in operator that could generate aligned and
disentangled features for each task, respectively, without breaking the fully
convolutional manner. By predicting two task-aware point sets that are located
in each sensitive region, this operator could disentangle the two tasks from
the spatial dimension, as well as align the point feature with the object. We
also reveal an interesting finding of the opposite effect of the long-range
skip-connection for classification and regression, respectively. Based on the
object-aligned and task-disentangled operator (OAT), we propose OAT-Net, which
explicitly exploits point-set features for more accurate detection results.
Extensive experiments on the MS-COCO dataset show that OAT can consistently
boost different one-stage detectors by $\sim$2 AP. Notably, OAT-Net achieves
53.7 AP with Res2Net-101-DCN backbone and shows promising performance gain for
small objects.
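The abstract's core idea, replacing a single point feature with features gathered from a task-specific point set, can be illustrated with a toy sketch. This is not the authors' OAT implementation. The function names, the hand-written offsets, and the use of a plain mean to aggregate samples are all assumptions made for illustration; in a real detector the offsets would be predicted by the network, and the aggregation may differ.

```python
from math import floor


def bilinear_sample(fmap, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at fractional (y, x)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp so that points falling outside the map still read a valid value.
    y = min(max(float(y), 0.0), h - 1.0)
    x = min(max(float(x), 0.0), w - 1.0)
    y0, x0 = int(floor(y)), int(floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def point_set_feature(fmap, anchor, offsets):
    """Average the features sampled at anchor + offset (mean is an assumption)."""
    ay, ax = anchor
    samples = [bilinear_sample(fmap, ay + oy, ax + ox) for oy, ox in offsets]
    return sum(samples) / len(samples)


# Toy single-channel 4x4 feature map whose value at (row, col) is 4*row + col.
fmap = [[float(4 * r + c) for c in range(4)] for r in range(4)]
anchor = (1, 1)

# Hypothetical task-aware point sets: classification samples near the centre,
# regression samples near the corners of the (toy) object.
cls_offsets = [(0.0, 0.0), (0.5, 0.5)]
reg_offsets = [(-1.0, -1.0), (-1.0, 2.0), (2.0, -1.0), (2.0, 2.0)]

cls_feat = point_set_feature(fmap, anchor, cls_offsets)
reg_feat = point_set_feature(fmap, anchor, reg_offsets)
print(cls_feat, reg_feat)
```

Because the two tasks read from different locations of the same feature map, their inputs are no longer spatially tied to the single anchor point, which is the sense of "disentangled" used in the abstract.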
Related papers
- Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery [51.83786195178233]
We design a Knowledge Discovery Network (KDN) to implement the renormalization group theory in terms of efficient feature extraction.
Renormalized connection (RC) on the KDN enables "synergistic focusing" of multi-scale features.
RCs extend the multi-level feature's "divide-and-conquer" mechanism of the FPN-based detectors to a wide range of scale-preferred tasks.
arXiv Detail & Related papers (2024-09-09T13:56:22Z)
- PointOBB: Learning Oriented Object Detection via Single Point Supervision [55.88982271340328]
This paper proposes PointOBB, the first single Point-based OBB generation method, for oriented object detection.
PointOBB operates through the collaborative utilization of three distinctive views: an original view, a resized view, and a rotated/flipped (rot/flp) view.
Experimental results on the DIOR-R and DOTA-v1.0 datasets demonstrate that PointOBB achieves promising performance.
arXiv Detail & Related papers (2023-11-23T15:51:50Z)
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- ARS-DETR: Aspect Ratio-Sensitive Detection Transformer for Aerial Oriented Object Detection [55.291579862817656]
Existing oriented object detection methods commonly use the metric AP$_{50}$ to measure the performance of the model.
We argue that AP$_{50}$ is inherently unsuitable for oriented object detection due to its large tolerance in angle deviation.
We propose an Aspect Ratio Sensitive Oriented Object Detector with Transformer, termed ARS-DETR, which exhibits a competitive performance.
arXiv Detail & Related papers (2023-03-09T02:20:56Z)
- Improving Object Detection and Attribute Recognition by Feature Entanglement Reduction [26.20319853343761]
We show that object detection should be attribute-independent and attributes be largely object-independent.
We disentangle them by using a two-stream model in which the category and attribute features are computed independently but the classification heads share Regions of Interest (RoIs).
Compared with a traditional single-stream model, our model shows significant improvements over VG-20, a subset of Visual Genome, on both supervised and attribute transfer tasks.
arXiv Detail & Related papers (2021-08-25T22:27:06Z)
- TOOD: Task-aligned One-stage Object Detection [41.43371563426291]
One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization.
We propose a Task-aligned One-stage Object Detection (TOOD) that explicitly aligns the two tasks in a learning-based manner.
Experiments are conducted on MS-COCO, where TOOD achieves a 51.1 AP at single-model single-scale testing.
arXiv Detail & Related papers (2021-08-17T17:00:01Z)
- Points as Queries: Weakly Semi-supervised Object Detection by Points [25.286468630229592]
We introduce a new detector, Point DETR, which extends DETR by adding a point encoder.
In particular, when using 20% fully labeled data from COCO, our detector achieves a promising performance, 33.3 AP.
arXiv Detail & Related papers (2021-04-15T13:08:25Z)
- Decoupled Self Attention for Accurate One Stage Object Detection [4.791635488070342]
This paper proposes a decoupled self attention (DSA) module for one-stage object detection models.
Although the network of the DSA module is simple, it effectively improves object detection performance and can be easily embedded in many detection models.
arXiv Detail & Related papers (2020-12-14T15:19:30Z)
- AFD-Net: Adaptive Fully-Dual Network for Few-Shot Object Detection [8.39479809973967]
Few-shot object detection (FSOD) aims at learning a detector that can fast adapt to previously unseen objects with scarce examples.
Existing methods solve this problem by performing subtasks of classification and localization utilizing a shared component.
We argue that a general few-shot detector should explicitly decompose the two subtasks and leverage information from both of them to enhance feature representations.
arXiv Detail & Related papers (2020-11-30T10:21:32Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection [170.30694322460045]
We propose a new Neighbor Erasing and Transferring (NET) mechanism to reconfigure the pyramid features and explore scale-aware features.
A single-shot network called NETNet is constructed for scale-aware object detection.
arXiv Detail & Related papers (2020-01-18T15:21:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.