BoxeR: Box-Attention for 2D and 3D Transformers
- URL: http://arxiv.org/abs/2111.13087v1
- Date: Thu, 25 Nov 2021 13:54:25 GMT
- Title: BoxeR: Box-Attention for 2D and 3D Transformers
- Authors: Duy-Kien Nguyen, Jihong Ju, Olaf Booij, Martin R. Oswald, Cees G. M.
Snoek
- Abstract summary: We present BoxeR, short for Box Transformer, which attends to a set of boxes by predicting their transformation from a reference window on an input feature map.
BoxeR-2D naturally reasons about box information within its attention module, making it suitable for end-to-end instance detection and segmentation tasks.
BoxeR-3D is capable of generating discriminative information from a bird's-eye-view plane for 3D end-to-end object detection.
- Score: 36.03241565421038
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a simple attention mechanism, which we
call Box-Attention. It enables spatial interaction between grid features, as sampled
from boxes of interest, and improves the learning capability of transformers
for several vision tasks. Specifically, we present BoxeR, short for Box
Transformer, which attends to a set of boxes by predicting their transformation
from a reference window on an input feature map. BoxeR computes attention
weights on these boxes by considering their grid structure. Notably, BoxeR-2D
naturally reasons about box information within its attention module, making it
suitable for end-to-end instance detection and segmentation tasks. By learning
invariance to rotation in the box-attention module, BoxeR-3D is capable of
generating discriminative information from a bird's-eye-view plane for 3D
end-to-end object detection. Our experiments demonstrate that the proposed
BoxeR-2D achieves better results on COCO detection, and reaches comparable
performance with well-established and highly-optimized Mask R-CNN on COCO
instance segmentation. BoxeR-3D already obtains a compelling performance for
the vehicle category of Waymo Open, without any class-specific optimization.
The code will be released.
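To make the mechanism concrete, below is a minimal PyTorch sketch of the box-attention idea as described in the abstract: each query predicts a transformation of its reference window, a small grid of features is bilinearly sampled inside the resulting box, and attention weights over those grid cells produce the output. The module and parameter names, the single head and single feature level, the 2x2 grid, and the way the weights are predicted are simplifying assumptions, not the authors' released implementation.

```python
# Minimal sketch of box-attention, assuming PyTorch, one feature level,
# one head, and one box per query. Names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BoxAttentionSketch(nn.Module):
    def __init__(self, dim=256, grid=2):
        super().__init__()
        self.grid = grid
        # Each query predicts a transformation (center offset, log-scale)
        # of its reference window.
        self.box_pred = nn.Linear(dim, 4)
        # Attention weights over the grid cells sampled inside the box.
        self.attn = nn.Linear(dim, grid * grid)
        self.value = nn.Conv2d(dim, dim, kernel_size=1)
        self.out = nn.Linear(dim, dim)

    def forward(self, query, ref_windows, feat):
        """query: (B, N, C); ref_windows: (B, N, 4) as (cx, cy, w, h) in [0, 1];
        feat: (B, C, H, W) input feature map."""
        # 1) Predict each box as a transformation of its reference window.
        delta = self.box_pred(query)                                    # (B, N, 4)
        cxcy = ref_windows[..., :2] + delta[..., :2] * ref_windows[..., 2:]
        wh = ref_windows[..., 2:] * delta[..., 2:].exp()
        # 2) Lay out a grid x grid lattice of sampling points inside each box.
        lin = torch.linspace(-0.5, 0.5, self.grid, device=feat.device)
        gy, gx = torch.meshgrid(lin, lin, indexing="ij")
        offsets = torch.stack([gx, gy], dim=-1).view(1, 1, -1, 2)       # (1, 1, G*G, 2)
        points = cxcy[:, :, None, :] + offsets * wh[:, :, None, :]      # (B, N, G*G, 2)
        # 3) Bilinearly sample grid features from the value map
        #    (grid_sample expects coordinates in [-1, 1]).
        v = self.value(feat)                                            # (B, C, H, W)
        sampled = F.grid_sample(v, points * 2 - 1, align_corners=False)  # (B, C, N, G*G)
        # 4) Attend over the grid cells with weights predicted from the query.
        w = self.attn(query).softmax(dim=-1)                            # (B, N, G*G)
        out = (sampled * w[:, None, :, :]).sum(dim=-1)                  # (B, C, N)
        return self.out(out.transpose(1, 2))                            # (B, N, C)
```

The full model described in the paper operates over multiple heads and feature levels and, for BoxeR-3D, additionally learns rotation invariance in the box-attention module; this sketch only illustrates the core box-prediction, grid-sampling, and attention step.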
Related papers
- Boximator: Generating Rich and Controllable Motions for Video Synthesis [12.891562157919237]
Boximator is a new approach for fine-grained motion control.
Boximator functions as a plug-in for existing video diffusion models.
It achieves state-of-the-art video quality (FVD) scores, improving on two base models, with further gains after incorporating box constraints.
arXiv Detail & Related papers (2024-02-02T16:59:48Z) - Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision [81.60564776995682]
We present Point2RBox, an end-to-end solution for point-supervised object detection.
Our method uses a lightweight paradigm, yet it achieves a competitive performance among point-supervised alternatives.
arXiv Detail & Related papers (2023-11-23T15:57:41Z) - OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection [51.153003057515754]
OPA-3D is a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network.
It jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes.
It outperforms state-of-the-art methods on the main Car category.
arXiv Detail & Related papers (2022-11-02T14:19:13Z) - H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection [63.66553556240689]
Oriented object detection emerges in many applications from aerial images to autonomous driving.
Many existing detection benchmarks are annotated only with horizontal bounding boxes, which are less costly to obtain than fine-grained rotated boxes.
This paper proposes a simple yet effective oriented object detection approach called H2RBox.
arXiv Detail & Related papers (2022-10-13T05:12:45Z) - CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z) - SRCN3D: Sparse R-CNN 3D for Compact Convolutional Multi-View 3D Object Detection and Tracking [12.285423418301683]
This paper proposes Sparse R-CNN 3D (SRCN3D), a novel two-stage fully-sparse detector that incorporates sparse queries, sparse attention with box-wise sampling, and sparse prediction.
Experiments on nuScenes dataset demonstrate that SRCN3D achieves competitive performance in both 3D object detection and multi-object tracking tasks.
arXiv Detail & Related papers (2022-06-29T07:58:39Z) - 3D-MAN: 3D Multi-frame Attention Network for Object Detection [22.291051951077485]
3D-MAN is a 3D multi-frame attention network that effectively aggregates features from multiple perspectives.
We show that 3D-MAN achieves state-of-the-art results compared to published single-frame and multi-frame methods.
arXiv Detail & Related papers (2021-03-30T03:44:22Z) - BirdNet+: End-to-End 3D Object Detection in LiDAR Bird's Eye View [117.44028458220427]
On-board 3D object detection in autonomous vehicles often relies on geometry information captured by LiDAR devices.
We present a fully end-to-end 3D object detection framework that can infer oriented 3D boxes solely from BEV images.
arXiv Detail & Related papers (2020-03-09T15:08:40Z)