Transformers for Object Detection in Large Point Clouds
- URL: http://arxiv.org/abs/2209.15258v1
- Date: Fri, 30 Sep 2022 06:35:43 GMT
- Title: Transformers for Object Detection in Large Point Clouds
- Authors: Felicia Ruppel, Florian Faion, Claudius Gläser, Klaus Dietmayer
- Abstract summary: We present TransLPC, a novel detection model for large point clouds based on a transformer architecture.
We propose a novel query refinement technique to improve detection accuracy, while retaining a memory-friendly number of transformer decoder queries.
This simple technique has a significant effect on detection accuracy, which is evaluated on the challenging nuScenes dataset with real-world lidar data.
- Score: 9.287964414592826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present TransLPC, a novel detection model for large point clouds that is
based on a transformer architecture. While object detection with transformers
has been an active field of research, it has proved difficult to apply such
models to point clouds that span a large area, e.g. those that are common in
autonomous driving, with lidar or radar data. TransLPC is able to remedy these
issues: The structure of the transformer model is modified to allow for larger
input sequence lengths, which are sufficient for large point clouds. Besides
this, we propose a novel query refinement technique to improve detection
accuracy, while retaining a memory-friendly number of transformer decoder
queries. Between decoder layers, the queries are efficiently repositioned,
moving them closer to the bounding boxes they are estimating. This simple
technique has a significant effect on detection accuracy, which is evaluated
on the challenging nuScenes dataset with real-world lidar data. Furthermore, the
proposed method is compatible with existing transformer-based solutions that
require object detection, e.g. for joint multi-object tracking and detection,
and enables them to be used in conjunction with large point clouds.
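To make the query refinement idea concrete, here is a minimal PyTorch sketch of repositioning decoder queries between layers: each query carries a reference point that is moved toward the box center it currently predicts, and the query's positional encoding is recomputed from the updated point. The module names, the 2D reference points, and the sinusoidal encoding are our own assumptions for illustration, not the TransLPC implementation.

```python
# Minimal, hypothetical sketch of between-layer query refinement for a
# DETR-style decoder (assumption: queries carry 2D reference points that
# are moved toward the currently predicted box centers after every layer).
import torch
import torch.nn as nn


def positional_encoding(ref_points: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal encoding of (x, y) reference points, shape (B, Q, dim)."""
    freqs = torch.arange(dim // 4, device=ref_points.device)
    freqs = 1.0 / (10000 ** (freqs / (dim // 4)))
    angles = ref_points.unsqueeze(-1) * freqs            # (B, Q, 2, dim//4)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(-2)                               # (B, Q, dim)


class RefinedDecoder(nn.Module):
    def __init__(self, dim: int = 256, num_layers: int = 6, num_queries: int = 300):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
             for _ in range(num_layers)]
        )
        self.query_feat = nn.Embedding(num_queries, dim)  # content part of the queries
        self.ref_init = nn.Embedding(num_queries, 2)      # initial (x, y) reference points
        self.center_head = nn.Linear(dim, 2)              # predicts a center offset per query
        self.dim = dim

    def forward(self, memory: torch.Tensor):
        B = memory.size(0)
        queries = self.query_feat.weight.unsqueeze(0).expand(B, -1, -1)
        refs = self.ref_init.weight.unsqueeze(0).expand(B, -1, -1)
        outputs = []
        for layer in self.layers:
            # Positional information comes from the current reference points,
            # so moving the points repositions the queries in the scene.
            q = queries + positional_encoding(refs, self.dim)
            queries = layer(q, memory)
            # Move each reference point toward the box center it currently estimates.
            refs = refs + self.center_head(queries)
            outputs.append((queries, refs))
        return outputs


# Usage: encoder memory for a batch of 2 scenes with 1024 encoded points.
decoder = RefinedDecoder()
memory = torch.randn(2, 1024, 256)
layer_outputs = decoder(memory)
```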
Related papers
- Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers [34.42710399235461]
Vision transformers have recently shown strong global context modeling capabilities in camouflaged object detection.
However, they suffer from two major limitations: less effective locality modeling and insufficient feature aggregation in decoders.
We propose a novel transformer-based Feature Shrinkage Pyramid Network (FSPNet), which aims to hierarchically decode locality-enhanced neighboring transformer features.
arXiv Detail & Related papers (2023-03-26T20:50:58Z)
- RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration [73.69415797389195]
We propose an end-to-end transformer network (RegFormer) for large-scale point cloud alignment.
Specifically, a projection-aware hierarchical transformer is proposed to capture long-range dependencies and filter outliers.
Our transformer has linear complexity, which guarantees high efficiency even for large-scale scenes.
arXiv Detail & Related papers (2023-03-22T08:47:37Z)
- Applying Plain Transformers to Real-World Point Clouds [0.0]
This work revisits plain transformers for real-world point cloud understanding.
To close the performance gap due to the lack of inductive bias, we investigate self-supervised pre-training with a masked autoencoder (MAE).
Our models achieve SOTA results in semantic segmentation on the S3DIS dataset and object detection on the ScanNet dataset with lower computational costs.
arXiv Detail & Related papers (2023-02-28T21:06:36Z)
- Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
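As a rough illustration of building multi-scale tokens from a single-scale input, the sketch below pools the input tokens into a coarser set and lets the original tokens attend over both scales. This is a generic construction under our own assumptions, not the MS-A operation from the paper.

```python
# Hypothetical sketch: build multi-scale tokens from a single-scale token set
# and attend over both scales. Generic illustration only, not the paper's MS-A.
import torch
import torch.nn as nn


class MultiScaleAttention(nn.Module):
    def __init__(self, dim: int = 256, pool_stride: int = 4, num_heads: int = 8):
        super().__init__()
        self.pool = nn.AvgPool1d(kernel_size=pool_stride, stride=pool_stride)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, C) single-scale point/voxel tokens.
        coarse = self.pool(tokens.transpose(1, 2)).transpose(1, 2)  # (B, N/stride, C)
        keys = torch.cat([tokens, coarse], dim=1)                   # fine + coarse tokens
        out, _ = self.attn(tokens, keys, keys)                      # queries stay fine-grained
        return out


# Usage: 1024 tokens of width 256 for a batch of 2.
ms_attn = MultiScaleAttention()
x = torch.randn(2, 1024, 256)
y = ms_attn(x)   # (2, 1024, 256)
```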
arXiv Detail & Related papers (2023-01-06T18:52:12Z)
- Towards Light Weight Object Detection System [6.535035773534901]
We present an approximation of the self-attention layers used in the transformer architecture.
We also present a method that uses a transformer encoder layer for multi-resolution feature fusion.
arXiv Detail & Related papers (2022-10-08T00:55:15Z)
- An Extendable, Efficient and Effective Transformer-based Object Detector [95.06044204961009]
We integrate Vision and Detection Transformers (ViDT) to construct an effective and efficient object detector.
ViDT introduces a reconfigured attention module to extend the recent Swin Transformer to be a standalone object detector.
We extend it to ViDT+ to support joint-task learning for object detection and instance segmentation.
arXiv Detail & Related papers (2022-04-17T09:27:45Z)
- TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard.
arXiv Detail & Related papers (2022-03-22T07:15:13Z)
- ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection, while vision transformers are the first fully transformer-based architectures for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)
- TransLoc3D : Point Cloud based Large-scale Place Recognition using Adaptive Receptive Fields [40.55971834919629]
We argue that fixed receptive fields are not well suited for place recognition.
We propose a novel Adaptive Receptive Field Module (ARFM), which can adaptively adjust the size of the receptive field based on the input point cloud.
We also present a novel network architecture, named TransLoc3D, to obtain discriminative global descriptors of point clouds.
arXiv Detail & Related papers (2021-05-25T01:54:31Z)
- DA-DETR: Domain Adaptive Detection Transformer with Information Fusion [53.25930448542148]
DA-DETR is a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain.
We introduce a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.
CTBlender employs the Transformer features to modulate the CNN features across multiple scales where the high-level semantic information and the low-level spatial information are fused for accurate object identification and localization.
arXiv Detail & Related papers (2021-03-31T13:55:56Z)
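The CTBlender description above involves transformer features modulating CNN features across scales. One generic way to realize such modulation is a FiLM-style channel-wise scale and shift, sketched below under our own assumptions; this is not the DA-DETR implementation.

```python
# Hypothetical sketch of feature modulation: transformer tokens produce a
# channel-wise scale and shift that is applied to CNN features at one scale.
# Generic FiLM-style illustration, not the CTBlender module from DA-DETR.
import torch
import torch.nn as nn


class FeatureModulation(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.to_scale_shift = nn.Linear(dim, 2 * dim)

    def forward(self, cnn_feat: torch.Tensor, trans_tokens: torch.Tensor) -> torch.Tensor:
        # cnn_feat: (B, C, H, W) features from the CNN backbone.
        # trans_tokens: (B, N, C) tokens from the transformer branch.
        pooled = trans_tokens.mean(dim=1)                     # global transformer context (B, C)
        scale, shift = self.to_scale_shift(pooled).chunk(2, dim=-1)
        scale = scale[:, :, None, None]                       # broadcast over H, W
        shift = shift[:, :, None, None]
        return cnn_feat * (1.0 + scale) + shift               # modulated CNN features


# Usage: modulate one pyramid level; in practice this would be repeated per scale.
mod = FeatureModulation()
cnn_feat = torch.randn(2, 256, 32, 32)
trans_tokens = torch.randn(2, 100, 256)
fused = mod(cnn_feat, trans_tokens)   # (2, 256, 32, 32)
```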
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.