PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
- URL: http://arxiv.org/abs/2206.01256v1
- Date: Thu, 2 Jun 2022 19:13:03 GMT
- Title: PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
- Authors: Yingfei Liu, Junjie Yan, Fan Jia, Shuailin Li, Qi Gao, Tiancai Wang,
Xiangyu Zhang, Jian Sun
- Abstract summary: PETRv2 is a unified framework for 3D perception from multi-view images.
We extend the 3D position embedding in PETR for temporal modeling.
PETRv2 achieves state-of-the-art performance on 3D object detection and BEV segmentation.
- Score: 105.29493158036105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose PETRv2, a unified framework for 3D perception from
multi-view images. Based on PETR, PETRv2 explores the effectiveness of temporal
modeling, which utilizes the temporal information of previous frames to boost
3D object detection. More specifically, we extend the 3D position embedding (3D
PE) in PETR for temporal modeling. The 3D PE achieves temporal alignment of
object positions across different frames. A feature-guided position encoder is
further introduced to improve the data adaptability of the 3D PE. To support
high-quality BEV segmentation, PETRv2 provides a simple yet effective solution
by adding a set of segmentation queries, each responsible for segmenting one
specific patch of the BEV map. PETRv2 achieves state-of-the-art performance on
3D object detection and BEV segmentation. A detailed robustness analysis is
also conducted on the PETR framework. We hope PETRv2 can serve as a
unified framework for 3D perception.
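The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the ego-motion matrix, grid size, and patch dimensions below are hypothetical placeholders. The first function shows the kind of coordinate alignment temporal 3D PE performs (warping previous-frame 3D points into the current ego frame), and the second shows how per-patch segmentation outputs could be tiled back into one BEV map.

```python
import numpy as np

def align_prev_coords(coords_prev, T_prev_to_cur):
    """Transform 3D points from the previous frame's ego coordinates into
    the current frame via a 4x4 ego-motion matrix (hypothetical input)."""
    n = coords_prev.shape[0]
    homo = np.concatenate([coords_prev, np.ones((n, 1))], axis=1)  # (N, 4)
    return (homo @ T_prev_to_cur.T)[:, :3]                         # (N, 3)

def assemble_bev_map(patch_logits, grid=(4, 4), patch_hw=(50, 50)):
    """Tile per-query patch predictions into a full BEV map.
    patch_logits: (grid_h * grid_w, patch_h * patch_w), one row per query."""
    gh, gw = grid
    ph, pw = patch_hw
    patches = patch_logits.reshape(gh, gw, ph, pw)
    # Interleave grid rows with patch rows, grid cols with patch cols.
    return patches.transpose(0, 2, 1, 3).reshape(gh * ph, gw * pw)
```

With the defaults above, 16 segmentation queries each decode a 50x50 patch, yielding a 200x200 BEV map; the actual numbers in PETRv2 may differ.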
Related papers
- HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras [45.739224968302565]
We present an end-to-end framework named HENet for multi-task 3D perception.
Specifically, we propose a hybrid image encoding network, using a large image encoder for short-term frames and a small image encoder for long-term temporal frames.
According to the characteristics of each perception task, we utilize BEV features of different grid sizes, independent BEV encoders, and task decoders for different tasks.
arXiv Detail & Related papers (2024-04-03T07:10:18Z) - 3DFusion, A real-time 3D object reconstruction pipeline based on
streamed instance segmented data [0.552480439325792]
This paper presents a real-time segmentation and reconstruction system that utilizes RGB-D images.
The system performs pixel-level segmentation on RGB-D data, effectively separating foreground objects from the background.
The real-time 3D modelling can be applied across various domains, including augmented/virtual reality, interior design, urban planning, road assistance, security systems, and more.
arXiv Detail & Related papers (2023-11-11T20:11:58Z) - AOP-Net: All-in-One Perception Network for Joint LiDAR-based 3D Object
Detection and Panoptic Segmentation [9.513467995188634]
AOP-Net is a LiDAR-based multi-task framework that combines 3D object detection and panoptic segmentation.
The AOP-Net achieves state-of-the-art performance for published works on the nuScenes benchmark for both 3D object detection and panoptic segmentation tasks.
arXiv Detail & Related papers (2023-02-02T05:31:53Z) - OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for
Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z) - Focal-PETR: Embracing Foreground for Efficient Multi-Camera 3D Object
Detection [11.13693561702228]
The dominant multi-camera 3D detection paradigm is based on explicit 3D feature construction.
Other methods implicitly introduce geometric positional encoding to build the relationship between image tokens and 3D objects.
We propose Focal-PETR with instance-guided supervision and spatial alignment module.
arXiv Detail & Related papers (2022-12-11T13:38:54Z) - CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z) - A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z) - PETR: Position Embedding Transformation for Multi-View 3D Object
Detection [80.93664973321168]
PETR encodes the position information of 3D coordinates into image features, producing the 3D position-aware features.
PETR achieves state-of-the-art performance on standard nuScenes dataset and ranks 1st place on the benchmark.
arXiv Detail & Related papers (2022-03-10T20:33:28Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.