Objects as Spatio-Temporal 2.5D points
- URL: http://arxiv.org/abs/2212.02755v2
- Date: Wed, 7 Dec 2022 04:56:19 GMT
- Title: Objects as Spatio-Temporal 2.5D points
- Authors: Paridhi Singh, Gaurav Singh and Arun Kumar
- Abstract summary: We propose a weakly supervised method to estimate the 3D position of objects by jointly learning to regress 2D object detections and the scene's depth prediction in a single feed-forward pass of a network.
Our proposed method extends a center-point based single-shot object detector and introduces a novel object representation where each object is modeled as a BEV point spatio-temporally, without the need for any 3D or BEV annotations for training or LiDAR data at query time.
- Score: 5.588892124219713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Determining accurate bird's eye view (BEV) positions of objects and tracks in
a scene is vital for various perception tasks, including object interaction
mapping and scenario extraction. However, the level of supervision required
to accomplish that is extremely challenging to procure. We propose a
light-weight, weakly supervised method to estimate 3D position of objects by
jointly learning to regress the 2D object detections and scene's depth
prediction in a single feed-forward pass of a network. Our proposed method
extends a center-point based single-shot object detector, and introduces a
novel object representation where each object is modeled as a BEV point
spatio-temporally, without the need of any 3D or BEV annotations for training
and LiDAR data at query time. The approach leverages readily available 2D
object supervision along with LiDAR point clouds (used only during training) to
jointly train a single network that learns to predict 2D object detections
alongside the whole scene's depth, to spatio-temporally model object tracks as
points in BEV. The proposed method is over $\sim$10x more computationally efficient
than recent SOTA approaches while achieving comparable accuracy on the
KITTI tracking benchmark.
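As a rough illustration of how a 2D detection center and a predicted depth map can yield a BEV point, the sketch below back-projects pixel centers through a standard pinhole camera model. This is a generic technique and not necessarily the authors' formulation. The function name `backproject_to_bev` and the parameters `fx` (focal length in pixels) and `cx` (principal point column) are hypothetical names chosen for the example.

```python
import numpy as np

def backproject_to_bev(centers_px, depth_map, fx, cx):
    """Map 2D detection centers (u, v) to BEV points (x, z) in the camera frame.

    Assumption (not from the paper): a pinhole camera, with depth_map[v, u]
    holding the forward distance z in meters for each pixel.
    """
    centers_px = np.asarray(centers_px, dtype=float)
    u = centers_px[:, 0]
    v = centers_px[:, 1]
    # Sample the predicted depth at each center, rounded to the nearest pixel
    # and clamped to the image bounds.
    rows = np.clip(np.rint(v).astype(int), 0, depth_map.shape[0] - 1)
    cols = np.clip(np.rint(u).astype(int), 0, depth_map.shape[1] - 1)
    z = depth_map[rows, cols]
    # Pinhole model: lateral offset x = (u - cx) * z / fx.
    x = (u - cx) * z / fx
    # BEV keeps the lateral (x) and forward (z) coordinates and drops height.
    return np.stack([x, z], axis=1)

# Toy usage: a flat depth map at 10 m and two detection centers.
depth = np.full((4, 6), 10.0)
bev_points = backproject_to_bev([(3.0, 2.0), (5.0, 1.0)], depth, fx=2.0, cx=3.0)
print(bev_points)
```

Applying the same mapping to an object's center in every frame gives a sequence of BEV points, which is one simple way to read the "spatio-temporal" track representation described in the abstract.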
Related papers
- PatchContrast: Self-Supervised Pre-training for 3D Object Detection [14.603858163158625]
We introduce PatchContrast, a novel self-supervised point cloud pre-training framework for 3D object detection.
We show that our method outperforms existing state-of-the-art models on three commonly-used 3D detection datasets.
arXiv Detail & Related papers (2023-08-14T07:45:54Z)
- OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection [29.530177591608297]
Multi-view 3D object detection is becoming popular in autonomous driving due to its high effectiveness and low cost.
Most of the current state-of-the-art detectors follow the query-based bird's-eye-view (BEV) paradigm.
We propose an Object-Centric query-BEV detector OCBEV, which can carve the temporal and spatial cues of moving targets more effectively.
arXiv Detail & Related papers (2023-06-02T17:59:48Z)
- DR-WLC: Dimensionality Reduction cognition for object detection and pose estimation by Watching, Learning and Checking [30.58114448119465]
Existing object detection and pose estimation methods mostly adopt the same-dimensional data for training.
DR-WLC, a dimensionality reduction cognitive model, can perform both object detection and pose estimation tasks at the same time.
arXiv Detail & Related papers (2023-01-17T15:08:32Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- Weakly Supervised Learning of Keypoints for 6D Object Pose Estimation [73.40404343241782]
We propose a weakly supervised 6D object pose estimation approach based on 2D keypoint detection.
Our approach achieves comparable performance with state-of-the-art fully supervised approaches.
arXiv Detail & Related papers (2022-03-07T16:23:47Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA proves effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection.
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- 6D Object Pose Estimation using Keypoints and Part Affinity Fields [24.126513851779936]
The task of 6D object pose estimation from RGB images is an important requirement for autonomous service robots to be able to interact with the real world.
We present a two-step pipeline for estimating the 6 DoF translation and orientation of known objects.
arXiv Detail & Related papers (2021-07-05T14:41:19Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes [54.239416488865565]
We propose a fast single-stage 3D object detection method for LIDAR data.
The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes.
We find that our proposed method improves on state-of-the-art results by 5% on object detection in ScanNet scenes, and by 3.4% on the Waymo Open Dataset.
arXiv Detail & Related papers (2020-04-02T17:48:50Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)