Stereo Neural Vernier Caliper
- URL: http://arxiv.org/abs/2203.11018v1
- Date: Mon, 21 Mar 2022 14:36:07 GMT
- Title: Stereo Neural Vernier Caliper
- Authors: Shichao Li, Zechun Liu, Zhiqiang Shen, Kwang-Ting Cheng
- Abstract summary: We propose a new object-centric framework for learning-based stereo 3D object detection.
We tackle a problem of how to predict a refined update given an initial 3D cuboid guess.
Our approach achieves state-of-the-art performance on the KITTI benchmark.
- Score: 57.187088191829886
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new object-centric framework for learning-based stereo 3D object
detection. Previous studies build scene-centric representations that do not
consider the significant variation among outdoor instances and thus lack the
flexibility and functionalities that an instance-level model can offer. We
build such an instance-level model by formulating and tackling a local update
problem, i.e., how to predict a refined update given an initial 3D cuboid
guess. We demonstrate how solving this problem can complement scene-centric
approaches in (i) building a coarse-to-fine multi-resolution system, (ii)
performing model-agnostic object location refinement, and (iii) conducting
stereo 3D tracking-by-detection. Extensive experiments demonstrate the
effectiveness of our approach, which achieves state-of-the-art performance on
the KITTI benchmark. Code and pre-trained models are available at
https://github.com/Nicholasli1995/SNVC.
Related papers
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z) - FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with
Pre-trained Vision-Language Models [62.663113296987085]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC)
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z) - FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z) - Shape Anchor Guided Holistic Indoor Scene Understanding [9.463220988312218]
We propose a shape anchor guided learning strategy (AncLearn) for robust holistic indoor scene understanding.
AncLearn generates anchors that dynamically fit instance surfaces to (i) unmix noise and target-related features for offering reliable proposals at the detection stage.
We embed AncLearn into a reconstruction-from-detection learning system (AncRec) to generate high-quality semantic scene models.
arXiv Detail & Related papers (2023-09-20T08:30:20Z) - Object DGCNN: 3D Object Detection using Dynamic Graphs [32.090268859180334]
3D object detection often involves complicated training and testing pipelines.
Inspired by recent non-maximum suppression-free 2D object detection models, we propose a 3D object detection architecture on point clouds.
arXiv Detail & Related papers (2021-10-13T17:59:38Z) - Learning Compositional Shape Priors for Few-Shot 3D Reconstruction [36.40776735291117]
We show that complex encoder-decoder architectures exploit large amounts of per-category data.
We propose three ways to learn a class-specific global shape prior, directly from data.
Experiments on the popular ShapeNet dataset show that our method outperforms a zero-shot baseline by over 40%.
arXiv Detail & Related papers (2021-06-11T14:55:49Z) - Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
But, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z) - SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
arXiv Detail & Related papers (2021-01-07T18:30:32Z) - InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic
Information Modeling [65.47126868838836]
We propose a novel 3D object detection framework with dynamic information modeling.
Coarse predictions are generated in the first stage via a voxel-based region proposal network.
Experiments are conducted on the large-scale nuScenes 3D detection benchmark.
arXiv Detail & Related papers (2020-07-16T18:27:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.