Open-Pose 3D Zero-Shot Learning: Benchmark and Challenges
- URL: http://arxiv.org/abs/2312.07039v2
- Date: Tue, 16 Apr 2024 12:05:55 GMT
- Title: Open-Pose 3D Zero-Shot Learning: Benchmark and Challenges
- Authors: Weiguang Zhao, Guanyu Yang, Rui Zhang, Chenru Jiang, Chaolong Yang, Yuyao Yan, Amir Hussain, Kaizhu Huang
- Abstract summary: We propose a more realistic and challenging scenario named open-pose 3D zero-shot classification.
First, we revisit the current research on 3D zero-shot classification.
We propose two benchmark datasets specifically designed for the open-pose setting.
- Score: 23.663199578392447
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the explosive growth of 3D data, the urgency of utilizing zero-shot learning to facilitate data labeling becomes evident. Recently, methods transferring language or language-image pre-training models like Contrastive Language-Image Pre-training (CLIP) to 3D vision have made significant progress in the 3D zero-shot classification task. These methods primarily focus on 3D object classification with an aligned pose; such a setting is, however, rather restrictive, overlooking the recognition of 3D objects with open poses typically encountered in real-world scenarios, such as an overturned chair or a lying teddy bear. To this end, we propose a more realistic and challenging scenario named open-pose 3D zero-shot classification, focusing on the recognition of 3D objects regardless of their orientation. First, we revisit the current research on 3D zero-shot classification, and propose two benchmark datasets specifically designed for the open-pose setting. We empirically validate many of the most popular methods in the proposed open-pose benchmark. Our investigations reveal that most current 3D zero-shot classification models suffer from poor performance, indicating substantial room for exploration in this new direction. Furthermore, we study a concise pipeline with an iterative angle refinement mechanism that automatically optimizes one ideal angle to classify these open-pose 3D objects. In particular, to make validation more compelling and not just limited to existing CLIP-based methods, we also pioneer the exploration of knowledge transfer based on Diffusion models. While the proposed solutions can serve as a new benchmark for open-pose 3D zero-shot classification, we discuss the complexities and challenges of this scenario that remain for further research development. The code is available publicly at https://github.com/weiguangzhao/Diff-OP3D.
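The abstract mentions an iterative angle refinement mechanism but this page does not describe how it works. As a rough illustration of the general idea only, and not the authors' method, the sketch below runs a coarse-to-fine grid search over a single rotation angle. The names `refine_angle` and `toy_confidence` are hypothetical; `toy_confidence` stands in for whatever confidence a zero-shot classifier would assign to the object viewed at a given angle, and its peak at 1.0 rad is an arbitrary assumption.

```python
import math
from typing import Callable, Tuple


def refine_angle(
    score: Callable[[float], float],
    lo: float = 0.0,
    hi: float = 2.0 * math.pi,
    samples: int = 8,
    rounds: int = 5,
) -> Tuple[float, float]:
    """Coarse-to-fine search for the angle (radians) that maximizes score.

    Each round samples the current interval on a uniform grid, keeps the
    best angle seen so far, then shrinks the interval to one grid step on
    either side of that angle.
    """
    best_angle, best_score = lo, score(lo)
    for _ in range(rounds):
        step = (hi - lo) / samples
        for i in range(samples + 1):
            angle = lo + i * step
            value = score(angle)
            if value > best_score:
                best_angle, best_score = angle, value
        # Narrow the search window around the current best angle.
        lo, hi = best_angle - step, best_angle + step
    return best_angle, best_score


def toy_confidence(angle: float) -> float:
    """Hypothetical stand-in for a classifier's confidence at a view angle.

    Assumption: confidence peaks when the object is rotated by 1.0 rad.
    """
    return math.cos(angle - 1.0)


if __name__ == "__main__":
    angle, confidence = refine_angle(toy_confidence)
    print(f"best angle: {angle:.4f} rad, confidence: {confidence:.6f}")
```

A grid search like this only finds the global peak if the first, coarse pass lands near it; a real confidence surface over 3D rotations may have several local maxima and would need a search over more than one axis.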
Related papers
- Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data [57.53523870705433]
We propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det.
OVM3D-Det does not require high-precision LiDAR or 3D sensor data for either input or generating 3D bounding boxes.
It employs open-vocabulary 2D models and pseudo-LiDAR to automatically label 3D objects in RGB images, fostering the learning of open-vocabulary monocular 3D detectors.
arXiv Detail & Related papers (2024-11-23T21:37:21Z)
- DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned "in-the-wild" 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- Open-CRB: Towards Open World Active Learning for 3D Object Detection [40.80953254074535]
LiDAR-based 3D object detection has recently seen significant advancements through active learning (AL).
In real-world deployments where streaming point clouds may include unknown or novel objects, the ability of current AL methods to capture such objects remains unexplored.
This paper investigates a more practical and challenging research task: Open World Active Learning for 3D Object Detection (OWAL-3D).
arXiv Detail & Related papers (2023-10-16T13:32:53Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- Learning Occupancy for Monocular 3D Object Detection [25.56336546513198]
We propose OccupancyM3D, a method of learning occupancy for monocular 3D detection.
It directly learns occupancy in frustum and 3D space, leading to more discriminative and informative 3D features and representations.
Experiments on KITTI and open datasets demonstrate that the proposed method achieves a new state of the art and surpasses other methods by a significant margin.
arXiv Detail & Related papers (2023-05-25T04:03:46Z)
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in automatic driving for its easy application.
Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D object detection method, which can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
- End-to-End Learning of Multi-category 3D Pose and Shape Estimation [128.881857704338]
We propose an end-to-end method that simultaneously detects 2D keypoints from an image and lifts them to 3D.
The proposed method learns both 2D detection and 3D lifting only from 2D keypoints annotations.
In addition to being end-to-end in image to 3D learning, our method also handles objects from multiple categories using a single neural network.
arXiv Detail & Related papers (2021-12-19T17:10:40Z)
- FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection [3.330229314824913]
We present FCAF3D - a first-in-class fully convolutional anchor-free indoor 3D object detection method.
It is a simple yet effective method that uses a voxel representation of a point cloud and processes voxels with sparse convolutions.
It can handle large-scale scenes with minimal runtime through a single fully convolutional feed-forward pass.
arXiv Detail & Related papers (2021-12-01T07:28:52Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
- Weakly Supervised 3D Object Detection from Point Clouds [27.70180601788613]
3D object detection aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.
Existing 3D object detectors rely on annotated 3D bounding boxes during training, while these annotations could be expensive to obtain and only accessible in limited scenarios.
We propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training.
arXiv Detail & Related papers (2020-07-28T03:30:11Z)