Perceiving Unseen 3D Objects by Poking the Objects
- URL: http://arxiv.org/abs/2302.13375v1
- Date: Sun, 26 Feb 2023 18:22:13 GMT
- Title: Perceiving Unseen 3D Objects by Poking the Objects
- Authors: Linghao Chen, Yunzhou Song, Hujun Bao, Xiaowei Zhou
- Abstract summary: We propose a poking-based approach that automatically discovers and reconstructs 3D objects.
The poking process not only enables the robot to discover unseen 3D objects but also produces multi-view observations.
The experiments on real-world data show that our approach can discover and reconstruct unseen 3D objects with high quality in an unsupervised manner.
- Score: 45.70559270947074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel approach to interactive 3D object perception for robots.
Unlike previous perception algorithms that rely on known object models or a
large amount of annotated training data, we propose a poking-based approach
that automatically discovers and reconstructs 3D objects. The poking process
not only enables the robot to discover unseen 3D objects but also produces
multi-view observations for 3D reconstruction of the objects. The reconstructed
objects are then memorized by neural networks with regular supervised learning
and can be recognized in new test images. Experiments on real-world data
show that our approach can discover and reconstruct unseen 3D objects with
high quality in an unsupervised manner, and can facilitate real-world applications such as
robotic grasping. The code and supplementary materials are available at the
project page: https://zju3dv.github.io/poking_perception.
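Since the code lives on the project page and is not quoted here, the following is only a minimal, hypothetical sketch of the interactive loop the abstract describes: poke the scene, segment the pixels that moved, and accumulate the resulting masks as multi-view observations for later reconstruction and supervised training. All function names, data shapes, and the change-detection heuristic are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a "perceive by poking" loop (not the authors' code).

import numpy as np


def capture_rgbd(step: int) -> np.ndarray:
    """Stand-in for the robot camera: returns a dummy RGB-D frame."""
    rng = np.random.default_rng(step)
    return rng.random((480, 640, 4), dtype=np.float32)


def poke_and_observe(num_pokes: int = 5) -> list[tuple[np.ndarray, np.ndarray]]:
    """Poke the scene and collect (frame_before, frame_after) pairs.

    In the real system the robot executes a physical poke between the two
    captures; here we only simulate the data flow.
    """
    observations = []
    for step in range(num_pokes):
        before = capture_rgbd(step)
        # ... robot pokes the scene here ...
        after = capture_rgbd(step + 1)
        observations.append((before, after))
    return observations


def segment_moved_pixels(before: np.ndarray, after: np.ndarray,
                         threshold: float = 0.3) -> np.ndarray:
    """Pixels that changed between frames are assumed to belong to a movable
    (i.e. previously unseen) object; static background is discarded."""
    diff = np.abs(after[..., :3] - before[..., :3]).mean(axis=-1)
    return diff > threshold


def main() -> None:
    observations = poke_and_observe()
    # Each poke yields a new viewpoint of the moved object, so the masks can
    # later be fed to multi-view 3D reconstruction and used as pseudo-labels
    # for ordinary supervised training of a recognizer.
    masks = [segment_moved_pixels(b, a) for b, a in observations]
    print(f"collected {len(masks)} object masks, "
          f"moved-pixel ratio of first mask: {masks[0].mean():.3f}")


if __name__ == "__main__":
    main()
```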
Related papers
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Understanding 3D Object Interaction from a Single Image [18.681222155879656]
Humans can easily understand a single image as depicting multiple potential objects that permit interaction.
We would like to endow machines with a similar ability, so that intelligent agents can better explore the 3D scene or manipulate objects.
arXiv Detail & Related papers (2023-05-16T17:59:26Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model to extract objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of synthetic datasets, which consist of CAD object models, to boost learning on real datasets.
However, recent work on 3D pre-training exhibits failures when transferring features learned on synthetic objects to real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
- LanguageRefer: Spatial-Language Model for 3D Visual Grounding [72.7618059299306]
We develop a spatial-language model for a 3D visual grounding problem.
We show that our model performs competitively on visio-linguistic datasets proposed by ReferIt3D.
arXiv Detail & Related papers (2021-07-07T18:55:03Z)
- Seeing by haptic glance: reinforcement learning-based 3D object recognition [31.80213713136647]
Humans are able to perform 3D recognition from a limited number of haptic contacts between the target object and their fingers, without seeing the object.
This capability is referred to as a 'haptic glance' in cognitive neuroscience.
Most of the existing 3D recognition models were developed based on dense 3D data.
In many real-life use cases, where robots are used to collect 3D data by haptic exploration, only a limited number of 3D points could be collected.
A novel reinforcement learning-based framework is proposed, in which the haptic exploration procedure is optimized jointly with the 3D recognition objective using actively collected 3D points; a minimal illustrative sketch follows below.
arXiv Detail & Related papers (2021-02-15T15:38:22Z)
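For the last entry above, the core idea of recognizing an object from a handful of actively collected contact points can be illustrated with a toy example. Everything below, including the environment, the random touch policy standing in for a learned RL policy, and the threshold classifier, is a simplified assumption for illustration, not the paper's actual framework.

```python
# Toy "haptic glance" sketch: recognize an object from a few 3D contact points.

import numpy as np


class ToyHapticEnv:
    """Simulates touching one of two primitive shapes and returns contact points."""

    def __init__(self, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.label = int(self.rng.integers(2))  # 0 = unit sphere, 1 = unit cube

    def touch(self, direction: np.ndarray) -> np.ndarray:
        """Return the 3D surface point reached along a given touch direction."""
        direction = direction / (np.linalg.norm(direction) + 1e-8)
        if self.label == 0:
            return direction                          # point on the unit sphere
        return direction / np.abs(direction).max()    # point on the unit cube


def classify(points: np.ndarray) -> int:
    """Toy recognizer: cube contacts lie farther from the origin on average."""
    return int(np.linalg.norm(points, axis=1).mean() > 1.05)


def haptic_glance(env: ToyHapticEnv, budget: int = 8, seed: int = 1) -> int:
    """Collect a limited number of touches, then classify.

    A learned policy would pick the most informative touch directions; here we
    sample random directions to keep the sketch self-contained.
    """
    rng = np.random.default_rng(seed)
    points = np.stack([env.touch(rng.normal(size=3)) for _ in range(budget)])
    return classify(points)


if __name__ == "__main__":
    env = ToyHapticEnv()
    prediction = haptic_glance(env)
    print(f"true label: {env.label}, predicted: {prediction}")
```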
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.