Learning 3D Part Assembly from a Single Image
- URL: http://arxiv.org/abs/2003.09754v2
- Date: Tue, 24 Mar 2020 17:44:20 GMT
- Title: Learning 3D Part Assembly from a Single Image
- Authors: Yichen Li and Kaichun Mo and Lin Shao and Minhyuk Sung and Leonidas Guibas
- Abstract summary: We introduce a novel problem, single-image-guided 3D part assembly, along with a learning-based solution.
We study this problem in the setting of furniture assembly from a given complete set of parts and a single image depicting the entire assembled object.
- Score: 20.175502864488493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous assembly is a crucial capability for robots in many applications.
For this task, several problems such as obstacle avoidance, motion planning,
and actuator control have been extensively studied in robotics. However, when
it comes to task specification, the space of possibilities remains
underexplored. Towards this end, we introduce a novel problem,
single-image-guided 3D part assembly, along with a learning-based solution. We
study this problem in the setting of furniture assembly from a given complete
set of parts and a single image depicting the entire assembled object. Multiple
challenges exist in this setting, including handling ambiguity among parts
(e.g., slats in a chair back and leg stretchers) and 3D pose prediction for
parts and part subassemblies, whether visible or occluded. We address these
issues by proposing a two-module pipeline that leverages strong 2D-3D
correspondences and assembly-oriented graph message-passing to infer part
relationships. In experiments with a PartNet-based synthetic benchmark, we
demonstrate the effectiveness of our framework as compared with three baseline
approaches.
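The authors' two-module pipeline is not reproduced here, but the following minimal PyTorch sketch illustrates the general flavor of assembly-oriented graph message passing conditioned on a single-image feature. The fully connected part graph, the feature sizes, the module names, and the pose parameterization (translation plus unit quaternion) are all illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: one round of message passing over per-part features,
# conditioned on a global image feature, ending in per-part pose regression.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartMessagePassing(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Edge function: turns a (sender, receiver) feature pair into a message.
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Node update: fuses a part's feature, its messages, and the image feature.
        self.node_mlp = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Pose head: 3D translation plus a quaternion per part.
        self.pose_head = nn.Linear(dim, 7)

    def forward(self, part_feats, img_feat):
        # part_feats: (N, dim) per-part features; img_feat: (dim,) image feature.
        n = part_feats.size(0)
        src = part_feats.unsqueeze(1).expand(n, n, -1)  # sender features
        dst = part_feats.unsqueeze(0).expand(n, n, -1)  # receiver features
        msgs = self.edge_mlp(torch.cat([src, dst], dim=-1)).mean(dim=0)
        img = img_feat.unsqueeze(0).expand(n, -1)
        updated = self.node_mlp(torch.cat([part_feats, msgs, img], dim=-1))
        pose = self.pose_head(updated)
        trans = pose[:, :3]
        quat = F.normalize(pose[:, 3:], dim=-1)  # normalize to a unit quaternion
        return updated, trans, quat

# Usage: feats, t, q = PartMessagePassing()(torch.randn(8, 256), torch.randn(256))
```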
Related papers
- AssemblyComplete: 3D Combinatorial Construction with Deep Reinforcement Learning [4.3507834596906125]
A critical goal in robotics is to teach robots to adapt to real-world collaborative tasks, particularly in automatic assembly.
This paper introduces 3D assembly completion, demonstrated using unit primitives (i.e., Lego bricks).
We propose a two-part deep reinforcement learning (DRL) framework that teaches the robot both to understand the objective of an incomplete assembly and to learn a construction policy for completing it (a minimal policy-step sketch follows).
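The abstract gives no implementation detail, so the sketch below only shows what one step of a learned construction policy could look like: an epsilon-greedy choice over candidate brick placements scored by a small Q-network. The state and action encodings are illustrative assumptions.

```python
# Hedged sketch (not the paper's framework): epsilon-greedy placement selection.
import random
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(64 + 16, 64), nn.ReLU(), nn.Linear(64, 1))

def choose_placement(state, candidates, epsilon=0.1):
    # state: (64,) encoding of the incomplete assembly;
    # candidates: (K, 16) encodings of legal next-brick placements.
    if random.random() < epsilon:
        return random.randrange(candidates.size(0))  # explore
    with torch.no_grad():
        inp = torch.cat([state.expand(candidates.size(0), -1), candidates], dim=-1)
        return q_net(inp).argmax().item()            # exploit: highest Q-value

# idx = choose_placement(torch.randn(64), torch.randn(5, 16))
```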
arXiv Detail & Related papers (2024-10-20T18:51:17Z)
- DVPE: Divided View Position Embedding for Multi-View 3D Object Detection [7.791229698270439]
Current research faces challenges in balancing between receptive fields and reducing interference when aggregating multi-view features.
This paper proposes a divided view method, in which features are modeled globally via a visibility cross-attention mechanism but interact only with partial features in a divided local virtual space.
Our framework, named DVPE, achieves state-of-the-art performance (57.2% mAP and 64.5% NDS) on the nuScenes test set.
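DVPE's actual mechanism is not detailed in this summary; the sketch below only approximates the divided-view idea with standard masked cross-attention, where each query attends to a restricted "visible" subset of multi-view tokens. The window-based mask, the sizes, and the use of nn.MultiheadAttention as a stand-in are all assumptions.

```python
# Hedged sketch: cross-attention restricted by a boolean visibility mask.
import torch
import torch.nn as nn

embed_dim, num_heads = 128, 4
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

queries = torch.randn(1, 10, embed_dim)   # e.g., 10 object queries
features = torch.randn(1, 50, embed_dim)  # e.g., 50 multi-view image tokens

# Boolean mask, True = blocked: each query sees only a local window of tokens.
mask = torch.ones(10, 50, dtype=torch.bool)
for q in range(10):
    mask[q, q * 5:(q + 1) * 5] = False    # toy "divided local space" per query

out, _ = attn(queries, features, features, attn_mask=mask)
print(out.shape)  # torch.Size([1, 10, 128])
```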
arXiv Detail & Related papers (2024-07-24T02:44:41Z)
- Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images [24.10809783713574]
This paper introduces a novel task: translating multi-view images of a structural 3D model into a detailed sequence of assembly instructions.
We propose an end-to-end model known as the Neural Assembler.
arXiv Detail & Related papers (2024-04-25T08:53:23Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding [56.00186960144545]
3D visual grounding is the task of localizing, in a 3D scene, the object referred to by a natural-language description.
We propose a dense 3D grounding network, featuring four novel stand-alone modules that aim to improve grounding performance.
arXiv Detail & Related papers (2023-09-08T19:27:01Z)
- Language-free Compositional Action Generation via Decoupling Refinement [67.50452446686725]
We introduce a novel framework to generate compositional actions without reliance on language auxiliaries.
Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement.
arXiv Detail & Related papers (2023-07-07T12:00:38Z)
- 3D Part Assembly Generation with Instance Encoded Transformer [22.330218525999857]
We propose a multi-layer transformer-based framework that involves geometric and relational reasoning between parts to update the part poses iteratively.
We extend our framework to a new task called in-process part assembly.
Our method achieves improvements of well over 10% against the current state of the art on multiple metrics on the public PartNet dataset.
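The following minimal sketch only illustrates the pattern the abstract describes: a transformer encoder over per-part tokens applied for several iterations, with the current pose prediction fed back in each round. The layer sizes, the iteration count, and the pose feedback scheme are assumptions, not the paper's model.

```python
# Hedged sketch: iterative part pose refinement with a transformer encoder.
import torch
import torch.nn as nn

class IterativePoseRefiner(nn.Module):
    def __init__(self, dim=128, iters=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.pose_head = nn.Linear(dim, 7)   # translation (3) + quaternion (4)
        self.pose_embed = nn.Linear(7, dim)  # feed the current pose back in

        self.iters = iters

    def forward(self, part_tokens):
        # part_tokens: (B, N, dim); start from an identity-like pose guess.
        b, n, _ = part_tokens.shape
        pose = part_tokens.new_zeros(b, n, 7)
        pose[..., 3] = 1.0  # identity quaternion (w = 1)
        for _ in range(self.iters):
            tokens = part_tokens + self.pose_embed(pose)
            tokens = self.encoder(tokens)     # relational reasoning between parts
            pose = self.pose_head(tokens)     # re-predict poses each iteration
        return pose

# poses = IterativePoseRefiner()(torch.randn(2, 8, 128))  # shape (2, 8, 7)
```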
arXiv Detail & Related papers (2022-07-05T02:40:57Z)
- Discovering 3D Parts from Image Collections [98.16987919686709]
We tackle the problem of 3D part discovery from only 2D image collections.
Instead of relying on manually annotated parts for supervision, we propose a self-supervised approach.
Our key insight is to learn a novel part shape prior that allows each part to fit an object shape faithfully while constrained to have simple geometry.
arXiv Detail & Related papers (2021-07-28T20:29:16Z)
- Generative 3D Part Assembly via Dynamic Graph Learning [34.108515032411695]
Part assembly is a challenging yet crucial task in 3D computer vision and robotics.
We propose an assembly-oriented dynamic graph learning framework that leverages an iterative graph neural network as a backbone.
arXiv Detail & Related papers (2020-06-14T04:26:42Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
- Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image [102.44347847154867]
We propose a novel formulation that allows us to jointly recover the geometry of a 3D object as a set of primitives.
Our model recovers the higher-level structural decomposition of various objects in the form of a binary tree of primitives (a minimal tree sketch follows this entry).
Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
arXiv Detail & Related papers (2020-04-02T17:58:05Z)
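As a data-structure illustration of the binary tree of primitives mentioned above, the sketch below models each node as a cuboid that can split into two children, with leaves forming the final part decomposition. The cuboid parameterization and the example values are illustrative assumptions, not the paper's representation.

```python
# Hedged sketch: a binary tree of cuboid primitives and its leaf decomposition.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class PrimitiveNode:
    center: Tuple[float, float, float]       # cuboid center
    size: Tuple[float, float, float]         # cuboid extents
    left: Optional["PrimitiveNode"] = None   # finer decomposition, if any
    right: Optional["PrimitiveNode"] = None

def leaves(node: PrimitiveNode) -> List[PrimitiveNode]:
    """Collect the leaf primitives, i.e., the final part decomposition."""
    if node.left is None and node.right is None:
        return [node]
    out: List[PrimitiveNode] = []
    if node.left:
        out += leaves(node.left)
    if node.right:
        out += leaves(node.right)
    return out

# A chair coarsely split into a seat and a back, each a leaf cuboid:
root = PrimitiveNode((0, 0, 0), (1, 1, 1),
                     PrimitiveNode((0, -0.25, 0), (1, 0.5, 1)),
                     PrimitiveNode((0, 0.4, -0.45), (1, 0.8, 0.1)))
print(len(leaves(root)))  # 2
```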
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information and is not responsible for any consequences of its use.