6-DoF Stability Field via Diffusion Models
- URL: http://arxiv.org/abs/2310.17649v1
- Date: Thu, 26 Oct 2023 17:59:12 GMT
- Title: 6-DoF Stability Field via Diffusion Models
- Authors: Takuma Yoneda, Tianchong Jiang, Gregory Shakhnarovich, Matthew R. Walter
- Abstract summary: We present 6-DoFusion, a generative model capable of generating 3D poses of an object that produce a stable configuration of a given scene.
We evaluate our model on different object placement and stacking tasks, demonstrating its ability to construct stable scenes.
- Score: 9.631625582146537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A core capability for robot manipulation is reasoning over where and how to
stably place objects in cluttered environments. Traditionally, robots have
relied on object-specific, hand-crafted heuristics in order to perform such
reasoning, with limited generalizability beyond a small number of object
instances and object interaction patterns. Recent approaches instead learn
notions of physical interaction, namely motion prediction, but require
supervision in the form of labeled object information or come at the cost of
high sample complexity, and do not directly reason over stability or object
placement. We present 6-DoFusion, a generative model capable of generating 3D
poses of an object that produce a stable configuration of a given scene.
Underlying 6-DoFusion is a diffusion model that incrementally refines a
randomly initialized SE(3) pose to generate a sample from a learned,
context-dependent distribution over stable poses. We evaluate our model on
different object placement and stacking tasks, demonstrating its ability to
construct stable scenes that involve novel object classes as well as to improve
the accuracy of state-of-the-art 3D pose estimation methods.
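To make the refinement loop concrete, here is a minimal, hypothetical sketch of
DDPM-style reverse diffusion over a 7-D pose vector (translation plus unit
quaternion). The noise predictor eps_model is a placeholder for the paper's
learned, scene-conditioned network, and quaternion renormalization stands in
for diffusion defined properly on SE(3); treat it as an illustration, not the
authors' implementation.

```python
import numpy as np

def make_schedule(T=50, beta_min=1e-4, beta_max=0.02):
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def eps_model(pose, t, scene_context):
    # Placeholder for the learned, scene-conditioned noise predictor;
    # returns zeros so that the sketch runs end to end.
    return np.zeros_like(pose)

def sample_stable_pose(scene_context, T=50, seed=0):
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    pose = rng.standard_normal(7)  # [tx, ty, tz, qw, qx, qy, qz], pure noise
    for t in reversed(range(T)):
        eps = eps_model(pose, t, scene_context)
        mean = (pose - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) \
               / np.sqrt(alphas[t])
        noise = rng.standard_normal(7) if t > 0 else np.zeros(7)
        pose = mean + np.sqrt(betas[t]) * noise
        # Re-normalize the quaternion part: an illustrative stand-in for
        # diffusion defined directly on the SE(3) manifold.
        pose[3:] /= np.linalg.norm(pose[3:]) + 1e-8
    return pose[:3], pose[3:]  # translation, unit quaternion

translation, quaternion = sample_stable_pose(scene_context=None)
```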
Related papers
- Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208] (2024-05-31)
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
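For intuition, a mixed discrete-continuous forward (noising) process can be
sketched as below: Gaussian corruption for continuous box attributes and
uniform label resampling for categories. The attribute layout and corruption
schedule are illustrative assumptions, not MiDiffusion's actual
parameterization.

```python
import numpy as np

def noise_layout(categories, boxes, alpha_bar_t, num_classes, rng):
    """One forward-noising step of a mixed discrete-continuous process.

    categories: (N,) integer labels; boxes: (N, 6) location/size/orientation.
    Continuous attributes receive Gaussian noise (standard DDPM forward);
    labels are resampled uniformly with probability 1 - alpha_bar_t, in the
    spirit of multinomial diffusion."""
    noisy_boxes = (np.sqrt(alpha_bar_t) * boxes
                   + np.sqrt(1.0 - alpha_bar_t)
                   * rng.standard_normal(boxes.shape))
    resample = rng.random(categories.shape) > alpha_bar_t
    noisy_cats = np.where(
        resample, rng.integers(0, num_classes, categories.shape), categories)
    return noisy_cats, noisy_boxes

rng = np.random.default_rng(0)
cats, boxes = np.array([2, 5, 1]), rng.standard_normal((3, 6))
noisy_cats, noisy_boxes = noise_layout(cats, boxes, 0.7, num_classes=8, rng=rng)
```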
- Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [8.059133373836913] (2024-04-02)
This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations.
We use an ensemble of partially constructed NeRF models to quantify model uncertainty to determine the next action.
Our approach determines when and how to grasp and re-orient an object given its partial NeRF model and re-estimates the object pose to rectify misalignments introduced during the interaction.
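The ensemble-variance idea behind this next-action selection can be sketched
in a few lines; the stub density functions and visibility index sets below are
toy stand-ins for the paper's partially trained NeRF models and candidate
actions.

```python
import numpy as np

def density_variance(ensemble, query_points):
    """Epistemic uncertainty as the per-point variance of predicted
    density across an ensemble of (here: stub) NeRF models."""
    preds = np.stack([model(query_points) for model in ensemble])  # (M, N)
    return preds.var(axis=0)

def pick_next_action(ensemble, candidate_views, sample_points):
    """Choose the candidate whose visible points are most uncertain."""
    var = density_variance(ensemble, sample_points)
    return int(np.argmax([var[idx].sum() for idx in candidate_views]))

# Toy usage: three "NeRFs" as random linear maps, two candidate views
# given as visible-point index sets.
rng = np.random.default_rng(0)
pts = rng.standard_normal((100, 3))
ensemble = [lambda p, w=rng.standard_normal(3): p @ w for _ in range(3)]
views = [np.arange(0, 50), np.arange(50, 100)]
print(pick_next_action(ensemble, views, pts))
```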
- Fit-NGP: Fitting Object Models to Neural Graphics Primitives [19.513102875891775] (2024-01-04)
We show that the density field created by a state-of-the-art efficient radiance field reconstruction method is suitable for highly accurate pose estimation.
We present a fully automatic object pose estimation system based on a robot arm with a single wrist-mounted camera.
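A minimal stand-in for fitting an object model to a reconstructed density
field is a rigid alignment between model points and points sampled where the
density is high. The Kabsch solver below assumes known correspondences, which
a real pipeline would estimate (e.g., by alternating with nearest-neighbour
search, as in ICP); it is a sketch of the alignment step, not Fit-NGP's method.

```python
import numpy as np

def kabsch(P, Q):
    """Best-fit rigid transform with R @ P[i] + t ~= Q[i].
    Assumes the rows of P and Q correspond."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

# Toy usage: "density points" are a rigidly moved copy of the model points.
rng = np.random.default_rng(0)
model = rng.standard_normal((200, 3))
R_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R_true) < 0:   # ensure a proper rotation
    R_true[:, 0] *= -1
density_pts = model @ R_true.T + np.array([0.1, -0.2, 0.3])
R_est, t_est = kabsch(model, density_pts)
assert np.allclose(R_est, R_true, atol=1e-6)
```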
- 3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation [69.73691477825079] (2023-10-05)
We present a new hypothesis-and-verification framework to tackle the problem of generalizable object pose estimation.
To measure reliability, we introduce a 3D-aware verification that explicitly applies 3D transformations to the 3D object representations learned from the two input images.
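The verification step, scoring explicit 3D transformations of features learned
from the two images, can be sketched as follows. The nearest-neighbour cosine
score and point-feature representation are illustrative choices, not the
paper's learned verifier.

```python
import numpy as np

def score_hypothesis(R, feats_a, pts_a, feats_b, pts_b):
    """Rotate view-A points by candidate R and measure feature agreement
    with view B via nearest-neighbour cosine similarity."""
    rotated = pts_a @ R.T
    d = np.linalg.norm(rotated[:, None] - pts_b[None], axis=-1)  # (Na, Nb)
    nn = d.argmin(axis=1)
    num = (feats_a * feats_b[nn]).sum(-1)
    den = (np.linalg.norm(feats_a, axis=-1)
           * np.linalg.norm(feats_b[nn], axis=-1) + 1e-8)
    return (num / den).mean()

def verify(hypotheses, feats_a, pts_a, feats_b, pts_b):
    """Return the rotation hypothesis with the best 3D alignment score."""
    scores = [score_hypothesis(R, feats_a, pts_a, feats_b, pts_b)
              for R in hypotheses]
    return hypotheses[int(np.argmax(scores))]

# Toy usage: view B is view A rotated by R_true; verification recovers it.
rng = np.random.default_rng(0)
pts, feats = rng.standard_normal((40, 3)), rng.standard_normal((40, 16))
R_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
best = verify([np.eye(3), R_true], feats, pts, feats, pts @ R_true.T)
```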
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556] (2023-08-24)
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
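SE(3) equivariance of a descriptor field means that transforming both the
object and the query point transforms the descriptor consistently. The toy
centroid-offset field below satisfies this property exactly; ROAM's learned
field is far richer, but the numerical check is the same in spirit.

```python
import numpy as np

def descriptor(query, obj_points):
    """Toy SE(3)-equivariant vector field: offset of the query from the
    object centroid. Transforming object and query rotates the output."""
    return query - obj_points.mean(axis=0)

rng = np.random.default_rng(0)
P, x = rng.standard_normal((50, 3)), rng.standard_normal(3)
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R) < 0:          # ensure a proper rotation
    R[:, 0] *= -1
t = rng.standard_normal(3)
lhs = descriptor(R @ x + t, P @ R.T + t)   # field of the transformed scene
rhs = R @ descriptor(x, P)                 # transformed field of the scene
assert np.allclose(lhs, rhs)               # equivariance holds
```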
- Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects [1.8263882169310044] (2023-06-28)
We introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data.
We also introduce a hierarchical message passing operation that propagates information within and across modalities to learn a graph-based object representation.
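A toy numeric analogue of within- and cross-modality message passing, with
fixed mean-aggregation matrices standing in for the learned update functions
of the paper's architecture:

```python
import numpy as np

def message_pass(X, A):
    """Mean-aggregation message passing: each node averages its
    neighbours' features (A is a row-normalised adjacency matrix)."""
    return np.tanh(A @ X)

def hierarchical_step(vision_X, touch_X, A_vision, A_touch, A_cross):
    # Within-modality messages first ...
    v = message_pass(vision_X, A_vision)
    t = message_pass(touch_X, A_touch)
    # ... then cross-modality messages from touch nodes to vision nodes.
    return v + np.tanh(A_cross @ t), t

rng = np.random.default_rng(0)
v, t = rng.standard_normal((5, 8)), rng.standard_normal((3, 8))
A_v, A_t = np.full((5, 5), 0.2), np.full((3, 3), 1 / 3)
A_x = np.full((5, 3), 1 / 3)
v_new, t_new = hierarchical_step(v, t, A_v, A_t, A_x)  # (5, 8), (3, 8)
```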
- ShapeShift: Superquadric-based Object Pose Estimation for Robotic Grasping [85.38689479346276] (2023-04-10)
Current techniques heavily rely on a reference 3D object, limiting their generalizability and making it expensive to expand to new object categories.
This paper proposes ShapeShift, a superquadric-based framework for object pose estimation that predicts the object's pose relative to a primitive shape which is fitted to the object.
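Superquadrics admit a closed-form inside-outside function, which is what makes
them convenient shape primitives for pose estimation. The sketch below shows
the standard formulation and a common least-squares fitting residual; this is
the classical fitting view, not ShapeShift's actual pipeline.

```python
import numpy as np

def superquadric_F(points, a, eps):
    """Inside-outside function of a superquadric with semi-axes
    a = (a1, a2, a3) and shape exponents eps = (e1, e2):
    F < 1 inside, F = 1 on the surface, F > 1 outside."""
    x, y, z = (np.abs(points) / np.asarray(a)).T
    e1, e2 = eps
    return (x ** (2 / e2) + y ** (2 / e2)) ** (e2 / e1) + z ** (2 / e1)

def fit_loss(points, a, eps):
    """A common least-squares fitting residual on surface samples."""
    return np.mean((superquadric_F(points, a, eps) ** eps[0] - 1.0) ** 2)

# Points on the axes of an ellipsoid-like superquadric give ~zero loss.
pts = np.array([[1.0, 0, 0], [0, 2.0, 0], [0, 0, 0.5]])
print(fit_loss(pts, a=(1.0, 2.0, 0.5), eps=(0.9, 0.9)))  # ~0.0
```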
- Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887] (2021-08-30)
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338] (2020-06-24)
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations, named forward kinematics, camera projection, and spatial-map transformation.
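A toy numeric version of such a transformation chain, a planar two-link arm
composed with a pinhole camera, both drastic simplifications of the paper's
learned, differentiable modules:

```python
import numpy as np

def forward_kinematics(angles, lengths):
    """Planar chain embedded in 3D: joint angles -> 3D joint positions."""
    pts, p, theta = [np.zeros(3)], np.zeros(3), 0.0
    for ang, length in zip(angles, lengths):
        theta += ang
        p = p + length * np.array([np.cos(theta), np.sin(theta), 0.0])
        pts.append(p)
    return np.stack(pts)

def camera_projection(joints_3d, f=1.0, depth=5.0):
    """Pinhole projection after translating the chain 'depth' along z."""
    z = joints_3d[:, 2] + depth
    return f * joints_3d[:, :2] / z[:, None]

# Toy usage: a two-link arm projected to 2D keypoints.
keypoints_2d = camera_projection(forward_kinematics([0.3, -0.5], [1.0, 0.8]))
```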
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.