TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation
- URL: http://arxiv.org/abs/2211.09325v3
- Date: Thu, 2 May 2024 16:04:19 GMT
- Title: TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation
- Authors: Chuer Pan, Brian Okorn, Harry Zhang, Ben Eisner, David Held
- Abstract summary: We propose a vision-based system that learns to estimate the cross-pose between two objects for a given manipulation task.
We demonstrate our method's capability to generalize to unseen objects, in some cases after training on only 10 demonstrations in the real world.
- Score: 14.011777717620282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How do we imbue robots with the ability to efficiently manipulate unseen objects and transfer relevant skills based on demonstrations? End-to-end learning methods often fail to generalize to novel objects or unseen configurations. Instead, we focus on the task-specific pose relationship between relevant parts of interacting objects. We conjecture that this relationship is a generalizable notion of a manipulation task that can transfer to new objects in the same category; examples include the relationship between the pose of a pan relative to an oven or the pose of a mug relative to a mug rack. We call this task-specific pose relationship "cross-pose" and provide a mathematical definition of this concept. We propose a vision-based system that learns to estimate the cross-pose between two objects for a given manipulation task using learned cross-object correspondences. The estimated cross-pose is then used to guide a downstream motion planner to manipulate the objects into the desired pose relationship (placing a pan into the oven or the mug onto the mug rack). We demonstrate our method's capability to generalize to unseen objects, in some cases after training on only 10 demonstrations in the real world. Results show that our system achieves state-of-the-art performance in both simulated and real-world experiments across a number of tasks. Supplementary information and videos can be found at https://sites.google.com/view/tax-pose/home.
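The core step the abstract describes, turning learned cross-object correspondences into a rigid "cross-pose", can be illustrated with a standard weighted Procrustes (SVD) alignment. The sketch below is a minimal illustration under that assumption, not the authors' implementation; the corresponding points and per-point weights are placeholders for what a learned model would predict.

```python
import numpy as np

def weighted_procrustes(src_pts, tgt_pts, weights):
    """Least-squares rigid transform (R, t) aligning weighted correspondences.

    src_pts: (N, 3) points on the object to be placed (e.g., the mug)
    tgt_pts: (N, 3) corresponding target points expressed relative to the
             other object (e.g., the rack); here assumed given, in practice
             predicted by a learned correspondence model
    weights: (N,) non-negative per-correspondence confidences
    """
    w = weights / (weights.sum() + 1e-8)
    src_mean = (w[:, None] * src_pts).sum(axis=0)
    tgt_mean = (w[:, None] * tgt_pts).sum(axis=0)
    src_c = src_pts - src_mean
    tgt_c = tgt_pts - tgt_mean

    # Weighted cross-covariance and its SVD give the optimal rotation.
    H = (w[:, None] * src_c).T @ tgt_c
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t
```

The resulting (R, t) would then be handed to a downstream motion planner to move the object into the desired relationship (e.g., the mug onto the rack).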
Related papers
- ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions [66.20773952864802]
We develop a dataset consisting of 8.5k images and 59.3k inferences about actions grounded in those images.
We propose ActionCOMET, a framework for discerning the knowledge in language models that is specific to the provided visual input.
arXiv Detail & Related papers (2024-10-17T15:22:57Z) - Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors [30.579707929061026]
Our work explores the grounding of fine-grained part descriptors for precise manipulation in a zero-shot setting.
We tackle the problem by framing it as a dense semantic part correspondence task.
Our model returns a gripper pose for manipulating a specific part, using as reference a user-defined click from a source image of a visually different instance of the same object.
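A minimal sketch of the dense part-correspondence idea just described, assuming dense per-pixel descriptors are already available (the paper derives them from visual diffusion features): the descriptor at the user's click is matched against every pixel of a different instance to localize the analogous part, around which a gripper pose would then be computed.

```python
import numpy as np

def transfer_click(src_feats, click_uv, tgt_feats):
    """Find the target pixel whose descriptor best matches a clicked source pixel.

    src_feats: (H, W, D) dense descriptors of the source image (assumed given)
    click_uv:  (row, col) of the user-defined click on the source image
    tgt_feats: (H, W, D) dense descriptors of the target instance
    """
    query = src_feats[click_uv[0], click_uv[1]]                       # (D,)
    # Cosine similarity between the clicked descriptor and every target pixel.
    tgt = tgt_feats / (np.linalg.norm(tgt_feats, axis=-1, keepdims=True) + 1e-8)
    q = query / (np.linalg.norm(query) + 1e-8)
    sim = tgt @ q                                                     # (H, W)
    best = np.unravel_index(np.argmax(sim), sim.shape)
    return best  # matched pixel on the target; a grasp pose is computed around it
```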
arXiv Detail & Related papers (2024-03-21T16:26:19Z) - ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics [55.85916671269219]
This paper introduces ManiPose, a pioneering benchmark designed to advance the study of pose-varying manipulation tasks.
The benchmark's comprehensive dataset features geometrically consistent, manipulation-oriented 6D pose labels for 2936 real-world scanned rigid objects and 100 articulated objects.
Our benchmark demonstrates notable advancements in pose estimation, pose-aware manipulation, and real-robot skill transfer.
arXiv Detail & Related papers (2024-03-20T07:48:32Z) - Few-Shot In-Context Imitation Learning via Implicit Graph Alignment [15.215659641228655]
We formulate imitation learning as a conditional alignment problem between graph representations of objects.
We show that this conditioning allows for in-context learning, where a robot can perform a task on a set of new objects immediately after the demonstrations.
arXiv Detail & Related papers (2023-10-18T18:26:01Z) - ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z) - LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding [43.157518990171674]
Humans excel at acquiring knowledge through observation.
A key step in acquiring this skill is identifying which part of an object affords each action, a problem known as affordance grounding.
We propose a framework called LOCATE that can identify matching object parts across images.
arXiv Detail & Related papers (2023-03-16T21:47:49Z) - Interacting Hand-Object Pose Estimation via Dense Mutual Attention [97.26400229871888]
3D hand-object pose estimation is the key to the success of many computer vision applications.
We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object.
Our method is able to produce physically plausible poses with high quality and real-time inference speed.
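At a high level, the dense mutual attention summarized above resembles cross-attention applied in both directions between hand and object features. The PyTorch sketch below shows only that generic pattern; the paper's actual graph construction and architecture differ.

```python
import torch
import torch.nn as nn

class MutualAttention(nn.Module):
    """Generic bidirectional cross-attention between two point-feature sets."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.hand_to_obj = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.obj_to_hand = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hand_feats, obj_feats):
        # hand_feats: (B, Nh, D), obj_feats: (B, No, D)
        # Each set queries the other, so dependencies flow in both directions.
        hand_upd, _ = self.hand_to_obj(hand_feats, obj_feats, obj_feats)
        obj_upd, _ = self.obj_to_hand(obj_feats, hand_feats, hand_feats)
        return hand_feats + hand_upd, obj_feats + obj_upd

# Hypothetical usage with random features:
# hand, obj = torch.randn(1, 128, 64), torch.randn(1, 256, 64)
# hand, obj = MutualAttention(64)(hand, obj)
```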
arXiv Detail & Related papers (2022-11-16T10:01:33Z) - Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects [73.23249640099516]
We learn both the appearance and the structure of previously unseen articulated objects by observing them move from multiple views.
Our insight is that adjacent parts that move relative to each other must be connected by a joint.
We show that our method works for different structures, from quadrupeds, to single-arm robots, to humans.
arXiv Detail & Related papers (2021-12-21T16:37:48Z) - ZePHyR: Zero-shot Pose Hypothesis Rating [36.52070583343388]
We introduce a novel method for zero-shot object pose estimation in clutter.
Our approach uses a hypothesis generation and scoring framework, with a focus on learning a scoring function that generalizes to objects not used for training.
We demonstrate how our system can be used by quickly scanning and building a model of a novel object, which can immediately be used by our method for pose estimation.
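The hypothesis-generation-and-scoring pattern described above can be sketched generically: sample candidate poses, score each with a learned function, and keep the best. Both the hypothesis source and the scoring function below are placeholders, not ZePHyR's actual components.

```python
import numpy as np

def rate_hypotheses(hypotheses, score_fn, observation):
    """Return the best-scoring pose hypothesis.

    hypotheses:  list of candidate 4x4 object poses (however generated)
    score_fn:    learned scoring function; trained to generalize to objects
                 not seen during training
    observation: sensor data (e.g., an RGB-D crop) the score is conditioned on
    """
    scores = [score_fn(pose, observation) for pose in hypotheses]
    best = int(np.argmax(scores))
    return hypotheses[best], scores[best]
```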
arXiv Detail & Related papers (2021-04-28T01:48:39Z) - Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information for elementary actions, such as pushing or pulling, on articulated objects with movable parts.
We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation.
Our learned models even transfer to real-world data.
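A rough sketch of the per-point actionability idea: a network scores every point of an articulated object's point cloud for one primitive action (e.g., "pull"). The toy MLP backbone below is an assumption for illustration; the paper trains a point-cloud network through interaction in simulation.

```python
import torch
import torch.nn as nn

class ActionabilityHead(nn.Module):
    """Toy per-point actionability scorer for one primitive action."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Placeholder per-point feature backbone (the paper uses a point-cloud network).
        self.backbone = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU(),
                                      nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, points):            # points: (B, N, 3)
        feats = self.backbone(points)     # (B, N, feat_dim)
        return torch.sigmoid(self.score(feats)).squeeze(-1)  # (B, N) actionability
```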
arXiv Detail & Related papers (2021-01-07T18:56:38Z) - Manipulation-Oriented Object Perception in Clutter through Affordance Coordinate Frames [10.90648422740674]
In this work, we combine the notions of affordance and category-level pose, and introduce the Affordance Coordinate Frame (ACF).
With ACF, we represent each object class in terms of individual affordance parts and the compatibility between them, where each part is associated with a part category-level pose for robot manipulation.
In our experiments, we demonstrate that ACF outperforms state-of-the-art methods for object detection, as well as category-level pose estimation for object parts.
arXiv Detail & Related papers (2020-10-16T07:24:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.