DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Manipulation
- URL: http://arxiv.org/abs/2207.01971v6
- Date: Mon, 27 Mar 2023 04:10:14 GMT
- Title: DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Manipulation
- Authors: Yan Zhao, Ruihai Wu, Zhehuan Chen, Yourong Zhang, Qingnan Fan, Kaichun Mo, Hao Dong
- Abstract summary: We propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks.
The core design of the approach is to reduce the quadratic problem for two grippers into two disentangled yet interconnected subtasks for efficient learning.
- Score: 14.964836973282594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is essential yet challenging for future home-assistant robots to understand and manipulate diverse 3D objects in daily human environments. Towards building scalable systems that can perform diverse manipulation tasks over various 3D shapes, recent works have advocated and demonstrated promising results in learning visual actionable affordance, which labels every point over the input 3D geometry with the likelihood that acting at that point accomplishes the downstream task (e.g., pushing or picking up). However, these works studied only single-gripper manipulation, yet many real-world tasks require two hands working collaboratively. In this work, we propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks. The core design of the approach is to reduce the quadratic problem for two grippers into two disentangled yet interconnected subtasks for efficient learning. Using the large-scale PartNet-Mobility and ShapeNet datasets, we set up four benchmark tasks for dual-gripper manipulation. Experiments demonstrate the effectiveness and superiority of our method over three baselines.
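The core design above lends itself to a small sketch: a first network scores per-point affordance for one gripper, and a second network scores the other gripper's affordance conditioned on the first gripper's chosen contact, so the joint two-gripper search is never enumerated quadratically. The PyTorch sketch below is a minimal illustration, not the authors' code; the encoder, feature sizes, and the use of the raw contact point as the conditioning signal are all assumptions.

```python
# Minimal sketch (not the authors' code) of the disentangled-yet-conditioned
# design described in the abstract. Module names and sizes are assumptions.
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Stand-in per-point feature extractor (a real system might use PointNet++)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
    def forward(self, xyz):              # xyz: (B, N, 3)
        return self.mlp(xyz)             # (B, N, F)

class AffordanceHead(nn.Module):
    """Scores each point's likelihood of being a good contact, optionally
    conditioned on the other gripper's chosen contact point."""
    def __init__(self, feat_dim=128, cond_dim=0):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim + cond_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, feats, cond=None): # feats: (B, N, F), cond: (B, C)
        if cond is not None:
            cond = cond.unsqueeze(1).expand(-1, feats.shape[1], -1)
            feats = torch.cat([feats, cond], dim=-1)
        return self.head(feats).squeeze(-1)   # (B, N) per-point logits

encoder = PointEncoder()
first_head = AffordanceHead()                  # gripper 1: unconditioned
second_head = AffordanceHead(cond_dim=3)       # gripper 2: sees gripper 1's point

pts = torch.rand(2, 1024, 3)                   # a batch of object point clouds
feats = encoder(pts)
a1 = first_head(feats)                         # affordance map for gripper 1
p1 = pts[torch.arange(2), a1.argmax(dim=1)]    # gripper 1's chosen contact (B, 3)
a2 = second_head(feats, cond=p1)               # gripper 2's map, conditioned on p1
```

Conditioning the second head on the first head's output is what makes the two subtasks "disentangled yet interconnected": each network solves a linear-sized problem, but the second still sees the first's decision.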
Related papers
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning approach for robots to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
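One plausible reading of the GSR summary above, sketched under loud assumptions: states from suboptimal demonstrations become graph nodes, observed transitions become edges, near-identical states from different demos can be stitched with zero-cost edges, and a shortest-path search retrieves an action sequence better than any single demonstration. The state-merging predicate, costs, and toy states below are all illustrative.

```python
# A hedged sketch of graph search and retrieval over suboptimal demos.
from collections import defaultdict
import heapq

def build_graph(demos, is_close):
    """demos: list of [(state, action), ...]; is_close: state-merging predicate."""
    edges = defaultdict(list)          # state -> [(next_state, action, cost)]
    states = [s for demo in demos for (s, _) in demo]
    for demo in demos:
        for (s, a), (s_next, _) in zip(demo, demo[1:]):
            edges[s].append((s_next, a, 1.0))
    # Stitch demos: zero-cost edges between states that are effectively the same.
    for s in states:
        for t in states:
            if s != t and is_close(s, t):
                edges[s].append((t, None, 0.0))
    return edges

def retrieve_plan(edges, start, goal):
    """Dijkstra over the demo graph; returns the retrieved action sequence."""
    frontier = [(0.0, start, [])]
    seen = set()
    while frontier:
        cost, s, plan = heapq.heappop(frontier)
        if s == goal:
            return plan
        if s in seen:
            continue
        seen.add(s)
        for s_next, a, c in edges[s]:
            heapq.heappush(frontier, (cost + c, s_next, plan + ([a] if a else [])))
    return None

# Toy usage with hashable (e.g., discretized) states; the two demos share state "B".
demos = [[("A", "right"), ("B", "up"), ("C", None)],
         [("B", "left"), ("D", "up"), ("goal", None)]]
edges = build_graph(demos, is_close=lambda s, t: False)
print(retrieve_plan(edges, "A", "goal"))   # ['right', 'left', 'up'], stitched at "B"
```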
- A Unified Framework for 3D Scene Understanding [50.6762892022386]
UniSeg3D is a unified 3D segmentation framework that performs panoptic, semantic, instance, interactive, referring, and open-vocabulary semantic segmentation within a single model.
It facilitates inter-task knowledge sharing and promotes comprehensive 3D scene understanding.
Experiments on three benchmarks (ScanNet20, ScanRefer, and ScanNet200) demonstrate that UniSeg3D consistently outperforms current SOTA methods.
arXiv Detail & Related papers (2024-07-03T16:50:07Z)
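A speculative sketch of the single-model, multi-task design the UniSeg3D entry describes: one shared point encoder feeds lightweight per-task heads, so features (and hence knowledge) are shared across the six tasks. The architecture below is an illustration, not the paper's actual design.

```python
# One shared backbone, several task heads: an assumed stand-in, not UniSeg3D itself.
import torch
import torch.nn as nn

TASKS = ["panoptic", "semantic", "instance", "interactive", "referring", "open_vocab"]

class UnifiedSegmenter(nn.Module):
    def __init__(self, num_classes=20, feat_dim=96):
        super().__init__()
        self.backbone = nn.Sequential(          # shared per-point encoder
            nn.Linear(6, 64), nn.ReLU(),        # xyz + rgb input
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # One small head per task, all reading the same shared features.
        self.heads = nn.ModuleDict({
            t: nn.Linear(feat_dim, num_classes) for t in TASKS
        })
    def forward(self, points, task):            # points: (B, N, 6)
        feats = self.backbone(points)
        return self.heads[task](feats)           # (B, N, num_classes) logits

model = UnifiedSegmenter()
cloud = torch.rand(1, 2048, 6)
sem = model(cloud, "semantic")
inst = model(cloud, "instance")                  # same features, different head
```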
- The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning [60.91637862768949]
We propose Masked Multimodal Learning (M3L) to fuse visual and tactile information in a reinforcement learning setting.
M3L learns a policy and visual-tactile representations based on masked autoencoding.
We evaluate M3L on three simulated environments with both visual and tactile observations.
arXiv Detail & Related papers (2023-11-02T01:33:00Z)
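Masked autoencoding over two modalities, as the M3L entry describes, is concrete enough to sketch: visual and tactile inputs are tokenized into one sequence, a random subset of tokens is replaced by a learned mask token, and the model is trained to reconstruct the missing content. Token shapes, the mask ratio, and the transformer size below are assumptions, not M3L's actual configuration.

```python
# A minimal masked multimodal autoencoding sketch; all sizes are assumptions.
import torch
import torch.nn as nn

class MaskedMultimodalAE(nn.Module):
    def __init__(self, vis_dim=64, tac_dim=16, d_model=128, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed_vis = nn.Linear(vis_dim, d_model)
        self.embed_tac = nn.Linear(tac_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode_vis = nn.Linear(d_model, vis_dim)
        self.decode_tac = nn.Linear(d_model, tac_dim)

    def forward(self, vis_tokens, tac_tokens):
        # Embed both modalities into one token sequence.
        tokens = torch.cat([self.embed_vis(vis_tokens),
                            self.embed_tac(tac_tokens)], dim=1)  # (B, Nv+Nt, D)
        B, N, D = tokens.shape
        # Randomly replace a fraction of tokens with the learned mask token.
        mask = torch.rand(B, N, device=tokens.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(B, N, D), tokens)
        h = self.encoder(tokens)
        Nv = vis_tokens.shape[1]
        return self.decode_vis(h[:, :Nv]), self.decode_tac(h[:, Nv:]), mask

model = MaskedMultimodalAE()
vis = torch.rand(2, 16, 64)       # e.g., 16 image patches per step
tac = torch.rand(2, 4, 16)        # e.g., 4 tactile sensor readings
rec_vis, rec_tac, mask = model(vis, tac)
# Reconstruction loss on masked positions only, as in masked autoencoding:
loss = ((rec_vis - vis) ** 2)[mask[:, :16]].mean() + \
       ((rec_tac - tac) ** 2)[mask[:, 16:]].mean()
```

In an RL setting like the one described, the encoder's latent would double as the policy's visual-tactile representation.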
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with their environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
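The fine-tuning recipe this entry describes can be sketched as follows: a pre-trained visual encoder is kept (here frozen) while a small shared decoder and per-task heads are trained on human-oriented auxiliary tasks, so the resulting features better support downstream manipulation policies. The task names ("affordance", "hand_pose") and all shapes are hypothetical placeholders.

```python
# A hedged sketch of multi-task fine-tuning on top of a pre-trained encoder.
import torch
import torch.nn as nn

encoder = nn.Sequential(                 # stand-in for a pre-trained visual encoder
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in encoder.parameters():
    p.requires_grad = False              # keep the pre-trained weights fixed

fusion_decoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
task_heads = nn.ModuleDict({
    "affordance": nn.Linear(64, 10),     # hypothetical human-oriented tasks
    "hand_pose": nn.Linear(64, 21 * 3),
})

opt = torch.optim.Adam(
    list(fusion_decoder.parameters()) + list(task_heads.parameters()), lr=1e-4)

images = torch.rand(4, 3, 64, 64)
targets = {"affordance": torch.rand(4, 10), "hand_pose": torch.rand(4, 63)}
feats = fusion_decoder(encoder(images))
loss = sum(nn.functional.mse_loss(task_heads[t](feats), targets[t])
           for t in task_heads)          # iterate task names, sum task losses
loss.backward()
opt.step()
```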
- Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning [33.68311764817763]
We propose Prompted Contrast with Masked Motion Modeling, PCM$^3$, for versatile 3D action representation learning.
Our method integrates the contrastive learning and masked prediction tasks in a mutually beneficial manner.
Tests on five downstream tasks across three large-scale datasets demonstrate the superior generalization capacity of PCM$^3$ compared to state-of-the-art works.
arXiv Detail & Related papers (2023-08-08T01:27:55Z)
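Integrating contrastive learning with masked prediction "in a mutually beneficial manner", as the PCM$^3$ entry puts it, suggests at minimum a joint objective over both terms. The sketch below shows such a combination on toy skeleton sequences; the GRU encoder, augmentations, masking scheme, and loss weighting are assumptions rather than the paper's design.

```python
# Contrastive + masked-prediction joint objective on toy skeleton data.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.GRU(input_size=75, hidden_size=128, batch_first=True)  # 25 joints * xyz
mask_head = nn.Linear(128, 75)          # reconstructs masked frames
proj = nn.Linear(128, 64)               # projection for the contrastive term

def embed(x):                           # x: (B, T, 75) -> (B, 128)
    _, h = encoder(x)
    return h.squeeze(0)

seq = torch.rand(8, 30, 75)             # batch of skeleton sequences
view1 = seq + 0.01 * torch.randn_like(seq)    # two augmented views (toy noise)
view2 = seq + 0.01 * torch.randn_like(seq)

# Contrastive term: matching views of the same sequence should align (InfoNCE).
z1 = F.normalize(proj(embed(view1)), dim=1)
z2 = F.normalize(proj(embed(view2)), dim=1)
logits = z1 @ z2.T / 0.1
contrastive = F.cross_entropy(logits, torch.arange(8))

# Masked-prediction term: zero out some frames, predict their content.
mask = torch.rand(8, 30) < 0.3
masked = seq.clone()
masked[mask] = 0.0
out, _ = encoder(masked)                # per-frame features (B, T, 128)
recon = mask_head(out)
masked_pred = ((recon - seq) ** 2)[mask].mean()

loss = contrastive + masked_pred        # the two tasks are optimized jointly
```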
- Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation [11.608682595506354]
TaskPrompter presents an innovative multi-task prompting framework.
It unifies the learning of (i) task-generic representations, (ii) task-specific representations, and (iii) cross-task interactions.
The new benchmark requires the multi-task model to concurrently generate predictions for monocular 3D vehicle detection, semantic segmentation, and monocular depth estimation.
arXiv Detail & Related papers (2023-04-03T13:41:35Z)
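A speculative sketch of the multi-task prompting idea in the TaskPrompter entry: learnable task-specific prompt tokens are processed jointly with shared image tokens in one transformer, so task-generic features, task-specific features, and cross-task interaction all live in a single forward pass. Everything below (token sizes, pooling, heads) is an illustrative assumption, not the paper's architecture.

```python
# Learnable per-task prompts joined with shared image tokens; sizes assumed.
import torch
import torch.nn as nn

TASKS = ["det3d", "semseg", "depth"]

class PromptedMultiTask(nn.Module):
    def __init__(self, d_model=128, n_prompts=4):
        super().__init__()
        self.patch_embed = nn.Linear(16 * 16 * 3, d_model)
        self.prompts = nn.ParameterDict({           # task-specific prompt tokens
            t: nn.Parameter(torch.randn(n_prompts, d_model) * 0.02) for t in TASKS
        })
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.heads = nn.ModuleDict({t: nn.Linear(d_model, 1) for t in TASKS})
        self.n_prompts = n_prompts

    def forward(self, patches):                     # patches: (B, N, 768)
        B = patches.shape[0]
        tokens = self.patch_embed(patches)
        # All tasks' prompts attend to the same image tokens in one pass,
        # which is where cross-task interaction can happen.
        all_prompts = torch.cat(
            [self.prompts[t].expand(B, -1, -1) for t in TASKS], dim=1)
        h = self.blocks(torch.cat([all_prompts, tokens], dim=1))
        out = {}
        for i, t in enumerate(TASKS):
            task_tok = h[:, i * self.n_prompts:(i + 1) * self.n_prompts].mean(dim=1)
            out[t] = self.heads[t](task_tok)        # one toy scalar per task
        return out

model = PromptedMultiTask()
preds = model(torch.rand(2, 64, 16 * 16 * 3))       # dict of per-task predictions
```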
- End-to-End Affordance Learning for Robotic Manipulation [4.405918052597016]
Learning to manipulate 3D objects in an interactive environment has been a challenging problem in reinforcement learning.
Visual affordance has shown great prospects in providing object-centric information priors with effective actionable semantics.
In this study, we take advantage of visual affordance by using the contact information generated during the RL training process to predict contact maps of interest.
arXiv Detail & Related papers (2022-09-26T18:24:28Z)
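The mechanism named in the entry above, contacts generated during RL training supervising a contact-map predictor, can be sketched directly: label object points near observed contacts as positive and train a per-point classifier whose output serves as an affordance prior. The distance threshold, shapes, and network below are assumptions.

```python
# Contacts logged during RL rollouts become labels for a per-point contact map.
import torch
import torch.nn as nn

contact_net = nn.Sequential(              # per-point contact-likelihood predictor
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(contact_net.parameters(), lr=1e-3)

def contact_labels(points, contacts, radius=0.05):
    """Label a point positive if any rollout contact occurred within `radius`."""
    d = torch.cdist(points, contacts)              # (N, C) pairwise distances
    return (d.min(dim=1).values < radius).float()  # (N,)

points = torch.rand(1024, 3)                       # object point cloud
contacts = torch.rand(20, 3)                       # contacts logged during RL
labels = contact_labels(points, contacts)

logits = contact_net(points).squeeze(-1)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
opt.step()                                         # map doubles as affordance prior
```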
- Cross-task Attention Mechanism for Dense Multi-task Learning [16.040894192229043]
We jointly address 2D semantic segmentation and two geometry-related tasks, namely dense depth and surface normal estimation.
We propose a novel multi-task learning architecture that exploits pair-wise cross-task exchange through correlation-guided attention and self-attention.
arXiv Detail & Related papers (2022-06-17T17:59:45Z)
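Pair-wise cross-task exchange through attention, as the entry above describes, is sketchable with standard components: features from one dense task attend to another's, with a residual keeping the original signal. The module below uses stock multi-head attention as a stand-in for the paper's correlation-guided variant.

```python
# Cross-task feature exchange via attention; a stand-in, not the paper's module.
import torch
import torch.nn as nn

class CrossTaskExchange(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, feat_a, feat_b):     # (B, HW, D) token maps for two tasks
        # Task A's features query task B's; the residual keeps A's own signal.
        exchanged, _ = self.attn(query=feat_a, key=feat_b, value=feat_b)
        return self.norm(feat_a + exchanged)

exchange = CrossTaskExchange()
seg_feats = torch.rand(2, 256, 64)          # e.g., 16x16 spatial tokens
depth_feats = torch.rand(2, 256, 64)
seg_refined = exchange(seg_feats, depth_feats)   # segmentation borrows depth cues
```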
- Learning Object Manipulation Skills via Approximate State Estimation from Real Videos [47.958512470724926]
Humans are adept at learning new tasks by watching a few instructional videos.
On the other hand, robots that learn new actions either require extensive trial and error or rely on expert demonstrations that are challenging to obtain.
In this paper, we explore a method that facilitates learning object manipulation skills directly from videos.
arXiv Detail & Related papers (2020-11-13T08:53:47Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.