DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Manipulation
- URL: http://arxiv.org/abs/2207.01971v6
- Date: Mon, 27 Mar 2023 04:10:14 GMT
- Title: DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Manipulation
- Authors: Yan Zhao, Ruihai Wu, Zhehuan Chen, Yourong Zhang, Qingnan Fan, Kaichun Mo, Hao Dong
- Abstract summary: We propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks.
The core design of the approach is to reduce the quadratic problem for two grippers into two disentangled yet interconnected subtasks for efficient learning.
- Score: 14.964836973282594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is essential yet challenging for future home-assistant robots to
understand and manipulate diverse 3D objects in daily human environments.
Towards building scalable systems that can perform diverse manipulation tasks
over various 3D shapes, recent works have advocated and demonstrated promising
results in learning visual actionable affordance, which labels every point over
the input 3D geometry with an action likelihood of accomplishing the downstream
task (e.g., pushing or picking-up). However, these works only studied
single-gripper manipulation tasks, yet many real-world tasks require two hands
to accomplish collaboratively. In this work, we propose a novel learning
framework, DualAfford, to learn collaborative affordance for dual-gripper
manipulation tasks. The core design of the approach is to reduce the quadratic
problem for two grippers into two disentangled yet interconnected subtasks for
efficient learning. Using the large-scale PartNet-Mobility and ShapeNet
datasets, we set up four benchmark tasks for dual-gripper manipulation.
Experiments prove the effectiveness and superiority of our method over three
baselines.
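To make the disentangled-yet-interconnected design concrete, below is a minimal PyTorch sketch (our own illustration under assumptions, not the authors' released code): a first affordance network scores every point of the input geometry for one gripper, and a second network re-scores the points conditioned on the first gripper's chosen contact point, so the quadratic joint search over two contact points reduces to two sequential per-point passes. The module names (PointEncoder, AffordanceHead) and the simple MLP encoder are placeholders.

```python
# Minimal sketch (an assumption about the general pipeline, not DualAfford's
# released code): gripper 1 is scored unconditionally, gripper 2 is scored
# conditioned on gripper 1's contact point.
import torch
import torch.nn as nn


class PointEncoder(nn.Module):
    """Per-point feature extractor (a PointNet-style stand-in)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, out_dim), nn.ReLU(),
        )

    def forward(self, xyz):          # xyz: (B, N, 3)
        return self.mlp(xyz)         # (B, N, out_dim)


class AffordanceHead(nn.Module):
    """Scores each point with an action likelihood, optionally conditioned
    on the other gripper's contact point."""
    def __init__(self, feat_dim=128, cond_dim=0):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim + cond_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, feats, cond=None):       # feats: (B, N, F)
        if cond is not None:                   # cond: (B, C), broadcast per point
            cond = cond.unsqueeze(1).expand(-1, feats.size(1), -1)
            feats = torch.cat([feats, cond], dim=-1)
        return torch.sigmoid(self.score(feats)).squeeze(-1)   # (B, N)


# Usage: score gripper 1 first, then condition gripper 2 on its pick.
encoder = PointEncoder()
head1 = AffordanceHead()                       # first gripper, unconditioned
head2 = AffordanceHead(cond_dim=3)             # second gripper, sees gripper 1's point

points = torch.rand(1, 2048, 3)                # a toy object point cloud
feats = encoder(points)
aff1 = head1(feats)                            # (1, 2048) likelihood per point
p1 = points[0, aff1[0].argmax()]               # gripper 1's contact point
aff2 = head2(feats, cond=p1.unsqueeze(0))      # gripper 2's map, conditioned on p1
```

Conditioning the second head on the first gripper's output is what keeps the two subtasks interconnected while letting each be learned on its own.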
Related papers
- SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending [79.83865372778273]
We introduce SkillBlender, a novel hierarchical reinforcement learning framework for versatile humanoid loco-manipulation.
SkillBlender first pretrains goal-conditioned task-agnostic primitive skills, and then dynamically blends these skills to accomplish complex loco-manipulation tasks.
We also introduce SkillBench, a parallel, cross-embodiment, and diverse simulated benchmark containing three embodiments, four primitive skills, and eight challenging loco-manipulation tasks.
arXiv Detail & Related papers (2025-06-11T03:24:26Z)
- VTAO-BiManip: Masked Visual-Tactile-Action Pre-training with Object Understanding for Bimanual Dexterous Manipulation [8.882764358932276]
Bimanual dexterous manipulation remains a significant challenge in robotics due to the high DoFs of each hand and the coordination between them.
Existing single-hand manipulation techniques often leverage human demonstrations to guide RL methods but fail to generalize to complex bimanual tasks involving multiple sub-skills.
We introduce VTAO-BiManip, a novel framework that combines visual-tactile-action pretraining with object understanding to facilitate curriculum RL to enable human-like bimanual manipulation.
arXiv Detail & Related papers (2025-01-07T08:14:53Z)
- S2O: Static to Openable Enhancement for Articulated 3D Objects [20.310491257189422]
We introduce the static to openable (S2O) task which creates interactive articulated 3D objects from static counterparts.
Our work enables efficient creation of interactive 3D objects for robotic manipulation and embodied AI tasks.
arXiv Detail & Related papers (2024-09-27T16:34:13Z)
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning algorithm for a robot to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- A Unified Framework for 3D Scene Understanding [50.6762892022386]
UniSeg3D is a unified 3D segmentation framework that achieves panoptic, semantic, instance, interactive, referring, and open-vocabulary semantic segmentation tasks within a single model.
It facilitates inter-task knowledge sharing and promotes comprehensive 3D scene understanding.
Experiments on three benchmarks, ScanNet20, ScanRefer, and ScanNet200, demonstrate that UniSeg3D consistently outperforms current SOTA methods.
arXiv Detail & Related papers (2024-07-03T16:50:07Z)
- Twisting Lids Off with Two Hands [82.21668778600414]
We show how policies trained in simulation can be effectively and efficiently transferred to the real world.
Specifically, we consider the problem of twisting lids of various bottle-like objects with two hands.
This is the first sim-to-real RL system that enables such capabilities on bimanual multi-fingered hands.
arXiv Detail & Related papers (2024-03-04T18:59:30Z)
- WHU-Synthetic: A Synthetic Perception Dataset for 3-D Multitask Model Research [9.945833036861892]
WHU-Synthetic is a large-scale 3D synthetic perception dataset designed for multi-task learning.
We implement several novel settings, making it possible to realize certain ideas that are difficult to achieve in real-world scenarios.
arXiv Detail & Related papers (2024-02-29T11:38:44Z)
- The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning [60.91637862768949]
We propose Masked Multimodal Learning (M3L) to fuse visual and tactile information in a reinforcement learning setting.
M3L learns a policy and visual-tactile representations based on masked autoencoding.
We evaluate M3L on three simulated environments with both visual and tactile observations.
arXiv Detail & Related papers (2023-11-02T01:33:00Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning [33.68311764817763]
We propose Prompted Contrast with Masked Motion Modeling, PCM$^{\rm 3}$, for versatile 3D action representation learning.
Our method integrates the contrastive learning and masked prediction tasks in a mutually beneficial manner.
Tests on five downstream tasks across three large-scale datasets demonstrate the superior generalization capacity of PCM$^{\rm 3}$ compared to state-of-the-art works.
arXiv Detail & Related papers (2023-08-08T01:27:55Z)
- Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation [11.608682595506354]
TaskPrompter presents an innovative multi-task prompting framework.
It unifies the learning of (i) task-generic representations, (ii) task-specific representations, and (iii) cross-task interactions.
The new benchmark requires the multi-task model to concurrently generate predictions for monocular 3D vehicle detection, semantic segmentation, and monocular depth estimation.
arXiv Detail & Related papers (2023-04-03T13:41:35Z)
- End-to-End Affordance Learning for Robotic Manipulation [4.405918052597016]
Learning to manipulate 3D objects in an interactive environment has been a challenging problem in Reinforcement Learning.
Visual affordance has shown great prospects in providing object-centric information priors with effective actionable semantics.
In this study, we take advantage of visual affordance by using the contact information generated during the RL training process to predict contact maps of interest.
arXiv Detail & Related papers (2022-09-26T18:24:28Z)
- DenseMTL: Cross-task Attention Mechanism for Dense Multi-task Learning [18.745373058797714]
We propose a novel multi-task learning architecture that leverages pairwise cross-task exchange through correlation-guided attention and self-attention.
We conduct extensive experiments across three multi-task setups, showing the advantages of our approach compared to competitive baselines in both synthetic and real-world benchmarks.
arXiv Detail & Related papers (2022-06-17T17:59:45Z)
- Learning Object Manipulation Skills via Approximate State Estimation from Real Videos [47.958512470724926]
Humans are adept at learning new tasks by watching a few instructional videos.
On the other hand, robots that learn new actions either require a lot of effort through trial and error, or use expert demonstrations that are challenging to obtain.
In this paper, we explore a method that facilitates learning object manipulation skills directly from videos.
arXiv Detail & Related papers (2020-11-13T08:53:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.