Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics
- URL: http://arxiv.org/abs/2407.00614v1
- Date: Sun, 30 Jun 2024 07:42:57 GMT
- Title: Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics
- Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang
- Abstract summary: Affordance features of objects serve as a bridge in the functional interaction between agents and objects.
We propose a granularity-aware affordance feature extraction method for locating functional affordance areas.
We also use highly activated coarse-grained affordance features in hand-object interaction regions to predict grasp gestures.
Together, these form a complete dexterous robotic functional grasping framework, GAAF-Dex.
- Score: 27.124273762587848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To enable robots to use tools, the initial step is teaching them to employ dexterous gestures to precisely touch the specific areas where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, how to leverage these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we propose a granularity-aware affordance feature extraction method for locating functional affordance areas and predicting dexterous coarse gestures. We study the intrinsic mechanisms of human tool use. On one hand, we use fine-grained affordance features of object-functional finger contact areas to locate functional affordance regions. On the other hand, we use highly activated coarse-grained affordance features in hand-object interaction regions to predict grasp gestures. Additionally, we introduce a model-based post-processing module that includes functional finger coordinate localization, finger-to-end coordinate transformation, and force-feedback-based coarse-to-fine grasping. Together, these form GAAF-Dex, a complete dexterous robotic functional grasping framework that learns Granularity-Aware Affordances from human-object interaction for tool-based Functional grasping in Dexterous robotics. Unlike fully supervised methods that require extensive data annotation, we employ a weakly supervised approach that extracts relevant cues from exocentric (Exo) images of hand-object interactions to supervise feature extraction in egocentric (Ego) images. We have constructed a small-scale dataset, FAH, which includes nearly 6K Exo and Ego images of functional hand-object interactions with 18 commonly used tools performing 6 tasks. Extensive experiments on the dataset demonstrate that our method outperforms state-of-the-art methods. The code will be made publicly available at https://github.com/yangfan293/GAAF-DEX.
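The force-feedback-based coarse-to-fine stage of the post-processing module lends itself to a simple illustration. Below is a minimal sketch, assuming a hand with one closing joint per finger and a per-fingertip force sensor; all names (`read_fingertip_forces`, `set_joint_targets`), thresholds, and step sizes are hypothetical stand-ins, not the GAAF-Dex API.
```python
import numpy as np

# Hypothetical sketch of force-feedback coarse-to-fine grasping: starting from
# a predicted coarse gesture, each finger keeps closing in small increments
# until its fingertip force crosses a contact threshold.

N_FINGERS = 5
FORCE_THRESHOLD = 1.5   # N, assumed per-fingertip contact force
STEP = 0.02             # rad, per-iteration closing increment

def read_fingertip_forces() -> np.ndarray:
    """Stub for a force-sensor read; replace with real hardware I/O."""
    return np.random.uniform(0.0, 2.0, N_FINGERS)

def set_joint_targets(q: np.ndarray) -> None:
    """Stub for sending joint targets to the hand controller."""
    pass

def coarse_to_fine_grasp(q_coarse: np.ndarray, max_iters: int = 200) -> np.ndarray:
    """Close fingers from the coarse gesture until every fingertip makes contact."""
    q = q_coarse.copy()
    for _ in range(max_iters):
        forces = read_fingertip_forces()
        below = forces < FORCE_THRESHOLD   # fingers not yet in contact
        if not below.any():
            break                          # all fingertips loaded: grasp done
        q[below] += STEP                   # keep closing only those fingers
        set_joint_targets(q)
    return q

q_final = coarse_to_fine_grasp(np.zeros(N_FINGERS))
```
The point of the per-finger mask is that fingers already in contact stop moving, so early contacts are not over-squeezed while the remaining fingers finish closing.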
Related papers
- Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting [17.03927416536173]
This paper introduces a method to enhance Interactive Imitation Learning (IIL) by extracting contact points and tracking object movement from video demonstrations.
The approach extends current IIL systems by providing robots with detailed knowledge of both where and how to interact with objects, particularly complex articulated ones like doors and drawers.
arXiv Detail & Related papers (2024-11-05T23:28:57Z)
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models on objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but such open-loop approaches often overlook the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z)
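For intuition, here is a minimal sketch of estimating a rotation axis online from two snapshots of a segmented part's 3D point cloud, assuming point correspondences are available (e.g., from tracking). This is an illustrative reconstruction, not the paper's SAM2-based pipeline: Kabsch alignment recovers the rigid motion, and the rotation axis is the direction left invariant by the rotation.
```python
import numpy as np

def kabsch(P: np.ndarray, Q: np.ndarray):
    """Rigid transform (R, t) minimizing ||R @ p + t - q|| for Nx3 arrays."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

def rotation_axis(R: np.ndarray) -> np.ndarray:
    """Unit axis of a rotation matrix, from the skew-symmetric part of R."""
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    n = np.linalg.norm(axis)
    # Near-identity rotations leave the axis undefined; fall back arbitrarily.
    return axis / n if n > 1e-8 else np.array([0.0, 0.0, 1.0])

# Toy usage: rotate points 20 degrees about z and recover the axis.
rng = np.random.default_rng(0)
P = rng.normal(size=(100, 3))
theta = np.deg2rad(20)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
R_est, _ = kabsch(P, P @ Rz.T)
print(rotation_axis(R_est))   # ~ [0, 0, 1]
```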
- Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models [53.22792173053473]
We introduce an interactive robotic manipulation framework called Polaris.
Polaris integrates perception and interaction by utilizing GPT-4 alongside grounded vision models.
We propose a novel Synthetic-to-Real (Syn2Real) pose estimation pipeline.
arXiv Detail & Related papers (2024-08-15T06:40:38Z)
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI).
Experimental results demonstrate that MPI improves on previous state-of-the-art methods by 10% to 64% on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
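The reward described above reduces to a negative distance to the goal in embedding space. A minimal sketch, with a random linear map standing in for the encoder trained with a time-contrastive objective:
```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in encoder weights; a real system would use a network trained
# time-contrastively on human videos.
W = rng.normal(size=(32, 3 * 64 * 64)) / np.sqrt(3 * 64 * 64)

def embed(obs: np.ndarray) -> np.ndarray:
    """Stand-in for the learned embedding phi(.): flatten and project the image."""
    return W @ obs.reshape(-1)

def reward(obs: np.ndarray, goal: np.ndarray) -> float:
    """r(o_t) = -||phi(o_t) - phi(g)||: grows as the observation nears the goal."""
    return -float(np.linalg.norm(embed(obs) - embed(goal)))

obs = rng.random((3, 64, 64))
goal = rng.random((3, 64, 64))
print(reward(obs, goal))
```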
- How to select and use tools?: Active Perception of Target Objects Using Multimodal Deep Learning [9.677391628613025]
We focus on active perception using multimodal sensorimotor data while a robot interacts with objects.
We construct a deep neural network (DNN) model that learns to recognize object characteristics.
We also examine the contributions of images, force, and tactile data, and show that learning from a variety of multimodal information results in rich perception for tool use.
arXiv Detail & Related papers (2021-06-04T12:49:30Z)
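A toy sketch of the concatenate-then-fuse pattern such a multimodal model might use: per-modality encoders map image, force, and tactile inputs into a shared feature space, and a small fusion head predicts object characteristics. All layer sizes and the fusion design are assumptions, not the paper's architecture.
```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU, applied to a 1-D feature vector."""
    h = np.maximum(0.0, w1 @ x + b1)
    return w2 @ h + b2

# Per-modality encoders project raw inputs to a shared 16-D feature space.
dims = {"image": 64 * 64, "force": 6, "tactile": 16}
enc = {m: (rng.normal(size=(16, d)) / np.sqrt(d), np.zeros(16)) for m, d in dims.items()}

def encode(modality: str, x: np.ndarray) -> np.ndarray:
    w, b = enc[modality]
    return np.maximum(0.0, w @ x.reshape(-1) + b)

# Fusion head maps the concatenated 48-D feature to 8 object-characteristic logits.
w1, b1 = rng.normal(size=(32, 48)) * 0.1, np.zeros(32)
w2, b2 = rng.normal(size=(8, 32)) * 0.1, np.zeros(8)

features = np.concatenate([
    encode("image", rng.random((64, 64))),
    encode("force", rng.random(6)),
    encode("tactile", rng.random(16)),
])
logits = mlp(features, w1, b1, w2, b2)
print(logits.shape)  # (8,)
```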
- TANGO: Commonsense Generalization in Predicting Tool Interactions for Mobile Manipulators [15.61285199988595]
We introduce TANGO, a novel neural model for predicting task-specific tool interactions.
TANGO encodes the world state, comprising objects and the symbolic relationships between them, using a graph neural network.
We show that by augmenting the representation of the environment with pre-trained embeddings derived from a knowledge-base, the model can generalize effectively to novel environments.
arXiv Detail & Related papers (2021-05-05T18:11:57Z)
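A minimal sketch of encoding objects and symbolic relations with graph message passing, in the spirit of TANGO's encoder; the example graph, feature sizes, and update rule are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(1)

objects = ["robot", "mug", "table", "tap"]
# Directed symbolic relations as (source, relation, target) triples.
relations = [("mug", "on", "table"), ("robot", "near", "table"), ("tap", "near", "table")]

idx = {o: i for i, o in enumerate(objects)}
x = rng.normal(size=(len(objects), 8))      # initial node embeddings (e.g. from a KB)

A = np.zeros((len(objects), len(objects)))  # adjacency built from the triples
for src, _, dst in relations:
    A[idx[dst], idx[src]] = 1.0             # messages flow source -> target

W_self = rng.normal(size=(8, 8)) * 0.1
W_msg = rng.normal(size=(8, 8)) * 0.1

def gnn_layer(x: np.ndarray, A: np.ndarray) -> np.ndarray:
    """h_v = ReLU(W_self x_v + W_msg * mean of incoming neighbor features)."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    msgs = (A @ x) / deg
    return np.maximum(0.0, x @ W_self.T + msgs @ W_msg.T)

h = gnn_layer(gnn_layer(x, A), A)           # two rounds of message passing
world_state = h.mean(axis=0)                # pooled graph-level encoding
print(world_state.shape)                    # (8,)
```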
- Learning Visually Guided Latent Actions for Assistive Teleoperation [9.75385535829762]
We develop assistive robots that condition their latent embeddings on visual inputs.
We show that incorporating object detectors pretrained on small amounts of cheap, easy-to-collect structured data enables i) accurately recognizing the current context and ii) generalizing control embeddings to new objects and tasks.
arXiv Detail & Related papers (2021-05-02T23:58:28Z)
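A toy sketch of a visually conditioned latent-action decoder of the kind described above: a 2-DoF human input is decoded into a 7-DoF arm action, with the mapping modulated by object-detector context. The sizes, the detector stub, and the linear decoder are all assumptions.
```python
import numpy as np

rng = np.random.default_rng(7)
W = rng.normal(size=(7, 2 + 4)) * 0.1   # decoder weights: [latent | context] -> action

def visual_context(detections: list) -> np.ndarray:
    """Stand-in for pretrained object-detector output: a 4-D presence vector."""
    known = ["cup", "plate", "fork", "bowl"]
    return np.array([1.0 if k in detections else 0.0 for k in known])

def decode_action(z: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Map a low-dimensional human input to a full robot action, given context."""
    return W @ np.concatenate([z, context])

a = decode_action(np.array([0.5, -0.2]), visual_context(["cup"]))
print(a.shape)   # (7,)
```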
- Learning Dexterous Grasping with Object-Centric Visual Affordances [86.49357517864937]
Dexterous robotic hands are appealing for their agility and human-like morphology.
We introduce an approach for learning dexterous grasping.
Our key idea is to embed an object-centric visual affordance model within a deep reinforcement learning loop.
arXiv Detail & Related papers (2020-09-03T04:00:40Z)
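A minimal sketch of the high-level idea above, embedding an affordance prior in an RL loop by shaping the reward with fingertip proximity to a precomputed affordance region; the environment stub and reward terms are illustrative stand-ins, not the paper's method.
```python
import numpy as np

rng = np.random.default_rng(3)
affordance_center = np.array([0.1, 0.0, 0.05])   # assumed graspable region (e.g. a handle)

def affordance_bonus(fingertips: np.ndarray, sigma: float = 0.05) -> float:
    """Gaussian bonus for fingertips landing near the affordance region."""
    d = np.linalg.norm(fingertips - affordance_center, axis=1)
    return float(np.exp(-(d ** 2) / (2 * sigma ** 2)).mean())

def step_env(action: np.ndarray):
    """Stub environment: random fingertip positions plus a task-success term."""
    fingertips = rng.normal(scale=0.1, size=(5, 3)) + affordance_center
    lifted = rng.random() < 0.1                   # placeholder task-success signal
    reward = affordance_bonus(fingertips) + (1.0 if lifted else 0.0)
    return fingertips, reward

obs, r = step_env(np.zeros(22))
print(round(r, 3))
```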