Skill Generalization with Verbs
- URL: http://arxiv.org/abs/2410.14118v1
- Date: Fri, 18 Oct 2024 02:12:18 GMT
- Title: Skill Generalization with Verbs
- Authors: Rachel Ma, Lyndon Lam, Benjamin A. Spiegel, Aditya Ganeshan, Roma Patel, Ben Abbatematteo, David Paulius, Stefanie Tellex, George Konidaris
- Abstract summary: It is imperative that robots can understand natural language commands issued by humans.
We propose a method for generalizing manipulation skills to novel objects using verbs.
We show that our model can generate trajectories that are usable for executing five verb commands applied to novel instances of two different object categories on a real robot.
- Score: 20.90116318432194
- License:
- Abstract: It is imperative that robots can understand natural language commands issued by humans. Such commands typically contain verbs that signify what action should be performed on a given object and that are applicable to many objects. We propose a method for generalizing manipulation skills to novel objects using verbs. Our method learns a probabilistic classifier that determines whether a given object trajectory can be described by a specific verb. We show that this classifier accurately generalizes to novel object categories with an average accuracy of 76.69% across 13 object categories and 14 verbs. We then perform policy search over the object kinematics to find an object trajectory that maximizes classifier prediction for a given verb. Our method allows a robot to generate a trajectory for a novel object based on a verb, which can then be used as input to a motion planner. We show that our model can generate trajectories that are usable for executing five verb commands applied to novel instances of two different object categories on a real robot.
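To make the described pipeline concrete, below is a minimal sketch of the search step: cross-entropy-method policy search over a parameterized object trajectory, scored by a verb classifier p(verb | trajectory). The classifier, the 20-step horizon, and the 6-DoF pose-delta parameterization are placeholders assumed for illustration, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): CEM policy search over an object
# trajectory, scoring candidates with a verb classifier. The classifier here
# is a stand-in; the paper learns a probabilistic classifier from data.
import numpy as np

T = 20    # trajectory length in time steps -- assumed
DOF = 6   # per-step object pose delta (x, y, z, roll, pitch, yaw) -- assumed

def verb_classifier(trajectory, verb="lift"):
    """Placeholder for the learned classifier: reward net upward motion for
    'lift' and squash to a pseudo-probability in (0, 1)."""
    dz = trajectory[:, 2].sum()
    return 1.0 / (1.0 + np.exp(-dz))

def cem_search(verb, iters=30, pop=64, elite=8, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros((T, DOF))
    std = np.ones((T, DOF)) * 0.05
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, T, DOF))
        scores = np.array([verb_classifier(s, verb) for s in samples])
        elites = samples[np.argsort(scores)[-elite:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean   # best trajectory of pose deltas, to hand to a motion planner

best_traj = cem_search("lift")
print(best_traj.shape)   # (20, 6)
```

In the paper, the optimized trajectory is then passed as input to a motion planner for execution on the robot.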
Related papers
- Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation [65.46610405509338]
We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation.
Our framework, Track2Act, predicts tracks of how points in an image should move in future time-steps based on a goal.
We show that this approach of combining scalably learned track prediction with a residual policy enables diverse generalizable robot manipulation.
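As a rough illustration of that combination, the sketch below derives a coarse action from predicted point tracks and adds a learned residual correction; the point values, 2D action space, and zero residual are hypothetical placeholders, not Track2Act's actual interface.

```python
# Illustrative sketch only (not the Track2Act code): combine an action derived
# from predicted point tracks with a small learned residual correction.
import numpy as np

def action_from_tracks(points_now, points_next):
    """Coarse action from 2D point tracks: mean pixel displacement.
    A real system would lift tracks to 3D end-effector motion."""
    return (points_next - points_now).mean(axis=0)

def residual_policy(observation):
    """Placeholder for a learned residual policy; returns a small correction."""
    return np.zeros(2)

points_now  = np.array([[100.0, 120.0], [110.0, 118.0]])   # hypothetical tracked points
points_next = np.array([[104.0, 117.0], [114.0, 115.0]])   # predicted next positions
action = action_from_tracks(points_now, points_next) + residual_policy(None)
print(action)   # base track-following motion plus residual correction
```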
arXiv Detail & Related papers (2024-05-02T17:56:55Z) - Opening the Vocabulary of Egocentric Actions [42.94865322371628]
This paper proposes a novel open vocabulary action recognition task.
Given a set of verbs and objects observed during training, the goal is to generalize the verbs to an open vocabulary of actions with seen and novel objects.
We create open vocabulary benchmarks on the EPIC-KITCHENS-100 and Assembly101 datasets.
arXiv Detail & Related papers (2023-08-22T15:08:02Z) - RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
arXiv Detail & Related papers (2023-07-28T21:18:02Z) - Actionable Phrase Detection using NLP [0.0]
Actionables are terms that, in the most basic sense, imply the necessity of taking a specific action.
In this paper, the aim is to explore whether actionables can be extracted from raw text using linguistic filters designed from scratch.
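A toy version of such a filter is sketched below; the imperative and modal heuristics are assumptions made for illustration, not the filters proposed in the paper (requires spaCy and the en_core_web_sm model).

```python
# Toy linguistic filter (assumed heuristics, not the paper's filters): flag a
# sentence as actionable if it looks imperative or contains an obligation
# modal governing a verb.
import spacy

nlp = spacy.load("en_core_web_sm")
OBLIGATION_MODALS = {"must", "should", "shall"}

def is_actionable(sentence: str) -> bool:
    doc = nlp(sentence)
    tokens = [t for t in doc if not t.is_punct]
    if not tokens:
        return False
    if tokens[0].tag_ == "VB":          # imperative: starts with a base-form verb
        return True
    return any(t.lower_ in OBLIGATION_MODALS and t.head.pos_ == "VERB"
               for t in doc)            # obligation modal attached to a verb

print(is_actionable("Submit the report by Friday."))   # True
print(is_actionable("The report was submitted."))      # False
```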
arXiv Detail & Related papers (2022-10-30T13:37:49Z) - Do Trajectories Encode Verb Meaning? [22.409307683247967]
Grounded language models learn to connect concrete categories like nouns and adjectives to the world via images and videos.
In this paper, we investigate the extent to which trajectories (i.e. the position and rotation of objects over time) naturally encode verb semantics.
We find that trajectories correlate as-is with some verbs (e.g., fall), and that additional abstraction via self-supervised pretraining can further capture nuanced differences in verb meaning.
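A minimal probe in the spirit of that finding is sketched below, with synthetic trajectories and simple kinematic statistics standing in for the paper's data and learned representations.

```python
# Sketch of a trajectory-to-verb probe (illustration, not the paper's model):
# summarize object trajectories with simple kinematic features, then fit a
# per-verb logistic-regression classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def featurize(traj):
    """traj: (T, 7) array of xyz position + quaternion over time."""
    pos = traj[:, :3]
    disp = pos[-1] - pos[0]                      # net displacement
    vel = np.diff(pos, axis=0)
    return np.concatenate([disp, vel.mean(axis=0), np.abs(vel).max(axis=0)])

# Synthetic labels: 1 = trajectory matches the verb "fall", 0 = it does not.
rng = np.random.default_rng(0)
trajs = rng.normal(scale=0.1, size=(40, 30, 7))
trajs[:20, :, 2] -= np.linspace(0.0, 1.0, 30)    # first 20 trajectories move downward
labels = np.array([1] * 20 + [0] * 20)

X = np.stack([featurize(t) for t in trajs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict_proba(X[:1])[0, 1])            # p("fall" | trajectory)
```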
arXiv Detail & Related papers (2022-06-23T19:57:16Z) - Learning 6-DoF Object Poses to Grasp Category-level Objects by Language Instructions [74.63313641583602]
This paper studies the task of grasping any object from known categories by following free-form language instructions.
We bring these disciplines together on this open challenge, which is essential to human-robot interaction.
We propose a language-guided 6-DoF category-level object localization model to achieve robotic grasping by comprehending human intention.
arXiv Detail & Related papers (2022-05-09T04:25:14Z) - Synthesis and Execution of Communicative Robotic Movements with Generative Adversarial Networks [59.098560311521034]
We focus on how to transfer to two different robotic platforms the same kinematics modulation that humans adopt when manipulating delicate objects.
We choose to modulate the velocity profile adopted by the robots' end-effector, inspired by what humans do when transporting objects with different characteristics.
We exploit a novel Generative Adversarial Network architecture, trained with human kinematics examples, to generalize over them and generate new and meaningful velocity profiles.
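A minimal generator sketch under assumed dimensions is shown below; the paper's GAN architecture, discriminator, and training procedure are not reproduced.

```python
# Assumed-architecture sketch (not the paper's GAN): map a noise vector to an
# end-effector velocity profile of fixed length. Adversarial training against
# a discriminator on human-demonstrated profiles is omitted.
import torch
import torch.nn as nn

PROFILE_LEN = 100   # velocity samples per trajectory -- assumed

class VelocityGenerator(nn.Module):
    def __init__(self, noise_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, PROFILE_LEN), nn.Softplus(),   # speeds are non-negative
        )

    def forward(self, z):
        return self.net(z)

gen = VelocityGenerator()
profiles = gen(torch.randn(4, 16))   # (4, 100) candidate velocity profiles
print(profiles.shape)
```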
arXiv Detail & Related papers (2022-03-29T15:03:05Z) - Language Grounding with 3D Objects [60.67796160959387]
We introduce a novel reasoning task that targets both visual and non-visual language about 3D objects.
We introduce several CLIP-based models for distinguishing objects.
We find that adding view estimation to language grounding models improves accuracy both on SNARE and when identifying objects referred to in language on a robot platform.
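A generic CLIP scoring snippet of the kind such models build on is sketched below, using the public openai/clip-vit-base-patch32 checkpoint; the paper's specific model variants and view-estimation component are not included.

```python
# Generic CLIP object-selection sketch (not the paper's models): given a
# language reference and candidate object images, pick the image whose CLIP
# similarity to the text is highest. Requires: transformers, torch, pillow.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def pick_object(description, image_paths):
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=[description], images=images,
                       return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_text        # shape (1, num_images)
    return logits.argmax(dim=-1).item()             # index of best-matching image

# Hypothetical usage with two rendered views of candidate objects:
# best = pick_object("the mug with a wide handle", ["obj_a.png", "obj_b.png"])
```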
arXiv Detail & Related papers (2021-07-26T23:35:58Z) - Embodying Pre-Trained Word Embeddings Through Robot Actions [9.048164930020404]
Properly responding to various linguistic expressions, including polysemous words, is an important ability for robots.
Previous studies have shown that robots can use words that are not included in the action-description paired datasets by using pre-trained word embeddings.
We transform the pre-trained word embeddings to embodied ones by using the robot's sensory-motor experiences.
arXiv Detail & Related papers (2021-04-17T12:04:49Z) - COBE: Contextualized Object Embeddings from Narrated Instructional Video [52.73710465010274]
We propose a new framework for learning Contextualized OBject Embeddings from automatically-transcribed narrations of instructional videos.
We leverage the semantic and compositional structure of language by training a visual detector to predict a contextualized word embedding of the object and its associated narration.
Our experiments show that our detector learns to predict a rich variety of contextual object information, and that it is highly effective in the settings of few-shot and zero-shot learning.
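One way to read that training signal is as a cosine-distance loss pulling each detected box's embedding toward a contextualized embedding of its narration; the sketch below is an interpretation for illustration, not the COBE implementation.

```python
# Schematic loss sketch (an interpretation, not the COBE code): train a
# detection head so its per-box embedding matches a contextualized word
# embedding of the narrated object (e.g., from a language model).
import torch
import torch.nn.functional as F

def contextual_embedding_loss(box_embeddings, narration_embeddings):
    """Both tensors have shape (num_boxes, dim)."""
    pred = F.normalize(box_embeddings, dim=-1)
    target = F.normalize(narration_embeddings, dim=-1)
    return (1.0 - (pred * target).sum(dim=-1)).mean()   # mean cosine distance

# Hypothetical usage with random stand-in embeddings:
pred = torch.randn(8, 768, requires_grad=True)
target = torch.randn(8, 768)
loss = contextual_embedding_loss(pred, target)
loss.backward()
```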
arXiv Detail & Related papers (2020-07-14T19:04:08Z) - Robot Object Retrieval with Contextual Natural Language Queries [26.88600852700681]
We develop a model to retrieve objects based on descriptions of their usage.
Our model directly predicts an object's appearance from the object's use specified by a verb phrase.
Based on contextual information present in the language commands, our model can generalize to unseen object classes and unknown nouns.
arXiv Detail & Related papers (2020-06-23T18:13:40Z)
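A conceptual sketch of verb-phrase-conditioned retrieval with placeholder embeddings is shown below; the paper's trained text and image encoders and its learned projection are not reproduced.

```python
# Conceptual retrieval sketch (not the paper's model): project a verb-phrase
# embedding to a predicted object-appearance embedding, then return the
# candidate object with the most similar visual embedding. All embeddings here
# are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
TEXT_DIM, VIS_DIM = 32, 64
W = rng.normal(size=(VIS_DIM, TEXT_DIM)) * 0.1   # stand-in for a learned projection

def retrieve(phrase_embedding, candidate_embeddings):
    predicted = W @ phrase_embedding
    sims = [np.dot(predicted, c) /
            (np.linalg.norm(predicted) * np.linalg.norm(c) + 1e-8)
            for c in candidate_embeddings]
    return int(np.argmax(sims))

phrase = rng.normal(size=TEXT_DIM)                # e.g., embedding of "pour with"
candidates = [rng.normal(size=VIS_DIM) for _ in range(5)]
print(retrieve(phrase, candidates))               # index of the best-matching object
```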