Embodying Pre-Trained Word Embeddings Through Robot Actions
- URL: http://arxiv.org/abs/2104.08521v1
- Date: Sat, 17 Apr 2021 12:04:49 GMT
- Title: Embodying Pre-Trained Word Embeddings Through Robot Actions
- Authors: Minori Toyoda, Kanata Suzuki, Hiroki Mori, Yoshihiko Hayashi, Tetsuya Ogata
- Abstract summary: Properly responding to various linguistic expressions, including polysemous words, is an important ability for robots.
Previous studies have shown that robots can use words that are not included in the action-description paired datasets by using pre-trained word embeddings.
We transform the pre-trained word embeddings into embodied ones by using the robot's sensory-motor experiences.
- Score: 9.048164930020404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a promising neural network model with which to acquire a grounded
representation of robot actions and the linguistic descriptions thereof.
Properly responding to various linguistic expressions, including polysemous
words, is an important ability for robots that interact with people via
linguistic dialogue. Previous studies have shown that robots can use words that
are not included in the action-description paired datasets by using pre-trained
word embeddings. However, the word embeddings trained under the distributional
hypothesis are not grounded, as they are derived purely from a text corpus. In
this letter, we transform the pre-trained word embeddings into embodied ones by
using the robot's sensory-motor experiences. We extend a bidirectional
translation model for actions and descriptions by incorporating non-linear
layers that retrofit the word embeddings. By training the retrofit layer and
the bidirectional translation model alternately, our proposed model is able to
transform the pre-trained word embeddings to adapt to a paired
action-description dataset. Our results demonstrate that the embeddings of
synonyms form a semantic cluster by reflecting the experiences (actions and
environments) of a robot. These embeddings allow the robot to properly generate
actions from unseen words that are not paired with actions in a dataset.
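To make the alternating training idea concrete, the following is a minimal sketch (not the authors' implementation) of a non-linear retrofit layer placed between pre-trained word embeddings and a bidirectional action-description translation model, with the two parts updated in alternation. All module names, network sizes, and the reconstruction losses here are illustrative assumptions; the paper's actual encoder/decoder architecture and objective are not reproduced.

```python
import torch
import torch.nn as nn

class RetrofitLayer(nn.Module):
    """Non-linear mapping from pre-trained word embeddings to 'embodied' ones."""
    def __init__(self, dim=300, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim)
        )

    def forward(self, words):          # words: (batch, seq_len, dim)
        return self.net(words)

class BidirectionalTranslator(nn.Module):
    """Toy stand-in for the action<->description translation model."""
    def __init__(self, word_dim=300, act_dim=10, hidden=256):
        super().__init__()
        self.desc_enc = nn.GRU(word_dim, hidden, batch_first=True)
        self.act_enc = nn.GRU(act_dim, hidden, batch_first=True)
        self.to_act = nn.Linear(hidden, act_dim)    # description -> final action state
        self.to_desc = nn.Linear(hidden, word_dim)  # action -> final word embedding

    def desc_to_act(self, words):
        _, h = self.desc_enc(words)
        return self.to_act(h[-1])

    def act_to_desc(self, actions):
        _, h = self.act_enc(actions)
        return self.to_desc(h[-1])

retrofit = RetrofitLayer()
translator = BidirectionalTranslator()
opt_r = torch.optim.Adam(retrofit.parameters(), lr=1e-4)
opt_t = torch.optim.Adam(translator.parameters(), lr=1e-4)
mse = nn.MSELoss()

def training_step(words, actions, update_retrofit):
    """One alternating step: update either the retrofit layer or the translator."""
    embodied = retrofit(words)                       # embodied word embeddings
    loss = (mse(translator.desc_to_act(embodied), actions[:, -1]) +
            mse(translator.act_to_desc(actions), embodied[:, -1]))
    opt_r.zero_grad()
    opt_t.zero_grad()
    loss.backward()
    (opt_r if update_retrofit else opt_t).step()     # leave the other sub-network fixed
    return loss.item()

# Dummy paired batch: 4 five-word descriptions, 4 twenty-step action sequences.
words = torch.randn(4, 5, 300)     # pre-trained word embeddings
actions = torch.randn(4, 20, 10)   # joint angles / other sensory-motor states
for epoch in range(10):
    training_step(words, actions, update_retrofit=(epoch % 2 == 0))
```

Stepping only one optimizer per phase is one simple way to realize the alternating schedule described above: the translator adapts to the current embedding space, while the retrofit layer gradually pulls the pre-trained embeddings toward a space consistent with the paired action-description data.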
Related papers
- Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models [53.22792173053473]
We introduce an interactive robotic manipulation framework called Polaris.
Polaris integrates perception and interaction by utilizing GPT-4 alongside grounded vision models.
We propose a novel Synthetic-to-Real (Syn2Real) pose estimation pipeline.
arXiv Detail & Related papers (2024-08-15T06:40:38Z)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
arXiv Detail & Related papers (2023-07-28T21:18:02Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Natural Language Robot Programming: NLP integrated with autonomous robotic grasping [1.7045152415056037]
We present a grammar-based natural language framework for robot programming, specifically for pick-and-place tasks.
Our approach uses a custom dictionary of action words, designed to store together words that share meaning.
We validate our framework through simulation and real-world experimentation, using a Franka Panda robotic arm.
arXiv Detail & Related papers (2023-04-06T11:06:30Z)
- Open-World Object Manipulation using Pre-trained Vision-Language Models [72.87306011500084]
For robots to follow instructions from people, they must be able to connect the rich semantic information in human vocabulary to their sensory observations and actions.
We develop a simple approach, which leverages a pre-trained vision-language model to extract object-identifying information.
In a variety of experiments on a real mobile manipulator, we find that MOO generalizes zero-shot to a wide range of novel object categories and environments.
arXiv Detail & Related papers (2023-03-02T01:55:10Z)
- Learning Flexible Translation between Robot Actions and Language Descriptions [16.538887534958555]
We propose paired gated autoencoders (PGAE) for flexible translation between robot actions and language descriptions.
We train our model in an end-to-end fashion by pairing each action with appropriate descriptions that contain a signal informing about the translation direction.
With the option to use a pretrained language model as the language encoder, our model has the potential to recognise unseen natural language input.
arXiv Detail & Related papers (2022-07-15T12:37:05Z)
- Synthesis and Execution of Communicative Robotic Movements with Generative Adversarial Networks [59.098560311521034]
We focus on how the same kinematics modulation that humans adopt when manipulating delicate objects can be transferred to two different robotic platforms.
We choose to modulate the velocity profile adopted by the robots' end-effector, inspired by what humans do when transporting objects with different characteristics.
We exploit a novel Generative Adversarial Network architecture, trained with human kinematics examples, to generalize over them and generate new and meaningful velocity profiles.
arXiv Detail & Related papers (2022-03-29T15:03:05Z)
- Augmenting semantic lexicons using word embeddings and transfer learning [1.101002667958165]
We propose two models for predicting sentiment scores to augment semantic lexicons at a relatively low cost using word embeddings and transfer learning.
Our evaluation shows both models are able to score new words with a similar accuracy to reviewers from Amazon Mechanical Turk, but at a fraction of the cost.
arXiv Detail & Related papers (2021-09-18T20:59:52Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language-conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- Interactive Re-Fitting as a Technique for Improving Word Embeddings [0.0]
We make it possible for humans to adjust portions of a word embedding space by moving sets of words closer to one another.
Our approach allows users to trigger selective post-processing as they interact with and assess potential bias in word embeddings.
arXiv Detail & Related papers (2020-09-30T21:54:22Z)
- Caption Generation of Robot Behaviors based on Unsupervised Learning of Action Segments [10.356412004005767]
Bridging robot action sequences and their natural language captions is an important task for increasing the explainability of human-assisting robots.
In this paper, we propose a system for generating natural language captions that describe the behaviors of human-assisting robots.
arXiv Detail & Related papers (2020-03-23T03:44:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.