Creating Multimodal Interactive Agents with Imitation and
Self-Supervised Learning
- URL: http://arxiv.org/abs/2112.03763v1
- Date: Tue, 7 Dec 2021 15:17:27 GMT
- Title: Creating Multimodal Interactive Agents with Imitation and
Self-Supervised Learning
- Authors: DeepMind Interactive Agents Team: Josh Abramson, Arun Ahuja, Arthur
Brussee, Federico Carnevale, Mary Cassin, Felix Fischer, Petko Georgiev, Alex
Goldin, Tim Harley, Felix Hill, Peter C Humphreys, Alden Hung, Jessica
Landon, Timothy Lillicrap, Hamza Merzic, Alistair Muldal, Adam Santoro, Guy
Scully, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common vision from science fiction is that robots will one day inhabit our
physical spaces, sense the world as we do, assist our physical labours, and
communicate with us through natural language. Here we study how to design
artificial agents that can interact naturally with humans using the
simplification of a virtual environment. We show that imitation learning of
human-human interactions in a simulated world, in conjunction with
self-supervised learning, is sufficient to produce a multimodal interactive
agent, which we call MIA, that successfully interacts with non-adversarial
humans 75% of the time. We further identify architectural and algorithmic
techniques that improve performance, such as hierarchical action selection.
Altogether, our results demonstrate that imitation of multi-modal, real-time
human behaviour may provide a straightforward and surprisingly effective means
of imbuing agents with a rich behavioural prior from which agents might then be
fine-tuned for specific purposes, thus laying a foundation for training capable
agents for interactive robots or digital assistants. A video of MIA's behaviour
may be found at https://youtu.be/ZFgRhviF7mY
Related papers
- Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
- Structured World Models from Human Videos [45.08503470821952]
We tackle the problem of learning complex, general behaviors directly in the real world.
We propose an approach for robots to efficiently learn manipulation skills using only a handful of real-world interaction trajectories.
arXiv Detail & Related papers (2023-08-21T17:59:32Z)
- Affordances from Human Videos as a Versatile Representation for Robotics [31.248842798600606]
We train a visual affordance model that estimates where and how in the scene a human is likely to interact.
The structure of these behavioral affordances directly enables the robot to perform many complex tasks.
We show the efficacy of our approach, which we call VRB, across 4 real world environments, over 10 different tasks, and 2 robotic platforms operating in the wild.
arXiv Detail & Related papers (2023-04-17T17:59:34Z)
- Generative Agents: Interactive Simulacra of Human Behavior [86.1026716646289]
We introduce generative agents--computational software agents that simulate believable human behavior.
We describe an architecture that extends a large language model to store a complete record of the agent's experiences.
We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims.
arXiv Detail & Related papers (2023-04-07T01:55:19Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration [57.045140028275036]
We show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning.
We propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy.
arXiv Detail & Related papers (2022-12-08T15:56:13Z)
- Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback [16.268581985382433]
An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback.
Here we demonstrate how to use reinforcement learning from human feedback to improve upon simulated, embodied agents.
arXiv Detail & Related papers (2022-11-21T16:00:31Z)
- Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and demonstrated to be suitable tools to address such a task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
- Imitating Interactive Intelligence [24.95842455898523]
We study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment.
To build agents that can robustly interact with humans, we would ideally train them while they interact with humans.
We use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour.
arXiv Detail & Related papers (2020-12-10T13:55:47Z)
- Learning Affordance Landscapes for Interaction Exploration in 3D Environments [101.90004767771897]
Embodied agents must be able to master how their environment works.
We introduce a reinforcement learning approach for exploration for interaction.
We demonstrate our idea with AI2-iTHOR.
arXiv Detail & Related papers (2020-08-21T00:29:36Z)