Reinforcement Learning from Passive Data via Latent Intentions
- URL: http://arxiv.org/abs/2304.04782v1
- Date: Mon, 10 Apr 2023 17:59:05 GMT
- Title: Reinforcement Learning from Passive Data via Latent Intentions
- Authors: Dibya Ghosh, Chethan Bhateja, Sergey Levine
- Abstract summary: We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
- Score: 86.4969514480008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Passive observational data, such as human videos, is abundant and rich in
information, yet remains largely untapped by current RL methods. Perhaps
surprisingly, we show that passive data, despite not having reward or action
labels, can still be used to learn features that accelerate downstream RL. Our
approach learns from passive data by modeling intentions: measuring how the
likelihood of future outcomes changes when the agent acts to achieve a
particular task. We propose a temporal difference learning objective to learn
about intentions, resulting in an algorithm similar to conventional RL, but
which learns entirely from passive data. When optimizing this objective, our
agent simultaneously learns representations of states, of policies, and of
possible outcomes in an environment, all from raw observational data. Both
theoretically and empirically, this scheme learns features amenable for value
prediction for downstream tasks, and our experiments demonstrate the ability to
learn from many forms of passive data, including cross-embodiment video data
and YouTube videos.
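As a rough illustration of the objective described above, the sketch below (PyTorch) trains a value function conditioned on a state, a candidate outcome, and a latent intention with a TD backup over action-free transitions. The factored architecture, the proximity-based outcome indicator, and all names are assumptions made for this sketch, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class IntentionValue(nn.Module):
    def __init__(self, obs_dim, d=64):
        super().__init__()
        self.d = d
        self.phi = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, d))    # state features
        self.psi = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, d))    # outcome features
        self.T = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, d * d))  # intention map

    def forward(self, s, g, z):
        # V(s, g, z): how likely outcome g becomes when acting from s under intention z.
        Tz = self.T(z).view(-1, self.d, self.d)
        return torch.einsum("bi,bij,bj->b", self.phi(s), Tz, self.psi(g))

def td_loss(model, target_model, s, s_next, g, z, gamma=0.99):
    # TD backup on passive (reward-free, action-free) transitions. The "reward"
    # is an indicator that the outcome was reached, approximated here by
    # proximity in observation space -- an assumption for the sketch.
    reached = (torch.norm(s_next - g, dim=-1) < 1e-3).float()
    with torch.no_grad():
        v_next = target_model(s_next, g, z)
    td_target = reached + gamma * (1.0 - reached) * v_next
    return ((model(s, g, z) - td_target) ** 2).mean()
```

Under this factorization, optimizing the TD loss simultaneously shapes phi, T, and psi into the representations of states, policies (intentions), and possible outcomes that the abstract refers to.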
Related papers
- FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning [28.523528119584526]
Few-shot imitation learning relies on only a small number of task-specific demonstrations to efficiently adapt a policy to a given downstream task.
We propose FlowRetrieval, an approach that leverages optical flow representations to extract motions similar to the target task from prior data.
Our results show FlowRetrieval significantly outperforms prior methods across simulated and real-world domains.
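A minimal sketch of the flow-guided retrieval step as summarized here: embed optical-flow clips, score prior-data clips by similarity to the target demonstrations, and keep the top matches. encode_flow is a hypothetical stand-in for a learned flow encoder; this is not FlowRetrieval's exact pipeline.

```python
import torch
import torch.nn.functional as F

def retrieve(target_flows, prior_flows, encode_flow, k=100):
    # Embed and L2-normalize flow clips so dot products are cosine similarities.
    z_target = F.normalize(encode_flow(target_flows), dim=-1)  # (n_target, d)
    z_prior = F.normalize(encode_flow(prior_flows), dim=-1)    # (n_prior, d)
    # Score each prior clip by its best match to any target demonstration clip.
    sim = (z_prior @ z_target.T).max(dim=1).values             # (n_prior,)
    # Indices of the k most similar prior clips (assumes k <= n_prior).
    return sim.topk(k).indices
```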
arXiv Detail & Related papers (2024-08-29T23:48:08Z)
- VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference [36.61783715563126]
Streaming lifelong learning is a challenging variant of lifelong learning, with the goal of learning continuously without forgetting.
We introduce a novel approach to lifelong learning that is streaming: it observes each training example only once.
We propose a novel virtual-gradient-based approach to continual representation learning that adapts to each new example while also generalizing well on past data to prevent catastrophic forgetting.
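One plausible reading of a virtual-gradient update is sketched below; this is an assumption-laden illustration, not VERSE's actual algorithm. The learner takes a provisional ("virtual") step on the incoming example, evaluates that step on a small replay memory, and folds the feedback into the real update (requires PyTorch >= 2.0 for functional_call).

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def virtual_gradient_step(model, x_new, y_new, x_mem, y_mem, lr=1e-3, beta=0.5):
    names = [n for n, p in model.named_parameters() if p.requires_grad]
    params = [p for _, p in model.named_parameters() if p.requires_grad]
    # Gradient on the single new example (each example is seen only once).
    g_new = torch.autograd.grad(F.cross_entropy(model(x_new), y_new),
                                params, create_graph=True)
    # Virtual parameters: where a plain SGD step on the new example would land.
    virtual = {n: p - lr * g for n, p, g in zip(names, params, g_new)}
    # Evaluate the virtual step on replayed past data and backprop to the
    # current parameters, discouraging updates that hurt old examples.
    mem_loss = F.cross_entropy(functional_call(model, virtual, (x_mem,)), y_mem)
    g_mem = torch.autograd.grad(mem_loss, params)
    with torch.no_grad():
        for p, gn, gm in zip(params, g_new, g_mem):
            p.sub_(lr * (gn + beta * gm))
```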
arXiv Detail & Related papers (2023-09-15T07:54:49Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
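A minimal sketch of the inverse-dynamics objective that ALP pairs with policy optimization: predict the action taken between consecutive observations, which forces the shared encoder to capture action-relevant features. The architecture and names are illustrative assumptions, not ALP's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InverseDynamics(nn.Module):
    def __init__(self, encoder, feat_dim, n_actions):
        super().__init__()
        self.encoder = encoder  # shared with the RL policy in a setup like ALP's
        self.head = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                                  nn.Linear(256, n_actions))

    def loss(self, obs, next_obs, actions):
        # Concatenate features of o_t and o_{t+1}; classify the action a_t
        # that produced the transition between them.
        z = torch.cat([self.encoder(obs), self.encoder(next_obs)], dim=-1)
        return F.cross_entropy(self.head(z), actions)
```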
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Data Quality in Imitation Learning [15.939363481618738]
In offline learning for robotics, we simply lack internet-scale data, so high-quality datasets are a necessity.
This is especially true in imitation learning (IL), a sample-efficient paradigm for robot learning that uses expert demonstrations.
In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift.
arXiv Detail & Related papers (2023-06-04T18:48:32Z)
- TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
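A sketch in the spirit of TRAIL's energy-based factored transition model, with assumed details: infer a latent action from each transition, score next states with a bilinear energy, and train contrastively against the other next states in the batch. The losses and names are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentActionModel(nn.Module):
    def __init__(self, obs_dim, z_dim=8, d=64):
        super().__init__()
        self.infer = nn.Sequential(nn.Linear(2 * obs_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
        self.phi = nn.Sequential(nn.Linear(obs_dim + z_dim, 256), nn.ReLU(), nn.Linear(256, d))
        self.psi = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, d))

    def loss(self, s, s_next):
        # Latent action inferred from the observed transition (s, s').
        z = self.infer(torch.cat([s, s_next], dim=-1))
        # Bilinear energies phi(s, z)^T psi(s') for every next state in the batch.
        logits = self.phi(torch.cat([s, z], dim=-1)) @ self.psi(s_next).T   # (b, b)
        # InfoNCE-style contrastive loss: the true next state is on the diagonal.
        labels = torch.arange(s.shape[0], device=s.device)
        return F.cross_entropy(logits, labels)
```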
arXiv Detail & Related papers (2021-10-27T21:05:00Z)
- Curious Representation Learning for Embodied Intelligence [81.21764276106924]
Self-supervised representation learning has achieved remarkable success in recent years.
Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn from environments.
We propose a framework, curious representation learning, which jointly learns a reinforcement learning policy and a visual representation model.
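A schematic of that joint loop, with hypothetical env, policy, and repr_model interfaces (gym-style step): the representation model's own training loss doubles as the intrinsic reward, so the policy seeks observations the representation currently finds hard, which in turn improves the representation.

```python
import torch

def crl_step(env, policy, repr_model, repr_opt, obs):
    # Act, observe, and compute the self-supervised loss on the new observation.
    action = policy.act(obs)
    next_obs, _, done, _ = env.step(action)
    loss = repr_model.loss(next_obs)

    # Intrinsic reward = current representation loss (detached from the graph).
    reward = loss.detach()
    policy.observe(obs, action, reward, next_obs, done)  # standard RL update hook

    # Update the representation on the observation the policy just gathered.
    repr_opt.zero_grad()
    loss.backward()
    repr_opt.step()
    return env.reset() if done else next_obs
```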
arXiv Detail & Related papers (2021-05-03T17:59:20Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate observational data collected offline, which is often abundantly available in practice, to improve sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
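A simplified sketch of the idea as summarized here (names and architecture are assumptions): summarize the actually-observed future into features shaped by value regression, and train a predictor to produce those features from the present alone so they can be used at decision time.

```python
import torch
import torch.nn as nn

class HindsightValue(nn.Module):
    def __init__(self, obs_dim, future_dim, h=16):
        super().__init__()
        self.hindsight = nn.Linear(future_dim, h)   # phi(future): looks ahead at training time
        self.predictor = nn.Linear(obs_dim, h)      # phi_hat(s): usable at decision time
        self.value = nn.Linear(obs_dim + h, 1)      # v(s, phi)

    def losses(self, s, future, ret):
        phi = self.hindsight(future)
        v_hind = self.value(torch.cat([s, phi], dim=-1)).squeeze(-1)
        # Value regression shapes phi into features that help predict returns.
        value_loss = ((v_hind - ret) ** 2).mean()
        # The predictor learns to produce those features from the present only.
        model_loss = ((self.predictor(s) - phi.detach()) ** 2).mean()
        return value_loss, model_loss
```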
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.