Deep Reinforcement and InfoMax Learning
- URL: http://arxiv.org/abs/2006.07217v3
- Date: Mon, 16 Nov 2020 18:57:23 GMT
- Title: Deep Reinforcement and InfoMax Learning
- Authors: Bogdan Mazoure, Remi Tachet des Combes, Thang Doan, Philip Bachman, R Devon Hjelm
- Score: 32.426674181365456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We begin with the hypothesis that a model-free agent whose representations
are predictive of properties of future states (beyond expected rewards) will be
more capable of solving and adapting to new RL problems. To test that
hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains
the agent to predict the future by maximizing the mutual information between
its internal representation of successive timesteps. We test our approach in
several synthetic settings, where it successfully learns representations that
are predictive of the future. Finally, we augment C51, a strong RL baseline,
with our temporal DIM objective and demonstrate improved performance on a
continual learning task and on the recently introduced Procgen environment.
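To make the temporal objective concrete, below is a minimal PyTorch sketch of an InfoNCE-style lower bound on the mutual information between encodings of successive timesteps, in the spirit of the DIM-based objective described above. The MLP encoder, representation size, and bilinear critic are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch (assumptions noted above): maximize a lower bound on
# I(z_t; z_{t+1}) by classifying the true successor among the other
# successors in the batch (InfoNCE).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalDIM(nn.Module):
    def __init__(self, obs_dim: int, repr_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, repr_dim)
        )
        # Bilinear critic scoring (z_t, z_{t+1}) pairs.
        self.W = nn.Parameter(0.01 * torch.randn(repr_dim, repr_dim))

    def forward(self, s_t: torch.Tensor, s_tp1: torch.Tensor) -> torch.Tensor:
        z_t = self.encoder(s_t)            # (B, d)
        z_tp1 = self.encoder(s_tp1)        # (B, d)
        logits = z_t @ self.W @ z_tp1.T    # (B, B): row i scored against all successors
        labels = torch.arange(s_t.size(0), device=s_t.device)
        # Cross-entropy with the positive pairs on the diagonal.
        return F.cross_entropy(logits, labels)
```

In the paper, this kind of term is used as an auxiliary loss alongside the main RL objective (C51); a small weighting coefficient on the auxiliary term is a typical choice.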
Related papers
- On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly (by generating actions) or indirectly.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z)
- Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization [50.20034493626049]
Recent pre-trained language models (PLMs) achieve promising results on existing abstractive summarization datasets.
Existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets.
We show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data.
arXiv Detail & Related papers (2023-05-03T08:08:07Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model, which learns task-specific dynamics priors with a mixture of Gaussians (a rough sketch follows this entry), and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
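As the sketch promised in the entry above: a speculative PyTorch illustration of a dynamics head that places a mixture-of-Gaussians prior over the next latent state. The head layout and dimensions are assumptions for illustration, not the paper's model.

```python
# Speculative sketch: a dynamics head that outputs a Gaussian mixture
# over the next latent state z_{t+1} given the current latent z_t.
import torch
import torch.nn as nn
import torch.distributions as D

class MixtureDynamics(nn.Module):
    def __init__(self, latent_dim: int = 64, n_components: int = 4):
        super().__init__()
        self.n = n_components
        # Per component: 1 mixing logit + mean and log-std of size latent_dim.
        self.net = nn.Linear(latent_dim, n_components * (1 + 2 * latent_dim))

    def forward(self, z: torch.Tensor) -> D.Distribution:
        out = self.net(z)
        logits, rest = out[..., :self.n], out[..., self.n:]
        mu, log_std = rest.view(*z.shape[:-1], self.n, -1).chunk(2, dim=-1)
        mix = D.Categorical(logits=logits)
        comp = D.Independent(D.Normal(mu, log_std.exp()), 1)
        return D.MixtureSameFamily(mix, comp)

# Training would minimize -dynamics(z_t).log_prob(z_next) over stored transitions.
```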
- PI-QT-Opt: Predictive Information Improves Multi-Task Robotic Reinforcement Learning at Scale [14.444439310266873]
Predictive Information QT-Opt learns representations of the predictive information to solve up to 297 vision-based robot manipulation tasks in simulation and the real world.
We demonstrate that modeling the predictive information significantly improves success rates on the training tasks and leads to better zero-shot transfer to unseen novel tasks.
arXiv Detail & Related papers (2022-10-15T07:30:31Z)
- Policy Gradients Incorporating the Future [66.20567145291342]
We introduce a method that allows an agent to "look into the future" without explicitly predicting it.
We propose to allow an agent, during its training on past experience, to observe what actually happened in the future at that time.
This gives our agent the opportunity to utilize rich and useful information about the future trajectory dynamics in addition to the present.
arXiv Detail & Related papers (2021-08-04T14:57:11Z)
- Pre-Trained Models: Past, Present and Future [126.21572378910746]
Large-scale pre-trained models (PTMs) have recently achieved great success and become a milestone in the field of artificial intelligence (AI).
By storing knowledge in huge parameter sets and fine-tuning on specific tasks, PTMs let the rich knowledge implicitly encoded in those parameters benefit a variety of downstream tasks.
It is now the consensus of the AI community to adopt PTMs as the backbone for downstream tasks rather than learning models from scratch.
arXiv Detail & Related papers (2021-06-14T02:40:32Z)
- Data-Efficient Reinforcement Learning with Self-Predictive Representations [21.223069189953037]
We train an agent to predict its own latent state representations multiple steps into the future (a rough sketch follows this entry).
On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels.
Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.415 on Atari.
arXiv Detail & Related papers (2020-07-12T07:38:15Z)
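A hedged sketch of the self-predictive objective above: roll a latent dynamics model k steps forward from the current encoding and regress each prediction onto a stop-gradient target encoding of the state that actually occurred, via negative cosine similarity. The function signature, the separate target encoder, and the horizon are illustrative assumptions.

```python
# Hedged sketch: multi-step latent self-prediction with a stop-gradient target branch.
import torch
import torch.nn.functional as F

def self_prediction_loss(encoder, target_encoder, dynamics,
                         states, actions, k: int = 5) -> torch.Tensor:
    """states: (T+1, B, obs_dim); actions: (T, B, act_dim); requires T >= k."""
    z = encoder(states[0])                 # online latent at the start of the segment
    loss = 0.0
    for step in range(k):
        z = dynamics(z, actions[step])     # predict the next latent
        with torch.no_grad():              # frozen/EMA target branch
            target = target_encoder(states[step + 1])
        loss = loss - F.cosine_similarity(z, target, dim=-1).mean()
    return loss / k
```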
- Reinforcement Learning through Active Inference [62.997667081978825]
We show how ideas from active inference can augment traditional reinforcement learning approaches.
We develop and implement a novel objective for decision making, which we term the free energy of the expected future.
We demonstrate that the resulting algorithm successfully balances exploration and exploitation, simultaneously achieving robust performance on several challenging RL benchmarks with sparse, well-shaped, and no rewards.
arXiv Detail & Related papers (2020-02-28T10:28:21Z)