Deep Reinforcement and InfoMax Learning
- URL: http://arxiv.org/abs/2006.07217v3
- Date: Mon, 16 Nov 2020 18:57:23 GMT
- Title: Deep Reinforcement and InfoMax Learning
- Authors: Bogdan Mazoure, Remi Tachet des Combes, Thang Doan, Philip Bachman, R Devon Hjelm
- Score: 32.426674181365456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We begin with the hypothesis that a model-free agent whose representations
are predictive of properties of future states (beyond expected rewards) will be
more capable of solving and adapting to new RL problems. To test that
hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains
the agent to predict the future by maximizing the mutual information between
its internal representation of successive timesteps. We test our approach in
several synthetic settings, where it successfully learns representations that
are predictive of the future. Finally, we augment C51, a strong RL baseline,
with our temporal DIM objective and demonstrate improved performance on a
continual learning task and on the recently introduced Procgen environment.
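To make the temporal objective concrete, below is a minimal PyTorch sketch of an InfoNCE-style lower bound on the mutual information between encodings of successive timesteps, in the spirit of the DIM-based objective described above. The MLP encoder, representation size, and bilinear critic are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch (assumptions noted above): maximize a lower bound on
# I(z_t; z_{t+1}) by classifying the true successor among the other
# successors in the batch (InfoNCE).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalDIM(nn.Module):
    def __init__(self, obs_dim: int, repr_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, repr_dim)
        )
        # Bilinear critic scoring (z_t, z_{t+1}) pairs.
        self.W = nn.Parameter(0.01 * torch.randn(repr_dim, repr_dim))

    def forward(self, s_t: torch.Tensor, s_tp1: torch.Tensor) -> torch.Tensor:
        z_t = self.encoder(s_t)            # (B, d)
        z_tp1 = self.encoder(s_tp1)        # (B, d)
        logits = z_t @ self.W @ z_tp1.T    # (B, B): row i scored against all successors
        labels = torch.arange(s_t.size(0), device=s_t.device)
        # Cross-entropy with the positive pairs on the diagonal.
        return F.cross_entropy(logits, labels)
```

In the paper, this kind of term is used as an auxiliary loss alongside the main RL objective (C51); a small weighting coefficient on the auxiliary term is a typical choice.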
Related papers
- On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly (by generating actions) or indirectly.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z)
- Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization [50.20034493626049]
Recent pre-trained language models (PLMs) achieve promising results on existing abstractive summarization datasets.
Existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets.
We show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data.
arXiv Detail & Related papers (2023-05-03T08:08:07Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model, which learns task-specific dynamics priors with a mixture of Gaussians (a rough sketch follows this entry), and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
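As the sketch promised in the entry above: a speculative PyTorch illustration of a dynamics head that places a mixture-of-Gaussians prior over the next latent state. The head layout and dimensions are assumptions for illustration, not the paper's model.

```python
# Speculative sketch: a dynamics head that outputs a Gaussian mixture
# over the next latent state z_{t+1} given the current latent z_t.
import torch
import torch.nn as nn
import torch.distributions as D

class MixtureDynamics(nn.Module):
    def __init__(self, latent_dim: int = 64, n_components: int = 4):
        super().__init__()
        self.n = n_components
        # Per component: 1 mixing logit + mean and log-std of size latent_dim.
        self.net = nn.Linear(latent_dim, n_components * (1 + 2 * latent_dim))

    def forward(self, z: torch.Tensor) -> D.Distribution:
        out = self.net(z)
        logits, rest = out[..., :self.n], out[..., self.n:]
        mu, log_std = rest.view(*z.shape[:-1], self.n, -1).chunk(2, dim=-1)
        mix = D.Categorical(logits=logits)
        comp = D.Independent(D.Normal(mu, log_std.exp()), 1)
        return D.MixtureSameFamily(mix, comp)

# Training would minimize -dynamics(z_t).log_prob(z_next) over stored transitions.
```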
- PI-QT-Opt: Predictive Information Improves Multi-Task Robotic Reinforcement Learning at Scale [14.444439310266873]
Predictive Information QT-Opt learns representations of the predictive information to solve up to 297 vision-based robot manipulation tasks in simulation and the real world.
We demonstrate that modeling the predictive information significantly improves success rates on the training tasks and leads to better zero-shot transfer to unseen novel tasks.
arXiv Detail & Related papers (2022-10-15T07:30:31Z)
- Policy Gradients Incorporating the Future [66.20567145291342]
We introduce a method that allows an agent to "look into the future" without explicitly predicting it.
We propose to allow an agent, during its training on past experience, to observe what actually happened in the future at that time.
This gives our agent the opportunity to utilize rich and useful information about the future trajectory dynamics in addition to the present.
arXiv Detail & Related papers (2021-08-04T14:57:11Z)
- Pre-Trained Models: Past, Present and Future [126.21572378910746]
Large-scale pre-trained models (PTMs) have recently achieved great success and become a milestone in the field of artificial intelligence (AI).
By storing knowledge in huge parameter sets and fine-tuning on specific tasks, PTMs let the rich knowledge implicitly encoded in those parameters benefit a variety of downstream tasks.
It is now the consensus of the AI community to adopt PTMs as the backbone for downstream tasks rather than learning models from scratch.
arXiv Detail & Related papers (2021-06-14T02:40:32Z)
- Data-Efficient Reinforcement Learning with Self-Predictive Representations [21.223069189953037]
We train an agent to predict its own latent state representations multiple steps into the future (a rough sketch follows this entry).
On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels.
Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.415 on Atari.
arXiv Detail & Related papers (2020-07-12T07:38:15Z)
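A hedged sketch of the self-predictive objective above: roll a latent dynamics model k steps forward from the current encoding and regress each prediction onto a stop-gradient target encoding of the state that actually occurred, via negative cosine similarity. The function signature, the separate target encoder, and the horizon are illustrative assumptions.

```python
# Hedged sketch: multi-step latent self-prediction with a stop-gradient target branch.
import torch
import torch.nn.functional as F

def self_prediction_loss(encoder, target_encoder, dynamics,
                         states, actions, k: int = 5) -> torch.Tensor:
    """states: (T+1, B, obs_dim); actions: (T, B, act_dim); requires T >= k."""
    z = encoder(states[0])                 # online latent at the start of the segment
    loss = 0.0
    for step in range(k):
        z = dynamics(z, actions[step])     # predict the next latent
        with torch.no_grad():              # frozen/EMA target branch
            target = target_encoder(states[step + 1])
        loss = loss - F.cosine_similarity(z, target, dim=-1).mean()
    return loss / k
```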
- Reinforcement Learning through Active Inference [62.997667081978825]
We show how ideas from active inference can augment traditional reinforcement learning approaches.
We develop and implement a novel objective for decision making, which we term the free energy of the expected future.
We demonstrate that the resulting algorithm successfully balances exploration and exploitation, simultaneously achieving robust performance on several challenging RL benchmarks with sparse, well-shaped, and no rewards.
arXiv Detail & Related papers (2020-02-28T10:28:21Z)