Learning Synthetic Environments and Reward Networks for Reinforcement
Learning
- URL: http://arxiv.org/abs/2202.02790v1
- Date: Sun, 6 Feb 2022 14:55:59 GMT
- Title: Learning Synthetic Environments and Reward Networks for Reinforcement
Learning
- Authors: Fabio Ferreira and Thomas Nierhoff and Andreas Saelinger and Frank
Hutter
- Abstract summary: We introduce Synthetic Environments (SEs) and Reward Networks (RNs) as proxy environment models for training Reinforcement Learning (RL) agents.
We show that an agent, after being trained exclusively on the SE, is able to solve the corresponding real environment.
- Score: 34.01695320809796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Synthetic Environments (SEs) and Reward Networks (RNs),
represented by neural networks, as proxy environment models for training
Reinforcement Learning (RL) agents. We show that an agent, after being trained
exclusively on the SE, is able to solve the corresponding real environment.
While an SE acts as a full proxy to a real environment by learning about its
state dynamics and rewards, an RN is a partial proxy that learns to augment or
replace rewards. We use bi-level optimization to evolve SEs and RNs: the inner
loop trains the RL agent, and the outer loop trains the parameters of the SE /
RN via an evolution strategy. We evaluate our proposed new concept on a broad
range of RL algorithms and classic control environments. In a one-to-one
comparison, learning an SE proxy requires more interactions with the real
environment than training agents only on the real environment. However, once
such an SE has been learned, we do not need any interactions with the real
environment to train new agents. Moreover, the learned SE proxies allow us to
train agents with fewer interactions while maintaining the original task
performance. Our empirical results suggest that SEs achieve this result by
learning informed representations that bias the agents towards relevant states.
We also find that these proxies are robust to hyperparameter
variation and can transfer to unseen agents.
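The bi-level scheme described above can be illustrated with a minimal sketch: an outer evolution strategy perturbs the parameters of a small synthetic-environment network, an inner loop trains an agent purely inside that SE, and the trained agent's return on the real environment serves as the fitness signal. The toy 1-D point-mass task, the tiny MLP synthetic environment, the random-search inner loop, and all hyperparameters below are illustrative assumptions for exposition, not the authors' actual architectures or settings.

```python
# Hedged sketch of the bi-level SE training loop: outer ES over SE parameters,
# inner agent training on the SE only, fitness measured on the "real" task.
import numpy as np

rng = np.random.default_rng(0)

# --- Toy "real" environment: 1-D point mass, reward = -|position| (assumed) ---
def real_rollout(policy_w, horizon=50):
    x, ret = 1.0, 0.0
    for _ in range(horizon):
        a = np.tanh(policy_w[0] * x + policy_w[1])   # scalar action in [-1, 1]
        x = x + 0.1 * a                               # true dynamics
        ret += -abs(x)                                # true reward
    return ret

# --- Synthetic Environment: tiny MLP predicting (next_state, reward) ---
def se_step(theta, x, a):
    W1, b1, W2, b2 = theta
    h = np.tanh(W1 @ np.array([x, a]) + b1)
    out = W2 @ h + b2
    return out[0], out[1]                             # (next_state, reward)

def init_se(hidden=8):
    return [rng.normal(0, 0.5, (hidden, 2)), np.zeros(hidden),
            rng.normal(0, 0.5, (2, hidden)), np.zeros(2)]

# --- Inner loop: train a linear policy exclusively inside the SE ---
def train_agent_on_se(theta, iters=30, horizon=50):
    best_w, best_ret = rng.normal(size=2), -np.inf
    for _ in range(iters):                            # simple random search
        w = best_w + 0.3 * rng.normal(size=2)
        x, ret = 1.0, 0.0
        for _ in range(horizon):
            a = np.tanh(w[0] * x + w[1])
            x, r = se_step(theta, x, a)               # SE dynamics + reward
            ret += r
        if ret > best_ret:
            best_w, best_ret = w, ret
    return best_w

# --- Outer loop: evolution strategy on the flattened SE parameters ---
def flatten(theta):
    return np.concatenate([p.ravel() for p in theta])

def unflatten(vec, like):
    out, i = [], 0
    for p in like:
        out.append(vec[i:i + p.size].reshape(p.shape))
        i += p.size
    return out

theta = init_se()
flat, shapes = flatten(theta), theta
sigma, lr, pop = 0.05, 0.02, 16
for gen in range(20):
    eps = rng.normal(size=(pop, flat.size))
    fitness = np.empty(pop)
    for k in range(pop):
        cand = unflatten(flat + sigma * eps[k], shapes)
        agent_w = train_agent_on_se(cand)             # agent never sees real env here
        fitness[k] = real_rollout(agent_w)            # real env only scores candidates
    adv = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    flat += lr / (pop * sigma) * eps.T @ adv          # NES-style gradient estimate
    print(f"gen {gen:02d}  mean real return {fitness.mean():.2f}")
```

Note how the real environment is queried only to score candidate SEs during the outer loop; consistent with the abstract, once an SE has been learned, new agents can be trained on it without further real-environment interactions.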
Related papers
- REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments [20.826907313227323]
Building generalist agents that can rapidly adapt to new environments is a key challenge for deploying AI in the digital and real worlds.
We propose a novel approach to pre-train relatively small policies on relatively small datasets and adapt them to unseen environments via in-context learning.
Our key idea is that retrieval offers a powerful bias for fast adaptation.
arXiv Detail & Related papers (2024-12-06T03:54:55Z)
- Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment [69.33930972652594]
We propose a novel structural pruning approach to jointly learn the weights and structurally prune architectures of CNN models.
The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers.
We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy.
arXiv Detail & Related papers (2024-03-28T15:22:29Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require a large number of interactions between the agent and the environment.
We propose a new method that addresses this by using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Multi-Agent Transfer Learning in Reinforcement Learning-Based Ride-Sharing Systems [3.7311680121118345]
Reinforcement learning (RL) has been used in a range of simulated real-world tasks.
In this paper, we investigate the impact of transfer learning (TL) parameters with fixed source and target roles.
arXiv Detail & Related papers (2021-12-01T11:23:40Z)
- What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Learning Synthetic Environments for Reinforcement Learning with Evolution Strategies [34.13101380723782]
This work explores learning agent-agnostic synthetic environments (SEs) for Reinforcement Learning.
SEs act as a proxy for target environments and allow agents to be trained more efficiently than when directly trained on the target environment.
We show that our method is capable of learning SEs for two discrete-action-space tasks that allow us to train agents more robustly and with up to 60% fewer steps.
arXiv Detail & Related papers (2021-01-24T14:16:13Z)
- Emergent Social Learning via Multi-agent Reinforcement Learning [91.57176641192771]
Social learning is a key component of human and animal intelligence.
This paper investigates whether independent reinforcement learning agents can learn to use social learning to improve their performance.
arXiv Detail & Related papers (2020-10-01T17:54:14Z)