Zipfian environments for Reinforcement Learning
- URL: http://arxiv.org/abs/2203.08222v1
- Date: Tue, 15 Mar 2022 19:59:10 GMT
- Title: Zipfian environments for Reinforcement Learning
- Authors: Stephanie C. Y. Chan and Andrew K. Lampinen and Pierre H. Richemond
and Felix Hill
- Abstract summary: We show that learning robustly from skewed experience is a critical challenge for applying Deep RL methods beyond simulations or laboratories.
We develop three complementary RL environments where the agent's experience varies according to a Zipfian (discrete power law) distribution.
- Score: 19.309119596790563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As humans and animals learn in the natural world, they encounter
distributions of entities, situations and events that are far from uniform.
Typically, a relatively small set of experiences are encountered frequently,
while many important experiences occur only rarely. The highly-skewed,
heavy-tailed nature of reality poses particular learning challenges that humans
and animals have met by evolving specialised memory systems. By contrast, most
popular RL environments and benchmarks involve approximately uniform variation
of properties, objects, situations or tasks. How will RL algorithms perform in
worlds (like ours) where the distribution of environment features is far less
uniform? To explore this question, we develop three complementary RL
environments where the agent's experience varies according to a Zipfian
(discrete power law) distribution. On these benchmarks, we find that standard
Deep RL architectures and algorithms acquire useful knowledge of common
situations and tasks, but fail to adequately learn about rarer ones. To
understand this failure better, we explore how different aspects of current
approaches may be adjusted to help improve performance on rare events, and show
that the RL objective function, the agent's memory system and self-supervised
learning objectives can all influence an agent's ability to learn from uncommon
experiences. Together, these results show that learning robustly from skewed
experience is a critical challenge for applying Deep RL methods beyond
simulations or laboratories, and our Zipfian environments provide a basis for
measuring future progress towards this goal.
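The Zipfian (discrete power law) sampling scheme described in the abstract can be sketched as follows. This is a minimal NumPy construction for illustration only; the function and variable names are ours, not from the paper's released environments:

```python
import numpy as np

def zipfian_probs(n_items, exponent=1.0):
    """Probability of the rank-k item proportional to 1 / k**exponent."""
    ranks = np.arange(1, n_items + 1)
    weights = 1.0 / ranks ** exponent
    return weights / weights.sum()

rng = np.random.default_rng(0)
probs = zipfian_probs(20, exponent=2.0)
# Draw which entity the agent encounters on each of 10,000 episodes:
# a few head items dominate, while tail items appear only rarely.
episode_objects = rng.choice(20, size=10_000, p=probs)
```

With exponent 2, the most common item is four times as likely as the second (1/1² vs. 1/2²), so an agent's replay data ends up heavily dominated by a handful of situations.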
Related papers
- Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations [22.6449779859417]
General intelligence requires quick adaptation across tasks.
In this paper, we explore a wider range of scenarios where not only the distribution but also the environment spaces may change.
We introduce a causality-guided self-adaptive representation-based approach, called CSR, that equips the agent to generalize effectively.
arXiv Detail & Related papers (2024-07-30T08:48:49Z)
- Curiosity & Entropy Driven Unsupervised RL in Multiple Environments [0.0]
We propose and experiment with five new modifications to the original work.
In high-dimensional environments, curiosity-driven exploration enhances learning by encouraging the agent to seek diverse experiences and explore the unknown more.
However, its benefits are limited in low-dimensional and simpler environments where exploration possibilities are constrained and there is little that is truly unknown to the agent.
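One simple way to realize a curiosity-style exploration bonus is a count-based novelty proxy that rewards states in inverse proportion to the square root of their visit counts. This is a generic illustration of the idea, not necessarily one of the paper's five modifications:

```python
from collections import Counter

class CountNoveltyBonus:
    """Intrinsic reward 1 / sqrt(N(s)): rarely visited states yield larger bonuses."""

    def __init__(self, scale=1.0):
        self.counts = Counter()
        self.scale = scale

    def __call__(self, state):
        # Increment the visit count, then return a decaying bonus.
        self.counts[state] += 1
        return self.scale / self.counts[state] ** 0.5

bonus = CountNoveltyBonus()
r_first = bonus("state_a")   # full bonus on first visit
r_second = bonus("state_a")  # smaller bonus on revisit
```

In a low-dimensional environment the counts saturate quickly, which matches the observation above that curiosity helps little when there is not much left that is unknown.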
arXiv Detail & Related papers (2024-01-08T19:25:40Z)
- Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations [10.174009792409928]
We propose a method for adaptive action supervision in RL from real-world demonstrations in multi-agent scenarios.
In experiments using chase-and-escape and football tasks with different dynamics between the unknown source and target environments, we show that our approach achieved a balance between reproducibility and generalization ability compared with the baselines.
arXiv Detail & Related papers (2023-05-22T13:33:37Z)
- Human-Timescale Adaptation in an Open-Ended Task Space [56.55530165036327]
We show that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans.
Our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
arXiv Detail & Related papers (2023-01-18T15:39:21Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
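The episodic versus non-episodic distinction above can be sketched as two interaction loops. `ToyEnv`, `RandomAgent`, and both loop functions are hypothetical illustrations, not the benchmark's API:

```python
class ToyEnv:
    """Minimal environment whose episodes end every 5 steps."""

    def __init__(self):
        self.resets = 0
        self.t = 0

    def reset(self):
        self.resets += 1
        self.t = 0
        return 0  # initial observation

    def step(self, action):
        self.t += 1
        return 0, 0.0, self.t % 5 == 0  # obs, reward, done

class RandomAgent:
    def act(self, obs):
        return 0

def episodic_loop(env, agent, n_episodes):
    """Standard benchmark setting: the environment resets between trials."""
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            obs, reward, done = env.step(agent.act(obs))

def continual_loop(env, agent, n_steps):
    """Reset-free setting closer to real-world deployment: one reset, then run on."""
    obs = env.reset()
    for _ in range(n_steps):
        obs, reward, done = env.step(agent.act(obs))
```

An algorithm that implicitly relies on the repeated resets of `episodic_loop` has no such recovery mechanism in `continual_loop`, which is the discrepancy the paper formalizes.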
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Continuous Coordination As a Realistic Scenario for Lifelong Learning [6.044372319762058]
We introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings.
We evaluate several recent MARL methods, and benchmark state-of-the-art LLL algorithms in limited memory and computation.
We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without any additional assumptions made by previous works.
arXiv Detail & Related papers (2021-03-04T18:44:03Z)
- When Is Generalizable Reinforcement Learning Tractable? [74.87383727210705]
We study the query complexity required to train RL agents that can generalize to multiple environments.
We introduce Strong Proximity, a structural condition which precisely characterizes the relative closeness of different environments.
We show that under a natural weakening of this condition, RL can require query complexity that is exponential in the horizon to generalize.
arXiv Detail & Related papers (2021-01-01T19:08:24Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
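In generic form (a sketch of the information-bottleneck principle, not necessarily the paper's exact objective), such a regularizer augments the RL loss with a penalty on the mutual information between the learned representation $Z$ and the raw state $S$:

```latex
\min_{\theta} \; \mathbb{E}\big[\mathcal{L}_{\mathrm{RL}}(\theta)\big] \;+\; \beta \, I(Z; S)
```

Here $\beta$ trades task performance against compression; annealing $\beta$ upward corresponds to gradually removing information that is redundant for solving the task.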
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.