Dynamics Generalisation in Reinforcement Learning via Adaptive
Context-Aware Policies
- URL: http://arxiv.org/abs/2310.16686v1
- Date: Wed, 25 Oct 2023 14:50:05 GMT
- Title: Dynamics Generalisation in Reinforcement Learning via Adaptive
Context-Aware Policies
- Authors: Michael Beukman, Devon Jarvis, Richard Klein, Steven James, Benjamin
Rosman
- Abstract summary: We present an investigation into how context should be incorporated into behaviour learning to improve generalisation.
We introduce a neural network architecture, the Decision Adapter, which generates the weights of an adapter module and conditions the behaviour of an agent on the context information.
We show that the Decision Adapter is a useful generalisation of a previously proposed architecture and empirically demonstrate that it results in superior generalisation performance.
- Score: 13.410372954752496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While reinforcement learning has achieved remarkable successes in several
domains, its real-world application is limited due to many methods failing to
generalise to unfamiliar conditions. In this work, we consider the problem of
generalising to new transition dynamics, corresponding to cases in which the
environment's response to the agent's actions differs. For example, the
gravitational force exerted on a robot depends on its mass and changes the
robot's mobility. Consequently, in such cases, it is necessary to condition an
agent's actions on extrinsic state information and pertinent contextual
information reflecting how the environment responds. While the need for
context-sensitive policies has been established, the manner in which context is
incorporated architecturally has received less attention. Thus, in this work,
we present an investigation into how context information should be incorporated
into behaviour learning to improve generalisation. To this end, we introduce a
neural network architecture, the Decision Adapter, which generates the weights
of an adapter module and conditions the behaviour of an agent on the context
information. We show that the Decision Adapter is a useful generalisation of a
previously proposed architecture and empirically demonstrate that it results in
superior generalisation performance compared to previous approaches in several
environments. Beyond this, the Decision Adapter is more robust to irrelevant
distractor variables than several alternative methods.
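To make the idea concrete, here is a minimal sketch of such a context-conditioned adapter (the layer sizes, single-adapter placement and residual connection are illustrative assumptions, not the paper's exact architecture): a small hypernetwork maps the context vector to the weights of one adapter layer applied to the agent's hidden features.

import torch
import torch.nn as nn

class DecisionAdapterSketch(nn.Module):
    # Policy network whose adapter weights are generated from the context.
    def __init__(self, state_dim, context_dim, hidden_dim, action_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        # Hypernetwork: maps the context vector to the weights and bias
        # of one linear adapter layer.
        self.hypernet = nn.Linear(context_dim, hidden_dim * hidden_dim + hidden_dim)
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, state, context):
        h = self.trunk(state)                                   # (B, H)
        params = self.hypernet(context)                         # (B, H*H + H)
        W = params[:, : self.hidden_dim ** 2].view(-1, self.hidden_dim, self.hidden_dim)
        b = params[:, self.hidden_dim ** 2 :]                   # (B, H)
        # Adapter: context-generated linear layer with a residual connection,
        # so the context modulates rather than replaces the state features.
        h = h + torch.relu(torch.bmm(W, h.unsqueeze(-1)).squeeze(-1) + b)
        return self.head(h)                                     # action preferences

Because the state-to-action pathway changes only through the generated adapter weights, a hypernetwork that learns to ignore a context dimension removes its influence entirely, which is one intuition for the robustness to distractor variables noted above.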
Related papers
- Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning [4.902544998453533]
We argue that understanding and utilizing contextual cues, such as the gravity level of the environment, is critical for robust generalization.
Our algorithm demonstrates improved generalization on various simulated domains, outperforming prior context-learning techniques in zero-shot settings.
arXiv Detail & Related papers (2024-04-15T07:31:48Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z)
- Context-Aware Composition of Agent Policies by Markov Decision Process Entity Embeddings and Agent Ensembles [1.124711723767572]
Computational agents support humans in many areas of life and are therefore found in heterogeneous contexts.
In order to perform services and carry out activities in a goal-oriented manner, agents require prior knowledge.
We propose a novel simulation-based approach that enables the representation of heterogeneous contexts.
arXiv Detail & Related papers (2023-08-28T12:13:36Z)
- Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning [35.87062321504049]
Multiple confounders can influence the transition dynamics, making it challenging to infer accurate context for decision-making.
This paper addresses such a challenge by Decomposed Mutual INformation Optimization (DOMINO) for context learning.
Our theoretical analysis shows that DOMINO can overcome the underestimation of mutual information caused by multiple confounders.
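One way to picture the decomposition (an illustrative sketch only; DOMINO's actual objective and estimators differ) is to learn a factored context and sum per-factor InfoNCE lower bounds on the mutual information, rather than estimating a single bound for the entangled context:

import torch
import torch.nn.functional as F

def decomposed_infonce(context_factors, transition_emb):
    # context_factors: list of (B, D) tensors, one per disentangled context factor.
    # transition_emb:  (B, D) embedding of the observed transition.
    total = 0.0
    for z in context_factors:
        logits = z @ transition_emb.t()                     # (B, B) similarity matrix
        labels = torch.arange(z.size(0), device=z.device)   # positives on the diagonal
        total = total - F.cross_entropy(logits, labels)     # InfoNCE bound (up to log B)
    return total  # maximizing this tightens each per-factor bound separately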
arXiv Detail & Related papers (2022-10-09T09:44:23Z)
- AACC: Asymmetric Actor-Critic in Contextual Reinforcement Learning [13.167123175701802]
This paper formalizes the task of adapting to changing environmental dynamics in Reinforcement Learning (RL).
We then propose the Asymmetric Actor-Critic in Contextual RL (AACC) as an end-to-end actor-critic method to deal with such generalization tasks.
We experimentally demonstrate significant performance improvements of AACC over existing baselines in a range of simulated environments.
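The asymmetry can be sketched as follows (a simplified illustration with assumed network sizes, not the authors' exact architecture): the critic consumes the ground-truth context, which is available in simulation during training, while the actor sees only the state and therefore needs no privileged information at deployment.

import torch
import torch.nn as nn

class AsymmetricActorCritic(nn.Module):
    def __init__(self, state_dim, context_dim, action_dim, hidden=128):
        super().__init__()
        # Actor: state only, usable at test time without the context.
        self.actor = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, action_dim))
        # Critic: state plus privileged context, needed only during training.
        self.critic = nn.Sequential(
            nn.Linear(state_dim + context_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, state, context):
        value = self.critic(torch.cat([state, context], dim=-1))
        return self.actor(state), value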
arXiv Detail & Related papers (2022-08-03T22:52:26Z)
- Generalizing Decision Making for Automated Driving with an Invariant Environment Representation using Deep Reinforcement Learning [55.41644538483948]
Current approaches either do not generalize well beyond the training data or cannot handle a variable number of traffic participants.
We propose an invariant environment representation from the perspective of the ego vehicle.
We show that, thanks to this abstraction, the agents are able to generalize successfully to unseen scenarios.
arXiv Detail & Related papers (2021-02-12T20:37:29Z)
- Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
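A simple stand-in for such a regularizer (a sketch under assumptions: it replaces the paper's model-based estimator with an adversarially trained classifier q(u | a)) penalizes the policy whenever the sensitive variable u is predictable from its actions:

import torch.nn.functional as F

def privacy_regularized_loss(policy_loss, action_feats, sensitive_labels,
                             classifier, beta=0.1):
    # The classifier is trained separately to minimize this cross-entropy,
    # i.e. to predict the sensitive variable u from the action features;
    # its log-likelihood then serves as a crude surrogate for I(u; a).
    log_q = -F.cross_entropy(classifier(action_feats), sensitive_labels)
    # The policy pays a cost proportional to how much its actions reveal u.
    return policy_loss + beta * log_q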
arXiv Detail & Related papers (2020-12-30T03:22:35Z)
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL [142.36621929739707]
We show that learning diverse behaviors for accomplishing a task can lead to behavior that generalizes to varying environments.
By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations.
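The gist can be sketched as a latent-conditioned policy whose task reward is augmented with a diversity bonus (a DIAYN-style simplification; the paper's objective additionally gates this bonus on near-optimal task return, which is omitted here):

import torch.nn.functional as F

def diversity_bonus(discriminator, states, z_labels, alpha=0.5):
    # Reward the latent-conditioned policy for visiting states from which a
    # discriminator can recover the latent z it was conditioned on, pushing
    # the different z-policies toward distinct solutions of the same task.
    log_q_z = -F.cross_entropy(discriminator(states), z_labels, reduction="none")
    return alpha * log_q_z  # added per-step to the task reward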
arXiv Detail & Related papers (2020-10-27T17:41:57Z)
- Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)
- Adapting to Unseen Environments through Explicit Representation of Context [16.8615211682877]
In order to deploy autonomous agents to domains such as autonomous driving, infrastructure management, health care, and finance, they must be able to adapt safely to unseen situations.
This paper proposes a principled approach where a context module is coevolved with a skill module.
The Context+Skill approach leads to significantly more robust behavior in environments with previously unseen effects.
arXiv Detail & Related papers (2020-02-13T17:15:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.