AACC: Asymmetric Actor-Critic in Contextual Reinforcement Learning
- URL: http://arxiv.org/abs/2208.02376v1
- Date: Wed, 3 Aug 2022 22:52:26 GMT
- Title: AACC: Asymmetric Actor-Critic in Contextual Reinforcement Learning
- Authors: Wangyang Yue, Yuan Zhou, Xiaochuan Zhang, Yuchen Hua, Zhiyuan Wang,
Guang Kou
- Abstract summary: This paper formalizes the task of adapting to changing environmental dynamics in Reinforcement Learning (RL) as a generalization problem.
We then propose the Asymmetric Actor-Critic in Contextual RL (AACC) as an end-to-end actor-critic method to deal with such generalization tasks.
We experimentally demonstrate substantial performance improvements of AACC over existing baselines in a range of simulated environments.
- Score: 13.167123175701802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) techniques have drawn great attention in many
challenging tasks, but their performance deteriorates dramatically when applied
to real-world problems. Various methods, such as domain randomization, have
been proposed to deal with such situations by training agents under different
environmental setups, and therefore they can be generalized to different
environments during deployment. However, they usually do not incorporate the
underlying environmental factor information that the agents interact with
properly and thus can be overly conservative when facing changes in the
surroundings. In this paper, we first formalize the task of adapting to
changing environmental dynamics in RL as a generalization problem using
Contextual Markov Decision Processes (CMDPs). We then propose the Asymmetric
Actor-Critic in Contextual RL (AACC) as an end-to-end actor-critic method to
deal with such generalization tasks. We experimentally demonstrate substantial
performance improvements of AACC over existing baselines in a range of
simulated environments.
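To make the asymmetry concrete, the sketch below conditions the critic on the privileged environmental context of the CMDP (e.g. randomized dynamics parameters visible during training), while the actor sees only the observation and therefore needs no context at deployment. This is a minimal illustration assuming an A2C-style update; the names (Actor, AsymmetricCritic, a2c_update) are hypothetical and this is not the authors' AACC implementation.

```python
# Minimal sketch of an asymmetric actor-critic for a contextual MDP.
# All names and the A2C-style update are illustrative assumptions,
# not the authors' AACC implementation.
import torch
import torch.nn as nn
from torch.distributions import Normal

class Actor(nn.Module):
    """Policy network: conditioned only on the observation, so it can be
    deployed without access to the environmental context."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs: torch.Tensor) -> Normal:
        return Normal(self.mu(obs), self.log_std.exp())

class AsymmetricCritic(nn.Module):
    """Value network: additionally conditioned on the privileged context of the
    CMDP (e.g. randomized dynamics parameters), available only during training."""
    def __init__(self, obs_dim: int, ctx_dim: int, hidden: int = 64):
        super().__init__()
        self.v = nn.Sequential(
            nn.Linear(obs_dim + ctx_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        return self.v(torch.cat([obs, ctx], dim=-1)).squeeze(-1)

def a2c_update(actor, critic, actor_opt, critic_opt, batch, gamma=0.99):
    """One update on transitions collected under per-episode randomized contexts."""
    obs, ctx, act, rew, next_obs, done = batch  # tensors with leading batch dim
    # Critic regression toward a one-step bootstrapped, context-aware target.
    with torch.no_grad():
        target = rew + gamma * (1.0 - done) * critic(next_obs, ctx)
    critic_loss = (critic(obs, ctx) - target).pow(2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Policy gradient on the context-blind actor, using the context-aware
    # critic's advantage estimate as the learning signal.
    advantage = (target - critic(obs, ctx)).detach()
    log_prob = actor.dist(obs).log_prob(act).sum(-1)
    actor_loss = -(log_prob * advantage).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```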
Related papers
- Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations [22.6449779859417]
General intelligence requires quick adaptation across tasks.
In this paper, we explore a wider range of scenarios where not only the distribution but also the environment spaces may change.
We introduce a causality-guided self-adaptive representation-based approach, called CSR, that equips the agent to generalize effectively.
arXiv Detail & Related papers (2024-07-30T08:48:49Z)
- Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning [4.902544998453533]
We argue that understanding and utilizing contextual cues, such as the gravity level of the environment, is critical for robust generalization.
Our algorithm demonstrates improved generalization on various simulated domains, outperforming prior context-learning techniques in zero-shot settings.
arXiv Detail & Related papers (2024-04-15T07:31:48Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios: fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z)
- Analysis of the Memorization and Generalization Capabilities of AI Agents: Are Continual Learners Robust? [91.682459306359]
In continual learning (CL), an AI agent learns from non-stationary data streams in dynamic environments.
In this paper, a novel CL framework is proposed to achieve robust generalization to dynamic environments while retaining past knowledge.
The generalization and memorization performance of the proposed framework are theoretically analyzed.
arXiv Detail & Related papers (2023-09-18T21:00:01Z)
- Improving adaptability to new environments and removing catastrophic forgetting in Reinforcement Learning by using an eco-system of agents [3.5786621294068373]
Adapting a Reinforcement Learning (RL) agent to an unseen environment is a difficult task due to typical over-fitting on the training environment.
There is a risk of catastrophic forgetting, where the performance on previously seen environments is seriously hampered.
This paper proposes a novel approach that exploits an ecosystem of agents to address both concerns.
arXiv Detail & Related papers (2022-04-13T17:52:54Z)
- Learning Domain Invariant Representations in Goal-conditioned Block MDPs [25.445394992810925]
We propose a theoretical framework that characterizes the generalizability of goal-conditioned policies to new environments.
Under this framework, we develop a practical method PA-SkewFit that enhances domain generalization.
arXiv Detail & Related papers (2021-10-27T08:10:45Z)
- Self-Supervised Policy Adaptation during Deployment [98.25486842109936]
Self-supervision allows the policy to continue training after deployment without using any rewards.
Empirical evaluations are performed on diverse simulation environments from DeepMind Control suite and ViZDoom.
Our method improves generalization in 31 out of 36 environments across various tasks and outperforms domain randomization on a majority of environments.
arXiv Detail & Related papers (2020-07-08T17:56:27Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z) - Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)