An Adaptive Deep RL Method for Non-Stationary Environments with
Piecewise Stable Context
- URL: http://arxiv.org/abs/2212.12735v1
- Date: Sat, 24 Dec 2022 13:43:39 GMT
- Title: An Adaptive Deep RL Method for Non-Stationary Environments with
Piecewise Stable Context
- Authors: Xiaoyu Chen, Xiangming Zhu, Yufeng Zheng, Pushi Zhang, Li Zhao, Wenxue
Cheng, Peng Cheng, Yongqiang Xiong, Tao Qin, Jianyu Chen, Tie-Yan Liu
- Abstract summary: Existing works on adaptation to unknown environment contexts either assume the contexts are the same for the whole episode or assume the context variables are Markovian.
In this paper, we propose a Segmented Context Belief Augmented Deep (SeCBAD) RL method.
Our method can jointly infer the belief distribution over latent context with the posterior over segment length and perform more accurate belief context inference with observed data within the current context segment.
- Score: 109.49663559151377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key challenges in deploying RL to real-world applications is to
adapt to variations of unknown environment contexts, such as changing terrains
in robotic tasks and fluctuated bandwidth in congestion control. Existing works
on adaptation to unknown environment contexts either assume the contexts are
the same for the whole episode or assume the context variables are Markovian.
However, in many real-world applications, the environment context usually stays
stable for a stochastic period and then changes in an abrupt and unpredictable
manner within an episode, resulting in a segment structure, which existing
works fail to address. To leverage the segment structure of piecewise stable
context in real-world applications, in this paper, we propose a
Segmented Context Belief Augmented Deep (SeCBAD) RL method. Our method can
jointly infer the belief
distribution over latent context with the posterior over segment length and
perform more accurate belief context inference with observed data within the
current context segment. The inferred belief context can be leveraged to
augment the state, leading to a policy that can adapt to abrupt variations in
context. We demonstrate empirically that SeCBAD can infer context segment
length accurately and outperform existing methods on a toy grid world
environment and MuJoCo tasks with piecewise-stable context.
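As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below maintains a belief over a discrete latent context and resets it with a fixed hazard rate that stands in for the full segment-length posterior; the belief is then concatenated to the state before it reaches the policy. The hazard-rate simplification, the Gaussian stand-in likelihood, and all names are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch, not the authors' code: a discrete context belief
# refined within a segment and softly reset when a new segment may have
# started, in the spirit of Bayesian online change-point detection.
N_CONTEXTS = 4   # hypothetical number of discrete latent contexts
HAZARD = 0.05    # assumed prior probability a new segment starts each step

means = np.linspace(-1.0, 1.0, N_CONTEXTS)  # stand-in observation model

def likelihood(obs):
    # How well each candidate context explains the observation.
    return np.exp(-0.5 * (obs - means) ** 2)

def update_belief(belief, obs):
    # With probability HAZARD the segment ended: mix back toward the
    # uniform prior; otherwise keep refining the within-segment posterior.
    prior = (1 - HAZARD) * belief + HAZARD / N_CONTEXTS
    post = prior * likelihood(obs)
    return post / post.sum()

def augment_state(state, belief):
    # Belief-augmented state fed to the policy, as the abstract describes.
    return np.concatenate([state, belief])

rng = np.random.default_rng(0)
belief = np.full(N_CONTEXTS, 1.0 / N_CONTEXTS)
for obs in rng.normal(0.8, 0.3, size=20):   # fake observations, one context
    belief = update_belief(belief, obs)
print(np.round(belief, 3), augment_state(np.zeros(3), belief).shape)
```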
Related papers
- ContextDet: Temporal Action Detection with Adaptive Context Aggregation [47.84334557998388]
We introduce a single-stage ContextDet framework for temporal action detection (TAD)
Our model features a pyramid adaptive context aggregation (ACA) architecture, capturing long-range context and improving action discriminability.
By varying the length of these large kernels across the ACA pyramid, our model provides lightweight yet effective context aggregation and action discrimination.
arXiv Detail & Related papers (2024-10-20T04:28:19Z)
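A hypothetical sketch of the varying-kernel idea in the ContextDet summary above: each pyramid level applies a depthwise 1-D convolution with a different large kernel length. The module name, channel count, and kernel lengths are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class PyramidContextAggregation(nn.Module):
    # Assumed stand-in for the ACA pyramid: one depthwise Conv1d per level,
    # with the kernel length growing across levels for wider context.
    def __init__(self, channels=256, kernel_sizes=(7, 15, 31)):
        super().__init__()
        self.levels = nn.ModuleList(
            [nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)
             for k in kernel_sizes]
        )

    def forward(self, feats):
        # feats: list of (batch, channels, time) tensors, one per level.
        # Residual connection keeps the original features alongside context.
        return [conv(x) + x for conv, x in zip(self.levels, feats)]

feats = [torch.randn(2, 256, t) for t in (128, 64, 32)]
print([o.shape for o in PyramidContextAggregation()(feats)])
```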
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a three-part strategy: distributing language features, spatial semantic recurrent coparsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z)
- Context is Environment [45.88558331853988]
Researchers should consider environment as context, and harness the adaptive power of in-context learning.
Conversely, researchers should consider context as environment to better structure data towards adaptive learning.
arXiv Detail & Related papers (2023-09-18T15:51:27Z)
- State Regularized Policy Optimization on Data with Dynamics Shift [25.412472472457324]
In many real-world scenarios, Reinforcement Learning (RL) algorithms are trained on data with dynamics shift, i.e., with different underlying environment dynamics.
In this paper, we find that in many environments with similar structures and different dynamics, optimal policies have similar stationary state distributions.
Such a distribution is used to regularize the policy trained in a new environment, leading to the SRPO (State Regularized Policy Optimization) algorithm.
arXiv Detail & Related papers (2023-06-06T10:06:09Z)
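A minimal sketch of the regularization idea in the SRPO summary above: estimate the new policy's visited-state distribution and penalize its KL divergence from a stationary state distribution carried over from a source environment. The function name, the uniform stand-in source distribution, and the coefficient are assumptions.

```python
import numpy as np

N_STATES = 8
d_source = np.ones(N_STATES) / N_STATES   # stand-in stationary distribution
                                           # estimated in the source environment

def srpo_penalty(visits, beta=0.1, eps=1e-8):
    # Empirical state distribution of the policy in the new environment,
    # smoothed to avoid log(0), then a KL penalty added to the RL loss.
    d_new = (visits + eps) / (visits.sum() + eps * len(visits))
    return beta * np.sum(d_new * np.log(d_new / d_source))

rng = np.random.default_rng(1)
visits = np.zeros(N_STATES)
for s in rng.integers(0, N_STATES, size=500):  # fake rollout state indices
    visits[s] += 1
print(f"regularizer: {srpo_penalty(visits):.4f}")
```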
- Context-aware Fine-tuning of Self-supervised Speech Models [56.95389222319555]
We study the use of context, i.e., surrounding segments, during fine-tuning.
We propose a new approach called context-aware fine-tuning.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks.
arXiv Detail & Related papers (2022-12-16T15:46:15Z)
- Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization [29.61829620717385]
A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime.
DaCoRL learns a context-conditioned policy using progressive contextualization.
DaCoRL consistently outperforms existing methods in terms of stability, overall performance, and generalization ability.
arXiv Detail & Related papers (2022-09-01T10:26:58Z)
- AACC: Asymmetric Actor-Critic in Contextual Reinforcement Learning [13.167123175701802]
This paper formalizes the task of adapting to changing environmental dynamics in Reinforcement Learning (RL).
We then propose the Asymmetric Actor-Critic in Contextual RL (AACC) as an end-to-end actor-critic method to deal with such generalization tasks.
We demonstrate experimentally that AACC delivers substantial performance improvements over existing baselines in a range of simulated environments.
arXiv Detail & Related papers (2022-08-03T22:52:26Z)
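A hedged sketch of the asymmetric setup suggested by the AACC summary above: during training the critic conditions on privileged context variables, while the actor sees only the observation it will have at deployment. Network shapes and names are assumptions.

```python
import torch
import torch.nn as nn

OBS_DIM, CTX_DIM, ACT_DIM = 8, 3, 2  # assumed dimensions for illustration

# Actor: context-free, usable at test time when context is unobserved.
actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, ACT_DIM))
# Critic: sees the context, which is available in simulation during training.
critic = nn.Sequential(nn.Linear(OBS_DIM + CTX_DIM, 64), nn.Tanh(), nn.Linear(64, 1))

obs = torch.randn(16, OBS_DIM)
ctx = torch.randn(16, CTX_DIM)                  # privileged training signal
action = actor(obs)                             # no context at deployment
value = critic(torch.cat([obs, ctx], dim=-1))   # context-aware value estimate
print(action.shape, value.shape)
```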
- RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval [66.2075707179047]
We propose a novel mixture-of-expert transformer, RoME, that disentangles text and video into three levels.
We utilize a transformer-based attention mechanism to fully exploit visual and text embeddings at both global and local levels.
Our method outperforms the state-of-the-art methods on the YouCook2 and MSR-VTT datasets.
arXiv Detail & Related papers (2022-06-26T11:12:49Z)
- Reinforcement Learning in Presence of Discrete Markovian Context Evolution [7.467644044726776]
We consider a context-dependent Reinforcement Learning setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution.
We adapt a sticky Hierarchical Dirichlet Process (HDP) prior for model learning, which is arguably best-suited for Markov process modeling.
We argue that the combination of these two components makes it possible to infer the number of contexts from data, thus removing the need to assume the context cardinality in advance.
arXiv Detail & Related papers (2022-02-14T08:52:36Z)
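The sticky HDP prior mentioned above biases each context toward self-transition, so inferred contexts persist rather than switching every step. Below is a simplified stand-in (not an HDP sampler): each row of a transition matrix is drawn from a Dirichlet with extra concentration on the diagonal. Parameter values are assumptions.

```python
import numpy as np

K, ALPHA, KAPPA = 4, 1.0, 8.0   # assumed: contexts, base mass, sticky bias
rng = np.random.default_rng(2)

def sticky_transition_matrix(k=K, alpha=ALPHA, kappa=KAPPA):
    # Each row is Dirichlet-distributed; adding kappa to the diagonal
    # entry is the "sticky" bias toward staying in the current context.
    rows = []
    for i in range(k):
        conc = np.full(k, alpha)
        conc[i] += kappa
        rows.append(rng.dirichlet(conc))
    return np.vstack(rows)

P = sticky_transition_matrix()
# Diagonal entries typically dominate: self-transitions are most likely.
print(np.round(np.diag(P), 2))
```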
- Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and accepts no responsibility for any consequences of its use.