Reinforcement Learning in Presence of Discrete Markovian Context Evolution
- URL: http://arxiv.org/abs/2202.06557v1
- Date: Mon, 14 Feb 2022 08:52:36 GMT
- Title: Reinforcement Learning in Presence of Discrete Markovian Context Evolution
- Authors: Hang Ren, Aivar Sootla, Taher Jafferjee, Junxiao Shen, Jun Wang and Haitham Bou-Ammar
- Abstract summary: We consider a context-dependent Reinforcement Learning setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution.
We adapt a sticky Hierarchical Dirichlet Process (HDP) prior for model learning, which is arguably best-suited for Markov process modeling.
We argue that the combination of these two components allows us to infer the number of contexts from data, thus removing the assumption that the context cardinality is known a priori.
- Score: 7.467644044726776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a context-dependent Reinforcement Learning (RL) setting, which is
characterized by: a) an unknown finite number of not directly observable
contexts; b) abrupt (discontinuous) context changes occurring during an
episode; and c) Markovian context evolution. We argue that this challenging
case is often met in applications and we tackle it using a Bayesian approach
and variational inference. We adapt a sticky Hierarchical Dirichlet Process
(HDP) prior for model learning, which is arguably best-suited for Markov
process modeling. We then derive a context distillation procedure, which
identifies and removes spurious contexts in an unsupervised fashion. We argue
that the combination of these two components allows us to infer the number of
contexts from data, thus removing the assumption that the context cardinality is
known a priori. We then find a representation of the optimal policy that enables efficient policy
learning using off-the-shelf RL algorithms. Finally, we demonstrate empirically
(using the gym environments cart-pole swing-up, drone, and intersection) that our
approach succeeds where state-of-the-art methods of other frameworks fail and
elaborate on the reasons for such failures.
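To make the Markovian context evolution concrete, here is a minimal, illustrative sketch (not the authors' implementation): each row of a context transition matrix is drawn from a Dirichlet prior with extra mass on self-transitions, mimicking the "sticky" behaviour the sticky HDP prior encourages, and a context trajectory with abrupt switches is then simulated. The number of contexts K and the concentration parameters alpha and kappa are assumptions made for illustration; in the paper the contexts are not observed and their number is inferred from data.

```python
import numpy as np

# Illustrative sketch only (not the authors' code). A "sticky" Dirichlet prior
# over the rows of a context transition matrix: extra concentration kappa on
# the diagonal biases each context toward persisting, while off-diagonal mass
# allows abrupt, discontinuous switches during an episode.
rng = np.random.default_rng(0)

K = 3          # assumed number of latent contexts (the paper infers this from data)
alpha = 1.0    # base Dirichlet concentration (illustrative)
kappa = 5.0    # stickiness: extra mass on self-transitions (illustrative)

P = np.stack([
    rng.dirichlet(alpha * np.ones(K) + kappa * np.eye(K)[j])
    for j in range(K)
])

# Simulate Markovian context evolution over one episode.
T = 50
contexts = np.zeros(T, dtype=int)
for t in range(1, T):
    contexts[t] = rng.choice(K, p=P[contexts[t - 1]])

print("transition matrix:\n", np.round(P, 2))
print("context trajectory:", contexts)
```

Here the contexts are simulated only to show the kind of dynamics the sticky prior is meant to capture; in the setting of the paper they are latent and must be inferred.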
Related papers
- Learning Rules Explaining Interactive Theorem Proving Tactic Prediction [5.229806149125529]
We represent the problem as an Inductive Logic Programming (ILP) task.
Using the ILP representation we enriched the feature space by encoding additional, computationally expensive properties.
We use this enriched feature space to learn rules explaining when a tactic is applicable to a given proof state.
arXiv Detail & Related papers (2024-11-02T09:18:33Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
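As a rough, hedged sketch of the learn-stochastic, deploy-deterministic pattern this summary refers to (illustrative assumptions: a Gaussian policy, a fixed exploration level sigma, and a small MLP; this is not the paper's algorithm):

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Gaussian (hyper)policy: stochastic while learning, deterministic at deployment."""

    def __init__(self, state_dim: int, action_dim: int, sigma: float = 0.2):
        super().__init__()
        self.mean = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim),
        )
        # sigma is the exploration level: larger values ease learning (lower
        # sample complexity) but can degrade the deployed deterministic mean.
        self.sigma = sigma

    def sample(self, state: torch.Tensor) -> torch.Tensor:
        # Stochastic action used while collecting data / computing policy gradients.
        mu = self.mean(state)
        return mu + self.sigma * torch.randn_like(mu)

    def deploy(self, state: torch.Tensor) -> torch.Tensor:
        # Deterministic version actually deployed after training.
        return self.mean(state)
```

Tuning sigma is exactly the trade-off the summary mentions: too little exploration hinders learning, too much hurts the performance of the deployed deterministic policy.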
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- READ: Improving Relation Extraction from an ADversarial Perspective [33.44949503459933]
We propose an adversarial training method specifically designed for relation extraction (RE).
Our approach introduces both sequence- and token-level perturbations to the sample and uses a separate perturbation vocabulary to improve the search for entity and context perturbations.
arXiv Detail & Related papers (2024-04-02T16:42:44Z)
- DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning [59.4644086610381]
We propose a novel denoising objective that takes another perspective, namely the intra-sentence perspective.
By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form.
Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks.
arXiv Detail & Related papers (2024-01-24T17:48:45Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
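A minimal, hedged sketch of the idea summarized above (not the paper's implementation): a conditional denoising network is trained to reconstruct dataset actions (behaviour cloning) while a Q-value term pushes the reconstructed actions toward high value. The single-step noise schedule, the small network, the weight eta, and the assumption that q_net maps a state-action concatenation to a scalar are all illustrative.

```python
import torch
import torch.nn as nn

class DenoiseNet(nn.Module):
    """Predicts the noise added to an action, conditioned on state and diffusion time."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, noisy_action, t):
        return self.net(torch.cat([state, noisy_action, t], dim=-1))

def diffusion_ql_loss(denoiser, q_net, state, action, eta=1.0):
    """Denoising behaviour-cloning loss plus a Q-guided policy-improvement term."""
    t = torch.rand(state.shape[0], 1)                    # diffusion time in [0, 1)
    noise = torch.randn_like(action)
    noisy = (1 - t).sqrt() * action + t.sqrt() * noise   # corrupt the dataset action
    pred_noise = denoiser(state, noisy, t)
    bc_loss = ((pred_noise - noise) ** 2).mean()         # behaviour-cloning term
    # Invert the corruption with the predicted noise and ask the critic to rate it.
    recon = (noisy - t.sqrt() * pred_noise) / (1 - t).sqrt().clamp(min=1e-3)
    q_loss = -q_net(torch.cat([state, recon], dim=-1)).mean()
    return bc_loss + eta * q_loss
```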
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- A Reinforcement Learning Approach to Domain-Knowledge Inclusion Using Grammar Guided Symbolic Regression [0.0]
We propose a Reinforcement-Based Grammar-Guided Symbolic Regression (RBG2-SR) method.
RBG2-SR constrains the representational space with domain knowledge, using a context-free grammar as the reinforcement learning action space.
We show that our method is competitive against other state-of-the-art methods on the benchmarks and offers the best error-complexity trade-off.
arXiv Detail & Related papers (2022-02-09T10:13:14Z)
- Verified Probabilistic Policies for Deep Reinforcement Learning [6.85316573653194]
We tackle the problem of verifying probabilistic policies for deep reinforcement learning.
We propose an abstraction approach, based on interval Markov decision processes, that yields guarantees on a policy's execution.
We present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking.
arXiv Detail & Related papers (2022-01-10T23:55:04Z)
- Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z)
- Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)
- Contextual Policy Transfer in Reinforcement Learning Domains via Deep Mixtures-of-Experts [24.489002406693128]
We introduce a novel mixture-of-experts formulation for learning state-dependent beliefs over source task dynamics.
We show how this model can be incorporated into standard policy reuse frameworks.
arXiv Detail & Related papers (2020-02-29T07:58:36Z)
- How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)