Efficient Reinforcement Learning with Impaired Observability: Learning to Act with Delayed and Missing State Observations
- URL: http://arxiv.org/abs/2306.01243v2
- Date: Thu, 26 Oct 2023 20:10:07 GMT
- Title: Efficient Reinforcement Learning with Impaired Observability: Learning to Act with Delayed and Missing State Observations
- Authors: Minshuo Chen, Jie Meng, Yu Bai, Yinyu Ye, H. Vincent Poor, Mengdi Wang
- Abstract summary: This paper introduces a theoretical investigation into efficient reinforcement learning in control systems.
We present algorithms and establish near-optimal regret upper and lower bounds, of the form $\tilde{\mathcal{O}}(\sqrt{{\rm poly}(H)\, SAK})$, for RL in the delayed and missing observation settings.
- Score: 92.25604137490168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In real-world reinforcement learning (RL) systems, various forms of {\it
impaired observability} can complicate matters. These situations arise when an
agent is unable to observe the most recent state of the system due to latency
or lossy channels, yet the agent must still make real-time decisions. This
paper introduces a theoretical investigation into efficient RL in control
systems where agents must act with delayed and missing state observations. We
present algorithms and establish near-optimal regret upper and lower bounds, of
the form $\tilde{\mathcal{O}}(\sqrt{{\rm poly}(H) SAK})$, for RL in the delayed
and missing observation settings. Here $S$ and $A$ are the sizes of state and
action spaces, $H$ is the time horizon and $K$ is the number of episodes.
Despite impaired observability posing significant challenges to the policy
class and planning, our results demonstrate that learning remains efficient,
with the regret bound optimally depending on the state-action size of the
original system. Additionally, we provide a characterization of the performance
of the optimal policy under impaired observability, comparing it to the optimal
value obtained with full observability. Numerical results are provided to
support our theory.
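To make the delayed-observation setting concrete, the following is a minimal sketch, assuming a tabular episodic MDP with a fixed observation delay $d$: at each step the agent only sees the state visited $d$ steps earlier, so its decision input is that stale state together with the actions taken since. The toy MDP, the wrapper class, and all names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of acting under a constant observation delay d.
# The toy MDP, wrapper, and names are illustrative, not the paper's code.
import random
from collections import deque


class ToyMDP:
    """Two-state, two-action chain used only to exercise the wrapper."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = (self.state + action) % 2
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward


class DelayedObservation:
    """Reveals states with a fixed delay d and tracks the intervening actions."""

    def __init__(self, env, delay):
        self.env, self.delay = env, delay

    def reset(self):
        self.states = deque([self.env.reset()])
        self.actions = deque()
        return self.states[0], tuple(self.actions)

    def step(self, action):
        next_state, reward = self.env.step(action)
        self.states.append(next_state)
        self.actions.append(action)
        if len(self.states) > self.delay + 1:   # keep only the last d+1 visited states
            self.states.popleft()
            self.actions.popleft()
        # Observation = oldest buffered state + actions taken since it was visited.
        return (self.states[0], tuple(self.actions)), reward


env = DelayedObservation(ToyMDP(), delay=2)
obs = env.reset()
for _ in range(5):
    obs, reward = env.step(random.randrange(2))
    print(obs, reward)
```

A policy under delay therefore maps (delayed state, recent action sequence) pairs to actions, which is the sense in which the policy class and planning become harder while the regret can still depend only on the $S$ and $A$ of the original system.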
Related papers
- Sublinear Regret for a Class of Continuous-Time Linear--Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
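As a rough illustration of the model-free actor-critic idea summarized above, here is a hedged sketch on a time-discretized scalar linear-quadratic problem; the dynamics, cost weights, learning rates, and update rules are illustrative assumptions, not the authors' continuous-time algorithm.

```python
# Hedged sketch: actor-critic on a time-discretized scalar LQ problem.
# Dynamics: x' = x + (a*x + b*u)*dt + noise; cost: (q*x^2 + r*u^2)*dt.
# Policy: u = -k*x + exploration noise; critic: V(x) ~= p*x^2 fit by TD(0).
# All constants and names below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
a, b, q, r, dt = -0.5, 0.5, 1.0, 0.1, 0.1    # used only inside the simulator
gamma, sigma = 0.99, 0.3                      # discount and exploration std

k, p = 0.0, 1.0                               # actor gain k, critic coefficient p
alpha_actor, alpha_critic = 1e-2, 5e-2

x = 1.0
for _ in range(20000):
    u = -k * x + sigma * rng.standard_normal()               # sample from Gaussian policy
    cost = (q * x**2 + r * u**2) * dt
    x_next = x + (a * x + b * u) * dt + 0.1 * np.sqrt(dt) * rng.standard_normal()

    # Critic: semi-gradient TD(0) for the cost-to-go approximation V(x) = p*x^2.
    td_error = cost + gamma * p * x_next**2 - p * x**2
    p += alpha_critic * td_error * x**2

    # Actor: policy gradient on k; d/dk log N(u; -k*x, sigma^2) = -(u + k*x)*x / sigma^2.
    grad_log_pi = -(u + k * x) * x / sigma**2
    k -= alpha_actor * td_error * grad_log_pi                 # descend since td_error tracks cost

    x = x_next

print("critic coefficient p:", p, "feedback gain k:", k)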
arXiv Detail & Related papers (2024-07-24T12:26:21Z)
- Offline RL with Observation Histories: Analyzing and Improving Sample Complexity [70.7884839812069]
Offline reinforcement learning can synthesize more optimal behavior from a dataset consisting only of suboptimal trials.
We show that standard offline RL algorithms conditioned on observation histories suffer from poor sample complexity.
We propose that offline RL can explicitly optimize this loss to improve worst-case sample complexity.
arXiv Detail & Related papers (2023-10-31T17:29:46Z)
- Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation [62.969796245827006]
Delayed-PSVI is an optimistic value-based algorithm that explores the value function space via noise perturbation with posterior sampling.
We show our algorithm achieves $\widetilde{O}(\sqrt{d^3 H^3 T} + d^2 H^2 \mathbb{E}[\tau])$ worst-case regret in the presence of unknown delays.
We incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI.
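As a rough sketch of that gradient-based approximate sampling step, and under the assumption of a ridge-regression posterior over a linear value parameter (my reading, not the authors' exact construction), a few iterations of unadjusted Langevin dynamics can stand in for exact Gaussian posterior sampling:

```python
# Hedged sketch of Langevin-dynamics approximate posterior sampling for a
# linear value parameter. Names, step sizes, and the ridge posterior are
# illustrative assumptions, not the authors' implementation.
import numpy as np


def langevin_posterior_sample(phi, targets, lam=1.0, step=1e-3, n_steps=500, rng=None):
    """Approximately sample w from the ridge-regression posterior of phi @ w ~ targets.

    phi:     (n, d) feature matrix
    targets: (n,)   regression targets (e.g. backed-up values)
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = phi.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        # Gradient of the negative log-posterior: least-squares term + ridge prior.
        grad = phi.T @ (phi @ w - targets) + lam * w
        noise = rng.standard_normal(d)
        w = w - step * grad + np.sqrt(2.0 * step) * noise
    return w


# Usage: draw an approximate posterior sample from noisy regression data.
rng = np.random.default_rng(0)
phi = rng.standard_normal((100, 5))
targets = phi @ np.array([1.0, -0.5, 0.3, 0.0, 2.0]) + 0.1 * rng.standard_normal(100)
print(langevin_posterior_sample(phi, targets, rng=rng))
```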
arXiv Detail & Related papers (2023-10-29T06:12:43Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning an imagined state with the real state returned by the environment, VCR applies a $Q$-value head to both states and obtains two distributions of action values.
Our method is demonstrated to achieve new state-of-the-art performance among search-free RL algorithms.
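A minimal, hypothetical sketch of the consistency term described above: the same $Q$-value head is applied to an imagined state and to the real state, and the divergence between the two induced action-value distributions is penalized. The shapes, the KL choice, and all names are assumptions rather than the authors' code.

```python
# Hedged sketch: penalize divergence between action-value distributions of an
# imagined state and the real state under a shared Q-value head.
import numpy as np


def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()


def value_consistency_loss(q_head, imagined_state, real_state):
    """KL divergence between action-value distributions of real vs. imagined state."""
    p_imagined = softmax(q_head(imagined_state))
    p_real = softmax(q_head(real_state))
    return float(np.sum(p_real * (np.log(p_real) - np.log(p_imagined))))


# Usage with a toy linear Q-head over 3 actions.
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))
q_head = lambda s: W @ s
s_real = rng.standard_normal(4)
s_imagined = s_real + 0.05 * rng.standard_normal(4)   # model prediction, slightly off
print(value_consistency_loss(q_head, s_imagined, s_real))
```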
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
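One plausible reading of that discovery objective, sketched below: the reward is the inverse model's current prediction error plus a $k$-step learning progress bonus (how much that error has dropped over the last $k$ updates). The exact form is an assumption, not the authors' objective.

```python
# Hedged sketch: discovery reward = prediction error + k-step learning progress bonus.
from collections import deque


class DiscoveryReward:
    def __init__(self, k):
        self.k = k
        self.error_history = deque(maxlen=k + 1)

    def __call__(self, prediction_error):
        self.error_history.append(prediction_error)
        # Learning progress: error k steps ago minus current error (0 until k+1 errors seen).
        progress = 0.0
        if len(self.error_history) > self.k:
            progress = self.error_history[0] - self.error_history[-1]
        return prediction_error + progress


# Usage: errors shrinking over time yield a positive progress bonus.
reward_fn = DiscoveryReward(k=3)
for err in [1.0, 0.8, 0.7, 0.5, 0.4]:
    print(reward_fn(err))
```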
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations [7.776010676090131]
State observations that an agent observes may contain measurement errors or adversarial noises, misleading the agent to take suboptimal actions or even collapse while training.
In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return.
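To illustrate the distinction drawn above, the following toy rollout under noisy state observations keeps the empirical return distribution rather than only its mean; the chain, policy, and noise model are illustrative assumptions, not the paper's benchmark.

```python
# Hedged sketch: return distribution vs. expected return under noisy observations.
import numpy as np

rng = np.random.default_rng(0)
gamma, noise_std, episodes, horizon = 0.95, 0.5, 2000, 20


def rollout():
    state, ret, discount = 0.0, 0.0, 1.0
    for _ in range(horizon):
        observation = state + noise_std * rng.standard_normal()   # noisy state observation
        action = 1.0 if observation < 1.0 else -1.0               # simple threshold policy
        reward = -abs(state - 1.0)                                 # reward for staying near x = 1
        state = state + 0.5 * action + 0.1 * rng.standard_normal()
        ret += discount * reward
        discount *= gamma
    return ret


returns = np.array([rollout() for _ in range(episodes)])
print("expected return only:", returns.mean())
print("return distribution (10/50/90th percentiles):", np.percentile(returns, [10, 50, 90]))
```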
arXiv Detail & Related papers (2021-09-17T22:37:39Z)
- State Action Separable Reinforcement Learning [11.04892417160547]
We propose a new learning paradigm, State Action Separable Reinforcement Learning (sasRL), wherein the action space is decoupled from the value function learning process for higher efficiency.
Experiments on several gaming scenarios show that sasRL outperforms state-of-the-art MDP-based RL algorithms by up to $75\%$.
arXiv Detail & Related papers (2020-06-05T22:02:57Z)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.