Blind Decision Making: Reinforcement Learning with Delayed Observations
- URL: http://arxiv.org/abs/2011.07715v1
- Date: Mon, 16 Nov 2020 04:29:14 GMT
- Title: Blind Decision Making: Reinforcement Learning with Delayed Observations
- Authors: Mridul Agarwal, Vaneet Aggarwal
- Abstract summary: Reinforcement learning assumes that the state update resulting from the previous actions is available instantaneously.
When the state update is not available, the decision is taken partly in the blind, since it cannot rely on the current state information.
This paper proposes an approach in which the delay in the knowledge of the state is taken into account and decisions are made based on the information that is available.
- Score: 43.126718159042305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning typically assumes that the state update
resulting from the previous actions is available instantaneously, and thus can
be used for making future decisions. However, this may not always be true.
When the state update is not available, the decision is taken partly in the
blind, since it cannot rely on the current state information. This paper
proposes an approach in which the delay in the knowledge of the state is taken
into account, and decisions are made based on the available information, which
may not include the current state. One option is to include the actions taken
after the last-known state as part of the state information; however, this
enlarges the state space, making the problem more complex and slower to
converge. The proposed algorithm offers an alternative in which the state
space is not enlarged relative to the case with no delay in the state update.
Evaluations on basic RL environments further illustrate the improved
performance of the proposed algorithm.
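To make the trade-off described in the abstract concrete, the following is a minimal sketch, not the paper's proposed algorithm: tabular Q-learning on the augmented information state (last observed state plus the actions taken since it was observed), which is the baseline approach the abstract says enlarges the state space. The toy chain environment, the fixed observation delay, the assumption that rewards and episode termination are signaled immediately, and all hyperparameters are illustrative assumptions.
```python
# Minimal sketch (not the paper's algorithm): tabular Q-learning on the
# augmented information state = (last observed state, actions taken since).
# Illustrative assumptions: toy chain MDP, fixed observation delay,
# immediate reward/termination feedback, arbitrary hyperparameters.
import random
from collections import defaultdict, deque

N_STATES, N_ACTIONS, DELAY = 6, 2, 2            # observations arrive DELAY steps late
GAMMA, ALPHA, EPS, EPISODES, HORIZON = 0.95, 0.1, 0.1, 3000, 60

def step(state, action):
    """Chain dynamics: action 1 moves right, action 0 moves left; goal at the right end."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, float(done), done

Q = defaultdict(float)                          # Q[(information_state, action)]

def info_state(last_obs, pending_actions):
    # The pending-action component can take N_ACTIONS ** DELAY values,
    # which is the state-space blow-up the proposed algorithm avoids.
    return (last_obs, tuple(pending_actions))

def choose(info):
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[(info, a)])

for _ in range(EPISODES):
    true_states = deque([0], maxlen=DELAY + 1)  # oldest entry is what the agent observes
    pending = deque(maxlen=DELAY)               # actions not yet reflected in an observation
    for _ in range(HORIZON):
        info = info_state(true_states[0], pending)
        action = choose(info)
        nxt, reward, done = step(true_states[-1], action)
        true_states.append(nxt)                 # the oldest state is released to the agent
        pending.append(action)
        next_info = info_state(true_states[0], pending)
        target = reward if done else reward + GAMMA * max(
            Q[(next_info, a)] for a in range(N_ACTIONS))
        Q[(info, action)] += ALPHA * (target - Q[(info, action)])
        if done:
            break
```
The augmented state makes delayed observations tractable for standard Q-learning, but the pending-action component grows exponentially with the delay; the paper's contribution is an algorithm that avoids this enlargement, which this sketch does not reproduce.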
Related papers
- OpenPI-C: A Better Benchmark and Stronger Baseline for Open-Vocabulary
State Tracking [55.62705574507595]
OpenPI is the only dataset annotated for open-vocabulary state tracking.
We categorize three types of problems, at the procedure, step, and state-change levels respectively.
For the evaluation metric, we propose a cluster-based metric to fix the original metric's preference for repetition.
arXiv Detail & Related papers (2023-06-01T16:48:20Z)
- Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration [97.19464604735802]
A promising technique for exploration is to maximize the entropy of visited state distribution.
This technique tends to struggle in a supervised setup with a task reward, where the agent prefers to visit high-value states.
We present a novel exploration technique that maximizes the value-conditional state entropy.
arXiv Detail & Related papers (2023-05-31T01:09:28Z)
- Intermittently Observable Markov Decision Processes [26.118176084782842]
We consider a scenario where the controller perceives the state information of the process via an unreliable communication channel.
The transmissions of state information over the whole time horizon are modeled as a Bernoulli lossy process; a minimal sketch of this observation model appears after this list.
We develop two finite-state approximations to the tree MDP to find near-optimal policies efficiently.
arXiv Detail & Related papers (2023-02-23T03:38:03Z)
- Approximate Information States for Worst-Case Control and Learning in Uncertain Systems [2.7282382992043885]
We consider a non-stochastic model, where disturbances acting on the system take values in bounded sets with unknown distributions.
We present a general framework for decision-making in such problems by using the notion of the information state and approximate information state.
We illustrate the application of our results in control and reinforcement learning using numerical examples.
arXiv Detail & Related papers (2023-01-12T15:36:36Z)
- Efficient Embedding of Semantic Similarity in Control Policies via Entangled Bisimulation [3.5092955099876266]
Learning generalizable policies from visual input in the presence of visual distractions is a challenging problem in reinforcement learning.
We propose entangled bisimulation, a bisimulation metric that allows the specification of the distance function between states.
We show how entangled bisimulation can meaningfully improve over previous methods on the Distracting Control Suite (DCS).
arXiv Detail & Related papers (2022-01-28T18:06:06Z)
- Preferential Temporal Difference Learning [53.81943554808216]
We propose an approach to re-weighting states used in TD updates, both when they are the input and when they provide the target for the update.
We prove that our approach converges with linear function approximation and illustrate its desirable empirical behaviour compared to other TD-style methods.
arXiv Detail & Related papers (2021-06-11T17:05:15Z)
- State estimation with limited sensors -- A deep learning based approach [0.0]
We propose a novel deep learning based state estimation framework that learns from sequential data.
We illustrate that utilizing sequential data allows for state recovery from only one or two sensors.
arXiv Detail & Related papers (2021-01-27T16:14:59Z)
- A New Bandit Setting Balancing Information from State Evolution and Corrupted Context [52.67844649650687]
We propose a new sequential decision-making setting combining key aspects of two established online learning problems with bandit feedback.
The optimal action to play at any given moment is contingent on an underlying changing state which is not directly observable by the agent.
We present an algorithm that uses a referee to dynamically combine the policies of a contextual bandit and a multi-armed bandit.
arXiv Detail & Related papers (2020-11-16T14:35:37Z)
- InfoBot: Transfer and Exploration via the Information Bottleneck [105.28380750802019]
A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed.
We propose to learn about decision states from prior experience.
We find that this simple mechanism effectively identifies decision states, even in partially observed settings.
arXiv Detail & Related papers (2019-01-30T15:33:58Z)
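As a complement to the Intermittently Observable Markov Decision Processes entry above, the sketch below shows only the Bernoulli lossy observation channel it mentions: each state transmission reaches the controller independently with some probability, and the controller otherwise falls back to its last received observation. The tree-MDP formulation and finite-state approximations from that paper are not reproduced; the random-walk environment, loss probability, and placeholder policy are illustrative assumptions.
```python
# Minimal sketch of a Bernoulli lossy observation channel (see the
# "Intermittently Observable Markov Decision Processes" entry above).
# Illustrative assumptions: random-walk environment, delivery probability
# P_RECEIVE, and a placeholder policy that reuses the last received state.
import random

P_RECEIVE = 0.7                      # probability a state transmission gets through
N_STATES, STEPS = 10, 20

def policy(observed_state):
    """Placeholder controller acting on possibly stale information."""
    return 1 if observed_state < N_STATES // 2 else -1

true_state = N_STATES // 2
last_received = true_state           # controller's most recent observation
for t in range(STEPS):
    # Bernoulli lossy process: the current state is delivered with prob P_RECEIVE.
    if random.random() < P_RECEIVE:
        last_received = true_state
    action = policy(last_received)   # decide on possibly stale information
    true_state = min(max(true_state + action, 0), N_STATES - 1)
    print(f"t={t:2d}  true={true_state}  last observed={last_received}")
```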
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.