Prospective Side Information for Latent MDPs
- URL: http://arxiv.org/abs/2310.07596v1
- Date: Wed, 11 Oct 2023 15:37:31 GMT
- Title: Prospective Side Information for Latent MDPs
- Authors: Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis
- Abstract summary: We study the class of LMDPs with prospective side information, in which an agent receives additional, weakly revealing information about the latent context at the beginning of each episode.
We show that, surprisingly, this problem is not captured by contemporary settings and algorithms designed for partially observed environments.
We then establish that any sample efficient algorithm must suffer at least $\Omega(K^{2/3})$-regret, as opposed to standard $\Omega(\sqrt{K})$ lower bounds, and design an algorithm with a matching upper bound.
- Score: 80.00842638151558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many interactive decision-making settings, there is latent and unobserved
information that remains fixed. Consider, for example, a dialogue system, where
complete information about a user, such as the user's preferences, is not
given. In such an environment, the latent information remains fixed throughout
each episode, since the identity of the user does not change during an
interaction. This type of environment can be modeled as a Latent Markov
Decision Process (LMDP), a special instance of Partially Observed Markov
Decision Processes (POMDPs). Previous work established exponential lower bounds
in the number of latent contexts for the LMDP class. This puts forward the
question: under which natural assumptions can a near-optimal policy of an LMDP
be efficiently learned? In this work, we study the class of LMDPs with {\em
prospective side information}, when an agent receives additional, weakly
revealing, information on the latent context at the beginning of each episode.
We show that, surprisingly, this problem is not captured by contemporary
settings and algorithms designed for partially observed environments. We then
establish that any sample efficient algorithm must suffer at least
$\Omega(K^{2/3})$-regret, as opposed to standard $\Omega(\sqrt{K})$ lower
bounds, and design an algorithm with a matching upper bound.
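To make the interaction protocol in the abstract concrete, here is a minimal toy sketch of an LMDP episode loop with prospective side information. It is illustrative only: all names (`mixing_weights`, `num_signals`, the signal channel `O`, etc.) and the random instance are assumptions for this sketch, not definitions or code from the paper. A latent context is drawn once per episode, a weakly revealing signal about that context is observed before the first step, and the dynamics and rewards of the episode then depend on the hidden context.

```python
import numpy as np

# Minimal illustrative sketch of the LMDP-with-prospective-side-information
# protocol (not the paper's algorithm). All names and the toy instance below
# are assumptions made for illustration.

rng = np.random.default_rng(0)

M, S, A, H, K = 3, 4, 2, 5, 100          # latent contexts, states, actions, horizon, episodes
mixing_weights = np.full(M, 1.0 / M)     # prior over latent contexts

# One transition kernel and reward table per latent context (random toy instance).
P = rng.dirichlet(np.ones(S), size=(M, S, A))   # P[m, s, a] is a distribution over next states
R = rng.uniform(size=(M, S, A))                 # rewards in [0, 1]

# "Weakly revealing" side-information channel: row m of O is the signal
# distribution when the true context is m. Rows are close to uniform, so a
# single signal only partially identifies the context.
num_signals = 3
O = 0.8 * np.full((M, num_signals), 1.0 / num_signals) + 0.2 * np.eye(M, num_signals)

for k in range(K):
    m = rng.choice(M, p=mixing_weights)        # latent context, fixed for the whole episode
    signal = rng.choice(num_signals, p=O[m])   # prospective side information, seen at episode start
    s = 0
    for h in range(H):
        a = rng.integers(A)                    # placeholder policy; a learner would condition on `signal`
        reward = R[m, s, a]
        s = rng.choice(S, p=P[m, s, a])
```

A learning algorithm would replace the placeholder random policy with one that conditions on the prospective signal; the paper's $\Omega(K^{2/3})$ lower bound says that, even with such side information, a cost beyond the usual $\sqrt{K}$ regret rate is unavoidable for sample efficient algorithms.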
Related papers
- Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs [63.47351876442425]
We study episodic linear mixture MDPs with the unknown transition and adversarial rewards under full-information feedback.
We propose a novel algorithm that combines the benefits of two popular methods: occupancy-measure-based and policy-based.
Our algorithm enjoys an $\widetilde{\mathcal{O}}(d \sqrt{H^3 K} + \sqrt{HK(H + \bar{P}_K)})$ dynamic regret, where $d$ is the feature dimension.
arXiv Detail & Related papers (2024-11-05T13:55:52Z)
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation [73.2390735383842]
We introduce the first sample-efficient algorithm for LMDPs without any additional structural assumptions.
We show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm.
These results can be valuable for a wide range of interactive learning problems beyond LMDPs and, especially, for partially observed environments.
arXiv Detail & Related papers (2024-06-03T14:51:27Z)
- User-Level Differential Privacy With Few Examples Per User [73.81862394073308]
We consider the example-scarce regime, where each user has only a few examples, and obtain the following results.
For approximate-DP, we give a generic transformation of any item-level DP algorithm to a user-level DP algorithm.
We present a simple technique for adapting the exponential mechanism [McSherry, Talwar FOCS 2007] to the user-level setting.
arXiv Detail & Related papers (2023-09-21T21:51:55Z)
- Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space [0.0]
We study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes.
We propose an algorithm based on Thompson sampling with dynamically-sized episodes.
We show that our algorithm can be applied to develop approximately optimal control algorithms.
arXiv Detail & Related papers (2023-06-05T03:57:16Z)
- ChronosPerseus: Randomized Point-based Value Iteration with Importance Sampling for POSMDPs [2.3204178451683264]
In reinforcement learning, agents have successfully used environments modeled with Markov decision processes (MDPs).
In many problem domains, an agent may suffer from noisy observations or random times until its subsequent decision.
We propose that partially observable semi-Markov decision processes (POSMDPs) can be helpful in dealing with the unknown time aspect.
arXiv Detail & Related papers (2022-07-16T03:31:47Z)
- BATS: Best Action Trajectory Stitching [22.75880303352508]
We introduce an algorithm which forms a tabular Markov Decision Process (MDP) over the logged data by adding new transitions to the dataset.
We prove that this property allows one to make upper and lower bounds on the value function up to appropriate distance metrics.
We show an example in which simply behavior cloning the optimal policy of the MDP created by our algorithm avoids this problem.
arXiv Detail & Related papers (2022-04-26T01:48:32Z)
- RL for Latent MDPs: Regret Guarantees and a Lower Bound [74.41782017817808]
We consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDP).
In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent.
We show that the key link is a notion of separation between the MDP system dynamics.
arXiv Detail & Related papers (2021-02-09T16:49:58Z)