Linear Partial Monitoring for Sequential Decision-Making: Algorithms,
Regret Bounds and Applications
- URL: http://arxiv.org/abs/2302.03683v2
- Date: Mon, 13 Nov 2023 19:52:04 GMT
- Title: Linear Partial Monitoring for Sequential Decision-Making: Algorithms,
Regret Bounds and Applications
- Authors: Johannes Kirschner, Tor Lattimore, Andreas Krause
- Abstract summary: Partial monitoring is an expressive framework for sequential decision-making.
We present a simple and unified analysis of partial monitoring, and further extend the model to the contextual and kernelized setting.
- Score: 70.67112733968654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partial monitoring is an expressive framework for sequential decision-making
with an abundance of applications, including graph-structured and dueling
bandits, dynamic pricing and transductive feedback models. We survey and extend
recent results on the linear formulation of partial monitoring that naturally
generalizes the standard linear bandit setting. The main result is that a
single algorithm, information-directed sampling (IDS), is (nearly) worst-case
rate optimal in all finite-action games. We present a simple and unified
analysis of stochastic partial monitoring, and further extend the model to the
contextual and kernelized setting.
Related papers
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Causal Temporal Regime Structure Learning [49.77103348208835]
We introduce a new optimization-based method (linear) that concurrently learns the Directed Acyclic Graph (DAG) for each regime.
We conduct extensive experiments and show that our method consistently outperforms causal discovery models across various settings.
arXiv Detail & Related papers (2023-11-02T17:26:49Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - On the Optimization Landscape of Dynamic Output Feedback: A Case Study
for Linear Quadratic Regulator [12.255864026960403]
We show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller.
These results shed light on designing efficient algorithms for general decision-making problems with partially observed information.
arXiv Detail & Related papers (2022-09-12T06:43:35Z) - Globally Convergent Policy Search over Dynamic Filters for Output
Estimation [64.90951294952094]
We introduce the first direct policy search algorithm convex which converges to the globally optimal $textitdynamic$ filter.
We show that informativity overcomes the aforementioned degeneracy.
arXiv Detail & Related papers (2022-02-23T18:06:20Z) - Derivative-Free Policy Optimization for Risk-Sensitive and Robust
Control Design: Implicit Regularization and Sample Complexity [15.940861063732608]
Direct policy search serves as one of the workhorses in modern reinforcement learning (RL)
We investigate the convergence theory of policy robustness (PG) methods for the linear risk-sensitive and robust controller.
One feature of our algorithms is that during the learning phase, a certain level complexity/risk-sensitivity controller is preserved.
arXiv Detail & Related papers (2021-01-04T16:00:46Z) - Experiments in Extractive Summarization: Integer Linear Programming,
Term/Sentence Scoring, and Title-driven Models [1.3286165491120467]
We describe a new framework, NewsSumm, that includes many existing and new approaches for summarization including ILP and title-driven approaches.
We show that the new title-driven reduction idea leads to improvement in performance for both unsupervised and supervised approaches considered.
arXiv Detail & Related papers (2020-08-01T01:05:55Z) - A Short Note on Analyzing Sequence Complexity in Trajectory Prediction
Benchmarks [8.870188183999852]
An approach for determining a dataset representation in terms of a small set of distinguishable sub-sequences is proposed.
A first proof of concept on synthetically generated and real-world datasets shows the viability of the approach.
arXiv Detail & Related papers (2020-03-27T11:44:11Z) - Information Directed Sampling for Linear Partial Monitoring [112.05623123909895]
We introduce information directed sampling (IDS) for partial monitoring with a linear reward and observation structure.
IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game.
We extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.
arXiv Detail & Related papers (2020-02-25T21:30:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.