Regret Bounds for Information-Directed Reinforcement Learning
- URL: http://arxiv.org/abs/2206.04640v1
- Date: Thu, 9 Jun 2022 17:36:17 GMT
- Title: Regret Bounds for Information-Directed Reinforcement Learning
- Authors: Botao Hao and Tor Lattimore
- Abstract summary: Information-directed sampling (IDS) has revealed its potential as a data-efficient algorithm for reinforcement learning (RL).
We develop novel information-theoretic tools to bound the information ratio and cumulative information gain about the learning target.
- Score: 40.783225558237746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information-directed sampling (IDS) has revealed its potential as a
data-efficient algorithm for reinforcement learning (RL). However, theoretical
understanding of IDS for Markov Decision Processes (MDPs) is still limited. We
develop novel information-theoretic tools to bound the information ratio and
cumulative information gain about the learning target. Our theoretical results
shed light on the importance of choosing the learning target so that
practitioners can balance computation against the resulting regret bounds. As a consequence,
we derive prior-free Bayesian regret bounds for vanilla-IDS which learns the
whole environment under tabular finite-horizon MDPs. In addition, we propose a
computationally-efficient regularized-IDS that maximizes an additive form
rather than the ratio form and show that it enjoys the same regret bound as
vanilla-IDS. With the aid of rate-distortion theory, we improve the regret
bound by learning a surrogate, less informative environment. Furthermore, we
extend our analysis to linear MDPs and prove similar regret bounds for Thompson
sampling as a by-product.
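The abstract contrasts two action-selection rules: vanilla-IDS, which minimizes an information ratio (squared expected regret divided by information gain about the learning target), and regularized-IDS, which optimizes an additive trade-off of the same two quantities. The sketch below is a minimal illustration of that contrast over a finite action set, assuming per-action posterior quantities `expected_regret` and `info_gain` and a weight `lam` are already available; the exact additive form and the restriction to deterministic action choices (the paper optimizes over randomized policies) are simplifying assumptions, not the paper's implementation.

```python
import numpy as np

def vanilla_ids(expected_regret: np.ndarray, info_gain: np.ndarray) -> int:
    """Vanilla-IDS-style rule: pick the action minimizing the information ratio,
    (expected regret)^2 / (information gain about the learning target).
    Both arrays hold per-action quantities derived from the current posterior."""
    eps = 1e-12  # guard against division by zero for uninformative actions
    return int(np.argmin(expected_regret ** 2 / (info_gain + eps)))

def regularized_ids(expected_regret: np.ndarray, info_gain: np.ndarray, lam: float) -> int:
    """Regularized-IDS-style rule: optimize an additive trade-off instead of a ratio.
    Here we maximize info_gain - lam * (expected regret)^2; the precise weighting
    is an illustrative choice, not taken from the paper."""
    return int(np.argmax(info_gain - lam * expected_regret ** 2))

# Toy usage with made-up posterior statistics for three candidate actions.
regret = np.array([0.50, 0.20, 0.80])
gain = np.array([0.05, 0.01, 0.30])
print(vanilla_ids(regret, gain))           # ratio-minimizing action
print(regularized_ids(regret, gain, 2.0))  # additive-objective action
```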
Related papers
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining the pre-trained knowledge of VLMs while adapting to new tasks.
arXiv Detail & Related papers (2024-07-07T12:19:37Z) - Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM) as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z) - Uncertainty Estimation by Fisher Information-based Evidential Deep
Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z) - STEERING: Stein Information Directed Exploration for Model-Based
Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal transition model.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z) - Information Directed Sampling for Sparse Linear Bandits [42.232086950768476]
We develop a class of information-theoretic Bayesian regret bounds that nearly match existing lower bounds on a variety of problem instances.
Numerical results demonstrate significant regret reductions by sparse IDS relative to several baselines.
arXiv Detail & Related papers (2021-05-29T10:26:23Z) - Perturbation Theory for the Information Bottleneck [6.117084972237769]
The information bottleneck (IB) method formalizes extracting relevant information from data.
The nonlinearity of the IB problem makes it computationally expensive and analytically intractable in general.
We derive a perturbation theory for the IB method and report the first complete characterization of the learning onset.
arXiv Detail & Related papers (2021-05-28T16:59:01Z) - Bounding Information Leakage in Machine Learning [26.64770573405079]
This paper investigates fundamental bounds on information leakage.
We identify and bound the success rate of the worst-case membership inference attack.
We derive bounds on the mutual information between the sensitive attributes and model parameters.
arXiv Detail & Related papers (2021-05-09T08:49:14Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer when fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.