Efficient and Interpretable Bandit Algorithms
- URL: http://arxiv.org/abs/2310.14751v2
- Date: Thu, 8 Feb 2024 22:37:36 GMT
- Title: Efficient and Interpretable Bandit Algorithms
- Authors: Subhojyoti Mukherjee, Ruihao Zhu, Branislav Kveton
- Abstract summary: A bandit algorithm is interpretable if it explores with the objective of reducing uncertainty in the unknown model parameter.
We propose CODE, a bandit algorithm based on a Constrained Optimal DEsign, that is interpretable and maximally reduces the uncertainty.
- Score: 18.99853072645046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by the importance of explainability in modern machine learning, we
design bandit algorithms that are efficient and interpretable. A bandit
algorithm is interpretable if it explores with the objective of reducing
uncertainty in the unknown model parameter. To quantify the interpretability,
we introduce a novel metric of model error, which measures how quickly the
mean reward estimates converge to their actual means among all the plausible
actions. We propose CODE, a bandit algorithm based on a Constrained Optimal
DEsign, that is interpretable and maximally reduces the uncertainty. The key
idea in CODE is to explore among all plausible actions, determined by a
statistical constraint, to achieve interpretability. We implement CODE
efficiently in both multi-armed and linear bandits and derive near-optimal
regret bounds by leveraging the optimality criteria of the approximate optimal
design. CODE can also be viewed as removing phases in conventional phased
elimination, which makes it more practical and general. We demonstrate the
advantage of CODE by numerical experiments on both synthetic and real-world
problems. CODE outperforms other state-of-the-art interpretable designs while
matching the performance of popular but uninterpretable designs, such as upper
confidence bound algorithms.
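To make the exploration principle concrete, the following is a minimal sketch of the core idea in a Bernoulli multi-armed bandit: restrict exploration to the plausible arms, i.e., those whose upper confidence bound still exceeds the best lower confidence bound, and among them pull the arm whose mean estimate is most uncertain. The Hoeffding-style confidence widths and the arm-selection rule are illustrative assumptions standing in for the paper's constrained optimal design.
```python
import numpy as np

def plausible_set_exploration(means, n_rounds=10000, delta=0.05, seed=0):
    """Illustrative bandit loop that explores only among plausible arms.

    An arm is plausible if its upper confidence bound exceeds the best
    lower confidence bound; among plausible arms we pull the one with
    the widest confidence interval, i.e., the most uncertain estimate.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    pulls = np.zeros(K, dtype=int)
    rewards = np.zeros(K)
    regret = 0.0
    for t in range(n_rounds):
        if t < K:  # pull each arm once to initialize the estimates
            arm = t
        else:
            mu_hat = rewards / pulls
            width = np.sqrt(2 * np.log(1 / delta) / pulls)
            plausible = mu_hat + width >= np.max(mu_hat - width)
            # among plausible arms, reduce the largest uncertainty
            arm = int(np.argmax(np.where(plausible, width, -np.inf)))
        pulls[arm] += 1
        rewards[arm] += rng.binomial(1, means[arm])  # Bernoulli reward
        regret += max(means) - means[arm]
    return regret

print(plausible_set_exploration([0.3, 0.5, 0.45]))  # cumulative regret
```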
Related papers
- Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583]
We show how causal bounds can be applied to improving classical bandit algorithms.
This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
arXiv Detail & Related papers (2023-08-07T13:24:50Z)
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization [60.72410937614299]
We propose a new tractable bi-level optimization problem, and design and analyze a new set of algorithms termed Fast Bi-level AT (FAST-BAT).
FAST-BAT is capable of defending sign-based projected gradient descent (PGD) attacks without calling any gradient sign method or explicit robust regularization.
arXiv Detail & Related papers (2021-12-23T06:25:36Z)
- Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z)
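As a rough illustration of this misspecified kernelized setting, the sketch below computes a UCB index from kernel ridge regression (the GP posterior with noise variance lam) and inflates the confidence width by the slack eps to absorb $\epsilon$-uniform misspecification. The RBF kernel, the beta multiplier, and the additive treatment of eps are assumptions for the sketch, not the paper's algorithm.
```python
import numpy as np

def rbf(a, b, ls=0.5):
    """RBF kernel matrix between row-stacked points a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def kernel_ucb_arm(X, y, candidates, beta=2.0, eps=0.1, lam=1.0):
    """Pick the candidate maximizing a UCB inflated by the slack eps."""
    K = rbf(X, X) + lam * np.eye(len(X))
    kc = rbf(candidates, X)
    mu = kc @ np.linalg.solve(K, y)                            # posterior mean
    var = 1.0 - np.einsum('ij,ji->i', kc, np.linalg.solve(K, kc.T))
    ucb = mu + beta * np.sqrt(np.maximum(var, 0.0)) + eps      # extra eps slack
    return candidates[np.argmax(ucb)]
```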
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (thus having no limit cycles) and to find an appropriate solution within a bounded number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
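The CNML scheme that the entry above builds on is easiest to see in its brute-force, unamortized form: refit the model once per candidate label with the query point appended, score the maximum-likelihood probability of that label, and normalize. ACNML's contribution is amortizing exactly this expensive loop; the logistic-regression model below is a hypothetical stand-in.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_probs(X_train, y_train, x_query, n_classes):
    """Brute-force conditional NML distribution over labels of x_query."""
    scores = []
    for y in range(n_classes):
        # refit with the query point assigned the candidate label y
        X_aug = np.vstack([X_train, x_query[None, :]])
        y_aug = np.append(y_train, y)
        model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
        p = model.predict_proba(x_query[None, :])[0]
        scores.append(p[list(model.classes_).index(y)])
    return np.array(scores) / sum(scores)  # normalize across labels
```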
- Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits.
We provide state-of-the-art finite-time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regimes.
arXiv Detail & Related papers (2020-11-01T17:59:19Z)
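A standard building block behind such design-based exploration is the G-optimal design over the arm set, which minimizes the worst-case variance $\max_a a^\top A(\lambda)^{-1} a$ of the least-squares estimate. The Frank-Wolfe sketch below is the textbook construction; whether this entry (or CODE itself) uses this exact routine is an assumption.
```python
import numpy as np

def g_optimal_design(arms, n_iter=1000):
    """Frank-Wolfe iterations for an approximate G-optimal design.

    arms: (K, d) array of feature vectors; returns weights lam over arms
    minimizing max_a a^T A(lam)^{-1} a with A(lam) = sum_a lam_a a a^T.
    """
    K, d = arms.shape
    lam = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        A = (lam[:, None] * arms).T @ arms                    # A(lam)
        g = np.einsum('ij,jk,ik->i', arms, np.linalg.pinv(A), arms)
        j = int(np.argmax(g))                                 # most uncertain arm
        step = max(0.0, (g[j] - d) / (d * (g[j] - 1) + 1e-12))
        lam = (1 - step) * lam
        lam[j] += step
    return lam
```
During exploration, arms would then be sampled in proportion to the returned weights; by the Kiefer-Wolfowitz theorem, the optimal design attains $\max_a a^\top A(\lambda)^{-1} a = d$.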
- Adaptive Discretization for Model-Based Reinforcement Learning [10.21634042036049]
We introduce the technique of adaptive discretization to design an efficient model-based episodic reinforcement learning algorithm.
Our algorithm is based on optimistic one-step value iteration extended to maintain an adaptive discretization of the space.
arXiv Detail & Related papers (2020-07-01T19:36:46Z)
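The adaptive discretization idea is easiest to show in a simpler continuous-armed bandit, stripping away the episodic RL machinery of the entry above: maintain a partition of the arm space, play the cell with the highest optimistic index, and split a cell in half once its statistical error is as small as its width. The bonus term and splitting threshold below are illustrative choices.
```python
import numpy as np

def adaptive_discretization_bandit(f, horizon=5000, seed=0):
    """Zooming-style sketch on the arm space [0, 1]."""
    rng = np.random.default_rng(seed)
    cells = [[0.0, 1.0, 0, 0.0]]                # [left, right, pulls, reward_sum]
    for t in range(1, horizon + 1):
        def index(c):
            l, r, n, s = c
            if n == 0:
                return np.inf                   # unexplored cells first
            return s / n + np.sqrt(2 * np.log(t + 1) / n) + (r - l)
        c = max(cells, key=index)               # optimistic cell selection
        x = rng.uniform(c[0], c[1])             # play an arm inside the cell
        c[2] += 1
        c[3] += f(x) + rng.normal(0, 0.1)       # noisy reward
        l, r, n, _ = c
        if n >= (r - l) ** -2:                  # error ~ width: refine the cell
            cells = [cc for cc in cells if cc is not c]
            mid = (l + r) / 2
            cells += [[l, mid, 0, 0.0], [mid, r, 0, 0.0]]
    return cells                                # finer cells near good arms

cells = adaptive_discretization_bandit(lambda x: 1 - abs(x - 0.7))
```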
- Interpretable Random Forests via Rule Extraction [0.0]
We introduce SIRUS, a stable rule learning algorithm which takes the form of a short and simple list of rules.
Our R/C++ software implementation sirus is available from CRAN.
arXiv Detail & Related papers (2020-04-29T08:13:35Z)
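In the spirit of SIRUS, though heavily simplified (the actual algorithm quantizes split thresholds to empirical quantiles and handles deeper paths), recurring splits across a forest of stumps can be read off as a short, stable rule list:
```python
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

def frequent_rules(X, y, n_trees=500, top_k=5):
    """Keep the root splits that recur most often across a shallow forest."""
    forest = RandomForestClassifier(n_estimators=n_trees, max_depth=1,
                                    random_state=0).fit(X, y)
    counts = Counter()
    for est in forest.estimators_:
        tree = est.tree_
        if tree.node_count > 1:                 # the stump actually split
            # rounding crudely stands in for SIRUS's quantile quantization
            key = (int(tree.feature[0]), round(float(tree.threshold[0]), 2))
            counts[key] += 1
    return [(f"x[{feat}] <= {thr}", n)
            for (feat, thr), n in counts.most_common(top_k)]
```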
- Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in both the regret minimization and the best arm identification frameworks for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
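The distortions in this last entry are cumulative-prospect-theory style probability weightings. As a hedged sketch, the following estimates the distorted value of a reward distribution from samples with the Tversky-Kahneman weighting function; the choice of weighting and gamma=0.61 are standard CPT defaults, not necessarily the paper's setting.
```python
import numpy as np

def distorted_value(samples, gamma=0.61):
    """CPT-style distorted value: apply the weight
    w(p) = p^g / (p^g + (1 - p)^g)^(1/g)
    to the empirical tail probabilities of the sorted samples."""
    w = lambda p: p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    tail = (n - np.arange(n)) / n               # P(X >= x_(i))
    tail_next = (n - np.arange(1, n + 1)) / n   # P(X > x_(i))
    return float(np.sum((w(tail) - w(tail_next)) * x))

# with gamma=1 this reduces to the ordinary sample mean
```
A distorted bandit algorithm would then rank arms by this value rather than by the empirical mean.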