Fixed-Budget Change Point Identification in Piecewise Constant Bandits
- URL: http://arxiv.org/abs/2501.12957v1
- Date: Wed, 22 Jan 2025 15:30:44 GMT
- Title: Fixed-Budget Change Point Identification in Piecewise Constant Bandits
- Authors: Joseph Lazzaro, Ciara Pike-Burke,
- Abstract summary: We provide the first non-asymptotic analysis of policies designed to locate abrupt changes in the mean reward function under bandit feedback.
We study the problem under a large and small budget regime, and for both settings establish lower bounds on the error probability.
We propose a regime adaptive algorithm which is near optimal for both small and large budgets simultaneously.
- Score: 4.373803477995854
- License:
- Abstract: We study the piecewise constant bandit problem where the expected reward is a piecewise constant function with one change point (discontinuity) across the action space $[0,1]$ and the learner's aim is to locate the change point. Under the assumption of a fixed exploration budget, we provide the first non-asymptotic analysis of policies designed to locate abrupt changes in the mean reward function under bandit feedback. We study the problem under a large and small budget regime, and for both settings establish lower bounds on the error probability and provide algorithms with near matching upper bounds. Interestingly, our results show a separation in the complexity of the two regimes. We then propose a regime adaptive algorithm which is near optimal for both small and large budgets simultaneously. We complement our theoretical analysis with experimental results in simulated environments to support our findings.
Related papers
- The Minimal Search Space for Conditional Causal Bandits [0.18124328823188351]
Causal knowledge can be used to support decision-making problems.
This paper presents a graphical characterization of the minimal set of nodes guaranteed to contain the optimal conditional intervention.
We then propose an efficient algorithm with a time complexity of $O(|V| + |E|)$ to identify this minimal set of nodes.
arXiv Detail & Related papers (2025-02-10T15:45:18Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$textOFUL+$ and Restarted $textSAVE+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Decentralized Multi-Task Online Convex Optimization Under Random Link
Failures [5.513958040574729]
We develop a robust decentralized saddle-point algorithm against random link failures with heterogeneous probabilities.
We extend our algorithm and analysis to the two-point bandit feedback scenario.
arXiv Detail & Related papers (2024-01-04T00:57:33Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Bandit Pareto Set Identification: the Fixed Budget Setting [10.967572582187014]
We study a pure exploration problem in a multi-armed bandit model.
The goal is to identify the distributions whose mean is not uniformly worse than that of another distribution.
arXiv Detail & Related papers (2023-11-07T13:43:18Z) - Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related
Rewards [5.347237827669861]
We study the piecewise stationary semi-bandit problem with causally related rewards.
In our nonstationary environment, variations in the base arms' distributions, causal relationships between rewards, or both, change the reward generation process.
The problem becomes aggravated in the semi-bandit setting, where the decision-maker only observes the outcome of the selected bundle of arms.
arXiv Detail & Related papers (2023-07-26T12:06:13Z) - Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time
Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z) - Non-Stationary Bandits under Recharging Payoffs: Improved Planning with
Sublinear Regret [34.44347218903429]
The multi-armed bandit setting has been recently studied in the non-stationary regime.
Mean payoff of each action is a non-decreasing function of the number of rounds passed since it was last played.
We show how our algorithm can be transformed into a bandit algorithm with sublinear regret.
arXiv Detail & Related papers (2022-05-29T23:55:36Z) - Instance-optimality in optimal value estimation: Adaptivity via
variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy on any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
arXiv Detail & Related papers (2021-06-28T00:38:54Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.