A Framework for Adapting Offline Algorithms to Solve Combinatorial
Multi-Armed Bandit Problems with Bandit Feedback
- URL: http://arxiv.org/abs/2301.13326v2
- Date: Wed, 11 Oct 2023 23:58:25 GMT
- Title: A Framework for Adapting Offline Algorithms to Solve Combinatorial
Multi-Armed Bandit Problems with Bandit Feedback
- Authors: Guanyu Nie and Yididiya Y Nadew and Yanhui Zhu and Vaneet Aggarwal and
Christopher John Quinn
- Abstract summary: We provide a framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods.
The proposed framework is applied to diverse applications in submodular maximization.
- Score: 27.192028744078282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the problem of stochastic, combinatorial multi-armed bandits
where the learner only has access to bandit feedback and the reward function
can be non-linear. We provide a general framework for adapting discrete offline
approximation algorithms into sublinear $\alpha$-regret methods that only
require bandit feedback, achieving
$\mathcal{O}\left(T^\frac{2}{3}\log(T)^\frac{1}{3}\right)$ expected cumulative
$\alpha$-regret dependence on the horizon $T$. The framework only requires the
offline algorithms to be robust to small errors in function evaluation. The
adaptation procedure does not even require explicit knowledge of the offline
approximation algorithm -- the offline algorithm can be used as a black box
subroutine. To demonstrate its utility, the proposed framework is applied to
diverse applications in submodular maximization. The
new CMAB algorithms for submodular maximization with knapsack constraints
outperform a full-bandit method developed for the adversarial setting in
experiments with real-world data.
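The abstract describes the adaptation only at a high level; the Python sketch below illustrates one plausible reading of it. Everything here is an assumed interface, not the authors' pseudocode: `offline_alg` is the black-box offline approximation algorithm (interacting only with a value oracle) and `play_arm_set` plays a set once and returns a noisy scalar reward.
```python
import math

def adapt_offline_to_bandit(offline_alg, play_arm_set, T):
    # Repetitions per simulated oracle query. Averaging m noisy plays gives an
    # O(1/sqrt(m)) estimation error (Hoeffding), which the offline algorithm's
    # robustness tolerates; balancing estimation error against exploration cost
    # is what yields the T^(2/3) log(T)^(1/3) rate in the abstract. The exact
    # tuning in the paper also depends on the number of oracle queries.
    m = max(1, int(round(T ** (2.0 / 3.0) * max(math.log(T), 1.0) ** (1.0 / 3.0))))
    t = 0  # rounds consumed so far

    def simulated_oracle(S):
        # Answer a value query f(S) using bandit feedback only: play the
        # queried set m times and return the empirical mean reward.
        nonlocal t
        total = 0.0
        for _ in range(m):
            total += play_arm_set(S)
            t += 1
        return total / m

    # The offline algorithm runs unchanged, as a black box: it never sees the
    # bandit problem, only the (noisy) value oracle.
    S_final = offline_alg(simulated_oracle)

    # Commit phase: exploit the returned set for the remainder of the horizon.
    while t < T:
        play_arm_set(S_final)
        t += 1
    return S_final
```
The key point the abstract makes is visible here: the offline routine is never modified; only its function evaluations are intercepted and answered by repeated sampling.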
Related papers
- Improved Parallel Algorithm for Non-Monotone Submodular Maximization under Knapsack Constraint [0.0]
This work proposes an efficient parallel algorithm for non-monotone submodular maximization under a knapsack constraint.
Our algorithm improves the approximation ratio of the existing parallel algorithm from $8+\epsilon$ to $7+\epsilon$ while keeping $O(\log n)$ adaptive complexity.
arXiv Detail & Related papers (2024-09-06T17:17:52Z)
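For context on the knapsack-constrained setting in this entry, here is the classic sequential cost-density greedy subroutine that low-adaptivity parallel methods of this kind emulate. This is illustrative only, under the assumption of a monotone submodular `f`; it is not the paper's parallel algorithm.
```python
def density_greedy(f, costs, budget, ground_set):
    # Repeatedly add the feasible element with the best marginal gain per
    # unit cost ("density"). Parallel algorithms replace this sequential scan
    # with threshold-based rounds to reach O(log n) adaptive complexity.
    S, spent = set(), 0.0
    remaining = set(ground_set)
    while remaining:
        best, best_density = None, 0.0
        for e in list(remaining):
            if spent + costs[e] > budget:
                remaining.discard(e)      # can never become feasible again
                continue
            gain = f(S | {e}) - f(S)
            if gain / costs[e] > best_density:
                best, best_density = e, gain / costs[e]
        if best is None:
            break
        S.add(best)
        spent += costs[best]
        remaining.discard(best)
    return S
```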
- Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
arXiv Detail & Related papers (2024-05-24T04:11:58Z)
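The IMED index itself is simple to state. The sketch below shows the unstructured (K-armed, Bernoulli-reward) rule that the LinIMED family generalizes; the linear-bandit machinery of the paper is omitted, and every arm is assumed to have been played at least once.
```python
import math

def kl_bernoulli(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability.
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def imed_choose(means, counts):
    # IMED: play the arm minimizing N_a * KL(mu_hat_a, mu_hat_max) + log(N_a).
    # Requires counts[a] >= 1 for every arm (initialize by playing each once).
    mu_max = max(means)
    indices = [n * kl_bernoulli(m, mu_max) + math.log(n)
               for m, n in zip(means, counts)]
    return min(range(len(means)), key=indices.__getitem__)
```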
- Unified Projection-Free Algorithms for Adversarial DR-Submodular Optimization [28.598226670015315]
This paper introduces unified projection-free Frank-Wolfe type algorithms for adversarial DR-submodular optimization.
For every problem considered in the non-monotone setting, the proposed algorithms are either the first with proven sub-linear $\alpha$-regret bounds or have better $\alpha$-regret bounds than the state of the art.
arXiv Detail & Related papers (2024-03-15T07:05:44Z)
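As a point of reference for this entry, here is a minimal offline Frank-Wolfe (continuous-greedy) sketch for monotone DR-submodular maximization. The feasible region, the gradient oracle `grad_F`, and the integer budget are all assumptions chosen so the linear maximization step is trivial; the paper's adversarial/online extensions and non-monotone variants are not reproduced.
```python
import numpy as np

def frank_wolfe_dr_submodular(grad_F, n, budget, K=100):
    # Projection-free continuous greedy over {x in [0,1]^n : sum(x) <= budget}.
    # Each step solves a linear maximization (here: take the top-`budget`
    # coordinates of the gradient) and moves a 1/K fraction toward it.
    x = np.zeros(n)
    for _ in range(K):
        g = np.maximum(grad_F(x), 0.0)       # monotone case: ascent directions
        v = np.zeros(n)
        v[np.argsort(-g)[:int(budget)]] = 1.0  # LMO: argmax_{v} <v, g>
        x += v / K                             # step size 1/K, K steps total
    return np.clip(x, 0.0, 1.0)
```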
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
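The offline core that an explore-then-commit scheme like SGB wraps is the well-known stochastic-greedy subset-selection routine, sketched below under the assumption of a monotone submodular `f`; the bandit estimation layer is omitted.
```python
import math
import random

def stochastic_greedy(f, ground_set, k, eps=0.1):
    # Instead of scanning all n elements at each of the k greedy steps,
    # scan a random sample of size (n/k) * ln(1/eps); this retains a
    # (1 - 1/e - eps) guarantee in expectation for monotone submodular f.
    n = len(ground_set)
    sample_size = max(1, int(math.ceil((n / k) * math.log(1.0 / eps))))
    S, remaining = set(), list(ground_set)
    for _ in range(k):
        candidates = random.sample(remaining, min(sample_size, len(remaining)))
        best = max(candidates, key=lambda e: f(S | {e}) - f(S))
        S.add(best)
        remaining.remove(best)
    return S
```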
- Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively.
arXiv Detail & Related papers (2023-07-17T09:55:35Z)
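A stripped-down two-layer version of the ensemble idea can be sketched as follows: OGD base learners with geometrically spaced step sizes cover the unknown curvature regime, and a multiplicative-weights meta learner tracks the best of them. The paper's additional layers, optimism, and gradient-variation corrections are omitted; one-dimensional decisions and the `loss_grad` interface are assumptions.
```python
import math

def online_ensemble(loss_grad, T, domain=(-1.0, 1.0)):
    # loss_grad(x) -> (loss, gradient) of the current round's convex loss.
    lo, hi = domain
    etas = [2.0 ** -i for i in range(8)]           # candidate step sizes
    xs = [0.0] * len(etas)                         # base learners' decisions
    w = [1.0 / len(etas)] * len(etas)              # meta weights
    meta_lr = math.sqrt(math.log(len(etas)) / max(T, 2))
    x = 0.0
    for _ in range(T):
        x = sum(wi * xi for wi, xi in zip(w, xs))  # play the weighted mix
        _, g = loss_grad(x)
        # Meta layer: exponential weights on linearized losses g * x_i.
        w = [wi * math.exp(-meta_lr * g * xi) for wi, xi in zip(w, xs)]
        z = sum(w)
        w = [wi / z for wi in w]
        # Base layer: projected OGD step with each candidate step size.
        xs = [min(hi, max(lo, xi - eta * g)) for xi, eta in zip(xs, etas)]
    return x
```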
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K} \sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
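The workhorse behind variance-adaptive linear bandit results of this kind is weighted ridge regression, in which low-variance rounds receive proportionally more weight; that is how regret can scale with $\sum_k \sigma_k^2$ rather than $K$. A schematic update is sketched below; the paper's variance estimation and weight clipping are omitted.
```python
import numpy as np

def weighted_ridge_update(Lambda, b, x, r, sigma2):
    # Lambda: d x d design matrix (initialize as lambda_reg * I);
    # b: d-dimensional weighted response vector; x: feature vector of the
    # played action; r: observed reward; sigma2: (estimated) noise variance.
    w = 1.0 / max(sigma2, 1e-8)            # per-round weight
    Lambda = Lambda + w * np.outer(x, x)
    b = b + w * r * x
    theta_hat = np.linalg.solve(Lambda, b) # weighted least-squares estimate
    return Lambda, b, theta_hat
```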
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate stationary point in a finite number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
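The meta-UCB idea translates almost directly into code. The sketch below shows a plain UCB1 index over the $N$ base algorithms, leaving out the smoothing and regret-transfer details of the paper: the chosen base algorithm then selects the actual arm and is credited with the observed reward.
```python
import math

def meta_ucb_choose(rewards_sum, counts, t):
    # rewards_sum[i], counts[i]: cumulative reward and plays of base
    # algorithm i; t: current round (1-indexed).
    for i, n in enumerate(counts):
        if n == 0:
            return i                       # play each base algorithm once
    ucb = [s / n + math.sqrt(2.0 * math.log(t) / n)
           for s, n in zip(rewards_sum, counts)]
    return max(range(len(ucb)), key=ucb.__getitem__)
```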
- Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
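Distorted expectations of this kind are typically estimated with an L-statistic that applies a probability weighting function to the empirical tail probabilities of the observed rewards. A hedged sketch follows; nonnegative rewards and the Tversky-Kahneman weighting are assumptions, and the separate gains/losses treatment of full prospect theory is omitted.
```python
def tversky_kahneman_w(p, gamma=0.61):
    # Classic S-shaped probability weighting: overweights small probabilities.
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1.0 / gamma)

def distorted_value(samples, w=tversky_kahneman_w):
    # Weight the i-th smallest sample by the distorted tail-probability mass
    # w((n-i+1)/n) - w((n-i)/n); the weights telescope to w(1) - w(0) = 1,
    # so this is a proper weighted average of the samples.
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        total += x * (w((n - i + 1) / n) - w((n - i) / n))
    return total
```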