A Framework for Adapting Offline Algorithms to Solve Combinatorial
Multi-Armed Bandit Problems with Bandit Feedback
- URL: http://arxiv.org/abs/2301.13326v2
- Date: Wed, 11 Oct 2023 23:58:25 GMT
- Title: A Framework for Adapting Offline Algorithms to Solve Combinatorial
Multi-Armed Bandit Problems with Bandit Feedback
- Authors: Guanyu Nie and Yididiya Y Nadew and Yanhui Zhu and Vaneet Aggarwal and
Christopher John Quinn
- Abstract summary: We provide a framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods.
The proposed framework is applied to diverse applications in submodular maximization.
- Score: 27.192028744078282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the problem of stochastic, combinatorial multi-armed bandits
where the learner only has access to bandit feedback and the reward function
can be non-linear. We provide a general framework for adapting discrete offline
approximation algorithms into sublinear $\alpha$-regret methods that only
require bandit feedback, achieving
$\mathcal{O}\left(T^\frac{2}{3}\log(T)^\frac{1}{3}\right)$ expected cumulative
$\alpha$-regret dependence on the horizon $T$. The framework only requires the
offline algorithms to be robust to small errors in function evaluation. The
adaptation procedure does not even require explicit knowledge of the offline
approximation algorithm -- the offline algorithm can be used as a black box
subroutine. To demonstrate its utility, the proposed framework is applied to
diverse applications in submodular maximization. The
new CMAB algorithms for submodular maximization with knapsack constraints
outperform a full-bandit method developed for the adversarial setting in
experiments with real-world data.
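The abstract describes the adaptation only at a high level; the Python sketch below illustrates one plausible reading of it. Everything here is an assumed interface, not the authors' pseudocode: `offline_alg` is the black-box offline approximation algorithm (interacting only with a value oracle) and `play_arm_set` plays a set once and returns a noisy scalar reward.
```python
import math

def adapt_offline_to_bandit(offline_alg, play_arm_set, T):
    # Repetitions per simulated oracle query. Averaging m noisy plays gives an
    # O(1/sqrt(m)) estimation error (Hoeffding), which the offline algorithm's
    # robustness tolerates; balancing estimation error against exploration cost
    # is what yields the T^(2/3) log(T)^(1/3) rate in the abstract. The exact
    # tuning in the paper also depends on the number of oracle queries.
    m = max(1, int(round(T ** (2.0 / 3.0) * max(math.log(T), 1.0) ** (1.0 / 3.0))))
    t = 0  # rounds consumed so far

    def simulated_oracle(S):
        # Answer a value query f(S) using bandit feedback only: play the
        # queried set m times and return the empirical mean reward.
        nonlocal t
        total = 0.0
        for _ in range(m):
            total += play_arm_set(S)
            t += 1
        return total / m

    # The offline algorithm runs unchanged, as a black box: it never sees the
    # bandit problem, only the (noisy) value oracle.
    S_final = offline_alg(simulated_oracle)

    # Commit phase: exploit the returned set for the remainder of the horizon.
    while t < T:
        play_arm_set(S_final)
        t += 1
    return S_final
```
The key point the abstract makes is visible here: the offline routine is never modified; only its function evaluations are intercepted and answered by repeated sampling.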
Related papers
- Improved Parallel Algorithm for Non-Monotone Submodular Maximization under Knapsack Constraint [0.0]
This work proposes an efficient parallel algorithm for non-monotone submodular maximization under a knapsack constraint.
Our algorithm improves the approximation ratio of the existing parallel algorithm from $8+\epsilon$ to $7+\epsilon$ while keeping $O(\log n)$ adaptive complexity.
arXiv Detail & Related papers (2024-09-06T17:17:52Z)
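For context on the knapsack-constrained setting in this entry, here is the classic sequential cost-density greedy subroutine that low-adaptivity parallel methods of this kind emulate. This is illustrative only, under the assumption of a monotone submodular `f`; it is not the paper's parallel algorithm.
```python
def density_greedy(f, costs, budget, ground_set):
    # Repeatedly add the feasible element with the best marginal gain per
    # unit cost ("density"). Parallel algorithms replace this sequential scan
    # with threshold-based rounds to reach O(log n) adaptive complexity.
    S, spent = set(), 0.0
    remaining = set(ground_set)
    while remaining:
        best, best_density = None, 0.0
        for e in list(remaining):
            if spent + costs[e] > budget:
                remaining.discard(e)      # can never become feasible again
                continue
            gain = f(S | {e}) - f(S)
            if gain / costs[e] > best_density:
                best, best_density = e, gain / costs[e]
        if best is None:
            break
        S.add(best)
        spent += costs[best]
        remaining.discard(best)
    return S
```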
- Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
arXiv Detail & Related papers (2024-05-24T04:11:58Z)
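The IMED index itself is simple to state. The sketch below shows the unstructured (K-armed, Bernoulli-reward) rule that the LinIMED family generalizes; the linear-bandit machinery of the paper is omitted, and every arm is assumed to have been played at least once.
```python
import math

def kl_bernoulli(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability.
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def imed_choose(means, counts):
    # IMED: play the arm minimizing N_a * KL(mu_hat_a, mu_hat_max) + log(N_a).
    # Requires counts[a] >= 1 for every arm (initialize by playing each once).
    mu_max = max(means)
    indices = [n * kl_bernoulli(m, mu_max) + math.log(n)
               for m, n in zip(means, counts)]
    return min(range(len(means)), key=indices.__getitem__)
```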
- Unified Projection-Free Algorithms for Adversarial DR-Submodular Optimization [28.598226670015315]
This paper introduces unified projection-free Frank-Wolfe type algorithms for adversarial DR-submodular optimization.
For every problem considered in the non-monotone setting, the proposed algorithms are either the first with proven sub-linear $\alpha$-regret bounds or have better $\alpha$-regret bounds than the state of the art.
arXiv Detail & Related papers (2024-03-15T07:05:44Z)
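As a point of reference for this entry, here is a minimal offline Frank-Wolfe (continuous-greedy) sketch for monotone DR-submodular maximization. The feasible region, the gradient oracle `grad_F`, and the integer budget are all assumptions chosen so the linear maximization step is trivial; the paper's adversarial/online extensions and non-monotone variants are not reproduced.
```python
import numpy as np

def frank_wolfe_dr_submodular(grad_F, n, budget, K=100):
    # Projection-free continuous greedy over {x in [0,1]^n : sum(x) <= budget}.
    # Each step solves a linear maximization (here: take the top-`budget`
    # coordinates of the gradient) and moves a 1/K fraction toward it.
    x = np.zeros(n)
    for _ in range(K):
        g = np.maximum(grad_F(x), 0.0)       # monotone case: ascent directions
        v = np.zeros(n)
        v[np.argsort(-g)[:int(budget)]] = 1.0  # LMO: argmax_{v} <v, g>
        x += v / K                             # step size 1/K, K steps total
    return np.clip(x, 0.0, 1.0)
```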
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
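The offline core that an explore-then-commit scheme like SGB wraps is the well-known stochastic-greedy subset-selection routine, sketched below under the assumption of a monotone submodular `f`; the bandit estimation layer is omitted.
```python
import math
import random

def stochastic_greedy(f, ground_set, k, eps=0.1):
    # Instead of scanning all n elements at each of the k greedy steps,
    # scan a random sample of size (n/k) * ln(1/eps); this retains a
    # (1 - 1/e - eps) guarantee in expectation for monotone submodular f.
    n = len(ground_set)
    sample_size = max(1, int(math.ceil((n / k) * math.log(1.0 / eps))))
    S, remaining = set(), list(ground_set)
    for _ in range(k):
        candidates = random.sample(remaining, min(sample_size, len(remaining)))
        best = max(candidates, key=lambda e: f(S | {e}) - f(S))
        S.add(best)
        remaining.remove(best)
    return S
```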
- Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively.
arXiv Detail & Related papers (2023-07-17T09:55:35Z)
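A stripped-down two-layer version of the ensemble idea can be sketched as follows: OGD base learners with geometrically spaced step sizes cover the unknown curvature regime, and a multiplicative-weights meta learner tracks the best of them. The paper's additional layers, optimism, and gradient-variation corrections are omitted; one-dimensional decisions and the `loss_grad` interface are assumptions.
```python
import math

def online_ensemble(loss_grad, T, domain=(-1.0, 1.0)):
    # loss_grad(x) -> (loss, gradient) of the current round's convex loss.
    lo, hi = domain
    etas = [2.0 ** -i for i in range(8)]           # candidate step sizes
    xs = [0.0] * len(etas)                         # base learners' decisions
    w = [1.0 / len(etas)] * len(etas)              # meta weights
    meta_lr = math.sqrt(math.log(len(etas)) / max(T, 2))
    x = 0.0
    for _ in range(T):
        x = sum(wi * xi for wi, xi in zip(w, xs))  # play the weighted mix
        _, g = loss_grad(x)
        # Meta layer: exponential weights on linearized losses g * x_i.
        w = [wi * math.exp(-meta_lr * g * xi) for wi, xi in zip(w, xs)]
        z = sum(w)
        w = [wi / z for wi in w]
        # Base layer: projected OGD step with each candidate step size.
        xs = [min(hi, max(lo, xi - eta * g)) for xi, eta in zip(xs, etas)]
    return x
```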
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K} \sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
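The workhorse behind variance-adaptive linear bandit results of this kind is weighted ridge regression, in which low-variance rounds receive proportionally more weight; that is how regret can scale with $\sum_k \sigma_k^2$ rather than $K$. A schematic update is sketched below; the paper's variance estimation and weight clipping are omitted.
```python
import numpy as np

def weighted_ridge_update(Lambda, b, x, r, sigma2):
    # Lambda: d x d design matrix (initialize as lambda_reg * I);
    # b: d-dimensional weighted response vector; x: feature vector of the
    # played action; r: observed reward; sigma2: (estimated) noise variance.
    w = 1.0 / max(sigma2, 1e-8)            # per-round weight
    Lambda = Lambda + w * np.outer(x, x)
    b = b + w * r * x
    theta_hat = np.linalg.solve(Lambda, b) # weighted least-squares estimate
    return Lambda, b, theta_hat
```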
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate stationary point in a finite number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
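The meta-UCB idea translates almost directly into code. The sketch below shows a plain UCB1 index over the $N$ base algorithms, leaving out the smoothing and regret-transfer details of the paper: the chosen base algorithm then selects the actual arm and is credited with the observed reward.
```python
import math

def meta_ucb_choose(rewards_sum, counts, t):
    # rewards_sum[i], counts[i]: cumulative reward and plays of base
    # algorithm i; t: current round (1-indexed).
    for i, n in enumerate(counts):
        if n == 0:
            return i                       # play each base algorithm once
    ucb = [s / n + math.sqrt(2.0 * math.log(t) / n)
           for s, n in zip(rewards_sum, counts)]
    return max(range(len(ucb)), key=ucb.__getitem__)
```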
- Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
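Distorted expectations of this kind are typically estimated with an L-statistic that applies a probability weighting function to the empirical tail probabilities of the observed rewards. A hedged sketch follows; nonnegative rewards and the Tversky-Kahneman weighting are assumptions, and the separate gains/losses treatment of full prospect theory is omitted.
```python
def tversky_kahneman_w(p, gamma=0.61):
    # Classic S-shaped probability weighting: overweights small probabilities.
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1.0 / gamma)

def distorted_value(samples, w=tversky_kahneman_w):
    # Weight the i-th smallest sample by the distorted tail-probability mass
    # w((n-i+1)/n) - w((n-i)/n); the weights telescope to w(1) - w(0) = 1,
    # so this is a proper weighted average of the samples.
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        total += x * (w((n - i + 1) / n) - w((n - i) / n))
    return total
```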