Combinatorial Pure Exploration with Bottleneck Reward Function and its
Extension to General Reward Functions
- URL: http://arxiv.org/abs/2102.12094v1
- Date: Wed, 24 Feb 2021 06:47:51 GMT
- Title: Combinatorial Pure Exploration with Bottleneck Reward Function and its
Extension to General Reward Functions
- Authors: Yihan Du, Yuko Kuroki, Wei Chen
- Abstract summary: We study the Combinatorial Pure Exploration problem with the bottleneck reward function (CPE-B) under the fixed-confidence and fixed-budget settings.
We present both fixed-confidence and fixed-budget algorithms, and provide the sample complexity lower bound for the fixed-confidence setting.
In addition, we extend CPE-B to general reward functions (CPE-G) and propose the first fixed-confidence algorithm for general non-linear reward functions with non-trivial sample complexity.
- Score: 13.982295536546728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the Combinatorial Pure Exploration problem with the
bottleneck reward function (CPE-B) under the fixed-confidence and fixed-budget
settings. In CPE-B, given a set of base arms and a collection of subsets of
base arms (super arms) satisfying a certain combinatorial constraint, a learner
sequentially plays (samples) a base arm and observes its random outcome, with
the objective of finding the optimal super arm that maximizes its bottleneck
value, defined as the minimum expected value among the base arms contained in
the super arm. CPE-B captures a variety of practical scenarios such as network
routing in communication networks, but it cannot be solved by the existing CPE
algorithms since most of them assume linear reward functions. For CPE-B, we
present both fixed-confidence and fixed-budget algorithms, and provide the
sample complexity lower bound for the fixed-confidence setting, which implies
that our algorithms match the lower bound (within a logarithmic factor) for a
broad family of instances. In addition, we extend CPE-B to general reward
functions (CPE-G) and propose the first fixed-confidence algorithm for general
non-linear reward functions with non-trivial sample complexity. Our
experimental results on the top-$k$, path and matching instances demonstrate
the empirical superiority of our proposed algorithms over the baselines.
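To make the bottleneck objective concrete, the following is a minimal, illustrative Python sketch (not the paper's algorithm): it computes the bottleneck value of each super arm from empirical means and runs a naive uniform-sampling fixed-budget baseline. The toy environment, the `super_arms` collection, and all function names are hypothetical stand-ins.

```python
import numpy as np

def bottleneck_value(super_arm, means):
    """Bottleneck value of a super arm: the minimum (expected) value
    among the base arms it contains."""
    return min(means[i] for i in super_arm)

def best_super_arm(super_arms, means):
    """Return the super arm maximizing the bottleneck value."""
    return max(super_arms, key=lambda s: bottleneck_value(s, means))

def uniform_fixed_budget(true_means, super_arms, budget, rng=None):
    """Naive fixed-budget baseline (for illustration only): spread the
    sampling budget uniformly over base arms, then recommend the super
    arm with the largest empirical bottleneck value."""
    rng = rng or np.random.default_rng(0)
    n = len(true_means)
    counts, sums = np.zeros(n), np.zeros(n)
    for t in range(budget):
        i = t % n                                   # round-robin sampling of base arms
        sums[i] += rng.normal(true_means[i], 1.0)   # observe a noisy outcome
        counts[i] += 1
    emp_means = sums / np.maximum(counts, 1)
    return best_super_arm(super_arms, emp_means)

# Toy instance: 4 base arms, super arms are two candidate "paths".
true_means = np.array([0.9, 0.2, 0.8, 0.7])
super_arms = [(0, 1), (2, 3)]                       # bottlenecks: 0.2 vs 0.7
print(uniform_fixed_budget(true_means, super_arms, budget=400))  # expect (2, 3)
```

The paper's algorithms adaptively allocate samples based on confidence intervals rather than sampling uniformly; this sketch only illustrates the reward structure being optimized.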
Related papers
- Query-Efficient Correlation Clustering with Noisy Oracle [17.11782578276788]
We introduce two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB).
We design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees.
Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.
arXiv Detail & Related papers (2024-02-02T13:31:24Z)
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We first introduce the Combinatorial Successive Assign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large.
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit (R-CPE-MAB).
Existing methods for the R-CPE-MAB can be seen as a special case of so-called transductive linear bandits.
We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Contextual Combinatorial Volatile Bandits via Gaussian Processes [10.312968200748116]
We consider a contextual bandit problem with a set of available base arms and their contexts.
We propose an algorithm called Optimistic Combinatorial Learning and Optimization with Kernel Upper Confidence Bounds (O'CLOK-UCB).
We experimentally show that both algorithms vastly outperform the previous state-of-the-art UCB-based algorithms in realistic setups.
arXiv Detail & Related papers (2021-10-05T18:02:10Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value iteration-style algorithms, one can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings.
We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms.
We also propose the first algorithm for linear bandits in the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z)
- Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback [18.29738891417779]
We first study the problem of combinatorial pure exploration with full-bandit linear feedback (CPE-BL).
In CPE-BL, each pull of action $x$ reports a random feedback vector with expectation $M_x \theta$, where $M_x \in \mathbb{R}^{d_x \times d}$ is a transformation matrix for $x$, and gains a random (possibly nonlinear) reward related to $x$.
For the partial linear feedback setting (CPE-PL), we develop the first polynomial-time algorithm, which simultaneously addresses limited feedback, general reward function and action space.
arXiv Detail & Related papers (2020-06-14T13:59:59Z)
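To illustrate the full-bandit linear feedback model described in the last entry above, here is a minimal, hypothetical Python sketch (not that paper's algorithm): each pull of an action returns a noisy vector with mean $M_x \theta$, and the unknown parameter $\theta$ is recovered by ordinary least squares from the pooled observations. All names, dimensions, and noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                                   # dimension of the unknown parameter theta
theta = rng.normal(size=d)              # ground-truth parameter (unknown to the learner)

# Hypothetical transformation matrices M_x, one per action x.
actions = {x: rng.normal(size=(2, d)) for x in range(4)}

def pull(x, noise=0.1):
    """Full-bandit linear feedback: a noisy vector with expectation M_x @ theta."""
    M = actions[x]
    return M @ theta + noise * rng.normal(size=M.shape[0])

# Pull every action a few times and estimate theta by least squares.
rows, obs = [], []
for x, M in actions.items():
    for _ in range(50):
        rows.append(M)
        obs.append(pull(x))
theta_hat, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(obs), rcond=None)
print(np.round(theta_hat - theta, 3))   # estimation error shrinks with more pulls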
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.