Related papers: Multi-objective Good Arm Identification with Bandit Feedback

Multi-objective Good Arm Identification with Bandit Feedback

URL: http://arxiv.org/abs/2503.10386v2
Date: Fri, 14 Mar 2025 14:37:28 GMT
Title: Multi-objective Good Arm Identification with Bandit Feedback
Authors: Xuanke Jiang, Kohei Hatano, Eiji Takimoto,
Abstract summary: We consider a good arm identification problem in a bandit setting with multi-objectives.<n>For each round $t$, the player/algorithm pulls one arm $i_t$ and receives a $M$ dimensional vector feedback sampled according to $mathcalD_i_t$.<n>The proposed algorithm attains better numerical performance than other baselines in the experiments on synthetic and real datasets.
Score: 0.589889361990138
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider a good arm identification problem in a stochastic bandit setting with multi-objectives, where each arm $i\in[K]$ is associated with a distribution $\mathcal{D}_i$ defined over $\mathbb{R}^M$. For each round $t$, the player/algorithm pulls one arm $i_t$ and receives a $M$ dimensional vector feedback sampled according to $\mathcal{D}_{i_t}$. The target is twofold, one is finding one arm whose means are larger than the predefined thresholds $\xi_1,\ldots,\xi_M$ with a confidence bound $\delta$ and an accuracy rate $\epsilon$ with a bounded sample complexity, the other is output $\bot$ to indicate no such arm exists. We propose an algorithm with a sample complexity bound. Our bound is the same as the one given in the previous work when $M=1$ and $\epsilon = 0$, and we give novel bounds for $M > 1$ and $\epsilon > 0$. The proposed algorithm attains better numerical performance than other baselines in the experiments on synthetic and real datasets.

Related papers

Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $mathRd suffer from a curse of dimensionality. We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z)
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $Omega(frac1kcdot n2log|mathbbF|)$. As an application, we obtain an $Omega(frac1kcdot n2log|mathbbF|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits [55.957560311008926]
We propose a piecewise stationary linear bandit (PSLB) model where the quality of an arm is measured by its return averaged over all contexts. PS$varepsilon$BAI$+$ is guaranteed to identify an $varepsilon$-optimal arm with probability $ge 1-delta$ and with a minimal number of samples.
arXiv Detail & Related papers (2024-10-10T06:15:42Z)
Testing Closeness of Multivariate Distributions via Ramsey Theory [40.926523210945064]
We investigate the statistical task of closeness (or equivalence) testing for multidimensional distributions. Specifically, given sample access to two unknown distributions $mathbf p, mathbf q$ on $mathbb Rd$, we want to distinguish between the case that $mathbf p=mathbf q$ versus $|mathbf p-mathbf q|_A_k > epsilon$. Our main result is the first closeness tester for this problem with em sub-learning sample complexity in any fixed dimension.
arXiv Detail & Related papers (2023-11-22T04:34:09Z)
Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued pure exploration of the multi-armed bandit (R-CPE-MAB) problem. We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the action set is exponentially large in $d$.
arXiv Detail & Related papers (2023-08-20T11:56:02Z)
Online Learning with Adversaries: A Differential-Inclusion Analysis [52.43460995467893]
We introduce an observation-matrix-based framework for fully asynchronous online Federated Learning with adversaries. Our main result is that the proposed algorithm almost surely converges to the desired mean $mu.$ We derive this convergence using a novel differential-inclusion-based two-timescale analysis.
arXiv Detail & Related papers (2023-04-04T04:32:29Z)
Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances [11.555138461092177]
We show that the worst-case sample complexity of this problem is $$Thetaleft( sum_i =1n fracsigma_i2epsilon2 lnfrac1delta + sum_i in Gm fracsigma_j2epsilon2 textEnt(sigma2_Gr) right),$$ where $Gm, Gl, Gr
arXiv Detail & Related papers (2022-04-11T16:41:33Z)
On the complexity of All $\varepsilon$-Best Arms Identification [2.1485350418225244]
We consider the problem of identifying all the $varepsilon$-optimal arms in a finite multi-armed bandit with Gaussian rewards. We propose a Track-and-Stop algorithm that identifies the set of $varepsilon$-good arms w.h.p and enjoys optimality (when $delta$ goes to zero) in terms of the expected sample complexity.
arXiv Detail & Related papers (2022-02-13T10:54:52Z)
Variance-Dependent Best Arm Identification [12.058652922228918]
We study the problem of identifying the best arm in a multi-armed bandit game. We propose an adaptive algorithm which explores the gaps and variances of the rewards of the arms. We show that our algorithm is optimal up to doubly logarithmic terms.
arXiv Detail & Related papers (2021-06-19T04:13:54Z)
Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
We give an $(varepsilon,delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model. Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.
arXiv Detail & Related papers (2021-06-05T14:11:01Z)
The Sparse Hausdorff Moment Problem, with Application to Topic Models [5.151973524974052]
We give an algorithm for identifying a $k$-mixture using samples of $m=2k$ iid binary random variables. It suffices to know the moments to additive accuracy $w_mincdotzetaO(k)$.
arXiv Detail & Related papers (2020-07-16T04:23:57Z)
An Optimal Elimination Algorithm for Learning a Best Arm [37.18327505953403]
We consider the classic problem of $(epsilon,delta)$-PAC learning a best arm. We propose a new approach for $(epsilon,delta)$-PAC learning a best arm.
arXiv Detail & Related papers (2020-06-20T20:21:33Z)
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy. For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.