Related papers: Learning the Pareto Front Using Bootstrapped Observation Samples

URL: http://arxiv.org/abs/2306.00096v2
Date: Wed, 22 May 2024 20:13:30 GMT
Title: Learning the Pareto Front Using Bootstrapped Observation Samples
Authors: Wonyoung Kim, Garud Iyengar, Assaf Zeevi
Abstract summary: We propose an algorithm to identify a set of arms with undominated mean reward vectors. The sample complexity of our proposed algorithm is optimal up to a logarithmic factor. The key contribution is a new estimator that, in every round, updates the estimate for the unknown parameter along multiple context directions.
Score: 17.519167857253404
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider Pareto front identification (PFI) for linear bandits (PFILin), i.e., the goal is to identify a set of arms with undominated mean reward vectors when the mean reward vector is a linear function of the context. PFILin includes the best arm identification problem and multi-objective active learning as special cases. The sample complexity of our proposed algorithm is optimal up to a logarithmic factor. In addition, the regret incurred by our algorithm during the estimation is within a logarithmic factor of the optimal regret among all algorithms that identify the Pareto front. Our key contribution is a new estimator that in every round updates the estimate for the unknown parameter along multiple context directions -- in contrast to the conventional estimator that only updates the parameter estimate along the chosen context. This allows us to use low-regret arms to collect information about Pareto optimal arms. Our key innovation is to reuse the exploration samples multiple times, in contrast to conventional estimators that use each sample only once. Numerical experiments demonstrate that the proposed algorithm successfully identifies the Pareto front while controlling the regret.
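As a rough illustration of the identification target described in the abstract, the sketch below computes the set of arms with undominated mean reward vectors under a linear model. It is not the paper's algorithm: it assumes the parameter matrix is already known, uses the standard Pareto dominance relation (no worse in every objective, strictly better in at least one), and the names `pareto_front`, `contexts`, and `theta` are hypothetical.

```python
from typing import List

import numpy as np


def pareto_front(means: np.ndarray) -> List[int]:
    """Return indices of rows of `means` that no other row dominates.

    Row j dominates row i when means[j] >= means[i] in every objective
    and means[j] > means[i] in at least one objective.
    """
    front = []
    for i, m in enumerate(means):
        no_worse = np.all(means >= m, axis=1)        # rows at least as good everywhere
        strictly_better = np.any(means > m, axis=1)  # rows better somewhere
        if not np.any(no_worse & strictly_better):
            front.append(i)
    return front


# Assumed toy instance: 4 arms, 2-dimensional contexts, 2 objectives.
contexts = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.2]])
theta = np.eye(2)          # hypothetical parameter matrix (d x objectives)
means = contexts @ theta   # mean reward vector is linear in the context

print(pareto_front(means))  # prints [0, 1, 2]
```

With a single objective the same function returns only the arm with the largest mean, which matches the abstract's remark that best arm identification is a special case.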

Related papers

Constrained Pareto Set Identification with Bandit Feedback [10.967572582187014]
Given a $K$-armed bandit with unknown means, the goal is to identify the set of arms whose mean is not uniformly worse than that of another arm. Our focus lies in fixed-confidence identification, for which we introduce an algorithm that significantly outperforms racing-like algorithms.
arXiv Detail & Related papers (2025-06-09T18:29:28Z)
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget [55.938644481736446]
We introduce a novel algorithm for best feasible arm identification that guarantees an exponential decay in the error probability. We validate our algorithm through comprehensive empirical evaluations across various problem instances with different levels of complexity.
arXiv Detail & Related papers (2025-06-03T02:56:26Z)
Predictive Coresets [0.0]
Conventional coreset approaches determine weights by minimizing the Kullback-Leibler divergence between the likelihood functions of the full and weighted datasets. We propose an alternative variational method which employs randomized posteriors and finds weights to match the unknown posterior predictive distributions conditioned on the full and reduced datasets. We evaluate the performance of the proposed coreset construction on diverse problems, including random partitions and density estimation.
arXiv Detail & Related papers (2025-02-08T23:57:43Z)
Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences. Our method is especially suitable for problems with well-specified likelihoods. We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
Bayesian Hierarchical Models for Counterfactual Estimation [12.159830463756341]
We propose a probabilistic paradigm to estimate a diverse set of counterfactuals. We treat the perturbations as random variables endowed with prior distribution functions. A gradient based sampler with superior convergence characteristics efficiently computes the posterior samples.
arXiv Detail & Related papers (2023-01-21T00:21:11Z)
Transfer Learning for Contextual Multi-armed Bandits [8.97013379960904]
We study the problem of transfer learning for nonparametric contextual multi-armed bandits under the covariate shift model. A novel transfer learning algorithm that attains the minimax regret is proposed. A simulation study is carried out to illustrate the benefits of utilizing the data from the auxiliary source domains for learning in the target domain.
arXiv Detail & Related papers (2022-11-22T22:24:28Z)
Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system. Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations. While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits. Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
Vector Optimization with Stochastic Bandit Feedback [10.66048003460524]
We introduce vector optimization problems with stochastic bandit feedback. We consider $K$ designs, with multi-dimensional mean reward vectors, which are partially ordered according to a polyhedral ordering cone $C$.
arXiv Detail & Related papers (2021-10-23T22:38:54Z)
Towards Deterministic Diverse Subset Sampling [14.236193187116049]
In this paper, we discuss a greedy deterministic adaptation of k-DPP. We demonstrate the usefulness of the model on an image search task.
arXiv Detail & Related papers (2021-05-28T16:05:58Z)
Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling. We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus [62.86856923633923]
We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements. In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data. Using self-supervised learning of the search strategy, we evaluate the proposed algorithm on multi-homography estimation and demonstrate an accuracy that is superior to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-08T17:37:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.