Adaptive Algorithms for Relaxed Pareto Set Identification
- URL: http://arxiv.org/abs/2307.00424v2
- Date: Fri, 3 Nov 2023 15:28:31 GMT
- Title: Adaptive Algorithms for Relaxed Pareto Set Identification
- Authors: Cyrille Kone, Emilie Kaufmann, Laura Richert
- Abstract summary: We revisit the fixed-confidence identification of the Pareto optimal set in a multi-objective multi-armed bandit model.
We propose a single sampling strategy, called Adaptive Pareto Exploration, that can be used in conjunction with different stopping rules.
We analyze the sample complexity of these different combinations, quantifying in particular the reduction in sample complexity.
- Score: 12.326452468513228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we revisit the fixed-confidence identification of the Pareto
optimal set in a multi-objective multi-armed bandit model. As the sample
complexity to identify the exact Pareto set can be very large, a relaxation
allowing to output some additional near-optimal arms has been studied. In this
work we also tackle alternative relaxations that allow instead to identify a
relevant subset of the Pareto set. Notably, we propose a single sampling
strategy, called Adaptive Pareto Exploration, that can be used in conjunction
with different stopping rules to take into account different relaxations of the
Pareto Set Identification problem. We analyze the sample complexity of these
different combinations, quantifying in particular the reduction in sample
complexity that occurs when one seeks to identify at most $k$ Pareto optimal
arms. We showcase the good practical performance of Adaptive Pareto Exploration
on a real-world scenario, in which we adaptively explore several vaccination
strategies against Covid-19 in order to find the optimal ones when multiple
immunogenicity criteria are taken into account.
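The core object in this paper, the Pareto optimal set, is the set of arms whose mean reward vector is not dominated by that of any other arm. As a minimal sketch (not the paper's algorithm, which samples adaptively under unknown means), the following assumes the mean vectors are known and computes the Pareto set directly; the example arm means are invented for illustration:

```python
import numpy as np

# Each row is the mean reward vector of one arm (two objectives here,
# e.g. two immunogenicity criteria). Values are illustrative only.
means = np.array([
    [0.9, 0.2],
    [0.5, 0.5],
    [0.2, 0.9],
    [0.4, 0.4],   # dominated by arm 1, whose mean is [0.5, 0.5]
])

def dominates(u, v):
    """u Pareto-dominates v: u >= v in every objective and u > v in at least one."""
    return np.all(u >= v) and np.any(u > v)

def pareto_set(means):
    """Indices of arms not dominated by any other arm."""
    return [i for i, m in enumerate(means)
            if not any(dominates(means[j], m)
                       for j in range(len(means)) if j != i)]

print(pareto_set(means))  # [0, 1, 2]
```

In the bandit setting the means are unknown, so an identification algorithm must replace them with confidence-bounded empirical estimates; the relaxations in the paper then shrink the target, e.g. stopping once at most $k$ Pareto optimal arms have been certified.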
Related papers
- Adaptive Combinatorial Experimental Design: Pareto Optimality for Decision-Making and Inference [2.9618272039677667]
We provide the first investigation into adaptive experimental design, focusing on the trade-off between regret and statistical power in combinatorial multi-armed bandits (CMAB). We propose two algorithms, MixCombKL and MixCombUCB, for two relevant cases under different information structures.
arXiv Detail & Related papers (2026-02-27T17:58:37Z) - Almost Asymptotically Optimal Active Clustering Through Pairwise Observations [59.20614082241528]
We propose a new analysis framework for clustering $M$ items into an unknown number of $K$ distinct groups using noisy and actively collected responses. We establish a fundamental lower bound on the expected number of queries needed to achieve a desired confidence in the accuracy of the clustering. We develop a computationally feasible variant of the Generalized Likelihood Ratio statistic and show that its performance gap to the lower bound can be accurately empirically estimated.
arXiv Detail & Related papers (2026-02-05T14:16:47Z) - Multi-Armed Sampling Problem and the End of Exploration [14.824891788575417]
This paper introduces the framework of multi-armed sampling as the sampling counterpart to the optimization problem of multi-armed bandits. We propose an algorithm that achieves plausible notions of regret for this framework and establish corresponding lower bounds. Our work sheds light on the need for exploration and the convergence properties of algorithms for entropy-regularized reinforcement learning.
arXiv Detail & Related papers (2025-07-14T20:50:51Z) - Constrained Pareto Set Identification with Bandit Feedback [10.967572582187014]
Given a $K$-armed bandit with unknown means, the goal is to identify the set of arms whose mean is not uniformly worse than that of another arm. Our focus lies in fixed-confidence identification, for which we introduce an algorithm that significantly outperforms racing-like algorithms.
arXiv Detail & Related papers (2025-06-09T18:29:28Z) - Adaptive Exploration for Multi-Reward Multi-Policy Evaluation [26.03922159496432]
We study the policy evaluation problem in the online multi-reward multi-policy discounted setting.
We adopt an $\epsilon$-accurate estimation perspective, aiming to achieve $\epsilon$-accurate value estimates across finite or convex sets of rewards.
arXiv Detail & Related papers (2025-02-04T17:35:51Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Learning the Pareto Front Using Bootstrapped Observation Samples [17.519167857253404]
We propose an algorithm to identify a set of arms with undominated mean reward vectors.
The sample complexity of our proposed algorithm is optimal up to a logarithmic factor.
Key contribution is a new estimator that in every round updates the estimate for the unknown parameter along multiple context directions.
arXiv Detail & Related papers (2023-05-31T18:15:09Z) - On the Complexity of Multi-Agent Decision Making: From Learning in Games
to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z) - Factorization of Multi-Agent Sampling-Based Motion Planning [72.42734061131569]
Modern robotics often involves multiple embodied agents operating within a shared environment.
Standard sampling-based algorithms can be used to search for solutions in the robots' joint space.
We integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods.
We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.
arXiv Detail & Related papers (2023-04-01T15:50:18Z) - Pareto Regret Analyses in Multi-objective Multi-armed Bandit [22.17126026244685]
We study optimality in multi-objective multi-armed bandit.
We present new algorithms assuming both with and without prior information of the multi-objective multi-armed bandit setting.
The algorithms are shown to be optimal in adversarial settings and, simultaneously, nearly optimal up to a logarithmic factor in stochastic settings.
arXiv Detail & Related papers (2022-12-01T21:44:27Z) - On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z) - Pareto Set Learning for Expensive Multi-Objective Optimization [5.419608513284392]
Expensive multi-objective optimization problems can be found in many real-world applications.
This paper develops a novel learning-based method to approximate the whole Pareto set for MOBO.
arXiv Detail & Related papers (2022-10-16T09:41:54Z) - Pareto Navigation Gradient Descent: a First-Order Algorithm for
Optimization in Pareto Set [17.617944390196286]
Modern machine learning applications, such as multi-task learning, require finding optimal model parameters to trade-off multiple objective functions.
We propose a first-order algorithm that approximately solves OPT-in-Pareto using only gradient information.
arXiv Detail & Related papers (2021-10-17T04:07:04Z) - Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling, called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.