Vector Optimization with Stochastic Bandit Feedback
- URL: http://arxiv.org/abs/2110.12311v1
- Date: Sat, 23 Oct 2021 22:38:54 GMT
- Title: Vector Optimization with Stochastic Bandit Feedback
- Authors: Çağın Ararat, Cem Tekin
- Abstract summary: We introduce vector optimization problems with stochastic bandit feedback.
We consider $K$ designs, with multi-dimensional mean reward vectors, which are partially ordered according to a polyhedral ordering cone $C$.
- Score: 10.66048003460524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce vector optimization problems with stochastic bandit feedback,
which extend the best arm identification problem to vector-valued rewards. We
consider $K$ designs, with multi-dimensional mean reward vectors, which are
partially ordered according to a polyhedral ordering cone $C$. This generalizes
the concept of Pareto set in multi-objective optimization and allows different
sets of preferences of decision-makers to be encoded by $C$. Different from
prior work, we define approximations of the Pareto set based on direction-free
covering and gap notions. We study the setting where an evaluation of each
design yields a noisy observation of the mean reward vector. Under a subgaussian
noise assumption, we investigate the sample complexity of the naïve
elimination algorithm in an ($\epsilon,\delta$)-PAC setting, where the goal is
to identify an ($\epsilon,\delta$)-PAC Pareto set with the minimum number of
design evaluations. In particular, we identify cone-dependent geometric
conditions on the deviations of empirical reward vectors from their mean under
which the Pareto front can be approximated accurately. We run experiments to
verify our theoretical results and illustrate how $C$ and the sampling budget
affect the Pareto set, the returned ($\epsilon,\delta$)-PAC Pareto set, and the
success of identification.
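
To make the setup concrete, the following Python sketch implements cone-based dominance and a naive uniform-sampling elimination in the spirit of the abstract; the halfspace matrix `W` representing $C$, the `pull` oracle, and the generic subgaussian sample-size formula are illustrative assumptions, not the paper's exact algorithm or cone-dependent constants.

```python
import numpy as np

def cone_dominates(mu_i, mu_j, W):
    """Design i weakly dominates j w.r.t. the cone C = {x : W @ x >= 0},
    i.e., mu_i - mu_j lies in C. For W = identity this is the usual
    componentwise Pareto order."""
    return np.all(W @ (mu_i - mu_j) >= 0)

def naive_pareto_identification(pull, K, W, eps, delta, sigma=1.0, dim=2):
    """Pull every design the same number of times, then return the designs
    whose empirical mean is not cone-dominated by any other design's
    empirical mean. The per-design sample size below uses a generic
    subgaussian concentration bound, not the paper's geometric conditions."""
    n = int(np.ceil((8 * sigma**2 / eps**2) * np.log(2 * K * dim / delta)))
    means = np.array([np.mean([pull(k) for _ in range(n)], axis=0)
                      for k in range(K)])
    return [i for i in range(K)
            if not any(j != i and cone_dominates(means[j], means[i], W)
                       for j in range(K))]
```

With `W = np.eye(dim)` the cone order reduces to the componentwise Pareto order; a larger cone induces more dominance relations and hence a smaller returned set.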
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models such as large language models (LLMs).
We study a model within this problem domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
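For orientation, here is a plain (unweighted) logistic MLE sketch for a linear preference model; the uncertainty weighting that gives the paper's method its robustness to flipped labels is deliberately omitted, and all names and the feature construction are assumptions.

```python
import numpy as np

def fit_preference_model(Z, y, steps=500, lr=0.05, lam=1.0):
    """Z[t] = phi(context, arm_a) - phi(context, arm_b); y[t] = 1 if a won.
    Plain regularized logistic MLE via gradient descent; the paper instead
    weights each sample by its uncertainty to resist adversarial flips."""
    theta = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Z @ theta))         # P(a beats b)
        theta -= lr * (Z.T @ (p - y) + lam * theta)  # gradient of reg. NLL
    return theta
```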
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
- Faster Convergence with Multiway Preferences [99.68922143784306]
We consider the sign-function-based comparison feedback model and analyze the convergence rates with batched and multiway comparisons.
Our work is the first to study the problem of convex optimization with multiway preferences and analyze the optimal convergence rates.
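A toy version of comparison-driven optimization, not the paper's batched multiway algorithm, might look as follows; every name and constant is an assumption.

```python
import numpy as np

def compare(f, x, y):
    # Sign-function comparison oracle: +1 if x has smaller loss than y.
    return 1.0 if f(x) < f(y) else -1.0

def comparison_descent(f, x0, rounds=500, probe=1e-3, step=0.05, batch=4):
    """Each round, a batch of comparisons f(x + probe*u) vs f(x) reveals
    which random directions are descent directions; average them and step."""
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        g = np.zeros_like(x)
        for _ in range(batch):
            u = np.random.randn(x.size)
            u /= np.linalg.norm(u)
            g += compare(f, x + probe * u, x) * u  # keep direction if better
        x += step * g / batch
    return x
```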
arXiv Detail & Related papers (2023-12-19T01:52:13Z)
- Bandit Pareto Set Identification: the Fixed Budget Setting [12.326452468513228]
We study a pure exploration problem in a multi-armed bandit model.
The goal is to identify the distributions whose mean is not uniformly worse than that of any other distribution.
arXiv Detail & Related papers (2023-11-07T13:43:18Z)
- Learning the Pareto Front Using Bootstrapped Observation Samples [17.519167857253404]
We propose an algorithm to identify a set of arms with undominated mean reward vectors.
The sample complexity of our proposed algorithm is optimal up to a logarithmic factor.
A key contribution is a new estimator that, in every round, updates the estimate of the unknown parameter along multiple context directions.
arXiv Detail & Related papers (2023-05-31T18:15:09Z)
- Learning in Inverse Optimization: Incenter Cost, Augmented Suboptimality Loss, and Algorithms [4.0295993947651025]
We introduce the "incenter" concept, a new notion akin to circumcenter recently proposed by Besbes et al.
We propose a novel loss function called Augmented Suboptimality Loss (ASL), a relaxation of the incenter concept for problems with inconsistent data.
This algorithm combines approximate subgradient evaluations with mirror descent update steps, and is provably efficient for IO problems with discrete feasible sets of high cardinality.
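As a generic reference point for the mirror-descent component, here is a textbook entropic mirror descent loop with an approximate subgradient oracle; it is a sketch of the general template, not the paper's IO-specific algorithm, and the oracle name is an assumption.

```python
import numpy as np

def mirror_descent_simplex(approx_subgrad, dim, rounds=200, eta=0.1):
    """Entropic mirror descent on the probability simplex: multiplicative
    update then renormalize. approx_subgrad(x) may return an inexact
    subgradient, matching the approximate-evaluation flavor above."""
    x = np.full(dim, 1.0 / dim)
    for _ in range(rounds):
        g = approx_subgrad(x)
        x = x * np.exp(-eta * g)  # mirror step w.r.t. negative entropy
        x = x / x.sum()           # Bregman projection onto the simplex
    return x
```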
arXiv Detail & Related papers (2023-05-12T18:58:08Z)
- Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
We propose a novel estimator named Multi-block-Single-probe Variance Reduced (MSVR) to alleviate the complexity of coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z)
- Robust Subset Selection by Greedy and Evolutionary Pareto Optimization [23.0838604893412]
Subset selection aims to select a subset from a ground set to maximize some objective function.
We show that a greedy algorithm can obtain an approximation ratio of $1-e^{-\beta\gamma}$, where $\beta$ and $\gamma$ are the correlation and submodularity ratios of the objective functions.
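The greedy component follows the classic marginal-gain template; a minimal, non-robust sketch (names assumed):

```python
def greedy_subset(objective, ground_set, k):
    """Classic greedy for monotone objectives: repeatedly add the element
    with the largest marginal gain until the budget k is reached."""
    S = set()
    for _ in range(k):
        best = max((e for e in ground_set if e not in S),
                   key=lambda e: objective(S | {e}) - objective(S))
        S.add(best)
    return S
```

The approximation ratio quoted above is exactly the kind of guarantee this template admits when the objective is close to submodular.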
arXiv Detail & Related papers (2022-05-03T11:00:54Z)
- Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study stochastic optimization of areas under precision-recall curves (AUPRC), which are widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
- Pareto Active Learning with Gaussian Processes and Adaptive Discretization [12.179548969182573]
We propose an algorithm that exploits the smoothness of the GP-sampled function and the structure of $(\mathcal{X}, d)$ to learn fast.
In essence, Adaptive $\boldsymbol{\epsilon}$-PAL employs a tree-based adaptive discretization technique to identify an $\boldsymbol{\epsilon}$-accurate Pareto set of designs.
arXiv Detail & Related papers (2020-06-24T21:27:27Z)
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate, on two popular deep learning tasks, the empirical advantages of our methods over standard gradient methods.
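The standard mechanism behind DP first-order methods is per-example gradient clipping plus Gaussian noise; the sketch below shows one generic DP-SGD step, not the paper's adaptive algorithm, and all names are assumptions.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0, noise_mult=1.1):
    """Clip each per-example gradient to norm `clip`, average, add Gaussian
    noise scaled by noise_mult * clip / batch_size, then take a step."""
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy = np.mean(clipped, axis=0) + np.random.normal(
        0.0, noise_mult * clip / len(per_example_grads), size=params.shape)
    return params - lr * noisy
```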
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
- Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equally reliable workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
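The least-squares criterion reduces to a linear system on the comparison graph: each comparison contributes one equation $q_i - q_j \approx y_{ij}$, and the normal equations involve the graph Laplacian. A minimal sketch (names assumed):

```python
import numpy as np

def ls_rank(num_objects, comparisons):
    """comparisons: iterable of (i, j, y), y a noisy measurement of q_i - q_j.
    The normal equations give L q = b with L the comparison graph's Laplacian;
    L is singular (qualities are shift-invariant), so use lstsq and center."""
    L = np.zeros((num_objects, num_objects))
    b = np.zeros(num_objects)
    for i, j, y in comparisons:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
        b[i] += y;   b[j] -= y
    q = np.linalg.lstsq(L, b, rcond=None)[0]
    return q - q.mean()  # fix the additive shift for identifiability
```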
arXiv Detail & Related papers (2020-02-26T16:19:09Z)