Vector Optimization with Stochastic Bandit Feedback
- URL: http://arxiv.org/abs/2110.12311v1
- Date: Sat, 23 Oct 2021 22:38:54 GMT
- Title: Vector Optimization with Stochastic Bandit Feedback
- Authors: Çağın Ararat, Cem Tekin
- Abstract summary: We introduce vector optimization problems with stochastic bandit feedback.
We consider $K$ designs, with multi-dimensional mean reward vectors, which are partially ordered according to a polyhedral ordering cone $C$.
- Score: 10.66048003460524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce vector optimization problems with stochastic bandit feedback,
which extend the best arm identification problem to vector-valued rewards. We
consider $K$ designs, with multi-dimensional mean reward vectors, which are
partially ordered according to a polyhedral ordering cone $C$. This generalizes
the concept of Pareto set in multi-objective optimization and allows different
sets of preferences of decision-makers to be encoded by $C$. Different from
prior work, we define approximations of the Pareto set based on direction-free
covering and gap notions. We study the setting where an evaluation of each
design yields a noisy observation of the mean reward vector. Under a subgaussian
noise assumption, we investigate the sample complexity of the naïve
elimination algorithm in an ($\epsilon,\delta$)-PAC setting, where the goal is
to identify an ($\epsilon,\delta$)-PAC Pareto set with the minimum number of
design evaluations. In particular, we identify cone-dependent geometric
conditions on the deviations of empirical reward vectors from their mean under
which the Pareto front can be approximated accurately. We run experiments to
verify our theoretical results and illustrate how $C$ and the sampling budget
affect the Pareto set, the returned ($\epsilon,\delta$)-PAC Pareto set, and the
success of identification.
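
To make the setup concrete, the following Python sketch implements cone-based dominance and a naive uniform-sampling elimination in the spirit of the abstract; the halfspace matrix `W` representing $C$, the `pull` oracle, and the generic subgaussian sample-size formula are illustrative assumptions, not the paper's exact algorithm or cone-dependent constants.

```python
import numpy as np

def cone_dominates(mu_i, mu_j, W):
    """Design i weakly dominates j w.r.t. the cone C = {x : W @ x >= 0},
    i.e., mu_i - mu_j lies in C. For W = identity this is the usual
    componentwise Pareto order."""
    return np.all(W @ (mu_i - mu_j) >= 0)

def naive_pareto_identification(pull, K, W, eps, delta, sigma=1.0, dim=2):
    """Pull every design the same number of times, then return the designs
    whose empirical mean is not cone-dominated by any other design's
    empirical mean. The per-design sample size below uses a generic
    subgaussian concentration bound, not the paper's geometric conditions."""
    n = int(np.ceil((8 * sigma**2 / eps**2) * np.log(2 * K * dim / delta)))
    means = np.array([np.mean([pull(k) for _ in range(n)], axis=0)
                      for k in range(K)])
    return [i for i in range(K)
            if not any(j != i and cone_dominates(means[j], means[i], W)
                       for j in range(K))]
```

With `W = np.eye(dim)` the cone order reduces to the componentwise Pareto order; a larger cone induces more dominance relations and hence a smaller returned set.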
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models such as large language models (LLMs).
We study a model within this problem domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
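For orientation, here is a plain (unweighted) logistic MLE sketch for a linear preference model; the uncertainty weighting that gives the paper's method its robustness to flipped labels is deliberately omitted, and all names and the feature construction are assumptions.

```python
import numpy as np

def fit_preference_model(Z, y, steps=500, lr=0.05, lam=1.0):
    """Z[t] = phi(context, arm_a) - phi(context, arm_b); y[t] = 1 if a won.
    Plain regularized logistic MLE via gradient descent; the paper instead
    weights each sample by its uncertainty to resist adversarial flips."""
    theta = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Z @ theta))         # P(a beats b)
        theta -= lr * (Z.T @ (p - y) + lam * theta)  # gradient of reg. NLL
    return theta
```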
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
- Faster Convergence with Multiway Preferences [99.68922143784306]
We consider the sign-function-based comparison feedback model and analyze the convergence rates with batched and multiway comparisons.
Our work is the first to study the problem of convex optimization with multiway preferences and analyze the optimal convergence rates.
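A toy version of comparison-driven optimization, not the paper's batched multiway algorithm, might look as follows; every name and constant is an assumption.

```python
import numpy as np

def compare(f, x, y):
    # Sign-function comparison oracle: +1 if x has smaller loss than y.
    return 1.0 if f(x) < f(y) else -1.0

def comparison_descent(f, x0, rounds=500, probe=1e-3, step=0.05, batch=4):
    """Each round, a batch of comparisons f(x + probe*u) vs f(x) reveals
    which random directions are descent directions; average them and step."""
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        g = np.zeros_like(x)
        for _ in range(batch):
            u = np.random.randn(x.size)
            u /= np.linalg.norm(u)
            g += compare(f, x + probe * u, x) * u  # keep direction if better
        x += step * g / batch
    return x
```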
arXiv Detail & Related papers (2023-12-19T01:52:13Z)
- Bandit Pareto Set Identification: the Fixed Budget Setting [12.326452468513228]
We study a pure exploration problem in a multi-armed bandit model.
The goal is to identify the distributions whose mean is not uniformly worse than that of any other distribution.
arXiv Detail & Related papers (2023-11-07T13:43:18Z)
- Learning the Pareto Front Using Bootstrapped Observation Samples [17.519167857253404]
We propose an algorithm to identify a set of arms with undominated mean reward vectors.
The sample complexity of our proposed algorithm is optimal up to a logarithmic factor.
A key contribution is a new estimator that, in every round, updates the estimate of the unknown parameter along multiple context directions.
arXiv Detail & Related papers (2023-05-31T18:15:09Z)
- Learning in Inverse Optimization: Incenter Cost, Augmented Suboptimality Loss, and Algorithms [4.0295993947651025]
We introduce the "incenter" concept, a new notion akin to circumcenter recently proposed by Besbes et al.
We propose a novel loss function called Augmented Suboptimality Loss (ASL), a relaxation of the incenter concept for problems with inconsistent data.
This algorithm combines approximate subgradient evaluations with mirror descent update steps, and is provably efficient for IO problems with discrete feasible sets of high cardinality.
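As a generic reference point for the mirror-descent component, here is a textbook entropic mirror descent loop with an approximate subgradient oracle; it is a sketch of the general template, not the paper's IO-specific algorithm, and the oracle name is an assumption.

```python
import numpy as np

def mirror_descent_simplex(approx_subgrad, dim, rounds=200, eta=0.1):
    """Entropic mirror descent on the probability simplex: multiplicative
    update then renormalize. approx_subgrad(x) may return an inexact
    subgradient, matching the approximate-evaluation flavor above."""
    x = np.full(dim, 1.0 / dim)
    for _ in range(rounds):
        g = approx_subgrad(x)
        x = x * np.exp(-eta * g)  # mirror step w.r.t. negative entropy
        x = x / x.sum()           # Bregman projection onto the simplex
    return x
```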
arXiv Detail & Related papers (2023-05-12T18:58:08Z)
- Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
We propose a novel estimator named Multi-block-Single-probe Variance Reduced (MSVR) to alleviate the complexity of coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z)
- Robust Subset Selection by Greedy and Evolutionary Pareto Optimization [23.0838604893412]
Subset selection aims to select a subset from a ground set to maximize some objective function.
We show that a greedy algorithm can obtain an approximation ratio of $1-e^{-\beta\gamma}$, where $\beta$ and $\gamma$ are the correlation and submodularity ratios of the objective functions.
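The greedy component follows the classic marginal-gain template; a minimal, non-robust sketch (names assumed):

```python
def greedy_subset(objective, ground_set, k):
    """Classic greedy for monotone objectives: repeatedly add the element
    with the largest marginal gain until the budget k is reached."""
    S = set()
    for _ in range(k):
        best = max((e for e in ground_set if e not in S),
                   key=lambda e: objective(S | {e}) - objective(S))
        S.add(best)
    return S
```

The approximation ratio quoted above is exactly the kind of guarantee this template admits when the objective is close to submodular.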
arXiv Detail & Related papers (2022-05-03T11:00:54Z)
- Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study stochastic optimization of areas under precision-recall curves (AUPRC), which are widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
- Pareto Active Learning with Gaussian Processes and Adaptive Discretization [12.179548969182573]
We propose an algorithm that exploits the smoothness of the GP-sampled function and the structure of $(\mathcal{X}, d)$ to learn fast.
In essence, Adaptive $\boldsymbol{\epsilon}$-PAL employs a tree-based adaptive discretization technique to identify an $\boldsymbol{\epsilon}$-accurate Pareto set of designs.
arXiv Detail & Related papers (2020-06-24T21:27:27Z)
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate, on two popular deep learning tasks, the empirical advantages of our methods over standard gradient methods.
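The standard mechanism behind DP first-order methods is per-example gradient clipping plus Gaussian noise; the sketch below shows one generic DP-SGD step, not the paper's adaptive algorithm, and all names are assumptions.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0, noise_mult=1.1):
    """Clip each per-example gradient to norm `clip`, average, add Gaussian
    noise scaled by noise_mult * clip / batch_size, then take a step."""
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy = np.mean(clipped, axis=0) + np.random.normal(
        0.0, noise_mult * clip / len(per_example_grads), size=params.shape)
    return params - lr * noisy
```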
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
- Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equally reliable workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
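The least-squares criterion reduces to a linear system on the comparison graph: each comparison contributes one equation $q_i - q_j \approx y_{ij}$, and the normal equations involve the graph Laplacian. A minimal sketch (names assumed):

```python
import numpy as np

def ls_rank(num_objects, comparisons):
    """comparisons: iterable of (i, j, y), y a noisy measurement of q_i - q_j.
    The normal equations give L q = b with L the comparison graph's Laplacian;
    L is singular (qualities are shift-invariant), so use lstsq and center."""
    L = np.zeros((num_objects, num_objects))
    b = np.zeros(num_objects)
    for i, j, y in comparisons:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
        b[i] += y;   b[j] -= y
    q = np.linalg.lstsq(L, b, rcond=None)[0]
    return q - q.mean()  # fix the additive shift for identifiability
```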
arXiv Detail & Related papers (2020-02-26T16:19:09Z)