Cost-Aware Optimal Pairwise Pure Exploration
- URL: http://arxiv.org/abs/2503.07877v1
- Date: Mon, 10 Mar 2025 21:50:35 GMT
- Title: Cost-Aware Optimal Pairwise Pure Exploration
- Authors: Di Wu, Chengshuai Shi, Ruida Zhou, Cong Shen,
- Abstract summary: This work introduces a versatile framework to study pure exploration, with a focus on identifying the pairwise relationships between targeted arm pairs.<n>Under the general framework of pairwise pure exploration with arm-specific costs, a performance lower bound is derived.<n>A novel algorithm, termed CAET (Cost-Aware Pairwise Exploration Task), is proposed to handle the arm-specific costs.
- Score: 20.624054807637425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pure exploration is one of the fundamental problems in multi-armed bandits (MAB). However, existing works mostly focus on specific pure exploration tasks, without a holistic view of the general pure exploration problem. This work fills this gap by introducing a versatile framework to study pure exploration, with a focus on identifying the pairwise relationships between targeted arm pairs. Moreover, unlike existing works that only optimize the stopping time (i.e., sample complexity), this work considers that arms are associated with potentially different costs and targets at optimizing the cumulative cost that occurred during learning. Under the general framework of pairwise pure exploration with arm-specific costs, a performance lower bound is derived. Then, a novel algorithm, termed CAET (Cost-Aware Pairwise Exploration Task), is proposed. CAET builds on the track-and-stop principle with a novel design to handle the arm-specific costs, which can potentially be zero and thus represent a very challenging case. Theoretical analyses prove that the performance of CAET approaches the lower bound asymptotically. Special cases are further discussed, including an extension to regret minimization, which is another major focus of MAB. The effectiveness and efficiency of CAET are also verified through experimental results under various settings.
Related papers
- Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning.
We show that the widely used beam search method suffers from unacceptable over-optimism.
We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z) - Best Arm Identification with Minimal Regret [55.831935724659175]
Best arm identification problem elegantly amalgamates regret minimization and BAI.
Agent's goal is to identify the best arm with a prescribed confidence level.
Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z) - Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remain elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Cost Aware Best Arm Identification [13.380383930882784]
We call it emphCost Aware Best Arm Identification (CABAI)
We propose a simple algorithm called emphChernoff Overlap (CO), based on a square-root rule.
Our results show that (i.e. ignoring the heterogeneous action cost results in sub-optimality in practice, and (ii.) simple algorithms can deliver near-optimal performance over a wide range of problems.
arXiv Detail & Related papers (2024-02-26T16:27:08Z) - TrackFlow: Multi-Object Tracking with Normalizing Flows [36.86830078167583]
We aim at extending tracking-by-detection to multi-modal settings.
A rough estimate of 3D information is also available and must be merged with other traditional metrics.
Our approach consistently enhances the performance of several tracking-by-detection algorithms.
arXiv Detail & Related papers (2023-08-22T15:40:03Z) - Multi-task Learning of Order-Consistent Causal Graphs [59.9575145128345]
We consider the problem of discovering $K related Gaussian acyclic graphs (DAGs)
Under multi-task learning setting, we propose a $l_1/l$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models.
We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order.
arXiv Detail & Related papers (2021-11-03T22:10:18Z) - Achieving the Pareto Frontier of Regret Minimization and Best Arm
Identification in Multi-Armed Bandits [91.8283876874947]
We design and analyze the BoBW-lil'UCB$(gamma)$ algorithm.
We show that (i) no algorithm can simultaneously perform optimally for both the RM and BAI objectives.
We also show that BoBW-lil'UCB$(gamma)$ outperforms a competitor in terms of the time complexity and the regret.
arXiv Detail & Related papers (2021-10-16T17:52:32Z) - Combinatorial Pure Exploration with Full-bandit Feedback and Beyond:
Solving Combinatorial Optimization under Uncertainty with Limited Observation [70.41056265629815]
When developing an algorithm for optimization, it is commonly assumed that parameters such as edge weights are exactly known as inputs.
In this article, we review recently proposed techniques for pure exploration problems with limited feedback.
arXiv Detail & Related papers (2020-12-31T12:40:52Z) - Present-Biased Optimization [8.775878711928591]
The paper extends the framework proposed by Akerlof (1991) for studying various aspects of human behavior related to time-inconsistent planning.
We show that the ratio between the cost of the solutions computed by present-biased agents and the cost of the optimal solutions may differ significantly depending on the problem constraints.
arXiv Detail & Related papers (2020-12-29T12:40:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.