Comparator-Adaptive $Φ$-Regret: Improved Bounds, Simpler Algorithms, and Applications to Games
- URL: http://arxiv.org/abs/2505.17277v1
- Date: Thu, 22 May 2025 20:45:47 GMT
- Title: Comparator-Adaptive $Φ$-Regret: Improved Bounds, Simpler Algorithms, and Applications to Games
- Authors: Soumita Hait, Ping Li, Haipeng Luo, Mengxiao Zhang
- Abstract summary: A recent work by Lu et al. [2025] introduces an adaptive algorithm whose regret against a comparator $\phi$ depends on a certain sparsity-based complexity measure of $\phi$. In this work, we propose a general idea to achieve an even better comparator-adaptive $\Phi$-regret bound via much simpler algorithms.
- Score: 43.12477663757647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the classic expert problem, $\Phi$-regret measures the gap between the learner's total loss and that achieved by applying the best action transformation $\phi \in \Phi$. A recent work by Lu et al. [2025] introduces an adaptive algorithm whose regret against a comparator $\phi$ depends on a certain sparsity-based complexity measure of $\phi$, (almost) recovering and interpolating optimal bounds for standard regret notions such as external, internal, and swap regret. In this work, we propose a general idea to achieve an even better comparator-adaptive $\Phi$-regret bound via much simpler algorithms compared to Lu et al. [2025]. Specifically, we discover a prior distribution over all possible binary transformations and show that it suffices to achieve prior-dependent regret against these transformations. Then, we propose two concrete and efficient algorithms to do so, where the first one learns over multiple copies of a prior-aware variant of the Kernelized MWU algorithm of Farina et al. [2022], and the second one learns over multiple copies of a prior-aware variant of the BM-reduction [Blum and Mansour, 2007]. To further showcase the power of our methods and the advantages over Lu et al. [2025] besides the simplicity and better regret bounds, we also show that our second approach can be extended to the game setting to achieve accelerated and adaptive convergence rate to $\Phi$-equilibria for a class of general-sum games. When specialized to the special case of correlated equilibria, our bound improves over the existing ones from Anagnostides et al. [2022a,b].
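To make the prior-dependent regret idea concrete, below is a minimal sketch (not the paper's algorithm) of exponential weights initialized at a non-uniform prior: against expert $i$, it guarantees regret roughly $\ln(1/\pi_i)/\eta + \eta T$ for losses in $[0,1]$, so prior mass placed on "simple" comparators translates into smaller regret against them. The paper's prior-aware Kernelized MWU and prior-aware BM-reduction build this ingredient into the $\Phi$-regret setting; all names and parameters below are illustrative.

```python
import numpy as np

def prior_aware_mwu(losses, prior, eta):
    """Multiplicative Weights Update initialized at a non-uniform prior.

    Against expert i, this guarantees regret at most
    ln(1/prior[i]) / eta + eta * T for losses in [0, 1], so experts
    with larger prior mass enjoy smaller regret -- the
    prior-dependent guarantee the paper's reduction relies on.

    losses: (T, K) array of per-round expert losses in [0, 1].
    prior:  (K,) probability vector over experts.
    """
    w = np.log(prior)            # work in log-space for stability
    total_loss = 0.0
    for loss in losses:
        p = np.exp(w - w.max())
        p /= p.sum()             # current distribution over experts
        total_loss += p @ loss   # learner's expected loss this round
        w -= eta * loss          # exponential-weights update
    return total_loss

# Example: a prior that favors expert 0; regret against expert 0
# then scales with log(1/prior[0]) rather than log(K) after tuning eta.
rng = np.random.default_rng(0)
T, K = 1000, 8
losses = rng.uniform(size=(T, K))
prior = np.array([0.5] + [0.5 / (K - 1)] * (K - 1))
eta = np.sqrt(np.log(K) / T)
print(prior_aware_mwu(losses, prior, eta))
```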
Related papers
- Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning [13.429541377715296]
We propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in infinite-horizon discounted linear Markov decision processes. We show that, combined with a regularized approximate dynamic-programming scheme, the resulting algorithm achieves a regret of order $\tilde{\mathcal{O}}(\sqrt{d^3 (1 - \gamma)^{-7/2} T})$, where $T$ is the total number of sample transitions, $\gamma \in (0,1)$ is the discount factor, and $d$ is the feature dimensionality.
arXiv Detail & Related papers (2025-02-19T17:32:35Z) - Individualized Privacy Accounting via Subsampling with Applications in Combinatorial Optimization [55.81991984375959]
In this work, we give a new technique for analyzing individualized privacy accounting via the following simple observation.
We obtain several improved algorithms for private optimization problems, including decomposable submodular maximization and set cover.
arXiv Detail & Related papers (2024-05-28T19:02:30Z) - Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits [12.674929126684528]
We address the problem of combinatorial semi-bandits, where a player selects among $P$ actions from the power set of a set containing $d$ base items.
We show that our approach efficiently leverages the semi-bandit feedback and outperforms bandit feedback approaches.
arXiv Detail & Related papers (2024-02-23T08:07:54Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret, as sketched below.
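In LaTeX, the bound and its deterministic specialization (assuming, as the summary suggests, that $\sigma_t$ denotes the variance of the round-$t$ comparison, which vanishes when comparisons are deterministic):

```latex
% Variance-aware bound and its deterministic specialization:
\tilde{O}\!\Big(d\sqrt{\sum_{t=1}^{T}\sigma_t^{2}} + d\Big)
\;\longrightarrow\;
\tilde{O}(d) \qquad \text{when } \sigma_t = 0 \text{ for all } t.
```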
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - Projection-free Adaptive Regret with Membership Oracles [31.422532403048738]
Most iterative algorithms require the computation of projections onto convex sets, which can be computationally expensive.
Recent work by GK22 gave sublinear adaptive regret guarantees with projection-free algorithms based on the Frank-Wolfe approach.
We give projection-free algorithms that are based on a different technique, inspired by Mhammedi22, that replaces projections by set-membership computations.
arXiv Detail & Related papers (2022-11-22T23:53:06Z) - Planning and Learning with Adaptive Lookahead [74.39132848733847]
The Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation.
Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration.
We propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate.
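As a rough illustration (assuming a tabular MDP with known transitions; `horizon_for` is a hypothetical stand-in for the paper's adaptation rule, which is not reproduced here), multi-step lookahead improvement replaces the one-step greedy backup with an $h$-step backward induction:

```python
import numpy as np

def h_step_lookahead(P, R, V, gamma, h):
    """h-step lookahead values and the greedy first action (h >= 1).

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    V: (S,) current value estimate used at the leaves.
    """
    Vh = V.copy()
    for _ in range(h):
        Q = R + gamma * (P @ Vh)    # (S, A): one-step backup
        greedy = Q.argmax(axis=1)   # first action of the optimal plan
        Vh = Q.max(axis=1)
    return Vh, greedy

def adaptive_lookahead_sweep(P, R, V, gamma, horizon_for):
    """One improvement sweep with a state-dependent lookahead depth."""
    policy = np.empty(len(V), dtype=int)
    for s in range(len(V)):
        _, greedy = h_step_lookahead(P, R, V, gamma, horizon_for(s, V))
        policy[s] = greedy[s]
    return policy

# Toy usage with a random MDP and an arbitrary depth rule.
rng = np.random.default_rng(0)
S, A = 5, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # (S, A, S) transitions
R = rng.uniform(size=(S, A))
print(adaptive_lookahead_sweep(P, R, np.zeros(S), 0.9,
                               lambda s, V: 1 + s % 3))
```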
arXiv Detail & Related papers (2022-01-28T20:26:55Z) - Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\mathrm{polylog}(T))$ after $T$ repetitions of the game.
We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
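For reference, here is a minimal sketch of the OMWU update this line of work builds on, shown in self-play on a hypothetical zero-sum matrix game for brevity; the cited result concerns general-sum games and internal/swap regret, which additionally require a fixed-point (stationary-distribution) step not shown here.

```python
import numpy as np

def omwu_play(cum_loss, last_loss, eta):
    """One step of Optimistic Multiplicative Weights Update.

    Plays exponential weights on the cumulative loss plus an
    optimistic prediction (the most recent loss vector):
        x_{t+1}  propto  exp(-eta * (L_t + loss_t)).
    cum_loss, last_loss: (K,) arrays; returns the (K,) strategy.
    """
    logits = -eta * (cum_loss + last_loss)
    x = np.exp(logits - logits.max())   # stabilized softmax
    return x / x.sum()

# Self-play on a random 3x3 zero-sum game (illustrative setup only).
rng = np.random.default_rng(1)
A = rng.uniform(size=(3, 3))
eta = 0.1
Lx = np.zeros(3); Ly = np.zeros(3)    # cumulative losses
lx = np.zeros(3); ly = np.zeros(3)    # most recent losses
for t in range(1000):
    x = omwu_play(Lx, lx, eta)
    y = omwu_play(Ly, ly, eta)
    lx, ly = A @ y, -A.T @ x          # players' loss vectors
    Lx += lx; Ly += ly
print(x, y)
```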
arXiv Detail & Related papers (2021-11-11T01:19:53Z) - BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful analysis framework for both constrained and unconstrained optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z) - Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product evaluations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z) - Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation [44.374427255708135]
We develop new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation.
Using the optimism principle and assuming that the MDP has a linear structure, we first propose a computationally inefficient algorithm with optimal $\widetilde{O}(\sqrt{T})$ regret.
Next, taking inspiration from adversarial linear bandits, we develop yet another efficient algorithm with $\widetilde{O}(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-07-23T08:23:44Z)