Computational Intractability of Strategizing against Online Learners
- URL: http://arxiv.org/abs/2503.04202v1
- Date: Thu, 06 Mar 2025 08:28:50 GMT
- Title: Computational Intractability of Strategizing against Online Learners
- Authors: Angelos Assos, Yuval Dagan, Nived Rajaraman
- Abstract summary: We show that, unless $\mathsf{P} = \mathsf{NP}$, no polynomial-time optimizer can compute a near-optimal strategy against a learner using a standard no-regret algorithm. This establishes a fundamental computational barrier to finding optimal strategies in general game-theoretic settings.
- Score: 7.225039292056237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online learning algorithms are widely used in strategic multi-agent settings, including repeated auctions, contract design, and pricing competitions, where agents adapt their strategies over time. A key question in such environments is how an optimizing agent can best respond to a learning agent to improve its own long-term outcomes. While prior work has developed efficient algorithms for the optimizer in special cases - such as structured auction settings or contract design - no general efficient algorithm is known. In this paper, we establish a strong computational hardness result: unless $\mathsf{P} = \mathsf{NP}$, no polynomial-time optimizer can compute a near-optimal strategy against a learner using a standard no-regret algorithm, specifically Multiplicative Weights Update (MWU). Our result proves an $\Omega(T)$ hardness bound, significantly strengthening previous work that only showed an additive $\Theta(1)$ impossibility result. Furthermore, while the prior hardness result focused on learners using fictitious play - an algorithm that is not no-regret - we prove intractability for a widely used no-regret learning algorithm. This establishes a fundamental computational barrier to finding optimal strategies in general game-theoretic settings.
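As a concrete (hypothetical) illustration of the setting in the abstract, the sketch below simulates a learner running Multiplicative Weights Update against a fixed sequence of optimizer actions in a repeated bimatrix game. The payoff matrix, step size, and optimizer behavior are placeholder assumptions, not taken from the paper; the paper's result concerns the hardness of computing the optimizer's near-optimal action sequence against such a learner.

```python
import numpy as np

def mwu_learner_vs_optimizer(learner_payoffs, optimizer_actions, eta=0.1):
    """Simulate a learner running Multiplicative Weights Update (MWU)
    against a fixed sequence of optimizer actions in a repeated game.

    learner_payoffs[i, j]: learner's payoff when it plays i and the optimizer plays j.
    optimizer_actions: one optimizer action index per round.
    """
    n_actions = learner_payoffs.shape[0]
    weights = np.ones(n_actions)                  # uniform initial weights
    strategies = []
    for j in optimizer_actions:
        p = weights / weights.sum()               # learner's mixed strategy this round
        strategies.append(p)
        round_payoffs = learner_payoffs[:, j]     # payoff of each learner action vs. the optimizer's move
        weights *= np.exp(eta * round_payoffs)    # multiplicative (exponential-weights) update
    return strategies

# Hypothetical 2x2 game in which the optimizer repeatedly plays action 0 for 100 rounds.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
history = mwu_learner_vs_optimizer(A, optimizer_actions=[0] * 100)
print(history[-1])  # the learner's strategy concentrates on its best response to action 0
```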
Related papers
- Learning to Play Against Unknown Opponents [9.346742321348366]
We show how to efficiently construct an optimal learning algorithm when the learning agent is not constrained to no-regret.
All these results make use of recently developed machinery that converts the analysis of learning algorithms to the study of a class of corresponding geometric objects known as menus.
arXiv Detail & Related papers (2024-12-24T09:05:06Z) - Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal regret can be obtained for general convex and compact strategy sets.
Our dynamics are an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space (a generic optimistic-FTRL sketch appears after this list).
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z) - PDE-Based Optimal Strategy for Unconstrained Online Learning [40.61498562988079]
We present a framework that generates time-varying potential functions by solving a Partial Differential Equation (PDE).
Our framework recovers some classical potentials, and more importantly provides a systematic approach to design new ones.
This is the first parameter-free algorithm with an optimal leading constant.
arXiv Detail & Related papers (2022-01-19T22:21:21Z) - A Pragmatic Look at Deep Imitation Learning [0.3626013617212666]
We re-implement 6 different adversarial imitation learning algorithms.
We evaluate them on a widely-used expert trajectory dataset.
GAIL consistently performs well across a range of sample sizes.
arXiv Detail & Related papers (2021-08-04T06:33:10Z) - Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate solution within a bounded number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z) - Quantum algorithms for hedging and the learning of Ising models [6.346764987071066]
A paradigmatic algorithm for online learning is the Hedge algorithm by Freund and Schapire.
This work presents quantum algorithms for such online learning in an oracular setting (a classical Hedge sketch appears after this list).
arXiv Detail & Related papers (2020-02-14T12:48:53Z)
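For the "Near-Optimal No-Regret Learning for General Convex Games" entry above: the following is a generic sketch of optimistic follow-the-regularized-leader on the probability simplex with an entropic regularizer, not the paper's lifted-space construction. The losses, step size, and prediction rule (reuse the previous loss vector) are hypothetical choices for illustration.

```python
import numpy as np

def optimistic_ftrl_simplex(losses, eta=0.1):
    """Optimistic FTRL on the simplex with an entropic regularizer:
    play proportional to exp(-eta * (cumulative loss + predicted next loss)),
    where the prediction for round t is the loss vector observed at round t-1."""
    n_rounds, n_actions = losses.shape
    cumulative = np.zeros(n_actions)
    prediction = np.zeros(n_actions)              # optimistic guess of the next loss vector
    plays = []
    for t in range(n_rounds):
        scores = cumulative + prediction
        x = np.exp(-eta * (scores - scores.min()))  # subtract min for numerical stability
        x /= x.sum()
        plays.append(x)
        cumulative += losses[t]
        prediction = losses[t]                    # predict that the next loss repeats the last one
    return plays

# Hypothetical loss sequence: 100 rounds, 3 actions, uniform in [0, 1].
rng = np.random.default_rng(1)
plays = optimistic_ftrl_simplex(rng.random((100, 3)))
print(plays[-1])
```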
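For the final entry, which mentions the Hedge algorithm of Freund and Schapire: a minimal classical sketch (not the quantum oracular version) of prediction with expert advice via exponential weights. The loss sequence and step size are hypothetical placeholders.

```python
import numpy as np

def hedge(losses, eta=0.5):
    """Classical Hedge (Freund & Schapire): keep a distribution over experts and
    reweight each expert by exp(-eta * loss) after every round."""
    n_rounds, n_experts = losses.shape
    weights = np.ones(n_experts)
    cumulative_loss = 0.0
    for t in range(n_rounds):
        p = weights / weights.sum()            # distribution over experts at round t
        cumulative_loss += p @ losses[t]       # expected loss suffered this round
        weights *= np.exp(-eta * losses[t])    # downweight experts that incurred high loss
    best_expert_loss = losses.sum(axis=0).min()
    return cumulative_loss, best_expert_loss

# Hypothetical losses: 200 rounds, 3 experts, drawn uniformly from [0, 1].
rng = np.random.default_rng(0)
L = rng.random((200, 3))
alg_loss, best_loss = hedge(L)
print(f"Hedge loss: {alg_loss:.1f}, best expert loss: {best_loss:.1f}")
```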
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content listed here (including all information) and is not responsible for any consequences.