Maximizing utility in multi-agent environments by anticipating the behavior of other learners
- URL: http://arxiv.org/abs/2407.04889v1
- Date: Fri, 5 Jul 2024 23:16:18 GMT
- Title: Maximizing utility in multi-agent environments by anticipating the behavior of other learners
- Authors: Angelos Assos, Yuval Dagan, Constantinos Daskalakis
- Abstract summary: In multi-agent settings, the decisions of each agent can affect the utilities/losses of the other agents.
In this paper, we study repeated two-player games involving two types of agents.
- Score: 17.703508282875323
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Learning algorithms are often used to make decisions in sequential decision-making environments. In multi-agent settings, the decisions of each agent can affect the utilities/losses of the other agents. Therefore, if an agent is good at anticipating the behavior of the other agents, in particular how they will make decisions in each round as a function of their experience thus far, it could try to judiciously make its own decisions over the rounds of the interaction so as to influence the other agents to behave in a way that ultimately benefits its own utility. In this paper, we study repeated two-player games involving two types of agents: a learner, which employs an online learning algorithm to choose its strategy in each round; and an optimizer, which knows the learner's utility function and the learner's online learning algorithm. The optimizer wants to plan ahead to maximize its own utility, while taking into account the learner's behavior. We provide two results: a positive result for repeated zero-sum games and a negative result for repeated general-sum games. Our positive result is an algorithm for the optimizer, which exactly maximizes its utility against a learner that plays the Replicator Dynamics -- the continuous-time analogue of Multiplicative Weights Update (MWU). Additionally, we use this result to provide an algorithm for the optimizer against MWU, i.e., for the discrete-time setting, which guarantees an average utility for the optimizer that is higher than the value of the one-shot game. Our negative result shows that, unless P=NP, there is no Fully Polynomial Time Approximation Scheme (FPTAS) for maximizing the utility of an optimizer against a learner that best-responds to the history in each round. Yet, this still leaves open the question of whether there exists a polynomial-time algorithm that optimizes the utility up to $o(T)$.
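To make the setup concrete, below is a minimal Python sketch of the learner's side: a learner running Multiplicative Weights Update (MWU) in a repeated zero-sum game against an optimizer. The payoff matrix, step size, and the optimizer's random placeholder policy are illustrative assumptions; this is not the paper's optimizer algorithm.

```python
# Minimal sketch (toy payoff matrix, step size, and a random placeholder
# policy for the optimizer -- NOT the paper's algorithm). A learner runs
# Multiplicative Weights Update (MWU) in a repeated two-player zero-sum
# game; the optimizer's total utility is tracked.
import numpy as np

rng = np.random.default_rng(0)
# Learner's payoff matrix (rock-paper-scissors); one-shot game value is 0.
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])
T, eta = 1000, 0.1

weights = np.ones(A.shape[0])
optimizer_utility = 0.0
for t in range(T):
    x = weights / weights.sum()      # learner's mixed strategy this round
    j = rng.integers(A.shape[1])     # optimizer's action (placeholder policy)
    u = A[:, j]                      # learner's per-action utility this round
    optimizer_utility += -(x @ u)    # zero-sum: optimizer receives the negation
    weights *= np.exp(eta * u)       # MWU: exponentially upweight good actions

print("optimizer's average utility:", optimizer_utility / T)
```

Under the paper's positive result, the optimizer could replace the random placeholder policy with a plan, computed from its knowledge of the learner's algorithm, that guarantees average utility above the one-shot game value (0 for this symmetric matrix).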
Related papers
- Deep Learning Algorithms for Mean Field Optimal Stopping in Finite Space and Discrete Time [3.350071725971209]
This work studies the mean field optimal stopping (MFOS) problem, obtained in the limit as the number of agents approaches infinity.
We propose two deep learning methods: one simulates full trajectories to learn optimal decisions, whereas the other leverages DPP with backward induction.
We demonstrate the effectiveness of these approaches through numerical experiments on 6 different problems in spatial dimension up to 300.
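As a point of reference, here is a toy sketch of the dynamic-programming principle (DPP) with backward induction for finite-horizon optimal stopping on a finite state space; the rewards and transitions are random placeholders, not the paper's mean-field model.

```python
# Toy sketch of DPP with backward induction for finite-horizon optimal
# stopping on a finite state space. The stopping rewards g and transition
# matrix P are random placeholders, not the paper's mean-field model.
import numpy as np

rng = np.random.default_rng(1)
n_states, horizon = 4, 5
g = rng.random((horizon + 1, n_states))   # reward for stopping at (t, state)
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)         # row-stochastic transition matrix

V = g[horizon].copy()                     # at the horizon, stopping is forced
for t in range(horizon - 1, -1, -1):
    cont = P @ V                          # expected value of continuing
    V = np.maximum(g[t], cont)            # stop wherever it beats continuing

print("optimal values at t=0 per state:", np.round(V, 3))
```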
arXiv Detail & Related papers (2024-10-11T14:27:17Z) - Optimally Improving Cooperative Learning in a Social Setting [4.200480236342444]
We consider a cooperative learning scenario where a collection of networked agents with individually owned classifiers dynamically update their predictions.
We show a polynomial-time algorithm for optimizing the aggregate objective function, and show that optimizing the egalitarian objective function is NP-hard.
The performance of all of our algorithms is guaranteed by mathematical analysis and backed by experiments on synthetic and real data.
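The summary does not specify the update model; purely as an illustration, a DeGroot-style sketch of networked agents repeatedly averaging their neighbors' predictions might look like this (the graph and the averaging rule are assumptions, not the paper's model):

```python
# Illustration only: DeGroot-style dynamics in which each networked agent
# repeatedly replaces its prediction with a weighted average of its
# neighbors' predictions. Graph and rule are assumed, not the paper's model.
import numpy as np

adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]], dtype=float)   # adjacency with self-loops
W = adj / adj.sum(axis=1, keepdims=True)      # row-normalized influence weights

pred = np.array([0.9, 0.2, 0.6, 0.1])         # each agent's initial prediction
for _ in range(20):
    pred = W @ pred                           # one round of dynamic updates

print("predictions after 20 rounds:", np.round(pred, 3))
```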
arXiv Detail & Related papers (2024-05-31T14:07:33Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined via the population loss, that are better suited to active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Meta-Wrapper: Differentiable Wrapping Operator for User Interest
Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models that automatically extract user interest from user behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z) - Strategizing against Learners in Bayesian Games [74.46970859427907]
We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy.
We consider general Bayesian games, where the payoffs of both the optimizer and the learner could depend on the type.
arXiv Detail & Related papers (2022-05-17T18:10:25Z) - Efficient and Optimal Algorithms for Contextual Dueling Bandits under
Realizability [59.81339109121384]
We study the $K$-armed contextual dueling bandit problem, a sequential decision-making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one decision was better than the other.
We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
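For intuition about the feedback model, here is a toy sketch of a dueling-bandit loop in which the learner proposes two decisions per round and observes only which was preferred; the linear utility model and the crude update rule are assumptions for illustration, not the paper's algorithm.

```python
# Toy sketch of the preference-feedback loop (assumed linear utilities and a
# crude update -- not the paper's algorithm): each round the learner duels
# two decisions and sees only which one won.
import numpy as np

rng = np.random.default_rng(2)
d, K, T = 5, 4, 500
theta_star = rng.normal(size=d)          # hidden preference parameters
arms = rng.normal(size=(K, d))           # one feature vector per decision
theta_hat = np.zeros(d)

for t in range(T):
    context = rng.normal(size=d)
    feats = arms * context               # toy contextual features per decision
    i, j = np.argsort(feats @ theta_hat)[-2:]   # duel the two best estimates
    gap = (feats[i] - feats[j]) @ theta_star
    i_wins = rng.random() < 1.0 / (1.0 + np.exp(-gap))  # logistic preference
    w, l = (i, j) if i_wins else (j, i)
    theta_hat += 0.05 * (feats[w] - feats[l])   # nudge toward the winner

print("estimated parameters:", np.round(theta_hat, 2))
```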
arXiv Detail & Related papers (2021-11-24T07:14:57Z) - ORSA: Outlier Robust Stacked Aggregation for Best- and Worst-Case
Approximations of Ensemble Systems [0.0]
In Post-Silicon Validation for semiconductor devices (PSV), the task is to approximate the underlying function of the data with multiple learning algorithms.
In PSV, the expectation is that an unknown number of subsets describe functions showing very different characteristics.
Our method aims to find a suitable approximation that is robust to outliers and represents the best or worst case in a way that will apply to as many types as possible.
arXiv Detail & Related papers (2021-11-17T11:33:46Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (thus having no limit cycles), and to find an appropriate stationary point in a polynomial number of gradient-ascent steps.
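A toy sketch of the interaction (not the paper's method): a min-player taking gradient steps against an adversary that responds with a fixed number of gradient-ascent steps, on an assumed quadratic objective.

```python
# Toy sketch (not the paper's method): a min-player taking gradient steps
# against an adversary that answers with a fixed number of gradient-ascent
# steps on an assumed objective f(x, y) = x*y + 0.1*x**2 - 0.1*y**2.

def grad_x(x, y):
    return y + 0.2 * x      # df/dx

def grad_y(x, y):
    return x - 0.2 * y      # df/dy

x, lr = 1.0, 0.05
for t in range(200):
    y = 0.0
    for _ in range(10):               # adversary: a few gradient-ascent steps
        y += 0.1 * grad_y(x, y)
    x -= lr * grad_x(x, y)            # min-player descends against that reply

print("min-player's iterate after 200 rounds:", x)
```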
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Bandit Linear Optimization for Sequential Decision Making and
Extensive-Form Games [102.23975166536326]
Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment.
It captures the online decision-making problems that each player faces in an extensive-form game, as well as Markov decision processes and partially-observable Markov decision processes where the agent conditions on observed history.
In this paper, we give the first algorithm for the bandit linear optimization problem for TFSDM that offers both (i) linear-time iterations and (ii) $O(\sqrt{T})$ cumulative regret in expectation.
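For reference, the cumulative regret of strategies $x_1, \dots, x_T$ against loss vectors $\ell_t$ over a decision set $\mathcal{X}$ is standardly defined as (our notation, not quoted from the paper):

$$R_T = \sum_{t=1}^{T} \langle \ell_t, x_t \rangle - \min_{\hat{x} \in \mathcal{X}} \sum_{t=1}^{T} \langle \ell_t, \hat{x} \rangle,$$

so the guarantee states $R_T = O(\sqrt{T})$ in expectation against any fixed strategy.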
arXiv Detail & Related papers (2021-03-08T05:00:13Z) - Time Efficiency in Optimization with a Bayesian-Evolutionary Algorithm [13.66850118870667]
We show that not all generate-and-test search algorithms are created equal.
We propose a new algorithm, a combination of Bayesian optimization and an Evolutionary Algorithm, BEA for short, that starts with BO, then transfers knowledge to an EA, and subsequently runs the EA.
The results show that BEA outperforms both BO and the EA in terms of time efficiency, and ultimately leads to better performance on well-known benchmark objective functions with many local optima.
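A minimal sketch of the BEA idea, with the BO phase stubbed by random search (no surrogate model) to keep it short; the objective function is a toy assumption:

```python
# Minimal sketch of the BEA idea: a Bayesian-optimization phase (stubbed here
# with random search, omitting the surrogate model for brevity) whose best
# points seed an evolutionary algorithm. The objective is a toy assumption.
import numpy as np

rng = np.random.default_rng(3)

def objective(x):                     # toy multimodal function to maximize
    return -np.sum(x**2) + np.sum(np.cos(3 * x))

# Phase 1: "BO" stand-in -- sample the space and keep the best points.
samples = rng.uniform(-2, 2, size=(50, 3))
scores = np.array([objective(s) for s in samples])
population = samples[np.argsort(scores)[-10:]]   # knowledge passed to the EA

# Phase 2: a simple mutation-only evolutionary algorithm.
for gen in range(30):
    children = population + rng.normal(scale=0.1, size=population.shape)
    pool = np.vstack([population, children])
    fitness = np.array([objective(p) for p in pool])
    population = pool[np.argsort(fitness)[-10:]]  # keep the fittest

print("best value found:", max(objective(p) for p in population))
```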
arXiv Detail & Related papers (2020-05-04T15:29:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.