Learning in Repeated Multi-Objective Stackelberg Games with Payoff Manipulation
- URL: http://arxiv.org/abs/2508.14705v2
- Date: Tue, 26 Aug 2025 15:07:29 GMT
- Title: Learning in Repeated Multi-Objective Stackelberg Games with Payoff Manipulation
- Authors: Phurinut Srisawad, Juergen Branke, Long Tran-Thanh
- Abstract summary: We study payoff manipulation in repeated multi-objective Stackelberg games. We assume that the follower's utility function, representing preferences over multiple objectives, is unknown but linear. This introduces a sequential decision-making challenge for the leader, who must balance preference elicitation with immediate utility maximisation.
- Score: 7.794550421457341
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study payoff manipulation in repeated multi-objective Stackelberg games, where a leader may strategically influence a follower's deterministic best response, e.g., by offering a share of their own payoff. We assume that the follower's utility function, representing preferences over multiple objectives, is unknown but linear, and its weight parameter must be inferred through interaction. This introduces a sequential decision-making challenge for the leader, who must balance preference elicitation with immediate utility maximisation. We formalise this problem and propose manipulation policies based on expected utility (EU) and long-term expected utility (longEU), which guide the leader in selecting actions and offering incentives that trade off short-term gains with long-term impact. We prove that under infinite repeated interactions, longEU converges to the optimal manipulation. Empirical results across benchmark environments demonstrate that our approach improves cumulative leader utility while promoting mutually beneficial outcomes, all without requiring explicit negotiation or prior knowledge of the follower's utility function.
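As a rough illustration of the setting the abstract describes, the sketch below shows a follower with a linear utility over two objectives choosing a deterministic best response, and a leader shifting that response by offering a share of their own payoff. Every name and number here (`follower_best_response`, the payoff tables, the 20% share) is a hypothetical choice made for illustration; this is not the paper's EU or longEU policy, and in the paper's setting the leader does not know `weights` and must infer them through interaction.

```python
def linear_utility(weights, payoff):
    """Linear scalarisation: the weighted sum of a multi-objective payoff vector."""
    return sum(w * p for w, p in zip(weights, payoff))


def follower_best_response(weights, follower_payoffs, bonuses=None):
    """Index of the follower action with the highest linear utility.

    `bonuses` (hypothetical) holds the per-objective transfer the leader
    offers for each follower action. Ties go to the lowest index.
    """
    if bonuses is None:
        bonuses = [[0.0] * len(p) for p in follower_payoffs]
    utilities = [
        linear_utility(weights, [p + b for p, b in zip(payoff, bonus)])
        for payoff, bonus in zip(follower_payoffs, bonuses)
    ]
    return max(range(len(utilities)), key=utilities.__getitem__)


# Hypothetical example: two follower actions, two objectives.
weights = [0.7, 0.3]                         # follower's preference weights
follower_payoffs = [[1.0, 0.0], [0.0, 2.0]]  # follower payoff vector per action
leader_payoffs = [[0.0, 0.0], [3.0, 3.0]]    # leader payoff vector per follower action

# Without any incentive the follower compares 0.7 against 0.6.
baseline = follower_best_response(weights, follower_payoffs)

# The leader offers 20% of their own payoff if the follower picks action 1.
share = 0.2
bonuses = [[0.0, 0.0], [share * x for x in leader_payoffs[1]]]
manipulated = follower_best_response(weights, follower_payoffs, bonuses)

print(baseline, manipulated)
```

In this toy instance the transfer moves the follower from the action that gives the leader nothing to the one that gives the leader a positive payoff, even after the leader gives up a fifth of it. Choosing how much to offer without knowing `weights` is the elicitation problem the abstract refers to.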
Related papers
- More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration [103.1589018460702]
"guidance-on-demand" approach expands exploration while preserving the value of self-discovery. Experiments show AMPO substantially outperforms a strong baseline. Using four peer-sized teachers, our method achieves comparable results to approaches that leverage a single, more powerful teacher.
arXiv Detail & Related papers (2025-10-02T17:14:00Z) - MultiScale Contextual Bandits for Long Term Objectives [36.85989221657821]
We introduce the framework of MultiScale Policy Learning to contextually reconcile AI systems' need to act and optimize feedback at multiple timescales. We show how the lower timescales with more plentiful data can provide a data-dependent hierarchical prior for faster learning at higher scales.
arXiv Detail & Related papers (2025-03-22T07:03:45Z) - Future-Conditioned Recommendations with Multi-Objective Controllable Decision Transformer [12.252515483035737]
Current recommendation strategies grapple with two significant hurdles. We introduce a future-conditioned strategy for multi-objective controllable recommendations. We present the Multi-Objective Controllable Decision Transformer (MocDT), an offline Reinforcement Learning (RL) model capable of autonomously learning the mapping from multiple objectives to item sequences.
arXiv Detail & Related papers (2025-01-13T11:12:43Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Deep Pareto Reinforcement Learning for Multi-Objective Recommender Systems [60.91599969408029]
Optimizing multiple objectives simultaneously is an important task for recommendation platforms.
Existing multi-objective recommender systems do not systematically consider such dynamic relationships.
arXiv Detail & Related papers (2024-07-04T02:19:49Z) - Contrastive Learning Method for Sequential Recommendation based on Multi-Intention Disentanglement [5.734747179463411]
We propose a Contrastive Learning sequential recommendation method based on Multi-Intention Disentanglement (MIDCL).
In our work, intentions are recognized as dynamic and diverse, and user behaviors are often driven by current multi-intentions.
We propose two types of contrastive learning paradigms for finding the most relevant user's interactive intention, and maximizing the mutual information of positive sample pairs.
arXiv Detail & Related papers (2024-04-28T15:13:36Z) - Improving Generalization of Alignment with Human Preferences through
Group Invariant Learning [56.19242260613749]
Reinforcement Learning from Human Feedback (RLHF) enables the generation of responses more aligned with human preferences.
Previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples.
We propose a novel approach that can learn a consistent policy via RL across various data groups or domains.
arXiv Detail & Related papers (2023-10-18T13:54:15Z) - Actions Speak What You Want: Provably Sample-Efficient Reinforcement
Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem.
arXiv Detail & Related papers (2023-07-26T10:24:17Z) - Stateful Strategic Regression [20.7177095411398]
We describe the Stackelberg equilibrium of the resulting game and provide novel algorithms for computing it.
Our analysis reveals several intriguing insights about the role of multiple interactions in shaping the game's outcome.
Most importantly, we show that with multiple rounds of interaction at her disposal, the principal is more effective at incentivizing the agent to accumulate effort in her desired direction.
arXiv Detail & Related papers (2021-06-07T17:46:29Z) - Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles [73.15950858151594]
This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards.
We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling.
We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives.
arXiv Detail & Related papers (2020-10-27T22:06:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.