Related papers: Blind Inverse Game Theory: Jointly Decoding Rewards and Rationality in Entropy-Regularized Competitive Games

URL: http://arxiv.org/abs/2511.05640v1
Date: Fri, 07 Nov 2025 16:27:59 GMT
Title: Blind Inverse Game Theory: Jointly Decoding Rewards and Rationality in Entropy-Regularized Competitive Games
Authors: Hamza Virk, Sandro Amaglobeli, Zuhayr Syed
Abstract summary: We introduce Blind-IGT, the first statistical framework to jointly recover $\theta$ and $\tau$ from observed behavior. We prove that its Normalized Least Squares (NLS) estimator achieves the optimal $\mathcal{O}(N^{-1/2})$ convergence rate for joint parameter recovery. We extend the framework to Markov games and demonstrate optimal convergence rates with strong empirical performance.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Inverse Game Theory (IGT) methods based on the entropy-regularized Quantal Response Equilibrium (QRE) offer a tractable approach for competitive settings, but critically assume the agents' rationality parameter (temperature $\tau$) is known a priori. When $\tau$ is unknown, a fundamental scale ambiguity emerges that couples $\tau$ with the reward parameters ($\theta$), making them statistically unidentifiable. We introduce Blind-IGT, the first statistical framework to jointly recover both $\theta$ and $\tau$ from observed behavior. We analyze this bilinear inverse problem and establish necessary and sufficient conditions for unique identification by introducing a normalization constraint that resolves the scale ambiguity. We propose an efficient Normalized Least Squares (NLS) estimator and prove it achieves the optimal $\mathcal{O}(N^{-1/2})$ convergence rate for joint parameter recovery. When strong identifiability conditions fail, we provide partial identification guarantees through confidence set construction. We extend our framework to Markov games and demonstrate optimal convergence rates with strong empirical performance even when transition dynamics are unknown.
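To make the scale ambiguity and the role of the normalization concrete, here is a minimal sketch under stated assumptions: a two-player zero-sum matrix game whose payoff is linear in the reward parameters, $A(\theta) = \sum_k \theta_k A_k$ (the feature matrices `basis` are hypothetical), with logit QRE $x \propto \exp(A(\theta) y / \tau)$ and $y \propto \exp(-A(\theta)^\top x / \tau)$. Behavior depends on $(\theta, \tau)$ only through $\beta = \theta / \tau$, so $(c\theta, c\tau)$ yields the same equilibrium for any $c > 0$; imposing $\|\theta\|_2 = 1$ removes this freedom. The estimator below is a plain least-squares fit on equilibrium log-odds followed by normalization. It illustrates the idea and is not the paper's NLS estimator, and it uses exact equilibrium probabilities rather than empirical frequencies from $N$ samples.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def payoff(theta, basis):
    # Hypothetical linear reward model: A(theta) = sum_k theta[k] * basis[k]
    return np.tensordot(theta, basis, axes=1)

def solve_qre(A, tau, iters=500):
    """Logit QRE of the zero-sum game x^T A y by fixed-point iteration.
    The composed response map is a contraction when ||A||_2 < 2 * tau."""
    y = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(iters):
        x = softmax(A @ y / tau)         # row player maximizes x^T A y
        y = softmax(-A.T @ x / tau)      # column player minimizes it
    return softmax(A @ y / tau), y

def estimate(x, y, basis):
    """Least squares on log-odds for beta = theta / tau, then impose ||theta|| = 1."""
    Ay = np.stack([Ak @ y for Ak in basis], axis=1)     # (n, K): columns A_k y
    Atx = np.stack([Ak.T @ x for Ak in basis], axis=1)  # (m, K): columns A_k^T x
    M = np.vstack([Ay[1:] - Ay[0], -(Atx[1:] - Atx[0])])
    b = np.concatenate([np.log(x[1:] / x[0]), np.log(y[1:] / y[0])])
    beta, *_ = np.linalg.lstsq(M, b, rcond=None)
    tau_hat = 1.0 / np.linalg.norm(beta)   # the normalization fixes the scale
    return beta * tau_hat, tau_hat

rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 4, 3))       # K = 2 reward features, 4 x 3 game
theta = np.array([0.6, 0.8])             # unit-norm ground truth
A = payoff(theta, basis)
tau = np.linalg.norm(A, 2)               # keeps the fixed-point map contractive
x, y = solve_qre(A, tau)
x2, y2 = solve_qre(payoff(2 * theta, basis), 2 * tau)  # rescaled pair, same behavior
theta_hat, tau_hat = estimate(x, y, basis)
```

In this sketch the equilibrium conditions become linear in $\beta$ once the observed strategies are plugged in, so $\beta$ alone is what the data pin down; the normalization step is what splits it into separate estimates of $\theta$ and $\tau$. With sampled action frequencies the same fit would be noisy.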

Related papers

Regularized Online RLHF with Generalized Bilinear Preferences
We consider the problem of contextual online RLHF with general preferences. We adopt the Generalized Bilinear Preference Model to capture preferences via low-rank, skew-symmetric matrices. We prove that the dual gap of the greedy policy is bounded by the square of the estimation error.
arXiv (2026-02-26T15:27:53Z)
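One way to picture a bilinear preference model in code, assuming a feature vector $\phi(a)$ per response and a logistic link: $P(a \succ b) = \sigma(\phi(a)^\top R\, \phi(b))$ with $R$ skew-symmetric and low rank. Skew-symmetry makes the model self-consistent, since $\phi(b)^\top R\, \phi(a) = -\phi(a)^\top R\, \phi(b)$ implies $P(a \succ b) + P(b \succ a) = 1$ and every response ties with itself. The helper names, the logistic link, and the rank-$2r$ construction $R = UV^\top - VU^\top$ are hypothetical choices for illustration.

```python
import numpy as np

def skew_low_rank(d, r, rng):
    """Random skew-symmetric matrix R = U V^T - V U^T of rank at most 2r."""
    U, V = rng.normal(size=(d, r)), rng.normal(size=(d, r))
    return U @ V.T - V @ U.T

def preference_prob(phi_a, phi_b, R):
    """Assumed form: P(a preferred to b) = sigmoid(phi_a^T R phi_b)."""
    z = phi_a @ R @ phi_b
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
R = skew_low_rank(d=6, r=1, rng=rng)
phi_a, phi_b = rng.normal(size=6), rng.normal(size=6)
p_ab = preference_prob(phi_a, phi_b, R)
p_ba = preference_prob(phi_b, phi_a, R)
```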
Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback
We provide the first convergence guarantee for Optimistic Multiplicative Weights Update ($\mathtt{OMWU}$) in Nash learning from human feedback (NLHF). Our analysis identifies a novel marginal convergence behavior, where the probability of rarely played actions grows exponentially from exponentially small values.
arXiv (2025-12-31T12:08:29Z)
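A minimal sketch of OMWU self-play, assuming a finite action set and a preference matrix $P$ with $P[a, b] = \Pr(a \succ b)$, so that the induced game is symmetric and zero-sum. It uses the common optimistic update $x_{t+1} \propto x_t \exp(\eta(2 g_t - g_{t-1}))$ with $g_t = P x_t$ and no regularization. The step size, iteration count, and rock-paper-scissors preferences are illustrative choices, not taken from the paper.

```python
import numpy as np

def omwu_selfplay(P, x0, eta=0.2, iters=20000):
    """Optimistic multiplicative weights in self-play on preference matrix P."""
    x = np.asarray(x0, dtype=float)
    g_prev = P @ x                       # first step reduces to plain MWU
    for _ in range(iters):
        g = P @ x                        # win-rate of each action against x
        logits = np.log(x) + eta * (2 * g - g_prev)   # optimistic correction
        x = np.exp(logits - logits.max())
        x /= x.sum()
        g_prev = g
    return x

# Rock-paper-scissors preferences: the unique Nash equilibrium is uniform.
M = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
P = 0.5 + 0.5 * M                        # P[a, b] + P[b, a] = 1
x = omwu_selfplay(P, [0.5, 0.3, 0.2])
```

Dropping the optimistic term (using `g` instead of `2 * g - g_prev`) recovers ordinary MWU, whose last iterate is known to cycle away from an interior equilibrium in such games rather than converge.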
The Silent Scholar Problem: A Probabilistic Framework for Breaking Epistemic Asymmetry in LLM Agents
We propose a formal probabilistic framework that provides agents with a non-altruistic motive for bidirectional knowledge exchange. We show how these accumulated belief states serve as verifiable reward signals for Reinforcement Learning from Human Feedback (RLHF) and as high-quality data filters for Supervised Fine-Tuning (SFT). Simulation results validate that this uncertainty-driven strategy significantly outperforms random baselines in heterogeneous environments.
arXiv (2025-12-24T02:02:25Z)
EVaR-Optimal Arm Identification in Bandits
We study the fixed-confidence best-arm identification problem within the multi-armed bandit (MAB) framework under the Entropic Value-at-Risk (EVaR) criterion.
arXiv (2025-10-06T11:49:56Z)
Convergence Rate Analysis of LION
LION converges at a rate of $\mathcal{O}(\sqrt{d}K^{-\dots})$ over $K$ iterations, measured by the gradient Karush-Kuhn-Tucker (KKT) stationarity condition. We show that LION can achieve lower loss and higher performance than standard SGD.
arXiv (2024-11-12T11:30:53Z)
Implicit Bias and Fast Convergence Rates for Self-attention
We study the fundamental optimization principles of self-attention, the defining mechanism of transformers. We analyze the implicit bias of gradient-based optimizers in a self-attention layer with a linear decoder in a classification setting.
arXiv (2024-02-08T15:15:09Z)
Little Exploration is All You Need
We introduce a novel modification of the standard UCB algorithm for the multi-armed bandit problem. We propose an adjusted bonus term of $1/n^{\tau}$, where $\tau > 1/2$, that accounts for task difficulty. Our proposed algorithm, denoted UCB$\tau$, is substantiated through comprehensive regret and risk analyses.
arXiv (2023-10-26T16:28:29Z)
Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions. In this paper, we design a fully adaptive OGD algorithm, $\textsf{AdaOGD}$, that does not require a priori knowledge of the strong convexity or monotonicity parameters.
arXiv (2023-10-21T18:38:13Z)
Federated Learning in the Presence of Adversarial Client Unavailability
Federated learning is a decentralized machine learning framework that enables collaborative model training without revealing raw data. Due to diverse hardware and software limitations, a client may not always be available for the computation requests from the server. In harsh environments such as battlefields, adversaries can selectively silence specific clients.
arXiv (2023-05-31T15:57:07Z)
A Robustness Analysis of Blind Source Separation
Blind source separation (BSS) aims to recover an unobserved signal $S$ from its mixture $X=f(S)$ under the condition that the transformation $f$ is invertible but unknown. We present a general framework for analysing violations of the structural assumptions that define BSS and for quantifying their impact on the blind recovery of $S$ from $X$. We show that a generic BSS solution, in response to general deviations from its defining structural assumptions, can be profitably analysed in the form of explicit continuity guarantees.
arXiv (2023-03-17T16:30:51Z)
Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees for stochastic gradient descent-ascent (SGDA) and stochastic consensus optimization (SCO). We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step size. Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv (2021-06-30T18:32:46Z)
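As a hedged illustration of constant step-size SGDA (not the paper's setting, and omitting the SCO variant), the sketch runs simultaneous SGDA with additive Gaussian gradient noise on an assumed quadratic game $f(x, y) = \frac{\mu}{2}\|x\|^2 + x^\top B y - \frac{\mu}{2}\|y\|^2$. Its gradient operator is strongly monotone and Lipschitz, hence co-coercive, and its unique saddle point is the origin. The distance to the saddle shrinks geometrically at first and then plateaus at a noise floor, which is the "neighborhood of the solution" behavior described above. The matrix `B`, noise level, and step size are illustrative assumptions.

```python
import numpy as np

def sgda(B, mu=1.0, gamma=0.1, noise=0.1, iters=500, seed=0):
    """Simultaneous SGDA with constant step size gamma on
    f(x, y) = mu/2 |x|^2 + x^T B y - mu/2 |y|^2 (saddle point at the origin)."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x, y = np.full(n, 10.0), np.full(m, 10.0)
    dist = []
    for _ in range(iters):
        gx = mu * x + B @ y + noise * rng.normal(size=n)    # noisy grad_x f
        gy = B.T @ x - mu * y + noise * rng.normal(size=m)  # noisy grad_y f
        x, y = x - gamma * gx, y + gamma * gy               # descend in x, ascend in y
        dist.append(np.hypot(np.linalg.norm(x), np.linalg.norm(y)))
    return np.array(dist)

dist = sgda(np.array([[1.0, 2.0], [0.0, 1.0]]))
```

Shrinking `gamma` lowers the noise floor but slows the initial geometric phase, which is the usual trade-off behind constant step-size guarantees of this kind.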

This list is automatically generated from the titles and abstracts of the papers in this site.