Incentive-Aware Dynamic Resource Allocation under Long-Term Cost Constraints
- URL: http://arxiv.org/abs/2507.09473v1
- Date: Sun, 13 Jul 2025 03:18:02 GMT
- Title: Incentive-Aware Dynamic Resource Allocation under Long-Term Cost Constraints
- Authors: Yan Dai, Negin Golrezaei, Patrick Jaillet
- Abstract summary: We study the dynamic allocation of a reusable resource to strategic agents with private valuations. We develop an incentive-aware framework that makes primal-dual methods robust to strategic behavior.
- Score: 24.842944692980495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by applications such as cloud platforms allocating GPUs to users or governments deploying mobile health units across competing regions, we study the dynamic allocation of a reusable resource to strategic agents with private valuations. Our objective is to simultaneously (i) maximize social welfare, (ii) satisfy multi-dimensional long-term cost constraints, and (iii) incentivize truthful reporting. We begin by numerically evaluating primal-dual methods widely used in constrained online optimization and find them to be highly fragile in strategic settings -- agents can easily manipulate their reports to distort future dual updates for future gain. To address this vulnerability, we develop an incentive-aware framework that makes primal-dual methods robust to strategic behavior. Our design combines epoch-based lazy updates -- where dual variables remain fixed within each epoch -- with randomized exploration rounds that extract approximately truthful signals for learning. Leveraging carefully designed online learning subroutines that can be of independent interest for dual updates, our mechanism achieves $\tilde{\mathcal{O}}(\sqrt{T})$ social welfare regret, satisfies all cost constraints, and ensures incentive alignment. This matches the performance of non-strategic allocation approaches while being robust to strategic agents.
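The abstract's mechanism combines two ingredients: dual variables that stay fixed within each epoch (lazy updates) and randomized exploration rounds that elicit approximately truthful signals. The paper's actual mechanism is more involved; the following is only a minimal single-resource sketch of that epoch-based lazy primal-dual pattern, with all names (`lazy_primal_dual`, `explore_prob`, `epoch_len`) and the specific update rule being illustrative assumptions, not the authors' algorithm.

```python
import random

def lazy_primal_dual(bids, cost, budget_per_round, epoch_len,
                     eta=0.05, explore_prob=0.1):
    """Toy sketch of epoch-based lazy dual updates for allocating one
    reusable resource. `bids[t]` is the reported valuation at round t,
    `cost` the per-allocation resource cost, `budget_per_round` the
    long-term average budget. Illustrative only, not the paper's mechanism."""
    lam = 0.0                 # dual variable (shadow price); frozen within an epoch
    spent, welfare = 0.0, 0.0
    epoch_costs = []          # costs observed during the current epoch
    for t, bid in enumerate(bids):
        explore = random.random() < explore_prob
        # Allocate when the report beats the current shadow price, or
        # unconditionally on exploration rounds to extract learning signals.
        if explore or bid >= lam * cost:
            spent += cost
            welfare += bid
            epoch_costs.append(cost)
        else:
            epoch_costs.append(0.0)
        # Lazy update: the dual moves only at epoch boundaries, driven by
        # the epoch's average overspend relative to the per-round budget.
        if (t + 1) % epoch_len == 0:
            avg_cost = sum(epoch_costs) / epoch_len
            lam = max(0.0, lam + eta * (avg_cost - budget_per_round))
            epoch_costs = []
    return welfare, spent, lam
```

Freezing the dual within an epoch is what blunts the manipulation described above: a single strategic report cannot immediately distort the prices applied to later rounds in the same epoch.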
Related papers
- Learning to Lead: Incentivizing Strategic Agents in the Dark [50.93875404941184]
We study an online learning version of the generalized principal-agent model. We develop the first provably sample-efficient algorithm for this challenging setting. We establish a near-optimal $\tilde{O}(\sqrt{T})$ regret bound for learning the principal's optimal policy.
arXiv Detail & Related papers (2025-06-10T04:25:04Z) - UDuo: Universal Dual Optimization Framework for Online Matching [9.092568268958425]
We propose a novel paradigm that fundamentally rethinks online allocation through three key innovations: a temporal user arrival representation vector, a resource pacing learner, and an online time-series forecasting approach. Experimental results show that UDuo achieves higher efficiency and faster convergence than the traditional arrival model in real-world pricing.
arXiv Detail & Related papers (2025-05-28T11:25:50Z) - Generative Auto-Bidding with Value-Guided Explorations [47.71346722705783]
This paper introduces a novel offline Generative Auto-bidding framework with Value-Guided Explorations (GAVE). Experimental results on two offline datasets and real-world deployments demonstrate that GAVE outperforms state-of-the-art baselines in both offline evaluations and online A/B tests.
arXiv Detail & Related papers (2025-04-20T12:28:49Z) - DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development. We present Dynamic Action Re-Sampling (DARS), a novel inference-time compute scaling approach for coding agents. We evaluate our approach on the SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z) - $TAR^2$: Temporal-Agent Reward Redistribution for Optimal Policy Preservation in Multi-Agent Reinforcement Learning [7.97295726921338]
Temporal-Agent Reward Redistribution ($TAR^2$) is a novel approach that decomposes sparse global rewards into agent-specific, time-step-specific components. We show that $TAR^2$ aligns with potential-based reward shaping, preserving the same optimal policies as the original environment.
arXiv Detail & Related papers (2025-02-07T12:07:57Z) - Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees [3.6787328174619254]
Two-stage Learning-to-Defer (L2D) enables optimal task delegation by assigning each input to either a fixed main model or one of several offline experts. Existing L2D frameworks assume clean inputs and are vulnerable to adversarial perturbations that can manipulate query allocation. We present the first comprehensive study of adversarial robustness in two-stage L2D systems.
arXiv Detail & Related papers (2025-02-03T03:44:35Z) - Dynamic Matching with Post-allocation Service and its Application to Refugee Resettlement [1.9689888982532262]
Motivated by our collaboration with a major refugee resettlement agency in the U.S., we study a dynamic matching problem where each new arrival (a refugee case) must be matched immediately and irrevocably to one of the static resources. We develop learning-based algorithms that are asymptotically optimal in certain regimes, easy to interpret, and computationally fast.
arXiv Detail & Related papers (2024-10-30T13:17:38Z) - Scalable Online Exploration via Coverability [45.66375686120087]
Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation.
We introduce a new objective, $L_1$-Coverage, which generalizes previous exploration schemes and supports three fundamental desiderata.
$L_1$-Coverage enables the first computationally efficient model-based and model-free algorithms for online (reward-free or reward-driven) reinforcement learning in MDPs with low coverability.
arXiv Detail & Related papers (2024-03-11T10:14:06Z) - Optimizing Credit Limit Adjustments Under Adversarial Goals Using Reinforcement Learning [42.303733194571905]
We seek to find and automatize an optimal credit card limit adjustment policy by employing reinforcement learning techniques.
Our research establishes a conceptual structure for applying reinforcement learning framework to credit limit adjustment.
arXiv Detail & Related papers (2023-06-27T16:10:36Z) - Online Learning with Knapsacks: the Best of Both Worlds [54.28273783164608]
We study online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints.
Our framework allows the decision maker to flexibly handle arbitrary reward and cost functions.
arXiv Detail & Related papers (2022-02-28T12:10:48Z) - Coordinated Online Learning for Multi-Agent Systems with Coupled Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state, fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian.
arXiv Detail & Related papers (2020-10-21T10:11:17Z) - Cost-Sensitive Portfolio Selection via Deep Reinforcement Learning [100.73223416589596]
We propose a cost-sensitive portfolio selection method with deep reinforcement learning.
Specifically, a novel two-stream portfolio policy network is devised to extract both price series patterns and asset correlations.
A new cost-sensitive reward function is developed to maximize the accumulated return and constrain both costs via reinforcement learning.
arXiv Detail & Related papers (2020-03-06T06:28:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.