Constraint-Aware Generative Auto-bidding via Pareto-Prioritized Regret Optimization
- URL: http://arxiv.org/abs/2602.08261v1
- Date: Mon, 09 Feb 2026 04:41:30 GMT
- Title: Constraint-Aware Generative Auto-bidding via Pareto-Prioritized Regret Optimization
- Authors: Binglin Wu, Yingyi Zhang, Xianneng Li, Ruyue Deng, Chuan Yue, Weiru Zhang, Xiaoyi Zeng
- Abstract summary: PRO-Bid is a constraint-aware generative auto-bidding framework based on two synergistic mechanisms. It achieves superior constraint satisfaction and value acquisition compared to state-of-the-art baselines.
- Score: 8.514099612407062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Auto-bidding systems aim to maximize marketing value while satisfying strict efficiency constraints such as Target Cost-Per-Action (CPA). Although Decision Transformers provide powerful sequence modeling capabilities, applying them to this constrained setting encounters two challenges: 1) standard Return-to-Go conditioning causes state aliasing by neglecting the cost dimension, preventing precise resource pacing; and 2) standard regression forces the policy to mimic average historical behaviors, thereby limiting the capacity to optimize performance toward the constraint boundary. To address these challenges, we propose PRO-Bid, a constraint-aware generative auto-bidding framework based on two synergistic mechanisms: 1) Constraint-Decoupled Pareto Representation (CDPR) decomposes global constraints into recursive cost and value contexts to restore resource perception, while reweighting trajectories based on the Pareto frontier to focus on high-efficiency data; and 2) Counterfactual Regret Optimization (CRO) facilitates active improvement by utilizing a global outcome predictor to identify superior counterfactual actions. By treating these high-utility outcomes as weighted regression targets, the model transcends historical averages to approach the optimal constraint boundary. Extensive experiments on two public benchmarks and online A/B tests demonstrate that PRO-Bid achieves superior constraint satisfaction and value acquisition compared to state-of-the-art baselines.
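The two mechanisms described in the abstract can be sketched in a few lines. The following is an illustrative reconstruction only, not the paper's implementation: all function names, the dominance rule, and the weighting hyper-parameters are assumptions. It shows decoupled value-to-go/cost-to-go conditioning (CDPR's replacement for a single return-to-go), Pareto-frontier trajectory reweighting, and regret-based regression weights in the spirit of CRO.

```python
import numpy as np

def decoupled_contexts(rewards, costs):
    """CDPR-style conditioning (illustrative): replace a single
    return-to-go with separate value-to-go and cost-to-go sequences,
    so the model perceives the remaining budget at every step."""
    rewards = np.asarray(rewards, dtype=float)
    costs = np.asarray(costs, dtype=float)
    value_to_go = np.cumsum(rewards[::-1])[::-1]
    cost_to_go = np.cumsum(costs[::-1])[::-1]
    return value_to_go, cost_to_go

def pareto_weights(values, costs, w_front=2.0, w_other=1.0):
    """Upweight trajectories on the value/cost Pareto frontier: a
    trajectory gets the frontier weight unless some other trajectory
    has strictly higher value AND strictly lower cost."""
    values = np.asarray(values, dtype=float)
    costs = np.asarray(costs, dtype=float)
    weights = np.full(len(values), w_other)
    for i in range(len(values)):
        dominated = np.any((values > values[i]) & (costs < costs[i]))
        if not dominated:
            weights[i] = w_front
    return weights

def cro_weights(utility_taken, utility_counterfactual, beta=1.0):
    """CRO-style regression weights (illustrative): counterfactual
    actions whose predicted utility exceeds the logged action's get
    exponentially larger weight, pulling the policy beyond the
    historical average toward the constraint boundary."""
    regret = np.maximum(0.0,
                        np.asarray(utility_counterfactual, dtype=float)
                        - np.asarray(utility_taken, dtype=float))
    return np.exp(beta * regret)
```

In this sketch the three pieces compose naturally: trajectories are reweighted by `pareto_weights`, each timestep is conditioned on the `decoupled_contexts` pair, and the regression loss is scaled by `cro_weights` from a utility predictor.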
Related papers
- Constrained Group Relative Policy Optimization [18.3888203751956]
We introduce Constrained GRPO, a Lagrangian-based extension of GRPO for constrained policy optimization. We show that a naive multi-component treatment in advantage estimation can break constrained learning. We also evaluate Constrained GRPO on robotics tasks, where it improves constraint satisfaction while increasing task success.
arXiv Detail & Related papers (2026-02-05T16:44:23Z)
- GAS: Enhancing Reward-Cost Balance of Generative Model-assisted Offline Safe RL [21.30558932544297]
Offline Safe Reinforcement Learning (OSRL) aims to learn a policy to achieve high performance in decision-making while satisfying constraints. Recent works, inspired by the strong capabilities of Generative Models (GMs), reformulate decision-making in OSRL as a conditional generative process. We propose Goal-Assisted Stitching (GAS), a novel algorithm designed to enhance stitching capabilities while effectively balancing reward and constraint satisfaction.
arXiv Detail & Related papers (2026-02-05T05:44:48Z)
- Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning [2.0719232729184145]
Offline Reinforcement Learning (RL) relies on policy constraints to shape performance. Most existing methods commit to a single constraint family. We propose Continuous Constraint Interpolation (CCI), a unified optimization framework.
arXiv Detail & Related papers (2026-01-30T14:21:41Z)
- C2: Cross learning module enhanced decision transformer with Constraint-aware loss for auto-bidding [9.446373834962895]
Decision Transformer (DT) shows promise for generative auto-bidding by capturing temporal dependencies. DT suffers from insufficient cross-correlation modeling among state, action, and return-to-go sequences. We propose C2, a novel framework enhancing DT with two core innovations.
arXiv Detail & Related papers (2026-01-28T05:08:02Z)
- MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs). We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z)
- TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making [75.29820290660065]
This paper proposes Thought-Centric Preference Optimization (TCPO) for effective embodied decision-making. It emphasizes the alignment of the model's intermediate reasoning process, mitigating the problem of model degradation. Experiments in the ALFWorld environment demonstrate an average success rate of 26.67%, achieving a 6% improvement over RL4VLM.
arXiv Detail & Related papers (2025-09-10T11:16:21Z)
- Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality [53.525547349715595]
We propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO). RRPO operates directly on the primal problem without relying on dual formulations. We show convergence to an approximately optimal feasible policy with complexity matching the best-known lower bound.
arXiv Detail & Related papers (2025-08-24T16:59:38Z)
- A Deep Generative Learning Approach for Two-stage Adaptive Robust Optimization [3.124884279860061]
We introduce AGRO, a solution algorithm that performs adversarial generation for two-stage adaptive robust optimization. AGRO generates high-dimensional contingencies that are simultaneously adversarial and realistic. We show that AGRO outperforms the standard column-and-constraint algorithm by up to 1.8% in production-distribution planning and up to 11.6% in power system expansion.
arXiv Detail & Related papers (2024-09-05T17:42:19Z)
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Algorithm for Constrained Markov Decision Process with Linear Convergence [55.41644538483948]
An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed with the integration of two ingredients: entropy regularized policy and Vaidya's dual.
The proposed approach is shown to converge (with linear rate) to the global optimum.
arXiv Detail & Related papers (2022-06-03T16:26:38Z)
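Several of the related papers above (Constrained GRPO, Double Duality, the entropy-regularized CMDP dual approach) rely on Lagrangian relaxation to handle constraints: "maximize reward subject to cost ≤ budget" becomes the unconstrained objective reward − λ·(cost − budget), with the multiplier λ raised whenever the constraint is violated. A minimal sketch on a toy scalar problem follows; the function name, learning rates, and finite-difference primal step are my own assumptions, purely illustrative of the general technique rather than any specific paper's algorithm.

```python
def lagrangian_dual_ascent(reward_fn, cost_fn, budget, theta0,
                           lr_theta=0.1, lr_lam=0.1, steps=500):
    """Alternate primal ascent on the Lagrangian with dual ascent on
    the multiplier, for a scalar decision variable theta."""
    theta, lam = float(theta0), 0.0
    eps = 1e-4  # finite-difference step for the toy primal gradient
    for _ in range(steps):
        # Primal step: ascend L(theta) = reward - lam * (cost - budget).
        def lagrangian(t):
            return reward_fn(t) - lam * (cost_fn(t) - budget)
        grad = (lagrangian(theta + eps) - lagrangian(theta - eps)) / (2 * eps)
        theta += lr_theta * grad
        # Dual step: raise lambda when the constraint is violated,
        # projected back onto lambda >= 0.
        lam = max(0.0, lam + lr_lam * (cost_fn(theta) - budget))
    return theta, lam
```

On the toy problem "maximize theta subject to theta² ≤ 1", the iterates spiral into the saddle point theta ≈ 1, λ ≈ 0.5, matching the KKT conditions.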
This list is automatically generated from the titles and abstracts of the papers in this site.