C2:Cross learning module enhanced decision transformer with Constraint-aware loss for auto-bidding
- URL: http://arxiv.org/abs/2601.20257v2
- Date: Thu, 29 Jan 2026 10:08:33 GMT
- Title: C2:Cross learning module enhanced decision transformer with Constraint-aware loss for auto-bidding
- Authors: Jinren Ding, Xuejian Xu, Shen Jiang, Zhitong Hao, Jinhui Yang, Peng Jiang
- Abstract summary: Decision Transformer (DT) shows promise for generative auto-bidding by capturing temporal dependencies. DT suffers from insufficient cross-correlation modeling among state, action, and return-to-go sequences. We propose C2, a novel framework enhancing DT with two core innovations.
- Score: 9.446373834962895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision Transformer (DT) shows promise for generative auto-bidding by capturing temporal dependencies, but suffers from two critical limitations: insufficient cross-correlation modeling among state, action, and return-to-go (RTG) sequences, and indiscriminate learning of optimal/suboptimal behaviors. To address these, we propose C2, a novel framework enhancing DT with two core innovations: (1) a Cross Learning Block (CLB) via cross-attention to strengthen inter-sequence correlation modeling; (2) a Constraint-aware Loss (CL) incorporating budget and Cost-Per-Acquisition (CPA) constraints for selective learning of optimal trajectories. Extensive offline evaluations on the AuctionNet dataset demonstrate consistent performance gains (up to 3.2% over the state-of-the-art method) across diverse budget settings; ablation studies verify the complementary synergy of CLB and CL, confirming C2's superiority in auto-bidding. The code for reproducing our results is available at: https://github.com/Dingjinren/C2.
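The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general shape of single-head cross-attention between two token sequences and of a loss that down-weights trajectories violating budget or CPA constraints. The function names, the power-law penalty, and the `beta` parameter are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries come from one sequence (e.g. action tokens); keys and values
    come from another (e.g. state or RTG tokens). Illustrative only: no
    learned projections, masking, or multiple heads.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def constraint_weight(cost, conversions, budget, cpa_limit, beta=2.0):
    """Hypothetical trajectory weight in [0, 1].

    Assumption: trajectories over budget get weight 0, trajectories within
    the CPA limit get weight 1, and CPA violations decay as
    (cpa_limit / cpa) ** beta. The paper's actual loss may differ.
    """
    if cost > budget:
        return 0.0
    if conversions == 0:
        return 1.0 if cost == 0 else 0.0
    cpa = cost / conversions
    if cpa <= cpa_limit:
        return 1.0
    return (cpa_limit / cpa) ** beta


def constraint_aware_loss(pred_actions, target_actions, weights):
    """Weighted mean of per-trajectory MSE, so that constraint-violating
    trajectories contribute less to the imitation objective."""
    num = 0.0
    den = 0.0
    for preds, targets, w in zip(pred_actions, target_actions, weights):
        mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
        num += w * mse
        den += w
    return num / den if den > 0 else 0.0
```

For example, under these assumptions a trajectory that spends 100 for 5 conversions against a CPA limit of 10 has a realized CPA of 20 and receives weight `(10 / 20) ** 2`, so it influences training a quarter as much as a trajectory that satisfies the constraint.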
Related papers
- Constraint-Aware Generative Auto-bidding via Pareto-Prioritized Regret Optimization [8.514099612407062]
PRO-Bid is a constraint-aware generative auto-bidding framework based on two synergistic mechanisms. It achieves superior constraint satisfaction and value acquisition compared to state-of-the-art baselines.
arXiv Detail & Related papers (2026-02-09T04:41:30Z)
- QASA: Quality-Guided K-Adaptive Slot Attention for Unsupervised Object-Centric Learning [80.82392186401354]
Slot Attention is an approach that binds different objects in a scene to a set of "slots". Previous K-adaptive methods do not explicitly constrain slot-binding quality. We propose Quality-Guided K-Adaptive Slot Attention (QASA).
arXiv Detail & Related papers (2026-01-19T10:42:07Z)
- Randomized Neural Network with Adaptive Forward Regularization for Online Task-free Class Incremental Learning [16.323995111105884]
We propose a neural network (NN) with forward regularization (-F) to resist forgetting and enhance learning performance. We derive the algorithm of the ensemble deep random vector functional link network (edRVFL) with adjustable forward regularization (-kF). edRVFL-kF generates one-pass closed-form incremental updates and variable learning rates, effectively avoiding past replay and catastrophic forgetting. We improve it to the plug-and-play edRVFL-kF-Bayes, enabling all hard ks in multiple sub-learners to be self-adaptively determined.
arXiv Detail & Related papers (2025-10-24T11:50:13Z)
- Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech [51.14752758616364]
Speech-based depression detection (SDD) is a promising, non-invasive alternative to traditional clinical assessments. We propose HAREN-CTC, a novel architecture that integrates multi-layer SSL features using cross-attention within a multitask learning framework. The model achieves state-of-the-art macro F1-scores of 0.81 on DAIC-WOZ and 0.82 on MODMA, outperforming prior methods across both evaluation scenarios.
arXiv Detail & Related papers (2025-10-05T09:32:12Z)
- Boundary-to-Region Supervision for Offline Safe Reinforcement Learning [56.150983204962735]
Boundary-to-Region (B2R) is a framework that enables asymmetric conditioning through cost signal realignment. B2R redefines CTG as a boundary constraint under a fixed safety budget, unifying the cost distribution of all feasible trajectories. Experimental results show that B2R satisfies safety constraints in 35 out of 38 safety-critical tasks.
arXiv Detail & Related papers (2025-09-30T03:38:20Z)
- PT$^2$-LLM: Post-Training Ternarization for Large Language Models [52.4629647715623]
Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. We propose PT$^2$-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline.
arXiv Detail & Related papers (2025-09-27T03:01:48Z)
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce Sycophancy Mitigation through Adaptive Reasoning Trajectories (SMART). We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
arXiv Detail & Related papers (2025-09-20T17:09:14Z)
- Generative Bid Shading in Real-Time Bidding Advertising [7.7746704524695485]
This paper introduces Generative Bid Shading (GBS) as an end-to-end generative model. It incorporates an autoregressive approach to generate ratios by capturing stepwise residual reward models. It has been deployed on the Meit platform serving billions of bid requests daily.
arXiv Detail & Related papers (2025-08-06T03:34:49Z)
- Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning [11.752632557524969]
Causal CLIP Adapter (CCA) is a novel framework that explicitly disentangles visual features extracted from CLIP. Our method consistently outperforms state-of-the-art approaches in terms of few-shot performance and robustness to distributional shifts.
arXiv Detail & Related papers (2025-08-05T05:30:42Z)
- Fast Second-Order Online Kernel Learning through Incremental Matrix Sketching and Decomposition [44.61147231796296]
Online Kernel Learning (OKL) has attracted considerable research interest due to its promising predictive performance in streaming environments. Existing second-order OKL approaches suffer from at least quadratic time complexity with respect to the pre-set budget. We propose FORKS, a fast incremental matrix sketching and decomposition approach tailored for second-order OKL.
arXiv Detail & Related papers (2024-10-15T02:07:48Z)
- DELTA: Dynamic Embedding Learning with Truncated Conscious Attention for CTR Prediction [61.68415731896613]
Click-Through Rate (CTR) prediction is a pivotal task in product and content recommendation.
We propose a model that enables Dynamic Embedding Learning with Truncated Conscious Attention for CTR prediction.
arXiv Detail & Related papers (2023-05-03T12:34:45Z)
- Visual Alignment Constraint for Continuous Sign Language Recognition [74.26707067455837]
Vision-based Continuous Sign Language Recognition aims to recognize unsegmented gestures from image sequences.
In this work, we revisit the overfitting problem in recent CTC-based CSLR works and attribute it to the insufficient training of the feature extractor.
We propose a Visual Alignment Constraint (VAC) to enhance the feature extractor with more alignment supervision.
arXiv Detail & Related papers (2021-04-06T07:24:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.