Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement
- URL: http://arxiv.org/abs/2506.21956v1
- Date: Fri, 27 Jun 2025 06:56:54 GMT
- Title: Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement
- Authors: Hao Jiang, Yongxiang Tang, Yanxiang Zeng, Pengjia Yuan, Yanhua Cheng, Teng Sha, Xialong Liu, Peng Jiang
- Abstract summary: We introduce the R* Decision Transformer (R* DT) to tackle the difficulties inherent in automated bidding. R* DT stores actions based on state and return-to-go (RTG) value, and memorizes the RTG for a given state using a training set. Comprehensive tests on a publicly available bidding dataset validate R* DT's efficacy and highlight its superiority when dealing with mixed-quality trajectories.
- Score: 8.221810937147755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of online advertising, advertisers partake in ad auctions to obtain advertising slots, frequently taking advantage of auto-bidding tools provided by demand-side platforms. To improve the automation of these bidding systems, we adopt generative models, namely the Decision Transformer (DT), to tackle the difficulties inherent in automated bidding. Applying the Decision Transformer to the auto-bidding task enables a unified approach to sequential modeling, which efficiently overcomes short-sightedness by capturing long-term dependencies between past bidding actions and user behavior. Nevertheless, conventional DT has certain drawbacks: (1) DT necessitates a preset return-to-go (RTG) value before generating actions, which is not inherently produced; (2) the policy learned by DT is restricted by its training data, which consists of mixed-quality trajectories. To address these challenges, we introduce the R* Decision Transformer (R* DT), developed in a three-step process: (1) R DT: similar to traditional DT, R DT stores actions based on state and RTG value, and memorizes the RTG for a given state using the training set; (2) R^ DT: we forecast the highest value (within the training set) of RTG for a given state, deriving a suboptimal policy based on the current state and the forecasted supreme RTG value; (3) R* DT: based on R^ DT, we generate trajectories and select those with high rewards (using a simulator) to augment our training dataset. This data enhancement has been shown to improve the RTG of trajectories in the training data and gradually leads the suboptimal policy towards optimality. Comprehensive tests on a publicly available bidding dataset validate R* DT's efficacy and highlight its superiority when dealing with mixed-quality trajectories.
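Read as pseudocode, the abstract's three-step recipe maps onto a short training loop. The sketch below is a minimal illustration only: `MaxRTGMemory`, `ToyDT`, and `ToySimulator` are invented stand-ins for the learned max-RTG predictor, the DT backbone, and the bidding simulator, none of which come from the paper.

```python
# Hedged sketch of the R DT -> R^ DT -> R* DT loop; all names are assumptions.
import numpy as np

class MaxRTGMemory:
    """Stage 1 (R DT): memorize the highest return-to-go (RTG) observed for
    each (discretized) state in the training set."""
    def __init__(self, n_bins=100):
        self.n_bins, self.table = n_bins, {}

    def _key(self, state):
        # Crude discretization; a learned regressor is the realistic choice.
        return tuple(np.floor(np.asarray(state) * self.n_bins).astype(int))

    def update(self, state, rtg):
        k = self._key(state)
        self.table[k] = max(self.table.get(k, -np.inf), rtg)

    def predict_max_rtg(self, state):
        # Stage 2 (R^ DT): forecast the supreme in-dataset RTG for a state.
        return self.table.get(self._key(state), 0.0)

class ToyDT:
    """Stand-in for a Decision Transformer policy a(s, rtg)."""
    def fit(self, dataset):
        pass  # supervised sequence modeling would happen here
    def act(self, state, rtg):
        return float(np.clip(0.1 * rtg, 0.0, 1.0))  # placeholder bid

class ToySimulator:
    """Stand-in for the bidding simulator used to score rollouts."""
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
    def rollout(self, policy, horizon=10):
        states = self.rng.random((horizon, 2))
        actions = np.array([policy(s) for s in states])
        return {"states": states, "rewards": actions * states[:, 0]}

def train_r_star_dt(dataset, policy, sim, n_rounds=3, keep_frac=0.2):
    """Stage 3 (R* DT): roll out the suboptimal R^ DT policy, keep the
    high-reward trajectories, and grow the training set each round."""
    for _ in range(n_rounds):
        memory = MaxRTGMemory()
        for traj in dataset:
            rtgs = np.cumsum(traj["rewards"][::-1])[::-1]  # return-to-go
            for state, rtg in zip(traj["states"], rtgs):
                memory.update(state, rtg)
        policy.fit(dataset)
        # Condition generation on the forecasted supreme RTG per state.
        candidates = [sim.rollout(lambda s: policy.act(s, memory.predict_max_rtg(s)))
                      for _ in range(len(dataset))]
        candidates.sort(key=lambda t: t["rewards"].sum(), reverse=True)
        dataset = dataset + candidates[:max(1, int(keep_frac * len(candidates)))]
    return policy, dataset

sim = ToySimulator()
seed_data = [sim.rollout(lambda s: 0.5) for _ in range(20)]
policy, augmented = train_r_star_dt(seed_data, ToyDT(), sim)
print(len(augmented), "trajectories after augmentation")
```

The design point worth noting is the interplay between the stages: each augmentation round can only raise the memorized per-state maximum RTG, so the target the policy is conditioned on ratchets upward, which is the mechanism behind the claim that the suboptimal policy is gradually led towards optimality.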
Related papers
- Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer [29.029659384955206]
Decision Transformers (DT) play a crucial role in modern reinforcement learning, leveraging offline datasets to achieve impressive results across various domains. We propose the Counterfactual Reasoning Decision Transformer (CRDT), a novel framework inspired by counterfactual reasoning.
arXiv Detail & Related papers (2025-05-14T03:45:16Z)
- DRDT3: Diffusion-Refined Decision Test-Time Training Model [6.907105812732423]
Decision Transformer (DT) has shown competitive performance compared to traditional offline reinforcement learning (RL) approaches. We introduce a unified framework, called Diffusion-Refined Decision TTT (DRDT3), to achieve performance beyond DT models.
arXiv Detail & Related papers (2025-01-12T04:59:49Z)
- Enhancing Decision Transformer with Diffusion-Based Trajectory Branch Generation [29.952637757286073]
Decision Transformer (DT) can learn an effective policy from offline datasets by converting offline reinforcement learning (RL) into a supervised sequence modeling task.
We introduce Diffusion-Based Trajectory Branch Generation (BG), which expands the trajectories of the dataset with branches generated by a diffusion model.
BG outperforms state-of-the-art sequence modeling methods on the D4RL benchmark.
arXiv Detail & Related papers (2024-11-18T06:44:14Z)
- RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer [2.1186155813156926]
RT-DETRv2 builds upon the previous state-of-the-art real-time detector, RT-DETR.
To improve flexibility, we suggest setting a distinct number of sampling points for features at different scales.
To enhance practicality, we propose an optional discrete sampling operator to replace the grid_sample operator; a minimal sketch of the idea follows below.
arXiv Detail & Related papers (2024-07-24T10:20:19Z)
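Since the summary names the swap explicitly, here is one hedged way such a discrete operator could look: normalized coordinates are rounded to the nearest pixel and gathered, avoiding bilinear interpolation. `discrete_sample` and its shapes are illustrative assumptions, not RT-DETRv2's code.

```python
# Assumed sketch: round-to-nearest gather as a drop-in for F.grid_sample.
import torch
import torch.nn.functional as F

def discrete_sample(feat, grid):
    """feat: (N, C, H, W); grid: (N, P, 2) with coords in [-1, 1],
    matching grid_sample's convention. Returns (N, C, P)."""
    n, c, h, w = feat.shape
    # Map normalized coords to pixel indices (align_corners=True convention).
    x = ((grid[..., 0] + 1) * 0.5 * (w - 1)).round().long().clamp(0, w - 1)
    y = ((grid[..., 1] + 1) * 0.5 * (h - 1)).round().long().clamp(0, h - 1)
    idx = y * w + x                           # (N, P) flat spatial index
    flat = feat.flatten(2)                    # (N, C, H*W)
    idx = idx.unsqueeze(1).expand(-1, c, -1)  # (N, C, P)
    return flat.gather(2, idx)

# Shape check against bilinear grid_sample (values differ: rounding vs. bilinear).
feat = torch.randn(1, 8, 16, 16)
grid = torch.rand(1, 32, 2) * 2 - 1
ref = F.grid_sample(feat, grid.unsqueeze(1), align_corners=True).squeeze(2)
out = discrete_sample(feat, grid)
print(out.shape, ref.shape)  # both (1, 8, 32)
```

Replacing grid_sample with a plain gather is presumably about deployability, since some inference backends lack that operator, which would match the "enhance practicality" framing above.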
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning [54.3596066989024]
Decision Transformer (DT) is an innovative algorithm leveraging recent advances in the transformer architecture for reinforcement learning (RL).
We introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL.
We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
arXiv Detail & Related papers (2023-11-01T03:32:13Z)
- Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address these challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information.
Our approach involves randomly sampling from a Gaussian distribution to fine-tune the elements of the prompt trajectory and using a preference ranking function to find the optimization direction; a sketch of this loop follows below.
Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
arXiv Detail & Related papers (2023-05-16T17:49:04Z)
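The summary sketches a zeroth-order-style loop: Gaussian perturbations of the prompt are ranked by a preference function, and the prompt moves toward the preferred ones. Below is a hedged toy rendering of that idea; `preference_score`, the rank weighting, and all hyperparameters are illustrative assumptions, not the paper's algorithm.

```python
# Assumed sketch: ranking-guided Gaussian prompt tuning.
import numpy as np

def tune_prompt(prompt, preference_score, n_candidates=16, sigma=0.1,
                lr=0.5, n_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    prompt = prompt.copy()
    for _ in range(n_steps):
        noise = rng.normal(0.0, sigma, size=(n_candidates,) + prompt.shape)
        candidates = prompt + noise
        scores = np.array([preference_score(c) for c in candidates])
        # Rank-weighted average of perturbations: higher-ranked candidates
        # pull the prompt further in their direction.
        ranks = scores.argsort().argsort()          # 0 = worst candidate
        weights = ranks / ranks.sum()
        prompt += lr * np.tensordot(weights, noise, axes=1)
    return prompt

# Toy usage: the preference function favors prompts near a hidden target.
target = np.ones((5, 3))                            # 5-step, 3-dim prompt
score = lambda p: -np.square(p - target).sum()
tuned = tune_prompt(np.zeros((5, 3)), score)
print(np.round(tuned - target, 2))
```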
- Continual Detection Transformer for Incremental Object Detection [154.8345288298059]
Incremental object detection (IOD) aims to train an object detector in phases, each with annotations for new object categories.
As in other incremental settings, IOD is subject to catastrophic forgetting, which is often addressed by techniques such as knowledge distillation (KD) and exemplar replay (ER); a generic sketch of these two ingredients follows below.
We propose a new method for transformer-based IOD which enables effective usage of KD and ER in this context.
arXiv Detail & Related papers (2023-04-06T14:38:40Z)
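For context, here is a generic rendering of the two named ingredients, not CL-DETR's specific mechanism (which the summary does not detail); a classification loss stands in for the detection losses.

```python
# Assumed sketch: one incremental-phase step combining KD and ER.
import torch
import torch.nn as nn
import torch.nn.functional as F

def incremental_step(model, old_model, new_batch, exemplar_batch,
                     optimizer, temp=2.0, alpha=0.5):
    x_new, y_new = new_batch
    x_rep, y_rep = exemplar_batch                  # exemplar replay (ER)
    logits_new, logits_rep = model(x_new), model(x_rep)
    task_loss = (F.cross_entropy(logits_new, y_new)
                 + F.cross_entropy(logits_rep, y_rep))
    with torch.no_grad():                          # frozen previous-phase model
        teacher = old_model(x_new)
    # Knowledge distillation (KD): match softened old-model predictions.
    kd_loss = F.kl_div(F.log_softmax(logits_new / temp, dim=-1),
                       F.softmax(teacher / temp, dim=-1),
                       reduction="batchmean") * temp ** 2
    loss = task_loss + alpha * kd_loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# Toy usage with linear "detectors".
model, old_model = nn.Linear(10, 5), nn.Linear(10, 5)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batch = (torch.randn(8, 10), torch.randint(0, 5, (8,)))
replay = (torch.randn(8, 10), torch.randint(0, 5, (8,)))
print(incremental_step(model, old_model, batch, replay, opt))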
- Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer [60.31021888394358]
Unsupervised Domain Adaptation (UDA) can effectively address domain gap issues in real-world image Super-Resolution (SR).
We propose a SOurce-free Domain Adaptation framework for image SR (SODA-SR) to address this issue, i.e., adapt a source-trained model to a target domain with only unlabeled target data.
arXiv Detail & Related papers (2023-03-31T03:14:44Z)
- Generalized Decision Transformer for Offline Hindsight Information Matching [16.7594941269479]
We present Generalized Decision Transformer (GDT) for solving any hindsight information matching (HIM) problem.
We show how different choices for the feature function and the anti-causal aggregator lead to novel Categorical DT (CDT) and Bi-directional DT (BDT) for matching different statistics of the future; a toy illustration follows below.
arXiv Detail & Related papers (2021-11-19T18:56:13Z)
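The HIM framing is easy to see in miniature: return-to-go is just one (feature function, anti-causal aggregator) pair. The toy below, with invented names, shows RTG alongside a CDT-flavored categorical variant; it is an illustration of the framing, not the GDT codebase.

```python
# Assumed sketch: hindsight statistics as feature + anti-causal aggregator.
import numpy as np

def hindsight_info(trajectory, feature_fn, aggregator):
    feats = np.array([feature_fn(step) for step in trajectory])
    # Anti-causal: each timestep sees an aggregate of its own future.
    return np.array([aggregator(feats[t:]) for t in range(len(feats))])

steps = [{"reward": r, "state": s} for r, s in zip([1, 0, 2, 1], "abcd")]

# Classic DT: feature = reward, aggregator = sum -> return-to-go.
rtg = hindsight_info(steps, lambda s: s["reward"], np.sum)
print(rtg)  # [4 3 3 1]

# Categorical-style variant: feature = one-hot state, aggregator = mean
# -> empirical distribution over future states, in the spirit of CDT.
onehot = lambda s: np.eye(4)["abcd".index(s["state"])]
dist = hindsight_info(steps, onehot, lambda f: f.mean(axis=0))
print(np.round(dist, 2))
```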
- Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as a source domain and TED-LIUM 3 as well as SWITCHBOARD show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer when fine-tuning the target model; a rough sketch follows below.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
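Only the regularizer idea is visible from the summary, so the sketch below is a loose guess at the shape of such a fine-tuning step: `TinyNet` and the MSE penalty are placeholders, and TRED's actual disentanglement procedure is not shown.

```python
# Assumed sketch: fine-tuning with a feature-space knowledge-transfer regularizer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Toy backbone + head; stands in for the fine-tuned target model."""
    def __init__(self, d_in=16, d_feat=8, n_cls=3):
        super().__init__()
        self.backbone = nn.Linear(d_in, d_feat)
        self.head = nn.Linear(d_feat, n_cls)
    def forward(self, x):
        feats = torch.relu(self.backbone(x))
        return feats, self.head(feats)

def finetune_step(model, source_feats, batch, optimizer, lam=0.1):
    x, y = batch
    feats, logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    # Regularizer: keep features near the target-relevant source representation.
    reg = F.mse_loss(feats, source_feats)
    loss = task_loss + lam * reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = TinyNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(4, 16), torch.randint(0, 3, (4,))
src = torch.randn(4, 8)  # stand-in for disentangled source features
print(finetune_step(model, src, (x, y), opt))
```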
This list is automatically generated from the titles and abstracts of the papers on this site.