Related papers: LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting

LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting

URL: http://arxiv.org/abs/2603.05134v1
Date: Thu, 05 Mar 2026 13:01:21 GMT
Title: LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting
Authors: Yewen Li, Zhiyi Lyu, Peng Jiang, Qingpeng Cai, Fei Pan, Bo An, Peng Jiang,
Abstract summary: Large language models (LLMs) offer a promising solution by leveraging prior human knowledge and reasoning abilities to improve auto-bidding performance.<n>We propose a hierarchical Large autoBidding Model (LBM) to leverage the reasoning capabilities of LLMs for developing a superior auto-bidding strategy.
Score: 29.012758785758262
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The growing scale of ad auctions on online advertising platforms has intensified competition, making manual bidding impractical and necessitating auto-bidding to help advertisers achieve their economic goals. Current auto-bidding methods have evolved to use offline reinforcement learning or generative methods to optimize bidding strategies, but they can sometimes behave counterintuitively due to the black-box training manner and limited mode coverage of datasets, leading to challenges in understanding task status and generalization in dynamic ad environments. Large language models (LLMs) offer a promising solution by leveraging prior human knowledge and reasoning abilities to improve auto-bidding performance. However, directly applying LLMs to auto-bidding faces difficulties due to the need for precise actions in competitive auctions and the lack of specialized auto-bidding knowledge, which can lead to hallucinations and suboptimal decisions. To address these challenges, we propose a hierarchical Large autoBidding Model (LBM) to leverage the reasoning capabilities of LLMs for developing a superior auto-bidding strategy. This includes a high-level LBM-Think model for reasoning and a low-level LBM-Act model for action generation. Specifically, we propose a dual embedding mechanism to efficiently fuse two modalities, including language and numerical inputs, for language-guided training of the LBM-Act; then, we propose an offline reinforcement fine-tuning technique termed GQPO for mitigating the LLM-Think's hallucinations and enhancing decision-making performance without simulation or real-world rollout like previous multi-turn LLM-based methods. Experiments demonstrate the superiority of a generative backbone based on our LBM, especially in an efficient training manner and generalization ability.

Related papers

DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs [21.30516760599435]
Large Language Models offer a promising alternative for AIGB.<n>They lack the numerical precision required for fine-grained optimization.<n>We propose DARA, a novel dual-phase framework that decomposes the decision-making process into two stages.<n>Our approach consistently outperforms existing baselines in terms of cumulative advertiser value under budget constraints.
arXiv Detail & Related papers (2026-01-21T06:58:44Z)
LLM-Auction: Generative Auction towards LLM-Native Advertising [10.695066036409274]
We propose a learning-based generative auction mechanism that integrates auction and LLM generation for LLM-native advertising.<n>We introduce Iterative Reward-Preference Optimization (IRPO) algorithm that alternately optimize the reward model and the LLM.<n>We show that LLM-Auction significantly outperforms existing baselines in allocation efficiency, while achieving the desired mechanism properties.
arXiv Detail & Related papers (2025-12-11T11:31:20Z)
Feedback-Induced Performance Decline in LLM-Based Decision-Making [6.5990946334144756]
Large Language Models (LLMs) can extract context from natural language problem descriptions.<n>This paper studies the behaviour of these models within a Markov Decision Process (MDPs)
arXiv Detail & Related papers (2025-07-20T10:38:56Z)
Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems [54.709976343045824]
Current offline reinforcement learning (RL) methods face substantial challenges when applied to sparse advertising scenarios.<n>We propose MTORL, a novel multi-task offline RL model that targets two key objectives.<n>We employ multi-task learning to decode actions and rewards, simultaneously addressing channel recommendation and budget allocation.
arXiv Detail & Related papers (2025-06-29T05:05:13Z)
Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics [0.46174569259495524]
This survey paper outlines the key developments in the field of Large Language Models (LLMs)<n>The techniques that have been most effective in bridging the gap between human and machine communications include the Chain-of-Thought prompting, Instruction Tuning, and Reinforcement Learning from Human Feedback.<n>A significant focus is placed on efficiency, detailing scaling strategies, optimization techniques, and the influential Mixture-of-Experts (MoE) architecture.
arXiv Detail & Related papers (2025-06-14T05:55:19Z)
Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking [61.61356842567952]
We propose STeP, a novel method for improving LLM-based agent training.<n>We synthesize self-reflected trajectories that include reflections and corrections of error steps.<n>Experiments demonstrate that our method improves agent performance across three representative tasks.
arXiv Detail & Related papers (2025-05-26T14:11:12Z)
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL [62.984693936073974]
Large language models (LLMs) excel in tasks like question answering and dialogue.<n>Complex tasks requiring interaction, such as negotiation and persuasion, require additional long-horizon reasoning and planning.<n>We propose a novel approach that uses goal-conditioned value functions to guide the reasoning of LLM agents.
arXiv Detail & Related papers (2025-05-23T16:51:54Z)
Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing [14.432850893209817]
This paper introduces Attention Pruning, a fairness-aware simulated annealing approach to prune attention heads in large language models (LLMs)<n>Our experiments show that Attention Pruning achieves up to $40%$ reduction in gender bias and outperforms the state-of-the-art bias mitigation strategies.
arXiv Detail & Related papers (2025-03-20T03:02:32Z)
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models [54.381650481255235]
We introduce a new tuning-free approach for self-alignment, Dynamic Rewarding with Prompt Optimization (O) Our approach leverages a search-based optimization framework that allows LLMs to iteratively self-improve and craft the optimal alignment instructions. Empirical evaluations on eight recent LLMs, both open and closed-sourced, demonstrate that DRPO significantly enhances alignment performance.
arXiv Detail & Related papers (2024-11-13T16:15:38Z)
MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models. MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z)
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning [77.72128397088409]
We show that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question.<n>We also propose a novel reinforcement learning paradigm to encourage the model to actively perform composite deduction.
arXiv Detail & Related papers (2024-04-19T15:53:27Z)
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [84.31119464141631]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.<n>Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors even multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.