Large Language Models as Bidding Agents in Repeated HetNet Auction
- URL: http://arxiv.org/abs/2603.04455v1
- Date: Mon, 02 Mar 2026 07:30:01 GMT
- Title: Large Language Models as Bidding Agents in Repeated HetNet Auction
- Authors: Ismail Lotfi, Ali Ghrayeb, Samson Lasaulce, Merouane Debbah
- Abstract summary: This paper investigates the integration of large language models (LLMs) as reasoning agents in repeated spectrum auctions within heterogeneous networks (HetNets). We propose a distributed auction-based framework in which each base station (BS) independently conducts its own multi-channel auction, and user equipments (UEs) strategically decide both their association and bid values. Simulation results reveal that the LLM-empowered UE achieves consistently higher channel access frequency and improved budget efficiency compared to benchmarks.
- Score: 4.305340565419997
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper investigates the integration of large language models (LLMs) as reasoning agents in repeated spectrum auctions within heterogeneous networks (HetNets). While auction-based mechanisms have been widely employed for efficient resource allocation, most prior works assume one-shot auctions, static bidder behavior, and idealized conditions. In contrast to traditional formulations where base station (BS) association and power allocation are centrally optimized, we propose a distributed auction-based framework in which each BS independently conducts its own multi-channel auction, and user equipments (UEs) strategically decide both their association and bid values. Within this setting, UEs operate under budget constraints and repeated interactions, transforming resource allocation into a long-term economic decision rather than a one-shot optimization problem. The proposed framework enables the evaluation of diverse bidding behaviors, from classical myopic and greedy policies to LLM-based agents capable of reasoning over historical outcomes, anticipating competition, and adapting their bidding strategy across episodes. Simulation results reveal that the LLM-empowered UE consistently achieves higher channel access frequency and improved budget efficiency compared to benchmarks. These findings highlight the potential of reasoning-enabled agents in future decentralized wireless network markets and pave the way for lightweight, edge-deployable LLMs to support intelligent resource allocation in next-generation HetNets.
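The distributed setting described above can be sketched in a few lines: each BS runs its own multi-channel auction, and each budget-constrained UE independently chooses an association and a bid in every episode. The sketch below is a minimal illustration under assumed rules (first-price payments, a fixed per-episode channel count, and a myopic benchmark bidder that bids a fixed fraction of its remaining budget); it is not the paper's exact mechanism, and all names are illustrative.

```python
import random

def run_episode(n_bs, channels_per_bs, ues, rng):
    """Each BS conducts its own multi-channel first-price auction;
    each UE independently picks one BS to associate with and a bid."""
    bids = {bs: [] for bs in range(n_bs)}
    for ue in ues:
        bs, bid = ue.decide(n_bs, rng)     # association + bid value
        bid = min(bid, ue.budget)          # bids cannot exceed remaining budget
        if bid > 0:
            bids[bs].append((bid, ue))
    # Each BS awards its channels to the highest bidders it received.
    for bs, offers in bids.items():
        offers.sort(key=lambda x: x[0], reverse=True)
        for bid, ue in offers[:channels_per_bs]:
            ue.budget -= bid               # first-price payment (assumed rule)
            ue.wins += 1

class MyopicUE:
    """Benchmark bidder: random association, fixed fraction of budget."""
    def __init__(self, budget):
        self.budget, self.wins = budget, 0
    def decide(self, n_bs, rng):
        return rng.randrange(n_bs), 0.1 * self.budget

rng = random.Random(0)
ues = [MyopicUE(budget=100.0) for _ in range(8)]
for _ in range(20):                        # repeated interaction across episodes
    run_episode(n_bs=3, channels_per_bs=2, ues=ues, rng=rng)
```

An LLM-based agent would replace `decide` with a policy conditioned on past auction outcomes, letting it anticipate competitors' bids instead of applying a fixed rule.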
Related papers
- Beyond Unimodal Shortcuts: MLLMs as Cross-Modal Reasoners for Grounded Named Entity Recognition [51.68340973140949]
Grounded Multimodal Named Entity Recognition (GMNER) aims to extract text-based entities, assign them semantic categories, and ground them to corresponding visual regions. MLLMs exhibit modality bias, including visual bias and textual bias, which stems from their tendency to take unimodal shortcuts. We propose Modality-aware Consistency Reasoning (MCR), which enforces structured cross-modal reasoning.
arXiv Detail & Related papers (2026-02-04T12:12:49Z) - DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs [21.30516760599435]
Large Language Models offer a promising alternative for AIGB. However, they lack the numerical precision required for fine-grained optimization. We propose DARA, a novel dual-phase framework that decomposes the decision-making process into two stages. Our approach consistently outperforms existing baselines in terms of cumulative advertiser value under budget constraints.
arXiv Detail & Related papers (2026-01-21T06:58:44Z) - MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs). We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z) - AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards [60.2998874976509]
We propose advantage-weighted policy optimization (AWPO), which integrates explicit reasoning rewards to enhance tool-use capability. AWPO incorporates variance-aware gating and difficulty-aware weighting to adaptively modulate advantages from reasoning signals. Experiments demonstrate that AWPO achieves state-of-the-art performance across standard tool-use benchmarks.
arXiv Detail & Related papers (2025-12-22T08:07:00Z) - LLM-Auction: Generative Auction towards LLM-Native Advertising [10.695066036409274]
We propose a learning-based generative auction mechanism that integrates auction and LLM generation for LLM-native advertising. We introduce an Iterative Reward-Preference Optimization (IRPO) algorithm that alternately optimizes the reward model and the LLM. We show that LLM-Auction significantly outperforms existing baselines in allocation efficiency while achieving the desired mechanism properties.
arXiv Detail & Related papers (2025-12-11T11:31:20Z) - Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks [63.541114376141735]
Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios. However, their practical deployment in collaborative scenarios confronts fundamental challenges: privacy vulnerabilities, communication overhead, and computational bottlenecks. We propose Federated Attention (FedAttn), which integrates the federated paradigm into the self-attention mechanism.
arXiv Detail & Related papers (2025-11-04T15:14:58Z) - Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents [28.145430029174577]
Large language model (LLM)-based agents are increasingly trained with reinforcement learning (RL) to enhance their ability to interact with external environments. Existing approaches typically rely on outcome-based rewards that are only provided at the final answer. In this paper, we propose Information Gain-based Policy Optimization (IGPO), a simple yet effective RL framework that provides dense and intrinsic supervision for multi-turn agent training.
arXiv Detail & Related papers (2025-10-16T17:59:32Z) - Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning. We show that the widely used beam search method suffers from unacceptable over-optimism. We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z) - Hierarchical Multi-agent Meta-Reinforcement Learning for Cross-channel Bidding [4.741091524027138]
Real-time bidding (RTB) plays a pivotal role in online advertising ecosystems. Traditional approaches cannot effectively manage the dynamic budget allocation problem. We propose a hierarchical multi-agent reinforcement learning framework for multi-channel bidding optimization.
arXiv Detail & Related papers (2024-12-26T05:26:30Z) - Large Language Model as a Catalyst: A Paradigm Shift in Base Station Siting Optimization [62.16747639440893]
As large language models (LLMs) and their associated technologies advance, particularly in the realms of prompt engineering and agent engineering, they open new possibilities for base station siting optimization. Our proposed framework incorporates retrieval-augmented generation (RAG) to enhance the system's ability to acquire domain-specific knowledge and generate solutions.
arXiv Detail & Related papers (2024-08-07T08:43:32Z) - Optimal Bidding Strategy without Exploration in Real-time Bidding [14.035270361462576]
Maximizing utility with a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems.
Previous works ignore the losing auctions to alleviate the difficulty with censored states.
We propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic.
arXiv Detail & Related papers (2020-03-31T20:43:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information it contains and is not responsible for any consequences of its use.