Large Language Models as Bidding Agents in Repeated HetNet Auction
- URL: http://arxiv.org/abs/2603.04455v1
- Date: Mon, 02 Mar 2026 07:30:01 GMT
- Title: Large Language Models as Bidding Agents in Repeated HetNet Auction
- Authors: Ismail Lotfi, Ali Ghrayeb, Samson Lasaulce, Merouane Debbah
- Abstract summary: This paper investigates the integration of large language models (LLMs) as reasoning agents in repeated spectrum auctions within heterogeneous networks (HetNets). We propose a distributed auction-based framework in which each base station (BS) independently conducts its own multi-channel auction, and user equipments (UEs) strategically decide both their association and bid values. Simulation results reveal that the LLM-empowered UE achieves consistently higher channel access frequency and improved budget efficiency compared to benchmarks.
- Score: 4.305340565419997
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper investigates the integration of large language models (LLMs) as reasoning agents in repeated spectrum auctions within heterogeneous networks (HetNets). While auction-based mechanisms have been widely employed for efficient resource allocation, most prior works assume one-shot auctions, static bidder behavior, and idealized conditions. In contrast to traditional formulations where base station (BS) association and power allocation are centrally optimized, we propose a distributed auction-based framework in which each BS independently conducts its own multi-channel auction, and user equipments (UEs) strategically decide both their association and bid values. Within this setting, UEs operate under budget constraints and repeated interactions, transforming resource allocation into a long-term economic decision rather than a one-shot optimization problem. The proposed framework enables the evaluation of diverse bidding behaviors, from classical myopic and greedy policies to LLM-based agents capable of reasoning over historical outcomes, anticipating competition, and adapting their bidding strategy across episodes. Simulation results reveal that the LLM-empowered UE consistently achieves higher channel access frequency and improved budget efficiency compared to benchmarks. These findings highlight the potential of reasoning-enabled agents in future decentralized wireless network markets and pave the way for lightweight, edge-deployable LLMs to support intelligent resource allocation in next-generation HetNets.
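The distributed setting described above can be sketched in a few lines: each BS runs its own multi-channel auction, and each budget-constrained UE independently chooses an association and a bid in every episode. The sketch below is a minimal illustration under assumed rules (first-price payments, a fixed per-episode channel count, and a myopic benchmark bidder that bids a fixed fraction of its remaining budget); it is not the paper's exact mechanism, and all names are illustrative.

```python
import random

def run_episode(n_bs, channels_per_bs, ues, rng):
    """Each BS conducts its own multi-channel first-price auction;
    each UE independently picks one BS to associate with and a bid."""
    bids = {bs: [] for bs in range(n_bs)}
    for ue in ues:
        bs, bid = ue.decide(n_bs, rng)     # association + bid value
        bid = min(bid, ue.budget)          # bids cannot exceed remaining budget
        if bid > 0:
            bids[bs].append((bid, ue))
    # Each BS awards its channels to the highest bidders it received.
    for bs, offers in bids.items():
        offers.sort(key=lambda x: x[0], reverse=True)
        for bid, ue in offers[:channels_per_bs]:
            ue.budget -= bid               # first-price payment (assumed rule)
            ue.wins += 1

class MyopicUE:
    """Benchmark bidder: random association, fixed fraction of budget."""
    def __init__(self, budget):
        self.budget, self.wins = budget, 0
    def decide(self, n_bs, rng):
        return rng.randrange(n_bs), 0.1 * self.budget

rng = random.Random(0)
ues = [MyopicUE(budget=100.0) for _ in range(8)]
for _ in range(20):                        # repeated interaction across episodes
    run_episode(n_bs=3, channels_per_bs=2, ues=ues, rng=rng)
```

An LLM-based agent would replace `decide` with a policy conditioned on past auction outcomes, letting it anticipate competitors' bids instead of applying a fixed rule.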
Related papers
- Beyond Unimodal Shortcuts: MLLMs as Cross-Modal Reasoners for Grounded Named Entity Recognition [51.68340973140949]
Grounded Multimodal Named Entity Recognition (GMNER) aims to extract text-based entities, assign them semantic categories, and ground them to corresponding visual regions. MLLMs exhibit modality bias, including visual bias and textual bias, which stems from their tendency to take unimodal shortcuts. We propose Modality-aware Consistency Reasoning (MCR), which enforces structured cross-modal reasoning.
arXiv Detail & Related papers (2026-02-04T12:12:49Z) - DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs [21.30516760599435]
Large Language Models offer a promising alternative for AIGB. However, they lack the numerical precision required for fine-grained optimization. We propose DARA, a novel dual-phase framework that decomposes the decision-making process into two stages. Our approach consistently outperforms existing baselines in terms of cumulative advertiser value under budget constraints.
arXiv Detail & Related papers (2026-01-21T06:58:44Z) - MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs). We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z) - AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards [60.2998874976509]
We propose advantage-weighted policy optimization (AWPO), which integrates explicit reasoning rewards to enhance tool-use capability. AWPO incorporates variance-aware gating and difficulty-aware weighting to adaptively modulate advantages from reasoning signals. Experiments demonstrate that AWPO achieves state-of-the-art performance across standard tool-use benchmarks.
arXiv Detail & Related papers (2025-12-22T08:07:00Z) - LLM-Auction: Generative Auction towards LLM-Native Advertising [10.695066036409274]
We propose a learning-based generative auction mechanism that integrates auction and LLM generation for LLM-native advertising. We introduce an Iterative Reward-Preference Optimization (IRPO) algorithm that alternately optimizes the reward model and the LLM. We show that LLM-Auction significantly outperforms existing baselines in allocation efficiency while achieving the desired mechanism properties.
arXiv Detail & Related papers (2025-12-11T11:31:20Z) - Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks [63.541114376141735]
Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios. However, their practical deployment in collaborative scenarios confronts fundamental challenges: privacy vulnerabilities, communication overhead, and computational bottlenecks. We propose Federated Attention (FedAttn), which integrates the federated paradigm into the self-attention mechanism.
arXiv Detail & Related papers (2025-11-04T15:14:58Z) - Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents [28.145430029174577]
Large language model (LLM)-based agents are increasingly trained with reinforcement learning (RL) to enhance their ability to interact with external environments. Existing approaches typically rely on outcome-based rewards that are only provided at the final answer. In this paper, we propose Information Gain-based Policy Optimization (IGPO), a simple yet effective RL framework that provides dense and intrinsic supervision for multi-turn agent training.
arXiv Detail & Related papers (2025-10-16T17:59:32Z) - Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning. We show that the widely used beam search method suffers from unacceptable over-optimism. We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z) - Hierarchical Multi-agent Meta-Reinforcement Learning for Cross-channel Bidding [4.741091524027138]
Real-time bidding (RTB) plays a pivotal role in online advertising ecosystems. Traditional approaches cannot effectively manage the dynamic budget allocation problem. We propose a hierarchical multi-agent reinforcement learning framework for multi-channel bidding optimization.
arXiv Detail & Related papers (2024-12-26T05:26:30Z) - Large Language Model as a Catalyst: A Paradigm Shift in Base Station Siting Optimization [62.16747639440893]
As large language models (LLMs) and their associated technologies advance, particularly in the realms of prompt engineering and agent engineering, they open new possibilities for base station siting optimization. Our proposed framework incorporates retrieval-augmented generation (RAG) to enhance the system's ability to acquire domain-specific knowledge and generate solutions.
arXiv Detail & Related papers (2024-08-07T08:43:32Z) - Optimal Bidding Strategy without Exploration in Real-time Bidding [14.035270361462576]
Maximizing utility with a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems.
Previous works ignore the losing auctions to alleviate the difficulty with censored states.
We propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic.
arXiv Detail & Related papers (2020-03-31T20:43:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information it contains and is not responsible for any consequences of its use.