DecisionLLM: Large Language Models for Long Sequence Decision Exploration
- URL: http://arxiv.org/abs/2601.10148v1
- Date: Thu, 15 Jan 2026 07:42:02 GMT
- Title: DecisionLLM: Large Language Models for Long Sequence Decision Exploration
- Authors: Xiaowei Lv, Zhilin Zhang, Yijun Li, Yusen Huo, Siyuan Ju, Xuyan Li, Chunxiang Hong, Tianyu Wang, Yongcai Wang, Peng Sun, Chuan Yu, Jian Xu, Bo Zheng
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable success in complex reasoning and planning tasks. This work investigates the application of LLMs to offline decision-making tasks. By learning to align trajectory data with natural language task descriptions, our model can autoregressively predict future decisions.
- Score: 26.033533195580933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-sequence decision-making, which is usually addressed through reinforcement learning (RL), is a critical component for optimizing strategic operations in dynamic environments, such as real-time bidding in computational advertising. The Decision Transformer (DT) introduced a powerful paradigm by framing RL as an autoregressive sequence modeling problem. Concurrently, Large Language Models (LLMs) have demonstrated remarkable success in complex reasoning and planning tasks. This raises the question of whether LLMs, which share the same Transformer foundation but operate at a much larger scale, can unlock new levels of performance in long-horizon sequential decision-making problems. This work investigates the application of LLMs to offline decision-making tasks. A fundamental challenge in this domain is LLMs' inherent inability to interpret continuous values, as they lack a native understanding of numerical magnitude and order when values are represented as text strings. To address this, we propose treating trajectories as a distinct modality. By learning to align trajectory data with natural language task descriptions, our model can autoregressively predict future decisions within a cohesive framework we term DecisionLLM. We establish a set of scaling laws governing this paradigm, demonstrating that performance hinges on three factors: model scale, data volume, and data quality. In offline experimental benchmarks and bidding scenarios, DecisionLLM achieves strong performance. Specifically, DecisionLLM-3B outperforms the traditional Decision Transformer (DT) by 69.4 on Maze2D umaze-v1 and by 0.085 on AuctionNet. DecisionLLM extends the AIGB paradigm and points to promising directions for future exploration in online bidding.
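The abstract refers to the Decision Transformer framing of RL as autoregressive sequence modeling. As a rough illustration only, the sketch below shows the usual way an offline trajectory is turned into such a sequence: each timestep contributes a return-to-go, a state, and an action. The function names and the tuple layout are hypothetical, the returns are undiscounted, and nothing here is taken from the DecisionLLM implementation, which the abstract describes only as treating trajectories as a distinct modality.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted suffix sums: rtg[t] = rewards[t] + ... + rewards[-1]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step adds its own reward to the future total.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def to_dt_sequence(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
) -> List[Tuple[str, Any]]:
    """Interleave (return-to-go, state, action) per timestep.

    An autoregressive model trained on this layout predicts the action
    entry from everything that precedes it in the sequence.
    """
    sequence: List[Tuple[str, Any]] = []
    for rtg, state, action in zip(returns_to_go(rewards), states, actions):
        sequence.append(("return_to_go", rtg))
        sequence.append(("state", state))
        sequence.append(("action", action))
    return sequence


if __name__ == "__main__":
    # Toy three-step trajectory with made-up values.
    demo = to_dt_sequence(
        states=["s0", "s1", "s2"],
        actions=["a0", "a1", "a2"],
        rewards=[1.0, 0.0, 2.0],
    )
    for kind, value in demo:
        print(kind, value)
```

At inference time, a model of this kind is typically conditioned on a desired return-to-go for the first step, which is then reduced by each observed reward as the episode unfolds.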
Related papers
- Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach [37.78174504569736]
Iterative Regret-Minimization Fine-Tuning is a post-training procedure that distills low-regret decision trajectories back into the base model.
This reliance on model-generated reasoning avoids rigid output engineering and provides more flexible, natural-language training signals.
Iterative RMFT improves LLMs' DM performance across diverse models.
arXiv Detail & Related papers (2025-11-06T14:21:22Z)
- Large Multimodal Models-Empowered Task-Oriented Autonomous Communications: Design Methodology and Implementation Challenges [31.57528074626831]
Large language models (LLMs) and large multimodal models (LMMs) have achieved unprecedented breakthroughs.
This article focuses on task-oriented autonomous communications with LLMs/LMMs.
We show that the proposed LLM/LMM-aided autonomous systems significantly outperform conventional and discriminative deep learning (DL) model-based techniques.
arXiv Detail & Related papers (2025-10-23T15:08:58Z)
- Feedback-Induced Performance Decline in LLM-Based Decision-Making [6.5990946334144756]
Large Language Models (LLMs) can extract context from natural language problem descriptions.
This paper studies the behaviour of these models within Markov Decision Processes (MDPs).
arXiv Detail & Related papers (2025-07-20T10:38:56Z)
- DecisionFlow: Advancing Large Language Model as Principled Decision Maker [49.088778182807395]
DecisionFlow is a novel decision modeling framework that guides models to reason over structured representations of actions, attributes, and constraints.
Rather than predicting answers directly from prompts, DecisionFlow builds a semantically grounded decision space and infers a latent utility function.
Empirical results show that DecisionFlow achieves up to 30% accuracy gains over strong prompting baselines.
arXiv Detail & Related papers (2025-05-27T16:23:53Z)
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [85.24399869971236]
We aim to evaluate Large Language Models (LLMs) for embodied decision making.
Existing evaluations tend to rely solely on a final success rate.
We propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks.
arXiv Detail & Related papers (2024-10-09T17:59:00Z)
- Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
We motivate an end-to-end decision-making model based on multimodality-augmented LLM.
We propose a reasoning-decision alignment constraint between the paired CoTs and planning results.
We dub our proposed large language planners with reasoning-decision alignment as RDA-Driver.
arXiv Detail & Related papers (2024-08-25T16:43:47Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [84.31119464141631]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, and even multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.