ARIES: Autonomous Reasoning with LLMs on Interactive Thought Graph Environments
- URL: http://arxiv.org/abs/2502.21208v1
- Date: Fri, 28 Feb 2025 16:28:13 GMT
- Title: ARIES: Autonomous Reasoning with LLMs on Interactive Thought Graph Environments
- Authors: Pedro Gimenes, Zeyu Cao, Jeffrey Wong, Yiren Zhao
- Abstract summary: We introduce ARIES, a multi-agent architecture for reasoning with LLMs. We observe that using off-the-shelf LLMs as policy agents with no supervised fine-tuning (SFT) can yield up to $29\%$ higher accuracy on HumanEval. We also conduct a thorough analysis of observed failure modes, highlighting that limitations on LLM sizes and the depth of problem decomposition can be seen as challenges to scaling LLM-guided reasoning.
- Score: 7.508204100423766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research has shown that LLM performance on reasoning tasks can be enhanced by scaling test-time compute. One promising approach, particularly with decomposable problems, involves arranging intermediate solutions as a graph on which transformations are performed to explore the solution space. However, prior works rely on pre-determined, task-specific transformation schedules which are subject to a set of searched hyperparameters. In this work, we view thought graph transformations as actions in a Markov decision process, and implement policy agents to drive effective action policies for the underlying reasoning LLM agent. In particular, we investigate the ability of another LLM to act as a policy agent on thought graph environments and introduce ARIES, a multi-agent architecture for reasoning with LLMs. In ARIES, reasoning LLM agents solve decomposed subproblems, while policy LLM agents maintain visibility of the thought graph states and dynamically adapt the problem-solving strategy. Through extensive experiments, we observe that using off-the-shelf LLMs as policy agents with no supervised fine-tuning (SFT) can yield up to $29\%$ higher accuracy on HumanEval relative to static transformation schedules, while also reducing inference costs by $35\%$ and avoiding any search requirements. We also conduct a thorough analysis of observed failure modes, highlighting that limitations on LLM sizes and the depth of problem decomposition can be seen as challenges to scaling LLM-guided reasoning.
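To make the setup in the abstract concrete, below is a minimal, illustrative sketch of thought-graph transformations framed as actions chosen by a policy LLM and executed by a reasoning LLM. All names here (ThoughtGraph, the action set, dry_run_llm) are assumptions made for this example, not the ARIES implementation.

```python
# Illustrative sketch only: thought-graph transformations as actions in a
# Markov decision process, with one LLM acting as the policy agent and
# another as the reasoning agent. All names (ThoughtGraph, ACTIONS,
# dry_run_llm) are assumptions for this example, not the ARIES code.
from dataclasses import dataclass, field
from typing import Callable

ACTIONS = ["decompose", "solve", "aggregate", "refine", "stop"]

@dataclass
class ThoughtGraph:
    """Holds intermediate solutions produced by the reasoning agent."""
    nodes: list = field(default_factory=list)

    def summary(self) -> str:
        lines = [f"[{i}] {text[:80]}" for i, text in enumerate(self.nodes)]
        return "\n".join(lines) or "(empty)"

def dry_run_llm(prompt: str) -> str:
    """Stand-in for a chat-completion API call; swap in a real client here."""
    return "solve" if "Reply with the action name only" in prompt else f"draft: {prompt[:60]}"

def policy_step(llm: Callable[[str], str], problem: str, graph: ThoughtGraph) -> str:
    """Policy agent observes the current graph state and picks the next transformation."""
    prompt = (
        f"Problem: {problem}\n"
        f"Thought graph so far:\n{graph.summary()}\n"
        f"Choose the next action from {ACTIONS}. Reply with the action name only."
    )
    action = llm(prompt).strip().lower()
    return action if action in ACTIONS else "solve"

def reasoning_step(llm: Callable[[str], str], problem: str, graph: ThoughtGraph, action: str) -> None:
    """Reasoning agent applies the chosen transformation, growing the graph."""
    graph.nodes.append(llm(f"Apply '{action}' to '{problem}' given:\n{graph.summary()}"))

def solve_with_policy(problem: str, llm: Callable[[str], str] = dry_run_llm, max_steps: int = 8) -> ThoughtGraph:
    graph = ThoughtGraph()
    for _ in range(max_steps):
        action = policy_step(llm, problem, graph)
        if action == "stop":
            break
        reasoning_step(llm, problem, graph, action)
    return graph
```

The contrast with static schedules is that the sequence of actions here is chosen at run time from the observed graph state, rather than being fixed in advance or tuned by hyperparameter search.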
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Position: Scaling LLM Agents Requires Asymptotic Analysis with LLM Primitives [8.713076928533846]
Decomposing hard problems into subproblems often makes them easier and more efficient to solve.
This paper argues that analysis with LLM primitives is needed to reason about the efficiency of such systems.
arXiv Detail & Related papers (2025-02-04T20:47:43Z)
- Systematic Analysis of LLM Contributions to Planning: Solver, Verifier, Heuristic [6.687149103409949]
We provide a systematic analysis of how large language models (LLMs) contribute to solving planning problems. Our analysis reveals that although it is difficult for LLMs to generate correct plans out-of-the-box, LLMs are much better at providing feedback signals on intermediate/incomplete solutions.
arXiv Detail & Related papers (2024-12-12T18:16:46Z)
- EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
- DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search [37.16633337724158]
DOTS is an approach enabling LLMs to reason dynamically via optimal reasoning trajectory search.
Our method consistently outperforms static reasoning techniques and the vanilla instruction tuning approach.
arXiv Detail & Related papers (2024-10-04T18:58:09Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs' decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs).
MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task.
We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- DiLA: Enhancing LLM Tool Learning with Differential Logic Layer [11.810200077863172]
We propose a novel differential logic layer-aided language modeling (DiLA) approach, where logical constraints are integrated into the forward and backward passes of a network layer.
We evaluate the performance of DiLA on two classic reasoning problems and empirically demonstrate its consistent outperformance against existing prompt-based and solver-aided approaches.
arXiv Detail & Related papers (2024-02-19T07:38:57Z)
- LgTS: Dynamic Task Sampling using LLM-generated sub-goals for Reinforcement Learning Agents [10.936460061405157]
We propose LgTS (LLM-guided Teacher-Student learning), a novel approach that explores the planning abilities of LLMs.
Our approach does not assume access to a proprietary or fine-tuned LLM, nor does it require pre-trained policies that achieve the sub-goals proposed by the LLM.
arXiv Detail & Related papers (2023-10-14T00:07:03Z)
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in a "tit for tat" fashion and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs, which would be helpful for tasks that require deep levels of contemplation (a minimal sketch of such a debate loop follows this list).
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
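As a concrete illustration of the debate protocol summarized in the last entry, here is a minimal sketch of a two-debater loop with a judge. The prompts, the number of rounds, and the dry_run_llm stub are assumptions for this example, not the MAD authors' implementation.

```python
# Illustrative sketch of a multi-agent debate loop: two debaters argue in
# alternating turns and a judge reads the transcript to produce a final answer.
# The prompts, round count, and dry_run_llm stub are assumptions for this
# example, not the MAD paper's code.
from typing import Callable

def dry_run_llm(prompt: str) -> str:
    """Stand-in for a chat-completion API call; swap in a real client here."""
    return f"(model output for: {prompt[:50]}...)"

def debate(question: str, llm: Callable[[str], str] = dry_run_llm, rounds: int = 3) -> str:
    transcript = []
    for r in range(rounds):
        for name in ("Debater A", "Debater B"):
            prompt = (
                f"Question: {question}\n"
                + "Debate so far:\n" + "\n".join(transcript) + "\n"
                + f"You are {name}. Rebut the other side and argue for your own answer."
            )
            transcript.append(f"{name} (round {r + 1}): {llm(prompt)}")
    judge_prompt = (
        f"Question: {question}\n"
        + "Full debate:\n" + "\n".join(transcript) + "\n"
        + "As the judge, weigh both sides and state the final answer."
    )
    return llm(judge_prompt)
```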