Related papers: Self-Abstraction from Grounded Experience for Plan-Guided Policy Refinement

URL: http://arxiv.org/abs/2511.05931v1
Date: Sat, 08 Nov 2025 08:49:38 GMT
Title: Self-Abstraction from Grounded Experience for Plan-Guided Policy Refinement
Authors: Hiroaki Hayashi, Bo Pang, Wenting Zhao, Ye Liu, Akash Gokul, Srijan Bansal, Caiming Xiong, Semih Yavuz, Yingbo Zhou
Abstract summary: Large language model (LLM) based agents are increasingly used to tackle software engineering tasks. We propose Self-Abstraction from Grounded Experience (SAGE), a framework that enables agents to learn from their own task executions.
Score: 61.35824395228412
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language model (LLM) based agents are increasingly used to tackle software engineering tasks that require multi-step reasoning and code modification, demonstrating promising yet limited performance. However, most existing LLM agents operate within static execution frameworks and lack a principled mechanism to learn and self-improve from their own experience and past rollouts. As a result, their performance remains bounded by the initial framework design and the underlying LLM's capabilities. We propose Self-Abstraction from Grounded Experience (SAGE), a framework that enables agents to learn from their own task executions and refine their behavior through self-abstraction. After an initial rollout, the agent induces a concise plan abstraction from its grounded experience, distilling key steps, dependencies, and constraints. This learned abstraction is then fed back as contextual guidance, refining the agent's policy and supporting more structured, informed subsequent executions. Empirically, SAGE delivers consistent performance gains across diverse LLM backbones and agent architectures. Notably, it yields a 7.2% relative performance improvement over the strong Mini-SWE-Agent baseline when paired with the GPT-5 (high) backbone. SAGE further achieves strong overall performance on the SWE-Bench Verified benchmark, reaching 73.2% and 74% Pass@1 resolve rates with the Mini-SWE-Agent and OpenHands CodeAct agent frameworks, respectively.
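A minimal Python sketch of the loop the abstract describes: initial rollout, plan abstraction induced from that rollout, then a plan-guided rerun. Everything here is hypothetical: `plan_guided_refine`, `PlanAbstraction`, and the toy stand-ins `toy_agent` and `toy_abstractor` are illustrative names, and in SAGE both the agent and the abstractor would be LLM-driven. Returning only the second rollout is an assumption; the abstract does not say how the two attempts are compared or selected.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    """One grounded rollout: actions taken and whether the task was resolved."""
    actions: List[str]
    resolved: bool


@dataclass
class PlanAbstraction:
    """Concise plan induced from a rollout: key steps, dependencies, constraints."""
    key_steps: List[str]
    dependencies: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Render the abstraction as plain-text context for the next rollout.
        lines = ["Plan from previous attempt:"]
        lines += [f"- step: {s}" for s in self.key_steps]
        lines += [f"- depends on: {d}" for d in self.dependencies]
        lines += [f"- constraint: {c}" for c in self.constraints]
        return "\n".join(lines)


# Hypothetical interfaces; a real agent would back both with LLM calls.
Agent = Callable[[str, Optional[str]], Trajectory]
Abstractor = Callable[[str, Trajectory], PlanAbstraction]


def plan_guided_refine(task: str, agent: Agent, abstractor: Abstractor) -> Trajectory:
    first = agent(task, None)             # 1. initial rollout without guidance
    plan = abstractor(task, first)        # 2. self-abstraction from that experience
    return agent(task, plan.to_prompt())  # 3. rerun with the plan as context (assumed final answer)


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_agent(task: str, guidance: Optional[str]) -> Trajectory:
    actions = ["edit file", "submit"]
    if guidance and "run tests" in guidance:
        actions = ["reproduce bug", "edit file", "run tests", "submit"]
    return Trajectory(actions=actions, resolved="run tests" in actions)


def toy_abstractor(task: str, traj: Trajectory) -> PlanAbstraction:
    # A real abstractor would summarize traj; this one returns a fixed plan.
    return PlanAbstraction(
        key_steps=["reproduce bug", "edit file", "run tests"],
        constraints=["do not submit before tests pass"],
    )


result = plan_guided_refine("fix failing test", toy_agent, toy_abstractor)
print(result.resolved)  # True
```

Keeping the abstraction as a small structured object, rendered to text only at prompt time, makes it easy to inspect or log what guidance the second rollout actually received.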

Related papers

Demonstration-Free Robotic Control via LLM Agents [0.0]
We introduce FAEA (Frontier Agent as Embodied Agent), which applies an LLM agent framework directly to embodied manipulation without modification. With privileged environment state access, FAEA achieves success rates of 84.9%, 85.7%, and 96%, respectively. Our results indicate that general-purpose agents are sufficient for a class of manipulation tasks dominated by deliberative, task-level planning.
(2026-01-28T07:49:35Z)
A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge [1.932555230783329]
A lightweight, open-source Python framework designed to democratize the construction of LLM-driven autonomous agents. AgentForge introduces three key innovations: (1) a composable skill abstraction that enables fine-grained task decomposition with formally defined input-output contracts, (2) a unified backend interface supporting seamless switching between cloud-based APIs and local inference engines, and (3) a declarative YAML-based configuration system that separates agent logic from implementation details.
(2026-01-19T20:33:26Z)
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem [90.17610617854247]
We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories.
(2025-12-31T14:03:39Z)
AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress [71.02263260394261]
Large language models (LLMs) still encounter challenges in multi-turn decision-making tasks. We build process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process. AgentPRM captures both the interdependence between sequential decisions and their contribution to the final goal.
(2025-11-11T14:57:54Z)
EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle [26.048906477714937]
Current Large Language Model (LLM) agents show strong performance in tool use but lack the capability to systematically learn from their own experiences. We introduce EvolveR, a framework designed to enable agents to self-improve through a complete, closed-loop experience lifecycle. We demonstrate the effectiveness of EvolveR on complex multi-hop question-answering benchmarks, where it achieves superior performance over strong agentic baselines.
(2025-10-17T12:03:16Z)
Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses. No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
(2025-10-06T10:44:04Z)
Blueprint First, Model Second: A Framework for Deterministic LLM Workflow [3.9886771197662925]
We introduce the Source Code Agent framework, a new paradigm built on the "Blueprint First, Model Second" philosophy. Our framework decouples the workflow logic from the generative model. Our work enables the verifiable and reliable deployment of autonomous agents in applications governed by strict procedural logic.
(2025-08-01T03:10:00Z)
From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
(2024-11-06T10:35:11Z)
DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning [56.887047551101574]
We present DS-Agent, a novel framework that harnesses large language model (LLM) agents and case-based reasoning (CBR). In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on expert knowledge from Kaggle. In the deployment stage, DS-Agent uses a simplified CBR paradigm for low-resource deployment, significantly reducing the demand on the foundational capabilities of LLMs.
(2024-02-27T12:26:07Z)
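The CBR framework DS-Agent builds on follows the classic retrieve, reuse, revise, retain cycle. A generic Python sketch of one such step, under stated assumptions: word-overlap (Jaccard) similarity stands in for retrieval, and `revise` is a hypothetical callback marking where an LLM edit plus execution feedback would go. The names `Case` and `cbr_step` are illustrative; this is not DS-Agent's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    task: str
    solution: str


def jaccard(a: str, b: str) -> float:
    # Word-overlap similarity; a stand-in for whatever retriever a real system uses.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cbr_step(task: str, case_base: List[Case],
             revise: Callable[[str, str], str]) -> str:
    # Retrieve: pick the most similar past case (empty draft if none exist).
    draft = ""
    if case_base:
        best = max(case_base, key=lambda c: jaccard(task, c.task))
        draft = best.solution  # Reuse: start from the retrieved solution.
    solution = revise(task, draft)  # Revise: hypothetical hook for LLM + feedback.
    case_base.append(Case(task, solution))  # Retain: store for future tasks.
    return solution


cases = [
    Case("tabular classification of customer churn", "gradient boosted trees"),
    Case("image classification of plant disease", "fine-tune a CNN"),
]
out = cbr_step("tabular classification of loan default", cases,
               lambda task, draft: draft + " with tuned hyperparameters")
print(out)  # gradient boosted trees with tuned hyperparameters
```

The retain step is what lets the case base grow with each solved task, so later retrievals can draw on earlier solutions.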

This list is automatically generated from the titles and abstracts of the papers on this site.