Related papers: Structurally Aligned Subtask-Level Memory for Software Engineering Agents

URL: http://arxiv.org/abs/2602.21611v1
Date: Wed, 25 Feb 2026 06:13:25 GMT
Title: Structurally Aligned Subtask-Level Memory for Software Engineering Agents
Authors: Kangning Shen, Jingyuan Zhang, Chenxi Sun, Wencong Zeng, Yang Yue,
Abstract summary: Large Language Models (LLMs) have demonstrated significant potential as autonomous software engineering (SWE) agents. Recent work has explored augmenting these agents with memory mechanisms to support long-horizon reasoning. We propose Structurally Aligned Subtask-Level Memory to align memory storage, retrieval, and updating with the agent's functional decomposition.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated significant potential as autonomous software engineering (SWE) agents. Recent work has further explored augmenting these agents with memory mechanisms to support long-horizon reasoning. However, these approaches typically operate at a coarse instance granularity, treating the entire problem-solving episode as the atomic unit of storage and retrieval. We empirically demonstrate that instance-level memory suffers from a fundamental granularity mismatch, resulting in misguided retrieval when tasks with similar surface descriptions require distinct reasoning logic at specific stages. To address this, we propose Structurally Aligned Subtask-Level Memory, a method that aligns memory storage, retrieval, and updating with the agent's functional decomposition. Extensive experiments on SWE-bench Verified demonstrate that our method consistently outperforms both vanilla agents and strong instance-level memory baselines across diverse backbones, improving mean Pass@1 over the vanilla agent by +4.7 pp on average (e.g., +6.8 pp on Gemini 2.5 Pro). Performance gains grow with more interaction steps, showing that leveraging past experience benefits long-horizon reasoning in complex software engineering tasks.
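The abstract describes subtask-level memory only at a high level. As a minimal sketch of the contrast with instance-level memory, assuming experiences are stored per stage of the agent's decomposition rather than per whole episode, the Python below partitions storage, retrieval, and updating by a stage label. Everything here is hypothetical and is not the authors' implementation: the class names `SubtaskRecord` and `SubtaskMemory`, the stage labels such as `"localize"` and `"patch"`, the token-Jaccard scorer (a stand-in for an embedding retriever), and the 0.8 near-duplicate threshold in `update`.

```python
from dataclasses import dataclass, field


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; a stand-in for an embedding-based retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class SubtaskRecord:
    stage: str     # hypothetical stage label, e.g. "localize" or "patch"
    context: str   # what the agent faced at this stage
    lesson: str    # distilled experience to reuse later
    success: bool  # whether this stage went well


@dataclass
class SubtaskMemory:
    records: dict[str, list[SubtaskRecord]] = field(default_factory=dict)

    def store(self, rec: SubtaskRecord) -> None:
        # Storage is partitioned by stage, mirroring the agent's decomposition.
        self.records.setdefault(rec.stage, []).append(rec)

    def retrieve(self, stage: str, context: str, k: int = 2) -> list[SubtaskRecord]:
        # Only records from the same stage compete, so two issues with similar
        # overall descriptions can still return different stage-level lessons.
        pool = self.records.get(stage, [])
        ranked = sorted(pool, key=lambda r: jaccard(r.context, context), reverse=True)
        return ranked[:k]

    def update(self, stage: str, context: str, lesson: str, success: bool) -> None:
        # Hypothetical update rule: overwrite a near-duplicate record, else append.
        for rec in self.records.get(stage, []):
            if jaccard(rec.context, context) > 0.8:
                rec.lesson, rec.success = lesson, success
                return
        self.store(SubtaskRecord(stage, context, lesson, success))


if __name__ == "__main__":
    mem = SubtaskMemory()
    mem.store(SubtaskRecord("localize", "TypeError in queryset filter",
                            "grep the ORM lookup registry first", True))
    mem.store(SubtaskRecord("patch", "TypeError in queryset filter",
                            "add a type check in the lookup", False))
    hits = mem.retrieve("localize", "TypeError raised by queryset filter")
    print(hits[0].lesson)  # grep the ORM lookup registry first
```

The stage partition is the point of the sketch: an instance-level store would rank whole episodes by overall similarity and hand back one episode's lessons for every stage, which is the granularity mismatch the abstract describes.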

Related papers

E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory
E-mem is a framework that shifts from Memory Preprocessing to Episodic Context Reconstruction. E-mem achieves over 54% F1, surpassing the state-of-the-art GAM by 7.75%, while reducing token cost by over 70%.
arXiv (2026-01-29T13:42:42Z)
AMA: Adaptive Memory via Multi-Agent Collaboration
We propose Adaptive Memory via Multi-Agent Collaboration (AMA), a novel framework that leverages coordinated agents to manage memory across multiple granularities. AMA significantly outperforms state-of-the-art baselines while reducing token consumption by approximately 80% compared to full-context methods.
arXiv (2026-01-28T08:09:49Z)
Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management
Fine-Mem is a unified framework designed for fine-grained feedback alignment. Experiments on Memalpha and MemoryAgentBench demonstrate that Fine-Mem consistently outperforms strong baselines.
arXiv (2026-01-13T11:06:17Z)
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
ReasoningBank is a memory framework that distills generalizable reasoning strategies from an agent's self-judged successful and failed experiences. At test time, an agent retrieves relevant memories from ReasoningBank to inform its interaction and then integrates new learnings back, enabling it to become more capable over time. We introduce memory-aware test-time scaling (MaTTS), which accelerates and diversifies this learning process by scaling up the agent's interaction experience.
arXiv (2025-09-29T17:51:03Z)
H$^2$R: Hierarchical Hindsight Reflection for Multi-Task LLM Agents
Large language model (LLM)-based agents have shown strong potential in multi-task scenarios. Existing approaches often treat prior experiences and knowledge as monolithic units, leading to inefficient and coarse-grained knowledge transfer. We propose a novel hierarchical memory architecture that enables fine-grained knowledge transfer.
arXiv (2025-09-16T08:30:08Z)
Memp: Exploring Agent Procedural Memory
Large language model (LLM)-based agents excel at diverse tasks, yet they suffer from brittle procedural memory that is manually engineered or entangled in static parameters. We propose Memp, which distills past agent trajectories into both fine-grained, step-by-step instructions and higher-level, script-like abstractions. We show that as the memory repository is refined, agents achieve steadily higher success rates and greater efficiency on analogous tasks.
arXiv (2025-08-08T16:20:56Z)
Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents
We propose a hierarchical memory architecture for Large Language Model Agents (LLM Agents). Each memory vector is embedded with a positional index encoding that points to its semantically related sub-memories in the next layer. During the reasoning phase, an index-based routing mechanism enables efficient, layer-by-layer retrieval without exhaustive similarity computations.
arXiv (2025-07-23T12:45:44Z)
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. We show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task.
arXiv (2025-06-18T19:44:46Z)
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
HiAgent is a framework that leverages subgoals as memory chunks to manage the working memory of Large Language Model (LLM)-based agents hierarchically. Results show that HiAgent achieves a twofold increase in success rate and reduces the average number of steps required by 3.8.
arXiv (2024-08-18T17:59:49Z)
A Survey on the Memory Mechanism of Large Language Model based Agents
Large language model (LLM)-based agents have recently attracted much attention from the research and industry communities. LLM-based agents are characterized by their self-evolving capability, which is the basis for solving real-world problems. The key component supporting agent-environment interactions is the agents' memory.
arXiv (2024-04-21T01:49:46Z)
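The Hierarchical Memory for High-Efficiency Long-Term Reasoning entry describes retrieval that follows positional indices layer by layer instead of scoring every memory. A minimal sketch of that routing idea, assuming each node holds a dense vector plus a list of child indices into the next layer; the names `MemNode`, `route`, and the cosine scorer are illustrative assumptions, not that paper's code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemNode:
    vec: list[float]
    content: str
    # Positional indices of semantically related sub-memories in the next layer.
    children: list[int] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def route(layers: list[list[MemNode]], query: list[float]) -> MemNode:
    """Descend layer by layer, scoring only the children of the current best node."""
    if not layers or not layers[0]:
        raise ValueError("need at least one non-empty layer")
    candidates = range(len(layers[0]))  # only the top layer is scanned in full
    best = layers[0][0]
    for layer in layers:
        best = max((layer[i] for i in candidates), key=lambda n: cosine(n.vec, query))
        if not best.children:
            break  # reached a leaf memory
        candidates = best.children
    return best


if __name__ == "__main__":
    layers = [
        [MemNode([1.0, 0.0], "code", [0, 1]), MemNode([0.0, 1.0], "docs", [2, 3])],
        [MemNode([1.0, 0.1], "patch"), MemNode([0.9, -0.2], "test"),
         MemNode([0.0, 1.0], "readme"), MemNode([0.1, 1.0], "api")],
    ]
    print(route(layers, [0.0, 1.0]).content)  # readme
```

In this toy example only two of the four leaf memories are scored, which is the saving the entry attributes to index-based routing; the trade-off is that a poor choice at an upper layer cannot be recovered further down.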
