Structurally Aligned Subtask-Level Memory for Software Engineering Agents
- URL: http://arxiv.org/abs/2602.21611v1
- Date: Wed, 25 Feb 2026 06:13:25 GMT
- Title: Structurally Aligned Subtask-Level Memory for Software Engineering Agents
- Authors: Kangning Shen, Jingyuan Zhang, Chenxi Sun, Wencong Zeng, Yang Yue,
- Abstract summary: Large Language Models (LLMs) have demonstrated significant potential as autonomous software engineering (SWE) agents.<n>Recent work has explored augmenting these agents with memory mechanisms to support long-horizon reasoning.<n>We propose Structurally Aligned Subtask-Level Memory to align memory storage, retrieval, and updating with the agent's functional decomposition.
- Score: 15.239652771593663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated significant potential as autonomous software engineering (SWE) agents. Recent work has further explored augmenting these agents with memory mechanisms to support long-horizon reasoning. However, these approaches typically operate at a coarse instance granularity, treating the entire problem-solving episode as the atomic unit of storage and retrieval. We empirically demonstrate that instance-level memory suffers from a fundamental granularity mismatch, resulting in misguided retrieval when tasks with similar surface descriptions require distinct reasoning logic at specific stages. To address this, we propose Structurally Aligned Subtask-Level Memory, a method that aligns memory storage, retrieval, and updating with the agent's functional decomposition. Extensive experiments on SWE-bench Verified demonstrate that our method consistently outperforms both vanilla agents and strong instance-level memory baselines across diverse backbones, improving mean Pass@1 over the vanilla agent by +4.7 pp on average (e.g., +6.8 pp on Gemini 2.5 Pro). Performance gains grow with more interaction steps, showing that leveraging past experience benefits long-horizon reasoning in complex software engineering tasks.
Related papers
- E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory [4.8183840404266185]
E-mem is a framework shifting from Memory Preprocessing to Episodic Context Reconstruction.<n>E-mem achieves over 54% F1, surpassing the state-of-the-art GAM by 7.75%, while reducing token cost by over 70%.
arXiv Detail & Related papers (2026-01-29T13:42:42Z) - AMA: Adaptive Memory via Multi-Agent Collaboration [54.490349689939166]
We propose Adaptive Memory via Multi-Agent Collaboration (AMA), a novel framework that leverages coordinated agents to manage memory across multiple granularities.<n>AMA significantly outperforms state-of-the-art baselines while reducing token consumption by approximately 80% compared to full-context methods.
arXiv Detail & Related papers (2026-01-28T08:09:49Z) - Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management [63.48041801851891]
Fine-Mem is a unified framework designed for fine-grained feedback alignment.<n> Experiments on Memalpha and MemoryAgentBench demonstrate that Fine-Mem consistently outperforms strong baselines.
arXiv Detail & Related papers (2026-01-13T11:06:17Z) - ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory [57.517214479414726]
ReasoningBank is a memory framework that distills generalizable reasoning strategies from an agent's self-judged successful and failed experiences.<n>At test time, an agent retrieves relevant memories from ReasoningBank to inform its interaction and then integrates new learnings back, enabling it to become more capable over time.<n>We introduce memory-aware test-time scaling (MaTTS), which accelerates and diversifies this learning process by scaling up the agent's interaction experience.
arXiv Detail & Related papers (2025-09-29T17:51:03Z) - H$^2$R: Hierarchical Hindsight Reflection for Multi-Task LLM Agents [3.9054156855794973]
Large language model (LLM)-based agents have shown strong potential in multi-task scenarios.<n>Existing approaches often treat prior experiences and knowledge as monolithic units, leading to inefficient and coarse-grained knowledge transfer.<n>We propose a novel hierarchical memory architecture that enables fine-grained knowledge transfer.
arXiv Detail & Related papers (2025-09-16T08:30:08Z) - Memp: Exploring Agent Procedural Memory [72.41472703974935]
Large Language Models (LLMs) based agents excel at diverse tasks, yet they suffer from brittle procedural memory that is manually engineered or entangled in static parameters.<n>We propose Memp that distills past agent trajectories into both fine-grained, step-by-step instructions and higher-level, script-like abstractions.<n>We show that as the memory repository is refined, agents achieve steadily higher success rates and greater efficiency on analogous tasks.
arXiv Detail & Related papers (2025-08-08T16:20:56Z) - Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents [19.04968632268433]
We propose a hierarchical memory architecture for Large Language Model Agents (LLM Agents)<n>Each memory vector is embedded with a positional index encoding pointing to its semantically related sub-memories in the next layer.<n>During the reasoning phase, an index-based routing mechanism enables efficient, layer-by-layer retrieval without performing exhaustive similarity computations.
arXiv Detail & Related papers (2025-07-23T12:45:44Z) - MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [84.62985963113245]
We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks.<n>At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning.<n>We show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task.
arXiv Detail & Related papers (2025-06-18T19:44:46Z) - HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model [39.169389255970806]
HiAgent is a framework that leverages subgoals as memory chunks to manage the working memory of Large Language Model (LLM)-based agents hierarchically.
Results show that HiAgent achieves a twofold increase in success rate and reduces the average number of steps required by 3.8.
arXiv Detail & Related papers (2024-08-18T17:59:49Z) - A Survey on the Memory Mechanism of Large Language Model based Agents [66.4963345269611]
Large language model (LLM) based agents have recently attracted much attention from the research and industry communities.
LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems.
The key component to support agent-environment interactions is the memory of the agents.
arXiv Detail & Related papers (2024-04-21T01:49:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.