Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
- URL: http://arxiv.org/abs/2602.23008v1
- Date: Thu, 26 Feb 2026 13:50:57 GMT
- Title: Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
- Authors: Zeyuan Liu, Jeonghye Kim, Xufang Luo, Dongsheng Li, Yuqing Yang,
- Abstract summary: Exploration remains the key bottleneck for large language model agents trained with reinforcement learning. We propose Exploratory Memory-Augmented On- and Off-Policy Optimization (EMPO$^2$), a hybrid RL framework that leverages memory for exploration. On ScienceWorld and WebShop, EMPO$^2$ achieves 128.6% and 11.3% improvements over GRPO, respectively.
- Score: 34.50047418642433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration remains the key bottleneck for large language model agents trained with reinforcement learning. While prior methods exploit pretrained knowledge, they fail in environments requiring the discovery of novel states. We propose Exploratory Memory-Augmented On- and Off-Policy Optimization (EMPO$^2$), a hybrid RL framework that leverages memory for exploration and combines on- and off-policy updates to make LLMs perform well with memory while also ensuring robustness without it. On ScienceWorld and WebShop, EMPO$^2$ achieves 128.6% and 11.3% improvements over GRPO, respectively. Moreover, in out-of-distribution tests, EMPO$^2$ demonstrates superior adaptability to new tasks, requiring only a few trials with memory and no parameter updates. These results highlight EMPO$^2$ as a promising framework for building more exploratory and generalizable LLM-based agents.
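The abstract does not give EMPO$^2$'s actual objective. As a minimal sketch only, one way a hybrid on-/off-policy loss might combine plain policy-gradient updates on memory-free rollouts with importance-weighted, PPO-style clipped updates on memory-augmented rollouts; every function name, sample format, and constant below is an illustrative assumption, not the paper's method:

```python
import math

def hybrid_policy_loss(on_policy, off_policy, beta=0.5):
    """Toy hybrid objective (assumption, not EMPO^2's actual loss).
    Each sample is a tuple (logp_current, logp_behavior, advantage)."""
    # On-policy term: plain REINFORCE-style log-prob * advantage.
    on_term = sum(logp * adv for logp, _, adv in on_policy) / max(len(on_policy), 1)
    # Off-policy term (e.g. memory-augmented rollouts): corrected with an
    # importance ratio pi_current / pi_behavior, clipped for stability.
    off_term = 0.0
    for logp_cur, logp_beh, adv in off_policy:
        ratio = math.exp(logp_cur - logp_beh)
        ratio = max(min(ratio, 1.2), 0.8)  # PPO-style clipping
        off_term += ratio * adv
    off_term /= max(len(off_policy), 1)
    # Negated so that gradient descent on the loss ascends the objective.
    return -(on_term + beta * off_term)
```

The mixing weight `beta` trades off robustness without memory (on-policy term) against performance with memory (off-policy term).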
Related papers
- MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning [78.46301394559903]
Large Language Models (LLMs) are increasingly used for long-duration tasks. Current methods face a trade-off between cost and accuracy. MemSifter is a novel framework that offloads the memory retrieval process to a small-scale proxy model.
arXiv Detail & Related papers (2026-03-03T02:57:38Z) - Towards Autonomous Memory Agents [8.294673275138122]
We propose autonomous memory agents that actively acquire, validate, and curate knowledge at a minimum cost. U-Mem materializes this idea via (i) a cost-aware knowledge-extraction cascade that escalates from cheap self/teacher signals to tool-verified research. On both verifiable and non-verifiable benchmarks, U-Mem consistently beats prior memory baselines and can surpass RL-based optimization.
arXiv Detail & Related papers (2026-02-25T20:59:44Z) - Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates [53.3717573880076]
We introduce Just-In-Time Reinforcement Learning (JitRL), a training-free framework that enables test-time policy optimization without any gradient updates. JitRL maintains a dynamic, non-parametric memory of experiences and retrieves relevant trajectories to estimate action advantages on-the-fly. Experiments on WebArena and Jericho demonstrate that JitRL establishes a new state-of-the-art among training-free methods.
arXiv Detail & Related papers (2026-01-26T14:16:51Z) - SWE-Tester: Training Open-Source LLMs for Issue Reproduction in Real-World Repositories [4.70019882353957]
SWE-Tester is a novel pipeline for training open-source LLMs to generate issue reproduction tests. First, we curate a high-quality training dataset of 41K instances from 2.6K open-source GitHub repositories. The fine-tuned models achieve absolute improvements of up to 10% in success rate and 21% in change coverage on SWT-Bench Verified.
arXiv Detail & Related papers (2026-01-20T08:10:56Z) - Rethinking On-policy Optimization for Query Augmentation [49.87723664806526]
We present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks. We introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), which learns to generate a pseudo-document that maximizes retrieval performance.
arXiv Detail & Related papers (2025-10-20T04:16:28Z) - Memento: Fine-tuning LLM Agents without Fine-tuning LLMs [36.3424780932712]
We introduce a novel learning paradigm for Adaptive Large Language Model (LLM) agents. Our method enables low-cost continual adaptation via memory-based online reinforcement learning. We instantiate our agent model in the deep research setting, namely Memento, which attains top-1 on GAIA validation.
arXiv Detail & Related papers (2025-08-22T07:25:30Z) - Learn to Memorize: Optimizing LLM-based Agents with Adaptive Memory Framework [33.739298910759544]
We propose to optimize LLM-based agents with an adaptive and data-driven memory framework by modeling memory cycles. Specifically, we design an MoE gate function to facilitate memory retrieval, propose a learnable aggregation process to improve memory utilization, and develop task-specific reflection to adapt memory storage.
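The summary above mentions an MoE gate for memory retrieval without detailing it. A minimal sketch of one common form such a gate can take, a softmax over per-expert logits computed from query features; the feature and weight shapes are assumptions, not the paper's design:

```python
import math

def moe_gate(query_features, expert_weights):
    """Toy mixture-of-experts gate (illustrative assumption): each expert
    scores the query with a linear layer, and a softmax over the resulting
    logits yields mixing weights for the retrieval experts."""
    logits = [sum(q * w for q, w in zip(query_features, ws))
              for ws in expert_weights]
    m = max(logits)                          # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]         # weights sum to 1
```

The returned weights would then decide how much each memory-retrieval expert contributes to the final retrieved context.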
arXiv Detail & Related papers (2025-08-15T12:22:52Z) - Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-Selection [72.92366526004464]
Retrieval-Augmented Generation (RAG) has proven effective in enabling Large Language Models (LLMs) to produce more accurate and reliable responses. We propose a novel Self-Selection RAG framework, where the LLM is made to select from pairwise responses generated with internal parametric knowledge solely.
arXiv Detail & Related papers (2025-02-10T04:29:36Z) - Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities. LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands. We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z) - DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning [56.887047551101574]
We present DS-Agent, a novel framework that harnesses large language model (LLM) agents and case-based reasoning (CBR).
In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on the expert knowledge from Kaggle.
In the deployment stage, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm, significantly reducing the demand on foundational capabilities of LLMs.
arXiv Detail & Related papers (2024-02-27T12:26:07Z) - LLM-based Medical Assistant Personalization with Short- and Long-Term Memory Coordination [20.269899169364397]
Large Language Models (LLMs) have exhibited remarkable proficiency in comprehending and generating natural language.
We propose a novel computational bionic memory mechanism, equipped with a parameter-efficient fine-tuning (PEFT) schema, to personalize medical assistants.
arXiv Detail & Related papers (2023-09-21T00:34:33Z) - Large Language Models Are Semi-Parametric Reinforcement Learning Agents [15.908831573619842]
REMEMBERER is capable of exploiting the experiences from the past episodes even for different task goals.
Reinforcement Learning with Experience Memory (RLEM) is introduced to update the memory.
Experiments are conducted on two RL task sets to evaluate the proposed framework.
arXiv Detail & Related papers (2023-06-09T08:08:18Z)
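The REMEMBERER entry above describes an experience memory updated by RL (RLEM) and exploited across episodes without parameter updates. A minimal sketch of such a semi-parametric memory, assuming per-(observation, action) value estimates updated toward episode returns; the keying, update rule, and retrieval interface are all illustrative assumptions:

```python
class ExperienceMemory:
    """Toy experience memory in the spirit of RLEM (assumption, not the
    paper's implementation): store a value estimate per (observation,
    action) pair and nudge it toward observed returns."""

    def __init__(self, lr=0.5):
        self.q = {}    # (observation, action) -> value estimate
        self.lr = lr   # step size of the running-average update

    def update(self, observation, action, ret):
        key = (observation, action)
        old = self.q.get(key, 0.0)
        # Move the stored estimate a fraction of the way toward the return.
        self.q[key] = old + self.lr * (ret - old)

    def best_action(self, observation, candidates):
        # Retrieve the stored value for each candidate; unseen actions
        # default to 0, so untried actions can still be selected.
        return max(candidates, key=lambda a: self.q.get((observation, a), 0.0))
```

Because only the memory changes between episodes, the agent can adapt to new tasks without any gradient updates to the underlying LLM.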
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.