Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation
- URL: http://arxiv.org/abs/2510.19897v1
- Date: Wed, 22 Oct 2025 17:58:03 GMT
- Title: Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation
- Authors: Jackson Hassell, Dan Zhang, Hannah Kim, Tom Mitchell, Estevam Hruschka,
- Abstract summary: We investigate how agents built on pretrained large language models can learn target classification functions from labeled examples without parameter updates.<n>Our framework uses episodic memory to store instance-level critiques and distill these into reusable, task-level guidance.<n>Our findings highlight the promise of memory-driven, reflective learning for building more adaptive and interpretable LLM agents.
- Score: 11.819481846962447
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We investigate how agents built on pretrained large language models can learn target classification functions from labeled examples without parameter updates. While conventional approaches like fine-tuning are often costly, inflexible, and opaque, we propose a memory-augmented framework that leverages both labeled data and LLM-generated critiques. Our framework uses episodic memory to store instance-level critiques-capturing specific past experiences-and semantic memory to distill these into reusable, task-level guidance. Across a diverse set of tasks, incorporating critiques yields up to a 24.8 percent accuracy improvement over retrieval-based (RAG-style) baselines that rely only on labels. Through extensive empirical evaluation, we uncover distinct behavioral differences between OpenAI and opensource models, particularly in how they handle fact-oriented versus preference-based data. To interpret how models respond to different representations of supervision encoded in memory, we introduce a novel metric, suggestibility. This helps explain observed behaviors and illuminates how model characteristics and memory strategies jointly shape learning dynamics. Our findings highlight the promise of memory-driven, reflective learning for building more adaptive and interpretable LLM agents.
Related papers
- EvolMem: A Cognitive-Driven Benchmark for Multi-Session Dialogue Memory [63.84216832544323]
EvolMem is a new benchmark for assessing multi-session memory capabilities of large language models (LLMs) and agent systems.<n>To construct the benchmark, we introduce a hybrid data synthesis framework that consists of topic-initiated generation and narrative-inspired transformations.<n>Extensive evaluation reveals that no LLM consistently outperforms others across all memory dimensions.
arXiv Detail & Related papers (2026-01-07T03:14:42Z) - Memento 2: Learning by Stateful Reflective Memory [4.7052412989773975]
We study continual learning in large language model (LLM) based agents that integrate episodic memory with reinforcement learning.<n>We focus on reflection, the ability of an agent to revisit past experience and adjust how it selects future actions.<n>We introduce the Stateful Reflective Decision Process (SRDP), in which an agent maintains and updates episodic memory and alternates between writing new experiences to memory and reading relevant cases to guide decisions.
arXiv Detail & Related papers (2025-12-27T22:15:03Z) - Learn to Memorize: Optimizing LLM-based Agents with Adaptive Memory Framework [33.739298910759544]
We propose to optimize LLM-based agents with an adaptive and data-driven memory framework by modeling memory cycles.<n>Specifically, we design an MoE gate function to facilitate memory retrieval, propose a learnable aggregation process to improve memory utilization, and develop task-specific reflection to adapt memory storage.
arXiv Detail & Related papers (2025-08-15T12:22:52Z) - MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents [26.647812147336538]
We construct a more comprehensive dataset and benchmark to evaluate the memory capability of LLM-based agents.<n>Our dataset incorporates factual memory and reflective memory as different levels, and proposes participation and observation as various interactive scenarios.<n>Based on our dataset, we present a benchmark, named MemBench, to evaluate the memory capability of LLM-based agents from multiple aspects, including their effectiveness, efficiency, and capacity.
arXiv Detail & Related papers (2025-06-20T10:09:23Z) - Retrospective Memory for Camouflaged Object Detection [18.604039107883317]
We propose a recall-augmented COD architecture, namely RetroMem, which dynamically modulates camouflage pattern perception and inference.<n>In the recall stage, we propose a dynamic memory mechanism and an inference pattern reconstruction.<n>Our RetroMem significantly outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-06-18T08:22:19Z) - How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior [65.70584076918679]
Memory is a critical component in large language model (LLM)-based agents.<n>This paper studies how memory management choices impact the LLM agents' behavior, especially their long-term performance.
arXiv Detail & Related papers (2025-05-21T22:35:01Z) - RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.<n>With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.<n> Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z) - In-Memory Learning: A Declarative Learning Framework for Large Language
Models [56.62616975119192]
We propose a novel learning framework that allows agents to align with their environment without relying on human-labeled data.
This entire process transpires within the memory components and is implemented through natural language.
We demonstrate the effectiveness of our framework and provide insights into this problem.
arXiv Detail & Related papers (2024-03-05T08:25:11Z) - Intuitive or Dependent? Investigating LLMs' Behavior Style to
Conflicting Prompts [9.399159332152013]
This study investigates the behaviors of Large Language Models (LLMs) when faced with conflicting prompts versus their internal memory.
This will help to understand LLMs' decision mechanism and also benefit real-world applications, such as retrieval-augmented generation (RAG)
arXiv Detail & Related papers (2023-09-29T17:26:03Z) - A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental
Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model with limited memory size to meet this requirement.
We show that when counting the model size into the total budget and comparing methods with aligned memory size, saving models do not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z) - Meta-Learning with Variational Semantic Memory for Word Sense
Disambiguation [56.830395467247016]
We propose a model of semantic memory for WSD in a meta-learning setting.
Our model is based on hierarchical variational inference and incorporates an adaptive memory update rule via a hypernetwork.
We show our model advances the state of the art in few-shot WSD, supports effective learning in extremely data scarce scenarios.
arXiv Detail & Related papers (2021-06-05T20:40:01Z) - Learning to Learn Variational Semantic Memory [132.39737669936125]
We introduce variational semantic memory into meta-learning to acquire long-term knowledge for few-shot learning.
The semantic memory is grown from scratch and gradually consolidated by absorbing information from tasks it experiences.
We formulate memory recall as the variational inference of a latent memory variable from addressed contents.
arXiv Detail & Related papers (2020-10-20T15:05:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.