Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks
- URL: http://arxiv.org/abs/2511.21726v1
- Date: Thu, 20 Nov 2025 22:45:57 GMT
- Title: Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks
- Authors: Yicong Zheng, Kevin L. McKee, Thomas Miconi, Zacharie Bugaud, Mick van Gelderen, Jed McCaleb
- Abstract summary: How to enable human-like long-term memory in large language models (LLMs) has been a central question. We present SUMER, an end-to-end reinforcement learning agent with verifiable reward (RLVR). We demonstrate that a simple search method applied to raw data outperforms goal-agnostic and biased compression algorithms in current long-context memory tasks.
- Score: 2.7708222692419735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How to enable human-like long-term memory in large language models (LLMs) has been a central question for unlocking more general capabilities such as few-shot generalization. Existing memory frameworks and benchmarks focus on finding the optimal memory compression algorithm for higher performance in tasks that require recollection and sometimes further reasoning. However, such efforts have ended up building more human bias into the compression algorithm, through the search for the best prompts and memory architectures that suit specific benchmarks, rather than finding a general solution that would work on other data distributions. On the other hand, goal-directed search on uncompressed information could potentially exhibit superior performance because compression is lossy, and a predefined compression algorithm will not fit all raw data distributions. Here we present SUMER (Search in Uncompressed Memory via Experience Replay), an end-to-end reinforcement learning agent with verifiable reward (RLVR) that learns to use search tools to gather information and answer a target question. On the LoCoMo dataset for long-context conversation understanding, SUMER with Qwen2.5-7B-Instruct learned to use search tools and outperformed all other biased memory compression approaches and also the full-context baseline, reaching SOTA performance (43% gain over the prior best). We demonstrate that a simple search method applied to raw data outperforms goal-agnostic and biased compression algorithms in current long-context memory tasks, arguing for new paradigms and benchmarks that are more dynamic and autonomously scalable. Code for SUMER and all implemented baselines is publicly available at https://github.com/zycyc/SUMER.
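The abstract's core claim is that a goal-directed search loop over raw, uncompressed memory can beat precomputed compression. The following is a minimal sketch of that loop, with hypothetical stand-ins throughout: the corpus, the keyword-overlap `search_tool`, and the fixed-iteration `answer` loop are illustrative assumptions, not SUMER's actual implementation (the real agent is an RL-trained LLM that issues and refines search queries).

```python
import re

def search_tool(corpus, query, k=2):
    """Rank raw (uncompressed) log entries by keyword overlap with the query."""
    terms = set(re.findall(r"[a-z]+", query.lower()))
    def score(entry):
        return len(terms & set(re.findall(r"[a-z]+", entry.lower())))
    return sorted(corpus, key=score, reverse=True)[:k]

def answer(question, corpus, max_turns=3):
    """Iteratively call the search tool over raw memory, accumulating evidence."""
    evidence = []
    for _ in range(max_turns):
        for hit in search_tool(corpus, question):
            if hit not in evidence:
                evidence.append(hit)
        # A real RLVR-trained agent would reformulate its query from a
        # reasoning trace; this sketch simply stops once evidence is found.
        if evidence:
            break
    return evidence

logs = [
    "Alice said she adopted a cat named Momo in June.",
    "Bob talked about his marathon training schedule.",
    "Alice mentioned Momo knocked over a plant yesterday.",
]
print(answer("What is the name of Alice's cat?", logs))
```

The key design point the paper argues for is visible even in this toy: no summary of `logs` is ever built, so nothing the question might need is lost before the question is known.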
Related papers
- Multi-Vector Index Compression in Any Modality [73.7330345057813]
Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos. We introduce four approaches for index compression: sequence resizing, memory tokens, hierarchical pooling, and a novel attention-guided clustering (AGC). AGC uses an attention-guided mechanism to identify the most semantically salient regions of a document as cluster centroids and to weight token aggregation.
arXiv Detail & Related papers (2026-02-24T18:57:33Z) - Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning [47.87361916374891]
We propose a framework for efficient long-context inference based on chunk-wise compression and selective memory recall. The framework segments long inputs into chunks and encodes each chunk into compressed memory representations using a learned compressor. It achieves up to a 2x reduction in peak GPU memory usage and a 6x inference speedup over MemAgent.
arXiv Detail & Related papers (2026-02-09T08:33:11Z) - Forward Index Compression for Learned Sparse Retrieval [15.629655228398567]
We focus on the size of a data structure that is common to all algorithmic flavors and that constitutes a substantial fraction of the overall index size: the forward index. In particular, we seek compression techniques to reduce the storage footprint of the forward index without compromising search quality or inner product computation latency.
arXiv Detail & Related papers (2026-02-05T08:35:17Z) - MALLOC: Benchmarking the Memory-aware Long Sequence Compression for Large Sequential Recommendation [84.53415999381203]
MALLOC is a benchmark for memory-aware long sequence compression. It is integrated into state-of-the-art recommenders, providing a reproducible evaluation platform.
arXiv Detail & Related papers (2026-01-28T04:11:50Z) - MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning [73.27233666920618]
We propose MemSearcher, an agent workflow that iteratively maintains a compact memory and combines the current turn with it. At each turn, MemSearcher fuses the user's question with the memory to generate reasoning traces, perform search actions, and update memory to retain only information essential for solving the task. We introduce multi-context GRPO, an end-to-end RL framework that jointly optimizes the reasoning, search strategies, and memory management of MemSearcher agents.
arXiv Detail & Related papers (2025-11-04T18:27:39Z) - UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression [86.33995240043936]
UniGist is a sequence-level long-context compression framework for large language models. It efficiently preserves context information by replacing raw tokens with special compression tokens (gists) in a fine-grained manner. Our scheme also supports flexible inference by allowing the actual removal of compressed tokens, resulting in real-time memory savings.
arXiv Detail & Related papers (2025-09-19T08:47:37Z) - CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning [22.93037884068796]
Retrieval-Augmented Generation (RAG) has emerged as a promising approach to enhance the timeliness of knowledge updates and the factual accuracy of responses in large language models. Existing approaches to document compression tailored for RAG often degrade task performance. We propose CORE, a novel method for lossless context compression in RAG.
arXiv Detail & Related papers (2025-08-24T12:21:50Z) - A Universal Framework for Compressing Embeddings in CTR Prediction [68.27582084015044]
We introduce a Model-agnostic Embedding Compression (MEC) framework that compresses embedding tables by quantizing pre-trained embeddings. Our approach consists of two stages: first, we apply popularity-weighted regularization to balance code distribution between high- and low-frequency features. Experiments on three datasets reveal that our method reduces memory usage by over 50x while maintaining or improving recommendation performance.
arXiv Detail & Related papers (2025-02-21T10:12:34Z) - ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference [61.412894960600205]
Large Language Models (LLMs) require significant GPU memory when processing long texts. ChunkKV reimagines KV cache compression by treating semantic chunks as basic compression units. Result: ChunkKV outperforms state-of-the-art methods by up to 8.7% in precision.
arXiv Detail & Related papers (2025-02-01T03:49:47Z) - Generalized compression and compressive search of large datasets [0.0]
panCAKES is a novel approach to compressive search, i.e., a way to perform $k$-NN and $\rho$-NN search on compressed data. panCAKES assumes the manifold hypothesis and leverages the low-dimensional structure of the data to compress and search it efficiently. We benchmark panCAKES on a variety of datasets, including genomic, proteomic, and set data.
arXiv Detail & Related papers (2024-09-18T17:25:31Z) - FETCH: A Memory-Efficient Replay Approach for Continual Learning in Image Classification [7.29168794682254]
Class-incremental continual learning is an important area of research. Previous works achieved promising results using replay and compressed replay techniques. This work evaluates compressed replay within the GDumb pipeline.
arXiv Detail & Related papers (2024-07-17T07:54:03Z) - AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval [1.099532646524593]
We propose All-in-Storage ANNS with Product Quantization (AiSAQ), which offloads compressed vectors to the SSD index. Our method achieves $\sim$10 MB memory usage in query search with billion-scale datasets without critical latency degradation.
arXiv Detail & Related papers (2024-04-09T04:20:27Z) - Memory Replay with Data Compression for Continual Learning [80.95444077825852]
We propose memory replay with data compression to reduce the storage cost of old training samples.
We extensively validate this across several benchmarks of class-incremental learning and in a realistic scenario of object detection for autonomous driving.
arXiv Detail & Related papers (2022-02-14T10:26:23Z)
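The last entry above describes the goal-agnostic alternative that the main paper argues against: compressing old samples up front to cut replay storage. A minimal sketch of that idea, using zlib on pickled samples; the buffer policy, sample format, and capacity here are illustrative assumptions, not the cited paper's method.

```python
import pickle
import zlib

class CompressedReplayBuffer:
    """Store replay samples compressed; decompress on rehearsal."""

    def __init__(self, capacity=1000, level=6):
        self.capacity = capacity
        self.level = level          # zlib compression level (1-9)
        self.buffer = []            # list of compressed byte strings

    def add(self, sample):
        blob = zlib.compress(pickle.dumps(sample), self.level)
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)      # FIFO eviction of the oldest sample
        self.buffer.append(blob)

    def replay(self):
        """Decompress every stored sample for a rehearsal pass."""
        return [pickle.loads(zlib.decompress(b)) for b in self.buffer]

    def stored_bytes(self):
        return sum(len(b) for b in self.buffer)

buf = CompressedReplayBuffer(capacity=10)
for i in range(5):
    # Highly redundant "images" compress well, shrinking the buffer.
    buf.add({"label": i, "pixels": [0] * 4096})
print(len(buf.replay()), buf.stored_bytes())
```

Compression happens before any future task is known, which is exactly the lossy, goal-agnostic step the SUMER abstract contrasts with searching raw data on demand.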
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.