TeleMem: Building Long-Term and Multimodal Memory for Agentic AI
- URL: http://arxiv.org/abs/2601.06037v4
- Date: Thu, 22 Jan 2026 13:48:29 GMT
- Title: TeleMem: Building Long-Term and Multimodal Memory for Agentic AI
- Authors: Chunliang Chen, Ming Guan, Xiao Lin, Jiaxu Li, Luxi Lin, Qiyi Wang, Xiangyu Chen, Jixiang Luo, Changzhi Sun, Dell Zhang, Xuelong Li,
- Abstract summary: Large language models (LLMs) excel at many NLP tasks but struggle to sustain long-term interactions due to limited attention over extended dialogue histories.<n>We propose TeleMem, a unified long-term and multimodal memory system that maintains coherent user profiles through narrative dynamic extraction.<n>TeleMem surpasses the state-of-the-art Mem0 baseline with 19% higher accuracy, 43% fewer tokens, and a 2.1x speedup on the ZH-4O long-term role-play gaming benchmark.
- Score: 43.36544433800511
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) excel at many NLP tasks but struggle to sustain long-term interactions due to limited attention over extended dialogue histories. Retrieval-augmented generation (RAG) mitigates this issue but lacks reliable mechanisms for updating or refining stored memories, leading to schema-driven hallucinations, inefficient write operations, and minimal support for multimodal reasoning.To address these challenges, we propose TeleMem, a unified long-term and multimodal memory system that maintains coherent user profiles through narrative dynamic extraction, ensuring that only dialogue-grounded information is preserved. TeleMem further introduces a structured writing pipeline that batches, retrieves, clusters, and consolidates memory entries, substantially improving storage efficiency, reducing token usage, and accelerating memory operations. Additionally, a multimodal memory module combined with ReAct-style reasoning equips the system with a closed-loop observe, think, and act process that enables accurate understanding of complex video content in long-term contexts. Experimental results show that TeleMem surpasses the state-of-the-art Mem0 baseline with 19% higher accuracy, 43% fewer tokens, and a 2.1x speedup on the ZH-4O long-term role-play gaming benchmark.
Related papers
- Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents [76.76004970226485]
Long-term memory is a critical capability for multimodal large language model (MLLM) agents.<n>Mem-Gallery is a new benchmark for evaluating multimodal long-term conversational memory in MLLM agents.
arXiv Detail & Related papers (2026-01-07T02:03:13Z) - MemVerse: Multimodal Memory for Lifelong Learning Agents [35.218549149012844]
We introduce MemVerse, a model-agnostic, plug-and-play memory framework.<n>MemVerse bridges fast parametric recall with hierarchical retrieval-based memory.<n>It enables scalable and adaptive multimodal intelligence.
arXiv Detail & Related papers (2025-12-03T10:06:14Z) - WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning [66.24870234484668]
We introduce WorldMM, a novel multimodal memory agent that constructs and retrieves from multiple complementary memories.<n>WorldMM significantly outperforms existing baselines across five long video question-answering benchmarks.
arXiv Detail & Related papers (2025-12-02T05:14:52Z) - SGMem: Sentence Graph Memory for Long-Term Conversational Agents [14.89396085814917]
We introduce SGMem (Sentence Graph Memory), which represents dialogue as sentence-level graphs within chunked units.<n>We show that SGMem consistently improves accuracy and outperforms strong baselines in long-term conversational question answering.
arXiv Detail & Related papers (2025-09-25T14:21:44Z) - InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions [104.90258030688256]
This project introduces disentangled streaming perception, reasoning, and memory mechanisms, enabling real-time interaction with streaming video and audio input.<n>This project simulates human-like cognition, enabling multimodal large language models to provide continuous and adaptive service over time.
arXiv Detail & Related papers (2024-12-12T18:58:30Z) - LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory [68.97819665784442]
We introduce LongMemEval, a benchmark designed to evaluate five core long-term memory abilities of chat assistants.<n>LongMemEval presents a significant challenge to existing long-term memory systems.<n>We present a unified framework that breaks down the long-term memory design into three stages: indexing, retrieval, and reading.
arXiv Detail & Related papers (2024-10-14T17:59:44Z) - SCM: Enhancing Large Language Model with Self-Controlled Memory Framework [54.33686574304374]
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information.<n>We propose the Self-Controlled Memory (SCM) framework to enhance the ability of LLMs to maintain long-term memory and recall relevant information.
arXiv Detail & Related papers (2023-04-26T07:25:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.