Fugu-MT Paper Translation (Summary): LMEB: Long-horizon Memory Embedding Benchmark

Paper summary: LMEB: Long-horizon Memory Embedding Benchmark

arxiv url: http://arxiv.org/abs/2603.12572v1
Date: Fri, 13 Mar 2026 02:09:57 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:11.845123
Title: LMEB: Long-horizon Memory Embedding Benchmark
Title (reference translation): LMEB: Long-Horizon Memory Embedding Benchmark
Authors: Xinping Zhao, Xinshuo Hu, Jiaxin Xu, Danyu Tang, Xin Zhang, Mengjia Zhou, Yan Zhong, Yao Zhou, Zifei Shan, Meishan Zhang, Baotian Hu, Min Zhang
Abstract summary: We introduce the Long-Horizon Memory Embedding Benchmark (LMEB), a comprehensive framework for evaluating the capabilities of embedding models. LMEB spans 22 datasets and 193 zero-shot retrieval tasks across 4 memory types. We evaluate 15 widely used embedding models ranging from hundreds of millions to ten billion parameters.
Reference score (site-computed attention score): 49.57481835614834
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Memory embeddings are crucial for memory-augmented systems, such as OpenClaw, but their evaluation is underexplored in current text embedding benchmarks, which narrowly focus on traditional passage retrieval and fail to assess models' ability to handle long-horizon memory retrieval tasks involving fragmented, context-dependent, and temporally distant information. To address this, we introduce the Long-horizon Memory Embedding Benchmark (LMEB), a comprehensive framework that evaluates embedding models' capabilities in handling complex, long-horizon memory retrieval tasks. LMEB spans 22 datasets and 193 zero-shot retrieval tasks across 4 memory types: episodic, dialogue, semantic, and procedural, with both AI-generated and human-annotated data. These memory types differ in terms of level of abstraction and temporal dependency, capturing distinct aspects of memory retrieval that reflect the diverse challenges of the real world. We evaluate 15 widely used embedding models, ranging from hundreds of millions to ten billion parameters. The results reveal that (1) LMEB provides a reasonable level of difficulty; (2) Larger models do not always perform better; (3) LMEB and MTEB exhibit orthogonality. This suggests that the field has yet to converge on a universal model capable of excelling across all memory retrieval tasks, and that performance in traditional passage retrieval may not generalize to long-horizon memory retrieval. In summary, by providing a standardized and reproducible evaluation framework, LMEB fills a crucial gap in memory embedding evaluation, driving further advancements in text embedding for handling long-term, context-dependent memory retrieval. LMEB is available at https://github.com/KaLM-Embedding/LMEB.
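Abstract (reference translation): Memory embeddings are important for memory-augmented systems such as OpenClaw, but their evaluation is underexplored in current text embedding benchmarks. To address this, we introduce the Long-Horizon Memory Embedding Benchmark (LMEB), a comprehensive framework that evaluates embedding models' capabilities in handling complex, long-horizon memory retrieval tasks. LMEB comprises 22 datasets and 193 zero-shot retrieval tasks spanning four memory types (episodic, dialogue, semantic, and procedural), with both AI-generated and human-annotated data. These memory types differ in level of abstraction and temporal dependency, capturing distinct aspects of memory retrieval that reflect the diverse challenges of the real world. We evaluate 15 widely used embedding models ranging from hundreds of millions to ten billion parameters. The results show that (1) LMEB offers a reasonable level of difficulty, (2) larger models do not always perform better, and (3) LMEB and MTEB exhibit orthogonality. This suggests that the field has not yet converged on a universal model that excels across all memory retrieval tasks, and that performance on traditional passage retrieval may not generalize to long-horizon memory retrieval. In summary, by providing a standardized and reproducible evaluation framework, LMEB fills a crucial gap in memory embedding evaluation and drives further advances in text embeddings for long-term, context-dependent memory retrieval. LMEB is available at https://github.com/KaLM-Embedding/LMEB.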
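LMEB frames memory evaluation as zero-shot retrieval: an embedding model encodes a query and a pool of stored memories, and the memories are ranked by vector similarity. The sketch below illustrates that generic dense-retrieval setup using cosine similarity and recall@k. The function names (`cosine`, `rank_memories`, `recall_at_k`), the toy vectors, and the choice of recall@k as the metric are all assumptions made for illustration; this is not LMEB's actual evaluation code, and the benchmark's real metrics may differ.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_memories(query_vec: Sequence[float],
                  memory_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Hypothetical helper: return memory ids sorted by descending similarity to the query."""
    return sorted(memory_vecs,
                  key=lambda mid: cosine(query_vec, memory_vecs[mid]),
                  reverse=True)


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant memory ids that appear among the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for mid in ranked[:k] if mid in relevant)
    return hits / len(relevant)


if __name__ == "__main__":
    # Toy 3-d "embeddings" standing in for the output of a real embedding model.
    memories = {"m1": [1.0, 0.0, 0.0], "m2": [0.0, 1.0, 0.0], "m3": [0.7, 0.7, 0.0]}
    query = [0.9, 0.1, 0.0]
    ranked = rank_memories(query, memories)
    print(ranked)                                   # ['m1', 'm3', 'm2']
    print(recall_at_k(ranked, {"m1", "m2"}, k=2))   # 0.5
```

In a real run, the vectors would come from each of the evaluated embedding models, and scores would be averaged over every query in a task.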


Related papers