Fugu-MT paper translation (abstract): MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

Paper summary: MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

arxiv url: http://arxiv.org/abs/2605.06132v2
Date: Thu, 14 May 2026 06:42:45 GMT
Status: Translation complete
Last updated in system: 2026-05-16 03:05:58.789154
Title: MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval
Title (reference translation): MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval
Authors: Chunyu Li, Mengyuan Zhang, Jingyi Kang, Ding Chen, Jiajun Shen, Bo Tang, Xuanhe Zhou, Feiyu Xiong, Zhiyu Li
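Abstract summary: This report introduces MemReranker, a reranking model family (0.6B/4B) built on Qwen3-Reranker through multi-stage LLM knowledge distillation. On the memory retrieval benchmark, MemReranker-0.6B substantially outperforms BGE-Reranker and matches open-source 4B/8B models as well as GPT-4o-mini on key metrics. MemReranker-4B further achieves 0.737 MAP, with several metrics on par with Gemini-3-Flash, while keeping inference latency at only 10-20% of that of large models.
Reference score (independently computed attention score): 37.54115468116941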
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In agent memory systems, the reranking model serves as the critical bridge connecting user queries with long-term memory. Most systems adopt the "retrieve-then-rerank" two-stage paradigm, but generic reranking models rely on semantic similarity matching and lack genuine reasoning capabilities, leading to a problem where recalled results are semantically highly relevant yet do not contain the key information needed to answer the question. This deficiency manifests in memory scenarios as three specific problems. First, relevance scores are miscalibrated, making threshold-based filtering difficult. Second, ranking degrades when facing temporal constraints, causal reasoning, and other complex queries. Third, the model cannot leverage dialogue context for semantic disambiguation. This report introduces MemReranker, a reranking model family (0.6B/4B) built on Qwen3-Reranker through multi-stage LLM knowledge distillation. Multi-teacher pairwise comparisons generate calibrated soft labels, BCE pointwise distillation establishes well-distributed scores, and InfoNCE contrastive learning enhances hard-sample discrimination. Training data combines general corpora with memory-specific multi-turn dialogue data covering temporal constraints, causal reasoning, and coreference resolution. On the memory retrieval benchmark, MemReranker-0.6B substantially outperforms BGE-Reranker and matches open-source 4B/8B models as well as GPT-4o-mini on key metrics. MemReranker-4B further achieves 0.737 MAP, with several metrics on par with Gemini-3-Flash, while maintaining inference latency at only 10-20% of large models. On finance and healthcare vertical-domain benchmarks, the models preserve generalization capabilities on par with mainstream large-parameter rerankers.
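
Illustrative sketch (not from the paper): the abstract names three training ingredients, namely soft labels from multi-teacher pairwise comparisons, BCE pointwise distillation, and InfoNCE contrastive learning. The Python below shows one plausible reading of each in isolation. The win-rate aggregation, the temperature value, and all function names are assumptions made for illustration; the abstract does not specify how the authors aggregate teacher votes, weight the losses, or order the training stages.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_labels_from_pairwise(n_docs, comparisons):
    """Turn pooled teacher votes into per-document soft labels.

    comparisons: list of (winner_idx, loser_idx) pairs collected from
    all teachers. Assumption: the soft label is the document's win rate;
    documents that were never compared get a neutral 0.5.
    """
    wins = [0] * n_docs
    seen = [0] * n_docs
    for winner, loser in comparisons:
        wins[winner] += 1
        seen[winner] += 1
        seen[loser] += 1
    return [wins[i] / seen[i] if seen[i] else 0.5 for i in range(n_docs)]


def bce_distill_loss(logits, soft_labels):
    """Mean binary cross-entropy between student scores and soft labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, soft_labels):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
    return total / len(logits)


def info_nce_loss(pos_logit, neg_logits, temperature=0.1):
    """InfoNCE for one query: -log softmax of the positive over all candidates.

    temperature=0.1 is an arbitrary illustrative default.
    """
    scaled = [pos_logit / temperature] + [z / temperature for z in neg_logits]
    m = max(scaled)  # log-sum-exp trick for stability
    log_denom = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_denom - scaled[0]


# Three candidate memories for one query; four pooled teacher votes.
votes = [(0, 1), (0, 2), (1, 2), (0, 1)]
labels = soft_labels_from_pairwise(3, votes)
student_logits = [2.0, -0.5, -3.0]
print(labels)
print(bce_distill_loss(student_logits, labels))
print(info_nce_loss(student_logits[0], student_logits[1:]))
```

Under these assumptions, the BCE term pushes the student's sigmoid scores toward graded targets rather than hard 0/1 labels, which is one way scores could stay usable for threshold-based filtering, while the InfoNCE term only cares about the positive outranking the hard negatives.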
