Fugu-MT paper translation (abstract): Closing the Indexing-Decoding Gap in Multimodal Generative Retrieval via Prefix Retention Optimization

Paper overview: Closing the Indexing-Decoding Gap in Multimodal Generative Retrieval via Prefix Retention Optimization

arxiv url: http://arxiv.org/abs/2606.09241v2
Date: Tue, 09 Jun 2026 08:19:13 GMT
Status: translation complete
Last updated in system: 2026-06-10 13:21:50.814077
Title: Closing the Indexing-Decoding Gap in Multimodal Generative Retrieval via Prefix Retention Optimization
Title (reference translation): Closing the Indexing-Decoding Gap in Multimodal Generative Retrieval via Prefix Retention Optimization
Authors: Yufei Chen, Zihan Wang, Yubao Tang, Yukun Zhao, Maarten de Rijke, Zhaochun Ren
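Abstract summary: Multimodal generative retrieval formulates multimodal retrieval as discrete identifier generation, eliminating the need for explicit similarity search over external embeddings. Existing methods construct identifiers via residual quantization and decode them with trie-constrained beam search. This combination introduces an indexing-decoding gap: identifier learning objectives, including reconstruction and contrastive losses, do not explicitly enforce prefix discriminability during decoding. The proposed PRO framework addresses this gap with mechanisms including (i) prefix ranking distillation, which aligns quantized prefix rankings with those induced by pre-quantization embeddings using a listwise loss, and (ii) vocabulary scheduling, which increases codebook sizes from shallow to deep residual quantization levels.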
Reference score (independently computed attention score): 68.48718919047127
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal generative retrieval formulates multimodal retrieval as discrete identifier generation, eliminating the need for explicit similarity search over external embeddings. Existing approaches construct identifiers via residual quantization and decode them with trie-constrained beam search. This combination introduces an indexing-decoding gap: identifier learning objectives, including reconstruction and contrastive losses, do not explicitly enforce prefix discriminability during decoding. As a result, even well-optimized identifiers can be irreversibly pruned early in beam search due to low-rank prefixes. We theoretically characterize this gap and derive a survival bound that relates prefix retention to three controllable factors in indexing and decoding. Building on this bound, we propose PRO, prefix retention optimization, a unified framework comprising three mechanisms: (i) prefix ranking distillation aligns quantized prefix rankings with those induced by pre-quantization embeddings using a listwise loss; (ii) vocabulary scheduling increases codebook sizes from shallow to deep residual quantization levels to reduce early competition from non-target prefixes; and (iii) geometric score fusion vectorizes each candidate prefix and incorporates its similarity to the query into beam search scoring, further reducing the indexing-decoding mismatch. Experiments on nine multimodal retrieval tasks show that PRO improves retention of target identifier prefixes and outperforms existing multimodal generative retrieval baselines.
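Abstract (reference translation): Multimodal generative retrieval formulates multimodal retrieval as discrete identifier generation, removing the need for explicit similarity search over external embeddings. Existing methods construct identifiers via residual quantization and decode them with trie-constrained beam search. This combination introduces an indexing-decoding gap: identifier learning objectives, including reconstruction and contrastive losses, do not explicitly enforce prefix discriminability during decoding. As a result, even well-optimized identifiers can be irreversibly pruned early in beam search because of low-rank prefixes. We theoretically characterize this gap and derive a survival bound that relates prefix retention to three controllable factors in indexing and decoding. Building on this bound, we propose PRO (prefix retention optimization), a unified framework comprising three mechanisms: (i) prefix ranking distillation aligns quantized prefix rankings with those induced by pre-quantization embeddings using a listwise loss; (ii) vocabulary scheduling increases codebook sizes from shallow to deep residual quantization levels to reduce early competition from non-target prefixes; and (iii) geometric score fusion vectorizes each candidate prefix and incorporates its similarity to the query into beam search scoring, further reducing the indexing-decoding mismatch. Experiments on nine multimodal retrieval tasks show that PRO improves retention of target identifier prefixes and outperforms existing multimodal generative retrieval baselines.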
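The pruning failure described in the abstract can be illustrated with a small toy example. The sketch below is not the paper's implementation: the identifier set, the probability table `LEVEL_PROBS`, and the helper names `build_trie`, `beam_search`, and `target_survives` are all hypothetical and chosen only for illustration. It shows a two-level identifier whose full-sequence score is the highest in the corpus, yet whose first-level prefix ranks third, so a beam of size 2 discards it before the second level is ever scored.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Identifier = Tuple[int, ...]
Beam = List[Tuple[Identifier, float]]

# Hypothetical corpus of two-level identifiers (e.g. residual quantization codes).
IDENTIFIERS: List[Identifier] = [
    (0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (2, 0),
]

# Hypothetical per-step model probabilities, keyed by the prefix decoded so far.
LEVEL_PROBS: Dict[Identifier, Dict[int, float]] = {
    (): {0: 0.40, 1: 0.35, 2: 0.25},
    (0,): {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25},
    (1,): {0: 0.50, 1: 0.50},
    (2,): {0: 0.90},
}


def build_trie(identifiers: List[Identifier]) -> Dict[Identifier, Set[int]]:
    """Map every valid prefix to the set of codes allowed to follow it."""
    trie: Dict[Identifier, Set[int]] = {}
    for ident in identifiers:
        for depth in range(len(ident)):
            trie.setdefault(ident[:depth], set()).add(ident[depth])
    return trie


def step_logprob(prefix: Identifier, code: int) -> float:
    """Log-probability of emitting `code` after `prefix` in the toy table."""
    return math.log(LEVEL_PROBS[prefix][code])


def beam_search(
    trie: Dict[Identifier, Set[int]],
    score_fn: Callable[[Identifier, int], float],
    depth: int,
    beam_size: int,
) -> List[Beam]:
    """Trie-constrained beam search; returns the beam kept at each level."""
    beams: Beam = [((), 0.0)]
    history: List[Beam] = []
    for _ in range(depth):
        candidates: Beam = []
        for prefix, score in beams:
            # Only codes that extend the prefix to a valid identifier are expanded.
            for code in sorted(trie.get(prefix, ())):
                candidates.append((prefix + (code,), score + score_fn(prefix, code)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]  # pruning is irreversible
        history.append(beams)
    return history


def target_survives(history: List[Beam], target: Identifier) -> bool:
    """True if every prefix of `target` was kept in the beam at its level."""
    return all(
        target[: level + 1] in {prefix for prefix, _ in beams}
        for level, beams in enumerate(history)
    )


if __name__ == "__main__":
    trie = build_trie(IDENTIFIERS)
    target = (2, 0)  # full-sequence probability 0.25 * 0.90 = 0.225, the highest
    for k in (2, 3):
        history = beam_search(trie, step_logprob, depth=2, beam_size=k)
        print(k, target_survives(history, target), history[-1][0][0])
```

In this toy setup the target is lost with a beam of 2 and recovered, ranked first, with a beam of 3. Widening the beam is only one remedy; the mechanisms listed in the abstract instead aim to raise the rank of the target prefix at shallow levels, which this sketch does not attempt to model.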

Related papers