Fugu-MT Paper Translation (Abstract): Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation

Paper overview: Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation

arxiv url: http://arxiv.org/abs/2511.02358v1
Date: Tue, 04 Nov 2025 08:24:41 GMT
Status: Translation complete
Last updated in system: 2025-11-05 18:47:05.852625
Title: Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
Title (reference translation): Multimodal embedders can learn when to augment queries via adaptive query augmentation
Authors: Wongyu Kim, Hochang Lee, Sanghak Lee, Yoonsung Kim, Jaehyun Park,
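Abstract summary: This paper proposes M-Solomon, a universal multimodal embedder that can adaptively determine when to augment queries. We show that M-Solomon not only surpassed the baseline without augmentation but also outperformed the baseline that always used augmentation.
Reference score (independently computed attention level): 3.765602121469129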
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Query augmentation makes queries more meaningful by appending further information to them so that relevant documents can be found. Recent studies have proposed Large Language Model (LLM)-based embedders, which learn representations for embedding and generation for query augmentation in a multi-task manner by leveraging the generative capabilities of LLMs. During inference, these jointly trained embedders conduct query augmentation followed by embedding, showing effective results. However, augmenting every query leads to substantial embedding latency, and query augmentation can be detrimental to performance for some queries. Also, previous methods have not been explored in multimodal environments. To tackle these problems, we propose M-Solomon, a universal multimodal embedder that can adaptively determine when to augment queries. Our approach first divides the queries of the training datasets into two groups at the dataset level. One includes queries that require augmentation and the other includes queries that do not. Then, we introduce a synthesis process that generates appropriate augmentations for queries that require them by leveraging a powerful Multimodal LLM (MLLM). Next, we present adaptive query augmentation. Through this step, M-Solomon can conduct query augmentation only when necessary by learning to generate synthetic augmentations with the prefix /augment for queries that demand them and to generate the simple string /embed for others. Experimental results showed that M-Solomon not only surpassed the baseline without augmentation by a large margin but also outperformed the baseline that always used augmentation, while providing much lower embedding latency.
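Abstract (reference translation): Query augmentation makes queries more meaningful by adding further information to them in order to find relevant documents. Recent studies have proposed Large Language Model (LLM)-based embedders, which leverage the generative capabilities of LLMs to learn, in a multi-task manner, representations for embedding and generation for query augmentation. During inference, these jointly trained embedders perform query augmentation followed by embedding and have shown effective results. However, augmenting every query incurs substantial embedding latency, and query augmentation can hurt performance for some queries. In addition, previous methods have not been studied in multimodal environments. To address these problems, we propose M-Solomon. Our approach first divides the queries of the training datasets into two groups at the dataset level: one contains queries that require augmentation, and the other contains queries that do not. We then propose a synthesis process that generates appropriate augmentations for the queries that require them by leveraging a powerful Multimodal LLM (MLLM). Next, we propose adaptive query augmentation. Through this step, M-Solomon can perform query augmentation only when necessary, by learning to generate synthetic augmentations with the prefix /augment for queries that demand them and to generate the simple string /embed for the others. Experimental results show that M-Solomon not only surpassed the baseline without augmentation but also outperformed the baseline that always used augmentation, with much lower embedding latency.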
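
The inference-time routing described in the abstract can be sketched roughly as follows. This is a minimal, text-only illustration and not the authors' implementation: the function names (`build_embedding_input`, `adaptive_embed`), the toy generator and embedder, and the fallback to the unmodified query on unexpected output are all assumptions made for the example. Only the `/augment` prefix, the `/embed` string, and the idea of appending the augmentation to the query come from the abstract.

```python
from typing import Callable, List

AUGMENT_PREFIX = "/augment"  # prefix named in the abstract
EMBED_TOKEN = "/embed"       # string named in the abstract


def build_embedding_input(query: str, generated: str) -> str:
    """Decide what text to embed from the model's generated control string.

    Hypothetical helper: the abstract does not specify how the generated
    augmentation is combined with the query, so we simply append it.
    """
    text = generated.strip()
    if text.startswith(AUGMENT_PREFIX):
        augmentation = text[len(AUGMENT_PREFIX):].strip()
        # Append the augmentation only if the model actually produced one.
        return f"{query} {augmentation}" if augmentation else query
    # "/embed" (or, as an assumption, any unexpected output): no augmentation.
    return query


def adaptive_embed(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], List[float]],
) -> List[float]:
    """Generate a control string first, then embed the (possibly augmented) query."""
    return embed(build_embedding_input(query, generate(query)))


# Toy stand-ins for the jointly trained model (assumptions, for illustration only).
def toy_generate(query: str) -> str:
    # Pretend that very short queries benefit from augmentation.
    if len(query.split()) < 3:
        return "/augment additional descriptive context"
    return EMBED_TOKEN


def toy_embed(text: str) -> List[float]:
    # Not a real embedding: just the character length as a 1-d vector.
    return [float(len(text))]


if __name__ == "__main__":
    print(adaptive_embed("tiger", toy_generate, toy_embed))
    print(adaptive_embed("photo of a tiger in snow", toy_generate, toy_embed))
```

In this sketch a query routed to `/embed` skips the generation of a long augmentation entirely, which is the source of the latency saving the abstract reports; image inputs, which the multimodal setting would require, are left out for brevity.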

Related papers