Fugu-MT Paper Translation (Abstract): A Sanity Check on Composed Image Retrieval

Paper overview: A Sanity Check on Composed Image Retrieval

arxiv url: http://arxiv.org/abs/2604.12904v1
Date: Tue, 14 Apr 2026 15:52:22 GMT
Status: Translation complete
Last updated in system: 2026-04-15 19:11:32.543889
Title: A Sanity Check on Composed Image Retrieval
Title (reference translation): A Validity Check on Composed Image Retrieval
Authors: Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang
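Abstract summary: Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that specifies the desired modification. FISD (Fully-Informed Semantically-Diverse benchmark) employs generative models to precisely control the variables of reference-target image pairs. The paper proposes a multi-round agentic evaluation framework to probe the potential of existing models in interactive scenarios.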
Reference score (independently computed attention level): 91.95275287747499
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image, and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is not well characterized by existing benchmarks, which inherently contain indeterminate queries degrading the evaluation (i.e., multiple candidate images, rather than solely the target image, meet the query criteria), and have not considered their effectiveness in the context of the multi-round system. Motivated by this, we consider improving the evaluation procedure from two aspects: 1) we introduce FISD, a Fully-Informed Semantically-Diverse benchmark, which employs generative models to precisely control the variables of reference-target image pairs, enabling a more accurate evaluation of CIR methods across six dimensions, without query ambiguity; 2) we propose an automatic multi-round agentic evaluation framework to probe the potential of the existing models in the interactive scenarios. By observing how models adapt and refine their choices over successive rounds of queries, this framework provides a more realistic appraisal of their efficacy in practical applications. Extensive experiments and comparisons prove the value of our novel evaluation on typical CIR methods.
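Abstract (reference translation): Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is not well characterized by existing benchmarks. These benchmarks inherently contain indeterminate queries that degrade the evaluation (i.e., multiple candidate images, not only the target image, satisfy the query criteria), and they do not consider the models' effectiveness in the context of a multi-round system. Motivated by this, the authors consider improving the evaluation procedure from two aspects. 1) They introduce FISD, a Fully-Informed Semantically-Diverse benchmark, which employs generative models to precisely control the variables of reference-target image pairs, enabling a more accurate evaluation of CIR methods across six dimensions without query ambiguity. 2) They propose an automatic multi-round agentic evaluation framework to probe the potential of existing models in interactive scenarios. By observing how models adapt and refine their choices over successive rounds of queries, this framework provides a more realistic appraisal of their efficacy in practical applications. Extensive experiments and comparisons demonstrate the value of the new evaluation on typical CIR methods.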

Related papers