Fugu-MT Paper Translation (Abstract): Retrieving Counterfactuals Improves Visual In-Context Learning

Paper summary: Retrieving Counterfactuals Improves Visual In-Context Learning

arxiv url: http://arxiv.org/abs/2603.16737v1
Date: Tue, 17 Mar 2026 16:18:09 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:07.407536
Title: Retrieving Counterfactuals Improves Visual In-Context Learning
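Title (reference translation): Retrieving Counterfactuals Improves Visual In-Context Learning
Authors: Guangzhi Xiong, Sanchit Sinha, Zhenghao He, Aidong Zhang
Abstract summary: In-context learning (ICL) offers a promising way for vision-language models to adapt to new tasks. Existing retrieval-augmented approaches rely on passive similarity-based retrieval. This paper introduces CIRCLES, a novel framework that actively constructs demonstration sets.
Reference score (independently computed attention score): 41.6338086518055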
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-language models (VLMs) have achieved impressive performance across a wide range of multimodal reasoning tasks, but they often struggle to disentangle fine-grained visual attributes and reason about underlying causal relationships. In-context learning (ICL) offers a promising avenue for VLMs to adapt to new tasks, but its effectiveness critically depends on the selection of demonstration examples. Existing retrieval-augmented approaches typically rely on passive similarity-based retrieval, which tends to select correlated but non-causal examples, amplifying spurious associations and limiting model robustness. We introduce CIRCLES (Composed Image Retrieval for Causal Learning Example Selection), a novel framework that actively constructs demonstration sets by retrieving counterfactual-style examples through targeted, attribute-guided composed image retrieval. By incorporating counterfactual-style examples, CIRCLES enables VLMs to implicitly reason about the causal relations between attributes and outcomes, moving beyond superficial correlations and fostering more robust and grounded reasoning. Comprehensive experiments on four diverse datasets demonstrate that CIRCLES consistently outperforms existing methods across multiple architectures, especially on small-scale models, with pronounced gains under information scarcity. Furthermore, CIRCLES retrieves more diverse and causally informative examples, providing qualitative insights into how models leverage in-context demonstrations for improved reasoning. Our code is available at https://github.com/gzxiong/CIRCLES.
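Abstract (reference translation): Vision-language models (VLMs) achieve impressive performance on a wide range of multimodal reasoning tasks, but they often struggle to disentangle fine-grained visual attributes and to reason about the underlying causal relationships. In-context learning (ICL) offers a promising way for VLMs to adapt to new tasks, but its effectiveness depends heavily on which demonstration examples are selected. Existing retrieval-augmented approaches typically rely on passive similarity-based retrieval, which tends to select correlated but non-causal examples, amplifying spurious associations and limiting model robustness. This paper introduces CIRCLES (Composed Image Retrieval for Causal Learning Example Selection), a framework that actively constructs demonstration sets by retrieving counterfactual-style examples through targeted, attribute-guided composed image retrieval. By incorporating counterfactual-style examples, CIRCLES lets VLMs reason implicitly about the causal relations between attributes and outcomes, moving beyond superficial correlations toward more robust and grounded reasoning. Comprehensive experiments on four diverse datasets show that CIRCLES consistently outperforms existing methods across multiple architectures, especially on small-scale models, with pronounced gains under information scarcity. In addition, CIRCLES retrieves more diverse and causally informative examples, providing qualitative insight into how models use in-context demonstrations to improve reasoning. The code is available at https://github.com/gzxiong/CIRCLES.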
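
The contrast the abstract draws between passive similarity-based retrieval and counterfactual-style retrieval can be illustrated with a toy sketch. This is not the CIRCLES implementation: the `Example` type, the attribute dictionaries, the hand-written embeddings, and all function names below are hypothetical stand-ins chosen for illustration, and real composed image retrieval would operate on images and attribute-modifying text rather than on pre-annotated attributes.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    name: str
    embedding: tuple   # hypothetical stand-in for an image embedding
    attributes: dict   # hypothetical attribute annotations
    label: str


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_similar(query, pool, k):
    """Passive retrieval: the k pool items closest to the query embedding."""
    ranked = sorted(pool, key=lambda ex: cosine(query.embedding, ex.embedding),
                    reverse=True)
    return ranked[:k]


def retrieve_counterfactual(query, pool, attribute, k):
    """Counterfactual-style retrieval (toy version): items that differ from the
    query on `attribute` but match it on every other annotated attribute."""
    def is_counterfactual(ex):
        differs = ex.attributes.get(attribute) != query.attributes.get(attribute)
        others_match = all(ex.attributes.get(key) == value
                           for key, value in query.attributes.items()
                           if key != attribute)
        return differs and others_match

    candidates = [ex for ex in pool if is_counterfactual(ex)]
    return retrieve_similar(query, candidates, k)


def build_demonstrations(query, pool, attribute, k_similar, k_counterfactual):
    """Combine similar examples with counterfactual-style ones, without repeats."""
    similar = retrieve_similar(query, pool, k_similar)
    counterfactual = [ex for ex in
                      retrieve_counterfactual(query, pool, attribute, k_counterfactual)
                      if ex not in similar]
    return similar + counterfactual


# Made-up data: two near-duplicates of the query and two items with a short beak.
QUERY = Example("q", (1.0, 0.0, 0.2), {"beak": "long", "color": "brown"}, "?")
POOL = [
    Example("a", (0.9, 0.1, 0.2), {"beak": "long", "color": "brown"}, "species_A"),
    Example("b", (0.95, 0.0, 0.25), {"beak": "long", "color": "brown"}, "species_A"),
    Example("c", (0.6, 0.6, 0.2), {"beak": "short", "color": "brown"}, "species_B"),
    Example("d", (0.1, 0.9, 0.3), {"beak": "short", "color": "grey"}, "species_B"),
]

DEMOS = build_demonstrations(QUERY, POOL, "beak", k_similar=2, k_counterfactual=1)
print([(ex.name, ex.label) for ex in DEMOS])
```

In this toy data, similarity alone returns only the two `species_A` near-duplicates, so the demonstrations never show what happens when the beak changes. Adding the counterfactual-style item `c`, which differs from the query only in `beak`, puts a contrasting label in the context, which is the kind of signal the abstract argues helps a model move past spurious correlations.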

Related papers