Fugu-MT Paper Translation (Summary): Knowledge-Based Counterfactual Queries for Visual Question Answering

Paper summary: Knowledge-Based Counterfactual Queries for Visual Question Answering

arxiv url: http://arxiv.org/abs/2303.02601v1
Date: Sun, 5 Mar 2023 08:00:30 GMT
Status: Translation complete
System update date: 2023-03-07 18:42:41.370895
Title: Knowledge-Based Counterfactual Queries for Visual Question Answering
Title (reference translation): Knowledge-Based Counterfactual Queries for Visual Question Answering
Authors: Theodoti Stoikou, Maria Lymperaiou, Giorgos Stamou
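Abstract summary: This paper proposes a systematic method for explaining the behavior of VQA models. To this end, we exploit structured knowledge bases to perform deterministic, optimal, and controllable word-level replacements targeting the linguistic modality. We then evaluate the model's response to such counterfactual inputs.
Reference score (independently computed attention score): 0.0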
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Visual Question Answering (VQA) has been a popular task that combines vision and language, with numerous relevant implementations in literature. Even though there are some attempts that approach explainability and robustness issues in VQA models, very few of them employ counterfactuals as a means of probing such challenges in a model-agnostic way. In this work, we propose a systematic method for explaining the behavior and investigating the robustness of VQA models through counterfactual perturbations. For this reason, we exploit structured knowledge bases to perform deterministic, optimal and controllable word-level replacements targeting the linguistic modality, and we then evaluate the model's response against such counterfactual inputs. Finally, we qualitatively extract local and global explanations based on counterfactual responses, which are ultimately proven insightful towards interpreting VQA model behaviors. By performing a variety of perturbation types, targeting different parts of speech of the input question, we gain insights to the reasoning of the model, through the comparison of its responses in different adversarial circumstances. Overall, we reveal possible biases in the decision-making process of the model, as well as expected and unexpected patterns, which impact its performance quantitatively and qualitatively, as indicated by our analysis.
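Abstract (reference translation): Visual Question Answering (VQA) is a popular task that combines vision and language, with many related implementations in the literature. Although there have been some attempts to address explainability and robustness issues in VQA models, very few use counterfactuals as a means of probing such challenges in a model-agnostic way. In this work, we propose a systematic method for explaining the behavior of VQA models and investigating their robustness through counterfactual perturbations. To this end, we exploit structured knowledge bases to perform deterministic, optimal, and controllable word-level replacements targeting the linguistic modality, and then evaluate the model's response to such counterfactual inputs. Finally, we qualitatively extract local and global explanations based on the counterfactual responses, which prove insightful for interpreting VQA model behavior. By performing a variety of perturbation types that target different parts of speech of the input question, we gain insight into the model's reasoning by comparing its responses under different adversarial circumstances. Overall, we reveal possible biases in the model's decision-making process, as well as expected and unexpected patterns, which affect its performance both quantitatively and qualitatively.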
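The word-level replacement step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `HYPERNYM` taxonomy is a tiny hand-written stand-in for a structured knowledge base, "optimal" is read as "smallest path distance in that taxonomy", ties are broken alphabetically to keep the choice deterministic, and `model` is a hypothetical callable that maps an image and a question to an answer string.

```python
import re
from typing import Any, Callable, Dict, List, Optional

# Hypothetical toy taxonomy (child -> hypernym). It stands in for a real
# structured knowledge base and is NOT taken from the paper.
HYPERNYM: Dict[str, str] = {
    "dog": "mammal", "cat": "mammal", "horse": "mammal",
    "mammal": "animal", "bird": "animal", "animal": "entity",
    "car": "vehicle", "bus": "vehicle", "vehicle": "entity",
}


def path_to_root(word: str) -> List[str]:
    """Return the chain word -> hypernym -> ... -> root."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def distance(a: str, b: str) -> Optional[int]:
    """Count edges between a and b through their lowest common ancestor."""
    steps_from_b = {node: i for i, node in enumerate(path_to_root(b))}
    for i, node in enumerate(path_to_root(a)):
        if node in steps_from_b:
            return i + steps_from_b[node]
    return None  # no shared ancestor, e.g. the word is not in the taxonomy


def closest_substitute(word: str) -> Optional[str]:
    """Pick the nearest other leaf concept; ties are broken alphabetically."""
    parents = set(HYPERNYM.values())
    scored = []
    for candidate in HYPERNYM:
        if candidate in parents or candidate == word:
            continue  # keep only leaf concepts other than the word itself
        d = distance(word, candidate)
        if d is not None:
            scored.append((d, candidate))
    return min(scored)[1] if scored else None


def counterfactual_question(question: str, target: str) -> Optional[str]:
    """Replace whole-word occurrences of target with its closest substitute."""
    substitute = closest_substitute(target)
    if substitute is None:
        return None
    return re.sub(rf"\b{re.escape(target)}\b", substitute, question)


def probe(model: Callable[[Any, str], str], image: Any,
          question: str, target: str) -> Dict[str, Any]:
    """Query the (hypothetical) model on the original and perturbed question."""
    perturbed = counterfactual_question(question, target)
    if perturbed is None:
        raise ValueError(f"no substitute found for {target!r}")
    original_answer = model(image, question)
    perturbed_answer = model(image, perturbed)
    return {
        "counterfactual_question": perturbed,
        "original_answer": original_answer,
        "counterfactual_answer": perturbed_answer,
        "answer_changed": original_answer != perturbed_answer,
    }
```

With this toy taxonomy, `counterfactual_question("What color is the dog?", "dog")` yields "What color is the cat?". Restricting candidates to leaf concepts is one way to keep the replacement controllable; a real knowledge base would also need word-sense and part-of-speech handling, which this sketch omits.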

Related papers

CounterVQA: Evaluating and Improving Counterfactual Reasoning in Vision-Language Models for Video Understanding [13.628041236679229]
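Vision-language models (VLMs) have recently shown significant progress in video understanding. We introduce CounterVQA, a video-based benchmark. We develop CFGPT, a post-training method that enhances a model's visual counterfactual reasoning ability by distilling its counterfactual reasoning capability from the language modality.
Paper reference translation (metadata) (2025-11-25T04:59:55Z)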
Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
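Benchmarks suffer from a variety of biases, artifacts, and leakage. Models may behave unreliably because of poorly explored failure modes. Causality offers an ideal framework for addressing these challenges systematically.
Paper reference translation (metadata) (2025-02-07T17:01:37Z)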
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions [75.45274978665684]
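Vision-Language Understanding (VLU) benchmarks contain samples whose answers rely on assumptions that the provided context does not support. We collect contextual data for each sample and train a context selection module to facilitate evidence-based model predictions. We develop a general-purpose context-aware detector that identifies samples lacking sufficient context and improves model accuracy.
Paper reference translation (metadata) (2024-05-18T02:21:32Z)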
Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering [58.64831511644917]
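We propose an interpretable-by-design model that decomposes model decisions into intermediate, human-legible explanations. We show that our inherently interpretable system can improve by 4.64% over a comparable black-box system on reasoning-focused questions.
Paper reference translation (metadata) (2023-05-24T08:33:15Z)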
Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions [70.70725223310401]
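This work proposes a new method that uses semantically related questions, called basic questions, to evaluate the robustness of VQA models. Experiments show that the proposed method effectively analyzes the robustness of VQA models.
Paper reference translation (metadata) (2023-04-06T15:32:35Z)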
Logical Implications for Visual Question Answering Consistency [2.005299372367689]
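We propose a new consistency loss term that is broadly applicable to VQA models. We propose inferring logical relations with a dedicated language model and using them in a consistency loss function. We conduct extensive experiments on the VQA-Introspect and DME datasets and show that our method improves state-of-the-art VQA models.
Paper reference translation (metadata) (2023-03-16T16:00:18Z)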
COIN: Counterfactual Image Generation for VQA Interpretation [5.994412766684842]
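We propose an interpretability approach for VQA models based on generating counterfactual images. Beyond interpreting a VQA model's result on a single image, the obtained results and discussion provide a broad explanation of the VQA model's behavior.
Paper reference translation (metadata) (2022-01-10T13:51:35Z)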
Latent Variable Models for Visual Question Answering [34.9601948665926]
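We propose latent variable models for visual question answering. Extra information (e.g., captions and answer categories) is incorporated as latent variables to improve inference. Experiments on the VQA v2.0 benchmark dataset demonstrate the effectiveness of the proposed model.
Paper reference translation (metadata) (2021-01-16T08:21:43Z)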
Learning from Lexical Perturbations for Consistent Visual Question Answering [78.21912474223926]
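Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations. We propose a new approach based on modular networks, which creates two questions related by linguistic perturbations. We also present VQA Perturbed Pairings (VQA P2).
Paper reference translation (metadata) (2020-11-26T17:38:03Z)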
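Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View [129.392671317356]
We propose to understand the language prior problem in VQA from a class-imbalance view. This clearly shows why VQA models tend to produce frequent yet obviously wrong answers. We also justify the validity of the class-imbalance interpretation scheme on other computer vision tasks such as face recognition and image classification.
Paper reference translation (metadata) (2020-10-30T00:57:17Z)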
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering [58.30291671877342]
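MUTANT is a training paradigm that exposes the model to perceptually similar yet semantically distinct mutations of the input. MUTANT establishes a new state-of-the-art accuracy on VQA-CP, with a 10.57% improvement.
Paper reference translation (metadata) (2020-09-18T00:22:54Z)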

The related papers list is generated automatically from the titles and abstracts of papers on this site.
