Fugu-MT Paper Translation (Abstract): An Entropy Clustering Approach for Assessing Visual Question Difficulty

Paper overview: An Entropy Clustering Approach for Assessing Visual Question Difficulty

arxiv url: http://arxiv.org/abs/2004.05595v3
Date: Fri, 2 Sep 2022 07:01:22 GMT
Status: Translation complete
Last updated in system: 2022-12-14 05:32:56.424338
Title: An Entropy Clustering Approach for Assessing Visual Question Difficulty
Authors: Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Shun'ichi Satoh
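Abstract summary: We analyze the difficulty of visual questions based on the behavior of multiple different VQA models. We use state-of-the-art methods to determine the accuracy and the entropy of the answer distributions for each cluster. Our approach can identify clusters of difficult visual questions that are not answered correctly by state-of-the-art methods.
Reference score (independently computed attention score): 0.0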
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a novel approach to identify the difficulty of visual questions for Visual Question Answering (VQA) without direct supervision or annotations of the difficulty. Prior works have considered the diversity of ground-truth answers of human annotators. In contrast, we analyze the difficulty of visual questions based on the behavior of multiple different VQA models. We propose to cluster the entropy values of the predicted answer distributions obtained by three different models: a baseline method that takes images and questions as input, and two variants that take images only and questions only as input. We use simple k-means to cluster the visual questions of the VQA v2 validation set. Then we use state-of-the-art methods to determine the accuracy and the entropy of the answer distributions for each cluster. A benefit of the proposed method is that no annotation of the difficulty is required, because the accuracy of each cluster reflects the difficulty of the visual questions that belong to it. Our approach can identify clusters of difficult visual questions that are not answered correctly by state-of-the-art methods. Detailed analysis on the VQA v2 dataset reveals that 1) all methods perform poorly on the most difficult cluster (about 10% accuracy), 2) as the cluster difficulty increases, the answers predicted by the different methods begin to differ, and 3) the values of cluster entropy are highly correlated with the cluster accuracy. We show that our approach has the advantage of being able to assess the difficulty of visual questions without ground truth (i.e., the test set of VQA v2) by assigning them to one of the clusters. We expect that this can stimulate the development of novel directions of research and new algorithms.
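
The pipeline described in the abstract (one entropy value per model, k-means over the entropy triples, then per-cluster accuracy and assignment of unlabeled questions) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names (`features`, `kmeans`, `cluster_accuracy`), the four-answer toy distributions, the correctness flags, the choice of two clusters, and the deterministic farthest-point initialization are all assumptions made for the example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def features(p_iq, p_i, p_q):
    """Entropy triple for one visual question, one value per model:
    image+question, image-only, question-only."""
    return (entropy(p_iq), entropy(p_i), entropy(p_q))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, max_iters=100):
    """Lloyd's k-means with a deterministic farthest-point initialization
    (an assumption made here so the example is reproducible)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(max_iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(x) / len(members) for x in zip(*members))
                if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def cluster_accuracy(labels, correct, k):
    """Mean correctness of the questions assigned to each cluster."""
    result = []
    for c in range(k):
        hits = [ok for ok, lab in zip(correct, labels) if lab == c]
        result.append(sum(hits) / len(hits) if hits else float("nan"))
    return result


# Toy answer distributions over four candidate answers (hypothetical data).
confident = [0.97, 0.01, 0.01, 0.01]
fairly_sure = [0.90, 0.05, 0.03, 0.02]
split = [0.30, 0.30, 0.20, 0.20]
uniform = [0.25, 0.25, 0.25, 0.25]

points = [
    features(confident, confident, confident),
    features(confident, fairly_sure, confident),
    features(fairly_sure, confident, confident),
    features(uniform, uniform, uniform),
    features(uniform, split, uniform),
    features(split, uniform, uniform),
]
# Hypothetical flags: 1 if some evaluated VQA model answered correctly.
correct = [1, 1, 1, 0, 0, 1]

centroids, labels = kmeans(points, k=2)
accuracy = cluster_accuracy(labels, correct, k=2)

# A question without ground truth is assigned to its nearest cluster.
new_feature = features(uniform, uniform, split)
new_cluster = nearest(new_feature, centroids)
print(labels, accuracy, new_cluster)
```

In this sketch the low-entropy questions and the high-entropy questions fall into separate clusters, and the per-cluster accuracy then serves as the difficulty indicator. A question without ground truth inherits the difficulty of whichever cluster its entropy triple lies closest to.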

Related papers

Understanding Complexity in VideoQA via Visual Program Generation [31.207902042321006]
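We propose a data-driven approach to analyzing query complexity in Video Question Answering (VideoQA). We show experimentally that humans struggle to predict which questions are difficult for machine learning models. We extend the approach to generate complex questions automatically and construct a new benchmark that is 1.9 times harder than the popular NExT-QA.
Paper reference translation (metadata) (2025-05-19T17:55:14Z)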
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
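We propose a new learning paradigm for visual question generation with answer-awareness and region-reference. We develop a simple method that self-learns visual hints without introducing additional human annotations.
Paper reference translation (metadata) (2024-07-06T15:07:32Z)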
Exploring Question Decomposition for Zero-Shot VQA [99.32466439254821]
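We investigate question decomposition strategies for visual question answering. We show that naive application of model-written decompositions can hurt performance. We introduce a model-driven selective decomposition approach that re-examines predictions and corrects errors.
Paper reference translation (metadata) (2023-10-25T23:23:57Z)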
SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering [10.749155815447127]
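We propose a self-supervised counterfactual metric learning (SC-ML) method that focuses on image features. SC-ML adaptively selects question-relevant visual features and reduces the negative influence of question-irrelevant visual features.
Paper reference translation (metadata) (2023-04-04T09:05:11Z)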
FVQA 2.0: Introducing Adversarial Samples into Fact-based Visual Question Answering [18.89421715778728]
We propose FVQA 2.0 to address this imbalance. We show that systems trained on the original FVQA train set are vulnerable to adversarial samples.
Paper reference translation (metadata) (2023-03-19T16:07:42Z)
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
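We propose a new dataset, Knowledge-Routed Visual Question Reasoning, for evaluating VQA models. We generate question-answer pairs with controlled programs based on both the Visual Genome scene graph and an external knowledge base.
Paper reference translation (metadata) (2020-12-14T00:33:44Z)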
Contrast and Classify: Training Robust VQA Models [60.80627814762071]
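We propose a new training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses. We find that optimizing both losses, either alternately or jointly, is key to effective training.
Paper reference translation (metadata) (2020-10-13T00:23:59Z)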
Hierarchical Deep Multi-modal Network for Medical Visual Question Answering [25.633660028022195]
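We propose a hierarchical deep multi-modal network that analyzes and classifies end-user questions and queries. We integrate the QS model into the hierarchical deep multi-modal neural network to generate appropriate answers to queries about medical images.
Paper reference translation (metadata) (2020-09-27T07:24:41Z)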
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering [26.21870452615222]
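FVQA requires external knowledge beyond the visible content to answer questions about an image. How to capture question-oriented and information-complementary evidence is a key challenge in solving this problem. We propose a modality-aware heterogeneous graph convolutional network to capture the evidence from different layers that is most relevant to the given question.
Paper reference translation (metadata) (2020-06-16T11:03:37Z)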
C3VQG: Category Consistent Cyclic Visual Question Generation [51.339348810676896]
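Visual Question Generation (VQG) is the task of generating natural questions based on an image. In this paper, we use a variational autoencoder (VAE) to generate questions without ground-truth answers, exploiting various visual cues and concepts in the image. The proposed approach addresses two major shortcomings of existing VQG systems: (i) it minimizes the level of supervision, and (ii) it replaces generic questions with category-relevant generations.
Paper reference translation (metadata) (2020-05-15T20:25:03Z)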
Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing [20.117014315684287]
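We use a taxonomy of knowledge gaps (KGs) to tag questions with one or more types of KG. We then examine the skew in the distribution of questions for each KG. These new questions can be added to existing VQA datasets to increase question diversity and reduce the skew.
Paper reference translation (metadata) (2020-04-08T00:27:43Z)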
SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
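We show that state-of-the-art VQA models perform comparably on perception and reasoning questions but suffer from consistency problems. To address this shortcoming, we propose an approach called Sub-Question Importance-aware Network Tuning (SQuINT). We show that SQuINT improves model consistency by 5%, improves performance on reasoning questions in VQA, and also improves attention maps.
Paper reference translation (metadata) (2020-01-20T01:02:36Z)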

The related papers list is generated automatically from the titles and abstracts of papers on this site.
