Fugu-MT Paper Translation (Summary): Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering

Paper summary: Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering

arxiv url: http://arxiv.org/abs/2103.05568v1
Date: Tue, 9 Mar 2021 17:19:50 GMT
Status: translation complete
Last updated in system: 2021-03-10 15:08:01.839391
Title: Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering
Title (reference translation): Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering
Authors: Aman Jain, Mayank Kothyari, Vishwajeet Kumar, Preethi Jyothi, Ganesh Ramakrishnan, Soumen Chakrabarti
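Abstract summary: Multimodal IR spanning text corpora, knowledge graphs, and images has attracted much recent interest. A surprisingly large fraction of queries do not assess the ability to integrate cross-modal information. We identify a key structural idiom in OKVQA, viz., S3, and build a new dataset and challenge around it.
Reference score (attention score computed independently by this site): 35.855792706139525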
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal IR, spanning text corpus, knowledge graph and images, called outside knowledge visual question answering (OKVQA), is of much recent interest. However, the popular data set has serious limitations. A surprisingly large fraction of queries do not assess the ability to integrate cross-modal information. Instead, some are independent of the image, some depend on speculation, some require OCR or are otherwise answerable from the image alone. To add to the above limitations, frequency-based guessing is very effective because of (unintended) widespread answer overlaps between the train and test folds. Overall, it is hard to determine when state-of-the-art systems exploit these weaknesses rather than really infer the answers, because they are opaque and their 'reasoning' process is uninterpretable. An equally important limitation is that the dataset is designed for the quantitative assessment only of the end-to-end answer retrieval task, with no provision for assessing the correct (semantic) interpretation of the input query. In response, we identify a key structural idiom in OKVQA, viz., S3 (select, substitute and search), and build a new data set and challenge around it. Specifically, the questioner identifies an entity in the image and asks a question involving that entity which can be answered only by consulting a knowledge graph or corpus passage mentioning the entity. Our challenge consists of (i) OKVQAS3, a subset of OKVQA annotated based on the structural idiom and (ii) S3VQA, a new dataset built from scratch. We also present a neural but structurally transparent OKVQA system, S3, that explicitly addresses our challenge dataset, and outperforms recent competitive baselines.
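The abstract's point about frequency-based guessing can be made concrete with a small sketch. The code below is an illustration only, not the paper's evaluation protocol: it measures how many test answers already appear among the training answers, and how well a guesser does if it ignores both image and question and always predicts the most frequent training answer. The function names and the toy answer lists are hypothetical.

```python
from collections import Counter


def answer_overlap(train_answers, test_answers):
    """Fraction of test answers that also occur somewhere in the train fold."""
    seen = set(train_answers)
    return sum(a in seen for a in test_answers) / len(test_answers)


def majority_guess_accuracy(train_answers, test_answers):
    """Accuracy of always predicting the most frequent train answer."""
    guess, _count = Counter(train_answers).most_common(1)[0]
    return sum(a == guess for a in test_answers) / len(test_answers)


# Toy data (hypothetical, not taken from OKVQA).
train = ["surfing", "wood", "surfing", "pizza"]
test = ["surfing", "wood", "tennis", "surfing"]

overlap = answer_overlap(train, test)        # 3 of 4 test answers seen in train
acc = majority_guess_accuracy(train, test)   # "surfing" matches 2 of 4
print(overlap, acc)
```

A high overlap together with a non-trivial majority-guess accuracy is the kind of signal that would suggest a benchmark rewards memorizing answer frequencies rather than integrating image and knowledge.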
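Abstract (reference translation): Multimodal IR spanning text corpora, knowledge graphs, and images, known as outside knowledge visual question answering (OKVQA), has attracted much recent interest. However, the popular dataset has serious limitations. A surprisingly large fraction of queries do not assess the ability to integrate cross-modal information. Instead, some are independent of the image, some depend on speculation, and some require OCR or are otherwise answerable from the image alone. In addition to these limitations, frequency-based guessing is very effective because of (unintended) widespread answer overlap between the train and test folds. Overall, it is hard to determine when state-of-the-art systems exploit these weaknesses rather than actually infer the answers, because the systems are opaque and their reasoning process is uninterpretable. An equally important limitation is that the dataset is designed only for quantitative assessment of the end-to-end answer retrieval task, with no provision for assessing the correct (semantic) interpretation of the input query. In response, we identify a key structural idiom in OKVQA, viz., S3 (select, substitute, and search), and build a new dataset and challenge around it. Specifically, the questioner identifies an entity in the image and asks a question involving that entity which can be answered only by consulting a knowledge graph or corpus passage mentioning the entity. Our challenge consists of (i) OKVQAS3, a subset of OKVQA annotated based on the structural idiom, and (ii) S3VQA, a new dataset built from scratch. We also present a neural but structurally transparent OKVQA system, S3, that explicitly addresses our challenge dataset and outperforms recent competitive baselines.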
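The select, substitute, search idiom from the abstract can be sketched as three small functions. This is a toy illustration under stated assumptions, not the authors' neural S3 system: the `Detection` record, the token-matching rule in `select`, and the tiny `KNOWLEDGE` dictionary are all hypothetical stand-ins for an object detector, a learned span selector, and a real knowledge graph or corpus.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str     # specific entity name from a (hypothetical) image detector
    hypernym: str  # coarser category the question is likely to use


# Toy knowledge store (hypothetical): entity name -> passage mentioning it.
KNOWLEDGE = {
    "dalmatian": "The Dalmatian is a dog breed that originated in Croatia.",
    "pizza": "Pizza is a dish of Italian origin.",
}


def select(question, detections):
    """Select: find a question word that refers to a detected image entity."""
    tokens = question.lower().rstrip("?").split()
    for det in detections:
        if det.hypernym in tokens:
            return det.hypernym, det
    return None


def substitute(question, span, det):
    """Substitute: replace the generic span with the specific entity label."""
    return question.replace(span, det.label)


def search(query, knowledge):
    """Search: return passages whose entity is mentioned in the query."""
    q = query.lower()
    return [passage for entity, passage in knowledge.items() if entity in q]


def s3(question, detections, knowledge):
    """Run select, substitute, search; fall back to the raw question."""
    picked = select(question, detections)
    if picked is None:
        return question, []
    span, det = picked
    rewritten = substitute(question, span, det)
    return rewritten, search(rewritten, knowledge)


rewritten, passages = s3(
    "Which country does this dog come from?",
    [Detection(label="dalmatian", hypernym="dog")],
    KNOWLEDGE,
)
print(rewritten)
print(passages)
```

Because each stage returns an inspectable intermediate result (the selected span, the rewritten query, the retrieved passages), a pipeline of this shape makes it possible to check the interpretation of the query separately from the final answer, which is the kind of transparency the abstract argues for.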

Related papers

Convincing Rationales for Visual Question Answering Reasoning [14.490692389105947]
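Visual Question Answering (VQA) is the challenging task of predicting the answer to a question about the content of an image. We propose CRVQA, which generates visual and textual rationales to accompany the predicted answer for a given image/question pair. CRVQA achieves competitive performance on generic VQA datasets in the zero-shot evaluation setting.
Paper reference translation (metadata) (2024-02-06T11:07:05Z)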
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
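This paper proposes a comprehensive dataset called UNK-VQA. First, we augment existing data by deliberately perturbing either the image or the question. We then extensively evaluate the zero-shot and few-shot performance of several emerging multimodal large models.
Paper reference translation (metadata) (2023-10-17T02:38:09Z)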
Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
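The goal of knowledge-based visual question answering (KB-VQA) is to provide correct answers to questions with the help of an external knowledge base. We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH ranker (GATHER for brevity). Specifically, it comprises graph construction, pruning, and path-level ranking, which not only retrieve accurate answers but also provide inference paths that explain the reasoning process.
Paper reference translation (metadata) (2023-10-12T09:12:50Z)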
OpenCQA: Open-ended Question Answering with Charts [6.7038829115674945]
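We introduce a new task called OpenCQA, where the goal is to answer open-ended questions about charts with text. We implement and evaluate a set of baselines under three practical settings. The results show that the top-performing models generally produce fluent and coherent text.
Paper reference translation (metadata) (2022-10-12T23:37:30Z)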
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge [39.788346536244504]
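A-OKVQA is a crowdsourced dataset of about 25,000 questions. We demonstrate the potential of this new dataset through a detailed analysis of its contents.
Paper reference translation (metadata) (2022-06-03T17:52:27Z)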
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
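We propose a new dataset, Knowledge-Routed Visual Question Reasoning, for evaluating VQA models. We generate question-answer pairs with controlled programs based on both the Visual Genome scene graph and an external knowledge base.
Paper reference translation (metadata) (2020-12-14T00:33:44Z)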
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering [26.21870452615222]
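FVQA requires external knowledge beyond the visible content to answer questions about an image. How to capture question-oriented and information-complementary evidence remains a key challenge in solving this problem. We propose a modality-aware heterogeneous graph convolutional network to capture the evidence most relevant to the given question from different layers.
Paper reference translation (metadata) (2020-06-16T11:03:37Z)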
ClarQ: A large-scale and diverse dataset for Clarification Question Generation [67.1162903046619]
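We devise a novel bootstrapping framework that assists in creating a diverse, large-scale dataset of clarification questions based on post-comment tuples extracted from StackExchange. We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question answering. We release this dataset to promote research on clarification question generation, with the larger goal of enhancing dialog and question answering systems.
Paper reference translation (metadata) (2020-06-10T17:56:50Z)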
Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem. We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
Paper reference translation (metadata) (2020-04-30T09:10:57Z)

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
