Fugu-MT Paper Translation (Abstract): One-shot Scene Graph Generation

Paper summary: One-shot Scene Graph Generation

arxiv url: http://arxiv.org/abs/2202.10824v1
Date: Tue, 22 Feb 2022 11:32:59 GMT
Status: Translation complete
Last updated in system: 2022-02-23 16:23:03.028525
Title: One-shot Scene Graph Generation
Title (reference translation): One-Shot Scene Graph Generation
Authors: Yuyu Guo, Jingkuan Song, Lianli Gao, Heng Tao Shen
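Abstract summary: We propose Multiple Structured Knowledge (Relational Knowledge and Commonsense Knowledge) for the one-shot scene graph generation task. The proposed method outperforms existing state-of-the-art methods by a large margin.
Reference score (attention measure computed independently by this site): 130.57405850346836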
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As a structured representation of the image content, the visual scene graph (visual relationship) acts as a bridge between computer vision and natural language processing. Existing models on the scene graph generation task notoriously require tens or hundreds of labeled samples. By contrast, human beings can learn visual relationships from a few or even one example. Inspired by this, we design a task named One-Shot Scene Graph Generation, where each relationship triplet (e.g., "dog-has-head") comes from only one labeled example. The key insight is that rather than learning from scratch, one can utilize rich prior knowledge. In this paper, we propose Multiple Structured Knowledge (Relational Knowledge and Commonsense Knowledge) for the one-shot scene graph generation task. Specifically, the Relational Knowledge represents the prior knowledge of relationships between entities extracted from the visual content, e.g., the visual relationships "standing in", "sitting in", and "lying in" may exist between "dog" and "yard", while the Commonsense Knowledge encodes "sense-making" knowledge like "dog can guard yard". By organizing these two kinds of knowledge in a graph structure, Graph Convolution Networks (GCNs) are used to extract knowledge-embedded semantic features of the entities. Besides, instead of extracting isolated visual features from each entity generated by Faster R-CNN, we utilize an Instance Relation Transformer encoder to fully explore their context information. Based on a constructed one-shot dataset, the experimental results show that our method significantly outperforms existing state-of-the-art methods by a large margin. Ablation studies also verify the effectiveness of the Instance Relation Transformer encoder and the Multiple Structured Knowledge.
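Abstract (reference translation): As a structured representation of image content, the visual scene graph (visual relationship) serves as a bridge between computer vision and natural language processing. Existing models for the scene graph generation task are notorious for requiring tens or hundreds of labeled samples. By contrast, humans can learn visual relationships from a few examples, or even one. Inspired by this, we design a task called One-Shot Scene Graph Generation, in which each relationship triplet (e.g., "dog-has-head") comes from only one labeled example. The key insight is that, rather than learning from scratch, one can draw on rich prior knowledge. In this paper, we propose Multiple Structured Knowledge (Relational Knowledge and Commonsense Knowledge) for the one-shot scene graph generation task. Specifically, Relational Knowledge represents prior knowledge of the relationships between entities extracted from visual content (e.g., the visual relationships "standing in", "sitting in", and "lying in" may exist between "dog" and "yard"), while Commonsense Knowledge encodes "sense-making" knowledge such as "dog can guard yard". With these two kinds of knowledge organized in a graph structure, Graph Convolution Networks (GCNs) extract knowledge-embedded semantic features of the entities. In addition, instead of extracting isolated visual features from each entity generated by Faster R-CNN, we use an Instance Relation Transformer encoder to fully explore their context information. On a constructed one-shot dataset, the experimental results show that our method outperforms existing state-of-the-art methods by a large margin. Ablation studies also verify the effectiveness of the Instance Relation Transformer encoder and the Multiple Structured Knowledge.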
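
The abstract's step of organizing knowledge as a graph and extracting entity features with a GCN can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the choice to model relation phrases as nodes, the one-hot input features, and the fixed weight matrix are all assumptions made for illustration. The sketch only applies the standard GCN propagation rule ReLU(D^{-1/2} (A + I) D^{-1/2} H W) once, in plain Python.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph as nested lists."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loop so each node keeps its own features
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(x, y):
    """Multiply two matrices stored as nested lists."""
    return [
        [sum(row[k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for row in x
    ]


def gcn_layer(a_hat, features, weight):
    """One GCN propagation step: ReLU(a_hat @ features @ weight)."""
    hidden = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, value) for value in row] for row in hidden]


# Hypothetical toy knowledge graph built from the abstract's examples.
# "standing in" stands for relational knowledge, "guard" for commonsense knowledge.
nodes = ["dog", "yard", "standing in", "guard"]
edges = [(0, 2), (2, 1), (0, 3), (3, 1)]

# One-hot input features (assumption; a real model would use learned embeddings).
features = [[1.0 if i == j else 0.0 for j in range(len(nodes))] for i in range(len(nodes))]

# Fixed 4x2 weight matrix (assumption; normally learned by gradient descent).
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]

a_hat = normalized_adjacency(len(nodes), edges)
embeddings = gcn_layer(a_hat, features, weight)

for name, vector in zip(nodes, embeddings):
    print(name, vector)
```

After one layer, each entity's vector mixes its own features with those of the knowledge nodes it is connected to; stacking layers would propagate information further, for example between "dog" and "yard" through the shared relation nodes.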

Related papers

Knowledge-augmented Few-shot Visual Relation Detection [25.457693302327637]
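Visual relation detection (VRD) aims to detect relationships between objects for image understanding. Most existing VRD methods rely on thousands of training samples per relationship to achieve good performance. We devise a knowledge-augmented few-shot VRD framework that leverages both textual knowledge and visual relation knowledge.
Paper reference translation (metadata) (2023-03-09T15:38:40Z)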
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning [61.57887011165744]
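Multimodal transformers have made great progress on the Visual Commonsense Reasoning task. We propose the Scene Graph Enhanced Image-Text Learning framework to incorporate visual scene graphs into commonsense reasoning.
Paper reference translation (metadata) (2021-12-16T03:16:30Z)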
Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph [96.95815946327079]
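Because of the long-tail distribution of named entities, it is difficult to learn the relationships between named entities and visual cues. We propose a new approach that constructs a multi-modal knowledge graph to associate visual objects with named entities.
Paper reference translation (metadata) (2021-07-26T05:50:41Z)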
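Entity Context Graph: Learning Entity Representations from Semi-Structured Textual Sources on the Web [44.92858943475407]
We propose an approach that processes entity-centric textual knowledge sources to learn entity embeddings. The embeddings learned by our approach are (i) of high quality and comparable to known knowledge-graph-based embeddings, and can be used to improve them further.
Paper reference translation (metadata) (2021-03-29T20:52:14Z)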
Learning Graph Embeddings for Compositional Zero-shot Learning [73.80007492964951]
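In compositional zero-shot learning, the goal is to recognize unseen compositions of observed visual primitives. We propose a new graph formulation called CGE that learns image features and latent representations of visual primitives end to end. By learning a joint compatibility that encodes the semantics between concepts, the model can generalize to unseen compositions without relying on an external knowledge base such as WordNet.
Paper reference translation (metadata) (2021-02-03T10:11:03Z)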
Learning to Represent Image and Text with Denotation Graph [32.417311523031195]
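We propose learning representations from a set of implied, visually grounded expressions between image and text. We show that the resulting structural relations can be used to further improve state-of-the-art multimodal learning models.
Paper reference translation (metadata) (2020-10-06T18:00:58Z)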
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
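We propose two self-supervised tasks that learn over raw text with guidance from knowledge graphs. Building on the entity-level masked language model, our first contribution is an entity masking scheme. In contrast to existing paradigms, our approach uses knowledge graphs implicitly, and only during pre-training.
Paper reference translation (metadata) (2020-04-29T14:22:42Z)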
Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448]
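We propose a new graph-based neural network that iteratively propagates information between the two graphs. Our Graph Bridging Network, GB-Net, successively infers edges and nodes, simultaneously exploiting and refining the rich, heterogeneous structure of the interconnected scene and commonsense graphs.
Paper reference translation (metadata) (2020-01-07T23:35:52Z)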

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

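This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from its use (including all information and translations).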