GINet: Graph Interaction Network for Scene Parsing
- URL: http://arxiv.org/abs/2009.06160v1
- Date: Mon, 14 Sep 2020 02:52:45 GMT
- Title: GINet: Graph Interaction Network for Scene Parsing
- Authors: Tianyi Wu, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, Guodong
Guo
- Abstract summary: We propose a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss) to promote context reasoning over image regions.
The proposed GINet outperforms the state-of-the-art approaches on the popular benchmarks, including Pascal-Context and COCO Stuff.
- Score: 58.394591509215005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, context reasoning using image regions beyond local convolution has
shown great potential for scene parsing. In this work, we explore how to
incorporate linguistic knowledge to promote context reasoning over image
regions by proposing a Graph Interaction unit (GI unit) and a Semantic Context
Loss (SC-loss). The GI unit is capable of enhancing feature representations of
convolution networks over high-level semantics and learning the semantic
coherency adaptively to each sample. Specifically, the dataset-based linguistic
knowledge is first incorporated in the GI unit to promote context reasoning
over the visual graph, then the evolved representations of the visual graph are
mapped to each local representation to enhance the discriminative capability for
scene parsing. The GI unit is further improved by the SC-loss to enhance the
semantic representations over the exemplar-based semantic graph. We perform
full ablation studies to demonstrate the effectiveness of each component in our
approach. Particularly, the proposed GINet outperforms the state-of-the-art
approaches on the popular benchmarks, including Pascal-Context and COCO Stuff.
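The abstract describes three steps: project local features onto a visual graph, let the visual graph interact with dataset-level linguistic (semantic) nodes, then map the evolved node features back to each local representation. A minimal NumPy sketch of this flow is below; the shapes, the random projection, and the simple dot-product attention are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gi_unit(pixel_feats, word_embeds, num_regions=4, seed=0):
    """Hypothetical sketch of a Graph Interaction (GI) unit.

    pixel_feats : (N, C) local convolutional features.
    word_embeds : (K, C) linguistic-knowledge nodes (e.g. class embeddings).
    """
    rng = np.random.default_rng(seed)
    # 1. Soft-assign pixels to a small set of visual-graph nodes.
    proj = rng.standard_normal((pixel_feats.shape[1], num_regions))
    assign = softmax(pixel_feats @ proj, axis=1)          # (N, R)
    visual_nodes = assign.T @ pixel_feats                 # (R, C)
    # 2. Graph interaction: visual nodes attend over linguistic nodes,
    #    injecting semantic knowledge into the visual graph.
    attn = softmax(visual_nodes @ word_embeds.T, axis=1)  # (R, K)
    evolved = visual_nodes + attn @ word_embeds           # (R, C)
    # 3. Map evolved node features back to each local representation.
    enhanced = pixel_feats + assign @ evolved             # (N, C)
    return enhanced
```

The same soft-assignment matrix is reused to scatter the evolved node features back to pixels, so the output keeps the input's spatial resolution.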
Related papers
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for
Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z) - Visual Semantic Parsing: From Images to Abstract Meaning Representation [20.60579156219413]
We propose to leverage a widely-used meaning representation in the field of natural language processing, the Abstract Meaning Representation (AMR)
Our visual AMR graphs are more linguistically informed, with a focus on higher-level semantic concepts extrapolated from visual input.
Our findings point to important future research directions for improved scene understanding.
arXiv Detail & Related papers (2022-10-26T17:06:42Z) - Consensus Graph Representation Learning for Better Grounded Image
Captioning [48.208119537050166]
We propose the Consensus Graph Representation Learning (CGRL) framework for grounded image captioning.
We validate the effectiveness of our model, with a significant decline in object hallucination (-9% CHAIRi) on the Flickr30k Entities dataset.
arXiv Detail & Related papers (2021-12-02T04:17:01Z) - Exploring Explicit and Implicit Visual Relationships for Image
Captioning [11.82805641934772]
In this paper, we explore explicit and implicit visual relationships to enrich region-level representations for image captioning.
Explicitly, we build a semantic graph over object pairs and exploit gated graph convolutional networks (Gated GCN) to selectively aggregate local neighbors' information.
Implicitly, we draw global interactions among the detected objects through region-based bidirectional encoder representations from transformers.
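The "selective aggregation" of the Gated GCN can be sketched as a per-edge scalar gate that weights each neighbor's message. The gating parameterization below (a sigmoid over concatenated endpoint features) is a common Gated-GCN flavor used here for illustration, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_aggregate(node_feats, adj, w_gate):
    """Hypothetical gated neighbor aggregation.

    node_feats : (N, C) region features.
    adj        : (N, N) 0/1 adjacency over object pairs.
    w_gate     : (2C,) gate weights over concatenated endpoint features.
    """
    n = node_feats.shape[0]
    out = node_feats.copy()
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                # Scalar gate decides how much of neighbor j's message passes.
                gate = sigmoid(np.concatenate([node_feats[i], node_feats[j]]) @ w_gate)
                out[i] += gate * node_feats[j]
    return out
```

With learned gate weights, uninformative neighbors receive gates near zero, which is what makes the aggregation selective rather than a uniform mean over neighbors.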
arXiv Detail & Related papers (2021-05-06T01:47:51Z) - Disentangled Motif-aware Graph Learning for Phrase Grounding [48.64279161780489]
We propose a novel graph learning framework for phrase grounding in the image.
We devise the disentangled graph network to integrate the motif-aware contextual information into representations.
Our model achieves state-of-the-art performance on Flickr30K Entities and ReferIt Game benchmarks.
arXiv Detail & Related papers (2021-04-13T08:20:07Z) - Multi-Level Graph Convolutional Network with Automatic Graph Learning
for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification.
By employing an attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions.
Our MGCN-AGL encodes the long range dependencies among image regions based on the expressive representations that have been produced at local level.
arXiv Detail & Related papers (2020-09-19T09:26:20Z) - Learning Physical Graph Representations from Visual Scenes [56.7938395379406]
Physical Scene Graphs (PSGs) represent scenes as hierarchical graphs with nodes corresponding intuitively to object parts at different scales, and edges to physical connections between parts.
PSGNet augments standard CNNs by including: recurrent feedback connections to combine low and high-level image information; graph pooling and vectorization operations that convert spatially-uniform feature maps into object-centric graph structures.
We show that PSGNet outperforms alternative self-supervised scene representation algorithms at scene segmentation tasks.
arXiv Detail & Related papers (2020-06-22T16:10:26Z) - Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.