Transformer-based Dual Relation Graph for Multi-label Image Recognition
- URL: http://arxiv.org/abs/2110.04722v2
- Date: Tue, 12 Oct 2021 02:09:17 GMT
- Title: Transformer-based Dual Relation Graph for Multi-label Image Recognition
- Authors: Jiawei Zhao, Ke Yan, Yifan Zhao, Xiaowei Guo, Feiyue Huang, Jia Li
- Abstract summary: We propose a novel Transformer-based Dual Relation learning framework.
We explore two aspects of correlation, i.e., structural relation graph and semantic relation graph.
Our approach achieves new state-of-the-art results on two popular multi-label recognition benchmarks.
- Score: 56.12543717723385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The simultaneous recognition of multiple objects in one image remains a
challenging task, spanning multiple difficulties in the recognition field such as
varying object scales, inconsistent appearances, and confusing inter-class
relationships. Recent research efforts mainly resort to statistical label
co-occurrences and linguistic word embeddings to enhance the unclear semantics.
Different from these works, in this paper, we propose a novel
Transformer-based Dual Relation learning framework, constructing complementary
relationships by exploring two aspects of correlation, i.e., structural
relation graph and semantic relation graph. The structural relation graph aims
to capture long-range correlations from object context, by developing a
cross-scale transformer-based architecture. The semantic graph dynamically
models the semantic meanings of image objects with explicit semantic-aware
constraints. In addition, we also incorporate the learnt structural
relationship into the semantic graph, constructing a joint relation graph for
robust representations. With the collaborative learning of these two effective
relation graphs, our approach achieves new state-of-the-art results on two popular
multi-label recognition benchmarks, i.e., the MS-COCO and VOC 2007 datasets.
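To make the two relation graphs concrete, the following is a minimal, hedged sketch of how a structural branch (cross-scale attention over spatial tokens) and a semantic branch (message passing over per-class features with a learned adjacency) could be combined for multi-label prediction. All module names, dimensions, the class-pooling step, and the fusion are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of a dual-relation design: a structural branch that
# attends across features from two spatial scales, and a semantic branch that
# propagates per-class embeddings over a learned class-to-class graph.
# All names, dimensions, and the fusion step are assumptions, not the paper's code.
import torch
import torch.nn as nn

class StructuralRelationBranch(nn.Module):
    """Cross-scale attention over flattened spatial tokens (illustrative)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fine_tokens, coarse_tokens):
        # fine_tokens: (B, N_fine, dim), coarse_tokens: (B, N_coarse, dim).
        # Queries come from the fine scale, keys/values from the coarse scale,
        # so each fine-scale location can aggregate long-range context.
        out, _ = self.attn(fine_tokens, coarse_tokens, coarse_tokens)
        return self.norm(fine_tokens + out)

class SemanticRelationBranch(nn.Module):
    """One graph-propagation step over per-class semantic vectors (illustrative)."""
    def __init__(self, num_classes=80, dim=256):
        super().__init__()
        # Learnable class-to-class adjacency, row-normalised with softmax.
        self.adj_logits = nn.Parameter(torch.zeros(num_classes, num_classes))
        self.proj = nn.Linear(dim, dim)

    def forward(self, class_feats):
        # class_feats: (B, num_classes, dim), e.g. class-attended image features.
        adj = torch.softmax(self.adj_logits, dim=-1)                 # (C, C)
        propagated = torch.einsum('ij,bjd->bid', adj, class_feats)   # (B, C, dim)
        return torch.relu(self.proj(propagated)) + class_feats

class DualRelationToyModel(nn.Module):
    def __init__(self, num_classes=80, dim=256):
        super().__init__()
        self.structural = StructuralRelationBranch(dim)
        self.semantic = SemanticRelationBranch(num_classes, dim)
        # Class "queries" used to pool structural tokens into per-class features.
        self.class_queries = nn.Parameter(torch.randn(num_classes, dim))
        self.classifier = nn.Linear(dim, 1)

    def forward(self, fine_tokens, coarse_tokens):
        tokens = self.structural(fine_tokens, coarse_tokens)          # (B, N, dim)
        # Pool tokens into per-class features with simple attention weights.
        weights = torch.softmax(tokens @ self.class_queries.t(), dim=1)   # (B, N, C)
        class_feats = torch.einsum('bnc,bnd->bcd', weights, tokens)       # (B, C, dim)
        class_feats = self.semantic(class_feats)
        return self.classifier(class_feats).squeeze(-1)                # (B, C) logits

# Usage: multi-label logits for a batch of two images with 80 classes.
model = DualRelationToyModel()
logits = model(torch.randn(2, 196, 256), torch.randn(2, 49, 256))
probs = torch.sigmoid(logits)   # per-class probabilities
```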
Related papers
- Dual Relation Alignment for Composed Image Retrieval [24.812654620141778]
We argue for the existence of two types of relations in composed image retrieval.
The explicit relation connects the reference image and its complementary text to the target image.
We propose a new framework for composed image retrieval, termed dual relation alignment.
arXiv Detail & Related papers (2023-09-05T12:16:14Z)
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z)
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z)
- Modelling Multi-relations for Convolutional-based Knowledge Graph Embedding [0.2752817022620644]
Such approaches are considered to break the semantic connection among the multiple relations between an entity pair.
We propose a convolutional and multi-relational learning model, ConvMR.
We show that ConvMR deals efficiently with less frequent entities.
arXiv Detail & Related papers (2022-10-21T03:43:06Z)
- Unsupervised Multimodal Change Detection Based on Structural Relationship Graph Representation Learning [40.631724905575034]
Unsupervised multimodal change detection is a practical and challenging topic that can play an important role in time-sensitive emergency applications.
We take advantage of two types of modality-independent structural relationships in multimodal images.
We present a structural relationship graph representation learning framework for measuring the similarity of the two structural relationships.
arXiv Detail & Related papers (2022-10-03T13:55:08Z)
- Scenes and Surroundings: Scene Graph Generation using Relation Transformer [13.146732454123326]
This work proposes a novel local-context aware architecture named relation transformer.
Our hierarchical multi-head attention-based approach efficiently captures contextual dependencies between objects and predicts their relationships.
In comparison to state-of-the-art approaches, we have achieved an overall mean 4.85% improvement.
arXiv Detail & Related papers (2021-07-12T14:22:20Z)
- Tensor Composition Net for Visual Relationship Prediction [115.14829858763399]
We present a novel Tensor Composition Network (TCN) to predict visual relationships in images.
The key idea of our TCN is to exploit the low-rank property of the visual relationship tensor (see the toy factorisation sketch after this list).
We show our TCN's image-level visual relationship prediction provides a simple and efficient mechanism for relation-based image retrieval.
arXiv Detail & Related papers (2020-12-10T06:27:20Z)
- Bidirectional Graph Reasoning Network for Panoptic Segmentation [126.06251745669107]
We introduce a Bidirectional Graph Reasoning Network (BGRNet) to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.
BGRNet first constructs image-specific graphs in both instance and semantic segmentation branches that enable flexible reasoning at the proposal level and class level.
arXiv Detail & Related papers (2020-04-14T02:32:10Z)
- Tensor Graph Convolutional Networks for Multi-relational and Robust Learning [74.05478502080658]
This paper introduces a tensor-graph convolutional network (TGCN) for scalable semi-supervised learning (SSL) from data associated with a collection of graphs, which are represented by a tensor.
The proposed architecture achieves markedly improved performance relative to standard GCNs, copes with state-of-the-art adversarial attacks, and leads to remarkable SSL performance over protein-to-protein interaction networks.
arXiv Detail & Related papers (2020-03-15T02:33:21Z)
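As a side note on the Tensor Composition Net entry above, the idea of exploiting the low-rank property of a (subject, predicate, object) relationship tensor can be illustrated with a toy CP-style factorisation. The rank, factor shapes, and scoring rule below are assumptions for illustration, not the paper's model.

```python
# Toy illustration of scoring (subject, predicate, object) triples with a
# low-rank (CP-style) factorisation of the relationship tensor.
# Shapes, rank, and the scoring rule are assumptions for illustration only.
import torch
import torch.nn as nn

class LowRankRelationScorer(nn.Module):
    def __init__(self, num_objects=150, num_predicates=50, rank=16):
        super().__init__()
        # One factor matrix per tensor mode; the full tensor would be
        # T[s, p, o] = sum_r S[s, r] * P[p, r] * O[o, r].
        self.S = nn.Parameter(torch.randn(num_objects, rank) * 0.1)
        self.P = nn.Parameter(torch.randn(num_predicates, rank) * 0.1)
        self.O = nn.Parameter(torch.randn(num_objects, rank) * 0.1)

    def forward(self, subj_idx, obj_idx):
        # subj_idx, obj_idx: (B,) category indices of a detected object pair.
        # Returns (B, num_predicates) relationship scores without ever
        # materialising the dense subjects x predicates x objects tensor.
        s = self.S[subj_idx]            # (B, rank)
        o = self.O[obj_idx]             # (B, rank)
        return (s * o) @ self.P.t()     # (B, num_predicates)

# Usage: score all predicates for two candidate object pairs.
scorer = LowRankRelationScorer()
scores = scorer(torch.tensor([3, 17]), torch.tensor([42, 8]))
print(scores.shape)  # torch.Size([2, 50])
```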