Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased
Scene Graph Generation
- URL: http://arxiv.org/abs/2203.09811v1
- Date: Fri, 18 Mar 2022 09:14:13 GMT
- Title: Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased
Scene Graph Generation
- Authors: Xingning Dong, Tian Gan, Xuemeng Song, Jianlong Wu, Yuan Cheng,
Liqiang Nie
- Abstract summary: Scene Graph Generation aims to first encode the visual contents within the given image and then parse them into a compact summary graph.
We first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction.
We then devise an innovative Group Collaborative Learning strategy to optimize the decoder.
- Score: 62.96628432641806
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene Graph Generation, which generally follows a regular encoder-decoder
pipeline, aims to first encode the visual contents within the given image and
then parse them into a compact summary graph. Existing SGG approaches not
only neglect the insufficient fusion between the vision and language
modalities, but also fail to provide informative predicates due to biased
relationship predictions, leaving SGG far from practical. To address these
issues, in this paper,
we first present a novel Stacked Hybrid-Attention network, which facilitates
the intra-modal refinement as well as the inter-modal interaction, to serve as
the encoder. We then devise an innovative Group Collaborative Learning strategy
to optimize the decoder. In particular, based on the observation that a
single classifier has limited recognition capability on an extremely
unbalanced dataset, we first deploy a group of classifiers, each expert in
distinguishing a different subset of classes, and then cooperatively
optimize them from two aspects to promote unbiased SGG. Experiments on the
VG and GQA datasets demonstrate that we not only establish a new
state of the art on the unbiased metric, but also nearly double the
performance of two baselines.
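The "group of classifiers" idea above can be sketched in a few lines: split the predicate classes by frequency into roughly balanced groups, so that each classifier in the group only has to distinguish classes of comparable frequency. The grouping criterion and the toy counts below are illustrative assumptions, not the paper's exact partitioning scheme.

```python
def partition_classes(class_counts, num_groups):
    """Split class ids into `num_groups` frequency-sorted groups.

    `class_counts` maps class name -> training frequency. Classes are
    ranked from most to least frequent, then chunked into groups of
    (roughly) equal size; each group would be handled by its own
    classifier in a group-collaborative setup.
    """
    ranked = sorted(class_counts, key=class_counts.get, reverse=True)
    size = -(-len(ranked) // num_groups)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


# Toy long-tailed predicate distribution (hypothetical counts).
counts = {"on": 900, "has": 700, "near": 120,
          "holding": 60, "riding": 15, "eating": 8}
groups = partition_classes(counts, num_groups=3)
print(groups)  # [['on', 'has'], ['near', 'holding'], ['riding', 'eating']]
```

Each resulting group sees a far less skewed label distribution than the full dataset, which is the motivation for training one expert classifier per group and then optimizing them cooperatively.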
Related papers
- Deep Contrastive Graph Learning with Clustering-Oriented Guidance [61.103996105756394]
Graph Convolutional Network (GCN) has exhibited remarkable potential in improving graph-based clustering.
Existing models must estimate an initial graph beforehand in order to apply GCN.
Deep Contrastive Graph Learning (DCGL) model is proposed for general data clustering.
arXiv Detail & Related papers (2024-02-25T07:03:37Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- Vision Relation Transformer for Unbiased Scene Graph Generation [31.29954125135073]
Current Scene Graph Generation (SGG) methods suffer from information loss regarding the entities' local-level cues during the relation encoding process.
We introduce the Vision rElation TransfOrmer (VETO), consisting of a novel local-level entity relation encoder.
We show that VETO + MEET boosts the predictive performance by up to 47 percentage points over the state of the art while being 10 times smaller.
arXiv Detail & Related papers (2023-08-18T11:15:31Z)
- Line Graph Contrastive Learning for Link Prediction [4.876567687745239]
We propose a Line Graph Contrastive Learning (LGCL) method to obtain multiview information.
With experiments on six public datasets, LGCL outperforms current benchmarks on link prediction tasks.
arXiv Detail & Related papers (2022-10-25T06:57:00Z)
- Interpolation-based Correlation Reduction Network for Semi-Supervised Graph Learning [49.94816548023729]
We propose a novel graph contrastive learning method, termed Interpolation-based Correlation Reduction Network (ICRN)
In our method, we improve the discriminative capability of the latent feature by enlarging the margin of decision boundaries.
By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discriminative representation learning.
arXiv Detail & Related papers (2022-06-06T14:26:34Z)
- GraphCoCo: Graph Complementary Contrastive Learning [65.89743197355722]
Graph Contrastive Learning (GCL) has shown promising performance in graph representation learning (GRL) without the supervision of manual annotations.
This paper proposes an effective graph complementary contrastive learning approach named GraphCoCo to tackle the above issue.
arXiv Detail & Related papers (2022-03-24T02:58:36Z)
- Structured Sparse R-CNN for Direct Scene Graph Generation [16.646937866282922]
This paper presents a simple, sparse, and unified framework for relation detection, termed as Structured Sparse R-CNN.
The key to our method is a set of learnable triplet queries and structured triplet detectors which could be optimized jointly from the training set in an end-to-end manner.
We perform experiments on two benchmarks: Visual Genome and Open Images, and the results demonstrate that our method achieves the state-of-the-art performance.
arXiv Detail & Related papers (2021-06-21T02:24:20Z)
- Deepened Graph Auto-Encoders Help Stabilize and Enhance Link Prediction [11.927046591097623]
Link prediction is a relatively under-studied graph learning task, with current state-of-the-art models based on shallow one- or two-layer graph auto-encoder (GAE) architectures.
In this paper, we focus on addressing a limitation of current methods for link prediction, which can only use shallow GAEs and variational GAEs.
Our proposed methods innovatively incorporate standard auto-encoders (AEs) into the architectures of GAEs, where standard AEs are leveraged to learn essential, low-dimensional representations by seamlessly integrating the adjacency information and node features.
arXiv Detail & Related papers (2021-03-21T14:43:10Z)
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
- Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection [35.23956785670788]
We present a novel adaptive graph convolutional network with attention graph clustering (GCAGC).
We develop an attention graph clustering algorithm to discriminate the common objects from all the salient foreground objects in an unsupervised fashion.
We evaluate our proposed GCAGC method on three cosaliency detection benchmark datasets.
arXiv Detail & Related papers (2020-03-13T09:35:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.