Integrating Object-aware and Interaction-aware Knowledge for Weakly
Supervised Scene Graph Generation
- URL: http://arxiv.org/abs/2208.01834v1
- Date: Wed, 3 Aug 2022 04:20:17 GMT
- Title: Integrating Object-aware and Interaction-aware Knowledge for Weakly
Supervised Scene Graph Generation
- Authors: Xingchen Li, Long Chen, Wenbo Ma, Yi Yang and Jun Xiao
- Abstract summary: We argue that most existing WSSGG works only focus on object-consistency.
We propose to enhance a simple grounding module with both object-aware and interaction-aware knowledge.
Our method consistently improves WSSGG performance on various kinds of weak supervision.
- Score: 33.15624351965304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, increasing efforts have been focused on Weakly Supervised Scene
Graph Generation (WSSGG). The mainstream solution for WSSGG typically follows
the same pipeline: they first align text entities in the weak image-level
supervisions (e.g., unlocalized relation triplets or captions) with image
regions, and then train SGG models in a fully-supervised manner with aligned
instance-level "pseudo" labels. However, we argue that most existing WSSGG
works only focus on object-consistency, which means the grounded regions should
have the same object category label as text entities. However, they neglect
another basic requirement for an ideal alignment: interaction-consistency,
which means the grounded region pairs should have the same interactions (i.e.,
visual relations) as text entity pairs. Hence, in this paper, we propose to
enhance a simple grounding module with both object-aware and interaction-aware
knowledge to acquire more reliable pseudo labels. To better leverage these two
types of knowledge, we regard them as two teachers and fuse their generated
targets to guide the training process of our grounding module. Specifically, we
design two different strategies to adaptively assign weights to different
teachers by assessing their reliability on each training sample. Extensive
experiments have demonstrated that our method consistently improves WSSGG
performance on various kinds of weak supervision.
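The abstract describes fusing the targets of two teachers (object-aware and interaction-aware) with per-sample adaptive weights. The paper does not publish this routine here, so the following is only a minimal sketch of one plausible weighting strategy: the two teachers' soft alignment targets for a text entity are combined with softmax weights derived from per-sample reliability scores. All names (`fuse_teacher_targets`, the `temperature` parameter) are hypothetical, not taken from the paper.

```python
import numpy as np

def fuse_teacher_targets(obj_target, int_target, obj_conf, int_conf,
                         temperature=1.0):
    """Fuse two teachers' soft alignment targets with per-sample weights.

    obj_target, int_target: (num_regions,) probability distributions over
        candidate image regions, produced by the object-aware and
        interaction-aware teachers for one text entity.
    obj_conf, int_conf: scalar reliability scores assessed for this sample.
    """
    # Adaptive weights: softmax over the two reliability scores.
    scores = np.array([obj_conf, int_conf], dtype=float) / temperature
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    # Weighted fusion of the two target distributions, renormalized so the
    # result is again a valid distribution over regions.
    fused = w[0] * obj_target + w[1] * int_target
    return fused / fused.sum()
```

With equal reliability scores this reduces to a plain average of the two targets; as one teacher's score grows, the fused target shifts toward that teacher's distribution.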
Related papers
- Taking A Closer Look at Interacting Objects: Interaction-Aware Open Vocabulary Scene Graph Generation [16.91119080704441]
We propose an interaction-aware OVSGG framework INOVA.
During pre-training, INOVA employs an interaction-aware target generation strategy to distinguish interacting objects from non-interacting ones.
INOVA is equipped with an interaction-consistent knowledge distillation to enhance the robustness by pushing interacting object pairs away from the background.
arXiv Detail & Related papers (2025-02-06T08:18:06Z)
- SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition [71.90536979421093]
We propose a Split-and-Synthesize Prompting with Gated Alignments (SSPA) framework to amplify the potential of Vision-Language Models (VLMs)
We develop an in-context learning approach to associate the inherent knowledge from LLMs.
Then we propose a novel Split-and-Synthesize Prompting (SSP) strategy to first model the generic knowledge and downstream label semantics individually.
arXiv Detail & Related papers (2024-07-30T15:58:25Z)
- Focus on Your Target: A Dual Teacher-Student Framework for Domain-adaptive Semantic Segmentation [210.46684938698485]
We study unsupervised domain adaptation (UDA) for semantic segmentation.
We find that, by decreasing/increasing the proportion of training samples from the target domain, the 'learning ability' is strengthened/weakened.
We propose a novel dual teacher-student (DTS) framework and equip it with a bidirectional learning strategy.
arXiv Detail & Related papers (2023-03-16T05:04:10Z)
- Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z)
- NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation [65.78472854070316]
We propose a novel NoIsy label CorrEction and Sample Training strategy for SGG: NICEST.
NICE first detects noisy samples and then reassigns higher-quality soft predicate labels to them.
NICEST can be seamlessly incorporated into any SGG architecture to boost its performance on different predicate categories.
arXiv Detail & Related papers (2022-07-27T06:25:47Z)
- Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation [62.96628432641806]
Scene Graph Generation aims to first encode the visual contents within the given image and then parse them into a compact summary graph.
We first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction.
We then devise an innovative Group Collaborative Learning strategy to optimize the decoder.
arXiv Detail & Related papers (2022-03-18T09:14:13Z)
- Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation [48.21846438269506]
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects.
Existing SGG methods fail to acquire complex reasoning about visual and textual correlations due to various biases in training data.
We propose a novel framework for SGG training that exploits relation labels based on their informativeness.
arXiv Detail & Related papers (2021-11-26T14:34:12Z)
- iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection [48.83883375118966]
iFAN aims to precisely align feature distributions on both image and instance levels.
It outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
arXiv Detail & Related papers (2020-03-09T13:27:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.