Compositional Feature Augmentation for Unbiased Scene Graph Generation
- URL: http://arxiv.org/abs/2308.06712v1
- Date: Sun, 13 Aug 2023 08:02:14 GMT
- Title: Compositional Feature Augmentation for Unbiased Scene Graph Generation
- Authors: Lin Li, Guikun Chen, Jun Xiao, Yi Yang, Chunping Wang, Long Chen
- Abstract summary: Scene Graph Generation (SGG) aims to detect all the visual relation triplets <sub, pred, obj> in a given image.
Due to the ubiquitous long-tailed predicate distributions, today's SGG models are still easily biased to the head predicates.
We propose a novel Compositional Feature Augmentation (CFA) strategy, the first unbiased SGG work to mitigate the bias issue from the perspective of increasing the diversity of triplet features.
- Score: 28.905732042942066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene Graph Generation (SGG) aims to detect all the visual relation triplets <sub, pred, obj> in a given image. With the emergence of various advanced techniques for better utilizing both the intrinsic and extrinsic information in each relation triplet, SGG has achieved great progress in recent years. However, due to the ubiquitous long-tailed predicate distributions, today's SGG models are still easily biased toward the head predicates. Currently, the most prevalent debiasing solutions for SGG are re-balancing methods, e.g., changing the distributions of the original training samples. In this paper, we argue that all existing re-balancing strategies fail to increase the diversity of the relation triplet features of each predicate, which is critical for robust SGG. To this end, we propose a novel Compositional Feature Augmentation (CFA) strategy, which is the first unbiased SGG work to mitigate the bias issue from the perspective of increasing the diversity of triplet features. Specifically, we first decompose each relation triplet feature into two components: an intrinsic feature and an extrinsic feature, which correspond to the intrinsic characteristics and extrinsic contexts of a relation triplet, respectively. Then, we design two different feature augmentation modules that enrich the feature diversity of original relation triplets by replacing or mixing up either their intrinsic or extrinsic features with those from other samples. Due to its model-agnostic nature, CFA can be seamlessly incorporated into various SGG frameworks. Extensive ablations show that CFA achieves a new state-of-the-art performance on the trade-off between different metrics.
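As a rough illustration of CFA's decompose-then-recompose idea, the sketch below augments a relation triplet feature by replacing or mixing up its intrinsic or extrinsic component with that of a donor triplet. This is a minimal sketch, not the authors' released implementation: the feature dimensions, the fixed mixup coefficient `alpha`, and the function name `cfa_augment` are illustrative assumptions.

```python
import torch

def cfa_augment(intrinsic, extrinsic, donor_intrinsic, donor_extrinsic,
                mode="mix", alpha=0.5):
    """Augment one relation triplet feature with components from a donor.

    mode="replace": swap in the donor's intrinsic component wholesale,
                    keeping the original extrinsic context.
    mode="mix":     mixup-style interpolation of the extrinsic component.
    Shapes and the fixed coefficient `alpha` are illustrative assumptions.
    """
    if mode == "replace":
        return donor_intrinsic, extrinsic
    if mode == "mix":
        mixed_extrinsic = alpha * extrinsic + (1.0 - alpha) * donor_extrinsic
        return intrinsic, mixed_extrinsic
    raise ValueError(f"unknown mode: {mode}")

# Toy usage: 512-d intrinsic/extrinsic features for an original and a donor triplet.
f_int, f_ext = torch.randn(512), torch.randn(512)
g_int, g_ext = torch.randn(512), torch.randn(512)
aug_int, aug_ext = cfa_augment(f_int, f_ext, g_int, g_ext, mode="mix")
triplet_feature = torch.cat([aug_int, aug_ext])  # would feed a predicate classifier
```

Since the augmentation only manipulates features, a donor-sampling policy that favors tail predicates is presumably what makes such a scheme useful for debiasing.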
Related papers
- Leveraging Predicate and Triplet Learning for Scene Graph Generation [31.09787444957997]
Scene Graph Generation (SGG) aims to identify entities and predict the relationship triplets among them.
We propose a Dual-granularity Relation Modeling (DRM) network to leverage fine-grained triplet cues besides the coarse-grained predicate ones.
Our method establishes new state-of-the-art performance on Visual Genome, Open Image, and GQA datasets.
arXiv Detail & Related papers (2024-06-04T07:23:41Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation [30.79358827005448]
Scene Graph Generation (SGG) aims to structurally and comprehensively represent objects and their connections in images.
Existing SGG models often struggle to solve the long-tailed problem caused by biased datasets.
We propose a Text-Image-joint Scene Graph Generation (TISGG) model to resolve unseen triples and improve the generalisation capability of SGG models.
arXiv Detail & Related papers (2023-06-23T10:17:56Z)
- Deep Diversity-Enhanced Feature Representation of Hyperspectral Images [87.47202258194719]
We rectify 3D convolution by modifying its topology to enhance the rank upper-bound.
We also propose a novel diversity-aware regularization (DA-Reg) term that acts on the feature maps to maximize independence among elements.
To demonstrate the superiority of the proposed Re$3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks.
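The summary leaves DA-Reg's exact formulation unspecified; as a hedged stand-in, the sketch below implements a common decorrelation-style penalty that pushes the off-diagonal entries of the channel correlation matrix toward zero, which is one way to "maximize independence among elements" of a feature map. The function name and tensor shapes are assumptions, not the paper's definition.

```python
import torch

def diversity_regularizer(feat):
    """Decorrelation-style penalty: zero-center and unit-normalize each channel,
    then penalize off-diagonal entries of the channel correlation matrix so
    feature-map elements stay (near-)independent."""
    b, c, h, w = feat.shape
    x = feat.reshape(b, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)
    x = x / (x.norm(dim=2, keepdim=True) + 1e-8)
    corr = torch.bmm(x, x.transpose(1, 2))  # (b, c, c) channel correlations
    off_diag = corr - torch.diag_embed(torch.diagonal(corr, dim1=1, dim2=2))
    return off_diag.pow(2).mean()

# Toy usage: add the penalty to a task loss with a small weight.
loss_reg = diversity_regularizer(torch.randn(2, 16, 8, 8))
```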
arXiv Detail & Related papers (2023-01-15T16:19:18Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Rethinking the Evaluation of Unbiased Scene Graph Generation [31.041074897404236]
Scene Graph Generation (SGG) methods tend to predict frequent predicate categories and fail to recognize rare ones.
Recent research has focused on unbiased SGG and adopted mean Recall@K as the main evaluation metric.
We propose two complementary evaluation metrics for unbiased SGG: Independent Mean Recall (IMR) and weighted IMR (wIMR).
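For context, mean Recall@K (the metric this paper revisits) averages per-predicate recall so that rare predicates weigh as much as frequent ones. The sketch below computes it under assumed data structures; the exact IMR/wIMR formulas are not given in the summary and are not reproduced here.

```python
from collections import defaultdict

def mean_recall_at_k(gt_triplets, predicted_topk):
    """Mean Recall@K: recall is computed per predicate class, then averaged,
    so rare predicates count as much as frequent ones. (IMR/wIMR refine this
    idea; their exact formulas are not reproduced here.)"""
    hits, totals = defaultdict(int), defaultdict(int)
    for gt, preds in zip(gt_triplets, predicted_topk):
        for (subj, pred, obj) in gt:
            totals[pred] += 1
            hits[pred] += (subj, pred, obj) in preds
    return sum(hits[p] / totals[p] for p in totals) / len(totals)

# Toy usage: one image with two ground-truth triplets and a top-K prediction set.
gt = [[("person", "riding", "horse"), ("horse", "on", "grass")]]
topk = [{("person", "riding", "horse"), ("person", "near", "horse")}]
print(mean_recall_at_k(gt, topk))  # 0.5: "riding" recalled, "on" missed
```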
arXiv Detail & Related papers (2022-08-03T08:23:51Z)
- Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation [62.96628432641806]
Scene Graph Generation aims to first encode the visual contents within the given image and then parse them into a compact summary graph.
We first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction.
We then devise an innovative Group Collaborative Learning strategy to optimize the decoder.
arXiv Detail & Related papers (2022-03-18T09:14:13Z)
- Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation [48.21846438269506]
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects.
Existing SGG methods fail to acquire complex reasoning about visual and textual correlations due to various biases in training data.
We propose a novel framework for SGG training that exploits relation labels based on their informativeness.
arXiv Detail & Related papers (2021-11-26T14:34:12Z)
- Semantic Compositional Learning for Low-shot Scene Graph Generation [122.51930904132685]
Many scene graph generation (SGG) models rely solely on the limited annotated relation triples for training.
We propose a novel semantic compositional learning strategy that makes it possible to construct additional, realistic relation triples.
For three recent SGG models, adding our strategy improves their performance by close to 50%, and all of them substantially exceed the current state-of-the-art.
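One hedged reading of such compositional construction: pair an existing subject and predicate with an object feature borrowed from a semantically similar class, yielding a synthetic training triple. The entity pool, similarity table, and feature sizes below are illustrative assumptions, not the paper's actual procedure.

```python
import random
import torch

# Hypothetical pools of detector RoI features grouped by entity class.
entity_pool = {
    "horse":    [torch.randn(256) for _ in range(4)],
    "elephant": [torch.randn(256) for _ in range(4)],
}
similar_classes = {"horse": ["elephant"]}  # assumed semantic-similarity table

def compose_triple(subj_feat, pred_label, obj_class):
    """Compose an extra triple for a rare relation by borrowing an object
    feature from a semantically similar class (illustrative assumption)."""
    candidates = []
    for cls in similar_classes.get(obj_class, []):
        candidates.extend(entity_pool.get(cls, []))
    if not candidates:
        return None  # no compatible donor class available
    donor_obj = random.choice(candidates)
    # Keep the original subject feature and predicate label; pair them with
    # the borrowed object feature to form a synthetic relation sample.
    return torch.cat([subj_feat, donor_obj]), pred_label

sample = compose_triple(torch.randn(256), "riding", "horse")
```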
arXiv Detail & Related papers (2021-08-19T10:13:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.