CAME: Context-aware Mixture-of-Experts for Unbiased Scene Graph Generation
- URL: http://arxiv.org/abs/2208.07109v1
- Date: Mon, 15 Aug 2022 10:39:55 GMT
- Title: CAME: Context-aware Mixture-of-Experts for Unbiased Scene Graph Generation
- Authors: Liguang Zhou, Yuhongze Zhou, Tin Lun Lam, Yangsheng Xu
- Abstract summary: We present a simple yet effective method called Context-Aware Mixture-of-Experts (CAME) to improve model diversity and alleviate the bias of scene graph generators.
We have conducted extensive experiments on three tasks on the Visual Genome dataset to show that CAME achieves superior performance over previous methods.
- Score: 10.724516317292926
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Scene graph generation has made tremendous progress in recent
years. However, the intrinsic long-tailed distribution of predicate classes
remains a challenging problem. Almost all existing scene graph generation
(SGG) methods follow the same framework: a similar backbone network for
object detection and a customized network for scene graph generation. These
methods often design sophisticated context encoders to extract the inherent
relevance of the scene context w.r.t. the intrinsic predicates, and
complicated networks to improve the learning capability of the model on
highly imbalanced data distributions. To address the unbiased SGG problem,
we present a simple yet effective method called Context-Aware
Mixture-of-Experts (CAME), which improves model diversity and alleviates
biased SGG without a sophisticated design. Specifically, we propose to use a
mixture of experts to remedy the heavily long-tailed distribution of
predicate classes, an approach suitable for most unbiased scene graph
generators. With a mixture of relation experts, the long-tailed distribution
of predicates is addressed in a divide-and-ensemble manner. As a result,
biased SGG is mitigated and the model tends to make more balanced predicate
predictions. However, experts with equal weights are not sufficiently
diverse to discriminate the different levels of the predicate distribution.
Hence, we use the built-in context-aware encoder to help the network
dynamically leverage rich scene characteristics and further increase the
diversity of the model. By utilizing the context information of the image,
the importance of each expert w.r.t. the scene context is dynamically
assigned. We have conducted extensive experiments on three tasks on the
Visual Genome dataset, showing that CAME achieves superior performance over
previous methods.
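
As a reading aid (the listing itself contains no code), here is a minimal
PyTorch sketch of the mechanism the abstract describes: several relation
experts classify predicates, and a gating layer conditioned on a pooled
scene-context feature assigns each expert an importance weight. The class
name, dimensions, expert count, and the 51-way predicate output (a common
Visual Genome convention) are all illustrative assumptions, not CAME's
released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAwareMoE(nn.Module):
    """Illustrative context-aware mixture-of-experts predicate head."""

    def __init__(self, rel_dim=512, ctx_dim=512, num_predicates=51, num_experts=3):
        super().__init__()
        # Each expert is a small predicate classifier over pairwise features.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(rel_dim, rel_dim // 2),
                nn.ReLU(),
                nn.Linear(rel_dim // 2, num_predicates),
            )
            for _ in range(num_experts)
        )
        # Gating layer: scene context -> one importance score per expert.
        self.gate = nn.Linear(ctx_dim, num_experts)

    def forward(self, rel_feats, ctx_feat):
        # rel_feats: (num_pairs, rel_dim) features of subject-object pairs.
        # ctx_feat:  (ctx_dim,) pooled context feature of the whole image.
        weights = F.softmax(self.gate(ctx_feat), dim=-1)       # (E,)
        expert_logits = torch.stack(
            [expert(rel_feats) for expert in self.experts]
        )                                                      # (E, N, P)
        # Divide and ensemble: context-dependent weighted sum over experts.
        return torch.einsum("e,enp->np", weights, expert_logits)

head = ContextAwareMoE()
logits = head(torch.randn(20, 512), torch.randn(512))  # (20, 51) logits
```

With uniform weights this would reduce to a plain ensemble; the
context-conditioned gate is what lets different scenes emphasize different
experts, which is the diversity mechanism the abstract claims.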
Related papers
- Ensemble Predicate Decoding for Unbiased Scene Graph Generation [40.01591739856469]
Scene Graph Generation (SGG) aims to generate a comprehensive graphical representation that captures semantic information of a given scenario.
The model's performance in predicting more fine-grained predicates is hindered by a significant predicate bias.
This paper proposes Ensemble Predicate Decoding (EPD), which employs multiple decoders to attain unbiased scene graph generation.
arXiv Detail & Related papers (2024-08-26T11:24:13Z)
- Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which introduces the local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to the global constraints, keeping the model from neglecting the tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z)
- Unbiased Scene Graph Generation in Videos [36.889659781604564]
We introduce TEMPURA: TEmporal consistency and Memory-guided UnceRtainty Attenuation for unbiased dynamic SGG.
TEMPURA enforces object-level temporal consistency via transformer sequence modeling and learns to synthesize unbiased relationship representations.
Our method achieves significant (up to 10% in some cases) performance gain over existing methods.
arXiv Detail & Related papers (2023-04-03T06:10:06Z)
- Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World [67.03968403301143]
Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships in images for vision understanding.
Existing re-balancing strategies try to handle it via prior rules but are still confined to pre-defined conditions.
We propose a Cross-modal prediCate boosting (CaCao) framework, where a visually-prompted language model is learned to generate diverse fine-grained predicates.
arXiv Detail & Related papers (2023-03-23T13:06:38Z)
- Peer Learning for Unbiased Scene Graph Generation [16.69329808479805]
We propose a novel framework dubbed peer learning to deal with the problem of biased scene graph generation (SGG).
This framework uses predicate sampling and consensus voting (PSCV) to encourage different peers to learn from each other.
We have established a new state-of-the-art (SOTA) on the SGCls task by achieving a mean of 31.6.
arXiv Detail & Related papers (2022-12-31T07:56:35Z)
- Unbiased Scene Graph Generation using Predicate Similarities [7.9112365100345965]
Scene Graphs are widely applied in computer vision as a graphical representation of relationships between objects shown in images.
These applications have not yet reached a practical stage of development owing to biased training caused by long-tailed predicate distributions.
We propose a new classification scheme that branches the process to several fine-grained classifiers for similar predicate groups.
The results of extensive experiments on the Visual Genome dataset show that the combination of our method and an existing debiasing approach greatly improves performance on tail predicates in challenging SGCls/SGDet tasks.
arXiv Detail & Related papers (2022-10-03T13:28:01Z)
- Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation [62.96628432641806]
Scene Graph Generation aims to first encode the visual contents within the given image and then parse them into a compact summary graph.
We first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction.
We then devise an innovative Group Collaborative Learning strategy to optimize the decoder.
arXiv Detail & Related papers (2022-03-18T09:14:13Z)
- A Robust and Generalized Framework for Adversarial Graph Embedding [73.37228022428663]
We propose a robust framework for adversarial graph embedding, named AGE.
AGE generates fake neighbor nodes as enhanced negative samples from an implicit distribution.
Based on this framework, we propose three models to handle three types of graph data.
arXiv Detail & Related papers (2021-05-22T07:05:48Z)
- Graph Classification by Mixture of Diverse Experts [67.33716357951235]
We present GraphDIVE, a framework leveraging mixture of diverse experts for imbalanced graph classification.
With a divide-and-conquer principle, GraphDIVE employs a gating network to partition an imbalanced graph dataset into several subsets.
Experiments on real-world imbalanced graph datasets demonstrate the effectiveness of GraphDIVE; a minimal sketch of this gating scheme appears after this list.
arXiv Detail & Related papers (2021-03-29T14:03:03Z)
- PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation [58.98802062945709]
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
arXiv Detail & Related papers (2020-09-02T08:30:09Z)
- Generative Compositional Augmentations for Scene Graph Prediction [27.535630110794855]
Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language.
We consider a challenging problem of compositional generalization that emerges in this task due to a long tail data distribution.
We propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs.
arXiv Detail & Related papers (2020-07-11T12:11:53Z)
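
The GraphDIVE entry above describes a gating network that partitions an
imbalanced graph dataset into subsets, each handled by its own expert. Below
is a hypothetical sketch of that divide-and-conquer routing under assumed
module names and shapes; it is not the published GraphDIVE code.

```python
import torch
import torch.nn as nn

class GatedSubsetClassifier(nn.Module):
    """Hypothetical divide-and-conquer gating: route each graph to one expert."""

    def __init__(self, emb_dim=128, num_classes=2, num_subsets=4):
        super().__init__()
        # Gating network scores each graph embedding against the subsets.
        self.gate = nn.Linear(emb_dim, num_subsets)
        self.experts = nn.ModuleList(
            nn.Linear(emb_dim, num_classes) for _ in range(num_subsets)
        )

    def forward(self, graph_emb):
        # graph_emb: (batch, emb_dim) pooled whole-graph embeddings.
        subset_id = self.gate(graph_emb).argmax(dim=-1)        # hard assignment
        all_logits = torch.stack(
            [expert(graph_emb) for expert in self.experts], dim=1
        )                                                      # (B, S, C)
        # Each graph is classified by the expert its gate selected.
        return all_logits[torch.arange(graph_emb.size(0)), subset_id]
```

Unlike the soft, context-weighted ensemble sketched for CAME above, this
variant makes a hard subset assignment per sample, so each expert sees only
its own slice of the imbalanced data.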