Biasing Like Human: A Cognitive Bias Framework for Scene Graph
Generation
- URL: http://arxiv.org/abs/2203.09160v1
- Date: Thu, 17 Mar 2022 08:29:52 GMT
- Title: Biasing Like Human: A Cognitive Bias Framework for Scene Graph
Generation
- Authors: Xiaoguang Chang, Teng Wang, Changyin Sun and Wenzhe Cai
- Abstract summary: We propose a novel three-paradigm framework that simulates how humans incorporate label linguistic features as guidance for vision-based representations.
Our framework is model-agnostic and can be applied to any scene graph model.
- Score: 20.435023745201878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene graph generation is a challenging task because relations have no
specific visual recognition pattern (e.g., "looking at" and "near" show no
conspicuous visual difference, whereas "near" can hold between entities of very
different morphology). As a result, some scene graph generation methods collapse
onto the most frequent relation predictions, driven by capricious visual features
and trivial dataset annotations. Recent works have therefore emphasized "unbiased"
approaches that balance predictions to yield a more informative scene graph.
However, humans' quick and accurate judgments about relations among numerous
objects owe more to "bias" (i.e., experience and linguistic knowledge) than to
pure vision. Inspired by this "cognitive bias" mechanism, we propose a novel
three-paradigm framework that simulates how humans incorporate label linguistic
features as guidance for vision-based representations, to better mine hidden
relation patterns and alleviate noisy visual propagation. The framework is
model-agnostic and can be applied to any scene graph model. Comprehensive
experiments show that our framework outperforms baseline modules on several
metrics with a minimal increase in parameters, and achieves new state-of-the-art
performance on the Visual Genome dataset.
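As a rough illustration of the idea of using label linguistic features to guide vision-based relation representations, the sketch below gates visual union-box features with word embeddings of the predicted subject and object labels. The module name, gating design, and dimensions are illustrative assumptions, not the paper's actual three-paradigm architecture.

```python
import torch
import torch.nn as nn

class LinguisticGuidedFusion(nn.Module):
    """Illustrative sketch: gate visual relation features with
    label-embedding (linguistic) priors. Hypothetical module, not
    the paper's actual three-paradigm framework."""
    def __init__(self, num_classes, embed_dim=300, visual_dim=512):
        super().__init__()
        # Label embeddings would normally be initialized from GloVe/word2vec.
        self.label_embed = nn.Embedding(num_classes, embed_dim)
        self.gate = nn.Sequential(
            nn.Linear(embed_dim * 2 + visual_dim, visual_dim),
            nn.Sigmoid(),
        )

    def forward(self, visual_feat, subj_label, obj_label):
        # visual_feat: [N, visual_dim] union-box features for N candidate pairs
        # subj_label / obj_label: [N] predicted entity class indices
        ling = torch.cat([self.label_embed(subj_label),
                          self.label_embed(obj_label)], dim=-1)
        # The linguistic prior modulates (gates) the visual representation,
        # suppressing noisy visual channels for this subject-object pair.
        g = self.gate(torch.cat([ling, visual_feat], dim=-1))
        return g * visual_feat


# Toy usage: 4 candidate pairs, 151 entity classes (Visual Genome size)
module = LinguisticGuidedFusion(num_classes=151)
feats = torch.randn(4, 512)
subj = torch.randint(0, 151, (4,))
obj = torch.randint(0, 151, (4,))
guided = module(feats, subj, obj)
print(guided.shape)  # torch.Size([4, 512])
```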
Related papers
- Local-Global Information Interaction Debiasing for Dynamic Scene Graph
Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which introduces local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to global constraints and keep the model from neglecting tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z)
- Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models [57.08925810659545]
We conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models.
Our empirical observations suggest that vision-and-language models are better at label prediction tasks.
We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models.
arXiv Detail & Related papers (2022-12-01T05:00:18Z)
- Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing an incremental structure expanding (ISE) approach.
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
arXiv Detail & Related papers (2022-09-15T16:26:14Z)
- Graph Self-supervised Learning with Accurate Discrepancy Learning [64.69095775258164]
We propose a framework that aims to learn the exact discrepancy between the original and the perturbed graphs, coined Discrepancy-based Self-supervised LeArning (D-SLA).
We validate our method on various graph-related downstream tasks, including molecular property prediction, protein function prediction, and link prediction tasks, on which our model largely outperforms relevant baselines.
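To make the discrepancy-learning idea above concrete, here is a minimal hypothetical objective: graph embeddings (from any GNN encoder) of perturbed graphs are pushed away from the original in proportion to how many edits produced them, with a margin preventing collapse. This is an assumption-laden sketch, not the exact D-SLA loss.

```python
import torch
import torch.nn.functional as F

def discrepancy_loss(z_orig, z_perturbed, edit_counts, margin=1.0):
    """Illustrative discrepancy-based objective (not the exact D-SLA loss).

    z_orig:       [D] embedding of the original graph from any GNN encoder
    z_perturbed:  [K, D] embeddings of K perturbed versions of the graph
    edit_counts:  [K] number of edits used to create each perturbed graph
    """
    # Distance of each perturbed graph from the original in embedding space.
    dists = torch.norm(z_perturbed - z_orig.unsqueeze(0), dim=-1)  # [K]

    # 1) Graphs perturbed more heavily should land farther away:
    #    encourage distances to be proportional to their edit counts.
    target = edit_counts.float() / edit_counts.float().sum()
    prop_loss = F.mse_loss(dists / (dists.sum() + 1e-8), target)

    # 2) Keep every perturbed graph at least `margin` away from the original,
    #    so the encoder does not collapse originals and perturbations together.
    margin_loss = F.relu(margin - dists).mean()

    return prop_loss + margin_loss

# Toy usage with random embeddings standing in for GNN outputs.
z_o = torch.randn(64)
z_p = torch.randn(3, 64)
edits = torch.tensor([1, 2, 5])
print(discrepancy_loss(z_o, z_p, edits))
```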
arXiv Detail & Related papers (2022-02-07T08:04:59Z)
- Neural Belief Propagation for Scene Graph Generation [31.9682610869767]
We propose a novel neural belief propagation method to generate the resulting scene graph.
It employs a structural Bethe approximation rather than the mean field approximation to infer the associated marginals.
It achieves state-of-the-art performance on various popular scene graph generation benchmarks.
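As background for the marginal inference mentioned above, the sketch below is a textbook sum-product loopy belief propagation routine on a pairwise MRF (whose fixed points are known to correspond to stationary points of the Bethe free energy). It is a generic illustration with toy potentials and a toy graph, not the paper's structural Bethe formulation.

```python
import numpy as np

def loopy_bp(unary, pair, edges, n_iters=20):
    """Generic sum-product loopy belief propagation on a pairwise MRF.

    unary: [N, L] non-negative unary potentials for N nodes, L labels
    pair:  [L, L] shared pairwise potential
    edges: list of undirected edges (i, j)
    """
    N, L = unary.shape
    # msgs[(i, j)] is the message node i sends to node j, over j's labels.
    msgs = {(i, j): np.ones(L) for a, b in edges for i, j in [(a, b), (b, a)]}
    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            # Product of unary at i and all incoming messages except from j.
            incoming_list = [msgs[(k, t)] for (k, t) in msgs if t == i and k != j]
            incoming = np.prod(incoming_list, axis=0) if incoming_list else np.ones(L)
            m = pair.T @ (unary[i] * incoming)
            new[(i, j)] = m / m.sum()
        msgs = new
    # Node beliefs: unary times all incoming messages, normalized per node.
    beliefs = unary.copy()
    for (k, t) in msgs:
        beliefs[t] *= msgs[(k, t)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

# Toy usage: 3-node chain with 2 labels per node.
unary = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
pair = np.array([[0.8, 0.2], [0.2, 0.8]])
print(loopy_bp(unary, pair, [(0, 1), (1, 2)]))
```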
arXiv Detail & Related papers (2021-12-10T18:30:27Z)
- ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning [65.15423587105472]
We present a new generative and structured commonsense-reasoning task (and an associated dataset) of explanation graph generation for stance prediction.
Specifically, given a belief and an argument, a model has to predict whether the argument supports or counters the belief and also generate a commonsense-augmented graph that serves as a non-trivial, complete, and unambiguous explanation for the predicted stance.
A significant 83% of our graphs contain external commonsense nodes with diverse structures and reasoning depths.
arXiv Detail & Related papers (2021-04-15T17:51:36Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
- Generative Compositional Augmentations for Scene Graph Prediction [27.535630110794855]
Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language.
We consider a challenging problem of compositional generalization that emerges in this task due to the long-tailed data distribution.
We propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs.
arXiv Detail & Related papers (2020-07-11T12:11:53Z)
- Unbiased Scene Graph Generation via Rich and Fair Semantic Extraction [42.37557498737781]
We propose a new and simple architecture named Rich and Fair semantic extraction network (RiFa).
RiFa predicts subject-object relations based on both the visual and semantic features of entities within a certain contextual area.
Experiments on the popular Visual Genome dataset show that RiFa achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-02-01T09:28:44Z)